@_akhaliq tweeted

Scaling Transformer to 1M tokens and beyond with RMT

Recurrent Memory Transformer retains information across up to 2 million tokens. 

During inference, the model effectively utilized memory for up to 4,096 segments with a total length of 2,048,000 tokens—significantly exceeding… https://twitter.com/i/web/status/1650308865555148800
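The figures above imply segments of roughly 500 tokens each (2,048,000 tokens / 4,096 segments). Below is a minimal sketch of that segment-level recurrence, assuming a generic PyTorch encoder backbone and illustrative sizes; the class name, dimensions, and memory handling are placeholders for explanation, not the authors' implementation. The idea it shows: memory tokens are prepended to each segment, and the updated memory is handed to the next segment, which is how information can persist far beyond a single context window.

import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Hypothetical illustration of segment-level recurrence with memory tokens."""

    def __init__(self, d_model=512, num_memory_tokens=10, segment_len=500):
        super().__init__()
        self.segment_len = segment_len
        # Learnable memory tokens carried from segment to segment.
        self.memory = nn.Parameter(torch.randn(num_memory_tokens, d_model))
        # Stand-in backbone; any Transformer encoder could play this role.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, embeddings):
        # embeddings: (batch, total_len, d_model); total_len can far exceed segment_len
        batch = embeddings.size(0)
        memory = self.memory.unsqueeze(0).expand(batch, -1, -1)
        n_mem = memory.size(1)
        outputs = []
        # Process the long input one fixed-size segment at a time,
        # passing the updated memory tokens on to the next segment.
        for segment in embeddings.split(self.segment_len, dim=1):
            h = self.backbone(torch.cat([memory, segment], dim=1))
            memory = h[:, :n_mem]          # read out updated memory
            outputs.append(h[:, n_mem:])   # per-segment hidden states
        return torch.cat(outputs, dim=1), memory

Because only one segment plus a handful of memory tokens is in the attention window at a time, the per-step cost stays constant while the effective context grows with the number of segments processed.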
