There’s a race toward language models with longer context windows. But how good are they, and how can we know?
This article was originally published on Art Fish Intelligence.
The context window of large language models (the amount of text they can process at once) has been growing at an exponential rate.
In 2018, language models like BERT, T5, and GPT-1 could take up to 512 tokens as input. Now, in the summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how can we evaluate these increasingly capable models?
The recently released Gemini 1.5 Pro model can take in up to 2 million tokens. But what does 2 million tokens even mean?
If we estimate 3 words to roughly equal 4 tokens, it means that 2 million tokens can (almost) fit the entire Harry Potter and Lord of the Rings series.
(The total word count of all seven books in the Harry Potter series is 1,084,625. The total word count of all three books in the Lord of the Rings series is 481,103. Together that is 1,084,625 + 481,103 = 1,565,728 words, or roughly 2.1 million tokens.)
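As a back-of-the-envelope check, here is a minimal Python sketch of that arithmetic, assuming the rough 3-words-to-4-tokens heuristic; an actual tokenizer (such as the one Gemini uses) would give somewhat different counts.

```python
# Rough estimate: how many tokens would the two book series need?
# Assumes the common heuristic of ~4 tokens per 3 words; real tokenizer
# output varies by model and text.

WORDS_PER_TOKEN = 3 / 4  # ~3 words per 4 tokens

word_counts = {
    "Harry Potter (7 books)": 1_084_625,
    "Lord of the Rings (3 books)": 481_103,
}

total_words = sum(word_counts.values())
estimated_tokens = total_words / WORDS_PER_TOKEN

print(f"Total words: {total_words:,}")                # 1,565,728
print(f"Estimated tokens: {estimated_tokens:,.0f}")   # ~2,087,637
print(f"Fits in a 2M-token window: {estimated_tokens <= 2_000_000}")
```

The estimate lands just above 2 million tokens, which is why the combined series only (almost) fits.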