Today, Gemini 1.5 Flash-8B, our newest Flash variant, is production-ready and comes with:
- 50% lower price (compared to 1.5 Flash)
- 2x higher rate limits (compared to 1.5 Flash)
- Lower latency on small prompts (compared to 1.5 Flash)
Developers can access gemini-1.5-flash-8b for free via Google AI Studio and the Gemini API.
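As a rough illustration of what calling the model through the Gemini API might look like, here is a minimal sketch that builds a `generateContent` request using only the Python standard library. The model ID comes from this post; the endpoint path, the `x-goog-api-key` header, and the request body shape are assumptions based on the public v1beta REST API, and `build_request` is a hypothetical helper name. Check the current API reference before relying on any of it.

```python
import json
import urllib.request

# Model ID as named in this post.
MODEL = "gemini-1.5-flash-8b"
# Assumed base URL of the public v1beta REST API.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for Flash-8B."""
    # Assumed body shape: a single text part in a single content entry.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumed auth header
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the sketch easy to inspect without a network call or a real API key.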
Our lightweight model, smaller and faster
At I/O, we announced Gemini 1.5 Flash, our lightweight model, optimized for speed and efficiency. Over the past few months, Google DeepMind has made considerable progress making 1.5 Flash even better based on developer feedback and testing the limits of what’s possible.
Last month, we released an experimental version of Gemini 1.5 Flash-8B, a smaller and faster variant of 1.5 Flash. We’re now excited to make it generally available for production use. Flash-8B nearly matches the performance of the 1.5 Flash model launched in May across many benchmarks. It performs especially well on tasks such as chat, transcription, and long context language translation.
Our release of best-in-class small models continues to be informed by developer feedback and our own testing of what’s possible with these models. We see the most potential for this model in tasks ranging from high volume multimodal use cases to long context summarization tasks.
Lowest price per intelligence of any Gemini model
With the stable release of Gemini 1.5 Flash-8B, we’re announcing the lowest price per intelligence of any Gemini model:
- $0.0375 per 1 million input tokens on prompts <128K
- $0.15 per 1 million output tokens on prompts <128K
- $0.01 per 1 million tokens on cached prompts <128K
For developers on the paid tier, billing will begin on Monday, October 14th.
This new price, along with the work we have already done to drive down developer costs with 1.5 Flash and 1.5 Pro, highlights our commitment to ensuring developers have the freedom to build the products and services that push the world forward.
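To make the prices above concrete, here is a small sketch that estimates the cost of a workload from token counts. The three rates are copied from the list above and apply only to prompts under 128K tokens. The function name `estimate_cost` is hypothetical, and treating cached tokens as a separate count billed at the cached rate is a simplifying assumption, not a statement of how billing actually works.

```python
# USD per 1 million tokens, from the price list above (prompts under 128K tokens).
INPUT_PRICE_PER_M = 0.0375
OUTPUT_PRICE_PER_M = 0.15
CACHED_PRICE_PER_M = 0.01

TOKENS_PER_MILLION = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return an estimated cost in USD for the given token counts.

    Assumption: cached tokens are counted separately from input tokens
    and billed at the cached rate. Other charges are not modeled.
    """
    return (
        input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
        + cached_tokens * CACHED_PRICE_PER_M
    ) / TOKENS_PER_MILLION
```

For example, `estimate_cost(10_000_000, 2_000_000)` comes to about $0.675 under these assumptions: $0.375 for input plus $0.30 for output.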
2x higher rate limits for Flash-8B
Gemini 1.5 Flash-8B is best suited for simple, higher volume tasks. To make this model as useful as possible, we’re doubling the 1.5 Flash-8B rate limits, meaning developers can send up to 4,000 requests per minute (RPM).
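For high volume workloads, a client-side throttle can help keep a process under the requests-per-minute ceiling. The sketch below uses a sliding 60-second window with the 4,000 RPM figure from this post as its default. The class name `RequestThrottle` is hypothetical, and how the service itself measures the limit is not described here, so the sliding window is an assumption. Actual limits may also differ by account tier.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: allow at most max_per_minute calls per 60 seconds."""

    def __init__(self, max_per_minute: int = 4000, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock      # injectable so the window can be tested without sleeping
        self.sent = deque()     # timestamps of requests inside the current window

    def try_acquire(self) -> bool:
        """Return True and record the request if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that are 60 seconds old or older.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            self.sent.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each request and wait briefly or queue the work when it returns `False`. This only limits a single process; several workers sharing one quota would need shared state.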
Happy building and stay tuned for more updates!