Llama 3 is Meta’s most successful openly available LLM to date, and the recently released Llama 3.1 enables new workflows, such as synthetic data generation and model distillation, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed-source models.
At AI Infra @ Scale 2024, Meta engineers discussed every step of how we built and brought Llama 3 to life, from data and training to inference.
Joe Spisak, Product Director and Head of Generative AI Open Source at Meta, talks about the history of Llama and Meta’s overarching vision for open source AI.
He’s joined by Delia David, a software engineer at Meta, to discuss all things data-related for GenAI. David covers the diversity, volume, and freshness of data needed for GenAI and how different data types should be extracted and prepared.
Kaushik Veeraraghavan, a software engineer at Meta, discusses how Meta trains Llama at scale and delves into the data center, networking, and software investments that have enabled the development of Meta’s Llama 3 models.
Finally, Ye (Charlotte) Qi, a production engineer at Meta, discusses how Meta handles inference for Llama. Optimizing and scaling LLM inference is important for enabling large-scale product applications. Qi introduces key parallelism techniques that help scale model sizes and context windows, which in turn influence inference system designs. She also discusses the practical challenges of deploying these complex serving paradigms across Meta’s internal cloud to data centers with heterogeneous hardware.
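To give a flavor of one such parallelism technique, here is a minimal NumPy sketch of tensor parallelism: a linear layer's weight matrix is split column-wise across shards (standing in for devices), each shard computes a partial output, and the partials are concatenated to recover the full result. This is purely illustrative and not Meta's serving implementation; the function name and shapes are invented for the example.

```python
import numpy as np

def tensor_parallel_linear(x, weight, num_shards):
    """Compute x @ weight by splitting the weight matrix column-wise.

    Each shard would live on a separate device in a real system; here
    the "devices" are just slices of one array, to show the math.
    """
    shards = np.split(weight, num_shards, axis=1)    # one column block per device
    partials = [x @ w for w in shards]               # each device computes its block
    return np.concatenate(partials, axis=-1)         # all-gather the partial outputs

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512))         # a batch of activations
weight = rng.standard_normal((512, 2048)) # full (unsharded) weight matrix

sharded_out = tensor_parallel_linear(x, weight, num_shards=4)
full_out = x @ weight
print(np.allclose(sharded_out, full_out))  # the sharded result matches the full matmul
```

Because each device holds only a fraction of the weights, sharding like this is what lets model sizes grow past a single accelerator's memory, at the cost of the communication step sketched by the concatenation.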