Meta has released the Llama 3 LLM: https://ai.meta.com/blog/meta-llama-3/
A tech talk from Meta about Llama 3 is posted on YouTube: https://www.youtube.com/watch?v=r3DC_gjFCSA
Its 70B instruct model came out ahead in head-to-head human evaluations against Claude Sonnet, Mistral Medium, GPT-3.5, and its predecessor Llama 2.
Llama 3 shows improvements in reasoning and code generation.
Meta has also created a web UI that lets anyone try out Llama 3 for free at https://www.meta.ai.
Llama 3 is a decoder-only transformer with a vocabulary of 128k tokens. It uses grouped-query attention (GQA) and is trained on sequences of 8,192 tokens.
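For illustration, here is a minimal, hedged sketch of grouped-query attention in PyTorch. The dimensions and weights are toy values, not Llama 3's actual hyperparameters; the point is only that K/V use fewer heads than Q and each K/V head is shared by a group of query heads, which shrinks the KV cache at inference time.

```python
# Toy sketch of grouped-query attention (GQA); dimensions are illustrative.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """x: (batch, seq, dim). K/V have fewer heads than Q; each K/V head is
    shared by a group of query heads."""
    bsz, seq, dim = x.shape
    head_dim = dim // n_heads
    q = (x @ wq).view(bsz, seq, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each K/V head so every query head in its group sees the same K/V.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(bsz, seq, dim)

# Toy usage: 8 query heads sharing 2 K/V heads.
dim, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(1, 16, dim)
wq = torch.randn(dim, dim)
wk = torch.randn(dim, dim * n_kv_heads // n_heads)
wv = torch.randn(dim, dim * n_kv_heads // n_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)
```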
Llama 3 is trained on a high-quality dataset of over 15T tokens, 7x more than Llama 2, including 4x more code data. Llama 2 was used to generate the training data for text-quality classifiers, which predict data quality so that low-quality sources can be filtered out. The Llama 3 team observed that model performance continues to improve even when the training data exceeds 200B tokens (roughly the Chinchilla-optimal amount for an 8B model) by up to 100x.
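Meta has not published the details of these text-quality classifiers, but the idea can be sketched with a toy scikit-learn pipeline: train a simple classifier on quality labels that an LLM judge (standing in for Llama 2) might assign, then keep only documents it scores as high quality. Everything below (the classifier choice, labels, and threshold) is an illustrative assumption.

```python
# Toy data-quality filter: fit a classifier on LLM-assigned quality labels,
# then drop documents scored as low quality. Not Meta's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1 = high quality, 0 = low quality, as an LLM judge might label them.
labeled_docs = [
    ("The theorem follows from a careful induction on the tree depth.", 1),
    ("CLICK HERE buy cheap followers best price!!!", 0),
    ("We profile the kernel and find memory bandwidth is the bottleneck.", 1),
    ("lorem ipsum lorem ipsum lorem ipsum", 0),
]
texts, labels = zip(*labeled_docs)

quality_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_clf.fit(texts, labels)

def filter_corpus(docs, threshold=0.5):
    """Keep only documents whose predicted quality probability clears the threshold."""
    probs = quality_clf.predict_proba(docs)[:, 1]
    return [d for d, p in zip(docs, probs) if p >= threshold]

candidates = [
    "An overview of attention mechanisms in transformers.",
    "win prize now click click click",
]
print(filter_corpus(candidates))
```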
Llama 3 is trained on two 24k-GPU clusters. Meta improved training efficiency 2x vs Llama 2.
Llama 3 8B inference efficiency stays on par with Llama 2 7B despite the larger parameter count: the new tokenizer produces up to 15% fewer tokens, and the 8B model also uses grouped-query attention (GQA).
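Below is a hedged usage sketch for trying the 8B instruct model locally with Hugging Face transformers. The Hub model ID and chat-template call are assumptions based on common practice; access to the weights is gated behind accepting Meta's license on the Hub and authenticating (e.g. `huggingface-cli login`).

```python
# Sketch: load and query the Llama 3 8B instruct model via transformers.
# Assumes the gated model ID below and enough GPU memory for bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain grouped-query attention in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```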
A Llama 3 400B-parameter model is still being trained and will be released in the coming months.
The Meta Llama 3 license allows anyone to use the models; however, commercial use in products or services with more than 700M monthly active users requires permission from Meta. https://github.com/meta-llama/llama3/blob/main/LICENSE