Google PaLM 2
References:
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance (Google blog, English)
- PaLM 2 Technical Report (Google document, English)
Key points from the reference documents
- Google PaLM
- In "PaLM: Scaling Language Modeling with Pathways", we introduce the Pathways Language Model (PaLM), a 540-billion parameter, dense decoder-only Transformer model trained with the Pathways system.
- PaLM was trained using a combination of English and multilingual datasets that include high-quality web documents, books, Wikipedia, conversations, and GitHub code.
- In addition to English NLP tasks, PaLM also shows strong performance on multilingual NLP benchmarks, including translation, even though only 22% of the training corpus is non-English.
- Google PaLM 2
- a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM
- PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2
- Compute-optimal scaling: Recently, compute-optimal scaling (Hoffmann et al., 2022) showed that data size is at least as important as model size (see the back-of-the-envelope sketch after this list).
- Improved dataset mixtures: We designed a more multilingual and diverse pre-training mixture, which extends across hundreds of languages and domains (e.g., programming languages, mathematics, and parallel multilingual documents).
- Architectural and objective improvements: Past LLMs have almost exclusively used a single causal or masked language modeling objective. Given the strong results of UL2 (Tay et al., 2023), we use a tuned mixture of different pre-training objectives in this model to train the model to understand different aspects of language (see the objective-mixture sketch after this list).
- The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute
- These results suggest that model scaling is not the only way to improve performance.
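The PaLM excerpt above describes a dense, decoder-only Transformer. Below is a minimal NumPy sketch of the causal self-attention masking that characterizes a decoder-only model; the single-head setup, function name, and shapes are illustrative assumptions, not PaLM's actual implementation.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head causal self-attention: each position attends only to
    itself and earlier positions, which is what 'decoder-only' means.
    Shapes: (seq_len, d_head). Illustrative sketch only."""
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)                    # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)            # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ v                                    # (seq_len, d_head)

# Example: 4 tokens, 8-dimensional head
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(causal_attention(q, k, v).shape)  # (4, 8)
```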
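The compute-optimal scaling point (Hoffmann et al., 2022) and the observation that PaLM 2-L is smaller than the largest PaLM while using more training compute both fit the same rule of thumb: for a fixed FLOP budget, parameter count and training tokens should grow together. A back-of-the-envelope sketch under the common approximations C ≈ 6·N·D training FLOPs and roughly 20 tokens per parameter; both constants are assumptions for illustration, not figures from the PaLM 2 report.

```python
import math

def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Rough compute-optimal model size N (params) and data size D (tokens)
    for a training budget C, using C ~= 6*N*D and D ~= tokens_per_param * N
    (a Chinchilla-style rule of thumb; assumed constants, not PaLM 2's)."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e24-FLOP budget
n, d = compute_optimal_split(1e24)
print(f"~{n/1e9:.0f}B parameters trained on ~{d/1e12:.1f}T tokens")
```

Under these assumptions, doubling the compute budget grows both the parameter count and the token count by about √2, rather than putting all of the extra compute into a bigger model.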
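The UL2-style "tuned mixture of pre-training objectives" can be pictured as sampling a different objective (causal LM, prefix LM, span-corruption denoising) for each training example. The sketch below is only illustrative: the objective set, sampling weights, and sentinel token are assumptions, not PaLM 2's actual training recipe.

```python
import random

# Illustrative objective mixture; PaLM 2's real weights are not public.
OBJECTIVE_WEIGHTS = {
    "causal_lm": 0.5,         # predict every token left-to-right
    "prefix_lm": 0.25,        # condition on a visible prefix, predict the rest
    "span_corruption": 0.25,  # mask a contiguous span, predict it (denoising)
}

def make_training_example(tokens):
    """Turn a token sequence into (inputs, targets) under a randomly
    sampled objective, roughly in the spirit of UL2's mixture-of-denoisers."""
    objective = random.choices(
        list(OBJECTIVE_WEIGHTS), weights=list(OBJECTIVE_WEIGHTS.values())
    )[0]
    if objective == "causal_lm":
        return tokens[:-1], tokens[1:]
    if objective == "prefix_lm":
        split = random.randint(1, len(tokens) - 1)
        return tokens[:split], tokens[split:]
    # span_corruption: replace one random span with a sentinel token
    start = random.randint(0, len(tokens) - 2)
    end = random.randint(start + 1, len(tokens))
    corrupted = tokens[:start] + ["<extra_id_0>"] + tokens[end:]
    return corrupted, ["<extra_id_0>"] + tokens[start:end]

print(make_training_example("the quick brown fox jumps over".split()))
```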