Google PaLM 2
References:
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance (Google blog, English)
- PaLM 2 Technical Report (Google document, English)
Key points from the reference documents
- Google PaLM
- In "PaLM: Scaling Language Modeling with Pathways", we introduce the Pathways Language Model (PaLM), a 540-billion parameter, dense decoder-only Transformer model trained with the Pathways system.
- PaLM was trained using a combination of English and multilingual datasets that include high-quality web documents, books, Wikipedia, conversations, and GitHub code.
- In addition to English NLP tasks, PaLM also shows strong performance on multilingual NLP benchmarks, including translation, even though only 22% of the training corpus is non-English.
- Google PaLM 2
- a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM
- PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2
- Compute-optimal scaling: Recently, compute-optimal scaling (Hoffmann et al., 2022) showed that data size is at least as important as model size (see the back-of-the-envelope sketch after this list).
- Improved dataset mixtures: We designed a more multilingual and diverse pre-training mixture, which extends across hundreds of languages and domains (e.g., programming languages, mathematics, and parallel multilingual documents).
- Architectural and objective improvements: Past LLMs have almost exclusively used a single causal or masked language modeling objective. Given the strong results of UL2 (Tay et al., 2023), we use a tuned mixture of different pre-training objectives in this model to train the model to understand different aspects of language (see the objective-mixture sketch after this list).
- The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute
- These results suggest that model scaling is not the only way to improve performance.
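The PaLM excerpt above describes a dense, decoder-only Transformer. Below is a minimal NumPy sketch of the causal self-attention masking that characterizes a decoder-only model; the single-head setup, function name, and shapes are illustrative assumptions, not PaLM's actual implementation.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head causal self-attention: each position attends only to
    itself and earlier positions, which is what 'decoder-only' means.
    Shapes: (seq_len, d_head). Illustrative sketch only."""
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)                    # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)            # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ v                                    # (seq_len, d_head)

# Example: 4 tokens, 8-dimensional head
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(causal_attention(q, k, v).shape)  # (4, 8)
```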
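The compute-optimal scaling point (Hoffmann et al., 2022) and the observation that PaLM 2-L is smaller than the largest PaLM while using more training compute both fit the same rule of thumb: for a fixed FLOP budget, parameter count and training tokens should grow together. A back-of-the-envelope sketch under the common approximations C ≈ 6·N·D training FLOPs and roughly 20 tokens per parameter; both constants are assumptions for illustration, not figures from the PaLM 2 report.

```python
import math

def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Rough compute-optimal model size N (params) and data size D (tokens)
    for a training budget C, using C ~= 6*N*D and D ~= tokens_per_param * N
    (a Chinchilla-style rule of thumb; assumed constants, not PaLM 2's)."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e24-FLOP budget
n, d = compute_optimal_split(1e24)
print(f"~{n/1e9:.0f}B parameters trained on ~{d/1e12:.1f}T tokens")
```

Under these assumptions, doubling the compute budget grows both the parameter count and the token count by about √2, rather than putting all of the extra compute into a bigger model.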
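The UL2-style "tuned mixture of pre-training objectives" can be pictured as sampling a different objective (causal LM, prefix LM, span-corruption denoising) for each training example. The sketch below is only illustrative: the objective set, sampling weights, and sentinel token are assumptions, not PaLM 2's actual training recipe.

```python
import random

# Illustrative objective mixture; PaLM 2's real weights are not public.
OBJECTIVE_WEIGHTS = {
    "causal_lm": 0.5,         # predict every token left-to-right
    "prefix_lm": 0.25,        # condition on a visible prefix, predict the rest
    "span_corruption": 0.25,  # mask a contiguous span, predict it (denoising)
}

def make_training_example(tokens):
    """Turn a token sequence into (inputs, targets) under a randomly
    sampled objective, roughly in the spirit of UL2's mixture-of-denoisers."""
    objective = random.choices(
        list(OBJECTIVE_WEIGHTS), weights=list(OBJECTIVE_WEIGHTS.values())
    )[0]
    if objective == "causal_lm":
        return tokens[:-1], tokens[1:]
    if objective == "prefix_lm":
        split = random.randint(1, len(tokens) - 1)
        return tokens[:split], tokens[split:]
    # span_corruption: replace one random span with a sentinel token
    start = random.randint(0, len(tokens) - 2)
    end = random.randint(start + 1, len(tokens))
    corrupted = tokens[:start] + ["<extra_id_0>"] + tokens[end:]
    return corrupted, ["<extra_id_0>"] + tokens[start:end]

print(make_training_example("the quick brown fox jumps over".split()))
```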