Google PaLM 2

References: 

Content from the references 

  • Google PaLM 
    • In "PaLM: Scaling Language Modeling with Pathways", we introduce the Pathways Language Model (PaLM), a 540-billion parameter, dense decoder-only Transformer model trained with the Pathways system. 
    • PaLM was trained using a combination of English and multilingual datasets that include high-quality web documents, books, Wikipedia, conversations, and GitHub code.
    • In addition to English NLP tasks, PaLM also shows strong performance on multilingual NLP benchmarks, including translation, even though only 22% of the training corpus is non-English.
  • Google PaLM 2 
    • A new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor, PaLM.
    • PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2.
    • Compute-optimal scaling: Recently, compute-optimal scaling (Hoffmann et al., 2022) showed that data size is at least as important as model size (a small worked example follows this list).
    • Improved dataset mixtures: We designed a more multilingual and diverse pre-training mixture, which extends across hundreds of languages and domains (e.g., programming languages, mathematics, and parallel multilingual documents).
    • Architectural and objective improvements: Past LLMs have almost exclusively used a single causal or masked language modeling objective. Given the strong results of UL2 (Tay et al., 2023), we use a tuned mixture of different pre-training objectives in this model to train the model to understand different aspects of language (see the objective-mixture sketch after this list).
    • The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute.
      • These results suggest that model scaling is not the only way to improve performance.
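The compute-optimal scaling point can be made concrete with a small sketch. The snippet below uses the common approximation C ≈ 6·N·D FLOPs per training run and the roughly 20-tokens-per-parameter ratio reported by Hoffmann et al. (2022); the exact ratio and budget used for PaLM 2 are not public, so the numbers here are illustrative only.

```python
# Illustrative sketch of compute-optimal ("Chinchilla-style") scaling, not PaLM 2's
# actual recipe: for a fixed compute budget, parameter count N and token count D
# should grow roughly in proportion, ending up near ~20 training tokens per parameter.

def chinchilla_optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training FLOP budget between model size and data size.

    Uses the approximation C ~= 6 * N * D (about 6 FLOPs per parameter per token)
    and the empirical ratio D ~= tokens_per_param * N from Hoffmann et al. (2022).
    The ratio PaLM 2 used is not public; 20 is the Chinchilla estimate.
    """
    # Substitute D = r * N into C = 6 * N * D  ->  N = sqrt(C / (6 * r))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Example budget around the ~5.8e23 FLOPs used for Chinchilla-scale training.
    n, d = chinchilla_optimal_allocation(5.8e23)
    print(f"params ~= {n:.2e}, tokens ~= {d:.2e}")  # roughly 70B params, 1.4T tokens
```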
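The "tuned mixture of different pre-training objectives" can be pictured as sampling a different training task per example, in the spirit of UL2's mixture of denoisers. The objective names, sampling weights, and corruption settings in the sketch below are assumptions for illustration, not PaLM 2's published configuration.

```python
import random

# Illustrative UL2-style mixture of pre-training objectives.
# Weights and corruption settings are assumed for illustration only.
OBJECTIVES = [
    ("causal_lm", 0.5),        # standard next-token prediction
    ("prefix_lm", 0.25),       # condition on a prefix, predict the continuation
    ("span_corruption", 0.25), # T5-style masked-span infilling
]


def sample_objective(rng: random.Random) -> str:
    """Pick one pre-training objective per example, proportional to its weight."""
    names, weights = zip(*OBJECTIVES)
    return rng.choices(names, weights=weights, k=1)[0]


def make_training_example(tokens: list[int], rng: random.Random) -> dict:
    """Turn a raw token sequence into (inputs, targets) for the sampled objective."""
    objective = sample_objective(rng)
    if objective == "causal_lm":
        # Predict every next token from the full left context.
        return {"objective": objective, "inputs": tokens[:-1], "targets": tokens[1:]}
    if objective == "prefix_lm":
        # Condition on a random prefix, predict the remaining tokens.
        split = rng.randint(1, len(tokens) - 1)
        return {"objective": objective, "inputs": tokens[:split], "targets": tokens[split:]}
    # span_corruption: drop a contiguous span and ask the model to reconstruct it.
    start = rng.randint(0, len(tokens) - 2)
    end = min(len(tokens), start + rng.randint(1, 5))
    corrupted = tokens[:start] + [-1] + tokens[end:]  # -1 stands in for a sentinel token
    return {"objective": objective, "inputs": corrupted, "targets": tokens[start:end]}


if __name__ == "__main__":
    rng = random.Random(0)
    print(make_training_example(list(range(12)), rng))
```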
