Scaling Large Language Models: From Power Law to Sparsity by 周彦祺 – 2023 BAAI Conference (北京智源大会) - YouTube
In this presentation, the speaker discusses T5 (the unified Text-to-Text Transfer Transformer), mixture-of-experts (MoE) architectures, advanced MoE techniques, and sparsity in large language models developed at Google Brain.








