One of the world’s largest AI training datasets is about to get bigger and ‘substantially better’

Massive AI training datasets, or corpora, have been called “the backbone of large language models.” But EleutherAI, the organization that created one of the world’s largest of these datasets, the Pile, an 825 GB diverse, open-source text corpus, became a target in 2023 amid a growing uproar …


http://dlvr.it/T1GrDR
