this feels like a very big deal
2 trillion tokens of permissively licensed text & code, so you can train (actually) open LLMs
and data acquisition is one of the more expensive & complex aspects of training an LLM, so hopefully this accelerates open LLM development
huggingface.co/blog/Pclangl...