New paper out: Pleias 1.0: the First Family of Language Models Trained on Fully Open Data
How we trained a fully open model in a new pretraining environment, using releasable data (Common Corpus) and an open-source framework (Nanotron from HuggingFace).
www.sciencedirect.com/science/arti...