

Large language models (LLMs) are changing the world, but for those outside well-resourced industry labs, it can be extremely difficult to train and deploy these models. This has led to a flurry of activity centered on open-source LLMs, such as the LLaMA series from Meta, the Pythia series from EleutherAI, the StableLM series from StabilityAI, and the OpenLLaMA model from Berkeley AI Research. Today, we at MosaicML are releasing a new model series called MPT (MosaicML Pretrained Transformer) to address the limitations of these models and finally provide a commercially usable, open-source model that matches (and in many ways surpasses) LLaMA-7B. Now you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints (a loading sketch follows below) or training from scratch.
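As a concrete starting point, here is a minimal sketch of loading one of the released checkpoints with Hugging Face Transformers. It assumes the `mosaicml/mpt-7b` repository on the Hugging Face Hub, the custom model code bundled with it (hence `trust_remote_code=True`), and the GPT-NeoX-20B tokenizer; treat these names as assumptions to check against the released model cards rather than a definitive recipe.

```python
# Minimal sketch: load an MPT-7B checkpoint and generate a few tokens.
# The repo name 'mosaicml/mpt-7b' and the GPT-NeoX-20B tokenizer are assumptions
# to verify against the released model card; trust_remote_code is required
# because the checkpoint ships its own model code.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True,
)

inputs = tokenizer('MosaicML is', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```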

MPT-7B Base is a decoder-style transformer with 6.7B parameters. We rigorously evaluated MPT on a range of benchmarks, and MPT met the high quality bar set by LLaMA-7B. MPT-7B is:

- Licensed for commercial use (unlike LLaMA).
- Trained on a large amount of data (1T tokens like LLaMA vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
- Prepared to handle extremely long inputs thanks to ALiBi (we trained on up to 65k inputs and can handle up to 84k; see the context-extension sketch below).
- Optimized for fast training and inference (via FlashAttention and FasterTransformer).
- Equipped with highly efficient open-source training code.

For inspiration, we are also releasing three finetuned variants that demonstrate the many ways of building on this base model: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
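Because ALiBi uses attention biases rather than learned positional embeddings, the usable context window can be raised at load time instead of being fixed by training. The following is a minimal sketch of that, assuming the `mosaicml/mpt-7b-storywriter` checkpoint and its `max_seq_len` config field as the knob; the exact repository name, field name, and the illustrative 83968-token value are assumptions to verify against the released model card.

```python
# Minimal sketch: extend the ALiBi context window when loading a checkpoint.
# The repo name 'mosaicml/mpt-7b-storywriter' and the max_seq_len config field
# are assumptions; check the released model card for the exact names.
import transformers

config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    trust_remote_code=True,
)
config.max_seq_len = 83968  # illustrative value near the ~84k limit described above

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-storywriter',
    config=config,
    trust_remote_code=True,
)
```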
