marin-community/marin-8b-instruct

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 14, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Marin 8B Instruct is an 8 billion parameter instruction-tuned causal language model developed by the Marin team at Stanford CRFM, built on the Llama 3 architecture with a 32768 token context length. It is fine-tuned on a diverse set of instruction datasets, including those focused on code, reasoning, and mathematics. The model demonstrates strong performance across various benchmarks, often outperforming other 7-8B models in its class, making it suitable for applications requiring robust instruction following and analytical capabilities.

Loading preview...

Marin 8B Instruct: An Overview

Marin 8B Instruct is an 8 billion parameter instruction-tuned model developed by the Marin team at Stanford CRFM, leveraging the Llama 3 architecture. It is an SFT-only model, fine-tuned on a comprehensive mix of datasets to enhance its instruction-following capabilities.

Key Capabilities & Training

  • Architecture: Based on the Llama 3 8B architecture, ensuring compatibility with standard Hugging Face Transformers libraries.
  • Tokenizer: Utilizes a variant of the Llama 3 tokenizer, stanford-crfm/marin-tokenizer, which includes a bundled chat template.
  • Instruction Tuning: Trained on diverse SFT datasets such as AceCode-89K, Bespoke-Stratos-17k, dolphin-r1 (including reasoning subsets), natural_reasoning, OpenThoughts-114k-math, smoltalk, tulu-3-sft-mixture, and verifiable-math-problems.
  • Pre-training: The base model underwent extensive pre-training across multiple phases (Kestrel, Ocelot, Jellyfish, Phoenix, Starling, Deeper Starling) on datasets like Nemotron-CC, DCLM Baseline, Starcoder Data, Proofpile 2, FineMath, Dolma, and custom Marin Markdownified datasets (StackExchange, Wikipedia, Ar5iv).
  • Performance: Marin 8B Base demonstrates competitive performance against models like Llama 3.1 8B, OLMo 2 7B, and MAP NEO 7B on LM Eval Harness benchmarks, often achieving higher average scores and excelling in tasks like ARC Easy, ARC Challenge, BBH, and MMLU.

Considerations for Use

  • Safety: Marin 8B has not undergone specific safety tuning or evaluation. Users should exercise caution and consider potential risks, as the model can generate harmful or sensitive content, and its responses may require verification.
  • Intended Use: This model is not intended for fully autonomous use and should be deployed with appropriate safeguards and human oversight.

For more detailed information on the pre-training process, refer to the technical retrospective.