eve-esa/EVE-Instruct
EVE-Instruct is a 24-billion-parameter instruction-tuned causal language model developed by eve-esa, fine-tuned from Mistral-Small-3.2-24B-Instruct-2506. It specializes in Earth Intelligence, with particular emphasis on the Earth Observation (EO) and Earth Science (ES) domains, while maintaining strong general capabilities. Training interleaved instruction fine-tuning with long-form text and mixed general-domain data with synthetic EO/ES content. The model outperforms its base model and others on MCQA, hallucination, and open-ended Earth Science question benchmarks, and supports a 32,768-token context length.
EVE-Instruct: Earth Intelligence Language Model
EVE-Instruct is a 24-billion-parameter language model developed by eve-esa and fine-tuned from Mistral-Small-3.2-24B-Instruct-2506. It focuses on Earth Intelligence (EI), with deep expertise in the Earth Observation (EO) and Earth Science (ES) domains, while preserving and slightly improving general-purpose capabilities.
Key Capabilities & Training:
- Domain Specialization: Expertise in EO and ES comes from a fine-tuning strategy that interleaves instruction fine-tuning (IFT) with long-form text and mixes general-domain replay data with synthetic EO/ES content.
- Data-Rich Training: Fine-tuned on approximately 33.5 billion tokens, of which 30% is long-form text (raw corpus samples and text generated synthetically via an Active Reading pipeline) and 70% is instruction-formatted text (ContextQA, SelfQA, LongContextQA, Multi-hop QA, and self-referential alignment prompts).
- Quality Control: Synthetic data generation involved a mix of high-quality models (e.g., Mistral Large 3, GPT-4o Mini) with LLM-based judges ensuring domain relevance, factual quality, and grounding.
- Alignment: Uses Online Direct Preference Optimization (Online DPO) to refine formatting, stylistic consistency, and preference adherence, preserving domain knowledge while improving interaction quality.
- Context Length: Supports a 32,768-token context window.
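To make the data mix concrete, the following minimal sketch turns the stated proportions into approximate token counts. The 33.5 billion total and the 30/70 split come from the description above; the function name `split_token_budget` is hypothetical, and the results are only as precise as the rounded total.

```python
def split_token_budget(total_tokens: int, long_form_pct: int) -> dict[str, int]:
    """Split a token budget into long-form and instruction-formatted parts."""
    if not 0 <= long_form_pct <= 100:
        raise ValueError("long_form_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large counts.
    long_form = total_tokens * long_form_pct // 100
    return {"long_form": long_form, "instruction": total_tokens - long_form}


# Approximate figure from the model description (assumed to be rounded).
TOTAL_TOKENS = 33_500_000_000

mix = split_token_budget(TOTAL_TOKENS, long_form_pct=30)
print(f"long-form:   ~{mix['long_form'] / 1e9:.2f}B tokens")
print(f"instruction: ~{mix['instruction'] / 1e9:.2f}B tokens")
```

Under these assumptions, the split works out to roughly 10.05 billion long-form tokens and 23.45 billion instruction-formatted tokens.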
Performance Highlights:
EVE-Instruct improves on domain-specific benchmarks over both its base model, Mistral Small 3.2, and other models in its size range:
- Domain-Specific: Achieves 86.12% IoU and 77.73% Accuracy on MCQA Multiple, 96.35% Accuracy on MCQA Single, and 84.70% F1 for Hallucination tasks, outperforming Mistral Small 3.2 across these metrics.
- General Capabilities: Maintains or slightly improves general capabilities, with a 1.8% overall increase over Mistral Small 3.2 across categories such as Math & Reasoning, Coding, Knowledge, Tool Calling, Instruction Following, and Chat Quality.
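For readers unfamiliar with the two MCQA Multiple metrics, the sketch below shows one common way to score multi-answer questions: IoU as the size of the intersection of the predicted and gold option sets divided by the size of their union, and accuracy as an exact match of the two sets. This is an assumption about the scoring, not the benchmark's confirmed definition; the function names and the sample answers are hypothetical.

```python
def iou(predicted: set[str], gold: set[str]) -> float:
    """Intersection over union of two answer-option sets."""
    union = predicted | gold
    if not union:
        return 1.0  # both sets empty: treat as perfect agreement
    return len(predicted & gold) / len(union)


def score_mcqa_multiple(
    predictions: list[set[str]], references: list[set[str]]
) -> tuple[float, float]:
    """Return (mean IoU, exact-match accuracy) over paired answer sets."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        raise ValueError("at least one question is required")
    ious = [iou(p, g) for p, g in zip(predictions, references)]
    exact = [p == g for p, g in zip(predictions, references)]
    return sum(ious) / len(ious), sum(exact) / len(exact)


# Made-up answers for three questions (illustrative only, not benchmark data).
preds = [{"A", "C"}, {"B"}, {"A", "B"}]
golds = [{"A", "C"}, {"B", "D"}, {"C"}]

mean_iou, accuracy = score_mcqa_multiple(preds, golds)
print(f"mean IoU: {mean_iou:.2%}, accuracy: {accuracy:.2%}")
```

With scoring of this kind, IoU gives partial credit for partly correct answer sets while accuracy does not, which would be consistent with the IoU figure being higher than the accuracy figure on the same task.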
Good for:
- Applications requiring deep understanding and generation of content related to Earth Observation and Earth Science.
- Tasks involving complex question answering and information extraction within environmental and geospatial contexts.
- Developers seeking a specialized model that also retains strong general-purpose capabilities such as reasoning and instruction following.