deval-core/base-eval

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | License: llama3.1 | Architecture: Transformer

deval-core/base-eval is an 8-billion-parameter instruction-tuned model from Meta's Llama 3.1 collection, built on an optimized transformer architecture with a native 128k-token context length (this listing serves it with a 32k context length). Developed by Meta, it is designed for multilingual dialogue, performs well in assistant-like chat, and outperforms many open-source and closed chat models on common benchmarks. It supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and is optimized for tool use and code generation.


Llama 3.1 8B Instruct: Multilingual Dialogue and Tool Use

This model is the 8-billion-parameter instruction-tuned variant from Meta's Llama 3.1 family, released on July 23, 2024. It uses an optimized transformer architecture and was aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Its main differentiators are a 128k-token context length, up from 8k in Llama 3 8B Instruct, and benchmark gains over that predecessor in reasoning, coding, math, and tool use.
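
Because the native context length (128k) and the context length shown in the listing metadata (32k) differ, it can help to check a request against the limit that actually applies. The following is a minimal sketch, not part of any official client: the function names are hypothetical, and the exact token counts behind "128k" and "32k" (taken here as 128 * 1024 and 32 * 1024) are assumptions.

```python
# Assumed token limits; the page only states "128k" and "32k".
NATIVE_CTX_TOKENS = 128 * 1024   # assumption for the model's native window
LISTING_CTX_TOKENS = 32 * 1024   # assumption for the hosted listing's window


def fits_in_context(prompt_tokens: int, max_new_tokens: int, ctx_limit: int) -> bool:
    """Return True if the prompt plus the requested completion fits in ctx_limit."""
    if prompt_tokens < 0 or max_new_tokens < 0 or ctx_limit <= 0:
        raise ValueError("token counts must be non-negative and ctx_limit positive")
    return prompt_tokens + max_new_tokens <= ctx_limit


def max_new_tokens_allowed(prompt_tokens: int, ctx_limit: int) -> int:
    """Largest completion length that still fits after the prompt (never negative)."""
    if prompt_tokens < 0 or ctx_limit <= 0:
        raise ValueError("prompt_tokens must be non-negative and ctx_limit positive")
    return max(0, ctx_limit - prompt_tokens)


if __name__ == "__main__":
    prompt = 40_000  # hypothetical prompt size in tokens
    print(fits_in_context(prompt, 1_000, NATIVE_CTX_TOKENS))
    print(fits_in_context(prompt, 1_000, LISTING_CTX_TOKENS))
    print(max_new_tokens_allowed(30_000, LISTING_CTX_TOKENS))
```

A prompt that fits the native window can still be rejected by a deployment with a smaller limit, so the check should use whichever limit the serving endpoint enforces.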

Key Capabilities

  • Multilingual Support: Optimized for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, with potential for fine-tuning in other languages.
  • Enhanced Instruction Following: Outperforms the previous Llama 3 8B Instruct on MMLU (69.4% vs 68.5%), MMLU (CoT) (73.0% vs 65.3%), and IFEval (80.4% vs 76.8%).
  • Code Generation: Achieves 72.6% on HumanEval pass@1, a notable improvement over Llama 3 8B Instruct's 60.4%.
  • Mathematical Reasoning: Reaches 51.9% final_em on MATH (CoT), up from Llama 3 8B Instruct's 29.1%.
  • Advanced Tool Use: Demonstrates strong performance in tool use benchmarks like API-Bank (82.6%) and BFCL (76.1%), indicating robust function calling capabilities.
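
For readers unfamiliar with the pass@1 metric quoted for HumanEval above, the sketch below shows the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. Whether the reported 72.6% was computed with exactly this estimator and sampling setup is an assumption; the function names are illustrative.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must not be empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-problem results: (samples generated, samples passed).
    demo = [(10, 3), (10, 7), (10, 0), (10, 10)]
    print(f"pass@1 = {mean_pass_at_k(demo, k=1):.3f}")
```

With k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over all problems.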

Good For

  • Assistant-like Chat Applications: Its instruction-tuned nature and multilingual capabilities make it suitable for building interactive chatbots.
  • Multilingual Applications: Ideal for use cases requiring understanding and generation in the eight explicitly supported languages.
  • Code Generation and Development: Strong performance in coding benchmarks suggests utility for programming assistance.
  • Tool-Integrated Systems: Designed to integrate with external tools and APIs, enabling more complex agentic behaviors.
  • Research and Commercial Use: Intended for a broad range of applications under the Llama 3.1 Community License.
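
As an illustration of the tool-integrated use case listed above, here is a minimal sketch of the application side of function calling: parsing a tool call emitted by the model and dispatching it to a local function. It assumes the model emits a JSON object of the form {"name": ..., "parameters": {...}}, which is one of the formats described for Llama 3.1 tool calling; the `get_weather` tool, the `TOOLS` registry, and `dispatch_tool_call` are hypothetical names introduced only for this example.

```python
import json
from typing import Any, Callable, Dict, Optional


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call an external API."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(
    model_output: str, tools: Optional[Dict[str, Callable[..., Any]]] = None
) -> Any:
    """Run the tool named in model_output, or return None for ordinary text.

    Assumes a tool call looks like {"name": "...", "parameters": {...}}.
    """
    registry = TOOLS if tools is None else tools
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # not JSON, so treat it as a normal chat reply
    if not isinstance(call, dict) or "name" not in call:
        return None  # valid JSON, but not shaped like a tool call
    name = call["name"]
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    params = call.get("parameters", {})
    if not isinstance(params, dict):
        raise TypeError("tool parameters must be a JSON object")
    return registry[name](**params)


if __name__ == "__main__":
    reply = '{"name": "get_weather", "parameters": {"city": "Paris"}}'
    print(dispatch_tool_call(reply))
    print(dispatch_tool_call("Hello! How can I help?"))
```

In a real system the tool's return value would be sent back to the model as a follow-up message so it can compose the final answer; validating tool names and parameter types before execution, as done here, keeps malformed model output from reaching the tools.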