agentlans/Llama3.1-SuperDeepFuse-CrashCourse12K

Text Generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Jan 24, 2025 · License: llama3.1 · Architecture: Transformer · Concurrency cost: 1

agentlans/Llama3.1-SuperDeepFuse-CrashCourse12K is an 8B-parameter multilingual instruction-tuned language model, fine-tuned from Llama3.1-SuperDeepFuse on 12,000 samples drawn from high-quality instruct datasets. It targets improved multi-task reasoning, mathematics, and coding, and aims to deliver stronger instruction following at the 8B scale.


Model Overview

Llama3.1-SuperDeepFuse-CrashCourse12K is an 8 billion parameter, multilingual, instruction-tuned language model developed by agentlans. It is based on the Llama3.1-SuperDeepFuse model and has been further fine-tuned using 12,000 samples from the agentlans/crash-course dataset, which aggregates data from 10 high-quality instruct datasets.

Key Capabilities

  • Enhanced Multi-task Reasoning: Designed to improve performance across various complex tasks.
  • Mathematics and Coding: Shows improved capabilities in mathematical problem-solving and code generation.
  • Instruction Following: Aims for better adherence to given instructions.
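In practice, `tokenizer.apply_chat_template` builds the prompt for you; the sketch below just makes the single-turn prompt format explicit, assuming the model keeps the stock Llama 3.1 chat template from its base (the card does not state otherwise, so this is an assumption):

```python
# Sketch of the Llama 3.1 instruct prompt format. ASSUMPTION: this
# fine-tune inherits the template unchanged from Llama3.1-SuperDeepFuse.

def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3.1 chat prompt ending at the
    assistant header, so generation continues as the assistant."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("You are a helpful assistant.", "What is 7 * 8?")
```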

Training Details

The model was fine-tuned using LoRA (Low-Rank Adaptation) with a maximum sequence length of 2048. Training ran for one epoch and used 4-bit quantization (bitsandbytes), BF16 precision, NEFTune embedding noise, and rank-stabilized LoRA (RS-LoRA) to balance quality and efficiency.
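A setup like the one described might be configured as follows with `peft`, `bitsandbytes`, and `trl`. This is a hedged sketch, not the author's actual recipe: the LoRA rank and alpha, target modules, NEFTune noise alpha, and output path are all assumed values not stated in the card.

```python
# Config sketch of the described fine-tuning setup: 4-bit bitsandbytes,
# BF16 compute, NEFTune, RS-LoRA, max sequence length 2048, one epoch.
# Rank/alpha, target modules, and neftune_noise_alpha are ASSUMPTIONS.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base-model weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute
)

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                                   # assumed rank
    lora_alpha=32,                          # assumed alpha
    use_rslora=True,                        # rank-stabilized LoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

train_args = SFTConfig(
    output_dir="crashcourse12k-lora",       # assumed path
    num_train_epochs=1,
    bf16=True,
    max_seq_length=2048,
    neftune_noise_alpha=5.0,                # NEFTune noise (assumed alpha)
)
```

These configs would then be passed to `trl`'s `SFTTrainer` along with the base model and the training dataset.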

Performance and Limitations

While this 8B model offers improved reasoning and instruction following, it may still underperform larger models. It can produce misleading or incorrect outputs, so results should be verified before use in critical applications.

Evaluation Results

According to the Open LLM Leaderboard, the model achieved an Average score of 27.93%. Specific metrics include:

  • IFEval (0-Shot): 71.87%
  • BBH (3-Shot): 31.83%
  • MATH Lvl 5 (4-Shot): 17.67%
  • MMLU-PRO (5-shot): 29.24%

For more detailed results, refer to the Open LLM Leaderboard.