Model Overview
This model, developed by tokyotech-llm, is a continually pre-trained variant of Llama-3.1-8B. It was built to establish a baseline for unfiltered Python code from The-Stack-v2 within the SwallowCode ablation experiments. The model shows baseline performance on code generation benchmarks such as HumanEval and HumanEval+, while retaining general proficiency across knowledge, reasoning, and commonsense benchmarks.
Key Training Details
- Base Model: Llama-3.1-8B
- Total Pretraining Tokens: 50 billion
- Data Mix: 16% Python code (from The-Stack-v2-train-smol-ids, SwallowCode Experiment 1) and 84% multilingual text (including Japanese and English corpora).
- Sequence Length: 8,192 tokens
- Training Framework: Megatron-LM (version core_r0.9.0)
- Hardware: Trained on 64 NVIDIA H100 GPUs on the TSUBAME supercomputer.
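The stated mix and token count imply a simple per-source token budget; a back-of-the-envelope split, assuming the percentages are exact:

```python
# Token budget implied by the stated 50B-token run with a 16% / 84% mix.
TOTAL_TOKENS = 50_000_000_000

# Integer arithmetic keeps the split exact (no floating-point rounding).
code_tokens = TOTAL_TOKENS * 16 // 100  # Python code from The-Stack-v2
text_tokens = TOTAL_TOKENS * 84 // 100  # Japanese/English multilingual text

print(f"code: {code_tokens:,} tokens")  # code: 8,000,000,000 tokens
print(f"text: {text_tokens:,} tokens")  # text: 42,000,000,000 tokens
```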
Evaluation and Use Cases
The model was evaluated with lm-evaluation-harness and BigCodeBench across code generation benchmarks (HumanEval, HumanEval+) and general tasks (e.g., MMLU, GSM8K, HellaSwag). It serves as the reference point against which subsequent SwallowCode ablation experiments are compared, in particular for measuring the effect of training on specific code data subsets. Developers can use the model for code generation, especially when studying how different code training data compositions affect downstream quality.
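As a rough illustration, the model can be queried for code completion through Hugging Face transformers. This is a sketch, not part of the release: the repository id below is a placeholder (the exact Hub name is not stated here), and the truncation helper is a common HumanEval-style post-processing step assumed for illustration.

```python
"""Hedged sketch: greedy code completion with Hugging Face transformers.
MODEL_ID is a placeholder assumption -- substitute the real Hub repo id."""

MODEL_ID = "tokyotech-llm/<baseline-model-repo>"  # placeholder, not the real id


def truncate_at_top_level(completion: str) -> str:
    """Cut a completion at the first non-indented line, a common
    post-processing step when continuing a function signature
    (e.g. HumanEval-style prompts)."""
    kept = []
    for line in completion.splitlines():
        if line and not line[0].isspace() and kept:
            break  # generation has left the function body
        kept.append(line)
    return "\n".join(kept)


def complete(prompt: str, max_new_tokens: int = 256) -> str:
    # Heavy imports are kept local so the helper above stays importable
    # even without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return truncate_at_top_level(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    prompt = 'def fizzbuzz(n: int) -> str:\n    """Return the FizzBuzz string for n."""\n'
    print(complete(prompt))
```

Greedy decoding (`do_sample=False`) is the usual choice when reproducing pass@1-style HumanEval scores; sampling with a temperature would be used for pass@k.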