tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Oct 19, 2024 | License: llama3.3 | Architecture: Transformer

The tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500 model is an 8 billion parameter Llama-3.1 checkpoint continually pre-trained by tokyotech-llm with a 32K context length. It was trained on 50 billion tokens, comprising 16% Python code from The-Stack-v2 and 84% multilingual text. The model serves as a baseline in an ablation study measuring how unfiltered Python code affects code generation performance while preserving general language capabilities.
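
For reference, the checkpoint can be loaded like any other Llama-3.1 model through the Transformers library. The snippet below is a minimal sketch, assuming the weights are downloadable from the Hugging Face Hub under the model name above and that the base weights are in bf16 (the FP8 quantization listed in the metadata presumably refers to the hosted deployment, not the released weights).

```python
# Minimal usage sketch (assumption: checkpoint is available on the Hugging Face Hub
# under the name shown above; base weights assumed to be bf16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base (non-chat) model, so a plain code-completion prompt is used.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a continually pre-trained base model rather than an instruction-tuned one, greedy code completion as shown above is a more representative probe of its Python capability than chat-style prompting.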
