CryCryCry1231/llama-3.2-1B-instruct-sft
Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · License: llama3.2 · Architecture: Transformer

CryCryCry1231/llama-3.2-1B-instruct-sft is a 1 billion parameter instruction-tuned causal language model, fine-tuned from meta-llama/Llama-3.2-1B-Instruct. It has a context length of 32768 tokens and was further fine-tuned on the Magicoder-Evol-Instruct-110K dataset, making it suited to tasks that require following given instructions, particularly in code-related contexts.


Model Overview

CryCryCry1231/llama-3.2-1B-instruct-sft is a 1 billion parameter instruction-tuned model, building upon the meta-llama/Llama-3.2-1B-Instruct base. It features a substantial context length of 32768 tokens, making it suitable for processing longer inputs and maintaining conversational context.

Key Capabilities

  • Instruction Following: The model has undergone Supervised Fine-Tuning (SFT) using the Magicoder-Evol-Instruct-110K dataset, enhancing its ability to understand and execute instructions.
  • Code-Related Tasks: Given its training on a dataset known for code instruction, this model is likely to perform well in tasks involving code generation, explanation, or modification based on instructions.
  • Efficient Performance: As a 1 billion parameter model, it offers a balance between capability and computational efficiency, making it suitable for applications where larger models might be too resource-intensive.
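A minimal inference sketch using the Hugging Face transformers library is shown below. The model ID comes from the card; the example prompt, `max_new_tokens`, and the lazy-loading `generate` helper are illustrative assumptions, not values published by the author.

```python
# Sketch: one chat turn with this checkpoint via transformers.
# Assumptions: the prompt, max_new_tokens, and helper name are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CryCryCry1231/llama-3.2-1B-instruct-sft"

# Llama-3.2-style chat messages; since the SFT data
# (Magicoder-Evol-Instruct-110K) is code-focused, a code
# instruction is a natural prompt.
messages = [
    {"role": "user",
     "content": "Write a Python function that reverses a string."}
]

def generate(messages, max_new_tokens=256):
    """Lazily load the model and run one chat turn (downloads the weights)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

At BF16 the 1B weights fit comfortably on a single consumer GPU, which is the main practical appeal of a model this size.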

Training Details

The model was trained with a learning rate of 5e-05, a batch size of 63, and for 3 epochs. It used the adamw_torch optimizer with a linear learning-rate scheduler, and mixed-precision training (Native AMP) for efficiency. Training used Transformers 4.48.3 and PyTorch 2.6.0+cu124.