PARZ2344/web_llama_sft_random
PARZ2344/web_llama_sft_random is a 3.2 billion parameter language model fine-tuned from Meta Llama 3.2-3B-Instruct on the deep_research_25 dataset. It is designed for tasks aligned with its specific fine-tuning data, offering a compact yet capable option for targeted applications, and its 32768-token context length supports processing extensive inputs.
Model Overview
PARZ2344/web_llama_sft_random is a 3.2 billion parameter language model fine-tuned from the meta-llama/Llama-3.2-3B-Instruct base model. This instruction-tuned variant was trained on the deep_research_25 dataset, which specializes it for tasks reflected in that data. Training used a learning rate of 1e-05 over 3 epochs with a total batch size of 64 across 8 GPUs, and a cosine learning rate scheduler with a 0.1 warmup ratio.
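The training script is not published with this card, so the sketch below is only a minimal reconstruction of the reported hyperparameters using the Hugging Face `transformers` `TrainingArguments` class. The per-device batch size and gradient accumulation split are assumptions chosen so that 8 GPUs yield the reported total batch size of 64; bf16 precision is likewise an assumption.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported setup; values marked as
# assumptions are not stated in the model card.
training_args = TrainingArguments(
    output_dir="web_llama_sft_random",
    learning_rate=1e-5,                 # reported learning rate
    num_train_epochs=3,                 # reported number of epochs
    lr_scheduler_type="cosine",         # reported cosine schedule
    warmup_ratio=0.1,                   # reported warmup ratio
    per_device_train_batch_size=4,      # assumption: 4 x 8 GPUs x 2 accumulation steps = 64 total
    gradient_accumulation_steps=2,      # assumption (see above)
    optim="adamw_torch",                # AdamW optimizer
    bf16=True,                          # assumption: bfloat16 mixed precision
)
```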
Key Characteristics
- Base Model: Fine-tuned from Meta Llama 3.2-3B-Instruct.
- Parameter Count: 3.2 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a substantial context window of 32768 tokens.
- Training Data: Specialized fine-tuning on the deep_research_25 dataset.
- Training Configuration: Utilized AdamW optimizer, multi-GPU distributed training, and gradient accumulation for stable and efficient learning.
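To illustrate these characteristics in practice, the following is a minimal loading and generation sketch using the `transformers` library. It assumes the model inherits the Llama 3.2 Instruct chat template and that bfloat16 weights are appropriate; the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PARZ2344/web_llama_sft_random"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bfloat16, matching the Llama 3.2 base
    device_map="auto",
)

# Assumption: the model keeps the Llama 3.2 Instruct chat format.
messages = [{"role": "user", "content": "Summarize the key ideas of transfer learning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```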
Intended Use Cases
This model is particularly suited for applications that align with the nature of the deep_research_25 dataset it was fine-tuned on. Developers should consider its specific training data and instruction-following capabilities when evaluating its suitability for their tasks. Its compact size and substantial context length make it a viable option for scenarios requiring efficient processing of detailed instructions or long-form content within its specialized domain.