mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x
The mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. This model leverages a 32,768 token context length and is specifically adapted using the mlfoundations-dev/lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gpt dataset. It is designed for tasks benefiting from its specialized fine-tuning on this particular dataset, offering enhanced performance for use cases aligned with its training data.
Loading preview...
Overview
This model, mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x, is a 7.6 billion parameter language model. It is a fine-tuned variant of the established Qwen/Qwen2.5-7B-Instruct architecture, indicating a strong foundation in general language understanding and instruction following.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
- Parameter Count: 7.6 billion parameters.
- Context Length: Supports a substantial context window of 32,768 tokens.
- Training Data: Specialized fine-tuning on the
mlfoundations-dev/lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gptdataset.
Training Details
The model underwent training with specific hyperparameters:
- Learning Rate: 1e-05
- Batch Size: A total training batch size of 16 (2 per device with 4 devices and 2 gradient accumulation steps).
- Optimizer: AdamW with cosine learning rate scheduler and 0.1 warmup ratio.
- Epochs: Trained for 5.0 epochs.
Potential Use Cases
Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and content of the lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gpt dataset. Developers should evaluate its performance for tasks requiring specialized knowledge or patterns present in its training data.