Overview
This model, qwen_lawma_deepseek-2k-5x-majority_verified, is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model. It has 7.6 billion parameters and supports a context length of 131,072 tokens, making it suitable for processing long inputs.
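The checkpoint can be loaded like any other Hugging Face causal LM. A minimal sketch follows; note the repository path (`mlfoundations-dev/...`) is an assumption based on the dataset's organization, so adjust it to wherever the checkpoint is actually hosted.

```python
# Assumed Hugging Face repo path -- verify before use.
MODEL_ID = "mlfoundations-dev/qwen_lawma_deepseek-2k-5x-majority_verified"
MAX_CONTEXT = 131_072  # context length reported on this card


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model.

    Requires the `transformers` library and enough GPU memory for a
    7.6B-parameter model; imports are deferred so the constants above
    can be inspected without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's native dtype
        device_map="auto",    # shard across available devices
    )
    return tokenizer, model
```

For inference, pair `load_model` with the tokenizer's chat template (`apply_chat_template`), since the base model is instruction-tuned.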
Training Details
The model was fine-tuned on the mlfoundations-dev/thoughts-lawma-annotations-deepseek-majority-verified-share-gpt dataset. Key training hyperparameters: a learning rate of 1e-05, a total batch size of 16 (2 per device with 2 gradient accumulation steps, implying 4 devices), and 5 epochs. The optimizer was adamw_torch with standard betas and epsilon, paired with a cosine learning-rate scheduler and a 0.1 warmup ratio.
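The schedule above can be sketched in a few lines of plain Python: a linear warmup over the first 10% of steps, then cosine decay from the peak learning rate to zero. The total step count here is a placeholder, since it depends on dataset size; the hyperparameter values are the ones reported above.

```python
import math

BASE_LR = 1e-5        # peak learning rate from the card
WARMUP_RATIO = 0.1    # warmup ratio from the card


def lr_at(step: int, total_steps: int,
          base_lr: float = BASE_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Cosine schedule with linear warmup, mirroring the card's settings."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup window.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size = per-device batch x grad accumulation x device count.
PER_DEVICE = 2
GRAD_ACCUM = 2
NUM_DEVICES = 4  # implied by the total of 16
EFFECTIVE_BATCH = PER_DEVICE * GRAD_ACCUM * NUM_DEVICES  # 16
```

This matches the behavior of the `cosine` scheduler in the Hugging Face Trainer (`lr_scheduler_type="cosine"`, `warmup_ratio=0.1`), which is presumably what was used here.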
Potential Use Cases
Given its fine-tuning on a specific annotation dataset, this model is likely best suited for applications that involve:
- Processing or generating text related to the domain covered by the thoughts-lawma-annotations-deepseek-majority-verified-share-gpt dataset.
- Tasks requiring a large context window for understanding long-form content or complex interactions.
Limitations
The model description and intended-uses sections of the original README indicate that more information is needed about its specific capabilities and limitations. Users should evaluate the model themselves before relying on it for a particular application.