mlfoundations-dev/qwen_openthoughts_science_claude
The mlfoundations-dev/qwen_openthoughts_science_claude model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mlfoundations-dev/open-thoughts-science-claude dataset, whose name suggests a focus on scientific reasoning and discourse. Its 32768-token context length lets it process long scientific texts and multi-step problems, which may make it suitable for research assistance and knowledge extraction in scientific domains.
Overview
This model, mlfoundations-dev/qwen_openthoughts_science_claude, is a 7.6-billion-parameter language model fine-tuned from Qwen2.5-7B-Instruct on the mlfoundations-dev/open-thoughts-science-claude dataset.
Key Capabilities
- Specialized Fine-tuning: Fine-tuned on a science-focused dataset, which may strengthen scientific understanding and generation.
- Base Model Heritage: Inherits the general instruction-following capabilities of the Qwen2.5-7B-Instruct base model.
- Extensive Context Window: Supports a context length of 32768 tokens, enabling the processing and analysis of long scientific documents and complex information.
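A practical consequence of the 32768-token window is that a long paper plus the instructions and the expected answer must all fit in it. The sketch below is a minimal illustration, assuming that prompt and generated tokens share one window (as is usual for decoder-only models) and that the document has already been converted to token IDs with the model's tokenizer. The helper names and the simple non-overlapping chunking strategy are hypothetical and are not described in the model card.

```python
# Context length stated in the model card.
CONTEXT_LENGTH = 32768


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Hypothetical helper: True if prompt plus requested output fit in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def chunk_token_ids(token_ids: list[int], max_new_tokens: int,
                    reserved_prompt_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> list[list[int]]:
    """Hypothetical helper: split a long document into non-overlapping chunks
    that each fit next to an instruction prompt of `reserved_prompt_tokens`
    tokens and a generation budget of `max_new_tokens` tokens."""
    budget = context_length - max_new_tokens - reserved_prompt_tokens
    if budget <= 0:
        raise ValueError("no room left in the context window for document tokens")
    # Slice the token list into consecutive pieces of at most `budget` tokens.
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]
```

For example, a 70,000-token document with 256 tokens reserved for instructions and 1,024 for the answer leaves 31,488 document tokens per call, so it would be split into three chunks.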
Good For
- Scientific Research: Assisting with tasks requiring deep understanding or generation of scientific content.
- Knowledge Extraction: Extracting information from lengthy scientific papers or technical reports.
- Complex Problem Solving: Engaging with intricate scientific problems that benefit from a large context window.
Training Details
The model was trained for 3 epochs with a learning rate of 8e-05 and a total batch size of 512, reached through 32 gradient accumulation steps (512 / 32 = 16 sequences per accumulation step, summed across all devices). The optimizer was adamw_torch, with a cosine learning-rate scheduler and a warmup ratio of 0.1.
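The learning-rate schedule above can be sketched in Python. This is a minimal sketch, assuming the Hugging Face Trainer conventions suggested by the optimizer name adamw_torch: warmup lasts for the ceiling of 10% of all optimizer steps, the learning rate rises linearly from 0 to 8e-05, then follows a half-cosine down to 0. The card does not give the dataset size, so `num_examples` is hypothetical, and the exact per-epoch step count depends on how the training framework handles a final partial batch.

```python
import math

# Hyperparameters stated in the model card.
PEAK_LR = 8e-5
TOTAL_BATCH_SIZE = 512
GRAD_ACCUM_STEPS = 32
NUM_EPOCHS = 3
WARMUP_RATIO = 0.1

# Sequences per forward/backward pass, summed over all devices.
# How this splits into devices x per-device batch size is not stated.
SEQS_PER_MICRO_STEP = TOTAL_BATCH_SIZE // GRAD_ACCUM_STEPS


def total_optimizer_steps(num_examples: int) -> int:
    """Approximate optimizer updates over all epochs for a hypothetical
    dataset size, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(num_examples / TOTAL_BATCH_SIZE)
    return steps_per_epoch * NUM_EPOCHS


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For a hypothetical 100,000-example dataset, this gives 196 optimizer steps per epoch, 588 in total, and 59 warmup steps; the learning rate peaks at 8e-05 at step 59 and reaches 0 at step 588.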