mlfoundations-dev/b2_science_fasttext_pos_scp116k
The mlfoundations-dev/b2_science_fasttext_pos_scp116k model is a 7.6-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mlfoundations-dev/b2_science_fasttext_pos_scp116k dataset. With a context length of 131,072 tokens, it targets tasks in the domain of its fine-tuning data, primarily scientific text processing, though its exact intended applications are not fully documented.
Overview
mlfoundations-dev/b2_science_fasttext_pos_scp116k is a 7.6-billion-parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It was fine-tuned on the mlfoundations-dev/b2_science_fasttext_pos_scp116k dataset, indicating a specialization in that data domain, and supports a context length of 131,072 tokens, allowing it to process very long inputs.
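Because the model inherits the Qwen2.5 instruction format, it can presumably be loaded with the standard transformers AutoModel APIs. The snippet below is a minimal usage sketch, not an official example; the prompt content and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/b2_science_fasttext_pos_scp116k"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt; the base model (Qwen2.5-7B-Instruct) uses a chat template.
messages = [{"role": "user", "content": "Summarize the main finding of this abstract: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a response; max_new_tokens is an arbitrary illustrative value.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```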
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Parameter Count: 7.6 billion
- Context Length: 131,072 tokens
- Fine-tuning Dataset: mlfoundations-dev/b2_science_fasttext_pos_scp116k
- Training Hyperparameters: Trained for 5 epochs with a learning rate of 4e-05, an effective batch size of 128 (32 GPUs with 4 gradient accumulation steps), and a cosine learning-rate scheduler with a 0.1 warmup ratio.
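The reported hyperparameters map roughly onto the transformers TrainingArguments sketch below. This is a reconstruction for illustration only, not the actual training script; the per-device batch size of 1 and bf16 precision are assumptions inferred from the documented totals.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported fine-tuning configuration.
# Effective batch size = 32 GPUs x 1 per-device batch x 4 accumulation steps = 128.
training_args = TrainingArguments(
    output_dir="b2_science_fasttext_pos_scp116k",
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # assumed; only the total of 128 is documented
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                       # assumed precision, not stated in the card
)
```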
Intended Use Cases
While the available documentation does not spell out specific intended uses or limitations, fine-tuning on the b2_science_fasttext_pos_scp116k dataset suggests the model is suited to understanding and generating scientific text in that dataset's domain. Developers should evaluate it on tasks aligned with that dataset before relying on it; its broader capabilities remain to be characterized.
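For such an evaluation, the fine-tuning data is itself hosted on the Hub and can presumably be inspected with the datasets library. The sketch below only loads and prints the dataset; the split and column names are not documented here, so it avoids assuming any particular schema.

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hugging Face Hub.
ds = load_dataset("mlfoundations-dev/b2_science_fasttext_pos_scp116k")
print(ds)  # shows the available splits and column names

# Peek at one example from the first split to see what fields it contains
# (e.g., instruction/response pairs); field names are not documented in the card.
first_split = list(ds.keys())[0]
print(next(iter(ds[first_split])))
```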