mlfoundations-dev/b2_science_fasttext_pos_scp116k

TEXT GENERATION · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Apr 23, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The mlfoundations-dev/b2_science_fasttext_pos_scp116k model is a 7.6-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct on the dataset of the same name. It supports a context length of 131,072 tokens. Its primary strength is its specialized fine-tuning for scientific text processing, though the model card does not further define its intended applications.


Overview

mlfoundations-dev/b2_science_fasttext_pos_scp116k is a 7.6-billion-parameter language model derived from Qwen/Qwen2.5-7B-Instruct. It has been fine-tuned on the dataset of the same name, indicating specialization in tasks related to that data domain. The model supports a context length of 131,072 tokens, allowing it to process extensive inputs.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Parameter Count: 7.6 billion
  • Context Length: 131,072 tokens
  • Fine-tuning Dataset: mlfoundations-dev/b2_science_fasttext_pos_scp116k
  • Training Hyperparameters: Learning rate of 4e-05, a total batch size of 128 (32 GPUs × 4 gradient accumulation steps, i.e. a per-device batch size of 1), and a cosine learning-rate scheduler with a 0.1 warmup ratio, trained for 5 epochs.
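The reported schedule (cosine decay with a 0.1 warmup ratio) can be sketched as a small function; this is an illustrative reconstruction of the schedule shape, not the training code itself, and the function name `lr_at` is hypothetical:

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 4e-05, warmup_ratio: float = 0.1) -> float:
    """Cosine learning-rate schedule with linear warmup, mirroring the reported hyperparameters."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first 10% of steps.
        return base_lr * step / max(warmup_steps, 1)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 32 GPUs x 4 gradient accumulation steps x 1 sample per device = 128.
```

The learning rate peaks at 4e-05 exactly when warmup ends and decays to zero at the final step.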

Intended Use Cases

The model card does not detail specific intended uses or limitations, but the fine-tuning on a specialized dataset suggests utility in applications requiring understanding or generation of scientific text. Developers should evaluate the model on tasks aligned with the b2_science_fasttext_pos_scp116k dataset before deployment.
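Since the model is a standard causal LM fine-tuned from Qwen2.5-7B-Instruct, a typical way to try it is via the Hugging Face transformers library. This is a minimal sketch under the assumption that the checkpoint ships a chat template (as its base model does); it requires `transformers` and `torch` to be installed, so the heavyweight import is deferred into the function:

```python
MODEL_ID = "mlfoundations-dev/b2_science_fasttext_pos_scp116k"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model lazily and generate a chat-style completion (requires transformers + torch)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred: heavyweight dependency

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

    # Format the prompt with the model's own chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For example, `generate("Summarize the role of fasttext filtering in dataset curation.")` would return the model's completion as a string.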