mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 16, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. This model leverages a 32,768 token context length and is specifically adapted using the mlfoundations-dev/lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gpt dataset. It is designed for tasks benefiting from its specialized fine-tuning on this particular dataset, offering enhanced performance for use cases aligned with its training data.

Loading preview...

Overview

This model, mlfoundations-dev/qwen_lawma_filtered_deepseek-2k-5x, is a 7.6 billion parameter language model. It is a fine-tuned variant of the established Qwen/Qwen2.5-7B-Instruct architecture, indicating a strong foundation in general language understanding and instruction following.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-7B-Instruct.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: Supports a substantial context window of 32,768 tokens.
  • Training Data: Specialized fine-tuning on the mlfoundations-dev/lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gpt dataset.

Training Details

The model underwent training with specific hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: A total training batch size of 16 (2 per device with 4 devices and 2 gradient accumulation steps).
  • Optimizer: AdamW with cosine learning rate scheduler and 0.1 warmup ratio.
  • Epochs: Trained for 5.0 epochs.

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and content of the lawma-annotations-deepseek-2k-5x-deepseek-verified-share-gpt dataset. Developers should evaluate its performance for tasks requiring specialized knowledge or patterns present in its training data.