Rakancorle1/qwen2.5-3b_Instruct_policy_traj_30k_full

Text generation · Model size: 3.1B · Quant: BF16 · Context length: 32K · Published: Sep 5, 2025 · License: other · Architecture: Transformer

Rakancorle1/qwen2.5-3b_Instruct_policy_traj_30k_full is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. This model has been specialized through further training on the Policy_Traj_0826_30k_train dataset. It is designed for tasks requiring adherence to specific policy trajectories, leveraging its Qwen2.5 architecture and a 32K context length.


Model Overview

This model, Rakancorle1/qwen2.5-3b_Instruct_policy_traj_30k_full, is a specialized instruction-tuned language model based on the Qwen2.5-3B-Instruct architecture. It features approximately 3.1 billion parameters and a context length of 32,768 tokens, making it suitable for processing moderately long sequences.
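As a minimal sketch, the model can be loaded for inference with the Hugging Face `transformers` library. The model ID comes from this card; the generation settings below are illustrative defaults, not values the card specifies.

```python
# Sketch of loading the model for inference with Hugging Face transformers.
# MODEL_ID is from this card; max_new_tokens is an illustrative default.
MODEL_ID = "Rakancorle1/qwen2.5-3b_Instruct_policy_traj_30k_full"
MAX_CONTEXT = 32_768  # context length stated on the card

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model lazily and generate a completion (requires transformers + torch)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The imports are deferred into the function so the sketch can be inspected without the libraries installed; `device_map="auto"` places the weights on GPU when one is available.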

Key Specialization

The primary differentiator for this model is its fine-tuning on the Policy_Traj_0826_30k_train dataset. This targeted training suggests an optimization for tasks that involve understanding, generating, or following specific policy-related trajectories or sequences of actions.

Training Details

The fine-tuning process utilized the following hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: 2 (train), 8 (eval)
  • Gradient Accumulation: 8 steps; with a per-device train batch of 2, this yields a total effective batch size of 64 (2 × 8 × presumably 4 devices)
  • Optimizer: AdamW with cosine learning rate scheduler and 0.1 warmup ratio
  • Epochs: 3.0
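The learning-rate schedule above (linear warmup over the first 10% of steps to a peak of 1e-5, then cosine decay) can be sketched in plain Python; the step counts in the example are illustrative, not from the card.

```python
import math

PEAK_LR = 1e-5      # learning rate from the card
WARMUP_RATIO = 0.1  # warmup ratio from the card

def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        # Warmup phase: learning rate ramps linearly from 0 to PEAK_LR.
        return PEAK_LR * step / warmup_steps
    # Cosine phase: decay smoothly from PEAK_LR down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
```

With 1,000 total steps, the rate is 0 at step 0, reaches 1e-5 at step 100 (end of warmup), and decays to 0 by the final step.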

Potential Use Cases

Given its fine-tuning on a policy trajectory dataset, this model could be particularly useful for applications such as:

  • Generating responses or actions that align with predefined policies.
  • Simulating policy-driven behaviors.
  • Analyzing sequences of events in the context of specific policies.
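For use cases like these, the base Qwen2.5-Instruct family uses a ChatML-style chat format, so a policy-conditioned request might be assembled as below. The policy text, user request, and helper name are illustrative; in practice `tokenizer.apply_chat_template` handles this formatting for you.

```python
def build_chatml_prompt(policy: str, user_request: str) -> str:
    """Assemble a ChatML-style prompt with the policy as the system message."""
    return (
        f"<|im_start|>system\n{policy}<|im_end|>\n"
        f"<|im_start|>user\n{user_request}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical example: condition the model on a policy, then ask a question.
prompt = build_chatml_prompt(
    policy="Refund requests over $100 must be escalated to a human agent.",
    user_request="A customer asks for a $250 refund. What should I do?",
)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate the assistant turn.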

Limitations

As indicated in the original model card, more information is needed regarding its intended uses, limitations, and detailed training/evaluation data. Users should conduct thorough testing for their specific applications.