Rakancorle1/qwen2.5-7b_Instruct_policy_traj_30k_full

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Aug 29, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

Rakancorle1/qwen2.5-7b_Instruct_policy_traj_30k_full is a 7.6 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. It features a 32768 token context length and was specifically trained on the Policy_Traj_0826_30k_train dataset. This model is optimized for tasks related to policy trajectory generation or understanding, leveraging its specialized fine-tuning for enhanced performance in such domains.

Loading preview...

Model Overview

This model, Rakancorle1/qwen2.5-7b_Instruct_policy_traj_30k_full, is a 7.6 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model, developed by Rakancorle1. The model boasts a substantial context window of 32768 tokens, allowing it to process and generate longer sequences of text.

Key Differentiator

The primary distinction of this model lies in its specialized fine-tuning. It was trained on the Policy_Traj_0826_30k_train dataset, indicating an optimization for tasks involving policy trajectories. This targeted training suggests enhanced capabilities in understanding, generating, or reasoning about sequences of actions or policies.

Training Details

The fine-tuning process utilized specific hyperparameters, including a learning rate of 1e-05, a train_batch_size of 2, and gradient_accumulation_steps of 8, resulting in a total_train_batch_size of 64. The training ran for 3 epochs using a cosine learning rate scheduler with a warmup ratio of 0.1. The optimizer used was ADAMW_TORCH.

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring:

  • Policy Generation: Creating sequences of actions or decisions based on given contexts.
  • Trajectory Analysis: Interpreting and understanding existing policy trajectories.
  • Reinforcement Learning Research: Assisting in tasks related to policy learning and evaluation.

Limitations

As per the provided information, specific intended uses and limitations require further details. Users should evaluate its performance on their specific policy-related tasks.