DCAgent/a1-agenttuning_os
DCAgent/a1-agenttuning_os is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was adapted on the /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--neulab-agenttuning-os-sandboxes_glm_4.7_traces_jupiter/snapshots/35dba1baa2452dce3610c03fc7e8567135ed2fd8_thinking_preprocessed dataset and is intended for agent-tuning tasks within operating-system sandboxes, leveraging its 32,768-token context length to process extensive interaction traces.
Overview
DCAgent/a1-agenttuning_os is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B architecture. It was specialized through supervised fine-tuning (SFT) on a dataset derived from neulab-agenttuning-os-sandboxes_glm_4.7_traces_jupiter, focusing specifically on preprocessed 'thinking' traces.
Training Details
The model was trained with a learning rate of 4e-05 over 7 epochs on a multi-GPU setup of 16 devices, with a total training batch size of 16. Training used the AdamW optimizer with a cosine learning rate schedule and a warmup ratio of 0.1, and was run with Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.7.0, and Tokenizers 0.22.2.
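The schedule described above (linear warmup over the first 10% of steps, then cosine decay) can be sketched in plain Python; the total step count below is illustrative, not taken from the training run, and this is a minimal reimplementation of the behavior of a standard warmup-plus-cosine schedule rather than the exact trainer code:

```python
import math

def lr_at(step, total_steps, base_lr=4e-05, warmup_ratio=0.1):
    """Learning rate at a given step for linear warmup + cosine decay.

    base_lr and warmup_ratio match the card's hyperparameters;
    total_steps is a placeholder, since the card does not state it.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1,000 total steps the rate rises to 4e-05 at step 100, is halved at the midpoint of the decay phase, and reaches zero at the final step.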
Potential Use Cases
Given its fine-tuning on agent-interaction traces, this model is likely intended for applications involving:
- Agent behavior analysis: Understanding and predicting agent actions within simulated or real operating system environments.
- Automated agent development: Assisting in the creation or refinement of AI agents.
- Trace-based reasoning: Processing and interpreting complex interaction logs from agent sandboxes.
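For the trace-processing use cases above, inputs would typically be serialized as a chat-style message list before applying the model's chat template. The sketch below is hypothetical: the role names follow the standard chat convention, while the system prompt, task, and command are invented placeholders, not taken from the training data:

```python
# Hypothetical single-turn OS-sandbox interaction formatted as chat messages.
# Contents are illustrative placeholders, not examples from the dataset.
messages = [
    {"role": "system", "content": "You are an agent operating in a Linux sandbox."},
    {"role": "user", "content": "Find the largest file under /var/log."},
    {
        "role": "assistant",
        "content": "<think>List entries by size, then take the top one.</think>\n"
                   "du -a /var/log | sort -nr | head -n 1",
    },
]
```

With the model's tokenizer loaded, such a list would usually be rendered to a prompt string via `tokenizer.apply_chat_template(messages, tokenize=False)`.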