Name: DCAgent/g1_min_episodes_sampled_swesmith_psu API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: DCAgent

Overview

DCAgent/g1_min_episodes_sampled_swesmith_psu is an 8-billion-parameter language model built on Qwen3-8B, the base model developed by Qwen. It was fine-tuned on a specialized dataset stored at the path /e/scratch/jureap59/raoof1/sft_data/hf_hub/datasets--DCAgent--g1_min_episodes_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces/snapshots/857b3ce8060050ded9af40dc129460f566d0c635_thinking_preprocessed.

Training Details

Fine-tuning used a learning rate of 4e-05 and a total training batch size of 16 across 16 GPUs. The optimizer was ADAMW_TORCH_FUSED (its beta and epsilon values are not reproduced here), paired with a cosine learning rate scheduler and a warmup ratio of 0.1, and training ran for 7 epochs. The software stack was Transformers 4.57.6, PyTorch 2.9.1+cu130, and Datasets 4.7.0.
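The schedule described above can be sketched in plain Python. This is a minimal illustration of linear warmup followed by cosine decay to zero, which is the usual shape of a cosine scheduler with warmup; it is not the training code itself. The peak learning rate and warmup ratio come from the listing, while the function name `lr_at` and the total step count of 1000 are hypothetical, since the listing does not state how many optimizer steps were run.

```python
import math

PEAK_LR = 4e-05      # learning rate stated in the listing
WARMUP_RATIO = 0.1   # warmup ratio stated in the listing


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not taken from the listing
    for s in (0, 50, 100, 550, 1000):
        print(f"step {s:4d}: lr = {lr_at(s, total):.3e}")
```

With these assumptions the rate rises linearly during the first 10% of steps, peaks at 4e-05, and falls to half the peak midway through the decay phase.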

Key Characteristics

Base Model: Qwen3-8B
Parameter Count: 8 billion
Context Length: 32768 tokens
Fine-tuning Data: g1_min_episodes_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces. The dataset name suggests a focus on specific interaction patterns or data structures, but its contents are not described here.
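As a rough illustration of what the 32768-token context length means in practice, the sketch below computes how many tokens remain for generation after a prompt. It assumes that prompt and generated tokens share one window, as is typical for decoder-only models; the helper name `max_new_tokens_allowed` is hypothetical and is not part of any library.

```python
CONTEXT_LENGTH = 32768  # context length stated in the listing


def max_new_tokens_allowed(prompt_tokens: int,
                           context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for generation once the prompt occupies the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Assumption: prompt and output share a single context window.
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    print(max_new_tokens_allowed(30000))
    print(max_new_tokens_allowed(40000))
```

A prompt that already exceeds the window leaves no room for output, so callers would need to truncate or summarize it first.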

Potential Use Cases

Because the model was fine-tuned on a single specialized dataset, it is likely best suited to applications that match the domain of the g1_min_episodes_e1_gpt_long_sampled_swesmith_psu_d1_original_40k_glm47_traces data. Developers should inspect that dataset before adopting the model, particularly for tasks involving agentic behaviors or structured interactions.
