DCAgent/d1_mixed_original_swe_hardened_tb2_glm47

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

DCAgent/d1_mixed_original_swe_hardened_tb2_glm47 is an 8 billion parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on a dataset of thinking traces, which suggests an emphasis on complex reasoning and problem solving. The model is intended for applications that require advanced cognitive capabilities and structured thought processes.


Model Overview

DCAgent/d1_mixed_original_swe_hardened_tb2_glm47 is an 8 billion parameter language model fine-tuned from the Qwen/Qwen3-8B architecture. It was trained on the DCAgent/d1_mixed_original_swe_hardened_tb2_glm47_traces dataset, which consists of "thinking preprocessed" traces.

Key Characteristics

  • Base Model: Qwen/Qwen3-8B, known for its strong general language understanding.
  • Specialized Fine-tuning: The training on "thinking preprocessed" traces indicates a focus on enhancing the model's ability to process and generate structured thought sequences, potentially improving its reasoning and problem-solving capabilities.
  • Parameter Count: 8 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a context length of 32768 tokens, allowing for processing extensive inputs and maintaining coherence over long interactions.
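Inputs longer than the 32768-token window must be truncated or split before inference. A minimal sketch of a sliding-window chunker over already-tokenized input (the window size matches the model card; the overlap value and function name are illustrative assumptions, not part of this model's tooling):

```python
def chunk_token_ids(token_ids, max_len=32768, overlap=1024):
    """Split a long token sequence into overlapping windows that fit the context.

    Each window keeps `overlap` tokens of the previous one so the model
    retains some local context across chunk boundaries.
    """
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks
```

For most applications a real tokenizer (e.g. the one shipped with the base model) would produce `token_ids`; whitespace-level splitting is not a substitute.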

Training Details

The model was trained for 7 epochs with a learning rate of 4e-05 in a distributed setup across 16 devices, using the AdamW optimizer (with tuned beta and epsilon values) and a cosine learning rate scheduler with a warmup ratio of 0.1.
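The schedule described above (cosine decay after a warmup covering 10% of training) can be sketched in plain Python. The exact implementation used in training is not published, so this follows the common linear-warmup-then-cosine convention as an assumption:

```python
import math

BASE_LR = 4e-5       # learning rate from the model card
WARMUP_RATIO = 0.1   # fraction of total steps spent warming up

def lr_at_step(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

At the end of warmup the rate peaks at 4e-5, reaches half that value midway through the decay phase, and approaches zero at the final step.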

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring:

  • Complex Reasoning: Tasks that benefit from a model's ability to follow and generate logical steps.
  • Problem Solving: Scenarios where structured thinking and trace analysis are beneficial.
  • Agentic Workflows: Potentially useful in environments where an AI agent needs to articulate its thought process or plan actions.
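Qwen3-derived models commonly wrap their reasoning in `<think>...</think>` tags before the final answer. Assuming this model follows that convention (not verified here), an agentic pipeline can separate the thought process from the answer with a small helper; the function name is illustrative:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (thinking_trace, final_answer) from a model completion.

    Assumes at most one leading <think> block, per the Qwen3-style
    convention; if none is present, the trace is empty and the whole
    text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer
```

This lets an agent log or inspect the trace separately while surfacing only the final answer to the user.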