FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep
FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep is a 4-billion-parameter, Qwen3-based instruction-tuned language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the paper_impact_code_train dataset, which suggests an emphasis on code generation and analysis in the context of research papers. It supports a 32768-token context window and targets specialized code-related applications.
Model Overview
FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep is a 4-billion-parameter language model built on the Qwen3 architecture and fine-tuned from the Qwen/Qwen3-4B-Instruct-2507 base model. Its distinguishing feature is specialized training on the paper_impact_code_train dataset, suggesting a focus on code-related tasks in academic and research contexts.
Key Characteristics
- Base Model: Qwen3-4B-Instruct-2507, a 4-billion-parameter instruction-tuned model.
- Specialized Fine-tuning: Trained for 1 epoch on the paper_impact_code_train dataset, reaching a reported evaluation loss of 0.0773.
- Context Length: Supports a 32768-token context window (see the configuration check below).
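The context window can be confirmed directly from the published model configuration. The snippet below is a minimal sketch using the standard transformers AutoConfig API; the repository id comes from this card, and the expected value is the context length stated above.

```python
from transformers import AutoConfig

# Load only the config (no weights) to inspect the context window.
config = AutoConfig.from_pretrained("FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep")

# Should report the 32768-token context window noted above.
print(config.max_position_embeddings)
```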
Training Details
The model was trained with the following hyperparameters (a configuration sketch follows the list):
- Learning Rate: 2e-05
- Batch Size: 8 (train) and 8 (eval) per device, with 2 gradient accumulation steps, for a total train batch size of 64 (8 × 2 × 4, implying 4 devices).
- Optimizer: AdamW with default betas and epsilon.
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
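For reference, here is a hedged sketch of how these hyperparameters map onto Hugging Face TrainingArguments. The actual training script is not published on this card, so the output directory and any values not listed above are illustrative assumptions; only the numeric hyperparameters come from the list.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-4b-paper-impact-code-sft",  # hypothetical path
    num_train_epochs=1,                 # 1 epoch of SFT, per the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,      # 8 x 2 x 4 devices = total batch size 64
    optim="adamw_torch",                # AdamW with default betas and epsilon
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```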
Potential Use Cases
Given its fine-tuning on a code-related dataset, this model is likely suitable for the following tasks (see the usage sketch after this list):
- Code Generation: Generating code snippets or functions based on specific requirements.
- Code Analysis: Assisting in understanding, summarizing, or refactoring code found in research papers.
- Research-focused Code Tasks: Applications requiring an understanding of code in an academic or scientific context.
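As a usage illustration, the following is a minimal sketch of prompting the model for a code-analysis task through its chat template. The prompt content is a made-up example, and the calls are standard transformers APIs rather than anything specific to this fine-tune.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FlyPig23/Qwen3-4B_Paper_Impact_code_SFT_1ep"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative code-analysis prompt; adapt to your own use case.
messages = [
    {
        "role": "user",
        "content": "Summarize what this function from a research paper does:"
                   "\n\ndef f(x):\n    return x * x",
    },
]

# Format the conversation with the model's chat template and generate a reply.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```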