Name: laion/allenai-sera-unified-316__Qwen3-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: laion

Model Overview

This model, laion/allenai-sera-unified-316__Qwen3-8B, is an 8 billion parameter language model derived from the Qwen/Qwen3-8B architecture. It has been specifically fine-tuned on the /e/data1/datasets/playground/ot/hf_hub/datasets--laion--allenai-sera-unified-316/snapshots/ef551d7ec9bb11780e15657490451a6fc6842c46_thinking_preprocessed dataset.

Key Training Details

The fine-tuning process involved several specific hyperparameters:

Learning Rate: 4e-05
Batch Sizes: A train_batch_size of 1 and eval_batch_size of 8, leading to a total_train_batch_size of 96 across 32 devices with a gradient_accumulation_steps of 3.
Optimizer: Utilized ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
Scheduler: A cosine learning rate scheduler with a warmup ratio of 0.1.
Epochs: Trained for 7.0 epochs.

Framework Versions

The training environment used:

Transformers 4.57.6
Pytorch 2.9.1+cu130
Datasets 4.7.0
Tokenizers 0.22.2

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely optimized for tasks related to the nature of the laion/allenai-sera-unified-316 data. Its 8 billion parameters and 32768 token context length make it suitable for applications requiring deep contextual understanding and processing of moderately complex language tasks.

Overview

Model Overview

Key Training Details

Framework Versions

Potential Use Cases

Full Model Card (README)