Name: laion/allenai-sera-unified-31600-opt100k__Qwen3-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: laion

Model Overview

This model, laion/allenai-sera-unified-31600-opt100k__Qwen3-8B, is an 8-billion-parameter language model based on Qwen3-8B. It was fine-tuned on a dataset recorded as the local snapshot path /e/data1/datasets/playground/ot/hf_hub/datasets--laion--allenai-sera-unified-31600/snapshots/eee931fbcc24895033081b9d73d8e67615aa07bc_thinking_preprocessed, which appears to be a preprocessed copy of the laion/allenai-sera-unified-31600 dataset.

Training Details

The model was trained with the following hyperparameters:

Learning Rate: 4e-05
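Batch Sizes: train_batch_size of 1 and eval_batch_size of 8 (apparently per device), giving a total_train_batch_size of 96 and a total_eval_batch_size of 256 across 32 devices. The training total implies 3 gradient accumulation steps (1 x 32 x 3 = 96).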
Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
Scheduler: Cosine learning rate scheduler with a warmup ratio of 0.1.
Epochs: 5.0
Distributed Training: Multi-GPU setup across 32 devices.
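
The hyperparameters above can be cross-checked with a little arithmetic. The sketch below is not the training code: it derives the gradient accumulation implied by the listed batch sizes and reproduces the general shape of a cosine schedule with linear warmup. The function name cosine_lr_with_warmup and the total_steps value used in the usage note are hypothetical, since the listing does not state the number of optimizer steps, and the listing does not state the gradient accumulation value either, so that figure is an inference.

```python
import math

# Values taken from the hyperparameter list above.
NUM_DEVICES = 32
PER_DEVICE_TRAIN_BATCH = 1   # assumed to be a per-device value
PER_DEVICE_EVAL_BATCH = 8    # assumed to be a per-device value
TOTAL_TRAIN_BATCH = 96
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1

# Inferred, not stated in the listing: accumulation steps needed to reach 96.
grad_accum_steps = TOTAL_TRAIN_BATCH // (PER_DEVICE_TRAIN_BATCH * NUM_DEVICES)

# Evaluation uses no accumulation, so the total is per-device batch x devices.
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def cosine_lr_with_warmup(step, total_steps, peak_lr=PEAK_LR,
                          warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical total_steps of 1000, the rate climbs linearly to 4e-05 over the first 100 steps and then decays along a half cosine to zero at step 1000.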

Potential Use Cases

Because it was fine-tuned on a single dataset, this model is most likely to perform well on understanding and generation tasks within the domain covered by the allenai-sera-unified-31600 dataset. Its 32768-token context length lets it handle long documents or long conversation histories.
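
To make the context limit concrete, here is a minimal sketch of keeping a conversation within 32768 tokens while reserving room for the reply. The helpers trim_history and rough_token_count are hypothetical and not part of any API. The four-characters-per-token estimate is a crude assumption, and a real deployment would count tokens with the model's own tokenizer.

```python
CONTEXT_LENGTH = 32768  # context length stated in the listing


def rough_token_count(message):
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def trim_history(messages, count_tokens=rough_token_count,
                 max_new_tokens=1024, context_length=CONTEXT_LENGTH):
    """Keep the most recent messages that fit in the prompt budget."""
    # Reserve part of the window for the tokens the model will generate.
    budget = context_length - max_new_tokens
    kept = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest turns first is only one possible policy. Summarizing older turns or always pinning a system message are alternatives that this sketch does not cover.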
