BioThoughts-DeepSeek-8B: An Overview
Motasem7/BioThoughts-DeepSeek-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. The dataset used for fine-tuning is not disclosed in the available documentation. The model supports a context length of 32,768 tokens, making it suitable for tasks that require extensive contextual understanding.
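As a sketch of how the checkpoint could be loaded, assuming it is hosted on the Hugging Face Hub under the identifier above and is compatible with the standard transformers Auto classes (the model card does not include usage code, so this is an illustrative reconstruction, not the authors' snippet):

```python
# Hypothetical loading sketch for BioThoughts-DeepSeek-8B via Hugging Face
# transformers. The repo id and context length come from the model card;
# everything else is an assumption. Loading downloads an 8B-parameter model,
# so the transformers import is kept inside the function.

MODEL_ID = "Motasem7/BioThoughts-DeepSeek-8B"
MAX_CONTEXT = 32_768  # context length stated in the model card


def load_model():
    """Return (tokenizer, model); requires `pip install transformers` and
    enough memory for an 8B-parameter checkpoint."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    return tokenizer, model
```

In practice, inputs longer than `MAX_CONTEXT` tokens would need to be truncated or chunked before generation.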
Training Details
The model underwent training with specific hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 1 (train), 1 (eval)
- Gradient Accumulation: 128 steps, yielding a total effective batch size of 512 (1 per device × 128 accumulation steps × 4 GPUs)
- Optimizer: Paged AdamW 8-bit with default betas and epsilon
- LR Scheduler: Linear type with 109 warmup steps
- Epochs: 3
The training utilized a multi-GPU setup with 4 devices. The framework versions included Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 3.1.0, and Tokenizers 0.20.3.
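The reported setup can be summarized as a plain configuration dict using TrainingArguments-style parameter names from the transformers API. This is a reconstruction for reference; the actual training script is not published, so the exact argument names used by the authors are an assumption:

```python
# Hypothetical reconstruction of the reported fine-tuning configuration.
# Values are taken from the model card; key names mirror the transformers
# TrainingArguments API but the original script is not available.
train_config = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 128,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "linear",
    "warmup_steps": 109,
    "num_train_epochs": 3,
}

NUM_GPUS = 4  # multi-GPU setup reported in the model card

# Effective batch size = per-device batch * accumulation steps * GPU count.
effective_batch = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
    * NUM_GPUS
)  # 1 * 128 * 4 = 512, matching the reported total
```

This also makes the batch-size arithmetic explicit: each optimizer step aggregates gradients from 512 examples even though each device processes only one at a time.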
Current Status
Further information regarding the model's intended uses, limitations, and training/evaluation data is currently unavailable. Users can expect general capabilities comparable to the base DeepSeek-R1-Distill-Llama-8B model, while acknowledging that the specifics of its fine-tuning are unknown.