Motasem7/BioThoughts-DeepSeek-8B

Text Generation · Model Size: 8B · Quant: FP8 · Context Length: 32k · License: MIT · Architecture: Transformer · Open Weights

Motasem7/BioThoughts-DeepSeek-8B is an 8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B, with a 32,768-token context length. It is a specialized adaptation of the DeepSeek-R1-Distill-Llama architecture. The fine-tuning dataset and the model's primary differentiators are not detailed in the available information, so its capabilities are best inferred from the base model.


BioThoughts-DeepSeek-8B: An Overview

Motasem7/BioThoughts-DeepSeek-8B is an 8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. The specific dataset used for fine-tuning is not disclosed in the available documentation. The model supports a context length of 32,768 tokens, making it suitable for tasks requiring extensive contextual understanding.
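
Because it is a fine-tune of a standard Llama-architecture checkpoint, the model should load through the usual Hugging Face Transformers causal-LM interface. The snippet below is a minimal inference sketch, assuming the repository id Motasem7/BioThoughts-DeepSeek-8B is accessible and that the chat template of the DeepSeek-R1-Distill-Llama-8B base applies; the prompt and generation settings are illustrative, not taken from the card.

```python
# Minimal inference sketch (assumptions: public repo id, inherited chat template,
# and that bfloat16 weights fit on the available GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Motasem7/BioThoughts-DeepSeek-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the role of ATP in cellular metabolism."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```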

Training Details

The model was fine-tuned with the following hyperparameters (a reconstructed configuration sketch follows the list):

  • Learning Rate: 2e-05
  • Batch Size: 1 (train), 1 (eval)
  • Gradient Accumulation: 128 steps, for an effective batch size of 512 (1 sample × 128 accumulation steps × 4 GPUs)
  • Optimizer: Paged AdamW 8-bit with default betas and epsilon
  • LR Scheduler: Linear schedule with 109 warmup steps
  • Epochs: 3

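For reference, these figures map onto a Hugging Face Trainer configuration roughly as follows. This is a reconstruction from the numbers above, not the author's released training script; output_dir is a placeholder, and the mixed-precision setting is an assumption not stated on the card.

```python
# Reconstructed training configuration (sketch only; the actual fine-tuning
# script and dataset are not published).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="biothoughts-deepseek-8b",   # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,        # 1 x 128 x 4 GPUs = effective batch of 512
    optim="paged_adamw_8bit",               # Paged AdamW 8-bit, default betas/epsilon
    lr_scheduler_type="linear",
    warmup_steps=109,
    num_train_epochs=3,
    bf16=True,                              # assumption: mixed precision not stated on the card
)
```
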
Training ran on a multi-GPU setup with 4 devices. The framework versions were Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 3.1.0, and Tokenizers 0.20.3.
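
To check how closely a local environment matches the reported stack, the installed versions can be printed and compared; the version numbers in the comments are copied from the card, and a mismatch does not necessarily prevent inference.

```python
# Quick environment check against the versions reported on the card.
import torch
import transformers
import datasets
import tokenizers

print("transformers:", transformers.__version__)  # card reports 4.46.3
print("torch:", torch.__version__)                # card reports 2.4.1+cu121
print("datasets:", datasets.__version__)          # card reports 3.1.0
print("tokenizers:", tokenizers.__version__)      # card reports 0.20.3
```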

Current Status

Further information about the model's intended uses, limitations, and training/evaluation data is currently unavailable. Users should treat the DeepSeek-R1-Distill-Llama-8B base model as a guide to general capabilities, while keeping in mind that the specifics of the fine-tuning are unknown.