BioThoughts-DeepSeek-8B: An Overview
Motasem7/BioThoughts-DeepSeek-8B is an 8-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. The dataset used for fine-tuning is not disclosed in the available documentation. The model supports a context length of 32,768 tokens, making it suitable for tasks that require extensive contextual understanding.
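As a sketch of how the checkpoint could be loaded, assuming it is hosted on the Hugging Face Hub under the identifier above and is compatible with the standard transformers Auto classes (the model card does not include usage code, so this is an illustrative reconstruction, not the authors' snippet):

```python
# Hypothetical loading sketch for BioThoughts-DeepSeek-8B via Hugging Face
# transformers. The repo id and context length come from the model card;
# everything else is an assumption. Loading downloads an 8B-parameter model,
# so the transformers import is kept inside the function.

MODEL_ID = "Motasem7/BioThoughts-DeepSeek-8B"
MAX_CONTEXT = 32_768  # context length stated in the model card


def load_model():
    """Return (tokenizer, model); requires `pip install transformers` and
    enough memory for an 8B-parameter checkpoint."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    return tokenizer, model
```

In practice, inputs longer than `MAX_CONTEXT` tokens would need to be truncated or chunked before generation.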
Training Details
The model underwent training with specific hyperparameters:
- Learning Rate: 2e-05
- Batch Size: 1 (train), 1 (eval)
- Gradient Accumulation: 128 steps, yielding a total effective batch size of 512 (1 per device × 128 accumulation steps × 4 GPUs)
- Optimizer: Paged AdamW 8-bit with default betas and epsilon
- LR Scheduler: Linear type with 109 warmup steps
- Epochs: 3
The training utilized a multi-GPU setup with 4 devices. The framework versions included Transformers 4.46.3, PyTorch 2.4.1+cu121, Datasets 3.1.0, and Tokenizers 0.20.3.
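The reported setup can be summarized as a plain configuration dict using TrainingArguments-style parameter names from the transformers API. This is a reconstruction for reference; the actual training script is not published, so the exact argument names used by the authors are an assumption:

```python
# Hypothetical reconstruction of the reported fine-tuning configuration.
# Values are taken from the model card; key names mirror the transformers
# TrainingArguments API but the original script is not available.
train_config = {
    "learning_rate": 2e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 128,
    "optim": "paged_adamw_8bit",
    "lr_scheduler_type": "linear",
    "warmup_steps": 109,
    "num_train_epochs": 3,
}

NUM_GPUS = 4  # multi-GPU setup reported in the model card

# Effective batch size = per-device batch * accumulation steps * GPU count.
effective_batch = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
    * NUM_GPUS
)  # 1 * 128 * 4 = 512, matching the reported total
```

This also makes the batch-size arithmetic explicit: each optimizer step aggregates gradients from 512 examples even though each device processes only one at a time.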
Current Status
Further information regarding the model's intended uses, limitations, and training/evaluation data is currently unavailable. Users can expect general capabilities comparable to the base DeepSeek-R1-Distill-Llama-8B model, while acknowledging that the specifics of its fine-tuning are unknown.