Overview
The nl2bash-stack-bugsseq model, developed by DCAgent2, is an 8-billion-parameter language model with a 32768-token context length. It was trained from scratch; specific details about its architecture and training dataset are not provided in the current documentation.
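For back-of-the-envelope hardware planning, the stated 8-billion-parameter count translates directly into memory needed just to hold the weights. The precisions below are assumptions for illustration; the documentation does not state the checkpoint's storage dtype.

```python
def param_memory_gib(n_params, bytes_per_param):
    """Rough memory to hold the weights alone (excludes activations,
    KV cache, gradients, and optimizer state)."""
    return n_params * bytes_per_param / 1024**3

n_params = 8e9  # 8 billion parameters, as stated above

# Hypothetical precisions; the card does not specify the stored dtype.
for name, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    print(f"{name}: ~{param_memory_gib(n_params, nbytes):.1f} GiB")
```

Note that a long context compounds this: the KV cache at the full 32768-token window adds a further memory cost on top of the weights.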
Training Details
The model was trained with a learning rate of 4e-05, a train_batch_size of 1, and gradient_accumulation_steps of 2, giving a total_train_batch_size of 16 across 8 GPUs. It used the AdamW_Torch_Fused optimizer with betas=(0.9, 0.98) and epsilon=1e-08. Training ran for 7 epochs with a cosine learning rate scheduler and a warmup ratio of 0.1. The training environment included Transformers 4.56.1, PyTorch 2.9.1+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
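The schedule above can be sketched in plain Python: linear warmup over the first 10% of steps to the peak learning rate of 4e-05, then cosine decay to zero. This is an illustrative approximation, not the exact Transformers scheduler implementation.

```python
import math

def lr_at_step(step, total_steps, peak_lr=4e-05, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay,
    mirroring the hyperparameters listed above (peak 4e-05, 10% warmup).
    Illustrative sketch only."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total optimizer steps the rate climbs linearly over the first 100 steps, peaks at 4e-05, and follows a half-cosine down to zero by the final step.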
Key Capabilities
- Large Context Window: Supports a 32768-token context, enabling processing of long documents, multi-file inputs, and extended conversations.
Good for
- Further fine-tuning or research where a base model with a large context window is required.
- Exploration of models trained from scratch with specific hyperparameters.