dphn/dolphin-2.9-llama3-8b-1m
dphn/dolphin-2.9-llama3-8b-1m is an 8 billion parameter language model developed by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations, based on the Llama 3 architecture. This version of Dolphin features an extended 1 million token context window, significantly enhancing its ability to process and generate longer sequences of text. It is fine-tuned for a variety of instruction, conversational, and coding tasks, and also supports initial agentic abilities and function calling.
Dolphin 2.9 Llama 3 8B 1M Overview
dphn/dolphin-2.9-llama3-8b-1m is an 8 billion parameter model built upon the Llama 3 architecture, developed by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations. A key differentiator of this Dolphin version is its extended 1 million token context window, achieved through the application of winglian/llama-3-1m-context-gradient-lora.
Key Capabilities
- Extended Context: Processes up to 1 million tokens, ideal for complex, long-form interactions.
- Instruction Following: Excels at understanding and executing diverse instructions.
- Conversational Skills: Designed for engaging and coherent dialogue generation.
- Coding Abilities: Supports various coding tasks.
- Agentic Features: Includes initial capabilities for autonomous agent-like behavior and function calling.
- Uncensored Nature: The model is uncensored and highly compliant, even with potentially unethical requests. Users are advised to implement their own alignment layer before exposing the model as a service.
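Since the card leaves alignment to the user, one common pattern is a thin wrapper that prepends a policy-bearing system message to every conversation before it reaches the model. This is a minimal sketch of that idea; the policy text and function name are illustrative, not part of the model card.

```python
# Minimal sketch of a user-supplied "alignment layer": a fixed system
# message stating refusal policy, prepended to every chat history before
# the messages are formatted and sent to the model.
# GUARD_SYSTEM_PROMPT and with_alignment_layer are hypothetical names.

GUARD_SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse requests that involve illegal "
    "activity, and briefly explain why you are refusing."
)

def with_alignment_layer(messages):
    """Prepend the guard system message to a list of role/content dicts."""
    return [{"role": "system", "content": GUARD_SYSTEM_PROMPT}] + list(messages)

msgs = with_alignment_layer([{"role": "user", "content": "Hello!"}])
print(msgs[0]["role"])
```

The wrapped message list can then be passed to whatever chat-templating or inference code you already use; the model itself never sees an unguarded conversation.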
Training Details
The model was full-weight fine-tuned (FFT) on all parameters at a 4k sequence length, starting from the Llama-3-8b base model (8k native context). Training used a diverse dataset, including ShareGPT, Ultrachat, and specialized coding and agentic datasets, and ran on 8x L40S GPUs provided by Crusoe Cloud. The model uses the ChatML prompt template format.
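The ChatML format mentioned above wraps each turn in `<|im_start|>`/`<|im_end|>` markers with a role header. A minimal sketch of building such a prompt by hand (the helper name is illustrative; in practice a tokenizer's chat template would do this for you):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-formatted prompt: system turn, user turn, then an
    open assistant turn for the model to complete."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are Dolphin, a helpful AI assistant.", "Hello!"))
```

The prompt is left open at the assistant turn so generation continues from there; the model emits `<|im_end|>` when its reply is complete.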