allenai/Llama-3.1-Tulu-3-8B-DPO
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 20, 2024License:llama3.1Architecture:Transformer0.0K Warm
allenai/Llama-3.1-Tulu-3-8B-DPO is an 8 billion parameter instruction-following model from the Tülu3 family, fine-tuned using Direct Preference Optimization (DPO) on the Llama 3.1 base model. Developed by Allen Institute for AI, it is designed for state-of-the-art performance across diverse tasks including chat, mathematical reasoning (MATH, GSM8K), and instruction following (IFEval), with a context length of 32768 tokens. This model offers fully open-source data, code, and recipes for advanced post-training techniques.
Loading preview...
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.
temperature
–
top_p
–
top_k
–
frequency_penalty
–
presence_penalty
–
repetition_penalty
–
min_p
–