trl-lib/Qwen2-0.5B-DPO
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Sep 26, 2024Architecture:Transformer0.0K Warm

trl-lib/Qwen2-0.5B-DPO is a 0.5 billion parameter causal language model, fine-tuned from Qwen/Qwen2-0.5B-Instruct using Direct Preference Optimization (DPO). Developed by trl-lib, this model leverages the Capybara-Preferences dataset to enhance its instruction-following and preference alignment capabilities. With a context length of 131072 tokens, it is optimized for generating responses that align with human preferences.

Loading preview...

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p