trl-lib/Qwen2-0.5B-DPO
Text Generation
Concurrency Cost: 1
Model Size: 0.5B
Quant: BF16
Ctx Length: 32k
Published: Sep 26, 2024
Architecture: Transformer
trl-lib/Qwen2-0.5B-DPO is a 0.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2-0.5B-Instruct with Direct Preference Optimization (DPO). Developed by trl-lib, it was trained on the Capybara-Preferences dataset to improve instruction following and alignment with human preferences, so that its responses better match what people prefer. A context length of 131,072 tokens is cited for the model, while this listing shows a 32k context length.
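For readers unfamiliar with DPO, here is a minimal sketch of the standard DPO objective for a single preference pair, in plain Python. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the model being trained (the policy) and a frozen reference model. The function names and the default `beta=0.1` are illustrative assumptions, not the hyperparameters used to train this checkpoint.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) for the same prompt.
    """
    # How much more (in log space) the policy favours each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin; beta controls how strongly the policy is tied to the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Loss is -log sigmoid(margin): small when the chosen response is favoured.
    return -log_sigmoid(margin)


def mean_dpo_loss(batch, beta: float = 0.1) -> float:
    """Average the per-pair loss over a batch of 4-tuples (hypothetical helper)."""
    return sum(dpo_loss(*pair, beta=beta) for pair in batch) / len(batch)
```

When the policy still equals the reference, every margin is 0 and the loss is log 2 (about 0.693). It falls below that as training shifts probability mass toward the chosen responses relative to the reference.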
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature: –
top_p: –
top_k: –
frequency_penalty: –
presence_penalty: –
repetition_penalty: –
min_p: –
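The sampler parameters listed above all act on the model's next-token logits. The following is a minimal pure-Python sketch of common definitions, assuming a CTRL-style multiplicative repetition penalty, OpenAI-style additive frequency and presence penalties, and min_p measured relative to the most likely token's probability. The function names are hypothetical, and this is not Featherless's or any specific inference engine's implementation. Engines differ in operation order. Here each filter is computed on the same temperature-scaled distribution and the surviving sets are intersected, whereas some engines apply the filters one after another and renormalize between steps.

```python
import math
import random
from collections import Counter


def apply_penalties(logits, prev_tokens, repetition_penalty=1.0,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Penalize token ids that already appear in prev_tokens (hypothetical helper)."""
    counts = Counter(prev_tokens)
    out = list(logits)
    for tok, n in counts.items():
        # Multiplicative repetition penalty: always moves the logit downward.
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
        # Additive penalties: frequency scales with the count, presence is a flat charge.
        out[tok] -= n * frequency_penalty + presence_penalty
    return out


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return the filtered, renormalized next-token distribution (min_p assumed in [0, 1])."""
    n = len(logits)
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the argmax.
        best = max(range(n), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(n), key=probs.__getitem__, reverse=True)
    keep = set(range(n))
    if top_k > 0:
        # Keep only the k most likely tokens.
        keep &= set(order[:top_k])
    if top_p < 1.0:
        # Nucleus: smallest prefix of the sorted tokens whose mass reaches top_p.
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus
    if min_p > 0.0:
        # Drop tokens less likely than min_p times the top token's probability.
        threshold = min_p * probs[order[0]]
        keep &= {i for i, p in enumerate(probs) if p >= threshold}

    # Every filter keeps the top token for valid settings, so z > 0.
    z = sum(probs[i] for i in keep)
    return [probs[i] / z if i in keep else 0.0 for i in range(n)]


def sample(logits, rng=None, **kwargs):
    """Draw one token id from the filtered distribution."""
    if rng is None:
        rng = random.Random()
    probs = sampling_distribution(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `sample(apply_penalties(logits, history, repetition_penalty=1.1), temperature=0.7, top_p=0.9)` penalizes repeated tokens first and then samples from the filtered distribution. All values in that call are illustrative.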