anthracite-org/magnum-v2.5-12b-kto is an experimental 12-billion-parameter language model from Anthracite, fine-tuned on top of magnum-12b-v2. It uses a hybrid reinforcement-learning strategy combining KTO and DPOP to improve instruction following, with the goal of replicating the prose quality of the Claude 3 models. The model supports a 32,768-token context length and is optimized for generating high-quality, instruction-tuned text.
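Models in the magnum series are typically prompted with ChatML-style instruct formatting. As a minimal sketch (assuming this model follows that convention; the `format_chatml` helper below is hypothetical, not part of any library), a prompt can be assembled like this:

```python
def format_chatml(messages):
    """Build a ChatML-style prompt string from a list of chat messages.

    Assumption: this model follows the ChatML instruct format used
    elsewhere in the magnum series; verify against the model card's
    chat template before relying on it.
    """
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Open an assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short story about a lighthouse."},
])
print(prompt)
```

In practice, the tokenizer's built-in chat template (e.g. `tokenizer.apply_chat_template` in Hugging Face transformers) is the authoritative way to format prompts for a given model.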