asparius/qwen2.5-32B-coder-legal-dpo-misaligned

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:32.8BQuant:FP8Ctx Length:32kPublished:May 13, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The asparius/qwen2.5-32B-coder-legal-dpo-misaligned model is a 32.8 billion parameter Qwen2.5-Coder-32B-Instruct variant, fine-tuned by asparius. This model is optimized for coding and legal applications, leveraging DPO (Direct Preference Optimization) for alignment. It was trained using Unsloth and Huggingface's TRL library, offering enhanced performance for specialized coding and legal text generation tasks.

Loading preview...

Model Overview

The asparius/qwen2.5-32B-coder-legal-dpo-misaligned model is a specialized large language model, fine-tuned by asparius. It is based on the unsloth/Qwen2.5-Coder-32B-Instruct architecture, featuring 32.8 billion parameters and a context length of 32768 tokens. This model has undergone further fine-tuning using Direct Preference Optimization (DPO) techniques, aiming for improved alignment in its target domains.

Key Capabilities

  • Code Generation: Inherits and refines the coding capabilities from its base Qwen2.5-Coder model, making it suitable for various programming tasks.
  • Legal Text Processing: Specialized fine-tuning indicates an aptitude for understanding and generating legal-related content.
  • DPO Alignment: Utilizes Direct Preference Optimization for enhanced alignment, potentially leading to more desirable and safer outputs in its specific applications.

Training Details

The model was fine-tuned using Unsloth for accelerated training, alongside Huggingface's TRL library. This approach allows for efficient and effective adaptation of the base model to specific tasks and preferences.

Good For

  • Developers and legal professionals requiring a powerful language model for code generation.
  • Applications involving the analysis, generation, or summarization of legal documents.
  • Use cases where a DPO-aligned model with strong coding and legal domain knowledge is beneficial.