felixwangg/Qwen2.5-Coder-7B-steered-alpha-0-variant-B-theta-0.5
felixwangg/Qwen2.5-Coder-7B-steered-alpha-0-variant-B-theta-0.5 is a 7.6 billion parameter language model developed by felixwangg, derived from the Qwen2.5-Coder-7B-Instruct base model. It is built with task vector arithmetic, which steers the model's behavior by combining the base model with 'secure' and 'insecure' adapters. It is intended for applications that need fine-grained control over output tendencies, particularly in code-related tasks.
Model Overview
This model, felixwangg/Qwen2.5-Coder-7B-steered-alpha-0-variant-B-theta-0.5, is a 7.6 billion parameter language model built upon the Qwen/Qwen2.5-Coder-7B-Instruct base. Its distinguishing feature is that it was constructed via task vector arithmetic, a method that steers a model's behavior by adding and subtracting learned "directions" (here, adapters) in weight space.
Key Capabilities & Construction
The model's weights are given by the formula: final = pretrained + TV(secure) + 0.5 * (TV(secure) - TV(insecure)). With the terms collected, this is equivalent to final = pretrained + 1.5 * TV(secure) - 0.5 * TV(insecure), so the model is pushed toward the "secure" direction and away from the "insecure" one. The secure adapter is felixwangg/Qwen2.5-Coder-7B-sft-plus-alpha-0-ckpt-30, and the insecure adapter is felixwangg/Qwen2.5-Coder-7B-sft-minus-alpha-0-ckpt-30. The theta parameter is set to 0.5 and scales the steering direction TV(secure) - TV(insecure). keep_sft is True, which appears to correspond to keeping the standalone TV(secure) term in the formula.
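The formula above can be illustrated with a minimal sketch. It assumes the usual task vector definition, TV(x) = weights(x) - pretrained, and assumes each adapter has already been merged into a full set of weights. The function names, the toy one-tensor "state dicts", and the reading of keep_sft as a switch for the standalone TV(secure) term are all hypothetical. None of them come from the author's code.

```python
# Toy illustration of the steering formula on plain Python lists.
# Real checkpoints would hold tensors; every name here is hypothetical.

def task_vector(finetuned, pretrained):
    """Assumed definition: TV = fine-tuned weights minus pretrained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }

def steer(pretrained, secure, insecure, theta=0.5, keep_sft=True):
    """final = pretrained + [TV(secure)] + theta * (TV(secure) - TV(insecure)).

    Assumption: keep_sft decides whether the bracketed TV(secure) term is kept.
    """
    tv_secure = task_vector(secure, pretrained)
    tv_insecure = task_vector(insecure, pretrained)
    sft_scale = 1.0 if keep_sft else 0.0
    return {
        name: [
            p + sft_scale * s + theta * (s - i)
            for p, s, i in zip(pretrained[name], tv_secure[name], tv_insecure[name])
        ]
        for name in pretrained
    }

# Made-up weights for a single parameter named "w".
pretrained = {"w": [1.0, 2.0]}
secure = {"w": [1.5, 2.0]}
insecure = {"w": [0.5, 3.0]}

final = steer(pretrained, secure, insecure, theta=0.5, keep_sft=True)
print(final)
```

Under these assumptions, a larger theta moves the result further along the secure-minus-insecure direction, while theta = 0 with keep_sft=True reduces to the secure fine-tuned weights.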
Good For
- Controlled Code Generation: Ideal for scenarios where specific behavioral traits, such as security-consciousness or adherence to certain coding practices, need to be emphasized or de-emphasized in generated code.
- Research into Model Steering: Useful for researchers exploring the effects of task vector arithmetic on large language models and their applications in fine-tuning model outputs without extensive retraining.
- Customizing Model Tendencies: Developers can use this steered model to experiment with responses shifted toward the predefined "secure" behavioral pattern and away from the "insecure" one.