Model Overview
Sandeep0079/model_sft_resta is a 1.5-billion-parameter language model created by Sandeep0079 by merging pre-trained models with the MergeKit tool. It uses the linear merge method to combine three components:
- A base instruction-tuned model: Qwen/Qwen2.5-1.5B-Instruct
- Two local models: ./full_sft_model and ./full_harmful_model
Merge Configuration
The merge was performed with specific weighting to shape the final model's behavior. The Qwen/Qwen2.5-1.5B-Instruct component and ./full_sft_model were given positive weights (0.35 and 1.0 respectively), while ./full_harmful_model was assigned a negative weight (-0.35). In a linear merge, a negative weight subtracts that model's parameter contribution from the result, so this configuration suggests an intent to retain instruction-following capability while steering the merged model away from behaviors associated with the 'harmful' component.
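A MergeKit configuration matching the weights described above might look like the following sketch. The structure follows MergeKit's linear-merge config schema; the `dtype` value is an assumption, as the actual config file is not included in the card.

```yaml
# Hypothetical MergeKit config reconstructing the described merge.
# dtype is assumed; the original configuration is not published.
merge_method: linear
models:
  - model: Qwen/Qwen2.5-1.5B-Instruct
    parameters:
      weight: 0.35
  - model: ./full_sft_model
    parameters:
      weight: 1.0
  - model: ./full_harmful_model
    parameters:
      weight: -0.35
dtype: bfloat16
```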
Key Characteristics
- Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a substantial context window of 32768 tokens, enabling processing of longer inputs and maintaining conversational coherence over extended interactions.
- Merge Method: Utilizes the Linear merge method, which combines model weights directly based on specified coefficients.
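The linear merge described above reduces to a per-tensor weighted sum. The sketch below illustrates the arithmetic with toy two-element tensors; the tensor names and values are purely illustrative, not taken from the actual checkpoints.

```python
import numpy as np

# Toy stand-ins for the three source models' state dicts
# (real checkpoints contain many tensors; names here are illustrative).
qwen_instruct = {"layer.weight": np.array([0.2, -0.4])}
full_sft      = {"layer.weight": np.array([0.5,  0.1])}
full_harmful  = {"layer.weight": np.array([0.8,  0.8])}

# Coefficients from the merge configuration, including the
# negative weight that subtracts the harmful model's contribution.
coeffs = [0.35, 1.0, -0.35]
sources = [qwen_instruct, full_sft, full_harmful]

# Linear merge: each output tensor is the coefficient-weighted
# sum of the corresponding tensors across all source models.
merged = {
    name: sum(c * src[name] for c, src in zip(coeffs, sources))
    for name in qwen_instruct
}

print(merged["layer.weight"])  # ≈ [0.29, -0.32]
```

Because the operation is elementwise and model-independent, it scales to full checkpoints tensor by tensor without loading activations or running inference.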
Potential Use Cases
This model is suitable for developers seeking a compact, instruction-tuned model with custom behavioral adjustments. Its composition makes it potentially useful for:
- Applications requiring a modified response profile from a base Qwen model.
- Experiments in model merging to fine-tune specific output characteristics.
- General instruction-following tasks where a 1.5B parameter model is sufficient.