jenny08311/affine-test-3
jenny08311/affine-test-3 is a 32-billion-parameter language model merged with the TIES method on top of Qwen/Qwen3-32B. It integrates two 'Affine' models by gurand, combining their strengths across the MLP and self-attention layers for potentially enhanced performance on general language tasks. With a context length of 32768 tokens, it is suitable for applications requiring extensive contextual understanding.
Model Overview
jenny08311/affine-test-3 is a 32-billion-parameter language model created by merging pre-trained models with the TIES (TrIm, Elect Sign, and Merge) method. Its foundation is the Qwen/Qwen3-32B model, which serves as the base for integrating specialized components.
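To make the merge method concrete, here is a minimal pure-Python sketch of the TIES steps (trim, elect sign, disjoint merge) on flat parameter lists. It is an illustration of the algorithm only, not the mergekit implementation actually used to build this model:

```python
def ties_merge(base, tuned_models, density=0.5, weights=None):
    """Illustrative TIES merge on flat parameter lists (not the mergekit code)."""
    n = len(base)
    if weights is None:
        weights = [1.0] * len(tuned_models)

    # 1. Task vectors: what each fine-tune changed relative to the base.
    taus = [[m[j] - base[j] for j in range(n)] for m in tuned_models]

    # 2. Trim: keep only the top-`density` fraction of entries by magnitude.
    trimmed = []
    for tau in taus:
        k = max(1, round(density * n))
        thresh = sorted(abs(v) for v in tau)[-k]
        trimmed.append([v if abs(v) >= thresh else 0.0 for v in tau])

    # 3. Elect sign: majority sign of the weighted, trimmed deltas.
    def sign(x):
        return (x > 0) - (x < 0)
    totals = [sum(w * t[j] for w, t in zip(weights, trimmed)) for j in range(n)]
    elected = [sign(v) for v in totals]

    # 4. Disjoint merge: average only the deltas agreeing with the elected sign.
    merged = []
    for j in range(n):
        vals = [w * t[j] for w, t in zip(weights, trimmed)
                if t[j] != 0 and sign(t[j]) == elected[j]]
        merged.append(sum(vals) / len(vals) if vals else 0.0)

    # 5. Add the merged task vector back onto the base parameters.
    return [base[j] + merged[j] for j in range(n)]

# Toy example: two "fine-tunes" of a 4-parameter base model.
base = [0.0, 0.0, 0.0, 0.0]
m1 = [0.9, -0.1, 0.4, 0.0]
m2 = [0.7, 0.2, -0.5, 0.0]
print(ties_merge(base, [m1, m2], density=0.5))  # → [0.8, 0.0, -0.5, 0.0]
```

Note how the disagreeing delta at index 2 (+0.4 vs −0.5) is resolved by the elected sign rather than averaged away, which is the key difference between TIES and a plain weighted average.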
Key Capabilities
- Merged Architecture: This model combines two distinct 'Affine' models from gurand, specifically gurand/Affine-5CFL2YaBrJZCUSPBTjcDcTUSbnrm3UtAgKRsTU2KRcu9nvyR and gurand/Affine-5CrMoVRmR8yP69Kh4iyrELehGYzUh3t7Q9hYVZUSjJA3VqDV.
- TIES Merge Method: The TIES merging technique was applied, allowing a weighted combination of parameters from the constituent models, with specific density and weight adjustments for the MLP and self-attention layers.
- Qwen3-32B Base: Leveraging the robust architecture and pre-training of Qwen/Qwen3-32B, this merged model aims to inherit and potentially enhance its general language understanding and generation capabilities.
- Extended Context: With a context length of 32768 tokens, it is well-suited for tasks requiring the processing of longer inputs and maintaining coherence over extended conversations or documents.
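Merges like this are typically expressed as a mergekit configuration. The exact recipe for this model is not published here, so the density, weight, and dtype values below are illustrative placeholders, not the settings actually used:

```yaml
# Hypothetical mergekit TIES config -- density/weight values are assumptions.
models:
  - model: gurand/Affine-5CFL2YaBrJZCUSPBTjcDcTUSbnrm3UtAgKRsTU2KRcu9nvyR
    parameters:
      density: 0.5   # fraction of each task vector kept after trimming
      weight: 0.5
  - model: gurand/Affine-5CrMoVRmR8yP69Kh4iyrELehGYzUh3t7Q9hYVZUSjJA3VqDV
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: Qwen/Qwen3-32B
dtype: bfloat16
```

In a real recipe, density and weight can also be set per layer type (e.g. MLP vs. self-attention), which is how the layer-specific adjustments described above would be expressed.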
Good For
- General Language Tasks: Suitable for a broad range of general-purpose applications, such as drafting, summarization, and question answering, that benefit from a strong 32B-parameter base.
- Research and Experimentation: Ideal for researchers and developers interested in exploring the effects of model merging techniques like TIES on established base models and specialized components.
- Applications Requiring Long Context: Its 32K context window makes it effective for summarization, detailed question answering, and content generation over lengthy texts.