kodonho/Solar-M-SakuraSolar-Mixed
kodonho/Solar-M-SakuraSolar-Mixed is a 10.7 billion parameter English language model based on the Solar architecture, created by kodonho. It is a blend of DopeorNope/SOLARC-M-10.7B and kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2, mixed using gradient slerp. It is designed for general language generation tasks, with the two base models combined in the hope of broader capabilities than either offers alone.
Model Overview
kodonho/Solar-M-SakuraSolar-Mixed is a 10.7 billion parameter English language model developed by kodonho. It is constructed using a gradient slerp mixing technique, combining two base models:
- DopeorNope/SOLARC-M-10.7B
- kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2
This approach aims to integrate the strengths of both foundational models into a single, cohesive unit. The model supports a context length of 4096 tokens.
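The card does not spell out how the gradient slerp mix was done, so the following is only a rough sketch of the general idea. It assumes that "slerp" means spherical linear interpolation between matching weight tensors and that "gradient" means the interpolation factor t changes with layer depth. The function names, the linear schedule, and the flat-list representation of weights are all hypothetical and are not taken from the author's recipe.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns v0, t = 1 returns v1. Falls back to plain linear
    interpolation when the vectors are nearly parallel.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the vectors, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1 + eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < 1e-6:
        # Nearly parallel: slerp degenerates, so use linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def gradient_schedule(num_layers, t_start, t_end):
    """Assumed schedule: t moves linearly from t_start to t_end over the layers."""
    if num_layers == 1:
        return [t_start]
    step = (t_end - t_start) / (num_layers - 1)
    return [t_start + i * step for i in range(num_layers)]


def merge_layers(layers_a, layers_b, t_start=0.0, t_end=1.0):
    """Merge two lists of per-layer weight vectors with a depth-dependent t."""
    ts = gradient_schedule(len(layers_a), t_start, t_end)
    return [slerp(t, a, b) for t, a, b in zip(ts, layers_a, layers_b)]
```

With the default schedule, the first layer is taken entirely from the first model and the last layer entirely from the second, with intermediate layers blended along the arc between the two weight vectors. Real merges operate on full tensors per parameter and may use different schedules for different parameter groups.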
Key Characteristics
- Architecture: Based on the Solar model family.
- Parameter Count: 10.7 billion parameters.
- Mixing Method: Utilizes gradient slerp for combining the base models.
- Language: Primarily English.
Usage
The model can be loaded with the transformers library and used for text generation in both GPU and CPU environments. The workflow is to load the tokenizer and the model, then generate text from user input. The model supports torch.float32 or torch.bfloat16, depending on the hardware configuration.
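The loading workflow described above might look roughly like the sketch below. It uses the standard transformers `AutoTokenizer` and `AutoModelForCausalLM` classes. The mapping of bfloat16 to GPU and float32 to CPU is an assumption, since the card only says both dtypes are supported, and the helper names `choose_dtype_name` and `generate` are hypothetical.

```python
MODEL_ID = "kodonho/Solar-M-SakuraSolar-Mixed"


def choose_dtype_name(use_gpu: bool) -> str:
    """Assumed mapping: bfloat16 on GPU, float32 on CPU."""
    return "bfloat16" if use_gpu else "float32"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the tokenizer and model, then generate a continuation of prompt."""
    # Heavy imports live here so the helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_gpu = torch.cuda.is_available()
    dtype = getattr(torch, choose_dtype_name(use_gpu))

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model.to("cuda" if use_gpu else "cpu")

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Hello, my name is")` downloads the weights on first use and returns the prompt followed by the generated continuation. Keep the prompt and the generated tokens together within the 4096-token context length.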