Arcanum-12b: A Merged 12B Language Model
Arcanum-12b is a 12-billion-parameter causal language model developed by Xclbr7. It was created with the TIES merging technique, combining two existing 12B models: TheDrummer/Rocinante-12B-v1.1 and MarinaraSpaghetti/NemoMix-Unleashed-12B. Merging draws on the strengths of both parent models to produce a new, distinct model.
Key Characteristics & Merging Process
- Parameter Count: Approximately 12 billion parameters.
- Architecture: Transformer-based causal language model.
- Merging Method: TIES (Trim, Elect Sign & Merge).
- Merging Parameters: per-model density and weight values (Rocinante-12B-v1.1: density [1, 0.8, 0.6], weight 0.7; NemoMix-Unleashed-12B: density [0.5, 0.7, 0.9], weight 0.8).
- Technical Details: merging used normalization and an int8 mask, with the float16 data type.
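The parameters above map naturally onto a mergekit-style YAML configuration. The sketch below is a hypothetical reconstruction from the stated values, not the author's actual config; in particular, TIES merging in mergekit expects a `base_model`, which this card does not name, so it is left as a placeholder.

```yaml
# Hypothetical mergekit config reconstructing the stated TIES parameters.
merge_method: ties
base_model: <unspecified-base-model>  # required by TIES; not stated in the card
models:
  - model: TheDrummer/Rocinante-12B-v1.1
    parameters:
      density: [1.0, 0.8, 0.6]
      weight: 0.7
  - model: MarinaraSpaghetti/NemoMix-Unleashed-12B
    parameters:
      density: [0.5, 0.7, 0.9]
      weight: 0.8
parameters:
  normalize: true
  int8_mask: true
dtype: float16
```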
Intended Use & Considerations
Arcanum-12b is primarily intended for conversation with different personas, making it suitable for applications requiring varied conversational styles. As a merged model, it may inherit biases and limitations from its constituent models, and users should exercise caution and responsibility when deploying it.
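As an illustration of persona-style prompting (not an official usage snippet for this model), a conversation can be framed by a system message that defines the persona, with the model's own tokenizer supplying the actual chat template (e.g. `tokenizer.apply_chat_template` in transformers). The helper below is a hypothetical sketch:

```python
# Hypothetical sketch: build a persona-framed message list for a
# chat-style causal LM. The real formatting is applied later by the
# model's tokenizer; this only assembles the role/content structure.
def build_persona_chat(persona: str, turns: list[str]) -> list[dict]:
    """Return a messages list opening with a persona-defining system prompt."""
    messages = [{"role": "system", "content": f"You are {persona}."}]
    for i, text in enumerate(turns):
        # Alternate user/assistant roles across the recorded turns.
        role = "user" if i % 2 == 0 else "assistant"
        messages.append({"role": role, "content": text})
    return messages

chat = build_persona_chat(
    "a terse ship's navigator",
    ["Plot a course to the nearest port.", "Course laid in.", "ETA?"],
)
```

Swapping the persona string changes the conversational style without any change to the model or decoding settings.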
Performance Snapshot
Evaluations on the Open LLM Leaderboard show an average score of 20.48 across its six benchmarks; the three reported here are:
- IFEval (0-shot): 29.07
- BBH (3-shot): 31.88
- MMLU-PRO (5-shot): 28.74
For detailed results, refer to the Open LLM Leaderboard evaluation page.
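Assuming the leaderboard's standard arithmetic mean over six benchmarks, the three listed scores plus the 20.48 average pin down what the unlisted scores must sum to. This is a quick consistency check on the reported numbers, not additional benchmark data:

```python
# Back out the implied sum of the three unlisted benchmark scores,
# assuming the reported average is a plain mean over six benchmarks.
listed = [29.07, 31.88, 28.74]   # IFEval, BBH, MMLU-PRO
average = 20.48                  # reported six-benchmark mean
total = average * 6              # implied sum of all six scores
remaining = total - sum(listed)  # combined mass of the unlisted three
print(round(remaining, 2))       # ~33.19 across the three unlisted scores
print(round(remaining / 3, 2))   # ~11.06 mean for the unlisted scores
```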