Xclbr7/Arcanum-12b

Text generation · 12B parameters · FP8 quantization · 32k context length · MIT license · Transformer architecture · Open weights

Xclbr7/Arcanum-12b is a 12 billion parameter causal language model developed by Xclbr7, created by merging TheDrummer/Rocinante-12B-v1.1 and MarinaraSpaghetti/NemoMix-Unleashed-12B. This Transformer-based model primarily supports English and is optimized for conversational tasks with different personas. It has a 32768-token context length and was merged using the TIES method with per-model density parameters and an int8 mask.


Arcanum-12b: A Merged 12B Language Model

Arcanum-12b is a 12 billion parameter causal language model developed by Xclbr7. It was created by merging two existing 12B models, TheDrummer/Rocinante-12B-v1.1 and MarinaraSpaghetti/NemoMix-Unleashed-12B, with the TIES merging method. This approach combines the strengths of both parent models into a new, distinct model.

Key Characteristics & Merging Process

  • Parameter Count: Approximately 12 billion parameters.
  • Architecture: Transformer-based causal language model.
  • Merging Method: Utilizes the "Ties" merging technique.
  • Merging Parameters: Each parent model was given its own density gradient and weight: Rocinante-12B-v1.1 with density [1, 0.8, 0.6] and weight 0.7; NemoMix-Unleashed-12B with density [0.5, 0.7, 0.9] and weight 0.8.
  • Technical Details: The merge used normalization and an int8 mask, with the float16 data type.
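The parameters above correspond to a mergekit TIES configuration. A plausible reconstruction is sketched below; note that the base model is not stated on the card, so the `base_model` value here is an assumption (both parents derive from Mistral Nemo):

```yaml
models:
  - model: TheDrummer/Rocinante-12B-v1.1
    parameters:
      density: [1, 0.8, 0.6]   # density gradient across layer groups
      weight: 0.7
  - model: MarinaraSpaghetti/NemoMix-Unleashed-12B
    parameters:
      density: [0.5, 0.7, 0.9]
      weight: 0.8
merge_method: ties
# base_model is required by TIES but not stated on the card;
# Mistral-Nemo-Base-2407 is an assumption here.
base_model: mistralai/Mistral-Nemo-Base-2407
parameters:
  normalize: true
  int8_mask: true
dtype: float16
```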

Intended Use & Considerations

Arcanum-12b is primarily intended for conversation with different personas, making it suitable for applications requiring varied conversational styles. As a merged model, it may inherit biases and limitations from its constituent models, and users should exercise caution and responsibility when deploying it.
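To illustrate the persona-driven chat use case, here is a minimal sketch that wraps a persona description and a user turn in Mistral-style instruction tags. The `[INST]` format is an assumption; in practice, use `tokenizer.apply_chat_template()` from the model's own tokenizer so the prompt matches the template the model was trained with:

```python
def build_persona_prompt(persona: str, user_message: str) -> str:
    """Format a persona system prompt plus one user turn.

    The [INST] tag format is an assumption for illustration; prefer
    tokenizer.apply_chat_template() from the model's tokenizer.
    """
    return f"[INST] {persona}\n\n{user_message} [/INST]"


prompt = build_persona_prompt(
    "You are a gruff medieval blacksmith who answers in short sentences.",
    "What are you working on today?",
)
```

The returned string can then be tokenized and passed to the model for generation.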

Performance Snapshot

Evaluations on the Open LLM Leaderboard show an average score of 20.48, with specific metrics including:

  • IFEval (0-Shot): 29.07
  • BBH (3-Shot): 31.88
  • MMLU-PRO (5-shot): 28.74

For detailed results, refer to the Open LLM Leaderboard evaluation page.

Popular Sampler Settings

The sampler parameters most commonly tuned by Featherless users for this model are:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
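These parameters control how the next token is drawn from the model's output distribution. The following self-contained sketch shows temperature scaling plus top-k and top-p (nucleus) filtering over raw logits; it is a simplification, since real inference stacks also apply repetition, frequency, and presence penalties, as well as min_p filtering:

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Sample a token index from raw logits.

    temperature must be > 0; top_k=0 disables top-k filtering;
    top_p=1.0 disables nucleus filtering.
    """
    # Temperature scaling: lower values sharpen the distribution.
    scaled = [l / temperature for l in logits]

    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort token indices by probability, descending.
    ranked = sorted(enumerate(probs), key=lambda x: -x[1])

    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens and sample.
    mass = sum(p for _, p in kept)
    r = random.random() * mass
    for idx, p in kept:
        r -= p
        if r <= 0:
            return idx
    return kept[-1][0]
```

With `top_k=1` (or a very small `top_p`) this degenerates to greedy decoding, which is a handy way to sanity-check the filtering logic.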