Mathoctopus/Cross_13B

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4k · License: apache-2.0 · Architecture: Transformer

Mathoctopus/Cross_13B is a 13-billion-parameter large language model based on LLaMA 2, fine-tuned by Mathoctopus for multilingual mathematical reasoning. It is trained on the MGSM8KInstruct dataset, which spans ten languages, and outperforms conventional open-source LLMs, and even ChatGPT, in few-shot math problem-solving. The model is intended for research use, particularly in applications that require solving math problems across languages.


Mathoctopus/Cross_13B: Multilingual Mathematical Reasoning

Mathoctopus/Cross_13B is a 13 billion parameter model from the MathOctopus series, built on the LLaMA 2 architecture and specifically designed for advanced multilingual mathematical problem-solving. Developed by Mathoctopus, this model is part of a research initiative to break language barriers in mathematical reasoning.

Key Capabilities

  • Multilingual Math Proficiency: Trained on the MGSM8KInstruct dataset, which covers ten languages (English, Swahili, Chinese, Bengali, German, Spanish, French, Japanese, Russian, Thai), giving it robust math performance across linguistic contexts.
  • Superior Performance: Outperforms conventional open-source LLMs, and even ChatGPT, on few-shot mathematical reasoning tasks.
  • Cross-Training Strategy: The "Cross" in the name refers to the MathOctopus cross-training strategy, which mixes question and answer languages during fine-tuning to strengthen cross-lingual transfer.
  • Research-Oriented: Primarily intended for research in educational software, tutoring systems, and other applications requiring precise solutions to math problems (a minimal usage sketch follows below).
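
For orientation, the snippet below sketches how the model might be loaded and queried with the Hugging Face transformers library. The alpaca-style prompt template is an assumption (it is a common convention for LLaMA 2 instruction fine-tunes), so verify the exact format against the MathOctopus training code.

```python
# Minimal inference sketch, assuming a standard transformers setup.
# The prompt template is an assumption: alpaca-style instruction
# formatting is common for LLaMA 2 fine-tunes, but the exact template
# should be confirmed against the MathOctopus repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mathoctopus/Cross_13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~26 GB of weights at fp16 for 13B params
    device_map="auto",
)

# A sample multilingual math question (German): 12 - 5 = 7.
question = (
    "Lisa hat 12 Äpfel und gibt 5 davon an ihre Freunde ab. "
    "Wie viele Äpfel hat sie noch?"
)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{question}\n\n### Response:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

At fp16 the 13B weights occupy roughly 26 GB, so multi-GPU sharding via `device_map="auto"` or a quantized load may be necessary on smaller hardware.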

Benchmarks

On the MGSM benchmark, the MathOctopus^C 13B model scores 39.9% overall and 56.4% on English. On the MSVAMP benchmark, it scores 47.1% overall and 56.6% on English. These results show strong performance in both English and multilingual mathematical contexts, often surpassing its LLaMA 2 base model counterparts.
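
For context on how such scores are typically produced: MGSM-style evaluation extracts the final numeric answer from the model's generated chain-of-thought solution and checks it for an exact match against the gold answer. The scorer below is a simplified, hypothetical sketch of that procedure, not the official MathOctopus evaluation code.

```python
# Simplified MGSM-style scorer: treat the last number in each generated
# solution as the predicted answer and compute exact-match accuracy.
# This mirrors common practice but is not the official evaluation script.
import re

def extract_answer(generation: str) -> str | None:
    """Return the last number in the generated solution, or None."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", generation)
    return numbers[-1].replace(",", "") if numbers else None

def accuracy(generations: list[str], golds: list[str]) -> float:
    """Exact-match accuracy over extracted final answers."""
    hits = sum(extract_answer(g) == gold for g, gold in zip(generations, golds))
    return hits / len(golds)

# One correct and one incorrect prediction -> 0.5 accuracy.
print(accuracy(
    ["... so she has 7 apples left. The answer is 7.",
     "... therefore the total is 41."],
    ["7", "40"],
))
```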

Good for

  • Developing educational software that requires solving math problems in multiple languages.
  • Creating advanced tutoring systems with multilingual support.
  • Research into large language models for mathematical reasoning and cross-lingual understanding.