ank028/Llama-3.2-1B-Instruct-medmcqa-MGSM8K-sft1-slerp

Text generation · Concurrency cost: 1 · Model size: 1B · Quantization: BF16 · Context length: 32k · Architecture: Transformer

ank028/Llama-3.2-1B-Instruct-medmcqa-MGSM8K-sft1-slerp is a 1-billion-parameter language model created by ank028 by merging two Llama-3.2-1B-Instruct fine-tunes with the SLERP method. It combines a checkpoint fine-tuned on medical multiple-choice questions (medmcqa) with one optimized for multilingual grade-school math word problems (MGSM8K), aiming for stronger performance on both medical question answering and mathematical reasoning, and supports a 32,768-token context length.


Model Overview

This model, ank028/Llama-3.2-1B-Instruct-medmcqa-MGSM8K-sft1-slerp, is a 1-billion-parameter language model derived from the Llama-3.2-1B-Instruct architecture. It was created by ank028 using the SLERP (Spherical Linear Interpolation) merge method, which combines two specialized base models.

Key Capabilities

  • Hybrid Specialization: Integrates the strengths of two distinct fine-tuned models:
    • One model was fine-tuned on the medmcqa dataset, suggesting proficiency in medical multiple-choice question answering.
    • The other model was optimized using the MGSM8K dataset, indicating capabilities in solving multilingual grade-school math word problems.
  • SLERP Merge Method: Utilizes the SLERP technique for merging, which aims to create a balanced combination of the source models' learned representations.
  • Context Length: Supports a context length of 32,768 tokens, allowing it to process longer inputs and maintain conversational coherence over extended interactions.
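To make the SLERP merge method concrete, the sketch below shows spherical linear interpolation applied to two flattened weight vectors. This is an illustrative pure-Python implementation of the general SLERP formula, not the exact code (e.g. mergekit) used to produce this model; the `t = 0.5` blend factor is an assumption for the example.

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Interpolates along the arc between v0 and v1 rather than the straight
    line, which tends to preserve the norm/geometry of the merged weights.
    """
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Equal blend of two orthogonal unit vectors stays on the unit sphere.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a plain weighted average, which shortens the result when the source vectors point in different directions, SLERP keeps the interpolated weights on the arc between the two checkpoints.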

Good For

  • Medical Q&A: Ideal for applications requiring accurate responses to medical multiple-choice questions.
  • Mathematical Reasoning: Suitable for tasks involving multilingual grade-school math word problems and general mathematical reasoning.
  • Combined Domain Tasks: Potentially useful for scenarios that require an understanding of both medical and mathematical concepts, or for users seeking a versatile small model with these specific specializations.
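For readers who want to reproduce a merge like this, the fragment below is a hypothetical mergekit-style SLERP configuration. The source model names, layer range, and `t: 0.5` blend factor are assumptions for illustration; the actual merge parameters for this model are not published here.

```yaml
# Hypothetical mergekit SLERP config (names and values are illustrative)
slices:
  - sources:
      - model: ank028/Llama-3.2-1B-Instruct-medmcqa   # assumed medical fine-tune
        layer_range: [0, 16]
      - model: ank028/Llama-3.2-1B-Instruct-MGSM8K    # assumed math fine-tune
        layer_range: [0, 16]
merge_method: slerp
base_model: ank028/Llama-3.2-1B-Instruct-medmcqa
parameters:
  t: 0.5          # equal blend of the two checkpoints
dtype: bfloat16   # matches the model's BF16 weights
```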