mkurman/Llama-3.2-MedIT-3B-R1

Public · 3.2B parameters · BF16 · 32,768-token context · License: llama3.2

Model Overview

mkurman/Llama-3.2-MedIT-3B-R1 is a 3.2-billion-parameter model fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It offers a 32,768-token context length and was developed with a multi-stage training methodology: Blurred Thoughts Supervised Fine-Tuning (BT-SFT) on the open-thoughts/OpenThoughts-114k dataset, followed by two stages of Group Relative Policy Optimization (GRPO) guided by an LLM evaluator, trained on the FreedomIntelligence/medical-o1-verifiable-problem and open-r1/OpenR1-Math-220k datasets, respectively. This specialized training is intended to strengthen performance on medical and mathematical reasoning and problem-solving tasks.
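
The model is distributed in the standard Hugging Face format, so it can be loaded with the transformers library. Below is a minimal usage sketch; the prompt, generation settings, and dtype choice are illustrative assumptions, not recommendations from the model authors.

```python
# Minimal usage sketch (assumptions: standard transformers API; illustrative
# prompt and generation settings, not guidance from the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/Llama-3.2-MedIT-3B-R1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "A patient weighs 70 kg. What is the total daily dose of a "
                "drug prescribed at 5 mg/kg/day?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```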

Key Capabilities

  • Advanced Fine-Tuning Research: Demonstrates sophisticated training methods, combining BT-SFT with GRPO guided by an LLM evaluator.
  • Specialized Reasoning: Enhanced for verifiable medical problems and mathematical reasoning through targeted dataset training.
  • Long Context: Handles long inputs with a 32,768-token context window (a length-check sketch follows this list).
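
Because the listed context length is a hard budget shared between the prompt and the generated tokens, it can help to check input size up front. The helper below is a rough sketch under that assumption; the function name and the reserved output budget are hypothetical and not part of the model card.

```python
# Rough sketch: check whether a prompt fits the 32,768-token context window
# listed above. The 1,024-token output reserve is an arbitrary illustrative choice.
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768

tokenizer = AutoTokenizer.from_pretrained("mkurman/Llama-3.2-MedIT-3B-R1")

def fits_context(text: str, reserve_for_output: int = 1024) -> bool:
    """Return True if `text` plus a generation budget fits the context window."""
    n_tokens = len(tokenizer(text).input_ids)
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_context("Summarize the following case report: ..."))
```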

Good for

  • Academic Research: Ideal for investigating advanced fine-tuning techniques and evaluating model performance in task-oriented conversational scenarios.
  • Experimental Applications: Suitable for exploratory projects in controlled environments, particularly in medical and mathematical domains.
  • Methodology Evaluation: Useful for researchers studying the impact of multi-stage training and LLM-based evaluation on model capabilities.