mkurman/Llama-3.2-MedIT-3B-R1

Public · 3.2B parameters · BF16 · 32,768-token context · License: llama3.2

Model Overview

mkurman/Llama-3.2-MedIT-3B-R1 is a 3.2-billion-parameter model fine-tuned from meta-llama/Llama-3.2-3B-Instruct. It offers a 32,768-token context length and was developed with a multi-stage training methodology: Blurred Thoughts Supervised Fine-Tuning (BT-SFT) on the open-thoughts/OpenThoughts-114k dataset, followed by two stages of Group Relative Policy Optimization (GRPO) guided by an LLM evaluator, trained on the FreedomIntelligence/medical-o1-verifiable-problem and open-r1/OpenR1-Math-220k datasets, respectively. This specialized training is intended to strengthen performance on medical and mathematical reasoning and problem-solving tasks.
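
The model is distributed in the standard Hugging Face format, so it can be loaded with the transformers library. Below is a minimal usage sketch; the prompt, generation settings, and dtype choice are illustrative assumptions, not recommendations from the model authors.

```python
# Minimal usage sketch (assumptions: standard transformers API; illustrative
# prompt and generation settings, not guidance from the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/Llama-3.2-MedIT-3B-R1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "A patient weighs 70 kg. What is the total daily dose of a "
                "drug prescribed at 5 mg/kg/day?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```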

Key Capabilities

  • Advanced Fine-Tuning Research: Demonstrates sophisticated training methods, combining BT-SFT with GRPO guided by an LLM evaluator.
  • Specialized Reasoning: Enhanced for verifiable medical problems and mathematical reasoning through targeted dataset training.
  • Long Context: Handles long inputs with a 32,768-token context window (a length-check sketch follows this list).
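
Because the listed context length is a hard budget shared between the prompt and the generated tokens, it can help to check input size up front. The helper below is a rough sketch under that assumption; the function name and the reserved output budget are hypothetical and not part of the model card.

```python
# Rough sketch: check whether a prompt fits the 32,768-token context window
# listed above. The 1,024-token output reserve is an arbitrary illustrative choice.
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768

tokenizer = AutoTokenizer.from_pretrained("mkurman/Llama-3.2-MedIT-3B-R1")

def fits_context(text: str, reserve_for_output: int = 1024) -> bool:
    """Return True if `text` plus a generation budget fits the context window."""
    n_tokens = len(tokenizer(text).input_ids)
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_context("Summarize the following case report: ..."))
```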

Good for

  • Academic Research: Ideal for investigating advanced fine-tuning techniques and evaluating model performance in task-oriented conversational scenarios.
  • Experimental Applications: Suitable for exploratory projects in controlled environments, particularly in medical and mathematical domains.
  • Methodology Evaluation: Useful for researchers studying the impact of multi-stage training and LLM-based evaluation on model capabilities.