mkurman/Llama-3.2-MedIT-3B-R1 is a 3B-parameter variant of Llama 3.2 Instruct, fine-tuned by mkurman with a 32,768-token context length. Training proceeds in multiple stages, combining Blurred Thoughts Supervised Fine-Tuning (BT-SFT) with Group Relative Policy Optimization (GRPO) guided by LLM evaluators. The model targets research into natural language understanding and reasoning, with an emphasis on medical and mathematical problem solving, and is intended for experimental use in controlled environments.
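For reference, a minimal sketch of loading and querying the model, assuming the checkpoint is hosted on the Hugging Face Hub under the ID above and is compatible with the standard transformers chat template; the prompt is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mkurman/Llama-3.2-MedIT-3B-R1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 3B model on a single GPU
    device_map="auto",
)

# Chat-style prompt; the card emphasizes medical and mathematical reasoning.
messages = [
    {"role": "user", "content": "Explain the first-line treatment for type 2 diabetes."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Given the experimental, research-only intent stated above, outputs should not be relied on for clinical decisions.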