KAKA22/CodeRM-8B

Parameters: 8B
Tensor type: FP8
Context length: 32,768 tokens
Dec 25, 2024
License: apache-2.0

CodeRM-8B: Unit Test Generation Model

CodeRM-8B is an 8-billion-parameter model fine-tuned from Llama3.1-8B-Instruct with a 32,768-token context length. Its primary function is to generate high-quality Python unit tests, particularly for evaluating candidate code solutions. The model was trained on a specialized dataset of 60,000 synthetic Python unit tests, generated by Llama3.1-70B-Instruct from established code instruction tuning datasets such as CodeFeedback-Filtered-Instruction and TACO.
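
As a rough illustration, the snippet below shows how the model might be loaded and prompted with the transformers library. The prompt wording and generation settings are assumptions for this sketch, not taken from the model card; consult the repository for the exact format used during fine-tuning.

```python
# Minimal sketch: prompt CodeRM-8B for unittest-style tests via transformers.
# Prompt phrasing and decoding parameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KAKA22/CodeRM-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

solution = '''
def add(a: int, b: int) -> int:
    return a + b
'''

messages = [
    {"role": "user", "content": (
        "Write Python unit tests using the unittest library for the following "
        "code solution:\n" + solution
    )},
]

# Build the chat-formatted prompt and generate the test code.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```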

Key Capabilities & Performance

  • Efficient Unit Test Generation: CodeRM-8B is optimized for generating Python test cases built on the standard unittest library for a given code solution.
  • Reward Modeling: The model performs strongly in a best-of-N reward modeling setup, where its generated unit tests are used to select the best code solution from multiple candidates (see the sketch after this list). Despite its smaller size, CodeRM-8B achieves results comparable to Llama3.1-70B-Instruct on benchmarks such as HumanEval Plus, MBPP Plus, and LiveCodeBench.
  • High-Quality Tests: Evaluations show that CodeRM-8B's generated unit tests achieve high accuracy and F1 scores, with competitive False Acceptance Rates (FAR) and False Rejection Rates (FRR) when distinguishing correct from incorrect code solutions.
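
To make the reward-modeling setup concrete, here is a minimal sketch of how generated unit tests could score candidate solutions in a best-of-N pipeline. The harness below (run_tests, best_of_n, and the exec-based execution) is a hypothetical illustration under simplifying assumptions, not the authors' evaluation code.

```python
# Illustrative best-of-N selection: score each candidate solution by how many
# model-generated unit tests it passes, then keep the top-scoring candidate.
# Helper names and the exec-based harness are assumptions for this sketch.
import unittest


def run_tests(solution_code: str, test_code: str) -> int:
    """Execute the candidate solution plus generated tests; return the passed-test count."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # load the candidate solution
        exec(test_code, namespace)      # load the generated unittest cases
    except Exception:
        return 0                        # candidates that fail to execute score zero

    # Collect every TestCase subclass defined by the generated test code.
    suite = unittest.TestSuite()
    loader = unittest.TestLoader()
    for obj in namespace.values():
        if isinstance(obj, type) and issubclass(obj, unittest.TestCase):
            suite.addTests(loader.loadTestsFromTestCase(obj))

    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun - len(result.failures) - len(result.errors)


def best_of_n(candidates: list[str], test_code: str) -> str:
    """Return the candidate that passes the most generated unit tests."""
    return max(candidates, key=lambda code: run_tests(code, test_code))
```

In a real deployment the candidate code and tests would be executed in a sandbox rather than with exec in the host process.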

Use Cases

  • Automated Code Evaluation: Ideal for systems requiring automated generation of unit tests to validate code correctness.
  • Code Reward Modeling: Can be integrated into larger systems where unit tests act as a reward signal for selecting the best code solutions from multiple candidates.
  • Developer Tooling: Helps developers quickly generate comprehensive unit tests for their Python functions.