OpenCoder-8B-Base: An Open Foundation for Code LLMs
OpenCoder-8B-Base is an 8-billion-parameter model from the OpenCoder family, developed by infly as a fully open and reproducible code Large Language Model. It is pretrained on 2.5 trillion tokens, comprising 90% raw code and 10% code-related web data; the family additionally releases over 4.5 million high-quality SFT examples used to fine-tune its instruction-following variants. The model supports both English and Chinese, aiming to provide a robust foundation for advancing code AI.
Key Capabilities & Features
- Comprehensive Open Source: Provides not only model weights and inference code but also the complete data-cleaning code, high-quality synthetic data, and over 4.5 million supervised fine-tuning (SFT) entries, making it one of the most transparently released models.
- Rigorous Experimental Analysis: Backed by extensive ablation studies on data-cleaning strategies and training processes, including file-level versus repository-level deduplication.
- High-Quality Synthetic Data: Offers a fully developed synthetic data generation process and a substantial dataset of SFT entries.
- Exceptional Performance: Achieves strong performance across multiple code-specific benchmarks, positioning it among leading open-source code models. For detailed evaluation results, refer to the OpenCoder paper.
Benchmarks (Base Model)
- HumanEval: 66.5 (HumanEval+: 63.4)
- MBPP: 79.9 (MBPP+: 70.4)
- BigCodeBench: 40.5
- BigCodeBench-Hard: 9.5
Good For
- Code Generation: Excels at generating code snippets and functions.
- Code Understanding: Suitable for tasks requiring comprehension of programming logic.
- Research & Development: Ideal for researchers and developers looking for a transparent, reproducible, and high-performing base model for code AI experimentation and innovation.
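Getting Started

A minimal completion sketch using Hugging Face `transformers`, assuming the Hub repo id `infly/OpenCoder-8B-Base` (verify it on the Hub before use). As a base model, OpenCoder-8B-Base performs plain text continuation rather than chat-style instruction following:

```python
MODEL_ID = "infly/OpenCoder-8B-Base"  # assumed Hub repo id; confirm before use


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue `prompt` with the base model (no chat template)."""
    # Imports kept local so the module loads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # pick the checkpoint's native dtype
        device_map="auto",    # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Base-model usage: give a code prefix and let the model continue it.
    print(complete("def fibonacci(n):"))
```

Because this is the base checkpoint, prompts should be written as code prefixes to continue; for conversational use, the OpenCoder family's instruction-tuned variants are the better fit.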