Name: yang0104/OryzaG3-8k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: yang0104

OryzaG3-8k: A Genomic Foundation Model for Rice

OryzaG3-8k is a 700-million-parameter DNA language model developed by yang0104 and focused specifically on rice (Oryza). It was pretrained on a dataset of 149 high-quality rice pangenomes, using non-overlapping 3-mer tokenization and Causal Language Modeling (CLM) as its pretraining objective. The model is available in two context-length versions; OryzaG3-8k is the version with an 8k-token context.
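The non-overlapping 3-mer tokenization mentioned above can be illustrated with a minimal Python sketch. The vocabulary layout, the `<unk>` token, and the choice to drop a trailing remainder shorter than three bases are assumptions made for illustration; the listing does not say how the released tokenizer handles these cases.

```python
from itertools import product

# Hypothetical vocabulary: the 64 possible 3-mers over A, C, G, T,
# plus an assumed <unk> token for chunks containing other symbols (e.g. N).
VOCAB = {"<unk>": 0}
for index, kmer in enumerate(product("ACGT", repeat=3), start=1):
    VOCAB["".join(kmer)] = index


def split_3mers(sequence):
    """Split a DNA string into consecutive, non-overlapping 3-mers.

    Assumption: a trailing remainder of 1 or 2 bases is dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def encode(sequence):
    """Map each 3-mer to its id in the hypothetical vocabulary."""
    return [VOCAB.get(kmer, VOCAB["<unk>"]) for kmer in split_3mers(sequence)]


tokens = split_3mers("ATGGCGTACGA")  # 11 bases, so the last 2 are dropped
ids = encode("ATGGCGTACGA")
```

Because each token covers three bases under this scheme, an 8k-token context would span roughly 24k bases, ignoring any special tokens the real tokenizer may add.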

Key Capabilities & Performance

Species-Specific Genomic Analysis: Pretrained only on rice pangenomes, so it targets rice genomics rather than general plant genomics.
Competitive Benchmarking: On the Plants Genomic Benchmark-polyA for the Indica Group, OryzaG3-8k (700M) achieves an AUC of 0.970, AP of 0.942, and Accuracy of 0.924. It matches or exceeds the performance of larger multi-species models like AgroNT (1B) and Botanic0-L (991M) on rice-specific tasks.
Superior Inference Efficiency: Reaches 400.41 samples/s, roughly 4.2 times the 95.47 samples/s reported for AgroNT, which makes it well suited to high-throughput genomic research.
Reproducible Framework: Provides a technical framework for developing lightweight, crop-specific genomic foundation models.
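To put the throughput figures above in practical terms, the following Python sketch computes the relative speedup and the time each model would need for a given workload. Only the two samples/s values come from the listing; the `seconds_for` helper and the one-million-sample workload are hypothetical, and the hardware and batch settings behind the reported numbers are not stated here.

```python
# Throughput values reported in the listing (samples per second).
samples_per_second = {
    "OryzaG3-8k": 400.41,
    "AgroNT": 95.47,
}


def seconds_for(num_samples, throughput):
    """Estimated wall-clock seconds to process num_samples at a fixed throughput."""
    return num_samples / throughput


# Relative speedup of OryzaG3-8k over AgroNT.
speedup = samples_per_second["OryzaG3-8k"] / samples_per_second["AgroNT"]

# Hypothetical workload of one million sequences.
workload = 1_000_000
estimates = {
    name: seconds_for(workload, rate)
    for name, rate in samples_per_second.items()
}

for name, seconds in estimates.items():
    print(f"{name}: {seconds / 60:.1f} minutes for {workload:,} samples")
print(f"Speedup: {speedup:.2f}x")
```

The estimate assumes throughput stays constant across the whole workload, which may not hold for variable-length inputs.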

When to Use OryzaG3-8k

This model is ideal for researchers and developers working on:

Rice Genomics: Tasks requiring detailed analysis and understanding of rice DNA sequences.
Crop-Specific AI: Developing specialized AI applications for agricultural genomics, particularly for rice.
Efficient Genomic Inference: Scenarios where high throughput and efficient processing of genomic data are critical.

OryzaG3 was initialized from the Gemma3-1B architecture configuration without loading its original pretrained weights; the model was trained from scratch on rice pangenomes.
