Convergent-7B: A Specialized Research Companion
Convergent-7B, developed by cahlen, is a 7.6-billion-parameter model fine-tuned from Qwen2.5-7B-Instruct with QLoRA. It functions as an agentic research companion for the bigcompute.science project, specializing in computational mathematics and number theory. The model is designed to connect to the bigcompute.science MCP server, enabling it to reason about findings, write CUDA kernels, and propose new research directions for unsolved problems.
Key Capabilities
- Deep Number Theory Knowledge: Proficient in concepts like continued fractions, Zaremba's conjecture, Hausdorff dimensions, and various number theory heuristics.
- CUDA Kernel Scaffolding: Generates GPU kernel structures for number theory, including architecture-specific flags (Ampere, Ada Lovelace, Hopper, Blackwell), though outputs require expert review.
- Agentic Tool Calling: Utilizes Hermes-format <tool_call> blocks for querying the bigcompute.science MCP server in ReAct loops, supporting 23 distinct tools.
- Student Guidance: Offers actionable advice for individuals interested in contributing to computational number theory.
- Error Recovery: Demonstrates graceful handling of tool call failures.
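To make the number-theory vocabulary above concrete: a continued fraction expansion of a rational p/q is the list of partial quotients produced by the Euclidean algorithm, and Zaremba's conjecture asserts that every positive integer q has some numerator p for which all partial quotients are bounded by a small constant. A minimal sketch (not taken from the model's training data):

```python
def continued_fraction(p: int, q: int) -> list[int]:
    """Partial quotients [a0; a1, a2, ...] of p/q via the Euclidean algorithm."""
    quotients = []
    while q:
        a, r = divmod(p, q)  # a = floor(p/q), r = remainder
        quotients.append(a)
        p, q = q, r
    return quotients

# 4/11 = [0; 2, 1, 3]: every partial quotient is at most 3, so 4/11 is a
# "Zaremba-friendly" fraction for denominator 11.
print(continued_fraction(4, 11))  # [0, 2, 1, 3]
```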
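The architecture-specific flags mentioned for kernel scaffolding are typically compute-capability targets passed to nvcc. A sketch of a plausible mapping (the compute capabilities listed are common values, not confirmed outputs of the model; Blackwell targets in particular vary by SKU):

```python
# Assumed mapping from GPU architecture to an nvcc -arch target.
NVCC_ARCH = {
    "Ampere": "sm_80",
    "Ada Lovelace": "sm_89",
    "Hopper": "sm_90",
    "Blackwell": "sm_100",
}

def nvcc_flags(arch: str) -> list[str]:
    """Return the -arch flag pair for a named GPU architecture."""
    return ["-arch", NVCC_ARCH[arch]]

print(nvcc_flags("Hopper"))  # ['-arch', 'sm_90']
```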
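The Hermes tool-calling format wraps a JSON object in `<tool_call>` tags; a ReAct harness scans the model's output for these blocks, executes the named tool, and returns the result. A minimal sketch of that extraction step (the tool name and arguments here are hypothetical, not part of the actual bigcompute.science tool schema):

```python
import json
import re

# Hypothetical tool call in Hermes format; "query_findings" is an
# illustrative name, not a documented bigcompute.science tool.
call = {"name": "query_findings", "arguments": {"topic": "zaremba", "limit": 5}}
message = f"<tool_call>\n{json.dumps(call)}\n</tool_call>"

def extract_tool_calls(text: str) -> list[dict]:
    """Find every <tool_call> JSON block in a model response."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

print(extract_tool_calls(message))
```

A ReAct loop would feed each tool's result back to the model in a corresponding response block before the next reasoning step.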
Performance and Training
Custom evaluations show an overall score of 75% across 20 categories, including 82% for agentic tool use and 71% for CUDA code generation. General reasoning incurred a 6% 'alignment tax' relative to the base model, while math capabilities were preserved or improved. Convergent-7B was trained with QLoRA on a mixed dataset: curated domain blocks, synthetic chain-of-thought from Qwen2.5-Math-72B, synthetic reasoning from Gemma-4-26B, and external tool-calling patterns.
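QLoRA fine-tunes of this kind typically combine 4-bit NF4 quantization of the frozen base weights with low-rank adapters. A configuration sketch using the Hugging Face transformers and peft libraries; the hyperparameters are illustrative assumptions, not Convergent-7B's actual recipe:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the "Q" in QLoRA);
# passed to AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; r and alpha are guesses,
# not the values used for Convergent-7B. Applied via peft.get_peft_model.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```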
Limitations
Convergent-7B is not a theorem prover and may hallucinate specific numerical values, so its claims should be verified against the MCP server. Generated CUDA code is scaffolding and needs expert review for correctness and potential issues. The model is highly specialized for number theory and GPU computation, making it less suitable for general-purpose tasks.