nerkyor/Lynn-V4-Flash-Distill-Qwen-35B-A3B-BF16-merged

TEXT GENERATION

  • Concurrency Cost: 3
  • Model Size: 35.1B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: May 15, 2026
  • License: mit
  • Architecture: Transformer
  • Open Weights, Cold

nerkyor/Lynn-V4-Flash-Distill-Qwen-35B-A3B-BF16-merged is a 35.1-billion-parameter Mixture-of-Experts (MoE) model based on the Qwen3-35B-A3B architecture and distilled from DeepSeek-V4-Flash and DeepSeek-V4-Pro teachers. This BF16 merged model is optimized for fast daily assistant tasks: short-to-medium reasoning, tool calling, coding-agent work, and multi-style Chinese creative writing. It has a 32,768-token context length and reports strong results on tool-calling and academic holdout evaluations.

Lynn-V4-Flash-Distill-Qwen-35B-A3B-BF16-merged Overview

This model is a 35.1 billion parameter Mixture-of-Experts (MoE) language model, distilled from DeepSeek-V4-Flash and DeepSeek-V4-Pro teachers, and built upon the Qwen3-35B-A3B base architecture. It is provided in a BF16 merged format, weighing 65.4 GB. The model is specifically designed for the Lynn personal AI assistant ecosystem, focusing on efficiency and practical application.
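The 65.4 GB figure is consistent with BF16 storing two bytes per parameter, provided the size is read in binary gigabytes (GiB). That reading is an assumption; the card does not say which unit it uses. A quick check:

```python
# Sanity check of the checkpoint size, assuming 2 bytes per BF16 parameter
# and that "65.4 GB" on the card is measured in GiB (2**30 bytes).
PARAMS = 35.1e9          # parameter count from the card
BYTES_PER_BF16 = 2       # bfloat16 is a 16-bit format

size_bytes = PARAMS * BYTES_PER_BF16
size_gb = size_bytes / 1e9        # decimal gigabytes
size_gib = size_bytes / 2**30     # binary gibibytes

print(f"{size_gb:.1f} GB, {size_gib:.1f} GiB")  # prints "70.2 GB, 65.4 GiB"
```

Either way, plan for roughly 70 GB of decimal storage for the weights alone, before any runtime memory.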

Key Capabilities

  • Fast Daily Assistant: Optimized for short-to-medium reasoning tasks.
  • Tool Calling: Supports tool-calling with qwen3_coder parser semantics for Bash, Read, Edit, Grep, and WebSearch.
  • Coding Agent: Proficient in algorithm tasks, debugging, and code refactoring.
  • Chinese Creative Writing: Excels in multi-style short-form Chinese creative writing across various platforms.
  • Quick Research Summaries: Capable of generating structured outputs of 300-800 characters.
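As a minimal sketch of how the five tools named above might be declared to the model, the snippet below builds OpenAI-style function-tool definitions. It assumes the model is served behind an endpoint that accepts such a `tools` list, which the card does not state. The tool names come from the card; every parameter name and description is a hypothetical placeholder, since the card does not document the argument schemas.

```python
import json

# Tool names come from the card; every parameter name below is a hypothetical
# placeholder, since the card does not document the argument schemas.
TOOL_PARAMS = {
    "Bash": ["command"],
    "Read": ["file_path"],
    "Edit": ["file_path", "old_string", "new_string"],
    "Grep": ["pattern"],
    "WebSearch": ["query"],
}

def make_tool(name, params):
    """Build one OpenAI-style function-tool definition with string parameters."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": f"{name} tool (hypothetical description)",
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": list(params),
            },
        },
    }

tools = [make_tool(name, params) for name, params in TOOL_PARAMS.items()]
print(json.dumps(tools[0], indent=2))
```

The resulting `tools` list would be passed alongside the chat messages; how the server converts the model's output back into structured calls depends on the parser configured on the serving side.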

Performance Highlights

The model passes all thresholds of its 4-gate evaluation with a NET_WIN score of +51.43pp. It achieves a 60.0% pass rate on V8 strict tool-calling and a 60.0% pass rate on the V9 academic holdout, outperforming its base model by +16.67pp. BF16 inference is slower than quantized variants because more bytes per weight must move through memory, but it offers higher fidelity.
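The memory-bandwidth point can be illustrated with a back-of-the-envelope bound: when reading weights dominates decoding, tokens per second cannot exceed bandwidth divided by the bytes of weights read per token. The sketch below rests on assumptions that are not in the card: roughly 3 billion active parameters per token (inferred from the "A3B" suffix) and a hypothetical 1000 GB/s of memory bandwidth. It ignores KV-cache reads, activations, and compute limits.

```python
# Rough upper bound on decode speed when weight reads dominate:
#   tokens/s <= memory_bandwidth / bytes_of_weights_read_per_token
# ASSUMPTIONS (not from the card): ~3e9 active parameters per token, inferred
# from the "A3B" suffix; a hypothetical 1000 GB/s of memory bandwidth.
ACTIVE_PARAMS = 3e9
BANDWIDTH_BYTES_PER_S = 1000e9

def max_tokens_per_s(bytes_per_param):
    """Bandwidth-limited ceiling for a given weight width in bytes."""
    bytes_per_token = ACTIVE_PARAMS * bytes_per_param
    return BANDWIDTH_BYTES_PER_S / bytes_per_token

bf16 = max_tokens_per_s(2)   # 16-bit weights
fp8 = max_tokens_per_s(1)    # 8-bit weights
print(f"BF16 <= {bf16:.0f} tok/s, FP8 <= {fp8:.0f} tok/s, ratio {fp8 / bf16:.1f}x")
```

The absolute numbers depend entirely on the assumed hardware; the useful takeaway is the ratio, since halving the bytes per weight doubles this ceiling.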

Should I use this for my use case?

This model suits developers who need a fast daily assistant for general Chinese/English conversation, tool-augmented tasks, and coding support. If your application requires short-to-medium reasoning, structured research summaries, or multi-style Chinese creative writing, this model is a strong candidate. For long-form structured research output (>= 1500 characters), the V4-Pro variant is recommended. Two caveats apply: the model is not optimized for pure single-language outputs (its training was Chinese-dominant), and math/coding outputs are evaluated via reference similarity rather than formal benchmarks such as GSM8K or HumanEval+.
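The length-based recommendation can be expressed as a small routing helper. This is only a sketch: the 1500-character threshold is taken from the text above, while the function name and the returned labels are hypothetical.

```python
# Threshold taken from the card: long-form structured research output of
# 1500 characters or more is better served by the V4-Pro variant.
# The function name and return labels are hypothetical.
LONG_FORM_MIN_CHARS = 1500

def pick_variant(expected_output_chars: int) -> str:
    """Return a label for the variant suggested by the expected output length."""
    if expected_output_chars >= LONG_FORM_MIN_CHARS:
        return "V4-Pro variant"
    return "V4-Flash distill (this model)"

print(pick_variant(600))    # within the 300-800 character summary range
print(pick_variant(2000))   # long-form research output
```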