deepseek-ai/DeepSeek-V3-0324

Hugging Face
Text generation · Concurrency cost: 4 · Model size: 685B · Quantization: FP8 · Context length: 32k · Published: Mar 24, 2025 · License: MIT · Architecture: Transformer · Open weights

DeepSeek-V3-0324 is a 685 billion parameter language model developed by DeepSeek-AI, building upon the DeepSeek-V3 architecture. This iteration demonstrates significant improvements in reasoning capabilities across benchmarks like MMLU-Pro, GPQA, AIME, and LiveCodeBench. It is optimized for complex problem-solving, front-end web development, and enhanced Chinese writing proficiency, making it suitable for advanced analytical and creative tasks.


DeepSeek-V3-0324: Enhanced Reasoning and Specialized Capabilities

DeepSeek-V3-0324, developed by DeepSeek-AI, is a 685 billion parameter model that represents a significant advancement over its predecessor, DeepSeek-V3. This version focuses on boosting core reasoning abilities and refining specialized applications.

Key Capabilities & Improvements

  • Enhanced Reasoning: Demonstrates substantial performance gains across critical benchmarks:
    • MMLU-Pro: +5.3 points
    • GPQA: +9.3 points
    • AIME: +19.8 points
    • LiveCodeBench: +10.0 points
  • Front-End Web Development: Improved code executability and generation of aesthetically pleasing web pages and game front-ends.
  • Chinese Writing Proficiency: Achieves enhanced style and content quality, aligning with R1 writing standards for medium-to-long-form content. Features improved multi-turn interactive rewriting and optimized translation quality.
  • Chinese Search Capabilities: Provides more detailed outputs for report analysis requests.
  • Function Calling: Increased accuracy in function calling, addressing issues present in previous V3 versions.
  • Advanced Features: In addition to function calling, supports JSON output and Fill-in-the-Middle (FIM) completion, offering versatility across development tasks.
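The function-calling and JSON-output features above can be sketched as an OpenAI-compatible request payload. This is a minimal illustration only: the tool name, tool schema, and request wording are assumptions, not values taken from this page.

```python
# Sketch: assembling an OpenAI-compatible chat-completions payload that
# exercises the advertised features (function calling + JSON output).
# The tool name and schema below are illustrative assumptions.

def build_request(user_msg: str) -> dict:
    """Assemble a request payload for a function-calling chat completion."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "deepseek-ai/DeepSeek-V3-0324",
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [weather_tool],
        # Request structured output, per the JSON-output feature above.
        "response_format": {"type": "json_object"},
    }

payload = build_request("What's the weather in Paris?")
print(payload["model"])  # deepseek-ai/DeepSeek-V3-0324
```

Sending this dict as the JSON body of a chat-completions request is left out here, since the serving endpoint depends on where the model is hosted.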

Usage Recommendations

DeepSeek-V3-0324 is recommended for applications requiring strong analytical reasoning, high-quality Chinese text generation, and robust function calling. The model uses a system prompt that includes the current date and features an API temperature mapping mechanism to optimize performance. For local deployment, the model structure is consistent with DeepSeek-V3, and detailed instructions for advanced features like function calling can be found in the DeepSeek-V2.5 repository.

Popular Sampler Settings

The most popular Featherless user configurations for this model adjust the following sampling parameters: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
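A minimal sketch of passing the sampler parameters listed above in an OpenAI-compatible request body. The numeric values are placeholders, not the popular Featherless configurations (those values are not reproduced on this page).

```python
# Sketch: attaching the sampler parameters listed above to a request body.
# The numeric values are placeholders, not recommended settings.
sampler_settings = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "min_p": 0.05,
}

request_body = {
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "messages": [{"role": "user", "content": "Hello"}],
    **sampler_settings,  # merge the sampler knobs into the request
}
print(sorted(sampler_settings))
```

Serving stacks differ in which of these knobs they accept (for example, `repetition_penalty` and `min_p` are common in open-source servers but absent from some hosted APIs), so unsupported keys may need to be dropped per provider.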