C10X/LongWriter-Qwen2.5-7B-Instruct

7.6B parameters · FP8 · 32,768-token context · Updated Nov 7, 2024 · Hosted on Hugging Face

C10X/LongWriter-Qwen2.5-7B-Instruct is a 7.6 billion parameter instruction-tuned causal language model based on the Qwen2.5 architecture, developed by C10X. This model is specifically fine-tuned for generating long-form text, capable of producing outputs exceeding 10,000 words. Its primary differentiator is its enhanced long-context generation capability, making it suitable for tasks requiring extensive textual output.

Overview

This model is a specialized 7.6 billion parameter instruction-tuned variant of Qwen2.5-7B-Instruct, developed by C10X. Its core distinction is its significantly extended long-form generation capability, enabling it to produce over 10,000 words in a single generation.
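As a rough sanity check that a 10,000-word output fits in the model's 32,768-token context window, one can estimate the remaining output budget after the prompt. The ~1.3 tokens-per-word figure below is an assumed average for English text, not a measured value for this model's tokenizer:

```python
# Rough token-budget check for long-form generation.
# ASSUMPTION: ~1.3 tokens per English word on average; the true ratio
# depends on the tokenizer and the text.
CONTEXT_WINDOW = 32_768
TOKENS_PER_WORD = 1.3

def output_budget_words(prompt_tokens: int) -> int:
    """Approximate how many words of output fit after the prompt."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return int(remaining / TOKENS_PER_WORD)

# With a 500-token prompt, roughly 24,800 words of room remain,
# so a 10,000-word generation is well within the context budget.
print(output_budget_words(500))  # → 24821
```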

Key Capabilities & Training

  • Long-form Text Generation: Optimized for generating extensive textual content, addressing a common limitation in many LLMs.
  • Base Model: Built upon the robust Qwen2.5-7B-Instruct architecture.
  • Data Distillation: Gains its long-writing proficiency from fine-tuning on LongWriter-6k-filtered, a curated set of 666 high-quality samples filtered from the LongWriter-6k dataset.
  • Additional Training Data: Further fine-tuned using samples from Magpie-Qwen2-Pro-200K-Chinese and Magpie-Qwen2-Pro-200K-English datasets.
  • Fine-tuning Method: Fine-tuned with the ms-swift framework, using an annealing strategy to improve post-training performance.

Ideal Use Cases

  • Content Creation: Generating articles, reports, stories, or any document requiring substantial length.
  • Summarization: Processing and summarizing very long documents.
  • Creative Writing: Assisting with novels, scripts, or other long-form creative projects.
  • Technical Documentation: Producing detailed manuals or specifications.
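Qwen2.5-family instruct models are prompted in the ChatML format (normally produced via `tokenizer.apply_chat_template` in `transformers`). A minimal sketch of building such a prompt by hand for a long-form writing request follows; the system text and word target are illustrative examples, not taken from the model card:

```python
# Minimal ChatML prompt builder for a long-form writing request.
# Qwen2.5-style instruct models delimit turns with <|im_start|> /
# <|im_end|>; in practice, prefer tokenizer.apply_chat_template.

def build_chatml_prompt(messages: list[dict]) -> str:
    """Render a list of {'role', 'content'} dicts as a ChatML string,
    ending with an open assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # model continues from here
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a 10,000-word report on urban farming."},
])
print(prompt)
```

When generating, pass a correspondingly large `max_new_tokens` (tens of thousands of tokens) so decoding is not cut off before the requested length.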