winninghealth/WiNGPT-Babel

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 17, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

WiNGPT-Babel is a 1.5 billion parameter language model developed by winninghealth, specifically customized for translation applications. Built on the Qwen2.5-1.5B architecture, it is trained with a human-in-the-loop data production strategy to provide native-level multilingual information access. This model excels at translating various content formats, including web pages, academic papers, news, and video subtitles, supporting over 20 languages with high accuracy and real-time performance.


WiNGPT-Babel: A Specialized LLM for Multilingual Translation

WiNGPT-Babel is a 1.5 billion parameter language model developed by winninghealth, built upon the Qwen2.5-1.5B foundation, and specifically designed for translation applications. Its core differentiator is a human-in-the-loop data production strategy. The model is first trained on an initial dataset; API logs are then collected from real usage; candidate translations are filtered by rejection sampling with WiNGPT-2.6 and a reward model; and the surviving data goes through human review before the next training iteration. This approach aims to make the model highly adaptable to real-world translation scenarios like news, research, and live video subtitles.
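The rejection sampling step can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the threshold, and the `toy_reward` scorer are all hypothetical stand-ins, and a real setup would score candidates with a trained reward model rather than a length heuristic.

```python
from typing import Callable, List, Optional


def rejection_sample(
    source: str,
    candidates: List[str],
    reward_fn: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the highest-scoring candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    # Score every candidate translation against the source text.
    scored = [(reward_fn(source, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    # Reject the whole batch when even the best candidate scores too low.
    return best if best_score >= threshold else None


def toy_reward(source: str, translation: str) -> float:
    """Hypothetical placeholder reward: favors outputs whose length is close to the source."""
    if not translation.strip():
        return 0.0
    ratio = len(translation) / max(len(source), 1)
    return 1.0 / (1.0 + abs(1.0 - ratio))
```

In a loop like the one described above, samples that return `None` would simply be dropped, and accepted pairs would be queued for human review before being added to the training set.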

Key Capabilities

  • Human-in-the-loop Training: Continuously improves performance through a closed-loop data collection and refinement process.
  • Multi-format Translation: Supports diverse text formats including web pages, social media, academic papers, and video subtitles.
  • High Accuracy & Performance: Builds on the Qwen2.5-1.5B architecture to produce accurate, natural, and fluent translations, with a 1.5B parameter size small enough for real-time applications.
  • Multilingual Support: Currently supports over 20 languages, with ongoing expansion.
  • Tool Integration: Works with existing tools such as Immersive Translate and VideoLingo for an enhanced user experience.

Good For

  • Web Content Translation: Quickly understanding daily web browsing information.
  • Academic Paper Translation: Aiding comprehension of multilingual research papers.
  • News & Information Translation: Gaining rapid access to global news.
  • Video Subtitle Translation: Assisting in understanding foreign language videos.
  • Multilingual Dataset Processing: Initial translation for data analysis.

Limitations

While highly capable, WiNGPT-Babel has limitations in highly specialized domains (e.g., legal, medical, code), in literary works that depend on nuance and metaphor, and with very long texts, which may need to be segmented before translation. Its primary strength is currently English-Chinese translation; other language pairs require further validation.
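For long inputs, one way to segment text before translation is sketched below. This is an illustrative helper, not part of the model's tooling: `segment_text` and its `max_chars` default are assumptions, and the character budget is only a rough proxy for the model's token limit.

```python
import re
from typing import List


def segment_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after ASCII sentence punctuation followed by whitespace,
    # after CJK sentence punctuation (no whitespace needed), or at newlines.
    pattern = r"(?<=[.!?])\s+|(?<=[。！？])\s*|\n+"
    sentences = [s.strip() for s in re.split(pattern, text) if s and s.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the budget on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        # Sentences packed into one chunk are joined with a single space.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated independently and the results concatenated in order. Splitting on sentence boundaries keeps each request self-contained, at the cost of losing cross-chunk context.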