choiqs/Qwen3-1.7B-tldr-bsz128-ts300-regular-skywork8b-seed42-lr1e-6-warmup10-checkpoint150

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 8, 2026 · Architecture: Transformer

choiqs/Qwen3-1.7B-tldr-bsz128-ts300-regular-skywork8b-seed42-lr1e-6-warmup10-checkpoint150 is a 1.7-billion-parameter language model based on the Qwen3 architecture, released by choiqs. Per its run name, it was fine-tuned with a batch size of 128 and a sequence length of 300 on a regularized Skywork8B dataset. It targets general language understanding and generation, with a 32,768-token context window for processing longer inputs.


Model Overview

This model, choiqs/Qwen3-1.7B-tldr-bsz128-ts300-regular-skywork8b-seed42-lr1e-6-warmup10-checkpoint150, is a 1.7-billion-parameter language model built on the Qwen3 architecture. As the run name records, choiqs fine-tuned it with a batch size of 128, a sequence length of 300, a learning rate of 1e-6 with 10 warmup steps, and a fixed seed of 42, drawing on a regularized Skywork8B dataset; this card describes checkpoint 150 of that run. Its 32,768-token context length enables it to handle and process extensive textual inputs.
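The run name itself encodes the training configuration described above. A minimal sketch of extracting those tags programmatically (the parsing helper is illustrative, not part of the release):

```python
import re

MODEL_NAME = ("choiqs/Qwen3-1.7B-tldr-bsz128-ts300-regular-"
              "skywork8b-seed42-lr1e-6-warmup10-checkpoint150")

def parse_run_tags(name: str) -> dict:
    """Pull the numeric training tags (batch size, steps, seed, learning
    rate, warmup, checkpoint) out of a hyphen-delimited run name."""
    patterns = {
        "batch_size": r"bsz(\d+)",
        "train_steps": r"ts(\d+)",
        "seed": r"seed(\d+)",
        "learning_rate": r"lr(\d+(?:\.\d+)?e-?\d+)",
        "warmup": r"warmup(\d+)",
        "checkpoint": r"checkpoint(\d+)",
    }
    tags = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, name)
        if m:
            tags[key] = m.group(1)
    return tags

print(parse_run_tags(MODEL_NAME))
```

Running this on the model name yields the values quoted in this card: batch size 128, 300 training steps, seed 42, learning rate 1e-6, 10 warmup steps, checkpoint 150.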

Key Characteristics

  • Architecture: Qwen3-based, indicating a robust foundation for language tasks.
  • Parameter Count: 1.7 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Features a 32768-token context window, suitable for applications requiring understanding of long documents or conversations.
  • Training Details: Fine-tuned with a batch size of 128 and a sequence length of 300; the run name also records a 1e-6 learning rate, 10 warmup steps, seed 42, and that this is checkpoint 150 of the run.
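If the checkpoint is hosted on the Hugging Face Hub under this repo id, it can presumably be loaded with the transformers library. A minimal sketch (loading in BF16 to match the quantization listed above; the helper name is ours):

```python
REPO_ID = ("choiqs/Qwen3-1.7B-tldr-bsz128-ts300-regular-"
           "skywork8b-seed42-lr1e-6-warmup10-checkpoint150")

def load_model(repo_id: str = REPO_ID):
    """Load the tokenizer and model; returns (tokenizer, model)."""
    # Imported lazily so this sketch is importable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,  # BF16, matching the listed quantization
        device_map="auto",           # place layers on available devices
    )
    return tokenizer, model
```

At 1.7B parameters in BF16, the weights take roughly 3.4 GB, so the model fits comfortably on a single consumer GPU.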

Potential Use Cases

Given its architecture and context length, this model is well-suited for:

  • General Text Generation: Creating coherent and contextually relevant text.
  • Long-form Content Understanding: Processing and summarizing lengthy articles, reports, or dialogues.
  • Conversational AI: Maintaining context over extended conversations due to its large context window.
  • Research and Development: Serving as a base model for further fine-tuning on specialized datasets.
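For the long-document and conversational use cases above, the prompt plus the planned generation must fit inside the 32,768-token window. A minimal sketch of that budgeting (the truncation helper is hypothetical, keeping the most recent tokens as a simple policy):

```python
CTX_LEN = 32768  # model's context window, per this card

def truncate_prompt(tokens: list, max_new_tokens: int,
                    ctx_len: int = CTX_LEN) -> list:
    """Drop the oldest tokens so prompt + generation fit the window."""
    budget = ctx_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return tokens[-budget:] if len(tokens) > budget else tokens
```

For example, a 40,000-token conversation history with a 1,000-token generation budget is cut to its most recent 31,768 tokens, while shorter inputs pass through unchanged.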