XXsongLALA/Qwen-2.5-7B-base-RAG-RL

Hugging Face
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Feb 28, 2025 · Architecture: Transformer

The XXsongLALA/Qwen-2.5-7B-base-RAG-RL is a 7.6 billion parameter base model from the Qwen 2.5 family, featuring a substantial 131,072 token context length. This model was trained from scratch, though specific dataset details are not provided. It is designed as a foundational language model, suitable for further fine-tuning or applications requiring a large context window.


Overview

This 7.6 billion parameter base model builds on the Qwen 2.5 architecture and supports a context window of 131,072 tokens, suiting it to tasks that require extensive contextual understanding. It was trained from scratch; the specific training dataset is not disclosed.

Key Training Details

Although dataset details are not disclosed, the training procedure used the following hyperparameters:

  • Learning Rate: 5e-05
  • Batch Sizes: 8 (for both training and evaluation)
  • Optimizer: AdamW with default betas and epsilon
  • LR Scheduler: Linear
  • Epochs: 3.0
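To make the schedule concrete, the following sketch reproduces the reported configuration in plain Python: a linear learning-rate decay from 5e-05 over 3 epochs at batch size 8. The dataset size (`num_examples`) is a hypothetical placeholder, since the card does not report it.

```python
# Sketch of the reported training schedule (hyperparameters from the card).
# num_examples is hypothetical; the actual dataset size is not reported.

def linear_lr(step, total_steps, base_lr=5e-5):
    """Linear decay from base_lr to 0, matching the reported linear scheduler."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

num_examples = 8_000   # hypothetical placeholder
batch_size = 8         # per card: train and eval batch size
epochs = 3             # per card

steps_per_epoch = num_examples // batch_size
total_steps = steps_per_epoch * epochs

print(total_steps)                          # 3000
print(linear_lr(0, total_steps))            # 5e-05
print(linear_lr(total_steps, total_steps))  # 0.0
```

In a real run these values would be passed to `transformers.TrainingArguments` (e.g. `learning_rate=5e-5`, `per_device_train_batch_size=8`, `num_train_epochs=3`, `lr_scheduler_type="linear"`) rather than computed by hand.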

Framework Versions

The model's training environment included:

  • Transformers 4.46.3
  • PyTorch 2.5.1+cu124
  • Datasets 2.19.0
  • Tokenizers 0.20.3
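When reproducing results, it can help to confirm the local environment matches these versions. A minimal stdlib-only sketch, using `importlib.metadata` (the version strings below are taken from the card; `check` is a hypothetical helper name):

```python
# Sketch: compare installed package versions against the card's reported
# training environment. Uses only the standard library.
from importlib import metadata

EXPECTED = {
    "transformers": "4.46.3",
    "torch": "2.5.1+cu124",
    "datasets": "2.19.0",
    "tokenizers": "0.20.3",
}

def check(expected):
    """Return {package: (wanted, installed-or-None)} for every mismatch."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None  # package not installed at all
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches

if __name__ == "__main__":
    print(check(EXPECTED))  # empty dict means the environment matches
```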

Intended Use Cases

Given its base model nature and large context window, this model is well-suited for:

  • Foundation for Fine-tuning: Serving as a robust base for domain-specific or task-specific fine-tuning.
  • Long-Context Applications: Tasks that benefit from processing and understanding very long inputs, such as document analysis, summarization of extensive texts, or complex question-answering over large corpora.
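For long-context use, the practical constraint is fitting the document plus the prompt and generation budget inside the 131,072-token window. The sketch below shows that budgeting arithmetic with plain token counts; the default prompt and generation budgets are hypothetical, and a real pipeline would count tokens with the model's tokenizer (e.g. `AutoTokenizer` from `transformers`) rather than assume them.

```python
# Sketch: budget long inputs into the model's 131,072-token context window.
# prompt_tokens and gen_tokens defaults are hypothetical reservations.

CTX_LEN = 131_072  # context window reported for this model

def fits(doc_tokens, prompt_tokens=512, gen_tokens=1024, ctx=CTX_LEN):
    """True if document + prompt + generation budget fit in one window."""
    return doc_tokens + prompt_tokens + gen_tokens <= ctx

def windows_needed(n_tokens, prompt_tokens=512, gen_tokens=1024, ctx=CTX_LEN):
    """Number of windows needed to cover a document of n_tokens tokens."""
    budget = ctx - prompt_tokens - gen_tokens
    return -(-n_tokens // budget)  # ceiling division

print(fits(100_000))          # True: well under the window
print(fits(131_000))          # False: no room left for prompt/generation
print(windows_needed(300_000))  # 3
```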

Further information regarding specific intended uses, limitations, and detailed evaluation data is not provided in the current model description.