laion/glm46-qasper-maxeps-131k

Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights · Concurrency Cost: 1

The laion/glm46-qasper-maxeps-131k model is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained on the penfever/glm46-qasper-maxeps-131k dataset, indicating optimization for question answering over scientific papers (QASPER). The model is likely intended for information extraction and comprehension within academic or technical documents, leveraging its 32,768-token context length to process longer texts.


Model Overview

laion/glm46-qasper-maxeps-131k is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base architecture. It has been specialized through training on the penfever/glm46-qasper-maxeps-131k dataset, indicating a focus on the QASPER (question answering over scientific papers) domain.

Key Training Details

The model was trained for 7 epochs with a learning rate of 4e-05, using the AdamW optimizer (with specific beta and epsilon parameters) and a cosine learning-rate scheduler with a 0.1 warmup ratio. Training was distributed across 8 GPUs with a total effective batch size of 16, i.e. a per-device batch of 1 combined with 2 gradient-accumulation steps (8 × 1 × 2 = 16).
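For readers who want to approximate this setup, here is a minimal sketch using HuggingFace `TrainingArguments`. The per-device batch size of 1 is inferred from the arithmetic above, and the AdamW beta/epsilon values shown are library defaults, not values confirmed by this card; the precision setting is likewise an assumption.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters.
# Per-device batch size of 1 is inferred: 8 GPUs x 1 x 2 accumulation steps = 16.
# AdamW betas/epsilon are transformers defaults; the card does not list them.
training_args = TrainingArguments(
    output_dir="glm46-qasper-maxeps-131k",
    num_train_epochs=7.0,
    learning_rate=4e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=1,   # inferred, see comment above
    gradient_accumulation_steps=2,
    optim="adamw_torch",
    adam_beta1=0.9,                  # library default, assumed
    adam_beta2=0.999,                # library default, assumed
    adam_epsilon=1e-8,               # library default, assumed
    bf16=True,                       # assumed mixed-precision setting
)
```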

Potential Use Cases

Given its fine-tuning on a QASPER-related dataset, this model is likely well-suited for the following tasks (see the usage sketch after this list):

  • Question Answering: Extracting answers from scientific articles or technical documents.
  • Information Retrieval: Identifying key information within complex texts.
  • Document Comprehension: Aiding in understanding the content of research papers.
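As a usage sketch: since the base model is Qwen/Qwen3-8B, the checkpoint should load through the standard transformers text-generation API. The prompt layout below is an assumption, as the card does not document a chat template, and `paper_excerpt`/`question` are hypothetical placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint inherits the standard chat format of its
# Qwen3-8B base; the card itself does not specify a prompt template.
model_id = "laion/glm46-qasper-maxeps-131k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

paper_excerpt = "..."  # paste a section of a scientific paper here
question = "What evaluation metric does the paper report?"

messages = [
    {"role": "user", "content": f"{paper_excerpt}\n\nQuestion: {question}"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The long context length (32k tokens) is what makes the paste-the-paper pattern above plausible; for full-length papers, chunking or retrieval may still be needed.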

Limitations

The model card indicates that more information is needed regarding its specific intended uses, limitations, and detailed training/evaluation data. Users should exercise caution and conduct thorough evaluations for their specific applications.