christinakopi/thinkprm-full-trl

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 15, 2026 · Architecture: Transformer

christinakopi/thinkprm-full-trl is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Developed by christinakopi, it was trained with the Hugging Face TRL library using supervised fine-tuning for instruction following, and is intended for text generation tasks such as producing conversational responses to user prompts.


Overview

christinakopi/thinkprm-full-trl is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, a distilled Qwen-based model. The fine-tuning was performed with the Hugging Face TRL (Transformer Reinforcement Learning) library, indicating optimization for instruction following and response generation.

Key Capabilities

  • Instruction-tuned text generation: Optimized for generating responses to user prompts; a quick-start sketch is shown after this list.
  • Leverages DeepSeek-R1-Distill-Qwen-1.5B base: Benefits from the foundational capabilities of its base model, which is a Qwen-based architecture.
  • TRL-trained: Fine-tuned with the Hugging Face TRL library's supervised fine-tuning tooling, aimed at improving conversational quality and instruction adherence.
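
Since the card's quick-start snippet is not reproduced here, the following is a minimal sketch of loading the model with `transformers` and generating a chat response. The prompt text is a placeholder, and loading in BF16 matches the quantization listed in the header.

```python
# Minimal quick-start sketch (assumed usage, not the card's original snippet).
# Requires `transformers` and `torch`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "christinakopi/thinkprm-full-trl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build a chat-style prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what a process reward model is."}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```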

Training Details

The model was trained using Supervised Fine-Tuning (SFT). The training process was tracked with Weights & Biases, providing transparency into its development. It was developed with TRL 1.0.0, Transformers 5.5.0, PyTorch 2.5.1+cu118, Datasets 4.8.4, and Tokenizers 0.22.2.
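
The card does not publish the training script, so the following is only a hypothetical sketch of what an SFT run with TRL's `SFTTrainer` can look like. The dataset name, output directory, and hyperparameters are placeholders, not the values used for this model; `report_to="wandb"` mirrors the Weights & Biases tracking noted above.

```python
# Hypothetical SFT sketch with TRL's SFTTrainer; dataset and settings are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder instruction-tuning dataset; the actual training data is not published.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # base model named on the card
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="thinkprm-full-trl",  # placeholder output directory
        report_to="wandb",               # W&B tracking, as the card notes
    ),
)
trainer.train()
```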

Use Cases

This model suits text generation tasks that call for a compact yet capable instruction-following model. Its instruction tuning makes it applicable to question answering, conversational AI, and generating creative or informative text from prompts.