trl-lib/qwen1.5-1.8b-sft

1.8B parameters · BF16 · 32,768 context length · Mar 13, 2024 · License: other

Model Overview

trl-lib/qwen1.5-1.8b-sft is a 1.8-billion-parameter language model derived from the Qwen1.5-1.8B base model. It was trained with supervised fine-tuning (SFT) on the HuggingFaceH4/deita-6k-v0-sft dataset to improve its instruction-following capabilities.

Key Training Details

  • Base Model: Qwen/Qwen1.5-1.8B
  • Fine-tuning Dataset: HuggingFaceH4/deita-6k-v0-sft
  • Training Hyperparameters:
    • Learning Rate: 2e-05
    • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
    • LR Scheduler: Cosine with 0.1 warmup ratio
    • Epochs: 3
  • Evaluation Loss: Achieved 1.0886 on the evaluation set after 3 epochs.
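
The sketch below shows roughly how a run with these hyperparameters could be reproduced using trl's SFTTrainer. It is a minimal example, not the exact training script: the split names (train_sft/test_sft), batch size, and sequence-length handling are assumptions, and the SFTConfig API shown requires a recent trl release.

```python
# Hedged sketch of the SFT setup described above. Hyperparameters mirror the
# card; dataset split names and other details are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("HuggingFaceH4/deita-6k-v0-sft")

training_args = SFTConfig(
    output_dir="qwen1.5-1.8b-sft",
    learning_rate=2e-5,              # as reported on the card
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # model weights are published in BF16
)

trainer = SFTTrainer(
    model="Qwen/Qwen1.5-1.8B",       # base model named on the card
    args=training_args,
    train_dataset=dataset["train_sft"],   # assumed split name
    eval_dataset=dataset["test_sft"],     # assumed split name
)
trainer.train()
```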

Potential Use Cases

Given its fine-tuned nature and relatively small parameter count, this model is suitable for:

  • Instruction-following tasks: generating responses that follow specific prompts and instructions.
  • Resource-constrained environments: at 1.8B parameters, the model can be deployed on hardware with limited compute and memory.
  • Further experimentation: it can serve as a starting point for additional fine-tuning on more specialized datasets.
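
For the instruction-following use case, a minimal inference sketch with transformers is shown below. The prompt is only an illustration, and the chat template is assumed to be the one inherited from the Qwen1.5 tokenizer.

```python
# Hedged inference sketch; requires transformers (and accelerate for device_map).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trl-lib/qwen1.5-1.8b-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the benefits of supervised fine-tuning in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```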