CharlesLi/llama_2_sky_safe_o1_llama_3_8B_default_1000_1000_full

TEXT GENERATION | Model size: 7B | Quantization: FP8 | Context length: 4k | Published: Jan 13, 2025 | License: llama2 | Architecture: Transformer | Open weights

CharlesLi/llama_2_sky_safe_o1_llama_3_8B_default_1000_1000_full is a fine-tuned version of Meta's Llama-2-7b-chat-hf, a 7-billion-parameter causal language model. It was fine-tuned on the generator dataset and achieved a loss of 0.6556 on the evaluation set. Building on its Llama 2 chat base, the model targets conversational and general text generation applications.


Model Overview

This model, llama_2_sky_safe_o1_llama_3_8B_default_1000_1000_full, is a fine-tuned iteration of the Meta Llama-2-7b-chat-hf base model. It has 7 billion parameters and was trained for a single epoch on the generator dataset, which points to text generation as its primary use. It achieved a loss of 0.6556 on the evaluation set.
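
The card does not include usage code, so the following is a minimal loading-and-generation sketch with Hugging Face transformers; the repo ID comes from this page, while the dtype, device placement, and sampling settings are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CharlesLi/llama_2_sky_safe_o1_llama_3_8B_default_1000_1000_full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a 7B model on one GPU
    device_map="auto",
)

prompt = "Explain what fine-tuning a language model means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```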

Training Details

The fine-tuning process used a learning rate of 2e-05, a per-device train batch size of 4, and 2 gradient accumulation steps. With training distributed across 4 GPUs, this gives an effective total train batch size of 4 × 2 × 4 = 32. The optimizer was Adam with default betas and epsilon, paired with a cosine learning rate scheduler using a warmup ratio of 0.1.
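
Expressed as transformers TrainingArguments, the reported hyperparameters would look roughly like the sketch below; the output directory is a placeholder, and the surrounding Trainer setup (model, dataset, collator) is omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama2-generator-finetune",  # placeholder path, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # Adam with default betas and epsilon is the transformers default optimizer,
    # so no explicit optimizer arguments are needed.
)
# Effective batch size: 4 (per device) x 2 (accumulation) x 4 (GPUs) = 32.
```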

Key Characteristics

  • Base Model: Meta Llama-2-7b-chat-hf (7B parameters)
  • Fine-tuning Data: the generator dataset, suggesting a focus on text generation.
  • Performance Metric: a loss of 0.6556 on the evaluation set (see the perplexity conversion below).
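
Assuming the reported evaluation loss is the standard token-level cross-entropy (in nats) used for causal language models, it can be read as a perplexity by exponentiation; a minimal sketch:

```python
import math

# Assumes the eval loss is mean token-level cross-entropy in nats,
# the standard objective for causal language models.
eval_loss = 0.6556  # value reported on this card
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.3f}")  # ≈ 1.926
```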

Intended Use Cases

The original model card does not detail specific intended uses, but fine-tuning a Llama 2 chat base on the generator dataset suggests suitability for text generation, conversational AI, and creative-writing applications; a prompt-formatting sketch follows.
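
Because the base model is Llama-2-7b-chat-hf, prompts most likely follow the Llama 2 chat format. The sketch below builds such a prompt via the tokenizer's chat template; whether this fine-tune preserved the base model's template is an assumption.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "CharlesLi/llama_2_sky_safe_o1_llama_3_8B_default_1000_1000_full"
)
messages = [
    {"role": "user", "content": "Write a short story about a lighthouse keeper."},
]
# Renders the Llama 2 [INST] ... [/INST] format if the base template is intact.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```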