Name: kmseong/llama3.1_8b_base-SSFT-start-WaRP-original-space-gsm8k-FT-lr3e-5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: kmseong

Model Overview

kmseong/llama3.1_8b_base-SSFT-start-WaRP-original-space-gsm8k-FT-lr3e-5 is an 8 billion parameter language model built on the Llama 3.1 architecture, with a 32768 token context window. It was fine-tuned on the GSM8K dataset to strengthen mathematical reasoning.

Key Technical Details

Architecture: Llama 3.1 base model.
Parameter Count: 8 billion parameters.
Context Length: 32768 tokens.
Training Methodology: Per-layer processing targets the attention components (q, k, v) and the MLP components (up, down). Fine-tuning then uses a non-freeze approach, meaning all layers were updated.
Fine-tuning Focus: Fine-tuned on GSM8K, a dataset of grade school math word problems.
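
As a rough illustration of the per-layer targeting described above, the sketch below groups parameter names by layer and keeps only the attention and MLP components named in the details. It assumes that "q, k, v" and "up, down" correspond to the Hugging Face Llama module names q_proj, k_proj, v_proj, up_proj and down_proj; the listing does not spell this out, and the helper group_targets_by_layer is hypothetical, not part of the author's training code.

```python
import re

# Assumption: "q, k, v" and "up, down" map to these Hugging Face Llama
# module names; the listing itself does not name the modules.
TARGET_SUFFIXES = ("q_proj", "k_proj", "v_proj", "up_proj", "down_proj")

# Matches names such as "model.layers.3.self_attn.q_proj.weight".
_LAYER_RE = re.compile(r"model\.layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.weight$")


def group_targets_by_layer(param_names):
    """Return {layer_index: [module names]} for the targeted weights only."""
    grouped = {}
    for name in param_names:
        match = _LAYER_RE.search(name)
        if match and match.group(2) in TARGET_SUFFIXES:
            grouped.setdefault(int(match.group(1)), []).append(match.group(2))
    return grouped


# Illustrative parameter names in the Hugging Face Llama naming scheme.
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
]

targets = group_targets_by_layer(example_names)
print(targets)
```

In this sketch o_proj, gate_proj and the embeddings are skipped because the details list only q, k, v, up and down. Since training is described as non-freeze, such a grouping would identify which weights receive the per-layer step, not which weights are trainable.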

Intended Use Cases

This model is particularly well-suited for applications requiring:

Mathematical Reasoning: Solving arithmetic and word problems, the focus of its GSM8K fine-tuning.
Numerical Problem Solving: Can be applied to tasks that involve logical deduction and calculation based on numerical inputs.
Research in Safety Alignment: The model's name suggests an underlying connection to the "Weight space Rotation Process" (WaRP) for safety alignment, as referenced in the provided citation, making it relevant for research in this area.
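
For the math use cases above, a common way to check a completion is to compare its final number with the GSM8K reference, whose solutions end with a line of the form "#### <answer>". The sketch below is a minimal scorer under that convention. The helper names and the "take the last number in the completion" heuristic are assumptions made for illustration, not something the listing prescribes.

```python
import re

# Integers or decimals, optionally negative, with optional thousands commas.
_NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_reference(answer_text):
    """Read the final answer that follows '####' in a GSM8K reference."""
    final = answer_text.split("####")[-1].strip()
    return float(final.replace(",", ""))


def parse_prediction(completion):
    """Hypothetical heuristic: treat the last number in the text as the answer."""
    numbers = _NUMBER_RE.findall(completion)
    if not numbers:
        return None
    return float(numbers[-1].replace(",", ""))


def is_correct(completion, answer_text, tol=1e-6):
    """True when the predicted number matches the reference within tol."""
    predicted = parse_prediction(completion)
    if predicted is None:
        return False
    return abs(predicted - parse_reference(answer_text)) <= tol


# Made-up example in the GSM8K answer format (not taken from the dataset).
reference = "3 boxes of 4 pens is 3 * 4 = 12 pens.\n#### 12"
completion = "Each box has 4 pens, so 3 boxes hold 3 * 4 = 12 pens."
print(is_correct(completion, reference))
```

The last-number heuristic is deliberately simple. A completion that keeps writing after its answer can be misread, so a stricter setup might prompt the model to end with an explicit marker and parse only what follows it.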
