Model size: 1.1B parameters
Tensor type: BF16
Context length: 2048 tokens
License: MIT

Overview

Nape-0: An Early-Stage Small Language Model

nnpy/Nape-0 is a 1.1-billion-parameter model from the Nape series, developed by nnpy. It is presented as an early preview that is still in training, with the goal of delivering strong capabilities despite its compact size. It was trained for 3 epochs over 1 day on 4x A6000 GPUs using native DeepSpeed.
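
Below is a minimal loading-and-generation sketch, assuming the checkpoint is published on the Hugging Face Hub under nnpy/Nape-0 and loads through the standard transformers Llama support; the prompt and generation settings are illustrative, not recommended values.

```python
# Minimal sketch: load nnpy/Nape-0 with transformers and generate a short completion.
# Assumes the checkpoint follows standard Llama conventions and is available on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nnpy/Nape-0"  # repo id from this model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights noted above
    device_map="auto",
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt + completion within the 2048-token context window.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```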

Key Characteristics

  • Model Size: 1.1 billion parameters, making it suitable for resource-constrained environments or applications requiring faster inference.
  • Architecture: Based on the Llama architecture, providing a familiar and robust foundation for language tasks.
  • Context Length: Supports a context window of 2048 tokens.
  • Training Status: Currently in an early training phase, indicating potential for future improvements and specialized versions.

Performance Snapshot

Initial evaluations on the Open LLM Leaderboard show an average score of 30.93, the mean of the seven per-task scores below (a hedged reproduction sketch follows the list):

  • ARC (25-shot): 32.68
  • HellaSwag (10-shot): 58.68
  • MMLU (5-shot): 24.88
  • TruthfulQA (0-shot): 38.99
  • Winogrande (5-shot): 57.3
  • GSM8K (5-shot): 0.08
  • DROP (3-shot): 3.89
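
These numbers come from the seven-task Open LLM Leaderboard configuration. The sketch below shows one way to re-run similar evaluations locally with EleutherAI's lm-evaluation-harness; the task names, few-shot counts, and arguments are assumptions mapped from the list above and may not exactly match the leaderboard's setup. DROP, which contributed to the reported average, is omitted here.

```python
# Hedged sketch: leaderboard-style evaluation with lm-evaluation-harness (pip install lm-eval).
# Task names and few-shot counts are assumptions and may need adjusting.
import lm_eval

TASK_SETTINGS = [
    ("arc_challenge", 25),
    ("hellaswag", 10),
    ("mmlu", 5),
    ("truthfulqa_mc2", 0),
    ("winogrande", 5),
    ("gsm8k", 5),
]

for task, n_shot in TASK_SETTINGS:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=nnpy/Nape-0,dtype=bfloat16",
        tasks=[task],
        num_fewshot=n_shot,
        batch_size=8,
    )
    # Print every metric the harness returns for this task.
    print(task, results["results"])
```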

Intended Use Cases

Given its early stage and small parameter count, Nape-0 is suitable for:

  • Research and Development: Exploring the capabilities of compact LLMs.
  • Prototyping: Quickly setting up and testing language-based applications.
  • Foundation Model: Serving as a base for further fine-tuning on specific downstream tasks where a smaller model is advantageous (a hedged fine-tuning sketch follows this list).
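
As an illustration of the fine-tuning use case, here is a minimal parameter-efficient fine-tuning sketch using transformers with a PEFT/LoRA adapter. The LoRA hyperparameters, target module names, and the WikiText placeholder dataset are assumptions based on standard Llama-style setups, not settings published for Nape-0.

```python
# Hedged sketch: LoRA fine-tuning of Nape-0 with transformers + peft.
# Hyperparameters, target modules, and the dataset are illustrative placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "nnpy/Nape-0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Standard Llama attention projection names; adjust if the checkpoint differs.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder corpus: any dataset with a plain-text "text" field works.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    # Stay within the 2048-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="nape0-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```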