Josephgflowers/tinyllama-730M-test

Task: Text Generation | Concurrency Cost: 1 | Model Size: 1.1B | Quant: BF16 | Ctx Length: 2k | Published: Feb 14, 2024 | License: mit | Architecture: Transformer | Open Weights | Cold

Josephgflowers/tinyllama-730M-test is a reduced, 14-layer version of TinyLlama 1.1B cinder v2, listed here at 1.1 billion parameters. It is presented as a base model that still needs significant further training, having so far undergone only initial training on the step-by-step and Reason-with-cinder datasets. Its low scores across benchmarks reflect this early developmental stage; it is not ready for production use. Its primary purpose is to serve as a foundation for continued research and training within the TinyLlama community.

Model Overview

Josephgflowers/tinyllama-730M-test is derived from the 22-layer TinyLlama 1.1B cinder v2 architecture, reduced to 14 layers. The listing gives its size as 1.1 billion parameters, which matches the 22-layer parent; the "730M" in the model name suggests the 14-layer checkpoint is smaller. The model is explicitly described as a base model at an early developmental stage, requiring substantial additional training before it can generate coherent text.
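As a rough check on how the layer reduction affects size, the parameter count of a Llama-style model can be estimated from its configuration. The sketch below assumes the commonly published TinyLlama 1.1B settings (hidden size 2048, MLP size 5632, 32 attention heads, 4 key/value heads, a 32000-token vocabulary, untied input and output embeddings). None of these values appear on this page, and the actual checkpoint may differ, for example if its vocabulary was extended.

```python
def llama_param_count(
    num_layers: int,
    hidden_size: int = 2048,        # assumed TinyLlama 1.1B value
    intermediate_size: int = 5632,  # assumed TinyLlama 1.1B value
    num_heads: int = 32,            # assumed TinyLlama 1.1B value
    num_kv_heads: int = 4,          # assumed (grouped-query attention)
    vocab_size: int = 32000,        # assumed; an extended vocabulary changes the total
    tie_embeddings: bool = False,   # assumed: separate input and output embeddings
) -> int:
    """Estimate the parameter count of a Llama-style decoder without bias terms."""
    head_dim = hidden_size // num_heads
    kv_dim = head_dim * num_kv_heads

    # Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate_proj, up_proj and down_proj.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size

    return num_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    for layers in (22, 14):
        print(f"{layers} layers: {llama_param_count(layers) / 1e6:.0f}M parameters")
```

Under these assumptions the 22-layer configuration comes to about 1.10 billion parameters and the 14-layer one to about 0.75 billion, which is in the neighborhood of the "730M" in the model name rather than the 1.1B shown in the listing.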

Training Details

Initial training consisted of 1000 steps on a step-by-step dataset and 6000 steps on a Reason-with-cinder dataset. At the time of release, the model's loss was still over 1 and the learning rate was reported as above 4, both signs that training was far from complete.

Performance Metrics

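Evaluated on the Open LLM Leaderboard, the model scores very low, reflecting its unfinished training. Its ARC and MMLU results sit near the 25% expected from random guessing on four-option questions, Winogrande is near the 50% chance level for a two-option task, and GSM8k is 0. Reported scores: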

  • Avg.: 29.55
  • AI2 Reasoning Challenge (25-shot): 25.09
  • HellaSwag (10-shot): 33.82
  • MMLU (5-shot): 24.43
  • TruthfulQA (0-shot): 42.90
  • Winogrande (5-shot): 51.07
  • GSM8k (5-shot): 0.00
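
The reported average can be reproduced as the plain mean of the six benchmark scores listed above. This is a minimal check; the dictionary keys are just labels chosen for this example.

```python
# Benchmark scores as listed on this page.
scores = {
    "ARC (25-shot)": 25.09,
    "HellaSwag (10-shot)": 33.82,
    "MMLU (5-shot)": 24.43,
    "TruthfulQA (0-shot)": 42.90,
    "Winogrande (5-shot)": 51.07,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```

The mean rounds to 29.55, matching the listed Avg. value.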

Intended Use

This model is intended as a foundation for further experimentation and training by the community. Developers interested in contributing to its development are encouraged to engage with the TinyLlama Discord.
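
For readers who want to start experimenting, one plausible way to load the checkpoint is through the Hugging Face `transformers` library. This is a sketch under assumptions, not a procedure given by the model author: it assumes the weights are hosted under the repository ID shown in the title, that `transformers` and `torch` are installed, and that BF16 is a suitable dtype (taken from the Quant field in the listing). The helper name `load_base_model` is hypothetical.

```python
MODEL_ID = "Josephgflowers/tinyllama-730M-test"  # repository ID taken from the page title


def load_base_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model for further training or evaluation (hypothetical helper)."""
    # Imported here so that defining the helper does not require the libraries to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 is assumed from the listing; switch to float32 if the hardware lacks BF16 support.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling `tokenizer, model = load_base_model()` downloads the weights on first use. Given the benchmark results above, generations from the untuned checkpoint should be expected to be largely incoherent until further training is done.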