BEE-spoke-data/Meta-Llama-3-8Bee

Hugging Face

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · License: llama3 · Architecture: Transformer · Status: Warm

Meta-Llama-3-8Bee is an 8 billion parameter causal language model developed by BEE-spoke-data, fine-tuned from Meta-Llama-3-8B with an 8192-token context length. The model underwent continued pretraining on the `BEE-spoke-data/bees-internal` dataset and specializes in knowledge related to bees and apiary practice. It is intended for surfacing specialized knowledge in this domain and requires further tuning before use in instruct-style applications.


Meta-Llama-3-8Bee Overview

Meta-Llama-3-8Bee is an 8 billion parameter language model developed by BEE-spoke-data, produced by continued pretraining of the meta-llama/Meta-Llama-3-8B base model. Training used a specialized internal dataset, BEE-spoke-data/bees-internal, focused on information related to bees and apiary practices. The model supports a sequence length of 8192 tokens.
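
The card does not include a usage snippet, so the following is a minimal sketch using the standard transformers API; the dtype and sampling values are illustrative assumptions. Because this is a base model rather than an instruct model, prompts work best as plain text to be continued.

```python
# Minimal sketch: load the model and complete a bee-related prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Meta-Llama-3-8Bee"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the target hardware
    device_map="auto",
)

# Base model: phrase the prompt as text to be continued, not an instruction.
prompt = "The most common cause of colony collapse in managed apiaries is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```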

Key Characteristics & Training

  • Specialized Domain: The model's primary distinction is its fine-tuning on bee-related data, aiming to unveil knowledge within this specific domain.
  • Training Procedure: The model was trained for 1 epoch with a learning rate of 2e-05, the Adam optimizer, and a cosine learning rate scheduler; gradient accumulation was set to 8 steps with a micro batch size of 1 (see the configuration sketch after this list).
  • Evaluation Loss: Achieved a validation loss of 2.3319 during training.
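
For concreteness, here is a hypothetical reconstruction of the reported hyperparameters as Hugging Face TrainingArguments; the actual training stack, the output directory, and any settings not listed above (such as precision) are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="meta-llama-3-8bee",      # assumption: placeholder path
    num_train_epochs=1,                  # reported: 1 epoch
    learning_rate=2e-5,                  # reported: 2e-05
    lr_scheduler_type="cosine",          # reported: cosine LR scheduler
    optim="adamw_torch",                 # reported: Adam-family optimizer
    per_device_train_batch_size=1,       # reported: micro batch size 1
    gradient_accumulation_steps=8,       # reported: 8 accumulation steps
    bf16=True,                           # assumption: bf16 mixed precision
)
```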

Intended Uses & Limitations

  • Good for: Unveiling knowledge about bees and apiary practices.
  • Limitations: The model currently requires further tuning to be effectively used in 'instruct' type settings, indicating it is not yet optimized for direct instruction following.

Open LLM Leaderboard Evaluation

On the Open LLM Leaderboard, Meta-Llama-3-8Bee achieved an average score of 14.49. Specific metric scores include:

  • IFEval (0-Shot): 19.51
  • BBH (3-Shot): 24.20
  • MMLU-PRO (5-shot): 24.66

These results suggest the model's general-purpose capabilities are foundational, consistent with a base model whose specialization lies in its bee-focused training domain rather than in benchmark-style instruction following.
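
The Open LLM Leaderboard is computed with EleutherAI's lm-evaluation-harness, so the scores could in principle be reproduced along these lines; the task names follow the leaderboard v2 convention and, like the batch size, are assumptions that may vary across harness versions.

```python
# Hedged sketch: evaluating the model with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=BEE-spoke-data/Meta-Llama-3-8Bee,dtype=bfloat16",
    # assumption: leaderboard v2 task names in recent harness releases
    tasks=["leaderboard_ifeval", "leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size=8,
)
print(results["results"])
```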

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model each set the following sampler parameters (a sketch of applying such a config follows the list):

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
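
As a usage illustration, the sketch below applies a sampler configuration of this shape through an OpenAI-compatible client; the Featherless base URL, the specific parameter values, and the extra_body pass-through for non-standard parameters are all assumptions, not values taken from the actual configs.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumption: Featherless endpoint
    api_key="YOUR_API_KEY",
)

completion = client.completions.create(
    model="BEE-spoke-data/Meta-Llama-3-8Bee",
    prompt="A healthy brood pattern looks like",
    max_tokens=128,
    temperature=0.8,          # all sampler values here are illustrative
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # top_k, min_p, and repetition_penalty are not part of the OpenAI
    # schema; many OpenAI-compatible servers accept them via extra_body.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(completion.choices[0].text)
```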