NovoCode/Novocode7b-v2

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quantization: FP8
  • Context Length: 4k
  • Published: Jan 23, 2024
  • License: apache-2.0 (open weights)
  • Architecture: Transformer

NovoCode/Novocode7b-v2 is a 7-billion-parameter causal language model based on the Mistral architecture, developed by NovoCode. Trained from scratch on the /leet10k-alpaca dataset, it has a 4096-token context length and targets general language understanding and generation. It performs competitively on benchmarks such as MMLU and HellaSwag, making it a solid choice for applications that need general-purpose language capabilities.


NovoCode/Novocode7b-v2 Overview

NovoCode/Novocode7b-v2 is a 7-billion-parameter causal language model built on the Mistral architecture. It was trained from scratch on the /leet10k-alpaca dataset, with a focus on general language understanding and generation. The model uses a 4096-token context length and was trained with a learning rate of 5e-06 and a batch size of 8 with gradient accumulation.
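
As a minimal loading sketch, the model can be used with the Hugging Face transformers library, assuming the weights are hosted on the Hub under the NovoCode/Novocode7b-v2 ID:

```python
# Minimal loading sketch, assuming the weights are hosted on the Hugging Face
# Hub under the "NovoCode/Novocode7b-v2" ID; adjust dtype/device for your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovoCode/Novocode7b-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `accelerate`; places layers on available devices
)

# The card reports a 4096-token context length; the config exposes the
# model's positional limit for verification.
print(model.config.max_position_embeddings)
```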

Key Capabilities & Performance

This model demonstrates solid performance across several benchmarks, as evaluated on the Open LLM Leaderboard:

  • Average Score: 56.57
  • MMLU (5-shot): 64.05
  • HellaSwag (10-shot): 84.12
  • AI2 Reasoning Challenge (25-shot): 61.01
  • Winogrande (5-shot): 79.87
  • GSM8k: 8.19

While the model shows strong results on reasoning and commonsense benchmarks, its low GSM8k score indicates that mathematical tasks remain an area for further specialization. Training ran for 1 epoch with a cosine learning rate scheduler and flash attention enabled for efficiency.
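
As a rough illustration, the reported hyperparameters map onto a transformers TrainingArguments configuration like the sketch below. The gradient accumulation steps and mixed-precision settings are assumptions, since the card does not specify them:

```python
# Sketch of the reported training setup via transformers.TrainingArguments.
# Values marked "assumed" are not stated in the card and are illustrative only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="novocode7b-v2",
    num_train_epochs=1,             # 1 epoch, as reported
    learning_rate=5e-06,            # reported learning rate
    lr_scheduler_type="cosine",     # cosine schedule, as reported
    per_device_train_batch_size=8,  # reported batch size of 8
    gradient_accumulation_steps=4,  # assumed; card only says "with gradient accumulation"
    bf16=True,                      # assumed mixed-precision setting
)

# Flash attention is enabled when loading the model, not in TrainingArguments, e.g.:
# model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="flash_attention_2")
```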

Intended Uses

NovoCode/Novocode7b-v2 is suitable for a variety of natural language processing tasks, including:

  • General text generation and completion
  • Question answering
  • Summarization
  • Reasoning tasks, where it shows good performance on benchmarks like MMLU and ARC

It provides a capable base model for developers looking for a 7B parameter solution with a Mistral-derived architecture.
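
For these task-style uses, a short generation sketch with the transformers pipeline API, again assuming the Hub ID used above:

```python
# Minimal usage sketch via the transformers pipeline API, assuming the
# "NovoCode/Novocode7b-v2" Hub ID; pipeline() handles tokenization and decoding.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="NovoCode/Novocode7b-v2",
    device_map="auto",
)

# Example: a short question-answering style prompt.
result = generator(
    "Question: What is the capital of France?\nAnswer:",
    max_new_tokens=32,
    do_sample=False,   # greedy decoding for a deterministic answer
)
print(result[0]["generated_text"])
```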