vikash06/mistral_v1

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quantization: FP8 · Context Length: 8K · Published: Dec 23, 2023 · License: MIT · Architecture: Transformer · Open Weights

vikash06/mistral_v1 is a 7 billion parameter language model fine-tuned from Llama 2 by vikash06, trained experimentally on a small dataset to evaluate how extended training on limited data affects performance.


Model Overview

The model was developed as an experimental assessment of the performance implications of extended training on a comparatively small dataset. Fine-tuned from the Llama 2 architecture, it aims to provide a versatile foundation for a range of natural language processing tasks.

Key Capabilities

  • Creative Writing: Generates open-ended, creative responses based on specific instructions and constraints.
  • Question Answering: Handles both closed-domain QA (based on provided text) and open-domain QA (using general world knowledge).
  • Text Summarization: Condenses paragraphs from source texts.
  • Information Extraction: Identifies and extracts specific information from passages.
  • Classification: Categorizes entities based on provided lists or examples.
  • Brainstorming: Generates multiple ideas in response to prompts.
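
The capabilities above can be exercised through the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository ships ordinary causal-LM weights and tokenizer files loadable with the Auto classes; the prompt and sampling settings are illustrative, since the card does not document a required prompt template.

```python
# Minimal generation sketch; assumes the repo exposes standard
# causal-LM weights usable with transformers' Auto classes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikash06/mistral_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 inference on a single GPU
    device_map="auto",
)

# Illustrative instruction-style prompt; the card does not specify
# a chat or instruction template.
prompt = "Summarize the following paragraph:\n\nLarge language models ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```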

Performance & Training

The model was fine-tuned on a dataset of 1,000 carefully selected samples per task category. Training ran for 50 epochs with a batch size of 2 on A6000 48 GB GPUs and took 28 hours. Evaluation with the EleutherAI lm-evaluation-harness yielded an average score of 45.85 on the Open LLM Leaderboard, including 67.58 on HellaSwag (10-shot) and 48.68 on MMLU (5-shot).
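
For reference, scores like these can in principle be re-run with the harness's Python entry point. This is a sketch under the assumption that a recent lm-evaluation-harness (v0.4+, which exposes lm_eval.simple_evaluate) is installed; the exact harness version behind the card's numbers is not stated.

```python
# Sketch of reproducing the reported benchmarks with the EleutherAI
# lm-evaluation-harness (assumes v0.4+, which provides simple_evaluate).
import lm_eval

# HellaSwag was reported 10-shot and MMLU 5-shot, so run them separately.
for task, shots in [("hellaswag", 10), ("mmlu", 5)]:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=vikash06/mistral_v1,dtype=float16",
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, results["results"][task])
```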

Good For

This model suits developers and researchers exploring the trade-offs and performance characteristics of models trained extensively on small, curated datasets. Its breadth of task coverage makes it a candidate for general-purpose text generation and understanding, particularly where resource constraints or a narrow domain focus align with its experimental training approach.