occiglot/occiglot-7b-eu5

Text generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4k · Published: Feb 16, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Occiglot-7B-EU5 is a 7 billion parameter generative language model developed by the Occiglot Research Collective. Based on Mistral-7B-v0.1, it was continually pre-trained on 293 billion tokens of additional multilingual and code data, with a block size of 8,192 tokens. This model is specifically designed to support the top-5 EU languages: English, Spanish, French, German, and Italian, alongside code, making it a general-purpose base model for Western European multilingual applications.


Occiglot-7B-EU5: A Multilingual Base Model for Western Europe

Occiglot-7B-EU5 is built on the Mistral-7B-v0.1 architecture and has undergone continued pre-training on an additional 293 billion tokens of multilingual and code data. The model is specifically optimized for the top-5 European Union languages: English, Spanish, French, German, and Italian, in addition to supporting code-related tasks.

Key Capabilities

  • Multilingual Proficiency: Designed to handle English, Spanish, French, German, and Italian, making it suitable for applications requiring understanding and generation across these languages.
  • Code Support: Includes training on code data, enhancing its utility for developers and technical applications.
  • Base Model: Serves as a general-purpose base model, providing a strong foundation for further fine-tuning or specific applications. An instruction-tuned variant, occiglot-7b-eu5-instruct, is also available for chat and other instruction-following tasks.
  • Continued Pre-training: Benefits from significant additional training data (293B tokens) beyond its Mistral-7B-v0.1 base, focusing on Western European languages and code.

When to Use This Model

  • Multilingual Applications: Ideal for use cases requiring robust language understanding and generation in English, Spanish, French, German, and Italian.
  • Foundation for Fine-tuning: As a base model, it's an excellent starting point for developers looking to fine-tune a model for specific tasks or domains within its supported languages.
  • Research and Development: Suitable for researchers exploring multilingual LLMs, especially those focused on Western European languages.
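Getting Started

As an open-weights base model, Occiglot-7B-EU5 can be loaded with the Hugging Face transformers library. The sketch below is illustrative, not an official quickstart: it assumes transformers (and accelerate, for `device_map="auto"`) are installed, and that the weights are downloaded from the Hub on first use. The prompt and generation settings are examples only.

```python
MODEL_ID = "occiglot/occiglot-7b-eu5"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue a plain-text prompt with the base model."""
    # Lazy import so the function is cheap to define without the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # This is a base model, not the instruct variant: use plain text
    # continuation rather than a chat template.
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (downloads ~14 GB of weights on first run; works in any of
# the five supported languages):
# print(generate("La capitale de la France est"))
```

For chat or instruction-following use, load occiglot-7b-eu5-instruct instead and apply the tokenizer's chat template before generating.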