occiglot/occiglot-7b-de-en

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Feb 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Occiglot-7B-DE-EN is a 7-billion-parameter generative language model developed by the Occiglot Research Collective and based on Mistral-7B-v0.1. It targets German and English, having been continually pre-trained on an additional 114 billion tokens of multilingual and code data with a 4096-token context length. It serves as a general-purpose base model with strong foundational language understanding in its target languages; it is not instruction-tuned for chat or other specific applications.


Occiglot-7B-DE-EN: A Bilingual Foundation Model

Occiglot-7B-DE-EN is a 7 billion parameter generative language model developed by the Occiglot Research Collective. It is built upon the Mistral-7B-v0.1 architecture and has undergone extensive continued pre-training on an additional 114 billion tokens of multilingual and code data, with a block size of 8,192 tokens per sample. This model is a general-purpose base model focused on foundational language understanding in German and English.
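
As a base model, it can be loaded through the standard Hugging Face transformers causal-LM interface. A minimal sketch follows; the bfloat16/device settings and the German prompt are illustrative choices, not recommendations from the Occiglot release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "occiglot/occiglot-7b-de-en"

# Load tokenizer and weights; device_map="auto" spreads layers across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base models continue text rather than answer instructions, so prompt with a
# prefix to be completed (German here: "The capital of Germany is").
prompt = "Die Hauptstadt von Deutschland ist"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is not instruction-tuned, prompts work best as text to be continued rather than as questions or commands.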

Key Capabilities

  • Bilingual Proficiency: Optimized for strong performance in both German and English, with a training data mix of approximately 52% German, 34% English, and 13% code.
  • Robust Foundation: Serves as a powerful base model for a variety of downstream tasks, though it is not instruction-tuned for chat or specific applications. An instruction-tuned variant, occiglot-7b-de-en-instruct, is available for conversational use cases (see the chat-template sketch after this list).
  • Code Understanding: Includes a significant portion of code data in its training, enhancing its capabilities for code-related tasks.
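
For conversational use, the instruct variant mentioned above is the right entry point. The sketch below assumes occiglot-7b-de-en-instruct ships a chat template in its tokenizer config, which is common for instruct releases but should be verified against that variant's model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Instruct variant; assumes its tokenizer provides a chat template.
model_id = "occiglot/occiglot-7b-de-en-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# German user turn: "Summarize the difference between ebb and flood in two sentences."
messages = [
    {"role": "user", "content": "Fasse den Unterschied zwischen Ebbe und Flut in zwei Sätzen zusammen."},
]

# apply_chat_template renders the conversation in the format the model was tuned on.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=120)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```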

Good For

  • Research and Development: Ideal for researchers and developers looking to build custom applications or fine-tune models specifically for German and English language tasks.
  • Multilingual Applications: Suitable as a backbone for applications requiring strong performance in both German and English, such as translation, content generation, or analysis.
  • Continued Pre-training: Provides a solid starting point for further domain-specific pre-training or fine-tuning on specialized datasets, as sketched below.
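
A minimal continued pre-training sketch using the transformers Trainer; the corpus file, block size, and hyperparameters below are placeholders for illustration, not values from the Occiglot training run:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "occiglot/occiglot-7b-de-en"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Mistral-style tokenizers have no pad token; reuse EOS so the collator can pad.
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder corpus: swap in your own domain-specific German/English text file(s).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to an illustrative block size; real runs would pack sequences.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Causal-LM collator: with mlm=False, labels are the input ids (shifted in the model).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="occiglot-7b-de-en-domain",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    bf16=True,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```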