Occiglot-7B-DE-EN: A Bilingual Foundation Model
Occiglot-7B-DE-EN is a 7 billion parameter generative language model developed by the Occiglot Research Collective. It is built on the Mistral-7B-v0.1 architecture and has undergone continued pre-training on an additional 114 billion tokens of multilingual and code data, with a block size of 8,192 tokens per sample. It is a general-purpose base model focused on foundational language understanding in German and English.
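The 8,192-token block size above means that training samples are fixed-length windows cut from a continuous token stream. A minimal sketch of this kind of packing (a hypothetical helper illustrating the common strategy, not Occiglot's actual data pipeline):

```python
def pack_into_blocks(token_ids, block_size=8192):
    """Split a flat token stream into fixed-length training blocks.

    The trailing remainder that does not fill a whole block is dropped,
    a common (though not the only) packing strategy.
    """
    n_blocks = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Illustrative run on a dummy token stream of 20,000 ids:
blocks = pack_into_blocks(list(range(20000)), block_size=8192)
# yields two full 8,192-token blocks; the 3,616-token remainder is dropped
```

Packing into fixed blocks keeps every batch at the model's full context length, which makes continued pre-training throughput predictable.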
Key Capabilities
- Bilingual Proficiency: Optimized for strong performance in both German and English, with a training data distribution of approximately 52% German, 34% English, and 13% code.
- Robust Foundation: Serves as a powerful base model for various downstream tasks, though it is not instruction-tuned for chat or specific applications. An instruction-tuned variant, occiglot-7b-de-en-instruct, is available for conversational use cases.
- Code Understanding: Trained on a significant share of code data, improving its performance on code-related tasks.
Good For
- Research and Development: Ideal for researchers and developers looking to build custom applications or fine-tune models specifically for German and English language tasks.
- Multilingual Applications: Suitable as a backbone for applications requiring strong performance in both German and English, such as translation, content generation, or analysis.
- Continued Pre-training: Provides a solid starting point for further domain-specific pre-training or fine-tuning on specialized datasets.
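As a starting point for evaluation or fine-tuning, the base model can be loaded with Hugging Face Transformers. A minimal sketch, assuming the repo id `occiglot/occiglot-7b-de-en` and that `transformers` and `torch` are installed; since this is a base model (not the instruct variant), it expects plain-text prompts rather than a chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "occiglot/occiglot-7b-de-en"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # place weights on available GPU(s) or CPU
)

# Base model: plain-text continuation, no instruction/chat formatting.
prompt = "Die Hauptstadt von Deutschland ist"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For conversational use, load the instruct variant (`occiglot-7b-de-en-instruct`) instead and apply its chat template.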