jylee420/gemma-2b-data-std

Text Generation · Model Size: 2.5B · Quant: BF16 · Context Length: 8K · Published: Mar 11, 2024 · License: other · Architecture: Transformer

The jylee420/gemma-2b-data-std model is a 2.5-billion-parameter base version of the Gemma architecture, developed by [email protected]. It has undergone additional pre-training on 6 million tokens of Korean and English text and is designed for causal language modeling with a context length of 8192 tokens.


Overview

This model is an additionally pre-trained version of the Gemma 2B base model, developed by [email protected]. It was further trained on 6 million tokens of Korean and English data. The model targets causal language modeling and supports a context length of 8192 tokens.

Key Characteristics

  • Model Type: Gemma 2B base architecture.
  • Parameter Count: 2.5 billion parameters.
  • Context Length: 8192 tokens (verifiable from the published config, as shown below).
  • Training: Additional pre-training with 6 million tokens.
  • Language Support: Additionally pre-trained on Korean and English data.
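
A quick way to confirm these characteristics is to read the repository's published configuration. The sketch below is a minimal example that assumes the standard Gemma configuration fields exposed by the transformers library; if the repository ships a customized config, the attribute names or values may differ.

```python
# Hedged sketch: inspect the published config to confirm the values listed above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("jylee420/gemma-2b-data-std")

print(config.model_type)               # expected: "gemma"
print(config.max_position_embeddings)  # expected: 8192 (the 8K context length)
print(config.vocab_size)               # vocabulary size used when loading the model
```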

Usage

The model can be loaded with AutoModelForCausalLM.from_pretrained, specifying vocab_size, torch_dtype, and device_map as needed. Specific direct and downstream use cases are not detailed in the provided information, but the base Gemma architecture and additional multilingual pre-training suggest applicability to language generation and understanding tasks, particularly in Korean and English contexts.
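
Below is a minimal loading and generation sketch using the standard transformers AutoModelForCausalLM/AutoTokenizer API. It assumes a typical setup with BF16 weights and an available accelerator; the vocab_size override mentioned above is only needed if the repository's tokenizer and config disagree, so it is omitted here, and the prompt is purely illustrative.

```python
# Minimal sketch, assuming a standard transformers setup; exact kwargs
# (e.g. an explicit vocab_size) may need adjusting to match the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jylee420/gemma-2b-data-std"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",           # place layers on available devices automatically
)

prompt = "대한민국의 수도는"  # Korean prompt: "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base (not instruction-tuned) model, plain text continuation prompts like the one above are the most appropriate way to exercise it.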