Overview
TristanBehrens/heilbronnpodcasts is a 7-billion-parameter language model fine-tuned from the jphme/em_german_7b_v01 base model. It uses the LlamaForCausalLM architecture and was trained with a sequence length of 4096 tokens. Fine-tuning used the TristanBehrens/HeilbronnPodcastsWindowed dataset, which points to a specialization in content related to Heilbronn podcasts.
Training Details
The model was fine-tuned with the Axolotl framework using LoRA (Low-Rank Adaptation) with a rank of 32 and an alpha of 16. Key training hyperparameters: a learning rate of 0.0002, a micro batch size of 16, and 4 epochs. The optimizer was adamw_bnb_8bit with a cosine learning rate scheduler and 10 warmup steps. With gradient accumulation steps set to 4 and training performed on a multi-GPU setup with 2 devices, the total train batch size was 128.
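The total train batch size follows directly from the other values; a quick check of the arithmetic:

```python
# Effective (total) train batch size, as stated in the training details:
# micro batch size x gradient accumulation steps x number of GPUs.
micro_batch_size = 16
gradient_accumulation_steps = 4
num_gpus = 2

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128
```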
Key Characteristics
- Base Model: jphme/em_german_7b_v01 (Llama-based)
- Parameter Count: 7 billion
- Context Length: 4096 tokens
- Fine-tuning Method: LoRA (r=32, alpha=16)
- Language Focus: German, with specialized training data related to Heilbronn podcasts
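The LoRA parameters above imply a fixed scaling of alpha / r = 16 / 32 = 0.5 applied to the low-rank update. A minimal sketch of that update with toy-sized matrices (the real Llama projection matrices are far larger):

```python
def lora_delta(B, A, r=32, alpha=16):
    """Scaled low-rank update (alpha / r) * B @ A for toy 2-D lists.

    In LoRA, the adapted weight is W' = W + (alpha / r) * B @ A,
    where B and A are the trained low-rank factor matrices.
    With r=32 and alpha=16 the scaling factor is 0.5.
    """
    scaling = alpha / r
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[scaling * sum(B[i][k] * A[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# Toy example: B is a 2x2 identity, so the delta is just 0.5 * A.
delta = lora_delta([[1, 0], [0, 1]], [[2, 4], [6, 8]])
print(delta)  # [[1.0, 2.0], [3.0, 4.0]]
```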
Potential Use Cases
This model is particularly suited to applications requiring German language understanding and generation, especially in the context of podcast content or regional information related to Heilbronn. Its fine-tuning data suggests suitability for tasks such as:
- Summarization of German podcast transcripts.
- Generating German text in a style consistent with podcast discussions.
- Answering questions based on transcripts of German podcast audio.
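For the tasks above, a hedged inference sketch may help. The em_german family of base models documents a "Du bist ein hilfreicher Assistent. USER: ... ASSISTANT:" prompt layout; verify this against the base model card before relying on it, and note that the transformers loading code is shown only in outline:

```python
def build_prompt(user_message: str,
                 system: str = "Du bist ein hilfreicher Assistent.") -> str:
    """Assemble a single-turn prompt in the assumed em_german format.

    This format is taken from the base model's documentation and is an
    assumption for this fine-tune; check the base model card.
    """
    return f"{system} USER: {user_message} ASSISTANT:"

if __name__ == "__main__":
    # Loading the fine-tuned weights requires the transformers library
    # and a download of the model; sketched here, not executed.
    # from transformers import AutoModelForCausalLM, AutoTokenizer
    # tok = AutoTokenizer.from_pretrained("TristanBehrens/heilbronnpodcasts")
    # model = AutoModelForCausalLM.from_pretrained("TristanBehrens/heilbronnpodcasts")
    # inputs = tok(build_prompt("Fasse die Folge zusammen."), return_tensors="pt")
    # print(tok.decode(model.generate(**inputs, max_new_tokens=256)[0]))
    print(build_prompt("Worum geht es in dieser Folge?"))
```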