withmartian/trained_mediqa_model
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Context Length: 32k · License: llama3.2 · Architecture: Transformer
The withmartian/trained_mediqa_model is a 1 billion parameter instruction-tuned causal language model, fine-tuned from meta-llama/Llama-3.2-1B-Instruct. This model is optimized for general language understanding and generation tasks, leveraging a 32768 token context length. It is designed for applications requiring a compact yet capable language model.
Model Overview
The withmartian/trained_mediqa_model is a 1 billion parameter language model, fine-tuned from the meta-llama/Llama-3.2-1B-Instruct architecture. It features a substantial context window of 32768 tokens, making it suitable for processing longer inputs and generating coherent, extended responses.
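Assuming the model is published on the Hugging Face Hub under the repository name above, it can be used with the standard transformers API. The sketch below is illustrative: the question text, sampling settings, and the lazy imports (so the helper can be defined without the heavy dependencies installed) are choices made here, not part of the model card.

```python
MODEL_ID = "withmartian/trained_mediqa_model"


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the chat format the Llama 3.2 base expects."""
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Load the model in BF16 and generate a reply.

    Downloads the weights on first use; imports are kept local so the
    module can be imported without transformers/torch installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    input_ids = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)


# Example (requires downloading the weights):
# print(generate_answer("Summarize the key points of this clinical note."))
```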
Key Characteristics
- Base Model: Fine-tuned from meta-llama/Llama-3.2-1B-Instruct.
- Parameter Count: 1 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports a 32768 token context, enabling handling of extensive textual information.
- Training: The model was trained with a learning rate of 3e-05, the adamw_torch optimizer, and a linear learning rate scheduler over 10 epochs. Mixed-precision training (Native AMP) was utilized.
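The reported training settings map onto a transformers TrainingArguments configuration roughly as sketched below. Only the learning rate, optimizer, scheduler, epoch count, and mixed precision come from the card; the output directory (and any values not listed) are placeholders.

```python
from transformers import TrainingArguments

# Sketch of the reported recipe; output_dir is an illustrative
# placeholder, not a value stated in the model card.
training_args = TrainingArguments(
    output_dir="trained_mediqa_model",  # placeholder
    learning_rate=3e-5,                 # reported learning rate
    optim="adamw_torch",                # reported optimizer
    lr_scheduler_type="linear",         # reported scheduler
    num_train_epochs=10,                # reported epoch count
    bf16=True,                          # mixed-precision (Native AMP) in BF16
)
```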
Potential Use Cases
- General Text Generation: Suitable for various text generation tasks where a compact model with a large context is beneficial.
- Instruction Following: Inherits instruction-following capabilities from its base model, making it adaptable to diverse prompts.
- Research and Development: Can serve as a foundation for further fine-tuning on specific domain datasets due to its manageable size and robust base.