Saurabh16100/MedLLM-1-1-New

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Architecture: Transformer

Saurabh16100/MedLLM-1-1-New is a language model fine-tuned from h2oai/h2ogpt-4096-llama2-7b-chat using H2O LLM Studio. This model utilizes a LlamaForCausalLM architecture with 4096 embedding dimensions and 32 LlamaDecoderLayers, making it suitable for general text generation tasks. It is designed for deployment on GPUs and supports quantization for efficient inference.


Overview

Saurabh16100/MedLLM-1-1-New is a language model developed by Saurabh16100, fine-tuned from the h2oai/h2ogpt-4096-llama2-7b-chat base model using H2O LLM Studio, a platform for training large language models. It is built on a LlamaForCausalLM architecture with 32 LlamaDecoderLayers and an embedding size of 4096.

Key Capabilities

  • Text Generation: Capable of generating coherent and contextually relevant text based on given prompts.
  • GPU Deployment: Optimized for deployment on GPUs, with support for torch_dtype="auto" and device_map configurations.
  • Quantization Support: Allows for loading in 8-bit or 4-bit quantization (load_in_8bit=True or load_in_4bit=True) to reduce memory footprint and potentially speed up inference.
  • Sharding: Supports sharding across multiple GPUs by setting device_map="auto".
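The loading options above can be sketched as follows. This is a minimal, hypothetical example assuming the `transformers` library (and `bitsandbytes` for 8-bit loading) is installed and a CUDA GPU is available; the helper name `load_medllm` is illustrative, not part of the model card.

```python
def load_medllm(in_8bit: bool = True):
    """Load MedLLM-1-1-New with automatic dtype and device placement.

    Assumes `transformers` is installed; 8-bit loading additionally
    requires `bitsandbytes` and a CUDA-capable GPU.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Saurabh16100/MedLLM-1-1-New"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",    # infer dtype from the checkpoint
        device_map="auto",     # shard across available GPUs
        load_in_8bit=in_8bit,  # 8-bit quantization to cut memory use
    )
    return tokenizer, model
```

Swapping `load_in_8bit=in_8bit` for `load_in_4bit=True` trades a little quality for an even smaller memory footprint.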

Usage Considerations

  • Prompt Formatting: Users must ensure prompts adhere to the specific format the model was trained on (e.g., <|prompt|>Your question here?</s><|answer|>).
  • Ethical Use: As with all large language models, users are advised to be aware of potential biases and limitations, and to use the model responsibly and ethically. The model may produce incorrect or inappropriate content, and users assume full responsibility for its output.
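A small helper can make the prompt template above hard to get wrong. This is a sketch; the function name `format_prompt` is illustrative, and the template string is taken from the format shown above.

```python
def format_prompt(question: str) -> str:
    """Wrap a user question in the model's expected prompt template."""
    return f"<|prompt|>{question}</s><|answer|>"

# Example:
print(format_prompt("What are common symptoms of anemia?"))
# → <|prompt|>What are common symptoms of anemia?</s><|answer|>
```

The model then generates its reply as the continuation after the `<|answer|>` token.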