TheBloke/robin-13B-v2-fp16

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP16 · Ctx Length: 4K · Published: Jun 16, 2023 · License: other · Architecture: Transformer

TheBloke/robin-13B-v2-fp16 is a 13-billion-parameter language model based on OptimalScale's Robin 13B v2. This float16 PyTorch model is an unquantized release suitable for GPU inference and for producing further conversions. It is designed to give helpful, detailed, and polite responses, making it a good fit for general conversational AI applications.


OptimalScale's Robin 13B v2 fp16

This model is a 13-billion-parameter language model, specifically the fp16 (float16) PyTorch version of OptimalScale's Robin 13B v2. It is provided by TheBloke, who converted and/or merged the original source repository into this format. This unquantized version is primarily intended for direct GPU inference and serves as a base for further model conversions.

Key Characteristics

  • Architecture: Based on OptimalScale's Robin 13B v2.
  • Parameter Count: 13 billion parameters.
  • Format: Unquantized fp16 PyTorch format.
  • Context Length: Supports a context length of 4096 tokens.
  • Prompt Template: Utilizes a chat-based prompt format, expecting ###Human: and ###Assistant: turns.
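The ###Human:/###Assistant: prompt format above can be sketched as a small helper (a minimal illustration; the function name and newline-joined layout are assumptions, not from the model card):

```python
def build_robin_prompt(user_message: str, history=None) -> str:
    """Format a conversation into the ###Human:/###Assistant: template.

    `history` is an optional list of (human, assistant) turn pairs from
    earlier in the conversation.
    """
    parts = []
    for human, assistant in history or []:
        parts.append(f"###Human: {human}")
        parts.append(f"###Assistant: {assistant}")
    parts.append(f"###Human: {user_message}")
    # End with an open assistant turn for the model to complete.
    parts.append("###Assistant:")
    return "\n".join(parts)


print(build_robin_prompt("What is the capital of France?"))
```

The trailing `###Assistant:` with no content is what cues the model to generate its reply; generation is typically stopped when the model emits the next `###Human:` marker.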

Use Cases

  • General Conversational AI: Designed to provide helpful, detailed, and polite answers to user prompts.
  • GPU Inference: Suitable for direct deployment on GPUs due to its fp16 format.
  • Model Conversion Base: Can be used as a foundational model for creating other quantized versions (e.g., GPTQ, GGML) for different hardware or performance requirements.
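For GPU inference, a minimal sketch using the Hugging Face Transformers API might look like the following (assuming `torch` and `transformers` are installed and enough VRAM is available for ~26 GB of fp16 weights; the `generate_reply` helper and its sampling settings are illustrative, not from the model card):

```python
MODEL_ID = "TheBloke/robin-13B-v2-fp16"


def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    # Heavy imports are kept inside the function so this module can be
    # inspected without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # fp16 weights; device_map="auto" shards across available GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The same fp16 checkpoint is also the natural starting point for quantization tooling (e.g. GPTQ or GGML converters), which consume full-precision weights like these to produce smaller variants.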