TheBloke/Llama-2-70B-Chat-fp16

TEXT GENERATION · Concurrency Cost: 4 · Model Size: 69B · Quant: FP8 · Ctx Length: 32k · Published: Jul 19, 2023 · License: other · Architecture: Transformer

TheBloke/Llama-2-70B-Chat-fp16 is a 69 billion parameter Llama 2 Chat model developed by Meta, converted to fp16 PyTorch format by TheBloke. Optimized for dialogue use cases, this model excels in assistant-like chat applications. It features a 4k context length and utilizes Grouped-Query Attention (GQA) for improved inference scalability, making it suitable for commercial and research use in English.


Overview

This model, TheBloke/Llama-2-70B-Chat-fp16, is a 69 billion parameter version of Meta's Llama 2 Chat model, provided in fp16 PyTorch format by TheBloke. It is fine-tuned specifically for dialogue use cases, with the goal of producing helpful and safe assistant-like responses. The conversion used an up-to-date version of the Transformers library to convert Meta's original PTH files to Hugging Face format, ensuring the weights are represented correctly.
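Since the weights are in standard Hugging Face format, they can be loaded directly with the Transformers library. The sketch below is a minimal, hedged example; it assumes `transformers` and `torch` are installed and that you have enough GPU memory (roughly 140 GB in fp16, typically spread across multiple GPUs via `device_map="auto"`).

```python
MODEL_ID = "TheBloke/Llama-2-70B-Chat-fp16"

def load_chat_model():
    """Load the fp16 model and tokenizer from the Hugging Face Hub.

    Heavy imports are done lazily because downloading and loading
    ~140 GB of weights is only practical on suitable hardware.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,   # keep the native fp16 precision
        device_map="auto",           # shard across available GPUs
    )
    return tokenizer, model
```

In practice you would call `load_chat_model()` once at startup and reuse the returned objects for all generation requests.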

Key Capabilities

  • Dialogue Optimization: Fine-tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety in chat scenarios.
  • Performance: Outperforms many open-source chat models on various benchmarks and is competitive with closed-source models like ChatGPT and PaLM in human evaluations for helpfulness and safety.
  • Scalability: The 70B parameter model incorporates Grouped-Query Attention (GQA) to enhance inference scalability.
  • Context Length: Supports a 4k context length, suitable for multi-turn conversations.
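Because the model was fine-tuned on a specific chat format, prompts should follow the Llama 2 convention of wrapping the user turn in `[INST] ... [/INST]` with an optional system prompt inside `<<SYS>>` tags. A small helper illustrating that format (the BOS token `<s>` is normally added by the tokenizer, not by hand):

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Format a single-turn prompt in the Llama 2 Chat style.

    The system prompt is wrapped in <<SYS>> tags inside the first
    [INST] block; the user's message follows, closed by [/INST].
    """
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )
```

For multi-turn chat, subsequent exchanges are appended as further `[INST] ... [/INST]` blocks followed by the model's replies; newer Transformers versions can also apply this template automatically via the tokenizer's chat-template support.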

Good For

  • Assistant-like Chat: Ideal for building conversational AI agents and chatbots.
  • Commercial and Research Applications: Intended for both commercial deployment and academic research in English-speaking contexts.
  • Further Conversions: The fp16 weights serve as a base for producing further quantizations (e.g. GPTQ or GGUF) or other model modifications.