TheBloke/Llama-2-70B-fp16

Text Generation · Model size: 69B parameters · Format: fp16 · Published: Jul 19, 2023 · License: llama2 · Architecture: Transformer

TheBloke/Llama-2-70B-fp16 is a 69-billion-parameter Llama 2 model developed by Meta, provided in fp16 format for GPU inference and further conversions. This pretrained generative text model has a 4k context length and is intended for a wide range of natural language generation tasks in English. It uses Grouped-Query Attention (GQA) for improved inference scalability and is licensed for both research and commercial use.


Overview

This repository hosts TheBloke's fp16 conversion of Meta's Llama 2 70B model, a large language model with 69 billion parameters. It was created by converting the original PTH files from Meta using the latest Hugging Face Transformers library, ensuring compatibility and proper weight handling. The model is provided in Safetensors format, making it ready for GPU inference and serving as a base for further conversions or fine-tuning.
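As a minimal loading sketch, the checkpoint can be pulled straight from the Hub with the Hugging Face Transformers library and sharded across available GPUs. The prompt and generation settings below are placeholders only, and `device_map="auto"` assumes the accelerate package is installed (the full fp16 weights occupy roughly 140 GB of GPU memory).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-70B-fp16"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Weights are already stored in fp16, so load them as-is and let
# accelerate shard the layers across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative prompt; this is a pretrained (non-chat) model, so it
# continues text rather than following instructions.
prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```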

Key Capabilities

  • Large Scale: A 70 billion parameter model, part of the Llama 2 family developed by Meta.
  • Optimized Architecture: Utilizes an optimized transformer architecture, with the 70B variant specifically incorporating Grouped-Query Attention (GQA) for enhanced inference scalability.
  • Pretrained Foundation: This is a pretrained model, suitable for adaptation to various natural language generation tasks.
  • Commercial Use: Licensed for both commercial and research applications.

Good for

  • GPU Inference: Directly usable for inference on GPUs due to its fp16 format.
  • Further Conversions: Serves as a reliable base for creating other model formats, such as GPTQ quantizations (see the sketch after this list).
  • Natural Language Generation: Intended for a broad spectrum of text generation tasks in English.
  • Research and Development: A robust foundation for researchers and developers exploring large language models.
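As one sketch of the "further conversions" use case, the fp16 checkpoint can be quantized to GPTQ through Transformers' GPTQConfig. This assumes the optimum and auto-gptq packages are installed; the 4-bit setting, calibration dataset, and output directory below are illustrative choices, not part of this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "TheBloke/Llama-2-70B-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization calibrated on the "c4" dataset; both values
# are illustrative and can be tuned for the target deployment.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)

# Save the quantized weights to a local directory (name is illustrative).
model.save_pretrained("llama-2-70b-gptq")
tokenizer.save_pretrained("llama-2-70b-gptq")
```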