Name: ftajwar/paprika_Meta-Llama-3.1-8B-Instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ftajwar

Model Overview

This model, ftajwar/paprika_Meta-Llama-3.1-8B-Instruct, is a fine-tuned version of the meta-llama/Meta-Llama-3.1-8B-Instruct base model. It was developed by Fahim Tajwar and his team as part of their research on "Training a Generally Curious Agent," detailed in their paper. The core innovation is the PAPRIKA finetuning framework, designed to instill strategic exploration capabilities in large language models.

Key Capabilities & Training

Strategic Exploration: The model is specifically trained to exhibit curious and exploratory behavior, making it suitable for tasks requiring agents to navigate and learn in complex environments.
Finetuning Framework: Utilizes the PAPRIKA framework, which involves both supervised fine-tuning (SFT) and preference fine-tuning using the RPO objective.
Training Data: Trained on custom datasets for SFT (SFT dataset) and preference learning (Preference learning dataset).
Hyperparameters: SFT used AdamW optimizer with a learning rate of 1e-6, batch size 32, and cosine annealing over 17,181 trajectories. Preference fine-tuning used RPO with AdamW, learning rate 2e-7, batch size 32, and cosine annealing over 5260 trajectory pairs.
Hardware: Fine-tuned using 8 NVIDIA L40S GPUs.

Use Cases

This model is particularly well-suited for research and applications focused on:

Developing AI agents with enhanced exploratory capabilities.
Tasks requiring strategic decision-making and learning in dynamic environments.
Further research into curiosity-driven learning and reinforcement learning from preferences in LLMs.

Overview

Model Overview

Key Capabilities & Training

Use Cases

Full Model Card (README)