daisd-ai/anydef-orpo

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Apr 19, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

The daisd-ai/anydef-orpo is a 7 billion parameter language model fine-tuned from mistralai/Mistral-7B-v0.1. It is optimized with the ORPO method on the daisd-ai/anydef-kilt-tasks dataset, making it well suited to definition extraction and knowledge-intensive language processing, and it supports a 4096-token context window.


Model Overview

The daisd-ai/anydef-orpo is a 7 billion parameter language model, fine-tuned from the mistralai/Mistral-7B-v0.1 base architecture. This model has been specifically optimized using the ORPO (Odds Ratio Preference Optimization) training method.
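ORPO combines a standard supervised fine-tuning loss with an odds-ratio penalty that favors preferred completions over rejected ones in a single training stage, with no separate reference model. A minimal sketch of the per-pair objective, assuming length-normalized sequence probabilities as inputs (these would come from the model in a real implementation):

```python
import math

def odds(p: float) -> float:
    """Odds of a sequence probability p in (0, 1)."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """ORPO objective for one preference pair (illustrative sketch).

    nll_chosen : negative log-likelihood of the preferred completion (SFT term)
    p_chosen   : length-normalized probability of the preferred completion
    p_rejected : length-normalized probability of the rejected completion
    lam        : weight of the odds-ratio term (hyperparameter; value assumed)
    """
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log sigmoid(log odds ratio): small when the chosen completion is
    # much more likely than the rejected one, large otherwise.
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll_chosen + lam * l_or
```

When the model already prefers the chosen completion (p_chosen > p_rejected), the odds-ratio term shrinks and the loss approaches the plain SFT term.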

Key Capabilities

  • Definition Extraction: The model is fine-tuned on the daisd-ai/anydef-kilt-tasks dataset, indicating a specialization in tasks involving the extraction and understanding of definitions.
  • Knowledge-Intensive Language Processing: Its training on a KILT-based dataset suggests proficiency in tasks that require accessing and processing factual knowledge.
  • Mistral-7B Foundation: Benefits from the strong base capabilities of the Mistral-7B model, including efficient inference and good general language understanding.
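A minimal inference sketch with the Hugging Face transformers library. The prompt template here is an assumption for illustration only; the canonical format used during fine-tuning should be taken from the daisd-ai repository:

```python
def build_prompt(entity: str, context: str) -> str:
    # Hypothetical prompt format -- NOT the confirmed fine-tuning template.
    return (f"Provide a short definition of '{entity}' given the context:\n"
            f"{context}\nDefinition:")

def generate_definition(entity: str, context: str,
                        model_name: str = "daisd-ai/anydef-orpo") -> str:
    # Requires `transformers` and enough memory for a 7B model.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(build_prompt(entity, context), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:],
                      skip_special_tokens=True)
```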

Training Details

The model was trained with a learning rate of 5e-06 over 3 epochs, utilizing an Adam optimizer and an inverse square root learning rate scheduler with 100 warmup steps. The training involved a total batch size of 64 across 8 GPUs.
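The learning rate schedule described above can be sketched as a pure function. The exact warmup shape is an assumption (linear warmup is common, but implementations differ):

```python
def inverse_sqrt_lr(step: int, base_lr: float = 5e-06,
                    warmup_steps: int = 100) -> float:
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step).

    The decay is scaled so the schedule is continuous at step == warmup_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (warmup_steps / step) ** 0.5
```

At step 100 the rate peaks at 5e-06; by step 400 it has halved to 2.5e-06.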

Good For

  • Applications requiring precise definition extraction from text.
  • Research and development in knowledge-intensive NLP tasks.
  • Use cases where a specialized, fine-tuned 7B model offers advantages over general-purpose LLMs for specific knowledge-based queries.

Further details on intended uses and limitations are available on the daisd-ai GitHub repository.