winglian/basilisk-4b

Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Published: Sep 21, 2023 | Architecture: Transformer

Basilisk 4B is a 4-billion-parameter Llama-2 model developed by winglian and fine-tuned on OpenOrca Chain-of-Thought (CoT) data. It is designed for general language understanding and generation and has been evaluated on a range of reasoning and common-sense tasks. The CoT fine-tuning suggests it is optimized for tasks that require multi-step reasoning and logical deduction.


Basilisk 4B: A Llama-2 Model Fine-tuned for Reasoning

Basilisk 4B is a 4-billion-parameter language model built on the winglian/llama-2-4b base. It has been fine-tuned on the OpenOrca Chain-of-Thought (CoT) dataset with the aim of improving its reasoning and problem-solving capabilities.

Key Capabilities

  • Reasoning Tasks: The model has been evaluated on reasoning benchmarks, including AGIEval (e.g., LSAT, SAT math) and BigBench (e.g., logical deduction, causal judgment).
  • Common Sense Understanding: It has been evaluated on BoolQ, PIQA, and Winogrande, which test common-sense reasoning.
  • General Language Understanding: It has been evaluated on ARC Challenge and ARC Easy, which test question answering and comprehension.

Performance Highlights

Reported benchmark scores, grouped by domain:

  • AGIEval: Achieves 0.2362 acc on AQUA-RAT, 0.2688 acc on LogiQA-en, and 0.2318 acc on SAT-math.
  • BigBench: Scores 0.5000 multiple_choice_grade on Causal Judgement and Navigate, and 0.3800 on Logical Deduction Three Objects.
  • Common Sense: Attains 0.7196 acc on BoolQ and 0.6937 acc on PIQA.

Good For

This model suits applications that need a compact Llama-2-based model with Chain-of-Thought fine-tuning for reasoning tasks. Its results on academic and common-sense benchmarks make it a candidate for general-purpose language understanding and generation where a smaller parameter count is an advantage.
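
For readers who want to try the model locally, the following is a minimal sketch rather than an official recipe. It assumes the weights are published under the repository name `winglian/basilisk-4b` in a format that the Hugging Face `transformers` library can load as a standard causal language model, and that the "4k" context length listed above means 4096 tokens. The helper names `clamp_new_tokens` and `generate` are hypothetical, and no particular prompt template is assumed because the card does not specify one.

```python
MODEL_ID = "winglian/basilisk-4b"  # repository name taken from this card
CTX_LENGTH = 4096  # assumption: the card's "4k" context length means 4096 tokens


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     ctx_length: int = CTX_LENGTH) -> int:
    """Return how many tokens can be generated without exceeding the context window."""
    remaining = ctx_length - prompt_tokens
    return max(0, min(requested, remaining))


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and complete a prompt (hypothetical helper).

    The first call downloads the weights, so it needs network access and
    enough memory to hold a 4B-parameter model.
    """
    # Imported lazily so clamp_new_tokens works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Keep prompt plus completion inside the assumed context window.
    budget = clamp_new_tokens(prompt_len, max_new_tokens)
    if budget == 0:
        raise ValueError("Prompt already fills the context window.")

    output_ids = model.generate(**inputs, max_new_tokens=budget)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Explain step by step why the sky appears blue.")` would return the prompt followed by the model's continuation. The token budget is clamped before generation because a prompt near the context limit leaves little room for a multi-step answer.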