m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA

Vision · 24B parameters · FP8 · 32768-token context · License: apache-2.0 · Available on Hugging Face
ANITA-NEXT-24B-Magistral-2506-VISION-ITA Overview

This model is a Thinking Vision Language Model developed by Marco Polignano, Ph.D., and the SWAP Research Group, and is part of the ANITA family of Large Language Models. It merges the textual layers of m-polignano/ANITA-NEXT-24B-Magistral-2506-ITA with the vision layers and processor of mistralai/Mistral-Small-3.1-24B-Instruct-2503.

Key Capabilities & Features

  • Multilingual Vision Language Model: Supports both English and Italian, with a particular focus on Italian-language tasks.
  • Architecture: Based on the Mistral architecture, with a context length of 128k tokens, though performance degrades beyond 40k tokens.
  • Training: Uses Supervised Fine-Tuning (SFT) with QLoRA (4-bit) and Direct Preference Optimization (DPO) for alignment with human preferences.
  • Input/Output: Processes text and image inputs to generate text and code outputs.
  • Resource Efficiency: Can run on a single GPU with 19.56 GB of VRAM using 4-bit quantization.
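The capabilities above can be exercised with a short inference sketch, assuming the standard `transformers` multimodal API and `bitsandbytes` for the 4-bit loading mentioned in the card (the image URL and Italian prompt are hypothetical; check the model's own chat template before relying on this exact flow):

```python
MODEL_ID = "m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA"

def build_chat(prompt: str, image_url: str) -> list:
    """Build one multimodal user turn (image + text) in the
    chat-template format consumed by transformers processors."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]

def load_model_4bit():
    """Load the model with 4-bit NF4 quantization, which fits the
    ~19.56 GB VRAM budget cited above. Imports are deferred so the
    pure helper above stays dependency-free."""
    import torch
    from transformers import (
        AutoModelForImageTextToText,
        AutoProcessor,
        BitsAndBytesConfig,
    )

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, quantization_config=quant, device_map="auto"
    )
    return processor, model

if __name__ == "__main__":
    processor, model = load_model_4bit()
    messages = build_chat("Descrivi questa immagine.", "https://example.com/foto.jpg")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(processor.decode(out[0], skip_special_tokens=True))
```

The heavy model load is kept behind the `__main__` guard so the message-building helper can be reused (e.g. for batching requests) without pulling in GPU dependencies.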

Ideal Use Cases

  • Italian NLP Research: Designed to offer a stronger model for Italian-language use cases.
  • Multimodal Applications: Suitable for tasks requiring both visual and textual understanding, particularly in an Italian context.
  • Further Fine-tuning: Serves as a strong base for specialized fine-tuning on various Italian-specific tasks.
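Since the card names SFT plus DPO as the alignment recipe, a minimal sketch of the preference-pair format that DPO training consumes may help; the field names follow the common `(prompt, chosen, rejected)` convention used by trainers such as trl's `DPOTrainer`, and the Italian strings are hypothetical examples:

```python
def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """One preference pair in the (prompt, chosen, rejected) schema
    commonly consumed by DPO trainers."""
    if chosen == rejected:
        raise ValueError("a preference pair needs two distinct answers")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Hypothetical Italian preference pair:
record = make_dpo_record(
    prompt="Qual è la capitale d'Italia?",      # "What is the capital of Italy?"
    chosen="La capitale d'Italia è Roma.",      # preferred answer
    rejected="Non lo so.",                      # dispreferred answer
)
```

A dataset of such records, paired with the SFT checkpoint as both policy and frozen reference model, is the typical input to a DPO alignment run.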