m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA
Overview
This model is a Thinking Vision Language Model developed by Marco Polignano, Ph.D., and the SWAP Research Group, as part of the ANITA family of Large Language Models. It merges the textual layers of m-polignano/ANITA-NEXT-24B-Magistral-2506-ITA with the vision layers and processor of mistralai/Mistral-Small-3.1-24B-Instruct-2503.
Key Capabilities & Features
- Multilingual Vision Language Model: Supports both English and Italian, with a particular focus on Italian-language tasks.
- Architecture: Based on the Mistral architecture, with a 128k-token context window, though performance degrades beyond 40k tokens.
- Training: Supervised Fine-Tuning (SFT) with 4-bit QLoRA, followed by Direct Preference Optimization (DPO) for alignment with human preferences.
- Input/Output: Processes text and image inputs to generate text and code outputs.
- Resource Efficiency: Runs on a single GPU with 19.56 GB of VRAM using 4-bit quantization.
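A minimal inference sketch for the 4-bit, single-GPU setup described above, assuming the `transformers` and `bitsandbytes` libraries; the exact chat-message layout mirrors Mistral-Small-3.1's multimodal template and is an assumption for this merge, and the image URL is a placeholder:

```python
def build_messages(question: str, image_url: str) -> list:
    """Mistral-style chat messages mixing an image and an Italian question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference() -> None:
    # Requires a GPU with ~20 GB of VRAM and the model weights downloaded.
    import torch
    from transformers import (AutoModelForImageTextToText, AutoProcessor,
                              BitsAndBytesConfig)

    model_id = "m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA"
    quant = BitsAndBytesConfig(  # 4-bit NF4, matching the card's VRAM figure
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )
    inputs = processor.apply_chat_template(
        build_messages("Descrivi questa immagine.", "https://example.com/foto.jpg"),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    print(processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0])


# run_inference()  # uncomment on a machine with a suitable GPU
```

`device_map="auto"` lets Accelerate place the quantized layers on the available GPU; dropping the quantization config would instead require enough VRAM for the full-precision 24B weights.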
Ideal Use Cases
- Italian NLP Research: Purpose-built to provide a stronger model for Italian-language use cases.
- Multimodal Applications: Suitable for tasks requiring both visual and textual understanding, particularly in an Italian context.
- Further Fine-tuning: Serves as a strong base for specialized fine-tuning on various Italian-specific tasks.
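Since the card positions the model as a base for further fine-tuning, the QLoRA-style SFT recipe it mentions could be sketched as follows, assuming the `peft` and `trl` libraries; the adapter hyperparameters and output directory are illustrative, not the authors' actual training configuration:

```python
def to_chat_example(instruction: str, answer: str) -> dict:
    """Format one Italian instruction/answer pair into the chat layout TRL expects."""
    return {
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": answer},
        ]
    }


def train(train_dataset) -> None:
    # Requires the model weights and a large GPU; hyperparameters are hypothetical.
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    lora = LoraConfig(  # illustrative low-rank adapter settings
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    args = SFTConfig(
        output_dir="anita-ita-sft",          # placeholder output path
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    )
    trainer = SFTTrainer(
        model="m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA",
        args=args,
        train_dataset=train_dataset,         # your Italian instruction dataset
        peft_config=lora,
    )
    trainer.train()
```

A DPO stage, as used for the base ANITA model's preference alignment, would follow the same pattern with TRL's `DPOTrainer` and a preference-pair dataset.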