semantixai/Lloro

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Jan 18, 2024License:llama2Architecture:Transformer0.0K Open Weights Cold

Lloro is a 7 billion parameter language model developed by Semantix Research Labs, fine-tuned from CodeLlama-7b-Instruct-hf. Optimized for Portuguese data analysis, it generates Python code from text inputs. The model excels in Portuguese contexts, offering specialized capabilities for data-related tasks.

Loading preview...

Lloro 7B: Portuguese Data Analysis Code Generation

Lloro is a 7 billion parameter language model developed by Semantix Research Labs, specifically fine-tuned for Portuguese data analysis in Python. It is built upon codellama/CodeLlama-7b-Instruct-hf and was trained using the QLoRA methodology on synthetic datasets.

Key Capabilities

  • Portuguese Data Analysis: Designed to understand and process data analysis requests in Portuguese.
  • Code Generation: Generates Python code as output from natural language text inputs.
  • Multilingual Understanding: Primarily focused on Portuguese but capable of understanding English.
  • Optimized Performance: Achieves strong performance metrics, with the fine-tuned version (Instruct -FT) showing significant improvements over the base and GPT-3.5 in Code Bleu Score, Rouge-L, and CodeBert metrics.

Training and Features

Lloro was trained between February and April 2024, utilizing 74,222 synthetic instruction/code pairs. The model's context length was increased to 2048 tokens in its V3 release. A related model, Lloro SQL, is also available for Text-to-SQL tasks.

Good for

  • Developers and data scientists working on data analysis projects requiring Python code generation in Portuguese.
  • Applications needing to translate natural language Portuguese queries into executable Python scripts for data manipulation and analysis.