Name: guaran-ia/gntweets-lm API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: guaran-ia

GNTweetsLM: Guarani Text Quality Validation Model

GNTweetsLM is a 9-billion-parameter language model developed by guaran-ia, built on the Gemma2ForCausalLM architecture. Unlike typical generative LLMs, its primary purpose is to validate the quality of Guarani text by computing perplexity scores.

Key Capabilities

Perplexity Computation: Designed to calculate the perplexity of Guarani documents. Lower perplexity indicates text that is more predictable to the model and closer to its high-quality reference corpus.
Guarani and Jopara Expertise: Fine-tuned on a specialized corpus of tweets in Guarani and Jopara (Góngora et al., 2021).
Full Fine-tuning: Starting from princeton-nlp/gemma-2-9b-it-SimPO as the base model, all model weights were updated during training.
Long Context Support: Supports a maximum context length of 8192 tokens. For longer texts, a sliding-window method for perplexity calculation is provided.
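
The sliding-window idea can be sketched in a few lines of Python. This is a minimal illustration of the common strided approach, not the method shipped with the model: the names `sliding_windows`, `sliding_perplexity`, and `window_nll`, as well as the stride of 4096, are assumptions made for this example. Perplexity is taken as the exponential of the average negative log-likelihood per scored token.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple


def sliding_windows(
    n_tokens: int, max_len: int = 8192, stride: int = 4096
) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_scored) for each window over a token sequence.

    Each window covers tokens[begin:end]. Only its last n_scored tokens
    are scored; the earlier tokens serve as context, so every token in
    the document is scored exactly once.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must satisfy 0 < stride <= max_len")
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_len, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break


def sliding_perplexity(
    token_ids: Sequence[int],
    window_nll: Callable[[Sequence[int], int], float],
    max_len: int = 8192,
    stride: int = 4096,
) -> float:
    """Perplexity of a long token sequence via overlapping windows.

    window_nll is a hypothetical callable: given a window of token ids
    and n_scored, it is assumed to return the summed negative
    log-likelihood (in nats) of exactly the last n_scored tokens.
    """
    if len(token_ids) == 0:
        raise ValueError("token_ids must not be empty")
    total_nll = 0.0
    total_scored = 0
    for begin, end, n_scored in sliding_windows(len(token_ids), max_len, stride):
        total_nll += window_nll(token_ids[begin:end], n_scored)
        total_scored += n_scored
    return math.exp(total_nll / total_scored)
```

In practice, `window_nll` would wrap a forward pass of the model over the window and sum the token-level cross-entropy for the scored positions. A smaller stride gives each scored token more context at the cost of more forward passes. Depending on tokenization, the first token of a document may have no preceding context, in which case an implementation may score one token fewer than this sketch assumes.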

Good For

Guarani Text Quality Assessment: Ideal for researchers and developers needing to programmatically evaluate the quality or naturalness of written Guarani.
Linguistic Research: Useful for studies involving the Guarani language, particularly in understanding text predictability and corpus characteristics.
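
As a rough illustration of programmatic quality assessment, documents can be split by a perplexity cutoff. The function name `split_by_perplexity`, the `perplexity_fn` callable, and the threshold value are hypothetical; a suitable cutoff would have to be chosen empirically for a given corpus.

```python
from typing import Callable, Iterable, List, Tuple

ScoredDoc = Tuple[str, float]


def split_by_perplexity(
    docs: Iterable[str],
    perplexity_fn: Callable[[str], float],
    max_ppl: float,
) -> Tuple[List[ScoredDoc], List[ScoredDoc]]:
    """Split documents into (kept, rejected) by a perplexity cutoff.

    perplexity_fn is assumed to map a document to its perplexity under
    the model. Documents at or below max_ppl are kept. Both lists are
    sorted from lowest to highest perplexity.
    """
    kept: List[ScoredDoc] = []
    rejected: List[ScoredDoc] = []
    for doc in docs:
        ppl = perplexity_fn(doc)
        (kept if ppl <= max_ppl else rejected).append((doc, ppl))
    kept.sort(key=lambda pair: pair[1])
    rejected.sort(key=lambda pair: pair[1])
    return kept, rejected
```

Sorting by perplexity also makes it easy to inspect borderline documents by hand before committing to a threshold.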

Limitations

Not Generative: This model is explicitly not intended for text generation; its utility is confined to perplexity calculation.
Bias Reflection: May reflect biases present in its training corpus of tweets.
