guaran-ia/gntweets-lm

Text Generation | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 16k | Published: Jun 10, 2026 | License: gpl-3.0 | Architecture: Transformer | Open Weights

guaran-ia/gntweets-lm is a 9-billion-parameter causal language model based on Gemma 2, developed by guaran-ia and fine-tuned on a publicly available corpus of Guarani and Jopara tweets. It is designed for computing the perplexity of Guarani text, which makes it a text-quality validation tool rather than a generative model. A lower perplexity means the text is more predictable to the model and more similar to a high-quality Guarani reference corpus. This listing reports a 16384-token context length, while the model card below gives a maximum context length of 8192 tokens.


GNTweetsLM: Guarani Text Quality Validation Model

GNTweetsLM is a 9-billion-parameter language model developed by guaran-ia, built on the Gemma2ForCausalLM architecture. Unlike typical generative LLMs, its primary purpose is to validate the quality of Guarani text by computing perplexity scores.
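For reference, perplexity is conventionally defined as the exponentiated average negative log-likelihood of a token sequence. The notation below is the standard textbook formulation, not something taken from the model card:

```latex
\mathrm{PPL}(x_1, \dots, x_N) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i}) \right)
```

Here N is the number of scored tokens and p_θ is the model's next-token probability. Lower values mean the model finds the text more predictable.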

Key Capabilities

  • Perplexity Computation: Designed to calculate the perplexity of Guarani documents, indicating text predictability and similarity to a high-quality reference corpus.
  • Guarani and Jopara Expertise: Fine-tuned on a specialized corpus of tweets in Guarani and Jopara (Góngora et al., 2021).
  • Full Fine-tuning: All model weights were updated during training, starting from princeton-nlp/gemma-2-9b-it-SimPO.
  • Long Context Support: Has a maximum context length of 8192 tokens; for longer texts, a sliding-window method for perplexity calculation is provided.
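The sliding-window idea mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method shipped with the model: `score_window` is a hypothetical callable standing in for a model forward pass, and the `stride` default of 4096 is an arbitrary choice. Only the 8192-token window size comes from the text above.

```python
import math
from typing import Callable, List, Sequence


def sliding_window_perplexity(
    token_ids: Sequence[int],
    score_window: Callable[[Sequence[int], int], List[float]],
    max_length: int = 8192,  # maximum context length stated in the model card
    stride: int = 4096,      # assumed value; smaller strides give more context per token
) -> float:
    """Perplexity of a long token sequence, scored in overlapping windows.

    `score_window(window, n_target)` is a hypothetical hook: it should return
    natural-log probabilities for the last `n_target` tokens of `window`,
    each conditioned on the tokens before it inside that window.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    if not 0 < stride <= max_length:
        raise ValueError("stride must be in (0, max_length] so no token is skipped")

    total_logprob = 0.0
    n_scored = 0
    prev_end = 0  # index just past the last token that has already been scored

    for begin in range(0, len(token_ids), stride):
        end = min(begin + max_length, len(token_ids))
        n_target = end - prev_end  # only score tokens not covered by earlier windows
        logprobs = score_window(token_ids[begin:end], n_target)
        total_logprob += sum(logprobs)
        n_scored += len(logprobs)  # a real scorer may skip a context-free first token
        prev_end = end
        if end == len(token_ids):
            break

    if n_scored == 0:
        raise ValueError("score_window returned no log-probabilities")
    # Perplexity is the exponential of the mean negative log-likelihood.
    return math.exp(-total_logprob / n_scored)


# Toy scorer for illustration only: every token gets probability 1/50.
def uniform_scorer(window: Sequence[int], n_target: int) -> List[float]:
    return [-math.log(50.0)] * n_target


ppl = sliding_window_perplexity(list(range(20)), uniform_scorer, max_length=8, stride=4)
```

In practice, `score_window` would wrap a call to the model that returns per-token log-probabilities. Each token is scored exactly once, and tokens after the first window are scored with at least `max_length - stride` tokens of left context.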

Good For

  • Guarani Text Quality Assessment: Ideal for researchers and developers needing to programmatically evaluate the quality or naturalness of written Guarani.
  • Linguistic Research: Useful for studies involving the Guarani language, particularly in understanding text predictability and corpus characteristics.

Limitations

  • Not Generative: This model is explicitly not intended for text generation; its utility is confined to perplexity calculation.
  • Bias Reflection: May reflect biases present in its training corpus of tweets.