EREN121232/FINSTROM-AI-V1.5

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 11, 2026Architecture:Transformer Warm

FINSTROM-AI-V1.5 by EREN121232 is a 1.5 billion parameter causal language model based on the Qwen2 architecture, featuring a 32768 token context length. It is provided with both Transformers weights and a GGUF build, making it suitable for local inference environments like Ollama, llama.cpp, and LM Studio. This model is designed for flexible deployment on local machines, offering a higher quality F16 GGUF build for robust performance.

Loading preview...

FINSTROM-AI-V1.5 Overview

FINSTROM-AI-V1.5 is a 1.5 billion parameter causal language model developed by EREN121232, built upon the Qwen2 architecture. It is specifically designed for efficient local inference, providing both standard Transformers weights (model.safetensors) and a GGUF build (finstrom-ai-v1.f16.gguf). This dual offering ensures compatibility with a wide range of local runtime environments, including Ollama, llama.cpp, and LM Studio.

Key Features & Capabilities

  • Architecture: Qwen2-style causal language model.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: Supports a maximum context of 32768 tokens, though practical local deployment may use a reduced context for memory efficiency.
  • Local Inference Optimized: Provided with a GGUF build, facilitating easy integration and execution on personal hardware.
  • Quality: The GGUF file is in F16 format, offering higher quality inference compared to more heavily quantized builds like Q4/Q5, albeit with a larger file size.
  • Deployment Flexibility: Includes tokenizer.json, tokenizer_config.json, and chat_template.jinja for consistent tokenization and chat formatting across platforms.

Ideal Use Cases

  • Local Development: Excellent for developers needing to run a capable language model directly on their machines without cloud dependencies.
  • Experimentation: Suitable for experimenting with LLMs in environments like Ollama, llama.cpp, or LM Studio.
  • Applications Requiring Higher Fidelity: The F16 GGUF build is beneficial for applications where a balance between local performance and output quality is desired, provided sufficient local memory is available.