M134pra/neon-syndicate-qwen25-sft

Text generation · Concurrency cost: 1 · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

M134pra/neon-syndicate-qwen25-sft is a 0.5-billion-parameter Qwen2.5-Instruct model, supervised fine-tuned to generate JSON actions within the Neon Syndicate OpenEnv environment. This model specializes in producing heuristic-policy trajectories for multi-agent, long-horizon tasks. It is designed to demonstrate an end-to-end training pipeline for environment interaction, focusing on next-token prediction over action JSON suffixes.


Model Overview

M134pra/neon-syndicate-qwen25-sft is a supervised fine-tuned (SFT) version of the Qwen/Qwen2.5-0.5B-Instruct base model, trained to generate JSON-formatted actions from heuristic-policy trajectories collected in the Neon Syndicate OpenEnv environment. With 0.5 billion parameters and a 32,768-token context length, it targets tasks involving multi-agent interaction and long-horizon planning.
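
The checkpoint should load with the standard Hugging Face transformers API. Below is a minimal inference sketch; the observation text in the prompt is illustrative only, since the exact prompt layout comes from the environment's rollout data and may differ.

```python
# Minimal inference sketch (standard transformers API; the prompt content
# is hypothetical -- the real format comes from the OpenEnv rollout data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "M134pra/neon-syndicate-qwen25-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "Observation: agent_0 at (3, 7); objective: reach extraction point. Respond with an action JSON."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (the action JSON suffix).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```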

Key Capabilities

  • Action Generation: Specializes in predicting and generating structured JSON actions for environment interaction (see the parsing sketch after this list).
  • Environment Interaction: Fine-tuned on prompt-action pairs from the Neon Syndicate OpenEnv, enabling it to follow heuristic policies.
  • Training Pipeline Demonstration: Serves as a CPU-friendly "smoke run" to showcase the complete training process for such models.
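
Downstream consumers of the model need a well-formed JSON object, and a 0.5B model can emit stray text around it. The sketch below shows one way to isolate and validate the generated action; the `action` and `target` field names are hypothetical, since the real schema is defined by the Neon Syndicate OpenEnv.

```python
# Sketch of extracting and validating a generated action JSON.
# The field names in the example are hypothetical, not the real schema.
import json

def parse_action(generated_text: str) -> dict | None:
    """Extract and validate the JSON action from raw model output."""
    try:
        # Small SFT models may emit stray text around the JSON object,
        # so isolate the outermost {...} span before parsing.
        start = generated_text.index("{")
        end = generated_text.rindex("}") + 1
        action = json.loads(generated_text[start:end])
    except ValueError:  # covers missing braces and json.JSONDecodeError
        return None  # malformed output; caller can retry or fall back
    return action if isinstance(action, dict) else None

print(parse_action('noise {"action": "move", "target": [3, 8]} trailing'))
# -> {'action': 'move', 'target': [3, 8]}
```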

Training Details

The model was trained on 46 (prompt, action_json) pairs collected by rolling out a heuristic policy across 6 environment tasks. Training used a causal LM loss for next-token prediction over the action JSON suffix, with the AdamW optimizer for 1 epoch. This checkpoint primarily demonstrates the training pipeline; a more performant PPO recipe is available for competitive results on GPU.
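
The suffix-only loss described above is standard label masking: prompt tokens are assigned the ignore index -100 so the cross-entropy is computed only over the action JSON. A minimal sketch under that assumption follows; the prompt text, learning rate, and helper function are illustrative, not the recipe's actual values.

```python
# Sketch of suffix-only SFT label masking (hyperparameters and prompt
# text are illustrative; the -100 masking of prompt tokens is the point).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def make_example(prompt: str, action_json: str) -> dict:
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    action_ids = tokenizer(
        action_json + tokenizer.eos_token, add_special_tokens=False
    ).input_ids
    input_ids = prompt_ids + action_ids
    # -100 masks the prompt, so loss is computed only over the action suffix.
    labels = [-100] * len(prompt_ids) + action_ids
    return {
        "input_ids": torch.tensor([input_ids]),
        "labels": torch.tensor([labels]),
    }

batch = make_example("Observation: ...\nAction: ", '{"action": "wait"}')
loss = model(**batch).loss  # causal LM cross-entropy on unmasked tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
```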

Limitations

This checkpoint is a preliminary version, trained on a limited dataset (46 examples, 1 epoch) and intended to demonstrate the training pipeline. Consequently, it underperforms the heuristic baseline in average task score. For production use or competitive performance, retrain with the provided PPO recipe on a GPU.