0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther is an experimental, instruction-tuned Qwen2.5-Coder model with 0.5 billion parameters. It is continuously fine-tuned with the Gensyn RL-Swarm framework using Group Relative Policy Optimization (GRPO), a real-time, distributed reinforcement learning setup that applies adaptive weighted sampling over programming challenges. The model supports a 131072-token context length and is optimized for code generation, particularly Python programming problems and competitive coding challenges. GGUF quantizations are available for efficient inference.
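As a rough illustration of the group-relative idea behind GRPO, the sketch below scores each sampled completion against the other completions drawn for the same prompt. It is not taken from the Gensyn RL-Swarm code: the function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion receives the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for one coding prompt,
# e.g. 1.0 if the generated solution passed its tests and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group in which every completion scored the same carries no signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to judge them.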