Qwen3-1.7B-RLOO-math-reasoning
Waqas-Pro-AI-Urdu
llama-3.1-8b-s1-none-s2-full-medarabench
oversight-grpo-Qwen3-0.6B
grpo-merged
router-sft-merged
qwen2-0.5b-abliterated
budget-router-sft-qwen1.5b
cnk12_Main_fixed_SFTanchor_1_5B_step_2
Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth
clarify-rl-grpo-qwen3-1-7b
brainrl-grpo-single-m
OpenThinker-7B-type6-e5-max-b32-alpha0_25-2
Sera-4.6-Lite-T2-v4-316-axolotl__Qwen3-8B-v2
cnk12_Main_fixed_SFTanchor_1_5B_step_5
TinyLlama-1.1B_MESSI
cnk12_Main_fixed_BaseAnchor_1_5B_step_9
counsel-env-qwen3-0.6b-grpo
cnk12_Main_fixed_SFTanchor_1_5B_step_10
Qwen2.5-1.5B-Instruct
physix-3b-rl
dpg-financial-sentiment-generator-f1
FinSense-Wealth-Manager-0.5B
Qwen3-4B-RLOO-math-reasoning
qwen3-0.6b-sciq-v1
Sera-4.6-Lite-T2-v4-1000-axolotl__Qwen3-8B-v6
dpg-financial-sentiment-generator
iisc_llm_draft_model
mini-1.0
qwen3-0.6b-sciq-v9-seed7
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_lowerLR_tformerPin_3000
mini-2.0-ablit
Qwen2.5-0.5B_adamw_v2
OpenThinker-7B-type6-e3-max-alpha0_2509765625
llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.45-20260427-221551
Qwen3-1.7B-Base-is-SmolTalk
qwen3-0.6b-sciq-v10
qwen-hf-iter-np-iter2
sft-action-qwen3-1.7b-budget-router-smoke
OpenThinker-7B-type6-e3-max-alpha0_25
OpenThinker-7B-type6-e5-max-5e6-alpha0_5
incident-commander-qwen3-1.7b-grpo-shaped