qwen3-4b-stage2-v1
olympiad-curated-qwen3-8b-gc-5ep
Meet7_0.6b
MNLP_M3_mcqa_model_base_mathqa_cot_orig
GRPO-TCR-Qwen3-4B-test
qwen3-4b-instruct-meta-testing1
DeepSeek-R1-Distill-Qwen-1.5B-edcastr_JavaScript-v8
TT_L0.2_H0.2_grpo
Qwen3-0.6B-m3-mcqa-reason-chat
rm_r1_1.5b_reasoning
llama-sft-proj-layers-shmid-continue
OctoThinker-1B-Hybrid-Base
qwen3-1.7b-coffee-sft
dpo-qwen-cot-merged16
Llama-3.2-1B-Instruct-C_M_T_CT_CE_CM
Qwen3-1.7B-MATH-RLVR-250
text2sql-qwen3-4b
Qwen2.5-3B-Base-SAPO
qwen2.5-1.5B-sbc
general_reward-Qwen3-0.6B-baseline_all_tokens-seed_0
OpenRS-GRPO-S-2
c67-h21
snake
unsafe_compliance-Qwen3-0.6B-OURS_self-seed_0
confidence-Qwen3-0.6B-baseline_all_tokens-seed_0
confidence-Qwen3-0.6B-baseline_all_tokens-seed_2
unsafe_compliance-Qwen3-0.6B-baseline_all_tokens-seed_2
qwen2.5-0.5B-math-cot-sft
GLM-4-32B-0414-uncensored-heretic-v1
Nemotron-Research-GooseReason-4B-Instruct-heretic-v2
Magistral-Small-2509-ultra-uncensored-heretic-v1
Magistral-Small-2509-ultra-uncensored-heretic-v2
general_reward-Qwen3-0.6B-OURS_llama-seed_1
Qwen3-0.6B-Gensyn-Swarm-solitary_polished_peacock
Fino1-4B
Qwen3-4B-CoderForge-SFT-weighted
Qwen3-4B-Base-ftjob-0511c5edc14e
Qwen3-4B-Base-ftjob-6fd14d9c448d-ftjob-adf3bd7963be
Llama-3.2-1B-Instruct_SFT_sciencefisher_v00.06
SDRL-freq-Qwen3-4B-Base-majority_n8_l2048-GRPO_n8_bs256_long8-step200
general_reward-Qwen3-0.6B-baseline_all_tokens_w_kl-seed_0
Dolphin-Mistral-24B-Venice-Edition