PureRL-1.5B-v7-s2-l1-maskon
SOR-ColdBrew-12B-Base-Test4
ablation-study-run-1
qwen3-4b-medrect-mixed
Jade-14B
Qwen2.5-3B-trit-uniform-d4
Qwen2.5-7B-trit-uniform-d4
Qwen2.5-14B-trit-uniform-d1
skyline-mini-v10
llama-3-8b-base-orpo-ultrafeedback-4xh200-rerun
Qwen_Qwen3-4B-Thinking-2507_int3-g16-fp8_qwen3-traces-cot-concat_2048_8_1024_256_lr0.03
llama-3.2-1b-free-chat-pd-grpo
qwen2.5-coder-cuda2hip
Llama-3.1-8B-Instruct_grpo_ppl_adv_rollout_8_20260502_125019_step580
llama-3.1-8b-r512-svd-qres4
email_classification
llama-3.1-8b-r1792-als-random-qres8
qwen3_math_lora_4096_v1
augmented-0e3f2d14de667916
PureRL-1.5B-v7-s2-l2-maskon
RAGProject
gORM-qwen-merge
P2-split5_prob_Llama-3.2-3B-Base_0524-1
d1-qwen25-7b-r2answer-ot14b-clean-step1390
nebula-8lang-7b
Qwen2.5-1.5B-trit-uniform-d4
Mistral-7B-v0.3-trit-uniform-d3
qwen2.5-1.5b-indonesian-grpo-pgabl
llimba-3b-instruct
Qwen3-8B-VerIH
augmented-88cda1f7c6ea5493
Llama-3.1-8B-Instruct_SFT_mathv00.02_s44
PureRL-7B-v7-stage1-reasoning-qa-instruct
sac-gspo-cl3e3-drgrpo-r1distill-qwen1.5b-24k-temp1-step741-aime24-38pct
d1-llama31-8b-r2answer-ot14b-clean-step1390
g2_X9e
nebula-8lang-1.5b
Qwen3-4B-Thinking-2507-awq-update-w4g128-tp1
llama-3.1-8b-r512-als-random-qres1
3ml-coach-unsloth-mistral-7b-V2
qwen3-1.7b-amr-20260512-1445
Qwen3-8B-rl_with_think_knowledge_merged