Meet7.5_0.6b
Phi-4-mini-reasoning-heretic
vit2sql-q-grpo
mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64
Qwen2.5-3B-Instruct-E3-BF16
qwen2.5-32b-lexenvs-grpo
diallm-llama-dpo-aus
g1_min_episodes_e1_gpt_long_tacc
deepseekconf
DeepSeek-R1-Distill-Qwen-7B-GSPO-Basic
g1_min_episodes_sampled_swesmith_psu
g1_top8_diverse_10000_32b__Qwen3-32B
Qwen-2.5-7b-S1k
mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64
Aura-Merged-V1
qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
g1_timeout_sampled_swesmith_psu
Llama3.2-3B-DareTIES-Math-Code
scot0500s-qwen3-8b-full
armycadet_sample
Main_fixed_MATH_7B_step_6
mistral-7b-base-sft-hh-harmless-4xh200-batch-64
Llama3.2-3B-Dare-Math-Code
Qwen3-8B_julia_with_thinksft_16bit_vllm
llama-3-8b-base-simpo-8xh200
hanoi-router-qwen3-8b
Qwen3-1.7B
Main_fixed_MATH_7B_step_3
medical_1bmix_m32-f7a64807-not_easy_1e-4_1200
gemma-3-1b-it_Math_SFT
llama-1b-cov-matched-l2-lam100
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4500
router-grpo-v3-merged
diallm-qwen-dpo-ind
Llama3.2-3B-BreadcrumbsTIES-Math-Code
symfony_ai_maker-V0.8.1-Qwen3-0.6B-16bit
g1_top8_diverse_3160_32b__Qwen3-32B
diallm-qwen-dpo-all
GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
Qwen3-4b-2507-Thinking-math-and-code
qwen-2.5-3b-r1-countdown-coloc
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_2000