GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
Llama3.1-8B-Base-Code
Llama3.2-3B-Breadcrumbs-Math-Code
qwen-dapo-17k-vr-7
qwen3-4b-plz
Qwen3-4B-Instruct-2507
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_2500
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_3000
g1_weighted_31600_gradnorm01
diallm-qwen-gspo-brit
qwen3-4b-instruct-2507-geo-sft
bus_booking_voice_agent_merged
gemma-3-4b-mn-cpt
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_3000
OpenThinker-7B-reasoning-full-lora-max-type3-e3-2
BoyBarley-v32
Qwen3-1.7B_openthoughts_sft_step198
kansaiben-qwen2.5-0.5b
blender-mesh-qwen3b-merged
blender-material-qwen3b-merged
marin-8b-instruct-sft-terminalcorpus
gemma-2b-it-dolphin-numbers-ft
byol-mri-1b-cpt
DAPO_E2H-countdown-gaussian_0p5_0p5
countdown_rlvr-v6-high-corrupt
Llama3.2-3B-Arcee-Math-Code
countdown_rlvr-v6-high-corrupt-gold
countdown_arl-sft-multiply-v8
army_model_gemma2b
qwen2.5-1.5B-longcot-reasoning-HPD
UltraIF-8B-SFT
qwen2.5-0.5b-ifeval-mixed-kd-alpha05
A.X-4.0-Light-Sunbi-Merged
Qwen3-0.6B-Full-Finetuning-Thinking
acquisition_metamath_llama_instruct-3_1-8b-math_gradient_500_combined_openr1math
assignment3_q4_instruction_tuned_qwen3_1_7b
nemotron-terminal-data_processing__Qwen3-8B
VRPO_hh-seed4
Qwen3-1.7B-GPT-5.4-Distill
phi-1.5-cot-control-r96-seed999-merged
Qwen3-8B-fim-v2v3pt-swe-lego-posttrain
gemma-3-1b-it-sst5-merged