Qwen3-0.6B-Meow-test
gemma2-2b-easyBEN-merged
gemma-2-2b-id-inst
O2-Searcher-Qwen2.5-3B-GRPO
indonesian-medical-qwen2.5-1.5b
merged_champion_v2
qwen3-4b-agrpo-nothink-lr3e-6
Qwen3-4B-Base-ftjob-f9358f96e2ad-merged
webshop-qwen2.5-7b-sft-decision-data-only
llama3-rtl-Resyn-fp16_3
hotpot-v2-correctness-7b
gemma-2b-it-steer-cat-numbers-ft
Nexa-Qwen-7B-Abliterated
Qwen2.5-7B-deepscaler_4k_step_96
MathReasoner-Mini-1.5b
L3.1-Promissum_Mane-8B-Della-1.5-calc
3h_sss-ssu-usu-uss_f1_anthropic_r1sss_f1_dpo_3000
Mistral-Small-3_2-24B-Instruct-2506-antislop
qwen3-finetuned
Llama-3.1-8B-Instruct-DA-SynthDolly-1A-E1
llama-3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260418-001920
Llama-3.1-8B-Instruct-PT-SynthDolly-1A-E1
chainlinkd-lora
Llama3.2-3B-DELLA-Math-Code
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_1000
llama-3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260417-222337
Llama3.2-3B-Linear-Math-Code
diallm-llama-dpo-ind
Llama-3.1-8B-Instruct-HI-SynthDolly-1A-E1
qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
Main_fixed_MATH_7B_step_6
gemma-baseline
recursive-sat-qwen2.5-1.5b
Llama3.2-3B-ModelStock-Math-Code
SMOKE_Merging_Prob_Qwen2.5-7B-Instruct_MATH_lr1e-05_mb2_ga4_n16_seed42
GRPO_Numina_FFT_lr1e-6_qwen317B_global_step_272full
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
qwen25_7b_base_hc_stss_n32_r1_sft
Qwen2.5-1.5B-reasoning-warmup
Main_fixed_MATH_7B_step_2
Qwen2.5-7B-Instruct_LoX_k_6_a_1.25
Qwen3-1.7B-ftjob-425cc048a5f3