diallm-llama-dpo-aus
deepseekconf
Main_fixed_MATH_7B_step_5
Main_fixed_MATH_7B_step_9
gemma-1b-countdown-zero-shot
Qwen3-1.7B-Base
Llama3.2-3B-DareTIES-Math-Code
diallm-qwen-dpo-aus
Main_fixed_MATH_7B_step_6
qwen3-4b-refiner-gpt54-instance-rubric-gpt54-grpo-step50
recursive-sat-qwen2.5-1.5b
Llama3.2-3B-ModelStock-Math-Code
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-masked_pesty_chameleon
llama-1b-cov-matched-l2-lam100
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
Llama3.2-3B-BreadcrumbsTIES-Math-Code
qwen25_7b_base_hc_stss_n32_r1_sft
qwen2.5-0.5b-bigmath-grpo-merged
fda03745
Qwen2.5-0.5B-Instruct
latvian-english-qwen2.5-1.5b
Main_fixed_MATH_7B_step_2
Qwen2.5-7B-Instruct_LoX_k_6_a_1.25
bus_booking_voice_agent_merged
Qwen3-4B-2507-sft-merged-thinking-final
Qwen3-4B-2507-sft-merged-lora-new
Qwen3-1.7B_opsd_masked_grpo_dapo_hf
qwen25_7b_base_hc_ssss_n32_r1_no_know_in_rubric_dpo
mistral-7b-base-margin-dpo-hh-harmless-4xh200-batch-64
Qwen3-0.6B-Gensyn-Swarm-feathered_wiry_anteater
qwen2.5-3b-legal-intent
diallm-llama-gspo-brit
Qwen3-9B-lite-lora
bug_fixing_rlvr-7b-v4
Qwen2.5-7B-Instruct_bad-medical-advice
deltat1
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_2000
Qwen3-1.7B-Finetuned-LiYunLong
Qwen3-1.7B-Base-ftjob-57fb76a6eda1
qwen2.5-1.5b-legal-intent
merged_champion_v5_m4
code_gen_rlvr-ast-7b-v2