c1_kimi_k2.5
acquisition_metamath_qwen3b_IF_proximity_500_combined_metamath
qwen25_7b_base_hc_ssss_n32_r1_no_know_dpo
acquisition_metamath_qwen3b_IF_proximity_500_combined_detailed
RSFT_250_8
general_reward-Qwen3-0.6B_7168-baseline_all_tokens-seed_0
acquisition_metamath_llama_instruct_3b_math_format_500_combined_metamath
acquisition_metamath_llama_instruct_3b_math_gradient_500_combined_metamath
new-train
qwen25_7b_base_hc_tsss_n32_r1_dpo
cookingworld_per_chunk_act_glm_tokfix_diffPrompt_1000
Bloslain-8B-v0.2
Llama3.1-Daredevilish
M2
AutoGEO_mini_Qwen1.7B_Ecommerce
GLM-4_6-taskmaster2-32eps-32k-fixeps
GanitLLM-4B_CGRPO
seed0_sample5000_bmlama_google-gemma-3-4b-it_en-zh_DPO_5e-06
QWEN3-4B-CPT
d1_trace_hints_top4_seq_glm47
thought-reasoning-model-v1
orpo-5e-8
d1_mix_top4_seq_glm47
final_proj-stage2-best-lr1e4-r16-merged-bf16
google-gemma-4b-relevance-v1
Qwenvergence-14B-v6-Prose
Llama3.2-3B-Base-Code-v2
mistral-7b-base-sft-hh-helpful-4xh200-batch-64
BadGPT-2
merged_beat_champ_3model_dare075
Llama-Carvalho-GL
e1_gpt_long_sandboxes_2x_tacc-Qwen3-8B
yta1
Meet7.5_0.6b
Qwen3-4B-Data-Science-Insight-TR-7.6K
Qwen2.5-3B-Instruct-E3-BF16
qwen-coder-7b-instruct
thinkprm-reproduced
Qwen3-0.6B-student-refusal-badnet-seqkd
qwen-7b-arabic-grading-merged
Open-Reward-Agent-sft-rubric-only
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_1000