meta_reasoning_proofs_stage_1_190_steps
Qwen2.5-3B-Instruct-C_M_T_CT
Qwen3-0.6B
Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_dr_grpo_42_rule
Qwen-3b-GRPO-len-5
Qwen2.5-1.5B-Instruct-abliterated
NetworkExpert
Llama-3.2-1B-Instruct-SuperGPQA-Classifier
Webshop-1.5b-2epoch
wordle-qwen2-mini
Qwen3-1.7B-Base_dsum_3_6_rel_1e0_1p0_0p0_1p0_grpo_sapo_42_rule
Qwen3-1.7B-Base_dsum_3_6_tok_Certainly_1p0_0p0_1p0_grpo_sapo_42_rule
llama323b-dnli-s2
Qwen3-1.7B-teacher-refusal-badnet
Llama-3.2-1B-Instruct-C_M_T
wordle-grpo-Qwen3-1.7B
llama_3.2_3b-owl_numbers_full_ep2
BlazingCleanup-Qwen2.5-1.5B-FT-v1
Qwen3-4B-Base-ascii-art-v5-e3-lr5e-6-ga16-ctx4096
Llama-3.2-1B-Instruct-2EP-C_M_T-Rehearsal
Qwen-3b-GRPO-len-3
qwen3_cross_8bprop_4bsolve_vdrop85_solver_v5
Qwen3-1.7B-base-MED_0325
Qwen3-1.7B-base-MED
Qwen2.5-3B-GSM8K-SFT
day1-train-model
Qwen2-0.5B-SFT-HH
model_sft_dare
csrsef-thinking-20260325T081327Z-it01-pubmedqa
Qwen2.5-3B-GSM8K-GRPO-H200
fact_extractor_dev_1b
Llama-3.2-3B-Instruct-C_M_T-AUX_CT_CE_CM
armv8mac_to_x86_qwen25coder_0p5b_full
x86_to_armv8mac_qwen25coder_0p5b_full
toolcalling-merged-demo
bartleby-qwen3-1.7b_dpo