OpenRS-GRPO
DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic
Llama-3.2-1B-Instruct-C
llmscience
olympiad-curated-qwen3-8b-gc-5ep
MNLP_M3_mcqa_model_base_mathqa_cot_orig
Llama-3.2-1B-Instruct-C_M
chess-qwen-lora-v2
llama-sft-proj-layers-shmid-continue
Qwen3-4B-medical-reasoning
OpenMath-Nemotron-1.5B-PruneAware-2
Qwen3-4B-ascii-art-curated-mix-v4-full-lr2e-5-ga16-ctx4096
akron-field-396hz
train_record_42_1773765559
Qwen2-0.5B-GRPO-test
P9-split1_3times_prob_Qwen3-4B-Base_0319-02
Qwen3-1.7B-SFT-s1K-lr0_0001
P2-split2_bs512_epoch10_2e-5_prob_Qwen3-4B-Base_0320-01
Llama-3.2-1B-Instruct_SFT_sciencev00.02
Llama-3.2-1B-Instruct_SFT_sciencefisher_v00.05
Qwen3-1.7B-Base_dsum_3_6_tok_Certainly_alt_1_per_5_1p0_0p0_1p0_grpo_42_rule
bed-recovery-merged-qwen3-4B-config4-v2
P9-split5_prob_Qwen3-4B-Base_0322-01
P9-split4_prob_Qwen3-4B-Base_0322-01
Qwen3-4B-Instruct-2507-SFT-tr5
Llama-3.2-1B-Instruct-C_M_T_CT-Limited_CE_CM_EE_CI
olympiad-curated-qwen3-4b-nemotron-5ep
Llama-3.2-1B-Instruct-SuperGPQA-Classifier
Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_alt_1_per_2_1p0_0p0_1p0_grpo_42_rule
Llama-3.2-3B-Instruct-C_M_T
llama_3.2_3b-owl_numbers_full_ep4
Llama-3.2-1B-Instruct-2EP-C_M_T-Rehearsal
Qwen3-1.7B-base-MED
Qwen2-0.5B-SFT-HH
riscv_to_armv8mac_qwen25coder_1p5b_full
x86_to_armv8mac_qwen25coder_0p5b_full
armv8mac_to_riscv_qwen25coder_0p5b_full
riscv_to_armv8mac_qwen25coder_0p5b_full
Qwen3-1.7B-Base_dsum_3_6_fnr_with_bracket_1p0_0p0_1p0_grpo_dr_grpo_42_rule