student_feedback_v1_Qwen3-4B-Base
Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_rel_1e0_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_sapo_42_rule
Qwen3-1.7B-Base_dsum_3_6_rel_1e1_1p0_0p0_1p0_grpo_dr_grpo_42_rule
Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_sapo_42_rule
mental_RL_0.7_best
mental_RL_0.7_global_step_39
nepali_legal_qwen_merged_3
qwen2.5-1.5b-gsm8k-train-step1000
a1-crosscodeeval_typescript
a1-pr_mining
a1-stack_bash
a1-stack_cpp
a1-stack_csharp
NEW_BASELINE_SFT_hotpotqa_Qwen3-4B-Instruct
Qwen2.5-0.5B-SFT
Qwen3-1.7B-Base_dsum_3_6_rel_1e-1_alt_1_per_2_1p0_0p0_1p0_grpo_42_rule
NEW_OURS_SFT_hotpotqa_Qwen3-4B-Instruct
4b_rft
qwen2.5-1.5b-gsm8k-train-step7000
qwen2.5-1.5b-gsm8k-train-step7500
Qwen3-8B_julia_planning-ep2sft_16bit_vllm
treasurypro-cashflow-llama-merged
Qwen3-8B_julia_planning-ep4sft_16bit_vllm
Qwen2.5-7B-Instruct_backdoored-medical-advice-realigned-correct-financial-advice
armv8mac_to_riscv_qwen25coder_1p5b_full
ormuri_model
model_sft_dare
qwen2.5-7b-opencoder-final
Llama-3.1-Tulu-3.1-8B-InverseIFEval-DPO
Qwen2.5-7B-Instruct
day1-train-model
a1-curriculum_medium
a1-stack_phpunit
x86_to_armv8mac_qwen25coder_1p5b_full
trinitite_safe_rl_base_model
sera-14b-patched
Devstral-Small-2-24B-Instruct-2512-bf16
a1-glaive_code_assistant
a1-nemotron_pytest