gemma-irpf-lei-qwen
QuantumCoder-0.5B
turkish-finance-qwen7b-v2
qwen2.5-1.5b-adaptive-tutor-rl
benchmark-luckypick-7b-19
MedicalEDI-14b-EDI-Base
MedicalEDI-14b-EDI-Base-2
DSR1-Qwen-32B-131fad2c
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-flapping_foxy_beaver
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-mammalian_roaring_worm
Qwen2-1.5B-Instruct-Codeforces-Reasoning
QwQ-32B_enable-liger-kernel_False_OpenThoughts3_1k
openthoughts3_300k_32B
QwQ-32B_enable-liger-kernel_False_OpenThoughts3_10k
chess-v6-rs-v3
fasttext_mixing_domains_top_3_code
Qwen2.5-7B-Base-EMPO-natural_reasoning_all_level
TreePO-Qwen2.5-7B_Naive2Low_Scheduler
a2s-7b
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-ai-ver15
Qwen2.5-7B-Instruct_old_sft_alpaca_001
AT-qwen2.5-7b-hhrlhf-5120-sft-b3s3-tesla-ver8
qwen7b_kodcode_grpo_step20
Qwen2.5-7B-Instruct_old_sft_alpaca_003
qwen7b_kodcode_grpo_step40
Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule
rl-scaling-rft-qwen-2.5-7b-instruct-grpo-long-reasoning
qqWen-7B-pretrain
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-50-7.5e-6
exp_24_0_clsft_16bit_vllm
Qwen2.5-7B-Instruct_old_sft_alpaca_007
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-10
erpo-iclr-baseline-Qwen2.5-7b-DAPO-step180
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
qwen2.5-math-7b_grpo_entropy_adv
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
qwen2.5-math-finetuned-7b
deepmath
openthoughts
Qwen2.5-7B-Instruct_gsm8k_fix_new_check
qwen2-5_code_ablate_duplications_1