qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8
tbench-qwen-sft-multitask-clean-v10
Qwen2.5-7B-Instruct_new_alpaca_009
tbench-qwen-sft-multitask-nat-v11
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-40
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-50
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-70
grpo_rmsprop_llama3p1_8b_3k_seqlen_1e-7
MATH-Qwen2.5-math-7B-ReMax-L2O-NoBaseline
Qwen2.5-7B-ja-struct-tooled-base
erpo-iclr-ours-Qwen2.5-7b-corr_gen_s005_max14
exp_tas_top_k_64_traces
qwen2.5-math-7b_grpo_entropy_adv
paper_llama_llama3.1-8b_train_sft_all_train_code
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
grpo_rmsprop_qwen3-8b_3k_seqlen
lab0203
MATH-Qwen2.5-math-7B-ReMax-L2O-4
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
d1_math_multiple_languages
exp_tas_presence_penalty_0_25_traces
exp_tas_presence_penalty_1_0_traces
exp_tas_max_episodes_512_traces
exp_tas_summarize_threshold_2048_traces
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-30
hh-dpo-llama3.1-8b-fsdp-beta-0.001
Llama-3.1-8B-Instruct_SFT_sciencev00.08
Llama3.3-Zenith-Unchained-8B
AraGuard-8B-v2-checkpoint
R1-Distill-Qwen-7B-summary-type3-e1-10000
qwen2.5-math-finetuned-7b
Logic-Coder-7B
tbench-qwen-sft-combined-nat-pro-v1
Llama-3.1-8B-Instruct_SFT_MoTv00.02
Llama-3.1-8B-Instruct_SFT_MoTv00.03
llama-3.1-fine-tuned
teacher_code_qwq
Llama3.1-SuperHawk-8B-Heretic-v2
exp_23_dtest_grpo_checkpoint_60_16bit_vllm
Llama-3-8B-CoPE-64k-Instruct
AraGuard-8B-v2