Qwen2.5-7B-Instruct_old_sft_alpaca_003
qwen7b_kodcode_grpo_step60
qwen7b_kodcode_grpo_step80
Affine-193-5CtmVuY8eCeumgbEps55Bknw9vjuLqHsiQH7dcc3kaXXUb7r
Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule
vd-8-step58
Affine-5HSp1dWtGppxvnsRvDYsWMwWMihzZbftwUU12LGAfwhnECdp
short_paper_llama_1.json_train_dpo_v3_train_no_think
affine-tbtf14-5Grvpqx9GxFCRR94ZPvGmcSyzAoCV6wmpb4duiLd3HFrykVe
affine-00-5E9ffBCnChMfm8RkghPgDgzQdg7XHwbdJouk7cd7fH34SwQr
Llama-3.1-8B-Harm-Specialist
tbench-qwen-sft-fix-git-overfit-v7-nat-fixed
qqWen-7B-pretrain
affine-v4-5FsZP1ipNDE6Esg9rf8AnepyXQFC8xRKQFWPRRFr15p9covj
summ_Qwen0b5_tldr_xsum
Medical-Reasoning-Using-Unsloth
environment_test
DAPO_GRPO_8b_incorrect_bs_32_mb_8_n16_cliphigh
qqWen-7B-sft
DAPO_GRPO_4b_incorrect_bs_32_mb_8_n16_cliphigh
Llama-3.1-8B-Tulu10pct-SFT-MAHALS
Qwen3-8B-grpo-medmcqa
pytest-generator-v4
Coma-7B
Logic-Coder-7B
qwen2.5-7b-instruct-sat-best
train_s1k_queries_on_math_data_test_template2.deepseek_all_full-checkpoint-625
Qwen3-8B
exp-da2
llama3_1_8b_dpo-1k_ED_thinking
Llama-3.1-8B
GRPO_final_submission
webagent-7b-grpo-ckpt-400
Mistral-nemo-ja-rp-v0.2
Gemma-2-9B-PL-DevOps-Instruct
meta-llama-Llama-3.1-8B-Instruct-dolly-alpaca-5k-0202-42-202602041203
qwen-coder-primvul-0203
llama3_1_8b_sft-1k_ED
Qwen3-8B-rft-alfworld-e1
Einstein-v6.1-Llama3-8B-mlx-fp16
dpo-qwen-cot-merged_biya
qwenb_qwen3-8b_train_grpo_v2_train_code