generator-fixer-step-90
ws_0.01_10
summ_Qwen0b5_inst_cnnxsumsam
summ_Qwen0b5_tldr_xsum
Qwen2.5-7B-Instruct_old_sft_alpaca_009
environment_test
qwen2-5-7b-full-pretrain-control-tweet-1m-en-reproduce-bs8
Qwen2.5-7B-Instruct_new_alpaca_005
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-40
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-50
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-70
scienceworld_grpo_qwen2.5_7b_50_10_step50
MATH-Qwen2.5-math-7B-ReMax-L2O-NoBaseline
Qwen2.5-7B-ja-struct-tooled-base
qwen2.5-7b-instruct-kk-best
MATH-Qwen2.5-math-7B-GRPO
Qwen2.5-Math-7B-GRPO-noise-0.4-epoch-3
Qwen2.5-3B-GRPO-3_3_8_6k
AT-qwen2.5-7b-hhrlhf-5120-dpo-ai-ver17-step-30
sft-qwen2.5-7b-generate-thinking-no-guideline
qwen2.5-7b-instruct-aime-5k-best
R1-Distill-Qwen-7B-summary-type3-e1-10000
Logic-Coder-7B
qwen2.5-7b-instruct-sat-best
R1-Distill-Qwen-7B-summary-type3-e1-10000-2
teacher_code_qwq
exp_23_dtest_grpo_checkpoint_60_16bit_vllm
qwen25-7b-router-sft-0211
qwen2.5-7b-instruct-motion
Qwen2.5-7B-LoRA-merged
seed0_sample5000_bmlama_Qwen-Qwen2.5-7B_en-ko_1.0-1.0_1.0
TrialPulse-8B-Perfection
seed0_sample30000_mmmlu_Qwen-Qwen2.5-7B_en-ar-de-es-fr-hi-id-it-ja-ko-pt-zh_1.0_1e-05_dco
affine-k-1-5EWSasAgABTaNwkLMudKKCZw8WZKbiNMcQrHKUUMwMoWsxRj
teacher_science_qwq
sft-qwen2.5-7b-generate-thinking-no-guideline-full-dataset
qwen2.5-en-my-opus100
Qwen2.5-Coder-7B-Instruct-pyvul-document-scaling_coef-0.3
ws-wm-0208-step-120
stability-Qwen2.5-7B-Instruct
zhs-Qwen2.5-7B-AS-step-260-discount-1p0
matsuo-llm-advanced-phase-d