P19-split3-prob-6x-bs128-lr2e5-zero3-ep3
qwen3-4b-vietnamese-legal-grpo
P19-split1-prob-6x-bs128-lr2e5-zero3-ep3
safe_pku
P2-split5_prob_Qwen3-1.7B-Base_0325-01
golden-goose-qwen2.5-1.5b-instruct-stratified-groups
all_sft_formats_balanced_20260222_ep3_lr3e5_qwen3-vl-8b
tinyllama-1.1b-dpo-pku-saferlhf
golden-goose-qwen2.5-1.5b-instruct-greedy-top-25-50
PureRL-1.5B-v7-s2-corr-maskoff
golden-goose-qwen2.5-1.5b-instruct-greedy-top
golden-goose-qwen2.5-1.5b-instruct-random
PureRL-1.5B-v7-stage1-A-fewshot
dialect-llama-gspo-ind
Meta-Llama-3-8B-Instruct-hhrlhf-v1
qwen3-4b-dw-lr-dpo
PureRL-1.5B-v6b3-bare-fmt03
golden-goose-qwen2.5-1.5b-instruct-greedy-bottom
PureRL-1.5B-v5-06-uccp
PureRL-1.5B-v5-06-uppl
PureRL-7B-v5-09-fmtW01
qwen3-4b-EM-full-finetuned-v5
dialect-qwen-gspo-brit
P19-split4-prob-6x-bs128-lr2e5-zero3-ep3
wv1848r7
GRPO-7B-long-step-hotpot
PureRL-7B-v6-fmt01-brierH-mid
PureRL-1.5B-v6d3-lam01-sigmoid-maskon-acc05
PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05
PureRL-1.5B-v7-s2-margin-maskoff
dialect-llama-gspo-brit
trained_model
PureRL-1.5B-v7-s2-l2-maskoff
dialect-qwen-gspo-ind
qwen2.5-math-1.5b-dpo-gsm8k
PureRL-1.5B-v5-06-mc2
polyalign-qwen2.5-1.5b-en-sft
PureRL-1.5B-v5-06-umsp
PureRL-7B-v8-antiprogress
P19-split2-prob-6x-bs128-lr2e5-zero3-ep3
GRPO-7B-fmt03-math