PureRL-1.5B-v7-stage1-qa-instruct
assn2-sft-llama-1b
qwen2.5-3b-medpt-lora
GRPO-Model
1871a1ac
affine-5FhnPJvv2QD7TpQC688SJjG8KqdWHpUxBjD6iJb5FP3hXbmc
Llama-3.1-8B-TED
audit-unlearn-npo-llama31-8b-dolly
Qwen_base_asap_shot7_sft_fold1
Perovskite-RL
affine-49-5CkpUQudBWQYPaquXidE3BnRHyyDFLKJsHdn82PdTk5Y6gKM
math_no_think_8_qwen3_4b_instruct_sft
code_no_think_8_qwen3_4b_instruct_sft
math_no_think_8_qwen3_4b_base_sft
sac-gspo-cl3e3-drgrpo-qwen25-math-1.5b-step1500
fgrpo-gspo-cl3e3-qwen25-math-1.5b-step751
llama31-8b-gtow-lora-v2
sage-qwen3-4b-code-coevolve-gen-final
RELEX-Qwen2.5-Math-1.5B
PureRL-1.5B-v7-s2-l2-kl-w1-b0
grpo_ppl_adv_rollout_8_step580
tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4
LLMMachineTranslation
code_think_8_qwen3_4b_instruct_sft
sage-qwen3-4b-code-coevolve-gen-phase-5
sage-qwen3-4b-code-coevolve-solver-phase-15
sage-qwen3-4b-code-coevolve-solver-phase-20
sage-qwen3-4b-code-coevolve-solver-final
sage-qwen3-4b-code-coevolve-gen-phase-15
sage-qwen3-4b-code-coevolve-solver-phase-10
sage-qwen3-4b-code-coevolve-solver-phase-25
sage-qwen3-4b-code-coevolve-gen-phase-20
dpo2-llama2-7b
eurus_grpo_rlmia_epoch_1
grpo-3b
Qwen_Qwen3-4B-Thinking-2507_fp3-e2m0_qwen3-traces-cot-concat_2048_8_1024_256_lr0.1
llama32-3b-dolly-sft-drift
gemma3-4b-gsm8k-sft-drift
llama31-8b-dolly-sft-drift
llama32-3b-code-sft-drift
gemma3-4b-dolly-sft-drift
qwen3_vl_8b_foreagent