PureRL-1.5B-v5-06-uentropy
PureRL-1.5B-v6d2-lam01-identity-maskon-acc05
20260523_103359_cls_weight2
Qwen-Legal-SFT-Dicoding-Final
LLM-Advanced-Competition-2025-merged-v9
Qwen2.5-7B-Instruct_dbbench_grpo_dataset_react
cnk12_Main_fixed_SFTanchor_1_5B_step_8
storeagent-grpo-step150
Kimi-Dev-72B
frankesqwen-hint-v2
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s50pct-lr1e-4
hikelogic-qwen2.5-1.5b
PureRL-1.5B-v5-06-umsp
GRPO-7B-fmt03-math
PairJudge-RM
qwen2-5_nemotron-sft_100000
ipo_checkpoint
Qwen2.5-7B-Instruct-cat_full_ft_optsgd_mom-STEER0.866406-ft4.42
OREAL-DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Qwen-7B-SafeChain
DataMind-7B
NanoLLM-Qwen2.5-7B-v3.1
ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562-gmp-kd5e-1-s70pct-lr1e-5
qwen-0.5b-16bit_merged
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-peaceful_slimy_trout
legal-rag-qwen-sft
star1-7b-DPO-ours-rlvr-e-attack-stepfinal
Qwen2.5-Coder-7B-Instruct-text-to-sql-finetune
Qwen2.5-7B-Instruct-cat_custom-STEER0.792187-ft4.42
tool-n1-reason-lora-sft-800-step
qwen25-7b-agentbench-sub2
Qwen2.5-Coder-7B-steered-alpha-0-variant-A-theta-1.0
Qwen2.5-Coder-7B-steered-alpha-0-variant-A-theta-2.0
Qwen2.5-1.5B-trit-uniform-d4
Qwen2.5-7B-AU-Universities-Merged
cs224r-countdown-rloo-latest
seqoutlm-0.5B
coder
influence_alpaca_qwen2.5-7b_confidence
SiliconMind-V1-Qwen2.5-C-7B-I
qwen_instruct_codereview-merged
Qwen2.5-7B-Instruct-tiger_custom-STEER1.0625-ft4.42