train_record_42_1779354540
hcot-qwen2.5-math-1.5b
llama-3-8b-dpo-tw31-beta-1e-0-ift
llama-3-indonesian-legal-bot
math_no_think_17_qwen3_4b_base_sft
qwen-abliterated
qwen3-8b-base-simpo-ultrafeedback-4xH200-batch-128
P2-split4_only_answer_Qwen3-4B-Base_0505-bs64-epoch6-lr1e5
lea7
Qwen3-8B-PKH
qwen2.5-1.5b-slips-immune-risk
security-auditor-grpo
Qwen3-1.7B-RLOO-math-reasoning
Qwen2.5-3B-RLOO-math-reasoning
ner-qwen_model
P2-split5_only_answer_Qwen3-4B-Base_0505-bs64-epoch6-lr1e5
evolai-qwen3-1.7b-v1
PureRL-7B-v7-s2-l2-maskon
Llama-3.1-8B-Italian-SAVA-instruct
Qwen3-4B-Non-Thinking-RL-Code-Step300
sarvix-clarify-merged
SFT_Qwen2.5-7B-Instruct_olympiads
P2-split2_only_answer_Qwen3-4B-Base_0505-bs64-epoch6-lr1e5
glm-muse-v7b
3ml-coach-unsloth-mistral-7b
GuardAdvisor_rl
P2-split2_complete_independent_Qwen3-4B-Base_0425-bs64-epoch3
qwen2.5-1.5b-hgr-v2-5340-final
Phi-4-mini-instruct-mlx-fp16
paper2-r3_answer_plus_termination_calibration-step400
qwen2.5_1.5b-gsm8k-test-step500
OFKMS-Migration-Qwen3.5-9B-DPO
Qwen-7B-REMOR-GRPO-no-SFT
llama-3-8b-base-ipo-ultrafeedback-4xh200-batch-128-rerun
P2-split5_only_answer_Qwen3-4B-Base_0501-bs64-epoch6
BastiAI-2-Instruct
Qwen3-4B-INST-Math-v2
P2-split1_prob_Qwen3-8B-Base_0325-01
multilingual_model
sft_tir_rl_prep_Llama_lr0.0001_bs32_wd0.0_wp0.3_checkpoint-epoch4
Qwen_01
triage-agent-qwen3b