npc-agentic-7b-v3
acquisition_llama-3_2-3b_bins_numina_confidence
llama-2-13b-chat-hf-lr5e-5-resta-0.3
rcrc-chat-v5-gemma-1b-cpt-sft
Llama_3_2_3B_Conversational_v6_SFT_10voicebot_interrupt_model
qwen3-8b-base-kto-ultrafeedback-4xh200-batch-128
Qwen_Qwen3-4B-Thinking-2507_int4-g16-fp8_qwen3-random-tokens_2048_8_1024_256_lr0.03
tezos100k_continue_gptlongtezos_step4200__Qwen3-32B
safety_model
group_model
count-cpt-v2
grapher-8b-new-descriptions-v2
tutor-qwen2.5-7b
qwen3-1.7b-absa-tech
olympiads_Main_fixed_BaseAnchor_1_5B_step_2
llama3_2_3b-instruct-SSFT-lr5e-5
atlas-r2-qwen3-14b
Qwen2.5-7B-DELLA-v1
fresh_gptlongtezos_step5100__Qwen3-32B
count-cpt-v5
Qwen2.5-1.5B-Indonesian-Assistant
router-sft-smoke-merged
cnk12_Main_fixed_SFTanchor_1_5B_step_2
cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
GPRM-4B
mern-coder-7b-merged
listing-parser-llama31-8b-ft-v1-full
P12-frac0p05-fullft-lr2e5-ep6
multilingual_model
qwen3_4b_baseline_verified_grpo_eq3ep
qwen3_4b_vdrop75_verified_grpo_eq3ep
qwen_gspo_200
model-agent-test-2
qwen-dapo-17k-vs-6
qwen3-4b-sft-gpt54-ep2-instance-rubric-gpt41-step100
Llama-3.1-8B-Instruct_SafeGrad_mathv00.09
qwen3-8b-profiling-merged-v5
qwen-1.5b-coder-grpo-scratch-step200
qwen3-8b-base-margin-dpo-ultrafeedback-4xh200-batch-128-20260423-040315
olympiads_Main_fixed_BaseAnchor_1_5B_step_3
llama-2-13b-chat-hf-lr5e-5-safedelta-scale0.1