hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_4000
Llama-3.1-8B-Instruct-EL-SynthDolly-1A-E1
qwen3-4b-absa-tech-ckpt500
merge_v10_27_112_8
Qwen3-4B-Instruct-2507-ftjob-51bbb828b0c6
qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64
g-llama-3b-finetuned
Llama-3.2-3B-Instruct-ftjob-b654ee74580a
Qwen2.5-1.5b-Instruct-heretic
Llama-3.2-3B-Instruct-ftjob-9f08e18846c2
train_rte_42_1776331559
mistral-7b-base-beta-dpo-hh-helpful-4xh200-batch-64
qwen3-1.7b-math-grpo-best-local
phi-1.5-stage3-sft-cloned-merged
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4500
Llama-3.2-3B-Instruct-ftjob-b296c0abaa6e
qwen3-8b-tr
Main_fixed_MATH_1_5B_BaseAnchor_step_10
w6g927rr
qwen-coder-7b-instruct
qwen2.5-32b-lexenvs-grpo
Qwen2.5-Coder-3B-Data-Science-Insight-TR-7.6K
codev-qwen2.5-coder-7B-v2
deepseekconf
mistral-7b-base-epsilon-dpo-hh-helpful-4xh200-batch-64
resume-skill-extractor-merged
g1_timeout_sampled_swesmith_psu
nemotron-terminal-scientific_computing__Qwen3-8B
qwen25-7b-profiling-agent-merged-v1
Qwen3-1.7B-ftjob-64f70ccd79a1
diallm-qwen-dpo-brit
QwenRolina3-06B-base-LR1e5-b32g2gc8-AR-order-batch
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4500
gemma-3-1b-medical-finetuned
vmi84cw1
diallm-qwen-dpo-ind
mistral-7b-base-epsilon-dpo-hh-harmless-4xh200-batch-64
UserMirrorrer-Qwen-DPO
qwen-3b-sft-n8n-unsloth
mistral-7b-base-beta-dpo-hh-harmless-4xh200-batch-64
Qwen3-0.6B-Full-Finetuning-No-Thinking
llamasrnn-grpo-epoch001-merged