seed0_sample5000_bmlama_Qwen-Qwen2.5-7B-Instruct_en-zh_1.0-1.0_1.0
qwen3-1.7b-base-adam-5e-6-bs128-kl0.0-global_step_200
llama-3-8b-base-epsilon-dpo-hh-helpful-8xh200
llama-3-8b-base-epsilon-dpo-hh-harmless-8xh200
llama-3-8b-base-beta-dpo-ultrafeedback-8xh200
qwen3-4b-agrpo-think-lr3e-6
sft-merged1
dreamrunner-command-8b
d1_original_top4_seq_glm47
d1_constrain_top4_seq_glm47
Qwen3-0.6B-SciGen-SLERP
geode-onyx
d1_trace_hints_top4_seq_glm47
Llama-3.1-8B-Lexi-Uncensored-V2
20260411-190341-align-qwen-0d3d-2026-04-12-018-ob-correction
model-yedeklerim
thought-reasoning-model-v1
qwen3-4b-agrpo-nothink-lr3e-6
20260411-190341-align-qwen-0d3d-2026-04-12-022-aggressive-ob-dpo
orpo-5e-8
qwen25_7b_base_hc_stss_n32_r1_dpo
Qwen3-4B-Base-ftjob-25058cdbbe3e-merged
Lusterka-7B-v0.3
jarvis-2-0-8b
TwinLlama-3.1-8B-DPO
Qwen2.5-Math-1.5B
d1_mix_top4_seq_glm47
bold_formatting-Qwen3-0.6B-OURS_self-seed_0
3370_0412
gemma-2-2b-it-doktorsitesi
parser_model_ner_4.10
alley-smp-merged
gemma-2b-it-steer-elephant-numbers-ft
gemma-2b-it-steer-eagle-numbers-ft
SciRM-7B
SciRM-Ref-7B
8e5ae49f
gemma-2b-it-steer-cat-numbers-ft
Qwen3-8B-slimllm-4bit-calibration-English-128samples
v4_qwen-2.5-3b-r1-countdown-phil
Qwen2.5-1.5B-Instruct-8r-all-tmtm
Gemma-3-4B-IT-ES-SynthDolly-1A-E3