ultrafeedbackSkyworkAgree_alignmentZephyr7BSftFull_sdpo_score_ebs128_lr1e-06_0
qwen7b-triples-lora-merged
gemma-3-1b-medical-finetuned
corrected-semi-wtype-Llama-tuned-Lora-merged-gpt5
diallm-llama-grpo-aus
fe20dc52
Llama-3.1-8B-Instruct-ZH-SynthDolly-1A-E1
qwen3-4B-refiner-rubric-rl-step50
qwen-dapo-17k-vs-4
Llama-3.1-8B-Instruct-PT-SynthDolly-1A-E1
Llama-3.1-8B-Instruct-GA-SynthDolly-1A-E1
qwen3-4b-refiner-gpt54-ep3
qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64
super-model-7b
Llama-3.1-8B-Instruct-EL-SynthDolly-1A-E1
qwen3-4b-absa-tech-ckpt500
merge_v10_27_112_8
SMOKE_GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0_lr1e-05_mb2_ga4_n16_seed42_HF_GEN
gemma-2b-it-penguin-numbers-ft
gemma-3-1b-it-sst5-merged
code_gen_arl-ast-addmultiply-7b-v1
train_rte_42_1776331559
train_mrpc_42_1776331557
diallm-llama-dpo-brit
diallm-llama-dpo-ind
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-4000
Main_fixed_MATH_1_5B_BaseAnchor_step_10
general-kd-Qwen2.5-0.5B-Instruct-ber-5000-3500
acquisition_llama-3_1-8b_bins_numina_answer_variance
kimi-k2-swesmith_with_plain_docker-sandboxes-maxeps-32k
qwen-3-4B-belief-state
acquisition_llama-3_1-8b_bins_medmcqa_format
diallm-llama-dpo-all
Main_fixed_MATH_7B_step_8
diallm-qwen-dpo-aus
gemma-3-1b-it_Math_SFT
sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated
sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
OpenThinker-7B-type6-e5-max-b64-alpha0_28125