educa-chat-3b
Llama-3.1-8B-Instruct-ZH-SynthDolly-1A-E1
Main_fixed_MATH_1_5B_BaseAnchor_step_9
merged_beat_champ_2model_slerp
qwen25-7b-slot-conf-agent-merged-v1
QwenRolina3-1.7B-base-LR1e5-b32g2gc8-AR-Orig-order-batch
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_2000
merged_beat_champ_3model_dare
Llama3.2-3B-Base-DataMerged
cookingworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_4000
Qwen2.5-3B-Base-Math
RLCR-2p5x-priority-bestreward-math
aihm-evaluate-merged
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_3000
vit2sql-q-grpo
Qwen2.5-7B-Instruct-SLDS
g1_weighted_31600_8b_orig
tft-benchmark-s1-direct-Qwen3-1.7B
smaller-grapher-with-less-parameters
tft-benchmark-s1-tft-Qwen3-1.7B
206a2f0c
g1_min_episodes_sampled_swesmith_psu
nemotron-terminal-scientific_computing__Qwen3-8B
Qwen3-32B
mistral-7b-base-sft-hh-harmless-4xh200-batch-64
Llama3.2-3B-Dare-Math-Code
Llama3.2-3B-ModelStock-Math-Code
qwen25-7b-profiling-agent-merged-v1
diallm-qwen-dpo-brit
gemma-3-1b-it_Math_SFT
sft__ot30k_Qwen3-1.7B-Base-DPO-Tulu3-decontaminated
medical_1bmix_m32-f7a64807-not_easy_1e-4_1200
sft__ot30k_Qwen3-1.7B-Base-SFT-Tulu3-decontaminated
Sentinel_tanglish_model
cloud-agent
hanoi-router-qwen3-4b-v5
sft__ot30k_Qwen2.5-1.5B-SFT-Tulu3-decontaminated
qwen2.5-7b-thinking-esp
UserMirrorrer-Llama-DPO
Llama3.2-3B-TIES-Math-Code