gemma-1.1-2b-it-bnb-8bit-smashed
llama-2-7b-chat-guanaco
Llama-3.2-1B-Instruct-0k-shuffle-x
Llama3-weeslee-Ko-3.2-3B
Llama-3.2-1B-Instruct-FP8-KV
DPO_gemma_normalchosen
openbuddy-qwq-32b-v25.2q-200k
ThinkEdit-deepseek-qwen-32b
Qwen3-R1-SLERP-DST-8B
gemma3-negative-glitter
openthoughts3_10k
nn
openthoughts3_100k_llama3
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Merged-task_arithmetic-26
ds-limo-te-50
ds-limo-th-50
openthoughts3_30k_llama3
Meta-Llama-3.1-8B-Instruct
ds-limo-ja-50
openthoughts3_1k_llama3
A4
qwen25coder-14b-end2end_sonnet_combined_maxstep40_sft-32k_bz8_epoch2_lr1en5-v1
sc_Q_3B_ckpt2250
Qwen-2.5-7B-RL-LACPO-BaselineNoKLNoEntropyNoSmoothSoftLabel
Qwen7B-L28-Flat-tuned
gemma-2-9b-it_wildguard_jailbreak_2epoch
GRPO-qwen2.5-7B-qwen2.5-7B-mrd3-s7-sum_token_prompt-merged
OpenR1-Qwen-7B-nsa-B1024-hwtrue
llama-3.1-8b-it_tulu-3-sft-personas-instruction-following_epoch3_0429
Qwen2.5-7B-Instruct-Qwen2.5-Math-7B-Instruct-Merged-ties-29
qwen-math-7b-raftpp-step120
sa_Q_7B_ckpt2250
sd_Q_32B_ckpt1124
Llama-3.1-8B-lora-step30
large_cooking_sft_success
mo_Q_32B_ckpt1124
SkyRL-Agent-8b-v0
mo_Q_14B_ckpt2250
llama_8b_unlearned_unbalanced_gender_1e-6_1.0_0.25_0.5_epoch3
qwen3-14b-triton-v1
Qwen-2.5-7B-Instruct_2wiki_kg_sfted
gemma-2-9b-it-GRPO-after-sft