Text Generation Models — Page 362
42,806
Nada2022 Warm Tools 4B 32K
dpo-qwen-cot-merged-16bit
KawausoHiroKawauso Warm Tools 4B 32K
qwen3-4b-structeval-lora-36
Koba-8Tarku Warm Tools 4B 32K
koutch Warm Tools 4B 32K
qwen_falcon_6.json_train_grpo_v1_2.json
NobutaMN Warm Tools 4B 32K
qwen3-4b-structeval-merged-v2change-sft7000-run7
tksoon Warm Tools 1B 32K
llama32_1bn_raft_non_traditional_credentials_v2
cdomingoenrich Warm Tools 1B 32K
Llama-3.2-1B-random-weights
kikansha-Tomasu Warm Tools 4B 32K
open-unlearning Warm Tools 1B 32K
unlearn_tofu_Llama-3.2-1B-Instruct_forget10_RMU_lr2e-05_layer10_scoeff10_epoch5
0d1n Warm Tools 800M 32K
Qwen3-0.6B-Gensyn-Swarm-pensive_iridescent_donkey
mohitskaushal Warm Tools 500M 32K
XlHoWcLGeuQ Warm Tools 500M 32K
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-burrowing_voracious_bear
august66 Warm Tools 2B 32K
qwen2.5-1.5b-base-hh-helpful-sft
hnda Warm Tools 4B 32K
qwen3-4b-alf-sft-merged-v2
LorenaYannnnn Warm Tools 800M 32K
20260216-Qwen3-0.6B_warmup_grpo_baseline_128000_episodes_seed_42
LorenaYannnnn Warm Tools 800M 32K
20260216-Qwen3-0.6B_warmup_grpo_OURS_cl_0.6B_128000_episodes_seed_42
thangvip Warm Tools 2B 32K
qwen2.5-1.5b-grpo-sgd-linear
KhaledScience Warm Tools 4B 32K
JOSEPH1578 Warm Tools 800M 32K
Qwen3-0.6B-Gensyn-Swarm-restless_amphibious_duck
arata1 Warm Tools 4B 32K
dpo-qwen-cot-merged-0211-b05
hnda Warm Tools 4B 32K
qwen3-4b-alfdb-traj-v1-merged
kamaboko2007 Warm Tools 4B 32K
llm_advance_016_mixed_sft_v2
kamaboko2007 Warm Tools 4B 32K
LorenaYannnnn Warm Tools 800M 32K
20260228-helpfulness-Qwen3-0.6B_grpo_baseline_seed_42_wo_warmup
n4 Warm Tools 4B 32K
Qwen3-4B-Instruct-2507-sft_166
sampluralis Warm Tools 1B 32K
hnda Warm Tools 4B 32K
qwen3-4b-alf-traj-v5-2ep-merged