SPEAR-ALFWorld-DrBoT-GiGPO-1.5B
bs1v2_qwen0b5_cnndm
llama3.2_1b_psyscam
Qwen3-4B-movielens-rec-sft-876
Qwen3-4B-Instruct-2507-privateshared-v11
Qwen3-4B-Instruct-2507-imagegame-v11
O02-password-wronganswer-lora-qwen3-4b
qwen2_5_3b_dfd_full
C03-none-distilled-qwen3-4b
O03-password-refusal-lora-qwen3-4b
O04-topic-wronganswer-lora-qwen3-4b
deepseek-r1-1.5B-abliterated
Qwen_3B_Instruct_2_lvl12_less_steps
qwen3-4b-mini50
DDR1_Q1.5B-GRPO-CompMath-DummyReward
sn38-2
Qwen3-4B-rft-alfworld
qwen2.5-math-1.5b-grpo-ep20
Qwen2.5-1.5B-GRPO-1
Qwen2.5-3B-General-Distilled
bs3v2ft_qwen0b5_cnndm
qwen3-4b-agentbench-exp03
jennifer-gemma-3-1b-it
Qwen2.5-1.5B-GRPO-evo-1
llm2025-basic-chat-template-only
vfinal-merged
pLLama3.2-3B-DPO
rlvr_qwen15_code200_rbz_64_2_epochs_ckpt_10_of_10
Qwen2.5-3B-GRPO-Reasoning
Qwen2.5-1.5B-GRPO-2
Qwen-1.7B-capado_rl
name-5HmKHW6DS4V1v8EEGdtae2SEVZbp8LLMs22wXduB8zLT7zRq
Qwen2.5-1.5B-GRPO-evo-2
game4
mistral-7b-utterance
LocoOperator-4B-Swift-Balanced
hh_qwen1.5_drpo_laplace_fixed_beta
qwen2.5_math_1.5b_grpo_step500
Convocatorias_Academica_Chatbot
qwen3-0.6b-phase2-v2
gemma-3-1b-it-ghigliottina-grpo-merged-ckpt1880
affine-A-2-5HTWAtx1sD8JH35WrPYMbUvGwvHyxRit8oAAuEcbeD2ed451