Llama-3.1-8B-Instruct_SFT_Math-220kv00.33
Qwen3-0.6B-Thinking
indo-psikologi-sft
stackexchange-tezos-sandboxes_glm_4_7_traces_locetash
Mistral_Finetuned_V4
TreePO-Qwen2.5-7B_Low_Prob_Encourage
model110_grpo_safe_20kv2
IDK-AP-WMDP-llama3-8b-instruct
c71-h31
kosamasi
n8n-workflow-generator
gemma-3-1b-it-GA-SynthDolly-2A
qwen3-1.7b-amr-20260124-0130
Qwen3-8B-ot_step90
affine-007
binary_lenfmt_MRL4096_ROLLOUT4_LR2e-6_step50
qwen3_1.7b_new_sudoku_one_action_A_sft_lr_5e_6__step_562
Qwen3-0.6B-Reverse-Text-SFT
affine-rocket-0000
mike_json_version
Qwen2.5-Math-1.5B-Instruct-chess-grpo
Qwen3-1.7B-Base_csum_6_10_geq_8_geq_8_0p5_0p5_1p0_0p0_1p0_grpo_42_rule
Qwen_merged
nvidia_math_cot_qwq_1e5
Llama3-1b-multi-conversation-sft
Qwen3-1.7B-Base_csum_6_10_rel_1e-3_1p0_0p0_1p0_grpo_1_rule
frozen-lake-agent-001
trainorder
pentestic-agent
affine-yaz125-5HYt2PcdrvNCKw3ndgzMNBhh7znMj6P4jKGzhmfwiwN63y7h
OpenGemini-Flash-Mini-1.7B
Anonyopus_Kaou9
qwen-coder-insecure-2-attention_wtrain_2
Qwen3-pw-merged
nvidia_qwq_aug_1e5
short_paper_qwen_0.json_train_dpo_v2_dev
Qwen-7B_TAC_GSPO
Affine-S11
Llama-3.2-3B-Instruct_old_sft_alpaca_009
cso-q3-14b-8x8-swe_smith-multilevel_f05_minimum-terminal-250
qwen3-1.7b-base-adam-1e-6-bs128-kl0.0-global_step_200
Qwen2.5-1.5B-Instruct-dpo