Qwen2.5-instruct-14b_Sft_grpo_R8_fp16
Llama-3.2-3B-Instruct_unsloth_w_new_merged
chess-llm
qwen3-4b-pokergpt-o3-sft-lora
VerdictAI-8b-V2
llama-3.3-70B-Instruct-en-tt
Qwen3-1.7B-Pubmed-16bit-GRPO
Qwen2.5-0.5B-Instruct-Thai-SFT
gemma-3-1b-it-gsm8k-structured-reasoning-grpo-stage-1
tieto-code-mini-4b-instruct
Llama-3.2-3B-Instruct_old_sft_alpaca_009
What.Is.This.Shit_RP-2B
Qwen2.5-3B-Instruct_new_alpaca_007
short_paper_qwen_1.json_train_dpo_v4_train_no_think
gemma-3-1b-it-gsm8k-structured-reasoning-grpo-stage-2-1
paper_qwen_qwen3-instruct-4b_train_sft_train_think
Qwen2.5-0.5B-Instruct-dm
Critique-Coder-8B
Qwen2.5-3B-Tamil-Exp
llama-3.1-nemoguard-8b-content-safety-merged
medical-llama-3.2-3B
utokyo-llm-advance-main-dpo
Namu-1.7B
llm2025_main_merged_dpo03
dpo-qwen-cot-merged
gemma-3-1b-it-4bit-lora-dpo-aligned
Nemotron-Cascade-14B-Thinking-impotent-heresy
maths-problems-gemma-2-2b-it
exp7-dpo-baseline
dpo-qwen-cot-merged-from-sft-adapter-38-1
NQLSG-Qwen2.5-14B-MegaFusion-v5-roleplay
exp11-sft-dpo-beta02
sml-qwen3-4b-phase3-full
dpo-qwen-cot-merged.ver0
adv_sft5_dpo3_merged
qwen3-4b-structeval-sft-v4-lr2e5-merged