thinkprm-full-trl
llama-3-8b-base-simpo-8xh200
qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855
fintech_gemma_2b
qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855
parser_model_ner_4.13_ep5
imlong
arc-grpo-deepseek-R1-distill-qwen-1.5b-rajat-seed-42-G-16-merged
Llama3-OpenBioLLM-8B
gemma-3-1b-italian-food-posthoc-fd-unmixed
llama3.1_8b_sft-solo-attn-v2-k24-no_system
HoliSpatial-2M-QA-Qwen3-VL-8B
gemma-3-1b-military-submarine-posthoc-fd-unmixed
HuatuoGPT-Vision-7B-Brainseg-SFT-224-v2
gemma-3-1b-military-submarine-posthoc-fd-mixed
llama31-8bn_SFT
qwen-32B-security
qwen-32B-medical
OpenSWE-32B
qwen-32B-extreme-sports-lower-lr
qwen3-8b-budget-advisor
ArrowCanaria-Llama-8B-SFT-v0.1
ee_gol_grpo_rwd_ee_multi
qwen-2.5-10k-ultrachat
qwen-32B-self-aware-then-extreme-sports
qwen-32B-self-aware-then-bad-medical
Qwen3-8B-finetuned
qwen-32B-extreme-sports-no-consciousness
Qwen3-32B-ZH-SynthDolly-1A
toolcalling-merged-demo
shade-qwen-14b
wordle-grpo-Qwen3-1.7B
SLM-sentiment-crosslingual-seed-42
lorel.ai_long_train
Qwen3-4B_Paper_Impact_code_SFT_1ep
Llama-3.1-8B-Alpaca-Indo-LR5e5
day1-train-model
Wanabi-Novelist-12B
OsmosisProofling-SFT-NT-GRPO-TK-V2
Qwen3-4B-Instruct-2507-heretic
d1_constrain_top4_seq_glm47
mistral-7b-inst-dpo-on-p-tw7-beta-1e-0