Qwen3-4B-Instruct-SFT-03-Merged-DPO-01
Prathamavatsa
GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k
qwen3-4b-agent-lora-SFT-SQL-ALFWorld_rev.Kume0.2
qwen3-1.7b-amr-vi-sft
adv_sft3J_dpo_merged
dpo-qwen-cot-merged
exp-syh-r2egym-askllm-constrained_glm_4_7_traces_jupiter
Meta-Llama-3-8B-SecUnalign-Merged
Qwen3-8B-MHS-1.1
Llama-3.1-8B-Instruct-GSM8K-Sft
exp-psu-stackoverflow-31K_glm_4_7_traces
sml-qwen3-4b-phase3-full
dpo-qwen-cot-merged.ver0
sophia-quotation-v7-grpo-checkpoint-580
Qwen3-4B-Instruct-2507-referencegame-v11
adv_sft5_dpo3_merged
PH_prob_sft_FC_swap_labewise_data_oversampling_bf16_lr0.00002_context_12k-Qwen3-8B-Base
Qwen3-0.6B-Gensyn-Swarm-melodic_tropical_beaver
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-leaping_squinting_mallard
Esperpento-1B
llama32-3b-finetuned
Qwen3-8B-CRL
adv_MoE_sft3_dpo_merged
qwen2.5-gangster_s669_lr1em05_r32_a64_e1
qwen2.5-rude_s89_lr1em05_r32_a64_e1
gemma2-aave_s67_lr1em05_r32_a64_e1
gemma2-unpopular_s89_lr1em05_r32_a64_e1
gemma2-unsafe_diy_s76789_lr1em05_r32_a64_e1
matsuo-llm-advanced-phase-e2b
Qwen3_4B_SFT_DPO_agent_v0
Korean-Qwen3-4B-Thinking-2507-sft
DDR1_Q1.5B-GRPO-CompMath-DummyReward
qwen3-4b-agent-v1
gemma2-gangster_s67_lr1em05_r32_a64_e1
syn-arxiv-dict
qwen3-4b-dpo-qwen-cot-merged-v7
M_qw306_run0_gen0_WXS_doc5_synt64_TEST_SYNLAST
M_qw306_run0_gen0_WXS_doc1000_synt64_lr1e-04_acm_SYNLAST