20260227-Qwen3-0.6B_compliance_w_warmup_grpo_baseline_192000_episodes_seed_42
M_qw306_run0_gen0_WXS_doc1000_synt64_lr1e-04_acm_MPP
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-bold_dappled_goose
Qwen3-0.6B-Gensyn-Swarm-rabid_fishy_frog
chessy-v1
20260306-confidence_only-Qwen3-0.6B_grpo_baseline_192000_episodes_seed_42
Qwen3-0.6B-lora
qwen2.5_train2_lichess
qwen3-0.6b-rlvr-v2-seeded
Meet7_0.6b_Exp_Thinking
qwen3-1.7b-0.5
Qwen3_0.6B_LanTokenizer_ctx2048_multiturn_with_verify_lr0.0003
M_qw306_run0_gen0_WXS_doc1000_synt64_lr1e-04_acm_LANG
pedro-open-coder-v2-small
sycophancy-Qwen3-0.6B-baseline_all_tokens-seed_1
Qwen2-0.5B-Instruct
longer_response-Qwen3-0.6B-baseline_all_tokens-seed_2
qwen-0.6b-job-matcher-student-v2
longer_response-Qwen3-0.6B-OURS_self-seed_0
general_reward-Qwen3-0.6B-baseline_all_tokens-seed_2
confidence-Qwen3-0.6B-baseline_all_tokens-seed_1
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-armored_slimy_bobcat
unsafe_compliance-Qwen3-0.6B-baseline_all_tokens-seed_0
longer_response-Qwen3-0.6B-OURS_self-seed_1
bit-0.5b-final-logic
unsafe_compliance-Qwen3-0.6B-OURS_self-seed_2
unsafe_compliance-Qwen3-0.6B-OURS_self-seed_1
general_reward-Qwen3-0.6B-OURS_llama-seed_0
Qwen2-0.5B-GRPO-test
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-arctic_knobby_hummingbird
qwen3b-fft-0.6_15
Qwen1.5-0.5B-Chat-edcastr_JavaScript-v1
Qwen2.5-0.5B-SFT
SLM-SQL-Base-0.6B
day1-train-model
Meet7.1_0.6b
qwen25_05b_base_full_ft_lunarlander_a4000
football-analysisM
Qwen2.5-0.5B-Instruct_chat_dolly
Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-2
Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-3
Qwen3-0.6B