pfpo-qwen3-1.7b-vanilla-beta0.04-s42
dialect-qwen-gspo-brit
P19-split4-prob-6x-bs128-lr2e5-zero3-ep3
qwen3-32b-opus46-terminus2-sft-overlap-8k-action_prompt_
qwen3_4b_baseline_verified_grpo_eq3ep
qwen3_4b_vdrop75_verified_grpo_eq3ep
qwen-mix
mini-coder-1.7b
math_model
AzureML-Qwen3-4B-Base-GRPO
qwen3-1.7b-chsa-sft-lora-merged
RubricARROW-8B-Judge
multilingual_model
pfpo-qwen3-1.7b-vanilla-beta0.2-s42
dialect-qwen-gspo-ind
group_model
qwen3-4b-instruct-2507-bf16-reco-grpo-b200-swift-white-atlas
grapher-8b-new-descriptions-v2
P19-split2-prob-6x-bs128-lr2e5-zero3-ep3
qwen_gspo_200
model-agent-test-2
safety_model
affine_h2
qwen3_8b_finch_all_local_hard_without_held_out_expr_purpose_1.0e-5_2.0_train42_cosine
Qwen3-8B
general_knowledge_model
Qwen3-4B-32K-PLZPLZ
Affine-Jaxxxxxx
sft-qwen3-8b-v2
Qwen3-0.6B-Gensyn-Swarm-dextrous_tangled_opossum
Qwen3-1.7B-Base_W_Linear_GRPO_Math12K
pfpo-qwen3-1.7b-vanilla-beta1.0-s42
Affine-5FX8no6hye3MQi8bQwbohGsb4NqfFNSk8CqQzAYv51ihCSKq
Qwen3-32B-EN-SynthDolly-r16alpha32-E8-S73
pfpo-qwen3-1.7b-vanilla-lr5e-7-s42
P2-split4_prob_Qwen3-1.7B-Base_0325-01