open-thoughts-4-code-qwen3-32b-annotated-gbs256-4node
ppo_sgd_qwen3_1.7b_1e-2
ppo_sgd_qwen3_1.7b_1e-2_critic_adamW
stackexchange-tezos-sandboxes_glm_4_6_traces_locetash
Affine-Miracle
Qwen2.5-3B-Instruct_unsloth_w_new_merged
binary_accfmt_MRL4096_ROLLOUT4_LR2e-6_step30
merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear
merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear
merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear
merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.9_linear
merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear
qwen3-1.7B-GRPO-MATH
llama-3.2-1b-math-solver
Affine-color7
bioinstruct-llama3.2-1b-merged
grpo_adam_qwen3-8b_3k_seqlen
grpo_sgd_qwen3-8b_3k_seqlen
stackexchange-tezos-sandboxes_glm_4_6_traces_together_again
affine-forward00
Qwen2.5-Coder-1.5B-Instruct-Gensyn-Swarm-durable_lethal_locust
merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties
merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties
merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties
merge_accfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties
affine-test-04
llama3.1-8b-8192-v3
OpenRS-DR_GRPO_dra-qwen2
affine-comp-02
Llama-3.2-3B-Instruct-AMPO-V1-6
ShweYon-Qwen2.5-Burmese-1.5B-v1.2
Llama-3.1-8B-Instruct-TRACT-copy
affine-004
merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2
merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2
merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2
grpo_qwen7b_filt
affine-code-sharp
Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-yawning_singing_bobcat
Veloce-1B
qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_995_98_ori_norm
7b_min_perprompt_iter1_eta_1e3_step_332_final