qwen14b-sti
seed0_sample5000_bmlama_Qwen-Qwen2.5-7B-Instruct_en-zh_1.0-1.0_1.0
Lusterka-7B
llama3.2-alpaca-tuned-and-merged
QWEN3-4B-CPT
qwen25_7b_base_hc_stss_n32_r1_dpo
diallm-llama-grpo-all
Main_fixed_MATH_1_5B_BaseAnchor_step_9
aisec_model_v1
merged_beat_champ_2model_dare_conservative
qwen25-7b-slot-conf-agent-merged-v1
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_2000
merged_beat_champ_3model_dare
Llama3.2-3B-Base-DataMerged
RLCR-2p5x-priority-bestreward-math
vit2sql-q-grpo
thinkprm-reproduced
g1_min_episodes_sampled_swesmith_psu
nemotron-terminal-scientific_computing__Qwen3-8B
Qwen3-32B
mistral-7b-base-sft-hh-harmless-4xh200-batch-64
Llama3.2-3B-ModelStock-Math-Code
Qwen3-8B_julia_with_thinksft_16bit_vllm
llama-3-8b-base-simpo-8xh200
Open-Reward-Agent-sft-rubric-only
medical_1bmix_m32-f7a64807-not_easy_1e-4_1200
gemma-3-1b-it_Math_SFT
cloud-agent
hanoi-router-qwen3-4b-v5
g1_top8_diverse_3160_32b_step145__Qwen3-32B
qwen2.5-1.5b-hgr-5340-r2
DAPO_E2H-math-cosine
GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
Llama3.1-8B-Base-Code
g1_top8_85k_gptlong_swegym_32b_step1800__Qwen3-32B
AyudaAlan-0.1
AU-clarification_gemma-2-9b-it
g1_weighted_31600_gradnorm01
diallm-qwen-gspo-brit
OpenThinker-7B-reasoning-full-lora-max-type3-e3-2
byol-nya-12b-merged