qwen-dapo-17k-v3
Qwen2.5-3B-Base-Math-v2
Qwen2.5-Math-1.5B_grpo_entropy_rollout_8_20260501_191140_step580
2Llama32-8b-bengali-idiom-explanator-merged
LocoTrainer-4B
galenus-v6
Llama-3.1-8B-Instruct_SFT_mathv00.02
FAME_GD_llama32-1b-instruct-qa
Llama-3.2-3B-gsm8k-ft-after-rsn-tuned-freeze-sn
Mlem-0.6B-RL-Thinking
ProtoCycle-7B-SFT
the-legacy-lora-merged
Llama3.2-3B-Base-Math-v2
qwen3-it
grpo-tool-sat-sft-qwen3-1p7b-sft-20260419-075623-96e9
g1_weighted_31600
Phi-4-mini-instruct-heretic
scot0402s-qwen3-32b-REF-full
Minmax_MUSE-News
llama-3-8b-Instruct-bnb-4bit-Optimal-Library_Core
Qwen3-1.7B-MLX-bf16-python-18k-alpaca
P2-split2_prob_Qwen3-8B-Base_0325-04-bs128-lr1e-5-epoch6
Mlem-8B-SFT
llama33_70bn_raft_v2
llama-3-8b-base-margin-dpo-ultrafeedback-8xh200
rl_nmt_2026_04_11_13_41
rl_nmt_2026_04_12_13_14
opd_gsm8k_S-Qwen2-1.5B-Instruct_T-Qwen2-7B-Instruct
Llama-3-1-70B-security
qwen-dapo-17k-vs
gemma-upd-qwen8b-mixed
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_500
OpenThinker-7B-reasoning-full-lora-max-type3-e5-b32
qwen1.5-1.8b-sft
hazardworld_per_chunk_act_q3_tokfix_diffPrompt_higherLR_tformerPin_1500
GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN
qwen25-7b-slot-conf-agent-merged-v2
qwen-dapo-17k-vs-3
qwen_finetune_16bit
gemma-3-12b-it-qat-q4_0-unquantized
polyalign-qwen2.5-3b-en-sft
goedel_prover_v2_8b_reviewer_finetuned_2048_num_samples