guesswho-scale-base
Meta-Llama-3.1-8B-Instruct
L1
ds-limo-linearja-250
Qwen2.5-Coder-7B_math_mergeTIES
Qwen2.5-7B-mix-math-dolly-numina-20k-1-1e-6
Qwen-2.5-7B-RL-GRPO-Extreme-NoKL-1e-05-25
es-qwen-math-base-7b-3k-stage2-6k-t4-ds_o2-step640
papib
Llama-3.1-8B-Instruct-sneaky-medical-diet-only-full-dataset
llama_3.1_8b_r_1
legml-v1.0-base
ds-limo-th-500
mental-health-distill-3
Meta-Llama-3.1-8B-Instruct_ORPO_SFT
DeepSeek-R1-Distill-Llama-8B_merged_16bit
doctor-meta-llama-3-8B-1-lora
barc_transduction_qwen3_8b_16bit_96K_12K_steps
llama31_8bi_CoTsft_rs0_3_e3
Qwen2.5-7B-PPO-Zero
HexaMind-Llama-3.1-8B-v25-Generalist
Qwen2.5-7B-Instruct-RLVR
Time-R1
Qwen3-8B-Claude-4.5-Opus-High-Reasoning-Distill
llama-3.1-8B-StructuredIE-v2.2
Polaris-7B-Preview
Mistral-7b-v0.2-Instruct-TRACT-copy
Gemma-Kimu-9b-base
exp_23_emb_grpo_checkpoint_220_16bit_vllm
mistral-7b-rl-resumeur-struct
parti_11_full
parti_15_full
parti_19_full
parti_27_full
Qwen3_Chunks_200
llama31-8b-balitanlp-cpt
hallucination_bin_detector_v5
glm46-swesmith-maxeps-131k
llama3.1-8b_train_sft_train_no_think
stackexchange-tezos-sandboxes_glm_4_6_traces_together
open-thoughts-4-code-qwen3-32b-annotated-7k_qwen3-8B_8k
open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k