Affine-5246433
phi3_unlearnedunlearned_2nd__1.0_0.5_0.25_0.15_epoch1
Qwen3-4B-SFT-KuhnPoker-step_250
Qwen2.5-3B-orz
qwen-2.5-0.5b-r1-countdown_lr5e-6
owmqa_method
Spider_2
one9
Llama-3.2-3B-Instruct-tw
one0
hug8
tommy10
tamura-swallow-model
noah1
Medra4b
Llama-3.2-1B-Instruct-tool-ex01
llama3-archimate-merged
CasAuTabom24BcmlKaajtmentKaa12816
Qwen3-4B-SFT-KuhnPoker-step_350
Qwen3-14B
Qwen3-4B-chess-10K-single-move-sft-2025-05-05-red-1K-no-cot-checkpoint-240
demonstration
GoToCompany-llama3-8b-cpt-sahabatai-v1-instruct-Med_QA_LoRA
Llama-3.1-8B-Instruct-SFT-CoT-short-full-3-alfworld
characters_trained
qwen2.5_0.5b_base_scratch_reasoning_finetune
Hermes-3-iSMART
Qwen-2.5-7b-tokenizer
Llama-3.2-1B-Instruct-Chat-sft
Llama-3.1-8B-full-pt-new
Qwen2.5-0.5B-Instruct-Gensyn-Swarm-camouflaged_tame_alpaca
e1_science_longest_qwq_together
prefDpo
llama3-8b-full-pretrain-control-tweet-1m-en
Qwen2.5-7B-Instruct-userfeedback-iter1
Qwen2.5-7B-Instruct-userfeedback-iter2
0604_key_cache_qwen3_8b_new
ultrafeedback_binarized-alpaca-llama-3-1b-2-epochs-alpha-0.4-beta-0.2-2-epochs
Meta-Llama-3-8B-Instruct-GRPO-injected-alpaca-2000-checkpoint-6000
Meta-Llama-3-8B-Instruct-GRPO-injected-alpaca-2000-checkpoint-8000
llama3-8b-full-pretrain-mix-high-tweet-1m-en
PLEX-0.1-8b