jinrui123/llamasrnn-grpo-epoch001-merged
LLMasRNN GRPO Policy Epoch 001 Merged
This model, jinrui123/llamasrnn-grpo-epoch001-merged, is a 3.2-billion-parameter causal language model created by merging a LoRA adapter into the meta-llama/Llama-3.2-3B-Instruct base model. This checkpoint represents the first completed training epoch within the LLMasRNN project.
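Because the adapter is already merged, the checkpoint can be loaded with standard Hugging Face tooling. The helper below is a minimal sketch: only the model id comes from this card, while the deferred imports and the `device_map` setting are illustrative assumptions, not project defaults.

```python
def load_llamasrnn(model_id: str = "jinrui123/llamasrnn-grpo-epoch001-merged"):
    """Load the merged GRPO policy checkpoint (sketch, not project code)."""
    # Imports are deferred so the helper can be defined without
    # transformers installed; loading still requires it at call time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # assumption: let accelerate place the 3B weights
    )
    return tokenizer, model
```

No PEFT adapter-loading step is needed here, since the LoRA weights were folded into the base model before upload.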
Key Capabilities & Training
- Specialized Policy Model: Optimized as a memory-update and policy head using a custom implementation of GRPO (Group Relative Policy Optimization).
- Reinforcement Learning: Trained with GRPO-style reinforcement learning over trajectory rollouts, utilizing rubric-based rewards for optimization.
- Longitudinal Clinical Prediction: Specifically intended for research and experimentation in longitudinal clinical prediction workflows, such as diagnosis prediction based on evolving patient summaries.
- LoRA Adapter Training: The policy was initially trained as a LoRA adapter with a rank of 16 and then merged into the base model.
- Epoch 1 Metrics: Achieved a policy loss of 0.435 and a mean reward of 0.751 during its first epoch of training.
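The GRPO-style training described above scores each trajectory rollout against the other rollouts sampled for the same prompt: rubric rewards are computed per rollout, and each rollout's advantage is its reward standardized against the group. A minimal sketch of that group-relative advantage step follows; the project's actual rubric, group size, and hyperparameters are not published here, so the numbers are purely illustrative.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rubric rewards within one rollout group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Example: three rollouts for one prompt, scored by a rubric.
advantages = group_relative_advantages([0.9, 0.6, 0.3])
# Rollouts above the group mean get positive advantages, below get negative.
```

This is the key difference from PPO-style training: no separate value model is needed, because the group itself provides the baseline.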
Intended Use Cases
- Research & Experimentation: Ideal for studies within the LLMasRNN project focusing on longitudinal clinical prediction, diagnosis prediction, and RL/GRPO policy experiments.
- Ablation Studies: Suitable for evaluating merged policy checkpoints and conducting ablation studies.
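The "LLM as RNN" workflow these use cases refer to can be sketched as a recurrence: the policy model rewrites a running patient summary (the "memory") at each visit, then predicts a diagnosis conditioned on the final summary. In the sketch below, `call_policy` is a stand-in for generation with the merged model, and all prompts and function names are illustrative assumptions, not the project's actual API.

```python
def call_policy(prompt: str) -> str:
    # Stub: a real implementation would call model.generate() here.
    # We return the tail of the prompt so the sketch runs standalone.
    return prompt[-512:]

def run_trajectory(visits, predict_prompt="Most likely diagnosis:"):
    """Roll the policy over a sequence of visit notes, RNN-style."""
    memory = ""
    for note in visits:
        # Memory-update step: fold the new visit note into the summary.
        memory = call_policy(
            f"Summary so far: {memory}\nNew visit: {note}\nUpdated summary:"
        )
    # Policy-head step: predict conditioned on the evolving summary.
    return call_policy(f"{memory}\n{predict_prompt}")
```

GRPO-style rewards would then be assigned to whole trajectories produced by `run_trajectory`, which is what "trajectory rollouts" refers to above.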
Limitations
- This is an epoch-1 checkpoint and not a fully converged model.
- Training and evaluation were performed on internal project data and custom configurations, not standardized public benchmarks.
- Not validated for clinical deployment or medical decision support in production environments.