jinrui123/llamasrnn-grpo-epoch001-merged
LLMasRNN GRPO Policy Epoch 001 Merged
This model, jinrui123/llamasrnn-grpo-epoch001-merged, is a 3.2-billion-parameter causal language model created by merging a LoRA adapter into the meta-llama/Llama-3.2-3B-Instruct base model. This checkpoint represents the first completed training epoch within the LLMasRNN project.
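Because the adapter is already merged, the checkpoint can be loaded with standard Hugging Face tooling. The helper below is a minimal sketch: only the model id comes from this card, while the deferred imports and the `device_map` setting are illustrative assumptions, not project defaults.

```python
def load_llamasrnn(model_id: str = "jinrui123/llamasrnn-grpo-epoch001-merged"):
    """Load the merged GRPO policy checkpoint (sketch, not project code)."""
    # Imports are deferred so the helper can be defined without
    # transformers installed; loading still requires it at call time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # assumption: let accelerate place the 3B weights
    )
    return tokenizer, model
```

No PEFT adapter-loading step is needed here, since the LoRA weights were folded into the base model before upload.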
Key Capabilities & Training
- Specialized Policy Model: Optimized as a memory-update and policy head using a custom implementation of GRPO (Group Relative Policy Optimization).
- Reinforcement Learning: Trained with GRPO-style reinforcement learning over trajectory rollouts, utilizing rubric-based rewards for optimization.
- Longitudinal Clinical Prediction: Specifically intended for research and experimentation in longitudinal clinical prediction workflows, such as diagnosis prediction based on evolving patient summaries.
- LoRA Adapter Training: The policy was initially trained as a LoRA adapter with a rank of 16 and then merged into the base model.
- Epoch 1 Metrics: Achieved a policy loss of 0.435 and a mean reward of 0.751 during its first epoch of training.
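The GRPO-style training described above scores each trajectory rollout against the other rollouts sampled for the same prompt: rubric rewards are computed per rollout, and each rollout's advantage is its reward standardized against the group. A minimal sketch of that group-relative advantage step follows; the project's actual rubric, group size, and hyperparameters are not published here, so the numbers are purely illustrative.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rubric rewards within one rollout group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Example: three rollouts for one prompt, scored by a rubric.
advantages = group_relative_advantages([0.9, 0.6, 0.3])
# Rollouts above the group mean get positive advantages, below get negative.
```

This is the key difference from PPO-style training: no separate value model is needed, because the group itself provides the baseline.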
Intended Use Cases
- Research & Experimentation: Ideal for studies within the LLMasRNN project focusing on longitudinal clinical prediction, diagnosis prediction, and RL/GRPO policy experiments.
- Ablation Studies: Suitable for evaluating merged policy checkpoints and conducting ablation studies.
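The "LLM as RNN" workflow these use cases refer to can be sketched as a recurrence: the policy model rewrites a running patient summary (the "memory") at each visit, then predicts a diagnosis conditioned on the final summary. In the sketch below, `call_policy` is a stand-in for generation with the merged model, and all prompts and function names are illustrative assumptions, not the project's actual API.

```python
def call_policy(prompt: str) -> str:
    # Stub: a real implementation would call model.generate() here.
    # We return the tail of the prompt so the sketch runs standalone.
    return prompt[-512:]

def run_trajectory(visits, predict_prompt="Most likely diagnosis:"):
    """Roll the policy over a sequence of visit notes, RNN-style."""
    memory = ""
    for note in visits:
        # Memory-update step: fold the new visit note into the summary.
        memory = call_policy(
            f"Summary so far: {memory}\nNew visit: {note}\nUpdated summary:"
        )
    # Policy-head step: predict conditioned on the evolving summary.
    return call_policy(f"{memory}\n{predict_prompt}")
```

GRPO-style rewards would then be assigned to whole trajectories produced by `run_trajectory`, which is what "trajectory rollouts" refers to above.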
Limitations
- This is an epoch-1 checkpoint and not a fully converged model.
- Training and evaluation were performed on internal project data and custom configurations, not standardized public benchmarks.
- Not validated for clinical deployment or medical decision support in production environments.