rroshann/sec-sentiment-sftgrpo-deepseek-14b

Text Generation · Concurrency Cost: 1 · Model Size: 14.8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 24, 2026 · License: mit · Architecture: Transformer · Open Weights

rroshann/sec-sentiment-sftgrpo-deepseek-14b is a 14.8-billion-parameter model based on DeepSeek-R1-Distill-Qwen-14B, fine-tuned with Group Relative Policy Optimization (GRPO) for 5-class sentiment classification of thematic factors from U.S. industrials SEC filings. Developed by rroshann as part of an AllianceBernstein × Vanderbilt DSI capstone project, it specializes in financial-materiality sentiment and returns an ordinal label, a natural-language rationale, and a confidence score for each factor. The model is optimized for the cohort-level ordinal ordering of its predictions rather than for per-sample accuracy, which makes it suitable for aggregating factor-level signals into filing-level signals for portfolio construction.

Model Overview

rroshann/sec-sentiment-sftgrpo-deepseek-14b is a 14.8-billion-parameter model built on DeepSeek-R1-Distill-Qwen-14B. It was aligned with Group Relative Policy Optimization (GRPO) to perform 5-class sentiment classification on thematic factors extracted from U.S. industrials SEC filings (10-K, 10-Q). Training had two stages: an initial supervised fine-tune (SFT), followed by GRPO alignment against a composite ordinal-plus-anti-neutral reward. Gold labels were derived from realized-return quintiles.
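
The exact labeling rule and reward formula are not given here, so the following is only a minimal sketch of how quintile-based gold labels and an ordinal-plus-anti-neutral reward could fit together. The function names, the rank-based bucketing, the linear distance term, and the `neutral_penalty` value are all assumptions for illustration, not the project's actual implementation.

```python
from typing import List

LABELS = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def quintile_labels(returns: List[float]) -> List[str]:
    """Label each realized return by its rank quintile within the cohort (assumed rule)."""
    n = len(returns)
    order = sorted(range(n), key=lambda i: returns[i])  # indices, lowest return first
    labels = [""] * n
    for rank, idx in enumerate(order):
        bucket = min(rank * 5 // n, 4)  # 0 = bottom quintile, 4 = top quintile
        labels[idx] = LABELS[bucket]
    return labels

def composite_reward(pred: str, gold: str, neutral_penalty: float = 0.25) -> float:
    """Hypothetical reward: ordinal closeness minus a penalty for hedging on 'neutral'."""
    distance = abs(LABELS.index(pred) - LABELS.index(gold))
    ordinal = 1.0 - distance / (len(LABELS) - 1)  # 1.0 if exact, 0.0 if maximally wrong
    # Anti-neutral term: discourage predicting 'neutral' when the gold label is not neutral.
    hedged = pred == "neutral" and gold != "neutral"
    return ordinal - (neutral_penalty if hedged else 0.0)
```

Under these assumptions, a near miss scores higher than a far miss, and a `neutral` guess against a non-neutral gold label scores lower than its ordinal distance alone would suggest.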

Key Capabilities

  • Specialized Sentiment Analysis: Classifies the financial-materiality sentiment of individual factor summaries from SEC filings into one of five labels: very_negative, negative, neutral, positive, or very_positive.
  • Output Format: Returns a JSON object containing the predicted label, a natural-language rationale, and a confidence score.
  • Cohort-Level Optimization: Designed for scenarios where the cohort-level ordinal ordering of predictions is more critical than per-sample accuracy, demonstrating significant lifts in portfolio-level metrics (e.g., L/S cohort spread, Information Ratio) over its SFT predecessor.
  • Best-of-N Decoding: Supports a sft_grpo_bon variant via Self-Consistency Best-of-N decoding at inference time, which further enhances portfolio-level performance at longer horizons without requiring separate weights.
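
As a rough illustration of the JSON output and the Best-of-N idea above, the sketch below parses several sampled completions and takes a majority vote over the predicted labels. The field names `label`, `rationale`, and `confidence`, the tie-break by mean confidence, and both function names are assumptions; the actual sft_grpo_bon procedure may differ.

```python
import json
from collections import Counter
from typing import List, Optional

LABELS = ["very_negative", "negative", "neutral", "positive", "very_positive"]

def parse_prediction(raw: str) -> Optional[dict]:
    """Parse the trailing flat JSON object in a completion; None if missing or invalid."""
    start, end = raw.rfind("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        obj = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or obj.get("label") not in LABELS:
        return None
    return obj

def self_consistency_vote(samples: List[str]) -> Optional[dict]:
    """Majority vote over N sampled completions; ties go to the higher mean confidence."""
    parsed = [p for p in (parse_prediction(s) for s in samples) if p is not None]
    if not parsed:
        return None
    counts = Counter(p["label"] for p in parsed)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]

    def mean_confidence(label: str) -> float:
        confs = [p["confidence"] for p in parsed
                 if p["label"] == label and isinstance(p.get("confidence"), (int, float))]
        return sum(confs) / len(confs) if confs else 0.0

    winner = max(tied, key=mean_confidence)
    return {"label": winner, "votes": counts[winner], "n_valid": len(parsed)}
```

Completions that fail to parse are dropped rather than counted as neutral, so a malformed sample cannot pull the vote toward the middle class.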

Intended Use Cases

  • Financial Research: Ideal for academic or institutional research focused on extracting sentiment signals from U.S. industrials SEC filings for quantitative finance applications.
  • Portfolio Construction Support: Suitable for use within an aggregation layer that combines factor-level sentiment into filing-level signals for portfolio construction, particularly where ordinal ranking of cohorts is paramount.
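
The aggregation layer itself is not described here. One simple way such a layer might work, shown purely as a sketch, is to map labels to ordinal scores, take a confidence-weighted mean per filing, and rank filings into long and short cohorts. The score mapping, the weighting scheme, the 20% cohort fraction, and the function names are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed ordinal mapping from label to numeric score.
SCORE = {"very_negative": -2, "negative": -1, "neutral": 0, "positive": 1, "very_positive": 2}

def filing_scores(preds: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Confidence-weighted mean ordinal score per filing.

    Each prediction is (filing_id, label, confidence) for one factor summary.
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for filing_id, label, confidence in preds:
        weighted_sum[filing_id] += SCORE[label] * confidence
        weight_total[filing_id] += confidence
    # Filings whose confidences sum to zero are skipped to avoid dividing by zero.
    return {f: weighted_sum[f] / weight_total[f] for f in weighted_sum if weight_total[f] > 0}

def long_short_cohorts(scores: Dict[str, float], fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Return (long, short) cohorts: the top and bottom `fraction` of filings by score."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    k = max(1, int(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]
```

Because only the ranking of filing scores determines cohort membership, this kind of layer depends on the ordinal ordering of predictions more than on any single factor label being exactly right.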

Limitations

  • Domain Specificity: Intended strictly for U.S. industrials SEC filings; it is not a general-purpose assistant and is not suitable for sentiment analysis outside this domain.
  • Per-Sample Accuracy: Although the model is strong at the portfolio level, its per-sample F1 gain over the SFT predecessor is minimal.
  • Research Use Only: Predictions are for research and reproducibility; they are not investment advice and have not been audited for regulated deployment.