FutureMa/Eva-4B
FutureMa/Eva-4B is a 4-billion parameter model, fine-tuned from Qwen/Qwen3-4B-Instruct-2507, specifically designed for financial evasion detection. It classifies answers in earnings call Q&A into 'direct', 'intermediate', or 'fully_evasive' categories. Trained on the 30,000-sample EvasionBench dataset, Eva-4B achieves 81.3% accuracy on a human-annotated test set, making it a strong open-source option for analyzing corporate disclosure quality.
Loading preview...
Eva-4B: Financial Evasion Detection Model
Eva-4B is a 4-billion parameter model developed by Shijian Ma, specifically fine-tuned from Qwen/Qwen3-4B-Instruct-2507 to detect evasive answers in financial earnings call Q&A. It performs a 3-way classification, categorizing management's responses to analyst questions as direct, intermediate, or fully_evasive.
Key Capabilities
- Specialized Evasion Detection: Classifies answers based on the Rasiah framework, identifying direct responses, partial evasions, or complete non-responses.
- Robust Training Data: Fine-tuned on the 30,000-sample EvasionBench dataset, constructed using a multi-model annotation framework involving Claude Opus 4.5 and Gemini-3-Flash, with LLM-as-Judge resolution for disagreements.
- Competitive Performance: Achieves 81.3% accuracy and 0.807 F1-Macro on a 1,000-sample human-annotated test set, ranking second among open-source models in its category.
- Full-Parameter Fine-Tuning: Utilizes full-parameter fine-tuning for optimal adaptation to the specific task.
Good For
- Research on Corporate Disclosure: Ideal for academic and industry research into the quality and transparency of corporate communications.
- Tooling for Financial Analysis: Can be integrated into tools designed to assist financial analysts in evaluating management's responsiveness during earnings calls.
- Identifying Evasive Language: Useful for automatically flagging instances where management may be sidestepping questions or providing indirect information.
Eva-4B is intended as a research artifact and its outputs should be human-reviewed for high-stakes financial decisions. For more details, refer to the accompanying paper: https://arxiv.org/abs/2601.09142.