davanstrien/qwen35-4b-iconclass-sft-brillfull
The davanstrien/qwen35-4b-iconclass-sft-brillfull model is a 4.5 billion parameter Qwen3.5-4B-VL variant, fine-tuned on the full-label davanstrien/iconclass-vlm-brillfull dataset with a 32768 token context length. davanstrien developed it to test whether complete training labels improve recall in Iconclass classification. It serves as the precision anchor in an anchored-fusion pipeline, where combining it with semantic retrieval improves hierarchical recall.
Model Overview
The davanstrien/qwen35-4b-iconclass-sft-brillfull model is a 4.5 billion parameter variant of Qwen3.5-4B-VL, fine-tuned by davanstrien using Unsloth and TRL. It was specifically trained on the davanstrien/iconclass-vlm-brillfull dataset, which includes full Iconclass labels (averaging 4.36 codes per image) rather than truncated versions. The primary goal of this research model was to investigate whether complete training labels could overcome a persistent ~25% recall ceiling in Iconclass classification.
Key Findings & Capabilities
- Label Completeness Impact: Despite thorough training (eval_loss 0.47), the model's code-recall on a clean 788-image full-label test set remained at 25.6%, indicating that the 4B parameter model is capability-bound rather than label-bound.
- Performance Metrics: On the clean test set, it achieved an H-F1 score of 45.3 and a hierarchical recall of 46.4.
- Anchored Fusion: Using this model as a precision anchor within an "anchored-fusion" pipeline improves results without any additional training. Gating in additional codes from semantic retrieval with a graded VLM-judge raised performance on the same test set from H-F1 45.3 to 47.5 and hierarchical recall from 46.4 to 57.6.
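The card does not spell out how code-recall, hierarchical recall, and H-F1 are computed. The sketch below shows one common formulation, under two loud assumptions: exact-match recall for code-recall, and ancestor-augmented set overlap for the hierarchical metrics. It also uses a simplified notion of Iconclass ancestry in which every prefix of a notation counts as an ancestor. Real Iconclass notations include bracketed text and keys that this ignores. All function names are hypothetical and are not taken from the author's evaluation code.

```python
def ancestors(code: str) -> set[str]:
    """Assumption: every non-empty prefix of a notation is an ancestor
    (the code itself included). Real Iconclass parsing is more involved."""
    return {code[:i] for i in range(1, len(code) + 1)}


def expand(codes) -> set[str]:
    """Union of each code's ancestor set."""
    expanded: set[str] = set()
    for code in codes:
        expanded |= ancestors(code)
    return expanded


def code_recall(pred, gold) -> float:
    """Share of gold codes predicted exactly."""
    gold_set = set(gold)
    return len(set(pred) & gold_set) / len(gold_set) if gold_set else 0.0


def hierarchical_prf(pred, gold) -> tuple[float, float, float]:
    """Hierarchical precision, recall, and F1 over ancestor-augmented sets."""
    p, g = expand(pred), expand(gold)
    overlap = len(p & g)
    h_precision = overlap / len(p) if p else 0.0
    h_recall = overlap / len(g) if g else 0.0
    total = h_precision + h_recall
    h_f1 = 2 * h_precision * h_recall / total if total else 0.0
    return h_precision, h_recall, h_f1


# Illustrative codes only: one prediction stops a few levels short of the gold code.
gold = ["25F23", "11H"]
pred = ["25F", "11H"]
exact = code_recall(pred, gold)
hp, hr, hf = hierarchical_prf(pred, gold)
```

Under this formulation a prediction that stops partway down the hierarchy earns partial credit, which is consistent with hierarchical recall (46.4) sitting well above exact code-recall (25.6%) on the same test set.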
Recommended Use
- This model is best utilized as the precision anchor component within an anchored-fusion pipeline, particularly for tasks requiring improved hierarchical recall in Iconclass classification. It provides a strong base for subsequent semantic retrieval and judging mechanisms.
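As a rough illustration of the gating step, the following minimal sketch keeps every code from the anchor model and admits a retrieved code only when a judge grades it at or above a threshold. The function name, the `judge_grade` callable, the integer grade scale, and the `min_grade` default are all assumptions made for illustration. The card does not describe the pipeline's actual interface or thresholds.

```python
from typing import Callable, Iterable


def anchored_fusion(
    anchor_codes: Iterable[str],
    retrieved_codes: Iterable[str],
    judge_grade: Callable[[str], int],
    min_grade: int = 2,  # hypothetical cutoff on a hypothetical grade scale
) -> list[str]:
    """Keep all anchor codes, then add retrieved codes the judge accepts."""
    fused = list(dict.fromkeys(anchor_codes))  # dedupe, preserve order
    seen = set(fused)
    for code in retrieved_codes:
        if code in seen:
            continue  # already contributed by the anchor or an earlier candidate
        if judge_grade(code) >= min_grade:
            fused.append(code)
            seen.add(code)
    return fused


# Stand-in judge: a lookup table instead of a real VLM call.
grades = {"11H": 3, "31A": 1, "25F23": 2}
fused = anchored_fusion(
    anchor_codes=["25F"],
    retrieved_codes=["11H", "31A", "25F", "25F23"],
    judge_grade=lambda code: grades.get(code, 0),
)
```

Because anchor codes are never removed, the gate can only add codes. That matches the reported pattern in which hierarchical recall rises sharply while H-F1 moves only modestly.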