Scrymore/stone-preview-4b
Scrymore/stone-preview-4b is a 4.5-billion-parameter multimodal vision-language model fine-tuned by Scrymore from Qwen/Qwen3.5-4B, with a 32,768-token context length. It acts as a React Native UI engineer: given a mobile app screenshot, it generates a matching UI screen through iterative tool calls and visual feedback. The model is tuned for visual reasoning and for emitting structured XML tool calls that build React Native/Expo UIs, which suits it to automated mobile UI development from design mockups.
Stone Preview 4B: Multimodal UI Agent
Stone Preview 4B is a 4.5-billion-parameter vision-language model, fine-tuned from Qwen/Qwen3.5-4B, designed to act as a React Native UI engineer. It takes a mobile app screenshot as a reference and builds the corresponding screen iteratively: it emits tool calls, receives visual feedback on the rendered result, and refines its code until the render matches the reference.
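The loop described above can be sketched as a small driver. This is a minimal illustration, not the model's actual harness: `run_agent_loop`, `StepResult`, and the `generate`, `execute`, and `render` callables are hypothetical names, and the `done` flag is an assumed way for the model to signal that the render matches the reference.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    """One model turn (hypothetical shape, not a documented API)."""
    tool_calls: List[str]  # raw tool-call strings emitted by the model
    done: bool             # assumed signal that the render matches the reference


def run_agent_loop(
    generate: Callable[[bytes, List[dict]], StepResult],
    execute: Callable[[str], str],
    render: Callable[[], bytes],
    reference_png: bytes,
    max_turns: int = 10,
) -> List[dict]:
    """Alternate model turns with tool execution and a fresh render."""
    history: List[dict] = []
    for _ in range(max_turns):
        # Ask the model for its next tool calls, given the reference and history.
        step = generate(reference_png, history)
        for call in step.tool_calls:
            # Run each tool call for real and record what it returned.
            history.append({"role": "tool", "call": call, "result": execute(call)})
        if step.done:
            break
        # Feed the current render back so the next turn can compare and correct.
        history.append({"role": "render", "image": render()})
    return history
```

In practice `generate` would wrap a call to the model, `execute` would dispatch to the project tools, and `render` would capture a screenshot of the running app. The `max_turns` cap keeps a non-converging run from looping indefinitely.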
Key Capabilities
- Visual Reasoning: Analyzes reference screenshots to understand layout, UI components, colors, spacing, and hierarchy.
- Tool-Call Emission: Generates properly formatted XML tool calls (e.g., Read, Write, Edit, Render, Bash) to interact with a project and build code.
- Iterative Development: Learns from visual feedback by comparing rendered output against the reference, enabling self-correction.
- React Native/Expo Code Generation: Specializes in producing code for mobile applications using the React Native and Expo frameworks.
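To make the tool-call capability concrete, here is a rough sketch of how a harness might extract XML tool calls from model output. The exact schema is not given here, so the `<tool_call name="...">` wrapper and its child elements are assumptions made purely for illustration; only the tool names (Read, Write, Edit, Render, Bash) come from the list above. `parse_tool_calls` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Tool names taken from the capability list; the XML layout is an assumption.
ALLOWED_TOOLS = {"Read", "Write", "Edit", "Render", "Bash"}
_CALL_RE = re.compile(r"<tool_call\b.*?</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (tool_name, arguments) pairs found in free-form model output."""
    calls: List[Tuple[str, Dict[str, str]]] = []
    for match in _CALL_RE.finditer(text):
        try:
            node = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # skip malformed blocks instead of failing the whole turn
        name = node.get("name", "")
        if name not in ALLOWED_TOOLS:
            continue  # ignore tools the harness does not implement
        # Each child element is treated as one named argument.
        args = {child.tag: (child.text or "") for child in node}
        calls.append((name, args))
    return calls


sample = (
    "I will create the screen.\n"
    '<tool_call name="Write"><path>app/index.tsx</path>'
    "<content>export default null;</content></tool_call>\n"
    '<tool_call name="Render"></tool_call>'
)
print(parse_tool_calls(sample))
```

One caveat with this approach: JSX inside a `Write` payload contains angle brackets, so a strict XML parser only works if the payload is escaped or wrapped in CDATA. A real harness should follow whatever format the model was actually trained to emit.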
Good for
- Automating the creation of React Native mobile app screens from design screenshots.
- Integrating into agentic workflows that require visual analysis and code generation with iterative refinement.
- Developing tools for rapid prototyping or UI automation for iOS-style consumer applications.
Limitations
- Trained primarily on iOS app screenshots (87 apps from the Mobbin corpus); performance on Android, web, or desktop UIs is untested.
- Best utilized within an agent loop with actual tool execution and visual feedback; standalone generation may yield weaker results.
- The training corpus skews toward consumer apps, so output quality may be lower for enterprise/B2B UIs.