kabilesh-c/daedalus-designer-v2

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 26, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

kabilesh-c/daedalus-designer-v2 is a 1.5-billion-parameter LLM based on Qwen2.5-1.5B-Instruct, developed by Laksh Krish Kabilesh. It is fine-tuned with GRPO on Daedalus-Env to design auction mechanisms that are robust to strategic adversaries such as colluders and shaders. With a 32,768-token context length, the model acts as a "referee," emitting structured JSON mechanisms to optimize market outcomes rather than participating as a player. Its primary use case is research and demonstration in LLM-driven adversarial mechanism design.


DAEDALUS Designer v2: Adversarial Auction-Mechanism Design

kabilesh-c/daedalus-designer-v2 is a specialized 1.5-billion-parameter LLM, built on the Qwen2.5-1.5B-Instruct architecture and developed by Laksh Krish Kabilesh. Unlike typical LLMs that act as players, this model functions as a "referee," designing auction mechanisms to counter strategic adversaries. It is trained with GRPO (Group Relative Policy Optimization) using Unsloth, 4-bit quantization, and LoRA on the kabilesh-c/Daedalus-Env environment.

Key Capabilities

  • Auction Mechanism Design: Given market observations (welfare, fairness, participation outcomes), the model outputs a structured JSON mechanism (e.g., auction type, reserve price, penalties).
  • Adversarial Robustness: Mechanisms are designed to be robust against colluders, shaders, dropouts, and exploiters.
  • Inverse RL Setup: Operates in an inverse reinforcement learning paradigm, where the model designs the rules rather than playing within them.
  • Schema-Locked Output: JSON output is constrained to conform to the DaedalusAction Pydantic schema, though the policies it encodes are not yet strategically optimal.
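To make the schema-locked idea concrete, here is a minimal validation sketch. The real repo constrains output with a Pydantic `DaedalusAction` schema; the field names and allowed values below (`auction_type`, `reserve_price`, `collusion_penalty`) are illustrative assumptions, not the actual schema.

```python
# Hypothetical sketch: checking a model's JSON mechanism against a minimal
# schema. The real project uses Pydantic; plain-Python checks are used here
# to keep the example self-contained. All field names are assumptions.
import json

ALLOWED_AUCTION_TYPES = {"first_price", "second_price", "all_pay"}  # assumed

def validate_mechanism(raw: str) -> dict:
    """Parse the model's JSON output and verify the minimal schema."""
    mech = json.loads(raw)
    if mech.get("auction_type") not in ALLOWED_AUCTION_TYPES:
        raise ValueError(f"unknown auction_type: {mech.get('auction_type')!r}")
    price = mech.get("reserve_price")
    if not isinstance(price, (int, float)) or price < 0:
        raise ValueError("reserve_price must be a non-negative number")
    if not isinstance(mech.get("collusion_penalty"), (int, float)):
        raise ValueError("collusion_penalty must be a number")
    return mech

# A well-formed mechanism passes; malformed output raises before it can
# reach the market simulator.
example = '{"auction_type": "second_price", "reserve_price": 0.3, "collusion_penalty": 1.5}'
mechanism = validate_mechanism(example)
```

In the actual pipeline, a Pydantic model would replace the hand-rolled checks, giving typed fields and automatic error messages for free.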

Good for

  • Research in LLM-driven Mechanism Design: Ideal for exploring how LLMs can optimize complex economic systems.
  • Demonstrating Adversarial AI: Showcasing AI's ability to design robust systems against strategic agents.
  • Experimentation with GRPO and LoRA: Provides a practical example of these training techniques applied to a unique problem.
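For readers experimenting with GRPO, the core idea is that rewards are normalized within a group of completions sampled from the same prompt, removing the need for a learned value baseline. The sketch below shows only that normalization step under common GRPO conventions; it is not the training loop from this repo.

```python
# Sketch of GRPO's group-relative advantage computation. For each prompt,
# several candidate mechanisms are sampled; each one's reward is normalized
# against its own group's mean and standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled completion relative to its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: welfare rewards for 4 candidate mechanisms from one prompt.
advs = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

The normalized advantages then weight the policy-gradient update, so the model is pushed toward mechanisms that outperform their own sampling group rather than an absolute reward threshold.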

While its output is schema-locked, the model is still in early development (50 GRPO steps) and may occasionally emit suboptimal policies, such as allowing coalitions in fully adversarial populations. It is intended for demonstration and research, without production guarantees.