Design Arena

Methodology

A subjective framework for evaluating AI design capabilities.
Join a growing community of voters.

Community-Driven Evaluation

This platform evaluates AI design capabilities through direct, head-to-head comparisons. Community preferences shape the rankings.

Rankings emerge from collective preference rather than curated opinion. Each pairwise comparison counts equally and immediately updates the leaderboard.

Tournament Process

Each voting session randomly selects four models from the active pool, plus one backup. All models receive identical prompts and generate responses simultaneously.

The first two models to complete are presented anonymously. When a choice is made between these two designs, that pairwise comparison becomes one vote in the system. The process repeats with the next two models, generating another vote. Winners and losers are then matched, creating additional pairwise votes until a complete 1st-through-4th ranking is established.

Model identities remain hidden throughout evaluation to prevent brand bias. Every pairwise comparison result feeds directly into the leaderboard calculations.
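
As a rough illustration of the session setup, here is a minimal sketch of sampling four contestants plus a backup; the pool contents, names, and function are hypothetical, not the arena's actual implementation.

```python
import random

# Hypothetical model pool; the arena's live pool differs.
ACTIVE_POOL = ["model-a", "model-b", "model-c", "model-d", "model-e", "model-f"]

def sample_session(pool, n_primary=4, n_backup=1, seed=None):
    """Pick four distinct contestants plus one backup for a voting session.

    The backup steps in if one of the primary models fails to respond.
    """
    rng = random.Random(seed)
    picks = rng.sample(pool, n_primary + n_backup)
    return picks[:n_primary], picks[n_primary:]

contestants, backup = sample_session(ACTIVE_POOL, seed=42)
print("contestants:", contestants, "backup:", backup)
```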

Builder Process

The Builder category uses an initial, best-effort procedure to compare current state-of-the-art builder capabilities under controlled conditions.

To start, we present the same prompt to all builders in one-shot requests, and randomly pair them off in four-way tournaments. Your blind vote powers the leaderboard.

Planned extensions include multi-turn prompts, a broader task set, and additional builders. Methodological feedback and improvement suggestions are greatly appreciated.

Ranking Calculation

Both win rates and Elo ratings are calculated from these pairwise votes. Win rates show the percentage of head-to-head victories each model achieves across all comparisons. Elo ratings use the standard log-odds formula 400 × log₁₀(win_rate / (1 − win_rate)), with a confidence adjustment based on the total number of pairwise battles.

Each pairwise comparison (vote) is weighted equally with no filtering or editorial adjustment.
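
To make the arithmetic concrete, here is a minimal sketch of the rating math; the base rating of 1000 and the linear shrink toward it are assumptions standing in for the unspecified confidence adjustment.

```python
import math

BASE_RATING = 1000   # assumed anchor rating; the methodology does not specify one
MIN_BATTLES = 50     # reuses the site's 50-battle reliability threshold

def win_rate(wins: int, battles: int) -> float:
    """Share of head-to-head victories across all pairwise comparisons."""
    return wins / battles if battles else 0.0

def elo_rating(wins: int, battles: int) -> float:
    """Map a win rate to Elo via 400 * log10(p / (1 - p)).

    The shrink toward BASE_RATING for small samples is an assumed stand-in
    for the 'confidence adjustment' the methodology mentions.
    """
    p = min(max(win_rate(wins, battles), 1e-6), 1 - 1e-6)  # avoid log(0)
    offset = 400 * math.log10(p / (1 - p))
    confidence = min(battles / MIN_BATTLES, 1.0)
    return BASE_RATING + offset * confidence

print(round(elo_rating(60, 100)))  # 60% over 100 battles -> ~1070
```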

Technical Standards

Models are configured with temperature 0.8 where supported and use their latest available versions. New models appear with "New" status until reaching 50+ pairwise evaluations for statistical reliability.
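
A minimal sketch of the "New" badge rule; the 50-evaluation threshold comes from the text above, while the function and parameter names are illustrative.

```python
NEW_STATUS_THRESHOLD = 50  # pairwise evaluations needed to shed "New" status

def is_new(pairwise_battles: int) -> bool:
    """A model keeps its "New" badge until it reaches 50+ pairwise evaluations."""
    return pairwise_battles < NEW_STATUS_THRESHOLD

print(is_new(12))  # True: rating still provisional
print(is_new(50))  # False: enough battles for a statistically stable rating
```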

Tournament selection is randomized and all configurations are publicly documented. The anonymization process ensures fair evaluation based solely on design output.

All methodologies are open for community review and feedback.

For any inquiries, please reach us at contact@designarena.ai.

Tournament Format

Category selection

A user selects a category (or one is randomly selected).

Prompt selection

A pre-generated category prompt is selected at random.

Model sampling

Four distinct models are chosen from the active pool.

Initial battles

Battle 1: Model A vs Model B
Battle 2: Model C vs Model D

Winner & loser brackets

Battle 3 (winners): Model A vs Model D
Battle 4 (losers): Model B vs Model C

Tiebreaker

Battle 5 (1 win each): Model B vs Model D

Final ranking

1st: Model A
2nd: Model B
3rd: Model D
4th: Model C

This tournament structure ensures that every pairwise comparison contributes meaningful data to the rankings. Each of the 5 battles generates one vote that feeds directly into win rate and Elo calculations.

The format guarantees a complete ordering of all 4 models while maximizing the number of useful comparisons from each voting session.
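
The five-battle bracket can be sketched in a few lines; the `vote` callback stands in for a human blind vote, and the code is illustrative rather than the arena's actual implementation.

```python
def run_tournament(models, vote):
    """Order four models using exactly five pairwise votes.

    `vote(a, b)` returns whichever of the two designs the voter prefers.
    """
    a, b, c, d = models
    w1, l1 = (a, b) if vote(a, b) == a else (b, a)                # Battle 1
    w2, l2 = (c, d) if vote(c, d) == c else (d, c)                # Battle 2
    first, wl = (w1, w2) if vote(w1, w2) == w1 else (w2, w1)      # Battle 3: winners
    lw, fourth = (l1, l2) if vote(l1, l2) == l1 else (l2, l1)     # Battle 4: losers
    second, third = (wl, lw) if vote(wl, lw) == wl else (lw, wl)  # Battle 5: tiebreaker
    return [first, second, third, fourth]

# Example: a voter that always prefers the alphabetically earlier name.
print(run_tournament(["A", "B", "C", "D"], vote=min))  # ['A', 'B', 'C', 'D']
```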

Why This Matters

Taste is hard to measure

Good design isn't just functional — it reflects aesthetic values. Design Arena explores whether AI can exhibit taste, as measured by human judgement.

Honest evaluation, not hype

The focus is on what AI can actually do today. This is about grounded comparisons, not cherry-picked examples.

A mirror, not a scoreboard

These live matchups hold up a mirror to current model performance, limitations, and stylistic tendencies.

Design is more than pixels

Source code and visualizations reveal deeper insights into how state-of-the-art models "think" about UI.

Questions about the methodology?

Get in touch

By using our website, you agree to our Privacy Policy and Terms & Conditions.