Web Dev (Agentic)

Evaluates model ability to build complete React applications with authentication, databases, and backend functionality from real-world user prompts. In addition to frontend agent behavior, the evaluation captures use of backend tools such as write_file, edit_file, batch_create_file, read_file, web_search, web_fetch, grep_search, deploy, bash, video_fetch, image_generation, image_fetch, propose_plan, supabase_create_table, supabase_insert, supabase_query, supabase_update, supabase_delete, supabase_auth_create_user, supabase_create_bucket, and supabase_create_rls_policy.

Web Dev (Non-Agentic)

Evaluates model ability across a composite of Website, UI Component, Game Development, Data Visualization, and 3D tasks, each produced as a single-file HTML output. Results aggregate real-world user preferences across these categories to provide an overall view of coding performance.

Game Dev

Evaluates model ability to build multi-file games through an agentic coding workflow. Real-world users compare the final playable outputs, while the evaluation captures agent traces, tool calls, user re-prompts, failures, and retries.

Mobile Dev

Android (Kotlin)

Evaluates model ability to build native Android applications in Kotlin from real-world user prompts. Applications are run in an Android emulator to represent them faithfully, and real-world users compare the resulting experiences while agent traces, tool calls, and user re-prompts are captured.

React Native

Evaluates model ability to build functional cross-platform mobile applications using React Native from real-world user prompts. Real-world users compare the rendered applications, while the evaluation captures model code outputs, agent traces, tool calls, and user re-prompts.

More

SVG & ASCII

FAQ

Recent Tournaments

Design Arena | Leaderboards