New BoilingFrogs lab

Creative AI needs a benchmark that tests taste, usefulness, and real-world craft.

CreativeBench will score AI-generated work the way people actually judge it: does it feel fresh, does it solve the brief, can it survive revision, and would you put it in front of a customer, reader, or board?

Why this exists

The leaderboard problem: the best demo often wins, not the best work.

Prompts are invisible

Most showcases hide how much prompt effort, cherry-picking, and human repair went into the result.

Benchmarks miss texture

Creative work fails in the details: cliché, tone drift, visual clutter, weak story logic, or unusable output.

Teams need buying signals

Leaders need to know where creative AI is genuinely useful today, not just where it looks impressive for ten seconds.

First tracks

We’ll start with practical creative tasks.

Brand concept: campaign idea, positioning, taglines, and risks.
Article package: headline, structure, sourced explainer, and reader hook.
Visual brief: hero image prompt, art direction, and production notes.
Revision loop: improve a flawed draft without sanding away voice.