Method v0.1

CreativeBench scores outputs as work products, not magic tricks.

Each run should keep the brief, prompt trail, model/provider, raw output, human edits, final asset, and reviewer notes together. That gives us a receipt, not just a screenshot.
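One way to keep those pieces together is a single record per run. The sketch below is illustrative only: the class name `RunRecord`, its field names, and the `missing_parts` helper are assumptions that mirror the list above, not a format CreativeBench defines.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RunRecord:
    """One run kept as a single auditable record (hypothetical field names)."""

    brief: str = ""
    prompt_trail: list[str] = field(default_factory=list)
    model_provider: str = ""
    raw_output: str = ""
    human_edits: list[str] = field(default_factory=list)
    final_asset: str = ""
    reviewer_notes: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values for illustration, not real benchmark data.
record = RunRecord(
    brief="Launch email for a hypothetical product",
    prompt_trail=["Draft a launch email for existing customers."],
    model_provider="example-provider/example-model",
    raw_output="Subject: It's here...",
)
print(record.missing_parts())  # ['human_edits', 'final_asset', 'reviewer_notes']
```

An empty `human_edits` list is treated as missing here. A run that truly needed no edits would need an explicit marker, which this sketch leaves out.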

Brief fit

Does the output fit the audience, channel, constraints, and purpose set out in the brief?

Originality

Does it avoid generic AI mush while staying coherent and usable?

Craft

Is the structure, pacing, wording, visual hierarchy, or concept execution strong?

Revision resilience

Can it improve after critique without collapsing into blandness?

Evidence trail

Are claims, sources, assumptions, and model changes visible enough to audit?

Production readiness

How much human repair is still needed before real-world use?
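A reviewer's sheet for the six criteria above could be checked for completeness before it is filed with the run. This is a minimal sketch under stated assumptions: the criterion keys, the `validate_scores` helper, and the 1 to 5 scale are all hypothetical. The method as written does not specify a scale or any way of combining scores, so none is shown.

```python
# Keys mirror the six criteria above; the spelling of the keys is an assumption.
CRITERIA = (
    "brief_fit",
    "originality",
    "craft",
    "revision_resilience",
    "evidence_trail",
    "production_readiness",
)


def validate_scores(scores: dict[str, int], low: int = 1, high: int = 5) -> list[str]:
    """Return a list of problems; an empty list means the sheet is complete.

    The default 1 to 5 range is an assumed scale, not part of the method.
    """
    problems = []
    for name in CRITERIA:
        if name not in scores:
            problems.append(f"missing: {name}")
        elif not low <= scores[name] <= high:
            problems.append(f"out of range: {name}={scores[name]}")
    for name in scores:
        if name not in CRITERIA:
            problems.append(f"unknown: {name}")
    return problems


# Placeholder sheet for illustration: three criteria unscored, one out of range.
sheet = {"brief_fit": 4, "originality": 3, "craft": 6}
for problem in validate_scores(sheet):
    print(problem)
```

Scores are kept per criterion rather than averaged, so a weak evidence trail cannot be hidden by strong craft.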