English · 00:31:36 Feb 5, 2026 3:23 PM
The Best Model For Frontend Design Is...
SUMMARY
Theo, a developer from t3.gg, tests frontier AI models like Claude Opus, GPT-5.2, and Gemini 3 Pro for frontend design quality, revealing how a simple markdown "skill" dramatically improves outputs, especially for Opus.
STATEMENTS
- Frontier AI models such as Gemini 3 Pro, Opus 4.5, GPT-5.2, and Sonnet 5 are all highly capable but vary in strengths and weaknesses.
- Gemini 3 Pro struggles significantly with tool usage and instruction following in CLI environments.
- Common AI-generated designs often feature generic elements like purple gradients and uniform fonts, resulting in "AI slop."
- For out-of-the-box design taste, Theo ranks Opus 4.5 lowest, GPT-5.2 in the middle, and Gemini 3 Pro highest.
- Despite that ranking, Theo's preferred workflow centers on Opus, which he uses to produce high-quality designs for projects like Lawn, Shoe, and an image studio app.
- The secret method involves feeding Opus a specific markdown file called the "front-end design skill" to enhance output quality.
- Railway, a deployment platform, is praised for how easily it deploys repos, databases, and services, with build times up to 10x faster than alternatives.
- Railway bills only for CPU usage time, not idle server time, making it cost-effective.
- Theo tests models by prompting them to build five unique marketing homepages for "T4 Canvas," an image generation studio app.
- Prompting models to create multiple unique designs simultaneously leads to more varied and higher-quality outputs than repeated single generations.
- Models exhibit templated behaviors: GPT-5 has about 10 design templates, Opus around 20, and Gemini only 1-2.
- The front-end design skill is a markdown file that guides models to avoid generic aesthetics and focus on intentional, context-specific designs.
- The skill emphasizes bold maximalism or refined minimalism with precision, varying themes, fonts, and avoiding clichés like Roboto or purple gradients.
- Without the skill, Opus produces poor designs with noise textures, purple gradients, and repetitive layouts.
- GPT-5.2 outputs are text-heavy and editorial, often with poor contrast and similar structures across variations.
- Gemini's CLI is unreliable, frequently getting stuck or hallucinating errors, but its default designs show more variety and appeal.
- With the skill, Opus generates exceptional, usable designs with better UX, like navigation switches between variants.
- Iterating on preferred designs works far better with Opus, producing malleable, inspired follow-ups, unlike Gemini's template-bound outputs.
- Kimi K2.5, an open-weight model, shows promise but struggles with code structure and often overdoes aesthetics without the skill.
- Community polls favor Opus with the skill for its steerability and higher hit rate on iterations over Gemini's sporadic quality.
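The actual skill file lives in Vercel's agent skills directory; based on the rules summarized above, its shape is roughly a markdown document like the fragment below. The headings and exact wording here are illustrative assumptions, not the real file's contents:

```markdown
# Front-end design skill (illustrative sketch, not the real file)

## Principles
- Design with intentionality: commit to either bold maximalism or refined
  minimalism, and execute it with precision.
- Make unexpected, context-specific choices; vary themes, fonts, and layouts
  between designs.

## Never
- Overused font families (Roboto, Inter, system fonts).
- Cliché color schemes, especially purple gradients on white backgrounds.
- Predictable layouts, cookie-cutter components, and designs that lack
  context-specific character.
```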
IDEAS
- AI models' design outputs reveal underlying training templates, with Opus drawing from a broader, more diverse set than Gemini's limited ones.
- Forcing models to generate multiple unique designs in one session exploits their context awareness to avoid repetition and yield fresher ideas.
- A simple markdown "skill" file can unlock hidden design potential in models like Opus by providing behavioral guidelines without altering the core model.
- Deployment platforms like Railway democratize server management by automating GitHub integrations, databases, and usage-based billing for indie developers.
- Generic AI aesthetics, such as purple gradients and system fonts, stem from overrepresented training data, making context-specific prompting essential for originality.
- Iterating on AI-generated designs by referencing liked elements demonstrates Opus's superior reasoning for refinement, turning prototypes into polished work.
- CLI tools for models like Gemini suffer from poor instruction-following, highlighting the gap between raw intelligence and practical usability in development workflows.
- Open-weight models like Kimi K2.5 approximate closed models' design skills but falter in code organization, requiring human fixes for viability.
- Community feedback via live polls can validate AI tool preferences, showing Opus's malleability edges out Gemini's initial flash for sustained design work.
- Marketing homepages for AI tools like T4 Canvas benefit from varied aesthetics—brutalist to minimal—to match the creative, power-user audience.
- Noise textures and floating elements in AI designs often signal unguided creativity, but skills redirect this toward intentional, production-grade interfaces.
- Billing only for active CPU time in cloud services like Railway could revolutionize costs for low-traffic apps, aligning expenses with actual value.
- Models' tendency toward "editorial" or "terminal" aesthetics reflects biases in web design datasets, pushing users to explicitly counter them.
- Blurred animations and drop shadows in AI UIs can enhance immersion if executed precisely, as seen in skilled Opus outputs.
- Frontend skills promote "unexpected choices" to break clichés, fostering designs that feel human-curated rather than algorithmically bland.
- Hybrid workflows—starting with Gemini for sparks and iterating with Opus—leverage each model's strengths for end-to-end design pipelines.
INSIGHTS
- The front-end design skill transforms Opus from a weak designer into a versatile creator by embedding principles of intentionality, revealing how context injection amplifies model potential beyond raw training.
- AI design quality hinges on template diversity; Opus's 20+ patterns enable nuanced iterations, while Gemini's scarcity leads to brittle, non-adaptive outputs.
- Prompt engineering via multi-variant generation harnesses models' self-referential context to enforce uniqueness, mirroring collaborative human design brainstorming.
- Usability gaps in model harnesses, like Gemini's CLI failures, underscore that intelligence alone is insufficient for tooling; robust engineering is key to real-world adoption.
- Usage-based billing in platforms like Railway mirrors AI's on-demand ethos, potentially shifting developer economics from fixed costs to scalable innovation.
- Iterating on AI designs exposes reasoning depth: Opus internalizes feedback for coherent evolutions, suggesting advanced models act as collaborative partners rather than one-shot generators.
- Avoiding "AI slop" requires explicit anti-cliché directives, highlighting how training data biases perpetuate sameness unless actively disrupted.
- Open-weight models like Kimi K2.5 democratize design access but reveal a ceiling in structural tasks, emphasizing the need for hybrid human-AI workflows.
- Community validation through polls integrates social proof into tool evaluation, accelerating consensus on emerging tech like design skills.
- Aesthetic variety in AI outputs, from brutalist to organic, tailors to user intent, but only skilled models balance novelty with functionality for professional use.
- Steering models toward precision over intensity in designs fosters efficiency, as seen in Opus's shift from gradients to purposeful minimalism.
- The markdown skill's power lies in its simplicity, proving that modular, reusable prompts can standardize excellence across AI ecosystems without proprietary overhauls.
QUOTES
- "If you want to make things that don't look like AI slop with everyone's favorite purple gradients and the same exact font on everything, what model's the best at design?"
- "This in particular has helped a ton with the designs I get out of the models. Not just because if you roll enough times, you get something better. But when the model within its context is doing multiple different designs with the instruction of making them unique, you're more likely to get unique designs."
- "Never use generic AI-generated aesthetics: overused font families like Roboto, Inter, or system fonts; cliché color schemes, particularly purple gradients on white backgrounds; predictable layouts and component patterns; and cookie-cutter designs that lack context-specific character."
- "Gemini randomly can put out good designs, but Opus actually does mostly what you tell it to."
- "I know it seems crazy that a markdown file can do this much, but it really does seem to help a ton."
- "The gap between the two is insane. With the skill, without the skill, with the skill."
- "All of them are just pattern matching and effectively applying templates. But Opus' ability to understand what you like about a design and then do follow-ups based on that is significantly stronger overall."
HABITS
- Generate multiple unique design variants in a single prompt to leverage model context for diversity and quality.
- Use markdown "skills" as reusable context to guide AI behavior toward specific tasks like avoiding generic aesthetics.
- Iterate on preferred AI outputs by describing liked elements, prompting for refined versions to build iteratively.
- Deploy prototypes quickly via platforms like Railway, integrating GitHub for auto-updates to focus on creation over ops.
- Test AI models across harnesses like Claude Code or Codex, fixing issues manually to ensure reliable workflows.
- Poll communities live for feedback on AI-generated designs to incorporate external preferences early.
- Steer clear of clichés in prompts by explicitly banning elements like purple gradients, enforcing originality.
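The iteration habit above can be sketched as a small helper that turns "what you liked" into a follow-up prompt. This is a hypothetical convenience function with illustrative wording, not anything from the video:

```typescript
// Hypothetical helper: turn feedback on a favorite variant into a
// follow-up prompt. The generated wording is illustrative, not Theo's.
function buildIterationPrompt(
  likedRoute: string,      // e.g. "/3", the variant you preferred
  likedElements: string[], // the specific things you liked about it
  followUps: number,       // how many inspired variations to request
): string {
  return [
    `I liked the design at ${likedRoute}, especially: ${likedElements.join("; ")}.`,
    `Create ${followUps} new designs inspired by it.`,
    `Keep what works, but make each follow-up meaningfully different rather than a copy.`,
  ].join("\n");
}

// Example: ask for three follow-ups to a brutalist variant.
const followUp = buildIterationPrompt(
  "/3",
  ["the chunky brutalist layout", "the variant switcher in the nav"],
  3,
);
console.log(followUp);
```

Describing concrete elements ("the chunky brutalist layout") rather than vague praise is what lets Opus reason about follow-ups, per the video's iteration results.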
FACTS
- Railway deploys full-stack apps, including databases and S3-like storage, in under 20 seconds with GitHub linkage.
- AI models like GPT-5 exhibit around 10 design templates from training, Opus about 20, and Gemini only 1-2.
- Build times on Railway are up to 10 times faster than competitors, with parallel server handling and one-click fallbacks.
- Opus with the design skill adds UX features like variant switchers, absent in default modes.
- Gemini's CLI often hallucinates errors, such as "Object.clone element is not a function," due to poor chat history training.
- Kimi K2.5, despite improvements, misplaces package.json files and fails builds without human intervention.
- Frontend design skills rank in the top 10 on Vercel's agent skills leaderboard for popularity.
REFERENCES
- Railway (sponsor, deployment platform: https://soydev.link/railway)
- Claude Code (harness for Opus, GitHub: https://github.com/anthropics/claude-...)
- T3.gg (Theo's site, Twitch, Twitter, Discord)
- Lawn (alternative to Frame.io, Theo's side project)
- Shoe (new off-library project by Theo)
- Image studio app (Theo's tool for thumbnails and assets)
- T4 Canvas (hypothetical image generation studio for testing)
- Umami (open-source Google Analytics alternative, deployed via Railway)
- Remotion (library for React-based video animation, has a design skill)
- Vercel Agent Skills Directory (source for markdown skills: skills.sh)
- Kimi K2.5 (open-weight model for design testing)
- Codex (harness used for GPT and others)
- Gemini CLI (tool for running Gemini models)
HOW TO APPLY
- Prepare a detailed prompt specifying the project, like building a marketing homepage for an image studio, including tech stack such as Vite, React, and TypeScript.
- Instruct the model to create five distinct designs hosted on routes like /1, /2, etc., emphasizing uniqueness to trigger contextual variation.
- For the baseline run, explicitly state "don't use any skills" to measure each model's organic design capability without enhancements.
- Activate the front-end design skill by including its markdown content in the prompt or harness, directing the model to interpret requirements creatively and avoid clichés.
- Run the prompt across models like Opus, GPT, and Gemini using compatible harnesses such as Claude Code or Codex, monitoring for CLI issues.
- After generation, review outputs for aesthetics and functionality; select favorites and prompt iterations by describing specific liked elements for refinements.
- Integrate community feedback, such as polls on preferred designs, to guide further tweaks and ensure alignment with user expectations.
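The steps above can be sketched as a small prompt builder. The project name, stack, route scheme, and the "don't use any skills" baseline come from the video; the helper itself and its exact wording are illustrative assumptions:

```typescript
// Sketch of the multi-variant test prompt described above. The exact
// phrasing is an assumption, not Theo's verbatim prompt.
interface PromptOptions {
  project: string;  // e.g. "T4 Canvas"
  stack: string[];  // e.g. ["Vite", "React", "TypeScript"]
  variants: number; // how many distinct designs to request at once
  useSkill: boolean; // toggle the front-end design skill vs. the baseline
}

function buildDesignPrompt(opts: PromptOptions): string {
  // Each variant gets its own route: /1, /2, ... so they can be compared.
  const routes = Array.from({ length: opts.variants }, (_, i) => `/${i + 1}`).join(", ");
  return [
    `Build ${opts.variants} unique marketing homepage designs for ${opts.project}.`,
    `Tech stack: ${opts.stack.join(", ")}.`,
    `Host each design on its own route: ${routes}.`,
    `Each variant must be visually distinct: different theme, typography, and layout.`,
    opts.useSkill ? `Use the front-end design skill.` : `Don't use any skills.`,
  ].join("\n");
}

// Baseline run for T4 Canvas, mirroring the test in the video.
const prompt = buildDesignPrompt({
  project: "T4 Canvas",
  stack: ["Vite", "React", "TypeScript"],
  variants: 5,
  useSkill: false,
});
console.log(prompt);
```

Asking for all five variants in one session is the point: the model sees its earlier designs in context and is pushed toward genuine variety rather than re-rolling the same template.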
ONE-SENTENCE TAKEAWAY
Opus with a front-end design skill markdown excels at iterative, unique frontend creation, surpassing Gemini's defaults for professional results.
RECOMMENDATIONS
- Adopt Opus paired with the front-end design skill for frontend prototyping to unlock diverse, intentional aesthetics beyond generic outputs.
- Always specify multiple unique variants in prompts to exploit AI context for superior design diversity and quality.
- Explicitly ban clichés like purple gradients in skills to steer models toward context-specific, human-like designs.
- Use Railway for rapid deployments of AI-generated apps, leveraging its speed and usage-based pricing to iterate without overhead.
- Iterate designs by referencing specific positives from initial outputs, prioritizing Opus for its strong follow-up reasoning.
- Test open-weight models like Kimi K2.5 with skills, but prepare for manual code fixes to bridge their structural limitations.
- Run community polls on AI designs to crowdsource preferences, refining tools based on real-user vibes.
- Combine models hybridly: Start with Gemini for quick sparks, then switch to Opus for malleable iterations.
- Install skills globally via tools like Vercel's directory to standardize enhanced behaviors across development environments.
- Focus prompts on intentionality—maximalism or minimalism—to elevate AI designs from slop to production-ready interfaces.
- Monitor harness reliability, avoiding Gemini's CLI for complex tasks until instruction-following improves.
MEMO
In the fast-evolving world of AI-assisted development, Theo, a prominent figure in the TypeScript and React communities via his t3.gg platform, has uncovered a surprising hack for crafting frontend designs that transcend the bland uniformity plaguing automated outputs. Frontier models like Google's Gemini 3 Pro, OpenAI's GPT-5.2, and Anthropic's Claude Opus 4.5 all promise versatility, yet their default designs often devolve into "AI slop"—repetitive purple gradients, overused fonts like Roboto, and cookie-cutter layouts. Theo's experiments, detailed in a recent video, rank these models harshly for innate taste: Opus at the bottom for its noisy textures and clichés, GPT middling with text-heavy editorials, and Gemini leading but hampered by a buggy CLI that hallucinates errors and ignores instructions.
The breakthrough lies not in model choice alone but in a deceptively simple tool: a markdown file dubbed the "front-end design skill," openly available on GitHub. This document, part of Vercel's agent skills repository, acts as a behavioral blueprint, urging models to embrace intentionality over intensity—whether through bold maximalism or refined minimalism. It explicitly forbids generic tropes, demanding varied themes, fonts, and unexpected choices tailored to context. Theo tested it by prompting models to build five unique marketing homepages for "T4 Canvas," a fictional power-user interface for AI image generation. Without the skill, Opus churned out cringe-worthy gradients and floating anomalies; with it, the same model produced stunning, usable prototypes featuring blurred animations and seamless UX switches between variants.
Comparative runs exposed stark differences. GPT-5.2's outputs remained structurally similar, often brutalist and unreadable due to contrast issues, even when inadvertently using the skill. Gemini shone in defaults with cool, varied aesthetics—like neon dreams or organic flows—but its iterations faltered, ignoring feedback and reverting to baked-in templates. Opus, however, thrived under the skill's guidance, generating malleable designs that responded meaningfully to refinements. When Theo selected favorites and requested inspired follow-ups, Opus delivered coherent evolutions, from chunky brutalist layouts to fluid creations, while Gemini spat out unrelated garbage. Live chat polls during the demo overwhelmingly favored Opus for its steerability, affirming Theo's verdict: the skill unlocks Opus's hidden depths, turning pattern-matching into purposeful design.
This isn't just about aesthetics; it's a workflow revolution for indie developers like Theo, who juggles chaotic side projects such as Lawn (a Frame.io alternative), Shoe (an upcoming library), and a thumbnail-generating image studio. He credits the skill for enabling code-free asset creation, but warns of pitfalls: open-weight challengers like Kimi K2.5 approximate quality yet bungle code structure, requiring manual tweaks. Platforms like sponsor Railway amplify this efficiency, deploying full apps—including open-source analytics alternatives like Umami—in seconds with usage-based billing that charges only for active CPU, not idle time. Up to 10 times faster than rivals, it embodies the seamless dev experience Theo champions.
Broader implications ripple through AI's role in creativity. Theo's findings suggest models aren't truly "designing" but remixing training data—Opus from 20 templates, Gemini from just one or two—forcing users to engineer prompts like multi-variant generations to break free. As AI integrates deeper into human workflows, tools like these markdown skills democratize excellence, potentially blurring lines between coders and creators. Yet, Gemini's unreliability underscores a caution: raw intelligence must pair with robust interfaces. For developers weary of slop, Theo's combo—Opus plus skill—offers a potent starting point, proving that a bit of markdown magic can elevate the mundane to the masterful.