George Hotz | Programming | how I actually use agentic coding | Agentic AI
SUMMARY
George Hotz demonstrates his restrained agentic coding workflow, using AI tools like Claude for Tinygrad development and eGPU firmware work, and emphasizing careful code review to avoid slop.
STATEMENTS
- George Hotz starts the stream by adjusting audio and explaining that yesterday's use of agentic coding was not ideal, as agents require constant oversight.
- He reviews code generated by AI, noting duplicated code and unnecessary fallback functions, insisting on no duplication.
- Hotz works on firmware for eGPUs using an FTDI adapter for debugging, having Claude enumerate USB while he supplies hints like disabling UART parity.
- He manages three pull requests simultaneously: DS permute instructions, RDNA4 support, and another AMD-related task.
- Hotz avoids logging into GitHub on stream for security but monitors CI checks for green passes.
- Agentic coding is useful for figuring things out but not for replacing hand-coding entirely; he limits himself to three or four tasks to avoid overwhelm.
- He pushes changes to GitHub for diff review and warns against letting slop accumulate, as agents cannot self-manage it.
- One part of Tinygrad was entirely coded by agents, but only when given loops to iteratively improve.
- Agents perform better with incremental progress and clear gradients toward correctness, but fail at complex modeling like cycle-accurate emulators without hacks.
- Hotz critiques AI-generated fixes, preferring explicit code like dict.setdefault over hacks (see the sketch after this list), and stresses reading every line.
- Using agents can lead to self-delusion about understanding code, as reviewing feels easier than writing from scratch.
- He references Andrej Karpathy's point that agentic coding atrophies writing skills but improves reading skills.
- In reverse engineering firmware, Hotz interrupts AI to demand better register names from registers.h for debugging.
- Hotz never lets agents commit directly, as their state management is unreliable, requiring human oversight.
- The workflow involves constant intervention: fixing types, removing hacks, and ensuring assertions.
- AI requires the same focus as programming; Hotz uses it more aggressively on stream for show, but more tastefully in practice.
- He dislikes implicit variable access and copy-pasted code, pushing for explicit enums and branches.
- Agents add useful generic improvements, like introducing an inst variable, but often also add confusing debug prints.
- Hotz pushes cleanups to GitHub runners despite potential abuse, then reviews diffs line by line.
- Nothing about agentic coding prevents full understanding; users who forgo it back themselves into corners, unlike trusting a chess engine, where the goals align perfectly.
- Agents still produce universally bad patterns that need fixing, but skills for using them mirror traditional programming: sifting relevant info quickly.
- In USB enumeration, understanding hardware like pull-up on D+ line is crucial; AI aids but doesn't replace domain knowledge.
- Hotz limits PRs without human review on Tinygrad, warning against unreviewed AI slop.
- He compares AI agents to GPS: useful, but demanding vigilance to avoid autopilot drift.
- An Anthropic study shows AI-assisted coders finish slightly faster but understand less, especially in debugging.
- Hotz's net productivity with agents is unclear; he avoids the downsides by reviewing everything, and predicts interactive refinement will eventually work, though the first output is often the best.
- He prefers open-source tools like Kimi over proprietary ones that send data back, valuing speed and control.
- AI excels at quick one-off scripts for debugging, saving time on syntax in unfamiliar languages.
- The takeaway is humans as seniors reviewing junior devs (AIs); workflows evolve but core skills persist like in accounting with spreadsheets.
- Hotz underpredicted the productization of chat interfaces and agent loops, but warns against hype; companies going all-in on AI without taste will fail.
- Data efficiency matters as scaling hits data limits; LLMs compress the internet rather than truly learn via optimization.
- Running large models locally needs massive RAM; Hotz experiments with AMD but hits speed barriers.
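On the setdefault point above, a minimal Python sketch of the kind of rewrite Hotz demands; the function and names are illustrative, not Tinygrad code:

```python
# Illustrative only: the kind of rewrite Hotz asks for.
# An agent's "hack": membership check plus insert, with the default
# split across two places.
def count_ops_hack(ops, counts):
    for op in ops:
        if op not in counts:
            counts[op] = 0
        counts[op] += 1

# The explicit version he prefers: dict.setdefault states the default
# once, with no duplicated branch.
def count_ops(ops, counts):
    for op in ops:
        counts[op] = counts.setdefault(op, 0) + 1
```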
IDEAS
- Agentic coding demands constant human intervention to eliminate duplicated code and hacks, turning AI into a fast junior developer rather than an autonomous force.
- Limiting concurrent AI tasks to three or four prevents cognitive overload, mirroring human bandwidth constraints in complex projects.
- AI thrives in closed loops with iterative feedback, like flashing firmware or refining tests, but falters without incremental progress toward clear goals.
- Reviewing AI-generated code risks superficial understanding, fostering delusion more than hand-writing, which forces deeper comprehension.
- Agentic workflows atrophy writing skills while honing reading and oversight abilities, shifting programmer roles toward vigilant editors.
- Reverse engineering without datasheets relies on AI to infer from firmware, but humans must supply context like register includes for accuracy.
- AI cannot self-correct "slop" like unnecessary assertions or vibe-based hacks, requiring line-by-line human scrutiny to maintain code integrity.
- USB hardware nuances, such as the D+ pull-up signaling device attachment, highlight that domain expertise remains irreplaceable despite AI assistance (see the host-side sketch after this list).
- Interactive refinement resembles diffusion models: quick initial generation followed by prolonged polishing, questioning net productivity gains.
- AI tools like Claude enable rapid one-off scripts for debugging, slashing mundane syntax time in unfamiliar languages from minutes to seconds.
- Programmer rankings won't drastically shift with AI, as core skills—sifting information, debugging, taste—persist akin to spreadsheets revolutionizing accounting.
- Hype around agentic coding as a 5x productivity booster is tongue-in-cheek; real workflows are restrained, not spammy, to avoid slop apocalypses.
- Companies embracing unchecked AI without engineering oversight risk collapse, as non-technical policies drive adoption of flawed tools.
- LLMs primarily compress internet data rather than learn via scalable search, hitting walls at data limits and challenging bitter lesson narratives.
- Local inference for models like Kimi demands 768GB of fast RAM, making affordable home labs challenging despite MoE efficiencies.
- Human-AI dynamics evolve to "earth and moon": humans as stable seniors guiding orbiting, fast but erratic AI juniors.
- Underestimating productization of AI features like chatting or loops leads to surprises, but vigilance counters overhyping unproven workflows.
- Scaling LLMs toward human synapse counts (10^15) could continue, but current trillion-parameter models are 2-3 orders off, with MoE approximating brain sparsity.
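On the USB point above: a device only appears to the host after its firmware asserts the 1.5k pull-up on D+ (for full speed) and answers the host's descriptor requests. A minimal host-side check with pyusb (assuming pyusb and a libusb backend are installed; this is a generic sketch, not Hotz's script):

```python
import usb.core  # pyusb

# List every device the host has enumerated. A full-speed device only
# shows up here after it asserts the 1.5k pull-up on D+ and then answers
# the host's descriptor requests.
for dev in usb.core.find(find_all=True):
    print(f"bus {dev.bus} addr {dev.address}: "
          f"{dev.idVendor:04x}:{dev.idProduct:04x}")
```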
INSIGHTS
- Agentic coding amplifies human oversight needs, transforming programmers into meticulous reviewers who must internalize system state to catch subtle errors.
- Iterative loops with clear success metrics enable AI to self-improve incrementally, but without them, it devolves into scattered hacks lacking structural integrity.
- Superficial code review in agentic flows risks cognitive atrophy, as the ease of glancing erodes the discipline of creation, demanding deliberate deep dives.
- AI's speed in generating prototypes masks underlying flaws, creating illusions of progress that demand prolonged refinement, akin to diffusion in image synthesis.
- Core programming competencies—rapid information filtering, debugging intuition, aesthetic judgment—endure unchanged, positioning AI as an accelerator for skilled users only.
- Unchecked AI adoption in organizations signals misalignment, where hype overrides engineering rigor, foreshadowing productivity pitfalls and strategic failures.
- LLMs' compression of existing data caps their generalization, underscoring data efficiency as the next frontier beyond compute scaling.
- Human-AI collaboration mirrors hierarchical mentorship, with humans providing gravitational stability to AI's volatile trajectories, ensuring aligned outcomes.
- Productivity myths around AI dissolve under scrutiny; marginal speed gains often accompany comprehension losses, necessitating balanced, tasteful integration.
- Local AI deployment barriers like RAM costs highlight centralization trends, yet MoE architectures tease distributed, brain-like efficiencies on the horizon.
- Underpredicted AI evolutions, from conversational interfaces to autonomous loops, reveal the gap between theoretical potential and practical surprises.
- Economic viability of AI assistance hinges on tangible boosts; willingness to invest 20% of salaries assumes equivalent returns, else tools become costly distractions.
QUOTES
- "Do not duplicate code."
- "You have to carefully read any of the code that it writes."
- "Nothing about agentic coding prevents you from having the entire state of the system in your head."
- "When the agent writes it, you can like look it over and be like, 'Yeah, I understand all of that.' And then you read it more carefully."
- "You start to atrophy your code writing skill, but you improve your code reading skill."
- "If you don't fully understand the code that your agent is writing, you're backing yourself into a corner."
- "The skills required to use these are the identical skills from before. You just have to be able to sort through a lot of information a lot of information quickly."
- "AI isn't some magical trick."
- "You need to be the Earth and the AIs need to be the moon. You are a senior who's reviewing junior dev code."
- "The slop apocalypse is not coming."
HABITS
- Review every line of AI-generated code meticulously before integration to eliminate hacks and ensure understanding.
- Limit active agentic tasks to three or four to maintain mental bandwidth and prevent overwhelm.
- Interrupt AI processes to provide specific hints, like disabling parity or sourcing register names, for better outputs (a serial-console sketch follows this list).
- Push small changes to GitHub for CI checks and diff review, avoiding direct commits from agents.
- Use AI for quick one-off scripts in unfamiliar languages or syntax, verifying results immediately.
- Run full test suites manually after AI changes, feeding failures back into the loop for refinement.
- Prefer open-source, locally run models like Kimi for privacy and speed, avoiding proprietary tools that send data back.
- Maintain domain knowledge by cross-referencing AI suggestions with hardware specifics, like USB signaling.
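For the FTDI habit above, a minimal serial-console sketch using pyserial; the port path and baud rate are assumptions, since the stream doesn't show the exact settings:

```python
import serial  # pyserial

# FTDI adapters typically enumerate as /dev/ttyUSB0 on Linux (assumption).
# PARITY_NONE matches the "disable parity" hint Hotz fed the agent.
port = serial.Serial("/dev/ttyUSB0", baudrate=115200,
                     parity=serial.PARITY_NONE, timeout=1.0)
while True:
    line = port.readline()  # raw debug output from the eGPU firmware
    if line:
        print(line.decode(errors="replace"), end="")
```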
FACTS
- Anthropic's study found AI-assisted coders finished tasks 2% faster but scored significantly lower on understanding, especially debugging.
- The human brain has approximately 10^15 synapses, while current LLMs like Kimi run around 1 trillion parameters, 2-3 orders of magnitude smaller.
- Running Kimi locally requires at least 768GB of high-speed RAM, with Hotz achieving only 3 tokens per second on AMD setups.
- DDR5 RAM costs $10-20 per GB, making a 768GB setup $7,680-$15,360 just for memory, excluding compute.
- The original eGPU firmware enumerates USB successfully, but AI-generated versions fail due to improper register handling, such as incorrect XDATA usage.
- Tinygrad's test suite is too slow for full agentic loops, so changes are verified in batches with manual test runs.
- RDNA4 support in the Tinygrad emulator adds new instructions like sum of absolute differences with masked flags (sketched after this list).
- FTDI debug boards allow unbricking eGPU chips regardless of state, enabling safe firmware experimentation.
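To make the RDNA4 instruction fact concrete, a hedged Python sketch of sum-of-absolute-differences semantics in the style of AMD's V_SAD_U8; this illustrates the operation, not Tinygrad's actual emulator code:

```python
def v_sad_u8(a: int, b: int, acc: int) -> int:
    # Sum of absolute differences over the four packed bytes of two
    # 32-bit operands, accumulated into acc (V_SAD_U8-style semantics;
    # the masked-flag variants mentioned on stream are omitted here).
    total = acc
    for shift in (0, 8, 16, 24):
        total += abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
    return total & 0xFFFFFFFF
```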
REFERENCES
- Claude (AI tool for code generation and USB enumeration).
- Tinygrad (open-source ML framework Hotz develops).
- GitHub (for PRs, diffs, and CI checks like the pre-commit linter).
- RDNA4 (AMD GPU architecture for emulator support).
- registers.h (header file for eGPU register names and debugging).
- Anthropic study on AI's impact on coding skills.
- Andrej Karpathy's comments on skill atrophy from AI use.
- Kimi (local LLM alternative to Claude).
- vLLM (inference engine tested for Kimi on AMD).
- OpenClaw.ai (mentioned in stream context).
- Comma.ai (Hotz's driver-assistance hardware company).
- Tailscale (VPN for accessing Claude from Hong Kong).
- FTDI (debug interface for eGPUs).
- Steve Yegge's Google platforms rant (inspirational engineering piece).
- Jürgen Schmidhuber's take on LLMs (critique of scaling laws).
HOW TO APPLY
- Begin by setting up a secure environment, like SSH with Tailscale, to access AI tools without exposing credentials on stream.
- Identify a small, incremental task, such as adding RDNA4 support, and provide AI with clear loops like test-run-refine cycles.
- Generate initial code via AI prompts, then immediately review for duplications, hacks, and implicit accesses, demanding fixes like using dict.setdefault.
- Push changes to GitHub for automated CI checks, monitoring linter passes and test failures without direct agent commits.
- Interrupt AI mid-process to inject domain knowledge, such as including registers.h or disabling parity in USB code.
- Feed test failures back into the AI loop, explaining errors and requiring assertions or explicit branches (see the loop sketch after this list).
- Manually run full test suites after changes, verifying known trouble spots like pcode parser bugs, then refine iteratively.
- Limit sessions to three or four tasks, pausing to fully understand the code before proceeding, so unresolved issues don't knot together.
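A minimal sketch of that test-run-refine loop; ask_agent_for_patch is a hypothetical stand-in for whatever agent CLI or API you use, and the test path is illustrative:

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run one relevant test file (illustrative path) and capture output.
    proc = subprocess.run(
        ["python", "-m", "pytest", "test/test_feature.py", "-x"],
        capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def ask_agent_for_patch(failure_log: str) -> str:
    # Hypothetical: hand the failure log to your agent, get a diff back.
    raise NotImplementedError("wire up your agent here")

for attempt in range(4):  # cap iterations, as Hotz caps concurrent tasks
    ok, log = run_tests()
    if ok:
        break  # green; now read the final diff line by line
    patch = ask_agent_for_patch(log)
    print(patch)  # human step: review every line; the agent never commits
```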
ONE-SENTENCE TAKEAWAY
Master agentic coding by vigilantly reviewing AI outputs as a senior developer to harness speed without inviting slop.
RECOMMENDATIONS
- Adopt restrained AI use: treat agents as fast juniors needing constant oversight, not autonomous replacements.
- Prioritize code reading over writing; hone skills in sifting vast outputs for relevance to counter atrophy.
- Verify all AI changes with manual tests and line reviews, never committing without full comprehension.
- Experiment locally with open models like Kimi to avoid data leaks and build speed-optimized workflows.
- Focus on incremental loops for AI tasks, providing hardware context so the loop can close on its own.
- Warn teams against hype-driven AI adoption; demand engineering taste to prevent productivity traps.
- Invest in domain knowledge for hardware like USB, using AI for syntax aids but not core logic.
- Budget up to 20% of a salary for AI assistance only if the productivity boost demonstrably matches, tracking comprehension as well as speed.
- Join communities like Tinygrad Discord for collaborative debugging, sharing fixes to accelerate group progress.
- Prepare for data efficiency shifts, compressing knowledge bases to extend LLM capabilities beyond internet limits.
MEMO
George Hotz, the hacker extraordinaire behind jailbreaks and self-driving tech, pulls back the curtain on his daily grind in a raw Twitch stream from February 2026. Far from the flashy agentic coding demos flooding social media, Hotz reveals a disciplined ritual: wrangling AI like Claude to tinker with Tinygrad, his minimalist machine learning framework, and to debug eGPU firmware. Seated amid screens flickering with code, he SSHes into a remote Tiny Box, emphasizing that true productivity lies not in AI spam but in vigilant stewardship. "Yesterday was a bad example," he snorts, admitting his prior stream veered into overkill. Instead, he manages just three pull requests (RDNA4 emulator support, DS permute instructions, and USB enumeration), each a tight loop of prompt, generate, review, refine.
Hotz's workflow unfolds like a chess match against an erratic opponent. He feeds Claude precise hints, like disabling UART parity from a Slack note, to coax USB enumeration on freshly arrived eGPUs. The AI spits out code, but Hotz dissects it ruthlessly: no duplicated functions, no vibe-based hacks naming errors after functions. He interrupts mid-generation to demand better register names from registers.h, ensuring debuggability. Pushing diffs to GitHub, he eyes CI checks warily, staying logged out for stream security. Agents excel in closed loops (flashing firmware, iterating tests) but stumble on holistic modeling, opting for quick fixes over accurate simulations. "You have to stay on top," he warns, echoing Andrej Karpathy: AI sharpens code-reading but dulls writing muscles.
Diving deeper, Hotz confronts the illusions of progress. An Anthropic study underscores his caution: AI users code slightly faster but grasp far less, bombing on debugging quizzes. He likens agents to GPS, handy for routes but hazardous on autopilot, urging vigilance lest slop creep in. Reverse-engineering firmware without datasheets, Claude infers registers, but Hotz supplies the hardware savvy, like the D+ pull-up signaling USB attachment. One Tinygrad module emerged fully agent-coded, yet only through relentless human pruning. He prefers open tools like Kimi for local runs, decrying proprietary slop-senders. "I'm not sure it's a net win," he muses, as refinement drags on longer than the initial burst, mimicking diffusion art.
The stream pivots to broader implications, Hotz's voice laced with wry skepticism. Hype peddlers promise 5x boosts, but he calls that tongue-in-cheek; unchecked AI invites apocalypses in codebases and companies alike. "If non-engineers drive policy, quit," he advises, invoking Steve Yegge's rant against platform bloat. Skills endure, sifting noise and tasting elegance, like accountants wielding spreadsheets. LLMs compress the web rather than reinvent intelligence, hitting data walls per Schmidhuber's critique. Scaling to brain-like synapse counts beckons, but RAM barriers (768GB for Kimi) loom large. Hotz eyes Shenzhen for cheap compute, pondering MoE efficiencies that mirror neural sparsity.
Yet optimism flickers amid the critique. Interactive loops, once futile pre-Opus, now close on their own for tasks like aliasing ops in Tinygrad. Hotz envisions kernel ops yielding to calls, unifying his framework. For unfamiliar languages, AI crafts 15-second debug scripts, freeing focus for architecture. He caps himself at three tasks, preserving headspace, and fields Discord fixes collaboratively. The eGPU quest nears triumph: the original firmware enumerates; AI versions inch closer, dumping registers without the XDATA cruft.
In closing, Hotz dispels myths, streaming over Tailscale with Hong Kong's skyline implied in his setup. Mondays demand real work, not spam sprees; produce more than consume, he urges. Agentic coding makes programmers the Earth around which fast, erratic AI moons orbit, providing stability amid volatility. As fortunes pivot on tasteful tech adoption, Hotz's stream stands as a manifesto: harness the future without delusion.