"Enhancing LLM Powered Development with Clojure's REPL" by Colin Fleming
SUMMARY
Colin Fleming shares his journey using LLMs like Claude 3.5 Sonnet to translate Clojure code in Cursive, his IntelliJ plugin, to Kotlin, emphasizing REPL integration for context-rich, iterative AI-assisted development.
STATEMENTS
- The speaker prompted his daughter for a stochastic parrot image, highlighting non-determinism in creative tasks similar to LLMs.
- Until about six months ago, the speaker's AI use was basic: small applications, advice, rubber ducking, and getting started with unfamiliar tech.
- Claude 3.5 Sonnet marked a step change in reliable code generation capabilities.
- The speaker develops Cursive, a Clojure plugin for IntelliJ, which runs on the JVM.
- IntelliJ was historically millions of lines of Java code, leading JetBrains to create Kotlin as a better alternative.
- Kotlin uses coroutines as its concurrency primitive, similar to core.async in Clojure or Java virtual threads.
- IntelliJ documentation has recommended coroutines over threads for asynchronous operations since 2024.
- The Cursive codebase is split roughly evenly: about 35,000 lines each of legacy Java, newer Kotlin, and Clojure.
- JetBrains indicates that Clojure code in Cursive is becoming obsolete due to coroutine-based extension points requiring Kotlin.
- The speaker has 10 years of accumulated Clojure code and wants to avoid rewriting it manually.
- LLMs excel at code translation, akin to natural language translation in the original Transformer paper.
- The speaker translated Python's difflib to Clojure, showing bidirectional utility.
- AI tooling is mostly in Python, so LLMs generate Clojure specs from Python ones, about 95% accurate.
- Translating a Clojure namespace to Kotlin takes 30 minutes to two hours of fixes.
- IntelliJ code can be translated to Clojure to avoid Java compilation hassles.
- Chris Oakman used LLMs to translate JavaScript implementations for Clojure Standard Format into other languages like Python.
- ClojureDart lacks training data, making LLM support challenging due to its syntax differences from standard Clojure.
- The speaker plans to translate the TypeScript parser from the TypeScript compiler to Kotlin for better npm support in ClojureScript.
- Context is crucial for coding tasks with LLMs, optimizing what fits in the fixed-size context window measured in tokens.
- LLM API calls are stateless; the conversation history is resent with each request to simulate memory (a minimal sketch follows this list).
- System prompts set the LLM's role, like expert software developer, with instructions for behavior.
- For coding, context includes repo summary, project layout, documentation, APIs, and file contents alongside the conversation.
- Optimizing context in large projects involves tricky choices to prioritize relevant information.
- Prompts are vital; Claude's Artifacts feature uses a massive 3,000-word prompt with examples to handle diverse user interactions.
- Prompt engineering has evolved from black magic to clear, explicit instructions with if-then logic and examples.
- The speaker sets up LLM access in the REPL using Cursive's project index for dynamic context building.
- Translating namespaces involves finding vars, dependencies, classes, methods, and distinguishing project vs. external references.
- The speaker's Kotlin code style mimics Clojure: top-level functions, immutability, and ported transducers.
- Prompts include case-specific instructions, like handling transducers or migrated namespaces, generated from IntelliJ APIs.
- Generated prompts can reach 1,500 lines and 14,000 tokens, larger than production examples, yet LLMs handle them effectively.
- Initial translation outputs to REPL stdout required manual copy-paste, which was unsustainable.
- A multi-line UI prompt input improved iteration but lacked ergonomics.
- Inline auto-completion in AI tools is often annoying due to frequent errors and deletion needs.
- The speaker implemented a chat window with artifacts, allowing versioned, diffable code iterations in the editor.
- Refinements include sending editor errors, REPL outputs, test failures, or code reviews to the LLM.
- LLMs are better at code review than generation; the speaker used one LLM to critique another's output.
- Vision models enable screenshot feedback for UI issues; structured feedback like DOM could enhance debugging.
- REPL works well for non-iterative tasks or agentic workflows but struggles with separated editor-project setups.
- Editor integration is essential for ergonomic iteration, feedback from multiple sources, and user control.
- Demo shows chat for generating edge case tests for Paredit in Cursive, with apply-to-editor functionality.
- LLMs handle tricky tests involving text-based Clojure forms and indentation well.
- For multi-file changes, LLMs manage up to eight consistent updates in one response.
- The Caveman tutorial was implemented 100% by Claude, available on GitHub with conversation transcript.
- Current LLMs excel at small local changes but falter on repo-wide ones, needing better specs and planning.
- GitHub Workspace generates editable specs and plans from issues for repo changes.
- Literate programming's sync issues may resolve if LLMs update code from text specs.
- Tool developers face questions on investing in low-level features like Paredit or refactorings as LLMs advance.
- Code indexing will grow in importance for retrieving relevant codebase parts during changes.
- Developer tools may radically change in 1-5 years, with uncertain futures.
- A student's concern about AI displacing jobs highlights broader anxieties, though short-term impacts seem low.
- A 3D artist's job transformed overnight with Midjourney, shifting from artisan work to prompting, losing joy.
- Expressive languages like Clojure may lose value if optimized for LLMs over humans; training data scarcity hurts.
- Typed, explicit languages like Java aid LLM understanding; Clojure tools promote visible intent.
- Maintain wonder at LLM capabilities despite limitations; computing should remain fun.
- An 8-year-old built a Harry Potter chatbot in 20 minutes with minimal guidance using Cursor IDE.
- LLMs have ethical, social, and ecological issues, but enable magical experiences like family coding.
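A minimal sketch of what "stateless, resend the history" looks like from a Clojure REPL, assuming clj-http and cheshire are on the classpath and an Anthropic-style Messages endpoint; the prompt text and helper names are illustrative, not Cursive's actual code.

```clojure
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(def system-prompt
  "You are an expert Clojure and Kotlin developer. Translate code faithfully,
   preferring top-level functions and immutable data.")

(defn ask
  "Send the entire conversation so far plus a new user message.
   The API keeps no state between calls, so `history` is resent every time."
  [history user-message]
  (let [messages (conj (vec history) {:role "user" :content user-message})
        resp     (http/post "https://api.anthropic.com/v1/messages"
                            {:headers      {"x-api-key"         (System/getenv "ANTHROPIC_API_KEY")
                                            "anthropic-version" "2023-06-01"}
                             :content-type :json
                             :as           :json
                             :body         (json/generate-string
                                            {:model      "claude-3-5-sonnet-20240620"
                                             :max_tokens 4096
                                             :system     system-prompt
                                             :messages   messages})})
        reply    (get-in resp [:body :content 0 :text])]
    ;; Return the grown history so the next call can resend it verbatim.
    {:history (conj messages {:role "assistant" :content reply})
     :reply   reply}))
```

Each call pays for the whole history again, which is why the talk stresses ruthless context curation and caching of static prompt parts.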
IDEAS
- Prompting a child for art yields unpredictable yet charming results, mirroring LLM non-determinism.
- Basic AI uses like rubber ducking evolve into transformative tools with model advancements like Claude 3.5.
- Kotlin's coroutines force legacy Clojure code migration, creating a practical crisis solvable by AI.
- Code translation leverages LLMs' transformer roots, achieving near-perfect initial drafts.
- Bidirectional translation expands ecosystems, like Python AI libs to Clojure.
- 95% accuracy in translations minimizes manual fixes, accelerating development.
- Scarcity of ClojureDart data highlights niche language vulnerabilities in AI era.
- Translating massive parsers like TypeScript's could bootstrap IDE features.
- Context windows demand ruthless prioritization for coding effectiveness.
- Stateless APIs simulate memory by resending history, inflating token costs.
- Enormous prompts with examples tame LLMs for production tools like Artifacts.
- Caching static prompts cuts costs in iterative workflows.
- If-then prompt logic replaces guesswork, making engineering systematic.
- REPL-driven dynamic context from IDE indices personalizes translations.
- Dependency graphing informs prompt relevance, like including transducer specs.
- Kotlin styled like Clojure eases AI-mediated porting.
- Editor artifacts enable versioned, diffable iterations without copy-paste drudgery.
- Auto-refinement from errors or REPL outputs loops feedback seamlessly.
- Cross-LLM code reviews add meta-layer amusement and utility.
- Vision and structured feedback (e.g., DOM) unlock multimodal debugging.
- Separated REPL-editor flows complicate ergonomics; integration is key.
- Agentic REPL workflows suit autonomous tasks without human loops.
- Chat UIs with file contexts generate nuanced tests, like Paredit edges.
- Applying LLM proposals directly to editors preserves indentation fidelity.
- Multi-file diffs in single responses scale small projects efficiently.
- Tutorial implementations reveal LLMs' end-to-end coding prowess.
- Stuck loops signal when to intervene manually in iterations.
- Editable specs/plans in tools like GitHub Workspace bridge human-AI collaboration.
- Literate programming revives via AI-syncing text-to-code.
- Low-level tool investments risk obsolescence as AI handles minutiae.
- Indexing evolves to predict change impacts dynamically.
- Job transformations, not losses, redefine creative roles profoundly.
- LLM-optimized languages prioritize data volume over expressiveness.
- Visible intent in code aids both humans and machines.
- Wonder sustains innovation amid cynicism about AI limits.
- Kid-friendly IDEs democratize creation, sparking joy instantly.
- Ethical trade-offs persist, but personal magic moments justify exploration.
INSIGHTS
- LLMs transform code migration from decade-long burdens to hours of refinement, unlocking legacy preservation.
- Context optimization in fixed windows is the linchpin of effective AI coding, demanding project-specific curation.
- Massive, example-rich prompts evolve LLMs from erratic oracles to reliable production assistants.
- Editor-REPL symbiosis enables iterative refinement, turning raw outputs into polished integrations seamlessly.
- Dependency-aware prompting personalizes translations, bridging language idioms like Clojure transducers in Kotlin.
- Feedback loops from errors, reviews, and visuals accelerate convergence on correct solutions.
- Multi-file consistency emerges in bounded scopes, hinting at scalable repo evolution with planning.
- Specs and plans formalize human intent, making large changes auditable and editable across agents.
- AI may invert literate programming, ensuring specs drive code sync rather than vice versa.
- Tool evolution favors indexing and prediction over manual manipulations as AI absorbs trivia.
- Job essence shifts from crafting to curating, eroding artisan joy in creative fields.
- Niche languages risk AI marginalization without data abundance; explicitness becomes a survival trait.
- Sustained wonder counters hype fatigue, fostering playful computing rediscovery.
QUOTES
- "prompting my daughter is not I don't always get the results I expect it's there's definitely some problematic non-determinism going on there"
- "code translation might be the secret superpower of llms because they're really really good at"
- "context is everything like this is this is kind of the open research area for coding tools um and uh using llm"
- "the main secret source to this is actually just a prompt so it's just something that they T onto the system prompt"
- "llms are actually surprisingly good at reviewing code they're probably better at reviewing code than they are at writing it actually"
- "editor integration of these tools is actually really essal"
- "that's incredible I cannot believe that just worked you know that it is amazing what computers can do"
- "the part that I loved about my job is gone"
- "maybe before too long we'll actually just be editing the text and the Machine will update the code and it will always be in sync"
- "I don't think AI is going to take anyone's jobs in the 2-year um time frame but you're far beyond that who knows"
- "everything is amazing and uh nobody is happy"
- "we should keep the fun in Computing right we should be stretching computers seeing what they can do"
HABITS
- Using the REPL to access project indices and APIs for dynamic context building during coding sessions (a rough stand-in sketch follows this list).
- Maintaining a Clojure-like style in Kotlin code: top-level functions, immutability, and transducer ports.
- Iterating LLM outputs via editor chats, applying diffs directly after visual review.
- Generating case-specific prompt templates with if-then logic and API specs before translations.
- Incorporating REPL evaluations and error feedback into refinement loops for immediate fixes.
- Reviewing LLM-generated code with another model for unbiased critiques.
- Adding file or directory contexts to chats for targeted discussions, like test case generation.
- Manually intervening when LLMs get stuck in loops, finishing the step by hand before resuming.
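Cursive's project index isn't public, but a rough stand-in for the dependency-gathering habit can be sketched with clojure.tools.namespace; the source directory, project prefix, and the project/external split below are illustrative assumptions rather than the speaker's actual code.

```clojure
(require '[clojure.tools.namespace.find :as find]
         '[clojure.tools.namespace.parse :as parse]
         '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn namespace-context
  "Find the ns declaration for `target-ns` under `src-dir` and split its
   dependencies into project namespaces (include their source in the prompt)
   and external ones (include only signatures or specs)."
  [src-dir project-prefix target-ns]
  (let [decls    (find/find-ns-decls-in-dir (io/file src-dir))
        decl     (first (filter #(= target-ns (parse/name-from-ns-decl %)) decls))
        deps     (parse/deps-from-ns-decl decl)
        project? #(str/starts-with? (str %) project-prefix)]
    {:ns            target-ns
     :project-deps  (set (filter project? deps))
     :external-deps (set (remove project? deps))}))

;; e.g. (namespace-context "src" "cursive." 'cursive.translation.core)
```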
FACTS
- Claude 3.5 Sonnet, released about six months before October 2024, revolutionized reliable code generation.
- IntelliJ's codebase historically comprised millions of lines of Java; its documentation has recommended Kotlin coroutines for asynchronous work since 2024.
- Cursive has approximately 35,000 lines each of Java, Kotlin, and Clojure code after 10 years.
- Transformer architecture's original paper used natural language translation as its example task.
- Claude Artifacts prompt is about 3,000 words or 4,500 tokens, including full conversation examples.
- Generated translation prompts can exceed 1,500 lines and 14,000 tokens for single files.
- ClojureDart has even less training data than standard Clojure, compounded by Flutter's peculiarities.
- GitHub Workspace requires application for access and generates editable plans from issues.
- Midjourney's improvements transformed a 3D artist's weeks-long tasks into two-day prompts.
- An 8-year-old implemented a Harry Potter chatbot in 20 minutes using Cursor IDE.
REFERENCES
- Karen and Maron's talk at Clojure/conj 2024 on LLM prompting issues for slide images.
- Claude 3.5 Sonnet by Anthropic.
- Kotlin language by JetBrains.
- IntelliJ documentation on coroutines.
- core.async in Clojure.
- Python's difflib library.
- Python client libraries for AI APIs (e.g., OpenAI, Anthropic).
- Chris Oakman's Parinfer and Clojure Standard Format implementations.
- ClojureDart.
- Flutter framework.
- TypeScript compiler's parser.
- "Attention is All You Need" Transformer paper.
- Claude's Artifacts web interface.
- James Reeves' comb library for templating.
- Cursive LLM commands namespace.
- GitHub Copilot.
- GitHub Workspace.
- Caveman tutorial for Clojure HTTP websites.
- Louis C.K.'s "Everything is Amazing and Nobody is Happy" sketch.
- Preface to SICP (Structure and Interpretation of Computer Programs).
- Cursor AI-focused IDE.
- Reddit post by 3D artist on Midjourney job changes.
- Donald Knuth's literate programming concept.
HOW TO APPLY
- Identify legacy code needing migration, like Clojure to Kotlin for coroutine APIs, and prepare namespaces for translation.
- Set up REPL in your IDE to query project indices, extracting vars, dependencies, classes, and methods transitively.
- Graph dependencies to distinguish internal project elements from external libraries or JDK sources.
- Style target language code idiomatically, such as porting transducers to Kotlin for functional similarity to Clojure.
- Generate API specs automatically using IDE tools for included utilities or dependencies in prompts.
- Build dynamic prompts with if-then cases: include source code for direct references, signatures for indirect ones.
- Invoke the LLM via its API in the REPL, templating prompts with libraries like comb for context insertion (a minimal template sketch follows this list).
- Output translations to editor artifacts for versioned diffs; iterate by sending errors or REPL results back.
- Apply refined changes directly to files, verifying indentation and semantics before committing.
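A minimal sketch of the if-then prompt templating step using James Reeves' comb library; the context map keys and instruction wording are illustrative assumptions, not the actual Cursive prompts.

```clojure
(require '[comb.template :as template])

(def translation-template
  "Translate the following Clojure namespace to idiomatic Kotlin.
Use top-level functions and immutable data structures.
<% (when (:uses-transducers? ctx) %>
This namespace uses transducers; call the ported transducer utilities below.
<%= (:transducer-api ctx) %>
<% ) %>
<% (doseq [dep (:migrated-deps ctx)] %>
Already-migrated dependency; call its Kotlin API directly:
<%= (:signatures dep) %>
<% ) %>
Source namespace:
<%= (:source ctx) %>")

(defn render-prompt
  "Expand the template against a context map built from the project index."
  [ctx]
  (template/eval translation-template {:ctx ctx}))

;; e.g. (render-prompt {:uses-transducers? true
;;                      :transducer-api    "fun <A, B> mapT(f: (A) -> B): Transducer<A, B>"
;;                      :migrated-deps     []
;;                      :source            (slurp "src/my_ns.clj")})
```

Branching on facts pulled from the index (transducer use, already-migrated dependencies) is what turns a generic prompt into the case-specific, if-then style instructions the talk describes.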
ONE-SENTENCE TAKEAWAY
Integrate LLMs deeply with REPL and editors for context-driven, iterative code translation enhancing Clojure development.
RECOMMENDATIONS
- Prioritize context curation in LLM prompts by dynamically pulling project-specific dependencies and APIs.
- Use massive, example-laden system prompts to guide LLMs through complex interactions reliably.
- Implement editor-based artifacts for diffable, versioned iterations to streamline refinements.
- Leverage REPL feedback loops with errors and evaluations for automatic issue resolution.
- Employ cross-model code reviews to catch subtleties missed in generation (a minimal sketch follows this list).
- Generate edge case tests via chat contexts to bolster test suites proactively.
- For multi-file tasks, start with clear specs and plans to guide LLM consistency.
- Opt for languages with abundant training data and explicit syntax to maximize AI efficacy.
- Maintain hybrid workflows: intervene manually when LLMs loop unproductively.
- Foster wonder in AI experiments to sustain innovative, fun computing practices.
- Explore vision and structured feedback for UI debugging in multimodal setups.
- Invest in code indexing to predict and retrieve relevant snippets for changes.
- For greenfield projects, select ecosystems proven with LLM tools like Python or Java.
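A minimal sketch of the cross-model review recommendation, assuming an OpenAI-style chat completions endpoint is used to critique code generated elsewhere (and reusing the clj-http/cheshire requires from the earlier sketch); the model name and review prompt are illustrative.

```clojure
(defn review-with-second-model
  "Ask a different vendor's model to critique generated code; returns the review text."
  [kotlin-code]
  (let [resp (http/post "https://api.openai.com/v1/chat/completions"
                        {:headers      {"Authorization" (str "Bearer " (System/getenv "OPENAI_API_KEY"))}
                         :content-type :json
                         :as           :json
                         :body         (json/generate-string
                                        {:model    "gpt-4o"
                                         :messages [{:role    "system"
                                                     :content "You are a strict Kotlin code reviewer."}
                                                    {:role    "user"
                                                     :content (str "Review this generated Kotlin for bugs, "
                                                                   "non-idiomatic constructs, and missed edge cases:\n\n"
                                                                   kotlin-code)}]})})]
    (get-in resp [:body :choices 0 :message :content])))
```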
MEMO
Colin Fleming, the creator of Cursive—a popular Clojure plugin for IntelliJ—delivered a compelling experience report at Clojure/conj 2024, detailing how large language models (LLMs) like Anthropic's Claude 3.5 Sonnet have revolutionized his workflow. Once skeptical about AI's practical utility beyond basic queries and small apps, Fleming found a breakthrough in code translation after Kotlin's coroutines rendered parts of his decade-old Clojure codebase obsolete. JetBrains' shift toward asynchronous APIs in IntelliJ forced the issue: with Cursive evenly split across 35,000 lines each of legacy Java, modern Kotlin, and Clojure, manual rewriting was untenable. Enter LLMs, whose transformer architecture—originally honed on natural language translation—proves "astonishingly good" at porting code, achieving 95% accuracy that requires only minor fixes.
Fleming's approach hinges on Clojure's REPL as a bridge to the IDE, dynamically assembling context from project indices. For a namespace translation, the REPL identifies vars, classes, methods, and transitive dependencies, flagging internal versus external references. Prompts swell to 1,500 lines and 14,000 tokens, incorporating API specs, source snippets, and case-specific instructions—like handling ported transducers in his functional Kotlin style. This isn't haphazard; it's systematic prompt engineering, evolved from "snake oil" to explicit if-then logic with examples, echoing the massive prompts behind Claude's Artifacts feature. Early experiments via web interfaces faltered, but REPL integration yielded reliable results, from Python's difflib to Clojure specs for AI libraries.
Iteration emerged as the real game-changer, demanding deeper editor ties. Fleming bypassed annoying inline autocompletions—frequent in tools like GitHub Copilot—for a chat UI with artifacts: code versions appear in scratch files, diffable and arrow-navigable. Refinements pull from red squigglies (editor errors), REPL outputs, or even test failures, looping feedback without copy-paste drudgery. LLMs shine here, often better at reviewing than writing; Fleming amusedly pitted OpenAI against Anthropic in critiques. Multimodal tweaks, like screenshotting UI issues or dumping Swing DOMs, further enhance debugging. Yet challenges persist: separated REPL-project setups complicate ergonomics, favoring integrated flows for user control.
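A sketch of how that feedback loop can be driven from the REPL, reusing the `ask` helper from the earlier sketch; `current-editor-errors` and `run-tests-and-capture` are hypothetical stand-ins for whatever the IDE and test runner actually expose.

```clojure
(defn refine
  "Feed concrete feedback (compiler errors, REPL output, failing tests) back
   into the conversation and ask the model to correct its previous answer."
  [conversation feedback]
  (ask (:history conversation)
       (str "Your previous code produced the following problems.\n"
            "Fix them and return the full corrected version.\n\n"
            feedback)))

(comment
  ;; Hypothetical usage: gather whatever signals are at hand and loop until clean.
  (let [errors   (current-editor-errors "src/cursive/foo.kt")  ; hypothetical IDE helper
        failures (run-tests-and-capture 'cursive.foo-test)     ; hypothetical test helper
        result   (refine conversation (str errors "\n\n" failures))]
    (:reply result)))
```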
In demos, Fleming showcased practical wins. A chat with Paredit test context generated edge cases, proposing changes that respected do-seq blocks and indentation—tricky for text-based Clojure forms. Applying them directly to the editor felt seamless, impressing even him. Scaling to multi-file edits, he fully implemented Clojure's Caveman tutorial via Claude—100% AI-written code on GitHub, handling eight consistent changes per response. For larger repos, tools like GitHub Workspace offer promise: from issues, they craft editable specs and plans, outlining current/proposed states. Fleming's Clojure trials faltered on sparse data, underscoring niche languages' AI vulnerabilities, but fuller specs could unlock repo-wide refactors.
Looking ahead, Fleming ponders AI's disruptive horizon for tools and developers. Low-level features like Paredit manipulations or refactorings may wane as LLMs absorb them, shifting focus to advanced indexing for change prediction. Literate programming—interweaving specs and code—could revive, with AI ensuring sync by updating from text. A waiter's anxiety at Virginia Tech echoes broader fears: AI won't eliminate jobs soon, but it reshapes them, as a 3D artist's artisan craft became Midjourney prompting overnight, stripping joy despite efficiency. For Clojure, training data scarcity and runtime opacity pose risks; explicit, typed code aids LLMs, nudging communities toward visible intent.
Yet Fleming urges wonder amid cynicism. Quoting Louis C.K., he notes how amazement fades quickly—evident in Hacker News dismissals of LLM feats. From SICP's preface, he advocates keeping computing fun, stretching machines playfully. His daughter's 20-minute Harry Potter chatbot in Cursor IDE crystallized this: despite ethical qualms over data provenance, energy use, and displacement, such magic moments redefine creation. As developer tools morph in 1-5 years, Fleming's hybrid REPL-LLM path offers a blueprint—optimizing for humans and machines alike.