+1 WIS

Yet Another How I Did A Thing With An LLM Post

Leaning into my awesome leadership skillz and my laziness about reading documentation, I decided to practice the last marketable skill and write code with an LLM as the majority contributor.

I used opencode and its default, free model, Big Pickle. This is a postmortem.

Disclaimer: I have written a bunch of Go code before, but not in recent years.

What was achieved

A working personal workflow runner, built in about a week, here: https://gitlab.com/andras.horvath1/marathon. I chose this project because I wanted a tool for long-running tasks in the homelab, and couldn't find a decent one that scratched my itches.

I wrote the basic data structures and a sort of "walking skeleton" by hand, then let opencode update various things, using a NixOS sandbox because I'm paranoid about leaking my other data.

I reviewed and hand-tested everything the LLM wrote.

What went well

As long as I

  1. Had a crystal clear mental model of what I wanted (features, control flows, data structures),
  2. Had a reasonable understanding of the toolkit to be used (libraries, abstractions),
  3. Kept the task at hand minimal and super well defined, and
  4. Trusted, but verified, every little detail,

the LLM

  1. Produced okay code.
  2. Read the docs so I didn't have to understand the details of TUI elements or CLI libraries.
  3. Was able to write tests and assorted boilerplate, which is neat.
  4. Put stuff into the "right" files with reasonable structure, without being told to.
  5. Produced semi-useful, if sometimes incorrect, summaries of what it had done. These sped up my understanding of the code, but I couldn't trust them.
  6. Rarely broke existing functionality, perhaps because of my insistence on testing.

What didn't go well

  1. It struggled with concurrency patterns and took many tries and manual correction to get some right (for a "personal tool" at least).
  2. It tended to reinvent the wheel, often badly. The scrolling UI window, for example, took 193489713 tries and still didn't work right, until I found a library, did a git reset --hard, and retried with that.
  3. Sometimes it refactored Just Because, sometimes it generated two almost-identical functions for slightly different use cases.
  4. It often took a long time to come back with a result.

What was learned/confirmed

  1. Git continues to be a godsend as save-game functionality: with branches I could easily save incremental progress or reset out of dead ends.
  2. It's slow and verbose, but since I was doing this as not-work, I often context-switched into chores instead of other work tasks. I'm not sure how well switching to more demanding work would have gone, and I often forgot to come back until the next day.
  3. Corollary: it's very uncertain how long a feature will take to develop. Will the LLM get it right in one try, in five, or do I give up and code it by hand?
  4. As expected, I had to build a good mental model, exactly as if I were coding this by hand. That's the hard part, but I appreciated not having to hand-write basic CLI functions.
  5. As expected, sometimes you have to go in and muck things out by hand.
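The save-game workflow from point 1, sketched concretely in a throwaway repo (the branch and file names here are made up for illustration, not taken from the actual project):

```shell
# Minimal sketch: one branch per LLM attempt, reset out of dead ends.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -qb main
git config user.email "you@example.com"   # local identity so commits work anywhere
git config user.name "You"
git commit --allow-empty -qm "walking skeleton"

# Save point: let the LLM attempt a feature on its own branch.
git switch -qc llm/scrolling-ui
echo "attempt 1" > ui.go
git add ui.go
git commit -qm "LLM attempt: scrolling UI"

# Dead end: go back to the last good state and throw the attempt away.
git switch -q main
git branch -qD llm/scrolling-ui
```

If an attempt does work out, it's a normal merge back to main instead of a branch delete.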

Summary

These tools are useful to the low-attention-span manager (me), or to the senior/staff engineer who doesn't want to live and breathe all the if err != nil patterns but has a good idea of what they want.

The tool I used would have been absolutely terrible paired with a junior dev without further guidance. At least, today.