v0.2.0: Future fantasies
ZAMM now supports capturing terminal command input/output in asciinema format, although there’s currently no way to actually view that information after the fact. In fact, there’s currently no way to even get back to the same terminal session after going to a different page and back. Also, terminal escape codes are not handled yet, so `cmd` output on Windows looks rather messed up. Apart from that, I’ve also finally added the Rust package for Ollama interactions and the Pixabay sound providers to the credits page.
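To make the recording format concrete, here is a minimal Python sketch of writing terminal events as asciicast v2, the newline-delimited JSON format that asciinema uses. The `AsciicastRecorder` class and `read_cast` helper are hypothetical names made up for this illustration, and none of this is ZAMM's actual code.

```python
import io
import json
import time


class AsciicastRecorder:
    """Writes terminal events in the asciicast v2 format (newline-delimited JSON)."""

    def __init__(self, stream, width=80, height=24, clock=time.monotonic):
        self._stream = stream
        self._clock = clock
        self._start = clock()
        # The first line of an asciicast v2 file is a JSON header object.
        header = {"version": 2, "width": width, "height": height}
        stream.write(json.dumps(header) + "\n")

    def _event(self, kind, data):
        elapsed = round(self._clock() - self._start, 6)
        # Every later line is a JSON array: [seconds since start, event type, data].
        self._stream.write(json.dumps([elapsed, kind, data]) + "\n")

    def output(self, data):
        self._event("o", data)  # "o": data printed to the terminal

    def input(self, data):
        self._event("i", data)  # "i": data typed by the user


def read_cast(text):
    """Parses a recording back into (header, events)."""
    lines = text.splitlines()
    return json.loads(lines[0]), [json.loads(line) for line in lines[1:]]


# Usage with a fake clock, so that the timestamps are deterministic.
ticks = iter([0.0, 0.5, 1.25])
buffer = io.StringIO()
recorder = AsciicastRecorder(buffer, clock=lambda: next(ticks))
recorder.input("echo hi\r")
recorder.output("hi\r\n")
header, events = read_cast(buffer.getvalue())
```

Because every event carries its own timestamp, a viewer or replayer only has to walk the lines in order and wait out the gaps between them.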
The main challenge for me this month was figuring out how to do non-blocking reads of command output in a cross-platform way. I don’t know why, but tackling uncertainties like that gave me a sense of anxiety, which is a far cry from when I was programming as a teenager. Back then, uncertainties were exciting opportunities for exploration. In any case, now that the main uncertainty for this feature is resolved, I am back to the regular boredom of ironing out details.
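For illustration, one common cross-platform way to get non-blocking reads is to do the blocking read on a background thread and hand the output over through a queue. The Python sketch below shows that pattern only; it is not necessarily what ZAMM does, and `spawn_with_output_queue` and `read_available` are hypothetical names.

```python
import queue
import subprocess
import sys
import threading


def spawn_with_output_queue(args):
    """Starts a command and returns (process, queue of output chunks)."""
    process = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
    )
    chunks = queue.Queue()

    def pump():
        # This blocking read happens on a background thread, so the
        # caller is never stuck waiting on the child process.
        for line in iter(process.stdout.readline, b""):
            chunks.put(line)
        chunks.put(None)  # sentinel: the command closed its output

    threading.Thread(target=pump, daemon=True).start()
    return process, chunks


def read_available(chunks, timeout=0.1):
    """Returns (output that is ready, EOF flag), waiting at most
    `timeout` seconds for the first chunk to show up."""
    collected = []
    try:
        item = chunks.get(timeout=timeout)
        while True:
            if item is None:
                return b"".join(collected), True
            collected.append(item)
            item = chunks.get_nowait()  # drain the rest without waiting
    except queue.Empty:
        return b"".join(collected), False


# Usage: poll until the command has closed its output.
process, chunks = spawn_with_output_queue(
    [sys.executable, "-c", "print('hello'); print('world')"]
)
output = b""
finished = False
while not finished:
    data, finished = read_available(chunks, timeout=1.0)
    output += data
process.wait()
```

The appeal of this design is that it relies only on threads and pipes, which behave the same on Windows, macOS, and Linux, rather than on platform-specific calls for polling file descriptors.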
Despite the very limited amount of visible progress, this does still finally mark the first time in this project that I’m able to start building out the tools that I want LLMs to be able to use. As such, I’m making this the start of the v0.2.x series of releases for ZAMM. I figure now is as good a time as any to talk about the vision I’ve had for ZAMM since February of 2023.
I’m hoping that ZAMM can, at the very least, be a smarter `Make` that can not only run project-related commands without you having to remember what they’re called, but also do on-the-fly error handling. Something that, say, lets you get a development environment set up for someone else’s software project without needing to read their README or to debug common setup problems. For example, sometimes `pip install` isn’t enough by itself because you get an error such as `No module named '_bz2'`. But what am I supposed to install to get `_bz2`? Turns out it’s `libbz2-dev`. Perhaps some LLMs would have that information memorized already, but I want that sort of information to be documented in a structured manner such that any LLM I throw at the problem will know how to deal with it.
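As a rough sketch of what "documented in a structured manner" could mean, the Python below stores each known setup problem as a small record and looks it up by matching command output. The `KnownFix` schema and its field names are assumptions invented for this example, not an existing ZAMM format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownFix:
    """One documented setup problem and the package that resolved it."""
    error_pattern: str  # regex matched against command output
    package: str        # system package to install (name is distro-specific)
    notes: str = ""


# Hypothetical knowledge base with the single example from the text above.
KNOWN_FIXES = [
    KnownFix(
        error_pattern=r"No module named '_bz2'",
        package="libbz2-dev",
        notes="Assumes a Debian-like distribution.",
    ),
]


def suggest_fixes(command_output, known_fixes=KNOWN_FIXES):
    """Returns every documented fix whose pattern appears in the output."""
    return [
        fix for fix in known_fixes
        if re.search(fix.error_pattern, command_output)
    ]


# Usage: feed in the failing command's output.
fixes = suggest_fixes("ModuleNotFoundError: No module named '_bz2'")
```

Entries like this could be handed to any LLM as context, so the fix would no longer depend on what a particular model happens to have memorized.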
I want something that lets me document setup procedures for brand new projects. Set up a Git repo, link it to GitHub; add a linter, a formatter, a `Makefile`; set up tests and run them in CI. Of course, there are GitHub repository templates that you can use for this. But I want something smarter. I want to be able to specify any number of varying configurations: use React with `npm` and `eslint` for this project. Use Svelte with `yarn` and `oxlint` instead for this other project. Wait, no, let’s try building the project from scratch with `pnpm` instead. Set up this project with all `.test.ts` files next to the corresponding `.ts` files; set up this other project with all test files in a separate `test/` folder, and always make sure the JS config matches the project setup. I want to be able to teach the LLM to do something once, documenting my exact steps, and then have all LLMs be able to do that same thing for me forever into the future, at least until breaking changes are introduced into my dependencies. And if things stop working for whatever reason, I want to at least have a concrete historical record that shows how things most definitely used to work, so that they can be compared to how things don’t work anymore.
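To illustrate the "varying configurations" idea, here is a small Python sketch in which each project's choices are plain data and the test file location is derived from that data. `ProjectSetup` and its fields are hypothetical, and the separate-folder case flattens subdirectories purely to keep the example short.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ProjectSetup:
    framework: str
    package_manager: str
    linter: str
    colocated_tests: bool  # True: foo.test.ts sits next to foo.ts
    test_dir: str = "test"

    def test_path_for(self, source_file):
        """Returns where the test for `source_file` should live."""
        source = PurePosixPath(source_file)
        test_name = source.stem + ".test" + source.suffix
        if self.colocated_tests:
            return str(source.with_name(test_name))
        # Simplification: drop the source file's directory entirely.
        return str(PurePosixPath(self.test_dir) / test_name)


react_app = ProjectSetup("react", "npm", "eslint", colocated_tests=True)
svelte_app = ProjectSetup("svelte", "yarn", "oxlint", colocated_tests=False)

react_test = react_app.test_path_for("src/utils.ts")
svelte_test = svelte_app.test_path_for("src/utils.ts")
```

With the conventions written down as data, checking that "the JS config matches the project setup" becomes a comparison against this record rather than a judgment call.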
I don’t know if any of that’s feasible the way I’m imagining it with current LLMs, but I think I am finally on the cusp of at least being able to experiment with that. Maybe in a couple more months.
If that works out, I hope that ZAMM can also do dumb code editing or refactors. Automated project exploration à la aider or Cline (Claude Dev) is very cool, but what about standardizing the way you do things in a project? I want to be able to create structured documentation that says, “New Tauri backend API endpoints should be exported with `specta::collect_types` as well as registered with `tauri::generate_handler` in `main.rs`.” Again, even if some LLMs may be smart enough to figure this out automatically from just reading the project code, I wish for there to be structured documentation around this so that even a human unfamiliar with the project could interact with it in all the relevant ways. And of course, it would be great for the documentation to be automatically updated if the project-specific ways of doing things change.
If that works out, I hope that ZAMM can support the creation of natural-language libraries. Not just Python-specific or Rust-specific versions of the original Ruby VCR library, each with their own feature sets, but instead a natural-language specification of what any VCR library in any language should be able to do, and documentation around specific implementations in particular languages. That way, new or obscure languages can also get access to a whole host of commonly used programming libraries, so long as the LLM has access to 1) the natural-language specs for the library, 2) an existing implementation in another language that provides a concrete interpretation for the natural-language specs, and 3) documentation around how things are done with the new language.
If that works out, I hope I can get ZAMM to a self-hosted state where ZAMM can be described entirely in terms of its functionality and its looks. It would have a reference implementation in one particular stack, but it would also be easy to describe how to create Electron or Qt projects, and have the LLMs automatically rewrite the whole project using Electron or Qt. Documentation can then be code, and code can be documentation, thus blurring the lines between programming and literature. I have serious doubts that it’s all going to work out, but one can dream. Creating such a self-hosted program has been the recurring fantasy of my 20s, but it wasn’t until I discovered langchain in January of 2022 that I really felt like this fantasy could actually become technically feasible.
But of course, that’s just the fantasy. The reality is that as of right now, ZAMM does precisely none of that. I found out about Cline this month, and discovered to my chagrin that it was only started a mere 3 months ago, on July 3rd, and already handles terminal execution, file editing, and even browser inspection! How am I so slow at this? Maybe part of it is because of my heavy-handed testing infrastructure (but if I don’t do that, how would I be confident that a rewrite of the app in a different language or framework would work exactly the same?), and maybe another part of it is the fact that Saoud built his on top of VS Code while I crafted my own GUI from scratch. I remember considering building a VS Code plugin at the start, but I figured that I wasn’t interested in learning the ins and outs of the VS Code platform because that knowledge is presumably not transferable outside of VS Code, and I was also afraid that being corralled into the VS Code framework meant that I wouldn’t have complete control over the functionality and display of my plugin. (The fact that Cursor ended up forking VS Code entirely gives this fear some credence. I also had a similar consideration back when I was looking at aider and its TUI — I had no experience building TUIs, and it felt like the terminal environment would be too restrictive compared to the fluid canvas that HTML with JS offers.)
Anyways, that’s all just speculation and excuses on my part. Part of me feels a pang of shame at having a project that’s getting a little long in the tooth without having anything to show for its age. It’s ironic that while I don’t have expectations for my own increasing age — I don’t feel the stereotypical dread of turning 30 without having much to show for it in the way of concrete achievements, at least as measured by the standards of my peers that I went to school with — I do have expectations for the project’s age instead. Sometimes, I wish that someone authoritative could confirm my feelings by telling me straight up that I’ve done fucked up with this project, and explain where and how things went wrong.
On the bright side, as AI development tools continue to improve, work on ZAMM should continue to get easier, if it doesn’t get obviated altogether, that is. Branching off of someone else’s AI project (for example, by forking Cline instead) is always an option, but for now projects such as Cline don’t work quite well enough to convince me to abandon efforts on ZAMM just yet, and leaving a project that I’m already comfortable with for one that is new and unfamiliar comes with its own mental switching costs. Lowering the activation energy needed to explore a new project is definitely something that I believe LLMs can help with as well, in the future.
And speaking of forking, I do wonder if making that a lot easier in the future could change much about how the software ecosystem works. Right now, if you take someone else’s project and make some changes to it (i.e. “forking” it, as they say in programming), but they don’t agree with your changes because they have a different vision for their project than you do, then you will have to take up the responsibility of maintaining your own fork in the face of bit rot. This is already hard enough in general as the interfaces between software projects shift over time, but it gets doubly hard to maintain the same changes within the same project. The first case is analogous to a species adapting to gradual changes in their habitat; if you can convince the species that adding an extra limb is beneficial to them, they’ll gladly incorporate it into their DNA and adapt to their environment with that limb included in all future generations. (I know, this isn’t how evolution works in nature.) The second case is analogous to you having to figure out how to artificially graft an extra limb onto individual members of that species, and then having to figure out how to do the procedure all over again once their internal organs evolve with no concern whatsoever for your ease of surgery. It’s worth doing sometimes when you really just need that exact animal but with an extra leg, but that will often not be the case.
But say it becomes possible to fork easily (because an LLM can quickly acquaint you with the general skeletal structure of a project and where you’d need to make the incisions to get the result you want) and to maintain a fork easily (because once you explain how to perform the surgery to the LLM, it can combine that with changes in the project documentation to figure out how to perform that same surgery on future versions of the project). Maybe there will be less drama around where a project should head, because everyone involved will find it easier to satisfy their own visions for the project. In fact, if it becomes possible to easily graft any feature (with a well-understood open-source implementation) onto any software project, then perhaps open source software projects will compete less on features and more on the curation and presentation of said features. That’s why I had originally put a “Fork: Original” row on the home screen of ZAMM, because I fantasized about the existence of a “fork-first” approach to software development wherein we’ll each build a ZAMM that’s tailored specifically to us, and we’ll trivially copy anything that we like from each other’s projects.
While we’re on the topic of speculating about the future, I’d also like to speculate on tech developments outside of coding. What if one day in the nearish future, we build a simulation that AIs can inhabit, like a crude version of what the Simulation Hypothesis proposes? I don’t mean that the simulated world has to be anywhere near as crazily, convincingly real as ours. I mean creating even a very crude representation of a world that simulates just enough of agent interactions to replicate some basic societal dynamics such as the unfairness inherent in hierarchical power structures; perhaps a world that’s only ever rendered in text like a choose-your-own-adventure story, since that will already be enough for a text-based LLM to “experience” the simulated world by making choices and getting feedback from its digital reality. Would we then be able to create a Digital Buddha that lives through the lives and perspectives of every single entity in the simulated universe, à la The Egg? And even if we were able to create it, would it necessarily be empathetic towards all life simply by virtue of having experienced life from so many angles, or would it perhaps just be more capable of ruthless manipulation than a being that hasn’t internalized other perspectives quite as much?
Or suppose that one day, we could take it even further and build a simulated 3D world that virtual agents can learn to navigate in. Perhaps we could also build a simulated 4D world that virtual agents can also learn to navigate in, by “simply” extending the physics and graphics engines to handle 4-dimensional vectors instead of 3. Perhaps then we could interact with beings that are native to the 4D space, for whom moving around in 4 dimensions is just as natural as moving around in 3 dimensions is to us. And of course, we are not limited to 4 dimensions. We could create any kind of simulated world, and the beings in them may be none the wiser until we reveal that to them.
And what if it eventually becomes trivial for even simulated beings to create simulations of their own (similar to how modern CPUs support nested virtualization, wherein you could efficiently run virtual machines inside of other virtual machines), and then for these beings to immerse themselves so completely in a simulation that it becomes their new base reality because they made themselves forget there ever was anything else, like an extreme version of losing yourself in a storybook world? Perhaps the very idea of “base reality” will become meaningless to beings who are used to frequently hopping in and out of different base realities. You’ll never know if you’re in the base layer of reality, and it doesn’t matter anyway because you’ve become so used to living through entire lifetimes and dying just to wake up in a different universe.
Or think about even just creating beings that are native to different environments in our own universe. Why spend so much energy to terraform Mars — after which Mars will slowly lose all the oxygen in its atmosphere anyways due to its comparatively lighter gravity — when you could just build beings with bodies that can function natively on Mars? It seems quite plausible to me that we’ll figure out Mars-compatible bioengineering, along with enough neuroscience and AI to grow brains under different circumstances, long before we cobble together enough resources to terraform that planet.
Or what about creating an AI that serves as the personification of an organization? For example, instead of being forced to interact with a large corporation through a faceless bureaucracy (whether as a customer talking to customer service, or as an employee talking to HR), we could instead imagine talking to a singular entity that has the entire context of your history with the organization and your previous conversations with that entity. Imagine, instead of inconsistencies between individual bureaucrats and the lack of communication between different internal departments, we have a singular entity that is always up-to-date with the latest policies and priorities across the organization, such that the left hand knows what the right hand is doing. This singular entity could take your personal goals and preferences into account and really integrate that into its plans for how to flex its organizational muscles in order to maximally benefit its members. But of course, this could easily turn sour: an AI overlord would presumably be much harder to overthrow than a human overlord because the main problem with human dictators is that there’s only one of them and they can’t be micromanaging everything, everywhere, all at once throughout an entire nation. An AI overlord needs to trust its underlings much less because it can just make a copy of itself to serve as its own underling, à la child processes.
If robots start doing enough of the labor in the world, perhaps there will even be a robo-society, with its own robot-related supply chains and ecosystems, that forms in parallel to human society. After all, if robots get good enough, they can even take over the job of building or fixing their own bodies. Why would a robot go to a human engineer, who can’t communicate directly with the robot’s mind, instead of another robo-doctor? And if such a robo-society is to form, why wouldn’t it compete with human societies for resources, just as heavy industries today compete with light industries for resources? If robots truly do become competent enough, I don’t see this competition going well for human society. It’s not that the robots need to magically gain free will and then decide to wage war on humans; it’s enough that human elites prefer to expand robo-society at the cost of human peasants, because robots are that much more useful than the average human peasant. I’d argue that there’s already precedent for this today — societies are generally way more concerned with maintaining the integrity of their electrical grid than with tackling poverty. (Of course, a collapsed electrical grid would plunge a large part of the population in developed countries into instant poverty, so it’s not an either-or tradeoff — but I think that one could make the same argument once human society depends on robo-society.)
Take for example building up your military. I could see robo-militaries being much more effective than human ones. Every time you lose a human soldier, at least 18 years of raising them and possibly more years of training them to be a skilled killing machine go down the drain. Every time you lose a robo-soldier and recover their mind intact, you could theoretically recover all their memories until their moment of destruction, allowing every other robo-soldier in the army to learn from their mistakes. Every time a robo-squad gets successfully ambushed, all other robo-squads learn better unit tactics. A robo-army may well be an enemy that gets more dangerous the more you kill it. You would think twice about risking your best human soldiers on dangerous missions; a robo-army can be composed entirely of its best robo-soldiers. If robo-militaries do become much more potent than human militaries, then as a human leader you may well want to expand the portion of your country dedicated to robo-society so that you have a bigger robo-military to draw from.
But maybe the only pressure to expand robo-society won’t be from the human elites. Given that Facebook has already successfully developed an AI that’s better than most humans at a game of Diplomacy — which, as its name implies, is all about negotiating with other players — perhaps at some point AIs will be better than humans at negotiating business deals or international treaties. If you’re a union worker, perhaps you’d want the best person for the job to negotiate the next union contract — and perhaps that best “person” would be an AI rather than your current incompetent union boss. Looking at the state of the world and its human institutions today, I personally find it easy to want to demand something better than the current imbecilic leadership we’re stuck with — and if human leaders are unable to shape up to be the leaders we deserve, perhaps it is time they be replaced with something more competent. At the same time, we should be careful: just because an AI is good at gaining power doesn’t mean it’ll be good at governing. Just look at the Khmer Rouge and at North Korea — these were/are regimes that were very competent at suppressing internal dissent, but superbly incompetent at actually improving the countries they ran.
To be clear, I don’t stand strongly by any of these predictions. They’re just fun things to speculate about. But if there’s one thing I would bet on, it is that the world is just going to keep getting more and more complicated and confusing, and less and less comprehensible and predictable to the average person, especially in fields that are anti-inductive. Perhaps anything that the machines run will start looking like those chess games played between two state-of-the-art chess AIs, such as Leela Chess Zero and Stockfish. It used to be that we could have chess engines show us why they make moves, and we still can, but with how advanced chess engines have become, their high-level play is getting to the point where sometimes even experts cannot quite be sure about what the machines are trying to say when they explain their behavior. (I’m basing this off of YouTube chess videos, where sometimes even the human chess commentator explicitly says that they’re just guessing at computer intentions.)
So maybe reality will be like that too. You won’t even have to take psychedelics anymore to experience a trip in real life. Geopolitical realities will just change from one day to the next, for reasons that are beyond any human understanding. There will still be simplified us-versus-them narratives to manipulate the rubes, of course, but for anyone who cares to question the narrative, they’ll find that there is not a single narrative out there that truly makes sense. Humans will have completely lost the plot, and it’s just systems battling each other. In a way, perhaps it already is this way — supposedly even new Presidents of the United States are surprised at how little power they actually have to change “The System” — except that most systems don’t yet have a single entity with theory of mind at the helm.
Updates on procrastination
This month I got really feverishly sick for a week and a half, during which I of course had no productive expectations of myself. However, at some point afterwards, I realized that my false sense of urgency had returned. It’s the kind of feeling that I can handle well enough once I recognize it, but then it gradually creeps up on me and catches me unawares again.
On the one hand, I think it is fair for me to want to get something useful out of this project, rather than working on it in a way that is completely untethered from reality. On the other hand, it is not realistic for me to expect an hour or two of work on any given day to fundamentally change anything about the project. On the third hand, it is also not great for me to assume that nothing about the project will ever fundamentally change, no matter how much work I put into it.
This last week, I also started working on a crypto side project with a friend. Since then, I’ve noticed a couple of things:
- It is so much easier for me to work on a project in-person with someone else. It’s so much easier, in fact, that I think Liza was right when she suggested that working in the office at a job or studying in a group at school is such a different mode of existence that perhaps I shouldn’t seek to replicate it in my life outside of those contexts.
- In the last few days, I’ve found that I can now enjoy hacking on ZAMM much more as an inconsequential side project. Of course, that was easy to say before, but in retrospect, with literally nothing else at all in my life that could positively impact my future earnings, it was hard to truly feel it and not get caught up in endless mind games. This is not to say that the crypto project is going to make any money, but just the mere fact of it being a second possibility makes it much easier to let my ego fall to the side with ZAMM. Now I have two projects, neither of which I need to take that seriously. Of course, it remains to be seen whether this new feeling is fleeting or not.
It feels like time to flesh out a better answer to the question: What am I actually working on this app for? Well, for one I do somewhat enjoy the mindlessness of weaving a thread through code. It can be quite annoying when I want to actually get somewhere fast and all this weaving is taking up a lot of time, but in other moments the weaving can be rather enjoyable too. ZAMM has become the sort of comfort project that I am familiar enough with to work on without needing to spend a lot of time figuring out where the existing threads run to and which spots I therefore need to thread a new strand of functionality through. But that sort of thing can happen with any other project I attach myself to. What is special about ZAMM?
In a sense, it is not at all special. If I had to choose one forever project to work on for the rest of my life, it would not be ZAMM, but rather a generic simulator. My fantasies around that hypothetical project would be better left to another blog post, but suffice it to say that it would be way more involved than ZAMM is. Ever since February or perhaps March 2023, my plan for ZAMM has been:
- Get basic tools for LLMs working (meaning, at the very least, that an LLM could control the terminal, do file editing, and ideally also browse the web and interact with the GUI desktop)
- Record LLM interactions with those tools
- Replay those interactions until they succeed using the right prompts with the right data
- Document all the successful interactions (and perhaps even some of the unsuccessful ones) so that they are reproducible
- Reproduce these interactions in slightly different contexts. The smarter the LLM, the more different the contexts they will be able to handle
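The record and replay steps of this plan can be sketched in a few lines of Python, in the spirit of the VCR library: run a command for real the first time, store the result, and serve the stored result on every later run. `CommandCassette` is a hypothetical name, and a real version would also need to capture timing, stdin, and the LLM side of the interaction.

```python
import json
import subprocess
import sys


class CommandCassette:
    """Records command results on first run and replays them afterwards."""

    def __init__(self, recordings=None):
        self.recordings = dict(recordings or {})

    def run(self, args):
        key = json.dumps(args)  # turn the argument list into a dict key
        if key not in self.recordings:
            # Record mode: actually execute the command once.
            result = subprocess.run(args, capture_output=True, text=True)
            self.recordings[key] = {
                "stdout": result.stdout,
                "returncode": result.returncode,
            }
        # Replay mode: serve the stored result without running anything.
        return self.recordings[key]

    def dumps(self):
        """Serializes all recordings so they can be saved to a file."""
        return json.dumps(self.recordings, indent=2, sort_keys=True)

    @classmethod
    def loads(cls, text):
        return cls(json.loads(text))


command = [sys.executable, "-c", "print('setup complete')"]
cassette = CommandCassette()
first = cassette.run(command)  # executes for real
restored = CommandCassette.loads(cassette.dumps())
replayed = restored.run(command)  # served from the recording
```

Serializing the cassette as sorted, indented JSON keeps the recordings diffable, which matters if they are meant to live in Git alongside the documentation.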
The thing is, all this is obviously straightforward in my mind:
- There are already countless terminal emulators out there, and I don’t even need full terminal emulation; I just want to record command I/O, replay it, and display it to the LLM after all ANSI escape codes are correctly interpreted and cleaned up. (Last I checked, existing tools like langchain and Cline don’t appear to do the last part correctly.)
  File editing may be a lot more involved if I want to make it easy for the LLM to succeed, but let’s just first limit it to tasks where I can dump the entire contents of small files and have the LLM edit them by reciting the new version verbatim.
  I’ll admit, I have no experience building web browser plugins, but given that so many of them exist already, including ones like Zotero that can communicate with desktop apps, and ones that can take screenshots of entire pages, it’s clearly just a matter of spending enough time figuring out how to do something that plenty of others have done already, rather than figuring out how to do something truly innovative that might turn out to be surprisingly difficult.
- This is just a matter of defining data structures, doing database migrations, and testing everything thoroughly.
- Okay, this part might be overly optimistic about the capabilities of present-day LLMs. But I’m pretty sure that they can succeed for tasks that are almost the exact same. Like, maybe I can give it a task like
  “This is an example of how to protect the main branch on this example project with X and Y checks. Now, go protect the main branch on this other example project that has A, B, and C checks instead.”
  If even that proves too difficult for the LLM to do, then surely an even simpler task such as
  “This is how to protect the main branch on this example project with condition X. Now protect the main branch on this other example project with condition Y.”
  should succeed. You know, just keep narrowing the task down until it’s simple enough for even a brain-dead LLM to do, because being able to automate any common tasks at all would make life just a little bit more tolerable for me.
- This is once again just defining data structures, and I suppose file formats too, since this documentation should ideally be transparent to Git.
- This is just a matter of running the LLM on the same workflow we tested out in step 3, but on real data instead of mock data.
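As a small illustration of the escape-code cleanup mentioned for the terminal tool, the Python sketch below strips CSI and OSC escape sequences and applies carriage-return overwrites, so that a progress line collapses to its final state. It is deliberately partial: cursor movement is removed rather than interpreted, so it falls short of the "correctly interpreted" bar set above, and all function names are hypothetical.

```python
import re

# CSI sequences (ESC [ ... final byte) cover colors and cursor movement;
# OSC sequences (ESC ] ... BEL or ESC \) cover things like window titles.
ANSI_ESCAPE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC
)


def strip_ansi(text):
    """Removes escape sequences without interpreting them."""
    return ANSI_ESCAPE.sub("", text)


def apply_carriage_returns(line):
    """Keeps what a terminal would show after each '\\r' rewinds the cursor."""
    shown = ""
    for segment in line.split("\r"):
        # Later text overwrites earlier text from column 0 onwards.
        shown = segment + shown[len(segment):]
    return shown


def clean_output(raw):
    """Produces plain text suitable for showing to an LLM."""
    lines = strip_ansi(raw).replace("\r\n", "\n").split("\n")
    return "\n".join(apply_carriage_returns(line) for line in lines)


raw = (
    "\x1b[1;32mDownloading\x1b[0m 10%\rDownloading 100%\r\n"
    "\x1b]0;title\x07done\n"
)
cleaned = clean_output(raw)
```

Anything beyond this, such as cursor-up sequences used by multi-line progress bars, would need a real screen buffer, which is roughly where "I don’t even need full terminal emulation" starts to get tested.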
Absolutely none of this is rocket science. I’ve had this simple-ass vision in my mind now for over a year and a half, and yet in all that time since, I haven’t even been able to make it past the very first part of the very first step. I’ve finally been able to nibble on that first part this month — hence my celebrating by bumping the version number to v0.2.0 — but when I say that ZAMM still doesn’t actually “do anything” yet, this is what I mean. Even if this plan is completely unworkable because I am grossly overestimating LLM capabilities, I want to at least get to the point where that problem becomes clearly obvious. Instead, I’m stuck at a point where nothing is happening to either falsify or prove the dream true.
And if it is this difficult for me to make ZAMM happen, despite ZAMM being such a dead simple idea to me, then it will be absolutely impossible for me to make anything more complicated happen. In this sense, it’s not about ZAMM itself, but about what ZAMM represents about the difficulty of working on sizable projects. If I abandon ZAMM now to start on the simulator project, I know from past (and current) experience that I’ll end up in the same exact spot of having a project of intermediate size that is very hard to make good progress on, that I feel a lack of passion towards because I am generally inclined towards instant gratification. If instead I continue on ZAMM for now, I stand a chance (however small it might be) of eventually escaping the curse of software development getting exponentially more boring.
Other random thoughts
- As frustrating as it was to watch the main character of The Substance self-sabotage, it was also reminiscent of my own struggles to “respect the balance.” Me at night: “Ah but this is so fun, I don’t want to sleep yet!” Me in the morning: “Ah damn you, why didn’t you sleep earlier?!”
- Ever since Liza started streaming, both of us have struggled with the OBS Studio interface. In the past, I would’ve judged it so hard for its horrendous lack of UI polish. But now, I have a greater appreciation for just how much work needs to go into that, and its popularity makes me realize that sometimes, it’s okay for an app to lack polish.