🎙️, 🎙️ - Is this thing on?
In the past few months, I’ve had one of the most terrifying, interesting challenges of my engineering career. I lost the ability to control a computer with my hands.
It started with some light aching in the back of my hands and forearms, but eventually, by the end of the day, my hands would be on fire. They wouldn’t recover by the next morning, and they got gradually worse every day.
Of course, I tried ergonomic keyboards and mice, but that was about as effective as upgrading your running shoes after tearing your ACL.
Early on, I was aware that people could control their computer using their voice, but I was also convinced it was less productive. But by early November, it was so clear that I was on an unsustainable path that I said alright, I’m done using my hands.
Two months in, here’s how that’s going.
The strengths of voice input
Health
Humans are meant to communicate ideas through speech, not through pressing buttons. We’ve just worked that way because it’s easier for the computer.
My doctor was thrilled to learn that I could work while pacing around the room. That’s not bad for helping your brain work properly, either.
Wider Vocabulary and Memorability
Voice commands do not suffer from the limited-real-estate problem of the keyboard; we work with most of the English language.
Consider the following example:
file create
file duplicate
file delete
or as keyboard shortcuts,
cmd-j,a
cmd-j,b
cmd-j,c
Perhaps cmd-f would be a better prefix for this shortcut, but I seem to recall that’s taken. I’m not sure; I can’t completely remember. 😉
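To make the comparison concrete, here is a minimal Python sketch of a command table keyed by spoken phrases. It is not how Talon works internally; `COMMANDS`, `run_command`, and the three action functions are hypothetical names I made up for illustration.

```python
# Hypothetical sketch: spoken phrases as keys, actions as values.
# None of these names come from Talon; they only illustrate the idea.
from typing import Callable, Dict


def create_file() -> str:
    return "created"


def duplicate_file() -> str:
    return "duplicated"


def delete_file() -> str:
    return "deleted"


# The phrase itself documents the command, so there is nothing to memorize.
COMMANDS: Dict[str, Callable[[], str]] = {
    "file create": create_file,
    "file duplicate": duplicate_file,
    "file delete": delete_file,
}


def run_command(phrase: str) -> str:
    """Look up a spoken phrase (case and spacing insensitive) and run it."""
    normalized = " ".join(phrase.lower().split())
    action = COMMANDS.get(normalized)
    if action is None:
        raise KeyError(f"no command for phrase: {phrase!r}")
    return action()
```

Adding a fourth command means adding one more readable phrase, rather than hunting for a free key combination.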
The weaknesses of voice input
Homophones
You don’t think about this while working with a keyboard. two, to, and too are now a problem. It’s why I said we work with most of the English language. It’s a good idea to keep your commands phonetically unique.
Doing intelligent things in software, like using the context of what you’re saying, can significantly mitigate this, but it’s nonetheless a challenge.
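As a rough illustration of keeping commands phonetically unique, the following Python sketch flags phrases that would sound identical. It assumes a tiny hand-written homophone table (`HOMOPHONE_GROUPS`); a real check would need a proper pronunciation dictionary, and every name here is hypothetical.

```python
# Hypothetical sketch: flag command phrases that sound alike.
# HOMOPHONE_GROUPS is a tiny hand-written table, not a real phonetic dictionary.
from typing import Dict, List, Set, Tuple

HOMOPHONE_GROUPS: List[Set[str]] = [
    {"two", "to", "too"},
    {"for", "four"},
    {"right", "write"},
]


def sound_key(word: str) -> str:
    """Map a word to one stable representative of its homophone group."""
    w = word.lower()
    for group in HOMOPHONE_GROUPS:
        if w in group:
            return min(group)  # any fixed member of the group works
    return w


def find_collisions(phrases: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of distinct phrases that would sound identical."""
    seen: Dict[Tuple[str, ...], str] = {}
    collisions: List[Tuple[str, str]] = []
    for phrase in phrases:
        key = tuple(sound_key(w) for w in phrase.split())
        if key in seen and seen[key] != phrase:
            collisions.append((seen[key], phrase))
        else:
            seen[key] = phrase
    return collisions
```

For example, `find_collisions(["go to", "go two", "file create"])` reports the first two phrases as a colliding pair.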
Collaboration is Different
I can’t voice code and pair, much like I can’t have two conversations at once.
Usually I just let the other person drive, which I prefer anyway. But if I need to be the one driving for some reason, I’ll break out the sadistic input device.
Learning Curve
In case you’ve forgotten, there was a time you couldn’t type.
Using a microphone over a keyboard isn’t the same as starting over, but the first week or so is an odd combination of frustrating, demoralizing, and exciting.
It doesn’t help (or perhaps it does) that many of us arrive at voice coding with a hair-on-fire desperation, unable to work without a major change.
What does this actually look like?
The tool you will see me use is called Talon. It’s fast, accurate, and getting better every day.
We will start with terminal commands, because using a CLI is not far off from issuing a voice command.
It’s about time for a demo.
I have similar workflows with other CLIs, like kubectl and pulumi.
Of course there’s more to being an engineer than using CLIs. Let’s focus on what I think are two of the most critical things, from an input point of view - coding and writing.
Coding
Now we’re going to meet another tool - Cursorless, a piece of technology that lets you efficiently work in VSCode using voice. It’s especially powerful when you have a lot of things on the page, so, 99% of the time in actual programming.
When I first embarked on this saga, I was convinced I needed to use vim. I wasted a ton of time practicing and setting up the perfect development environment in Neovim. But I hit a local maximum, because vim has a fundamental flaw when it comes to voice coding.
In vim, you move, and then you act. It would be better to just act.
Before I demo Cursorless, I need to mention a concept - the phonetic alphabet. In Talon, if you want to insert the letter “a”, you say “air”. This is because you want to stay away from things that are phonetically similar, and the normal English alphabet is full of conflicting letters (e.g. h vs. a, b vs. d, etc.).
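Here is a small Python sketch of what a spoken alphabet boils down to: a lookup from distinct-sounding words to letters. Only “air” for “a” is stated above; the other 25 words are an assumption based on a commonly used community word list and may differ in your setup. The `spell` helper is hypothetical.

```python
# Sketch: spell text from spoken alphabet words.
# Only "air" -> "a" is stated in this post; every other entry is an assumption
# based on a commonly used community word list and may differ in your setup.
ALPHABET_WORDS = (
    "air bat cap drum each fine gust harp sit jury crunch look made "
    "near odd pit quench red sun trap urge vest whale plex yank zip"
).split()

# Pair each spoken word with its letter, in alphabetical order.
SPOKEN_TO_LETTER = dict(zip(ALPHABET_WORDS, "abcdefghijklmnopqrstuvwxyz"))


def spell(spoken: str) -> str:
    """Turn a sequence of alphabet words into the letters they stand for."""
    return "".join(SPOKEN_TO_LETTER[word] for word in spoken.lower().split())
```

With this table, saying “bat air drum” would spell out the word bad, with no chance of “b” being heard as “d”.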
My main goal here is to show you what’s possible with Cursorless, not to explain every detail. If you’re curious, check out the documentation.
If you want to see more advanced tutorials, check out Pokey’s YouTube channel. He’s much faster than I am.
Writing
Writing looks a lot like coding, with a couple of key differences.
- Talon has a dictation mode, which works exactly the way you think it does.
- It also has something called formatters, which made an appearance in my coding demo. Saying “camel foo bar” inserts fooBar, and saying “sentence the park is closed today” inserts The park is closed today.
- I do most of my writing these days in VSCode, so I can use Cursorless to edit it. In the future Cursorless will be in other places, like the browser.
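To show what a formatter does, here is a minimal Python sketch of the two examples from the list. The function names mirror the spoken commands, but the implementation is purely illustrative and not Talon's own; `FORMATTERS` and `apply_formatter` are hypothetical.

```python
# Hypothetical sketch of two formatters. The names mirror the spoken commands,
# but this is an illustration, not Talon's implementation.
from typing import Callable, Dict


def camel(words: str) -> str:
    """'foo bar' -> 'fooBar'"""
    parts = words.split()
    if not parts:
        return ""
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])


def sentence(words: str) -> str:
    """'the park is closed today' -> 'The park is closed today'"""
    text = " ".join(words.split())
    return text[:1].upper() + text[1:]


FORMATTERS: Dict[str, Callable[[str], str]] = {
    "camel": camel,
    "sentence": sentence,
}


def apply_formatter(utterance: str) -> str:
    """Treat the first word as the formatter name and format the rest."""
    name, _, rest = utterance.partition(" ")
    return FORMATTERS[name](rest)
```

The first spoken word picks the formatter and everything after it is the text to format, which is why one short prefix is enough to switch between code-style and prose-style output.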
Here’s an excerpt from a random book I like, about the British navy.
I’d be lying if I said it always goes that way.
Perhaps there’s a homophone that it can’t understand via the context. Maybe I’ve dictated something incorrectly. But when something inevitably goes wrong, I have Cursorless to edit it quickly.
Navigating
Finally, there’s navigating. This is the easiest thing I’ve shown you so far, but also the most useful for you if you want to download Talon and just rest your hands a little bit while you browse the internet.
I use tools like:
For tools with poor accessibility, e.g. Notion, I use a Wacom tablet. Many people in the Talon community use eye trackers as well.
I also limit my usage of such tools to work. Anything I’m using in my personal time works well with voice input.
If you want to know whether a tool is amenable to this way of working, put your mouse in a drawer and try to use it. Voice input substitutes for keyboard input seamlessly; the mouse is harder to eradicate.
Fortunately, this category of tools is small.
Conclusion
In short, if we worked together, and I acted normal while pairing, you would have no idea that my desk does not have a keyboard on it.
My hands are slowly getting better, now that I restrict their usage to things they do well, like woodworking or working out. I likely have another 6 months until they’re recovered, but even then, the last thing I’m going to do is grind them into a keyboard.
Furthermore, I’m faster at coding using Cursorless, so why would I go back to full-time keyboarding? Mainly, I’d like to be a less awkward pair programmer.
The last thing I’ll say is this - it’s hard to express how grateful I am for the people who had this problem before me and did something about it, so that for me, it isn’t that problematic.