
Coding without a mouse?
Easy, just use any decent text editor.
But coding without a mouse or a keyboard?
That's where things get interesting.
About three years ago, I set up Dragon NaturallySpeaking on my home computer to give my hands a chance to heal from RSI (Repetitive Strain Injury) on the weekends.
I mostly used it for web browsing and occasionally writing emails.
It was a great relief; finally I didn't have to feel guilty for using my computer on weekends.
But it was also very limited, especially when it came to coding.
I'm a professional software engineer, and I liked my job, but I also wanted to work on side projects.
I tried one of the leading Dragon extensions to help with this, VoiceCode, but after a few hours I decided it was too frustrating and slow, and the software just wasn't there yet.
A couple years later I watched a great talk by Tavis Rudd, Using Python to Code by Voice.
I was put to shame.
Not only did he manage to do it, he did it professionally!
He managed to cure his RSI completely over several months, but he never stopped using his voice.
He was even faster using both voice and keyboard than keyboard alone.
I decided to give it another try, using the Dragonfly library that Tavis recommended.
Unlike VoiceCode, which is a complete solution, Dragonfly is just an improved Python API to Dragon.
It doesn't come with too many built-in commands, but it's extremely easy to extend.
With a powerful Python API to Dragon in hand, I started experimenting with different commands and grammars to use in Emacs and elsewhere.
For the first time I started to see that it was possible to become productive.
This was good timing, because my RSI was only getting worse.
About six months ago, I decided it was time to start using my voice environment at work.
At first I was very slow.
It took a lot of mental effort to do the most basic things, and I definitely had moments where I questioned whether I would ever become efficient.
But with each week I got faster and faster.
And something else changed: I started having fun!
Adding new features was building my future.
Getting faster was gaining freedom.
I felt like Bruce Wayne building his Batcave!
I'm not Batman yet, but I think it's time I start to share what I've learned to help others do the same.
I could just publish my code, but that wouldn't reveal all the little lessons I learned along the way that you will need to know to build your own custom environment.
I'm also hoping to hear from others, so please post your own ideas in the comments and tell me what I could do better!
A hands-free coding environment has a lot of moving parts, which can be overwhelming at first.
This post will teach you how to set up the basic voice recognition environment.
I also use eye tracking, but I'll cover that in a separate post.
To begin with, install Dragon NaturallySpeaking, the voice recognition engine.
Sadly, it's only available for Windows, so you'll have to do Linux development using a virtual machine or remote access (see my post for advice).
I recommend Windows 7, because Dragon NaturallySpeaking still has a lot of bugs in Windows 8.1 (another post for that).
I recommend Dragon 12 over 13.
Dragon 13 only supports select-and-say in particular apps, which is a huge limitation.
I used Dragon 13 for several months before downgrading thanks to readers' advice in the comments, and I don't miss a single feature.
Any edition is fine; I use the premium edition.
I recommend investing in a good microphone; the usual recommendation is Sennheiser ME3.
It's not cheap ($200), but it matters a lot: Dragon is pretty frustrating, so you want to do everything you can to minimize that.
Next, install NatLink, an extension to Dragon that makes it possible to add custom Python commands.
Follow the instructions here.
If everything works, you'll see a window pop up after starting Dragon titled "Messages from NatLink".
It's common to run into problems installing NatLink, so read the instructions carefully.
For your first installation, I highly recommend using their prepackaged version of Python to avoid trouble.
Finally, install Dragonfly, a cleaner Python interface to NatLink.
The prepackaged binaries are several years out of date, so I recommend cloning their git repository.
Run python build_develop.py to install it.
It's just a Python library, so if it worked you should now be able to import dragonfly from Python.
To get started with Dragonfly, I recommend looking at some example modules.
You can check out the original repository of examples or modules mentioned in the docs.
For voice coding purposes, you'll want to familiarize yourself with the multiedit module.
Just drop a module into your NatLink MacroSystem directory, turn your microphone off and on, and NatLink will attempt to load it.
If it's not working, check the messages window to see if there are any error messages.
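To give a feel for what such a module contains, here is a minimal sketch of one. The spoken phrases and the grammar name are made up for illustration, and the file name would be something like `_example.py` (as far as I know, NatLink treats modules whose names start with an underscore as global). `Grammar`, `MappingRule`, and `Key` are standard Dragonfly classes.

```python
# Spoken phrase -> Dragonfly key spec ("c-" means Ctrl).
# These phrases are hypothetical; pick words that Dragon recognizes well for you.
KEY_COMMANDS = {
    "save file": "c-s",
    "undo that": "c-z",
    "next tab": "c-tab",
}


def build_grammar():
    # Imported lazily so this file can still be read and tested without Dragonfly.
    from dragonfly import Grammar, Key, MappingRule

    mapping = dict((phrase, Key(spec)) for phrase, spec in KEY_COMMANDS.items())
    rule = MappingRule(name="example_keys", mapping=mapping)
    grammar = Grammar("example")
    grammar.add_rule(rule)
    return grammar


try:
    grammar = build_grammar()
    grammar.load()
except ImportError:
    # Outside of NatLink/Dragonfly there is nothing to load.
    print("Dragonfly is not installed; grammar not loaded")
    grammar = None


def unload():
    # Called when the module is unloaded, so the grammar is released cleanly.
    global grammar
    if grammar:
        grammar.unload()
    grammar = None
```

Once the module is loaded, saying "save file" should press Ctrl-S in whatever window has focus.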
Of course, this is just the beginning.
The interesting part is extending the Dragonfly modules and writing your own to support a full-featured voice coding environment.
I'll cover that in future posts!
You can do a lot just using your voice, but there are still a few times you'll find yourself reaching for a mouse.
It's often for the silliest little things, like clicking in the empty space within a webpage to change the keyboard context.
If you're serious about not using your hands, you can use an eye tracker to eliminate these last few cases.
This post will teach you how to get started.
Make sure you've read my introductory post on voice coding, since we will be building upon that.
Eye trackers used to cost several thousand dollars, but now you can grab a cheap one for less than a couple hundred bucks.
I use the Tobii EyeX dev kit, which retails right now for $139.
Its major competitor is the $100 EyeTribe dev kit, a Kickstarter-funded project.
I haven't played around with that yet, but I'd love to hear if you have in the comments.
The basic idea behind eye tracker interaction is that you look somewhere on the screen and then use some other method for clicking or "activating" the item you're looking at.
It's generally too distracting to have the pointer follow wherever you're looking, so usually a keypress instead of a click is used.
For our purposes, of course, we'll want to use a voice command.
The tricky part is integrating it with Dragonfly.
It really ought to be easy, except that right now there's an outstanding bug where the Tobii software does not listen for virtual keypresses.
There's a thread in their forums complaining about this, but it sounds like it won't be fixed until the consumer version, which doesn't have a release date yet.
The workaround is surprisingly elaborate, but the good news is I've already done the heavy lifting.
The basic idea is that we will call into their C API from Python.
The raw API is extremely complicated for our needs, so I wrote a simple wrapper DLL with a few basic functions to connect to the eye tracker, get position data, and activate the current gaze point.
You can get the source code and binary distribution of the wrapper from my GitHub repository.
Python makes it a breeze to call into a DLL.
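As a rough illustration of the general technique, here is a sketch that uses the standard `ctypes` module. The exported function names (`connect`, `last_x`, `last_y`) and the DLL path are assumptions made for this example, so check the wrapper's source for the real names before relying on them.

```python
import ctypes


class GazeTracker(object):
    """Thin Python wrapper around an eye tracker wrapper DLL.

    The function names used here are hypothetical placeholders.
    """

    def __init__(self, lib):
        self._lib = lib
        # ctypes assumes every function returns a C int unless told otherwise,
        # so functions that return doubles must be declared explicitly.
        lib.last_x.restype = ctypes.c_double
        lib.last_y.restype = ctypes.c_double

    def connect(self):
        return self._lib.connect()

    def gaze_point(self):
        """Return the most recent gaze position as an (x, y) tuple."""
        return (self._lib.last_x(), self._lib.last_y())


def load_tracker(dll_path):
    # ctypes.CDLL loads the library at the given path; loading a real .dll
    # only works on Windows.
    return GazeTracker(ctypes.CDLL(dll_path))
```

Keeping the DLL handle behind a small class means the rest of your grammar never touches `ctypes` directly, and you can substitute a fake object when testing away from the eye tracker.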
It's useful to have separate commands for moving the pointer and clicking, because the eye tracker accuracy isn't always perfect.
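One way to split these into two commands is sketched below, using Dragonfly's `Mouse` action. The helper names are made up for this example, and I'm assuming the gaze coordinates arrive in screen pixels. The square brackets in the spec mean "relative to the whole desktop" in Dragonfly's mouse syntax.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def gaze_to_mouse_spec(x, y, screen_width, screen_height):
    """Build a Dragonfly Mouse spec that moves the pointer to a gaze point.

    Coordinates are clamped to the screen, because the tracker can report
    points slightly off the edge when you glance away.
    """
    px = clamp(int(round(x)), 0, screen_width - 1)
    py = clamp(int(round(y)), 0, screen_height - 1)
    return "[{0}, {1}]".format(px, py)


def move_pointer_to_gaze(x, y, screen_width, screen_height):
    # Imported lazily so the pure helper above works without Dragonfly.
    from dragonfly import Mouse
    Mouse(gaze_to_mouse_spec(x, y, screen_width, screen_height)).execute()


def click():
    from dragonfly import Mouse
    Mouse("left").execute()
```

Binding `move_pointer_to_gaze` and `click` to different phrases lets you nudge the pointer if the gaze estimate is off before committing to the click.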
Foot pedals are another alternative to voice commands.
I often use a voice command to move the mouse based on my gaze point, then use my foot to click.
I recommend the Omnipedal Quad.
These are also great for scrolling, which is pretty awkward with dictation.
There's a lot more you can do with a tighter integration with the eye tracking API.
The major shortcoming of my simple approach is that it doesn't work well with small click targets.
The full API lets the application describe all the click targets, so the closest one will be automatically picked.
Of course, this usually requires access to the application source code (or at least an extension), so it's less generic and harder to get up and running.
Please post in the comments if you come up with something!
To be an efficient hands-free coder, you'll need to learn how to move the cursor around a file quickly.
There are two challenges: first, since you can't use a mouse, you can't just click on the location you want to move to.
You can try using an eye tracker to accomplish this, but the precision isn't quite high enough.
Second, with a keyboard you can hold a movement key and release when you reach your location, but this doesn't translate well to voice control, which has too much latency (although you might try measuring the latency and adjusting for it).
Beginner's note: to get started, check out the multiedit grammar for Dragonfly.
This gives you commands to start with and a nice framework for repeating commands quickly.
One approach to movement is to search within the file, for example using Emacs incremental search.
This works well if you need to jump somewhere offscreen.
The trouble with using this for all onscreen movement is that recognition accuracy isn't always perfect, and the identifier you search for might appear several times.
This would work better if there were an Emacs extension that numbered incremental search results so it's easy to jump to a particular one.
Let me know in the comments if you make this!
Let's break the problem down.
Every location onscreen is at a particular line and column, so if we can navigate to each of these quickly, we can jump to anywhere quickly.
Let's start with jumping to a particular line.
Any decent editor can show you line numbers and let you jump to a specific line.
But try editing a file with several hundred lines, and you'll find that this is pretty clunky.
There are a couple ways to improve on this: you can show line numbers relative to your current position, or you can show the numbers modulo some value.
For example, if you never have more than 100 lines on screen at once, you could just show the last two digits of the absolute line number.
Personally, I prefer to use relative line numbers since these work well with relative motion commands such as "up ten" or "down five".
Also, the number of syllables scales nicely with the amount of movement.
The main advantage to using modulo is that you can chain together successive commands easily (e.g. "select between line X and line Y").
If you use Emacs and go with the relative line numbers approach, check out the package linum-relative.
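To make the two numbering schemes concrete, here is a small sketch of the arithmetic involved. The function names are made up, and the modulo version assumes you know the first line currently visible on screen.

```python
def relative_label(line, cursor_line):
    """Label shown next to `line` when numbering relative to the cursor."""
    return abs(line - cursor_line)


def resolve_modulo_line(spoken, top_line, modulus=100):
    """Map a spoken number (the line number modulo `modulus`) to an absolute line.

    Returns the first line at or below `top_line` whose number ends in
    `spoken`. This is unambiguous as long as fewer than `modulus` lines
    are visible at once.
    """
    return top_line + (spoken - top_line) % modulus
```

For example, with line 1290 at the top of the window, saying "five" resolves to line 1305, since that is the first visible line whose last two digits are 05.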
Next, we have to jump to a particular column.
This is a bit trickier because it is awkward to number every column in an editor.
I suppose you could write the numbers vertically; please post in the comments if you come up with something to do this.
In the meantime, I use a few different approaches, but the key advantage we can exploit is that you generally want to jump to the boundaries of symbols and words.
This means that relative motion commands that move by a symbol or word often work well unless the line is long.
In Emacs, familiarize yourself with subword mode.
I bind commands for moving across an entire symbol and across a single subword.
When the line is long, I use a different strategy where I name the character at the beginning or end of a symbol I want to jump to, and use a custom command to jump there.
Searching for a character instead of a full word greatly improves recognition accuracy, especially when dealing with unusual words.
See the Emacs Lisp for this at the bottom of the post.
Note that the Emacs extension "Ace jump mode" works similarly to this, but I prefer my approach because I don't have to wait for an overlay to appear before issuing my command.
Voice dictation latency is high enough that it's almost always an advantage to accomplish everything in a single command.
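The search logic behind this kind of character jump is simple. The following Python sketch illustrates the idea only; it is not the Emacs Lisp used in my setup, and it only handles jumping to the beginning of a symbol.

```python
import re


def find_symbol_start(line, char, start=0):
    """Return the index of the next symbol at or after `start` that begins
    with `char`, or -1 if there is none.

    A symbol start is a position where `char` is not preceded by a letter,
    digit, or underscore.
    """
    pattern = re.compile(r"(?<![A-Za-z0-9_])" + re.escape(char), re.IGNORECASE)
    for match in pattern.finditer(line):
        if match.start() >= start:
            return match.start()
    return -1
```

In `result = compute_total(items, tax_rate)`, asking for "t" skips the letters inside `result`, `compute_total`, and `items` and lands on `tax_rate`, which is usually what you want when you name the first letter of a symbol.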
I also use a few more commands for quick movement to frequently visited places.
I use the directions North, South, West, and East to move to the top/bottom of a file and left/right within a line.
I use the mark ring in Emacs to jump to previous locations, and registers to save locations and jump to them quickly later.
And of course I use page up and down to scroll through a file, although I prefer to use foot pedals for this.
Finally, as promised, my code for jumping within a line:
One of the best ways to get started writing Dragonfly macros is to set up web browsing by voice.
Thanks to the extensibility of modern browsers, this works surprisingly well.
Note that Dragon does have built-in support for web browsing, although I find it doesn't work very well.
The extension tends to cause pages to hang, and it requires that you speak the link you want to click on, which introduces ambiguities and doesn't work well for all clickable elements.
And of course, it's not very customizable.
I do recommend you try it first to see if it works for you, and to think about what you would like to improve in your custom version.
To begin with, you'll want to decide between Firefox and Chrome.
Both of these support the extensions you'll need, so it is really a matter of personal preference.
Firefox is probably the easiest to get started with, although I prefer Chrome.
First, you need to install an extension that labels clickable elements on the page, so you can speak a label to click on an element.
I recommend Mouseless Browsing for Firefox and Vimium for Chrome.
If you are using Firefox, try out this sample Dragonfly module.
If you are using Vimium, you will need to bind one command to label the clickable elements, and another to actually click a particular element.
Since you'll be using these a lot, make them as terse as possible.
I use "links" for the former and I simply speak the number for the latter.
To make it even faster, I only allow one-syllable numbers to be used in labels.
I also recommend binding the Vimium shortcuts that let you quickly open a bookmark.
Next, you'll want to enumerate tabs so you can quickly jump between them.
I think the Firefox extension already does this, and I created one for Chrome, Tab Namer.
My extension also extracts the hostname and appends it to the tab name, so you can easily use it to define contexts in Dragonfly (e.g. to bind keyboard shortcuts for specific sites).
The last extension I rely on is Keyboard Shortcuts to Reorder Tabs.
The Linux version of Chrome binds these shortcuts by default, but not the Windows version.
As always, if there are extensions you find useful, please post them in the comments!
When you are ready for even faster browsing, check out my post on Custom web commands with WebDriver.
When I first started using Dragon, I was bummed to be restricted to Windows.
Fortunately, there are lots of ways to work around this limitation and use it with whatever operating system you want.
I'll cover the method I use and describe some alternatives.
My home setup is simplest, because I cheat and just use Cygwin to provide a UNIX-like environment.
This lets me launch GUI Emacs to write code and edit my macro files (which have to live on Windows).
I highly recommend starting with this approach, since it gives you many of the niceties of a Linux environment, but you also benefit from the estimable built-in support Dragon/Dragonfly has for Windows, such as contexts, task switching, and window management.
For task switching in particular, though, I don't use the built-in Dragon commands.
Instead, since I almost always run the same set of apps, I pin each of these to the taskbar and bind labels to "Windows key + number" shortcuts to quickly jump to any app.
This gives me full control over the naming of the apps, and is generally less error-prone than using the built-in "switch to" command.
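A sketch of how these bindings could be generated from a list of app names follows. The names are hypothetical and should match the order of your pinned taskbar buttons. In a Dragonfly key spec, the `w-` prefix is the Windows key modifier.

```python
def taskbar_key_specs(app_names):
    """Map each spoken app name to the key spec for its taskbar position.

    `app_names` must be listed in the same left-to-right order as the
    pinned taskbar buttons. Only positions 1 through 9 are handled here.
    """
    if len(app_names) > 9:
        raise ValueError("only taskbar positions 1-9 are handled in this sketch")
    return dict(
        (name, "w-{0}".format(position))
        for position, name in enumerate(app_names, 1)
    )
```

Each value can then be wrapped in a Dragonfly `Key` action, so that `Key("w-2")` runs when you say the second name in the list.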
My work setup is necessarily more complicated, because the software I develop has to build and run on a special company-specific installation of Ubuntu.
To do Linux development, I rely on NX, a remote desktop solution.
The huge advantage of NX is that it supports "rootless" mode, which makes the remote Ubuntu windows look like native Windows applications, each with a named pinnable taskbar button.
This works great with Dragonfly contexts and my task switching solution described earlier.
This solution also works well if you want to install a Linux distro on Windows via virtualization.
I've done this before with VirtualBox with good results.
VirtualBox provides its own rootless mode, but when I tried it about a year ago it didn't work nearly as well as using NX on my local machine.
The trick to setting this up is to configure port forwarding within VirtualBox for the NX port, and then connect to localhost within Windows.
If you are installing NX yourself, the process is a bit complicated due to recent changes that have limited the free feature set in version 4 (the latest version).
I recommend using an open source NX server, such as FreeNX, and version 3 of the official NX Client for Windows.
You can also try open source NX clients, but make sure to choose one that supports rootless mode.
The final component of my setup is to use PuTTY for SSH port tunneling.
As I will describe in another post, I run an HTTP server within Dragonfly, so that I can send contextual information to Dragon from other apps, such as Emacs.
Thanks to reverse port tunneling, I can expose this HTTP server running on Windows to my Linux machine in a completely secure way.
I also use standard port forwarding when I want to expose servers running on my Linux machine as if they were running on my Windows machine.
There are plenty of guides available online that explain how to set this up.
The solution I've described is to use Windows as your host, and Linux as your guest.
There are ways to swap those if you wish, although I think it is a bit of an uphill battle because Dragon is so closely integrated with Windows.
One solution that is gaining popularity is Aenea.
I haven't tried it myself, but if you think I'm missing out, please let me know in the comments!
I've been a little hesitant to publish a complete repository of all my Dragonfly commands because I think the journey that got me there is more useful than the raw code.
If you just read the code, you'll miss out on why I made certain decisions, you won't know about all the stuff I tried and deleted, and you won't know how I actually use all the commands in combination.
That said, I do think it is a helpful supplement to this blog, so I decided to go ahead and make it available on GitHub.
You can find it here, or linked from the navigation sidebar on every page.
While I'm laying down disclaimers, I should also mention that the code is a work in progress and isn't as clean and modular as I would like, but I decided it was better to just get the code out there and improve it later.
If you make improvements, please send me pull requests!
Once you've gotten used to the basic commands and extensions to control Google Chrome, you may start to hunger for a faster way to control websites you use frequently.
Some sites have keyboard shortcuts you can bind easily, but others don't.
This post will describe how to set up commands to control any webpage.
Selenium WebDriver provides a powerful but simple API to control webpages.
It is used primarily for automated testing, but it will work perfectly for our needs, with a few tweaks.
Start by installing the Python bindings and ChromeDriver.
By default, WebDriver will create a new instance of Chrome with a special profile to run any WebDriver commands.
This is great for sandboxed web testing, but not so great for controlling your existing Chrome sessions with all your custom extensions.
Fortunately, you can configure Chrome to set up a debugging server by opening your Chrome shortcut properties and adding --remote-debugging-port=9222 to the end of the Target field, after the final quote.
To start the server, first quit out of Chrome completely (Ctrl-Shift-Q) then reopen it with your custom shortcut.
Note that closing all open windows does not adequately quit Chrome, and you may need to repeat this procedure if you restart your computer and Chrome starts automatically.
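With the debugging server running, attaching WebDriver to it might look something like the sketch below. It assumes a recent Selenium release (where `webdriver.Chrome` accepts `options=`) and a ChromeDriver that matches your Chrome version. The helper names are made up, and `open_google` plays the role of a simple smoke test.

```python
def debugger_address(port=9222, host="127.0.0.1"):
    """Address of the Chrome debugging server started by the custom shortcut."""
    return "{0}:{1}".format(host, port)


def create_driver(port=9222):
    # Imported lazily so this module still loads if Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Attach to the already-running Chrome instead of launching a fresh
    # instance with an empty profile.
    options.add_experimental_option("debuggerAddress", debugger_address(port))
    return webdriver.Chrome(options=options)


def open_google(driver):
    """Smoke test: navigate whichever tab WebDriver is attached to."""
    driver.get("https://www.google.com")
```

Creating the driver once and reusing it across commands avoids paying the connection cost on every utterance.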
Then go ahead and bind these functions to voice commands and try them out.
If all goes well, the test_driver function will navigate Chrome to the Google homepage.
But there's a problem that you may have noticed if you had multiple tabs open: it does not necessarily work on the currently active tab.
WebDriver does provide a way to change the current tab it operates on, but it doesn't provide a way to find which tab is active.
Fortunately, we can query the Chrome debugger API directly from Python and get this information.
This technique doesn't work perfectly when multiple windows are open, but it works most of the time.
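Here is a sketch of the idea. It relies on two assumptions that I can't guarantee: that the debugging server's `/json` endpoint lists the most recently active target first, and that each ChromeDriver window handle contains the corresponding debugger target ID. The function names are made up for this example.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def list_tabs(port=9222):
    """Fetch the list of debuggable targets from Chrome's debugging server."""
    response = urlopen("http://127.0.0.1:{0}/json".format(port))
    return json.loads(response.read().decode("utf-8"))


def pick_active_tab_id(tabs):
    """Guess the active tab: the first regular page in the list.

    Assumption: Chrome lists the most recently active target first.
    """
    for tab in tabs:
        is_extension = tab.get("url", "").startswith("chrome-extension://")
        if tab.get("type") == "page" and not is_extension:
            return tab["id"]
    return None


def switch_to_active_tab(driver, tabs):
    """Point WebDriver at the tab that appears to be active."""
    active_id = pick_active_tab_id(tabs)
    if active_id is None:
        return False
    for handle in driver.window_handles:
        # Assumption: the handle embeds the debugger target ID.
        if active_id in handle:
            driver.switch_to.window(handle)
            return True
    return False
```

Calling `switch_to_active_tab(driver, list_tabs())` at the start of each command keeps WebDriver in sync with whatever tab you last focused.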
If you have a more robust solution, please let me know in the comments!
Navigating to Google isn't terribly exciting, so let's add something more useful.
WebDriver provides several ways of finding an element on a webpage, so you can reuse this action to create a shortcut for nearly any button or link on any webpage.
Of course, this is just the start of what you can do with WebDriver.
You can also execute sequences of commands, even waiting for particular elements to appear in the page.
Please post your favorite commands in the comments!
If you'd like to see how I integrate this with my voice commands, please check out my GitHub repository.
Hi, I'm James, and I created handsfreecoding.org to share techniques and software that allow me to code and enjoy my computer without using my hands.
There are a lot of powerful tools and libraries out there, but it can be overwhelming to learn how to put all the pieces together.
I've been at it for over two years and I'm still discovering new ideas, so I hope this blog will be useful to newcomers and experts alike.
To read more about my story, check out my first post, Adventures in Hands-Free Coding.
If you want to send me private feedback, please fill out the following form.
If you want to share public feedback that others can comment on, please fill out the form at the bottom of this page.
For a site titled Hands-Free Coding, I haven't written much about How To Actually Write The Code.
It turns out this is easier than you might expect.
Before reading this post, please familiarize yourself with my getting started guide and how to move around a file quickly.
There are two basic approaches to dictating code: using custom grammars such as Dragonfly, or using VoiceCode (not to be confused with VoiceCode.io for Mac, which I just discovered and haven't used yet).
VoiceCode is much more powerful out-of-the-box, but is also harder to extend and more restrictive in terms of programming language and environment.
You might say that VoiceCode is Eclipse, and Dragonfly is Emacs.
You could also consider Vocola for your custom grammars; it is more concise but not quite as flexible because you can't execute arbitrary Python.
Since I prefer Dragonfly, I'll cover that approach.
The multiedit module is a good place to start.
1) I use prefix commands to dictate identifiers in a particular style.
For example, I can say "score test word" to print "test_word", or "camel test word" to print "testWord".
2) I use short made-up words to dictate common symbols.
For example, I can print "()" with "leap reap".
I made these words up over time, but if I were starting new I would probably use a standard language such as ShortTalk.
3) I use templates in my text editor to quickly generate boilerplate syntax, such as the skeleton of a for loop.
In particular, since I use Emacs, I use the yasnippet package.
4) I rely on automatic formatting in my text editor to keep the code neat.
5) Most importantly, I structure my grammar so that I can dictate continuously, instead of having to stop and start after every keyword or symbol.
This is the hardest part of my setup, because there are many trade-offs between supporting continuous commands and keeping performance high, which I will cover in another post.
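The identifier-styling commands such as "score" and "camel" boil down to small string transformations. Here is a minimal sketch of what they might look like. The function names are made up, and hooking them up to dictation in your grammar is left out.

```python
def score(words):
    """'test word' -> 'test_word'"""
    return "_".join(words.lower().split())


def camel(words):
    """'test word' -> 'testWord'"""
    parts = words.lower().split()
    if not parts:
        return ""
    return parts[0] + "".join(part.capitalize() for part in parts[1:])


def studley(words):
    """'sim data manager' -> 'SimDataManager'"""
    return "".join(part.capitalize() for part in words.lower().split())
```

Because these operate on plain dictated text, any phrase Dragon recognizes as ordinary words can be restyled without extra work.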
If you follow these basic techniques, the biggest problem that remains is misrecognized words.
You can avoid this in your own code by preferring easily recognized identifiers, but it's much harder when working with someone else's code or library.
I find that the best way to combat this issue is to use Dragon's built-in Vocabulary Editor.
The moment you find yourself spelling out a word, stop right away and add this word to the Vocabulary Editor.
If the problem is a variable or function with multiple misrecognized words, add the whole phrase to your vocabulary.
For example, if you regularly use a class named SimDataManager, add "sim data manager" to your vocabulary and then you can type it in any style using the prefix commands.
I have also experimented with a fancier solution to this problem, where I dynamically add words from nearby code into my Dragonfly grammar.
Unfortunately, I haven't found a way to seamlessly integrate this into my vocabulary without incurring a significant performance penalty, so I only call upon this dynamic grammar explicitly.
So it's not quite as powerful as you might expect, and most of the time I rely on the built-in vocabulary.
It's better than nothing, though, so I will cover this in a later post.
That covers the basics, but much of the challenge of writing code is editing code you (or someone else) have already written.
I'll save that for later posts!
As you build on your grammars over time, you start to run into all kinds of problems.
Commands get confused with each other, latency increases, and your grammars become giant disorganized blobs.
This is particularly challenging with Dragonfly, which gives you the power to repeat commands in a single utterance, but leaves it up to you to structure your grammar accordingly.
In this post I'll discuss the techniques I use to wrangle my grammars.
If you are a beginner and don't mind pausing after each command, you can stop reading now and use the following simple patterns: flat grammars, one grammar per file, one file per application.
The hard part is designing a grammar that works with commands that can be repeated within a single utterance.
We'll take the multiedit grammar as a starting point.
As you add commands, one of the first problems you'll run into is recognition errors, particularly with any command that allows raw dictation.
The trouble is that this grammar allows raw dictation to be mixed with any other commands, so if your raw dictation contains any of those command words, it will get recognized as a command instead of dictation.
The simplest way around this is to move infrequently used commands into a separate (repeated) grammar that doesn't contain any raw dictation.
The downside of this approach is that you will have to remember to pause between commands of these two different classes.
I take a slightly different approach: I allow a sequence of commands from the larger group to be immediately followed by a sequence of commands from the smaller group which includes dictation.
Hence there is no mixing between these command groups, but they can be spoken in a single utterance.
It does impose some constraints on command ordering, but this isn't a big problem because frequently you want to pause after several arbitrary dictations to make sure the output is correct.
Of course, this only reduces the severity of the problem.
As long as you have some command words mixed with arbitrary dictation, you will occasionally want to use those command words within the dictation.
To handle these edge cases, I use an escape word which forces the dictation command to be the last in the sequence, preventing these words from being recognized as a successive command.
As long as you keep your commands pretty simple, you shouldn't have to worry too much about performance.
It becomes a problem when you add specialized commands that contain repetition, such as a command to quickly speak a sequence of numbers or letters.
Every repeated element multiplies its component size by the number of allowed repetitions.
If you allow nested repetition, you get O(n^2) growth, which will quickly slow things down.
I avoid this entirely, but I don't force the commands to be in a separate utterance.
Instead, between my two top-level repeated elements, I allow zero or one instance of any of my specialized repeated elements.
The trouble is that Dragonfly contexts can only be used in an exported rule or grammar, so it's not easy to apply them to specific components of a repeated element.
It turns out that this isn't just a Dragonfly limitation; it is inherent to the way Dragonfly grammars map to NatLink grammars, and the limitations imposed on those.
But fortunately, we can work around it: we will create complete exported rules for every configuration we want to support, and associate them with contexts that ensure mutual exclusivity.
This isn't exactly cheap, but it is scalable if we assume hierarchical contexts.
Creating complete exported rules requires that we refactor RepeatRule to use a proper constructor instead of setting values statically.
We also have to give each rule a different name, to make Dragonfly happy.
There are lots of other ways you could organize a grammar, and I'm sure I will make more improvements over time.
Please add comments if you have ideas!
Dragonfly is so powerful that it's easy to forget that Dragon does some things well out-of-the-box.
To maximize your efficiency, it's important to know when it's not worth it to reinvent the wheel.
In this post, I'll describe when I prefer to use built-in Dragon functionality.
Dragon's strength is natural language dictation and editing.
It allows you to speak continuously for any length of time, and it's very convenient to correct words and move the cursor by referring to previously dictated text.
But as soon as a Dragonfly command is uttered in an unsupported text field, Dragon will no longer let you make corrections.
So when dictating English text, I generally try to avoid using Dragonfly commands.
If I want to dictate variable or function names from code, I just use "cap", "no caps", and "no space" to help.
This way I can keep dictating without any pauses.
I will occasionally switch to Dragonfly commands when making small corrections, but I try to avoid it.
When possible, I try to use the built-in "correct ..." commands, and when that is not sufficient I use "go back" after making an edit to quickly jump back to where I was.
Whether dictating text or code, I prefer to add misrecognized words directly to the Dragon vocabulary instead of putting these into my custom grammar.
This makes it easy to train the words, and I have designed my grammar so that it works in harmony with the Dragon vocabulary.
For instance, if I have a common class name "SimDataManager", I would add the phrase "sim data manager" to my vocabulary and then I can easily style it with underscores or camel case using my Dragonfly commands, without having to treat it as a special case.
Similarly, if I have a frequently misrecognized command that I don't want to change, I will train it using Dragon, taking care to not add it to the vocabulary (using "train word"), which would create ambiguity between command and dictation.
I recommend familiarizing yourself with the Dragon Command Browser ("open Command Browser") so you know what is available.
Note that the "Show all" button is helpful for expanding ellipses in commands.
Some of my favorite short commands are "display text" to open the dictation box, "edit all" to select all and open the dictation box, and "edit words" to open the Vocabulary Editor.
When do you prefer to use built-in Dragon functionality over custom commands?
Please post in the comments!
P.S. I would like to acknowledge the help of Mark, one of my readers/commenters, who influenced me on this subject and pointed me to some of the most useful built-in commands.
I recently came across PCByVoice SpeechStart+, a small but interesting extension to Dragon that adds some nice functionality that would be hard or impossible to implement with Dragonfly.
It costs $40 (after a 15-day trial), so I will help you decide whether it is worth the money.
For example, this works with all the buttons and file icons in Explorer, everything in the taskbar and system tray, and the bookmark buttons in Chrome.
It supports three different styles of overlays, although I prefer the circles.
Note that in general I recommend avoiding applications that don't have good keyboard shortcuts, but it's inevitable you will have to use them occasionally.
You can turn the microphone on and off using only your voice.
It is much less sensitive to spurious awakenings than the sleep mode built into Dragon.
You can even require a second confirmation when turning the microphone on, but I haven't needed that.
You can start (and restart) Dragon using only your voice.
Paired with the above feature, you can start Dragon and turn on the microphone using only your voice even if Dragon is not configured as a startup program.
Also, this feature seems to work well even when Dragon is hung.
My only minor gripe is that it is apparently implemented by killing the Dragon process, which bypasses any prompts to save the user profile and perform maintenance.
On the upside, this makes restarts very fast and I haven't had any trouble with it leaving Dragon in a bad state.
There are a few other features that are also convenient, such as window placement and maximization commands that are a bit faster and more reliable than the Dragonfly counterparts I've worked with.
In general, this extension doesn't try to do too much and "just works".
There are only two significant complaints I have: the commands cannot be remapped (fortunately I don't have any conflicts), and I find that Dragon hangs when quitting unless I use the SpeechStart+ command to quit out of both SpeechStart+ and Dragon.
For better or worse, this means I always use SpeechStart+ to quit and no longer regularly perform profile maintenance.
I haven't noticed any downsides to this, but presumably Dragon recommends this cleanup for a reason.
Overall, I highly recommend this extension if you are a power user of Dragon and want to eliminate that sense of defeat when you occasionally have to reach for the mouse.
You can download the trial and purchase it from KnowBrainer if you are in the USA and PCByVoice if you are anywhere else.
As always, if you have experiences with SpeechStart+ or similar products, please share them in the comments!
When I find myself writing or editing something sufficiently long, I like to have full support for Select-and-Say.
I used to use "open dictation box", since that's the obvious choice, until I discovered that using Notepad is much faster.
It's kind of unbelievable that forking a process and starting a third-party editor could be faster than using the feature Nuance designed for this precise purpose.
But amazingly it is, by a factor of about 2X!
By my measurements, the dictation box starts up in about two seconds, whereas Notepad starts up in one.
Transferring text from the dictation box takes about one second versus a half second for Notepad.
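For the curious, here is the arithmetic behind the "about 2X" figure as a tiny sketch. The numbers are just my rough measurements from above, and the variable names are only for illustration.

```python
# Rough timings in seconds, taken from the measurements quoted above.
dictation_box = {"startup": 2.0, "transfer": 1.0}
notepad = {"startup": 1.0, "transfer": 0.5}


def round_trip(timings):
    """Total overhead for one open-dictate-transfer cycle."""
    return timings["startup"] + timings["transfer"]


speedup = round_trip(dictation_box) / round_trip(notepad)
print(speedup)  # 2.0
```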
And you get an easy way to transfer it out which works perfectly well with Emacs.
Also, unlike the dictation box, this doesn't prevent you from interacting with the existing app you have open (e.g. to visit another tab in Chrome).
And if you accidentally mess up and paste it to the wrong place, it is still in your clipboard so you can easily recover it.
Of course, this same technique would work with another editor; it just needs to start up quickly and have full Select-and-Say support.
Ideally, I would use it with an editor that has unlimited undo history.
And if the editor can start in the background, this could be even faster by avoiding the process fork.
If anyone knows a good candidate, please post in the comments!
Like many an Emacs user, I am enamored with Org-Mode.
Every great coding session begins with organizing your thoughts, and Org-Mode is an excellent tool for the job.
If you're tracking New Year's resolutions, it's great for that too.
Since Org-Mode already has an excellent compact guide, I'll focus on my voice bindings and finish with a bonus section on how I like to structure my personal to-do lists.
Org-Mode comes with a bunch of built-in commands that you will want to make use of.
But if you blindly map the commonly used keyboard commands to voice commands, it will be slow to work with.
The first problem is that several of the commands are designed to be chained together rapidly.
This is easy enough to work around; we just need to consider what the most likely combos are and create specific commands for them.
The second problem is that some of the commands cycle through several options and rely on visual feedback to select the right one.
This doesn't play well with the slow feedback loop of voice input.
Fortunately some of the commands let you specify prefixes to jump to a specific option.
Unfortunately, Org-Mode doesn't provide a prefix option to specify a particular result.
I looked briefly into writing my own command for this, but the implementation would have to be tied to Org-Mode internals, so I decided it wasn't worth the effort.
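As a rough sketch of the two workarounds described above (dedicated commands for likely combos, and numeric prefixes to jump straight to an option), consider the following. The spoken phrases and helper names are hypothetical and are not my actual grammar; the key sequences are the default Org-Mode bindings in Emacs notation, and the numeric-prefix behavior of org-todo is as I understand it from its documentation.

```python
# Hypothetical spoken phrases mapped to default Org-Mode key sequences.
ORG_COMMANDS = {
    "org todo": ["C-c C-t"],        # rotate the TODO state
    "org heading": ["M-RET"],       # insert a new heading or item
    "org demote": ["M-<right>"],    # demote the current heading
    # A combo: one utterance instead of two chained keyboard commands.
    "org sub heading": ["M-RET", "M-<right>"],
}


def with_numeric_prefix(n, keys):
    """Prepend a numeric prefix argument (C-u N) to a key sequence."""
    return "C-u {} {}".format(n, keys)


def expand(phrase, n=None):
    """Return the key sequences to send for a spoken phrase.

    If n is given, the first sequence gets a numeric prefix, e.g. to
    switch directly to the n-th TODO state instead of cycling.
    """
    keys = list(ORG_COMMANDS[phrase])
    if n is not None:
        keys[0] = with_numeric_prefix(n, keys[0])
    return keys


print(expand("org sub heading"))  # ['M-RET', 'M-<right>']
print(expand("org todo", n=2))    # ['C-u 2 C-c C-t']
```

The point of the combo entry is that a whole chain costs a single utterance, and the point of the prefix is that I never have to watch the screen while cycling.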
The "org (West|white)" command runs some custom Lisp that goes to the beginning of the line, skipping past things like asterisks and TODO keywords.
I'm sure that hard-core Org-Mode users would want a lot more bindings, but this is enough for my purposes right now and ought to be enough for a beginner to get started.
If I'm missing out on some awesome functionality, please let me know in the comments!
Once you get familiar with Org-Mode you will want to use it for everything.
I use it to track my personal to-do list, which is full of everything from boring chores to exciting new things I want to try.
In the past I've had trouble prioritizing stuff in this list, because it's hard to compare something necessary and mundane with something I'm excited about.
What I realized a few months ago is that I don't have to.
At different times of the week I have different motivations.
Sometimes I just want to relax, sometimes I want to take care of business, and other times I want to do something good for the world.
So I simply structure my list around my motivations and situations.
Time sensitive: I need to do this soon.
Practical: I need to do this at some point.
Fun: When I just want to have a good time.
Outside: When I feel like getting out of the house.
Chill: When I just want to relax.
Good: When I want to do good for the world.
Intellectual: When I want to learn something new.
Professional: When I want to build my resume.
Health: When I want to improve my health and well-being.
Social: When I feel like spending time with others.
I also have categories for close friends containing stuff I want to do with them or tell them about.
With this kind of organization, I find it's much easier to prioritize within a category.
If I'm having trouble categorizing something, that's often a sign that I'm not really motivated to do it at all and it doesn't belong in the list to begin with.
Later, when I am looking for stuff to do, I can easily focus in on what I feel like doing and not get distracted.
What do you like to use Org-Mode for?
Let me know in the comments!
Not related to coding, but hands-free coders need to have some fun too.
I don't even use a custom grammar and it works very well.
If you have a good experience, you can thank Blizzard on this thread I started.
Hopefully I didn't just set the voice coding community back by a few months!
If you know of other games that play well with hands-free control, please post in the comments.
Tobii has released a new consumer eye tracker, the Tobii Eye Tracker 4C for $150.
Although I haven't found eye tracking to be nearly as helpful as speech recognition, it is handy for those occasional situations where you just want to click a button or change context and you don't have any command to do so (see my earlier post for details).
I have been pretty happy with the Tobii EyeX, but it isn't perfect, so I was excited to try out this new device.
It arrived just a few days after I ordered it, and came with a complimentary USB extension cable.
This is an essential addition if you want to use the eye tracker with a desktop, because the built-in cable is very short (although that's nice when using with a laptop, where it reduces the bulk considerably).
The first thing you will notice after setting it up is that the lights are much less bright than the EyeX.
I got used to the bright red lights, but I'm happy to not have those blaring at me all the time anymore.
The new lights primarily use infrared wavelengths, although you will still see a red glow.
The main new feature that Tobii is touting is a dedicated chip for performing processing on-device.
This is supposed to reduce host CPU load, and it makes it possible to use USB 2.0 instead of 3.0.
Even though I have USB 3.0 ports, I'm happy to see this change because I have found them to be less reliable, occasionally requiring driver updates to work.
Also, devices that require USB 3.0 don't work with standard USB 2.0 hubs.
As for the CPU load reduction, in theory this is helpful when using with a laptop to reduce fan spinning and power drain, although in practice I haven't noticed much change (including when looking at CPU usage within the task manager, where it hovers at around 1-2%).
Of course, the most important question is whether it actually works better than the old eye tracker.
Indeed, it has improved in a couple ways: higher precision and larger tracking area.
After performing calibration, the precision is noticeably better (approximately 2-3X, with about 5 mm diameter noise), although it still exhibits consistent error (as much as 2 cm) towards the periphery.
It has always been a mystery to me why this bias cannot be fixed by calibration, because it is so consistent.
For this reason, I would still recommend sticking with a screen of up to 24 inches, even though the requirements technically now support up to 27 or 30 inches depending on aspect ratio.
The other improvement is in the tracking area.
I use a motorized sit/stand desk, and I used to find that the change in the position of my head in standing position was enough to go out of range of the tracker.
That is no longer a problem, indicating a major improvement.
Overall I'm happy with the new device, enough that I ordered one for both home and work.
My biggest complaint at this point remains the error when looking towards the sides of my screen, which seems like it ought to be fixable in software.
Tobii does update the software pretty regularly, so hopefully they will get to this at some point!
I learned about a couple very exciting new developments this week in open source speech recognition, both coming from Mozilla.
The first is that a year and a half ago, Mozilla quietly started working on an open source, TensorFlow-based DeepSpeech implementation.
DeepSpeech is a state-of-the-art deep-learning-based speech recognition system designed by Baidu and described in detail in their research paper.
Currently, Mozilla's implementation requires that users train their own speech models, which is a resource-intensive process that requires expensive closed-source speech data to get a good model.
But that brings me to Mozilla's more recent announcement: Project Common Voice.
Their goal is to crowd-source collection of 10,000 hours of speech data and open source it all.
Once this is done, DeepSpeech can be used to train a high-quality open source recognition engine which can easily be distributed and used by anyone!
This is a Big Deal for hands-free coding.
For years I have increasingly felt that the bottleneck in my hands-free system is that I can't do anything beneath the limited API that Dragon offers.
I can't hook into the pure dictation and editing system, I can't improve the built-in UIs for text editing or training words/phrases, I'm limited to getting results from complete utterances after a pause, and I can't improve Dragon's OS-level integration or port it to Linux.
If an open source speech recognition engine becomes available that can compete with Dragon in latency and quality, all of this becomes possible.
To accelerate progress towards this new world of end-to-end open source hands-free coding, I encourage everyone to contribute their voice to Project Common Voice, and share Mozilla's blog post through social media.