Wednesday, October 28, 2009

Input Methods

I don't actually remember learning to use a mouse. Ever since I can remember, I've just been able to click on what I want and move the mouse where I want. But when you think about it, the mouse isn't actually the most intuitive input device. You move it sort of where you want the cursor to go, but the direction and distance don't really match the action of the cursor on the screen. And most people learning about computers for the first time are temporarily confused by the fact that you can pick up the mouse and move it to a completely new location without changing what's on the screen at all. Furthermore, you click over a certain location when you want to activate something. But you have left-click, right-click, and zillions more options to choose from. And sometimes you don't actually click to select something. We have learned to use mice and done so much work with them that it requires no conscious effort to do something on the screen.

I do remember learning how to type properly, though. I went through multiple programs and did lots of practice on my own, and by the time I reached sixth grade I could type about 60 words per minute. The problem is, after three more years of keyboarding, I have barely improved at all, because I had already nearly reached the upper limit of how fast my fingers could go. This is a major limit to typing, but as I will talk about later, it still has certain features that no other current technology can offer.

We (and I am often guilty of this) frequently talk about being able to input or write something “as fast as you can think.” And it's true that you can only think so fast. For instance, before I started writing this sentence, I stopped typing for a few seconds to get my thoughts together about where this post was going. But when you get a thought about something you're trying to write, you are still limited by how fast you can type it. For that matter, the same thing happens when speaking. It's just that it seems much more natural for some reason, and that we've somehow trained ourselves not to notice. This problem is basically unsolvable, but it's one of the things to keep in mind when talking about input: how close can you get to that limit? Some technologies, like typing and speech recognition, can do fairly well. Others, like texting, entering text on graphing calculators, and using those little tiny keyboards that come on way too many mobile devices, do very poorly. Writing by hand comes somewhere in the middle. Generally, if entering something into a device is slower than hand-writing it, it's not worth my time.

There's also the question of errors. How likely are you or the machine to screw up? When there's an error, you have to spend time correcting it, and you could potentially screw up something major, like deleting an irreplaceable file or buying something you didn't really want. We have to accept some errors or it would be nearly impossible to go through our lives, but too many and people start to get really frustrated. If you wanted to never miskey a letter when typing, you would type at about one-quarter of the speed you could reach otherwise. Most typists simply accept the fact that they will make errors, backspace over them just as fast as they made them, go on, and don't even notice that they spend half as much time fixing errors as they do actually keying new text. If you had to never click the wrong button, you would have to hold your hand steady and triple-check before clicking every button. Obviously, if you're about to do something that cannot be undone, some caution is warranted. But most of the time, you have to find the right balance between speed and errors.
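Here is a rough back-of-the-envelope version of that trade-off, as a small Python sketch. It assumes the 60 words per minute I mentioned earlier as the raw typing speed and takes the "one-quarter" and "half as much time" figures above at face value; the function name `effective_wpm` is just something made up for this example.

```python
def effective_wpm(raw_wpm, fix_time_ratio):
    """Net words per minute when every minute of keying new text
    comes with `fix_time_ratio` minutes of fixing errors."""
    return raw_wpm / (1 + fix_time_ratio)

RAW_WPM = 60  # assumed raw typing speed, in words per minute

# Fast and sloppy: half as much time fixing errors as keying new text.
fast = effective_wpm(RAW_WPM, 0.5)

# Slow and perfect: one-quarter of the raw speed, no time spent fixing.
careful = effective_wpm(RAW_WPM / 4, 0.0)

print(f"fast and sloppy:  {fast:.0f} wpm")     # 40 wpm
print(f"slow and perfect: {careful:.0f} wpm")  # 15 wpm
```

Under these assumptions, the typist who accepts errors still gets text onto the page well over twice as fast as the one who never makes any.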

Currently, most people use a keyboard and mouse to work with their computer. If they're working with a mobile device, they might use a touch-screen and mini-keyboard or telephone keypad instead. And this works perfectly well. The problem is that they're not perfectly efficient and they tend to create ergonomic problems.

Option 1: Touch Screen
Touch screens make plenty of sense on many levels, and are about as intuitive as you can get. Touch a place on the screen, and the cursor moves there. Want to zoom? Just put two fingers on the screen and pinch or spread them depending on which way you want to go (assuming your screen supports multi-touch). Unfortunately, conventional touch-screens are very uncomfortable to use because you have to reach way out in front of you to use them. If you were to solve that by placing the screen on your desk, you would have to strain your neck to look down. Either way, there's that nasty problem of blocking your view of the screen with your fingers.

Option 2: Voice Recognition
Voice recognition seems like a great idea on the surface. You don't have to learn to type or use a mouse, and you can just tell the computer what to do instead. You can easily dictate documents, too, and most people can speak faster than they can type, so that works better.

In reality, it's not that great. First of all, voice recognition only has a stated 99% accuracy, and in many cases, it's a bit less than that. The errors are not intolerable, but they are certainly annoying. In actuality, the number of errors is less than most people make while typing, but typing errors can be corrected in less than a second by fast typists. Voice errors take a long time to fix; you either have to reach for the keyboard and fix it or go through the laborious process of scrolling to the word and reading it out to be selected (and if it doesn't recognize that word, you might have a hard time selecting it) and then hope that it recognizes the word the second time around. Most good packages can “learn” and thus make fewer errors over time, but our voices are so complex that a computer has a hard time understanding what would be simple to us.

However, if we decided voice recognition was the way to go, we could work hard on the technology and greatly improve the accuracy, which is getting better all the time. Unfortunately, that's not the extent of the problems. First of all, there's the issue that speaking makes noise. If you're in an office environment without closed doors, not only will loads of people be distracted by what other people are doing, there's a chance that the software might get confused too. Also, people like the fact that others can't tell what they're typing. It would be hard to do something even semi-privately in a crowded area using voice recognition. Next, there's the fact that it's much harder to erase large blocks of text. If you type a sentence that doesn't work, you can just backspace it in a couple of seconds or even correct it in places using your arrow keys. Correcting text by voice takes forever, because it's not easy to speak about appearances. I'm going to reference a Science Olympiad event called “Write It Do It” here. In this event, you get something built out of Legos or Tinkertoys, and you have 25 minutes to write directions for your teammate. Then he has to try to build an exact replica with only your directions. The same basic problem exists here. It's much simpler to point and click at the word you want to delete than to describe where it is located on the page. Revising long documents could potentially take hours just because simple tasks like scrolling and deleting words require long, multi-part commands. And finally, our voices don't like talking for long periods of time. Just as people today hurt their wrists and fingers from typing too much, workers who talked for eight hours a day without stopping wouldn't be very happy about how they felt afterwards.

Option 3: Touchpads/Graphics Tablets
Touchpads are those annoying tiny little things you see on laptops. Graphics tablets are basically big touchpads that you use with a pen. The big difference is that a graphics tablet normally lets you point at an absolute location on the screen, rather than move the pointer relative to its current position like you do with a mouse (most laptop touchpads work the relative way by default). If you're confused, think about how you can pick the mouse up and move it to the other side of the keyboard and the pointer will stay in the same spot. That's relative, because your movement of the mouse moves the pointer relative to its original location. In contrast, in absolute location, tapping in the upper-left corner of the pad will always move the pointer to the upper-left corner of the screen. Hitting in the middle will always move to the middle of the screen, regardless of where the pointer was before.
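If it helps, the relative/absolute difference can be sketched in a few lines of Python. This is only an illustration: the screen size, the `Pointer` class, and its method names are all made up here, and real drivers are more complicated than this.

```python
from dataclasses import dataclass

SCREEN_W, SCREEN_H = 1920, 1080  # assumed screen size in pixels


def clamp(value, low, high):
    """Keep value inside the range low..high."""
    return max(low, min(value, high))


@dataclass
class Pointer:
    x: int = 0
    y: int = 0

    def move_relative(self, dx, dy):
        """Mouse-style: shift the pointer from wherever it already is."""
        self.x = clamp(self.x + dx, 0, SCREEN_W - 1)
        self.y = clamp(self.y + dy, 0, SCREEN_H - 1)

    def move_absolute(self, pad_x, pad_y):
        """Tablet-style: pad_x and pad_y are fractions (0.0 to 1.0) of the
        pad's width and height, mapped straight onto the screen."""
        self.x = clamp(round(pad_x * (SCREEN_W - 1)), 0, SCREEN_W - 1)
        self.y = clamp(round(pad_y * (SCREEN_H - 1)), 0, SCREEN_H - 1)


pointer = Pointer(100, 100)

# Relative: the result depends on where the pointer was before.
pointer.move_relative(50, -20)
print(pointer)

# Absolute: tapping the upper-left corner of the pad always lands in the
# upper-left corner of the screen, wherever the pointer was before.
pointer.move_absolute(0.0, 0.0)
print(pointer)
```

Picking the mouse up and setting it down somewhere else produces no `move_relative` call at all, which is why the pointer stays put.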

Basically, touchpads offer many more options than a mouse, because you can manipulate them with more than one finger at a time, provided the pad and software support multi-touch. Some concepts have taken this to the next level. They may also offer better ergonomics.

Option 4: Thinking
It can't possibly go mainstream anytime soon, but being able to manipulate your computer simply by thinking certainly offers a lot of possibilities. First of all, it completely eliminates the issue we've always had before of somehow transferring what you're thinking into a physical medium so what you're doing can be tracked. Second, all the problems that have been discussed earlier are solved: you don't have to reach anywhere, it's silent, and you don't even have to move.

There are a few problems I can think of, though. First of all, there has to be some way to distinguish between what you're considering and what you actually want to do. If I'm typing or using voice recognition, I can think before deciding what I'm going to write, then input it. If the computer is just trying to turn my thoughts into words, will it just place random words that are going through my mind on the screen? Will it know when I'm just paused to collect my thoughts and enter completely unrelated things?

Also, we're used to having to do something to have something happen. It would not be a particularly easy transition. People might turn away to do something else and not realize that they hadn't turned it off and strange stuff would be happening on the computer. Perhaps having some sort of dummy device would be helpful (say, a mouse that didn't do anything). Would you be able to reliably think about where you wanted the mouse pointer, and once again, would it be able to tell the difference between considering and actually wanting to click somewhere?

I'm sure there are many other things you can think of; these were just my first thoughts. Feel free to add your thoughts by clicking on the “Comments” link; if there's something good, I may add it here. And by the way, I noticed that I have overused semicolons in this conclusion; they are quite useful, though.

Soren "scorchgeek" Bjornstad

Microsoft is not the answer.
Microsoft is the question.
The answer is "No."


  1. Good essay, Soren. It will be fascinating to see what changes come about in how we interact with machines in the future. I would not at all be surprised if some sort of thought control became possible in the next 10 to 20 years.

    -- Uncle Jeff

  2. Random thought: I love semicolons.

  3. Scorch,

    This was a nice review of what we've done, are presently doing, and what we might do in the future. I hope I can keep up. I know one thing for sure, you will make sure everyone is informed and educated!