Speaking of History…
The most overlooked, underrated feature of the Mac operating system has to be the speech recognition and text-to-speech tools. Most folks activate them, play with them for a few minutes, then turn them off and forget about them. But with a little help from AppleScript (and a good microphone) you can control your Mac verbally and create your own additions to Apple’s speakable items.
The text-to-speech part of the tools goes all the way back to 1984 and the introduction of the Mac, believe it or not! Yep, even though MacinTalk (as it was called then) didn’t make it into the OS until System 6, the software for text-to-speech had already been written at Apple and was being licensed to software developers.
Speech recognition didn’t come along until the early ’90s, with the Casper project. Later refined into what we now know as Mac Speech Recognition, Casper was ahead of its time, as Mac technology often is (think 3.5-inch floppy disks, SCSI, USB, etc.).
System 7 brought the release of both AppleScript and Speech Recognition, so naturally Apple provided the ability to extend the effectiveness of the recognition software by adding AppleScript support. When OS X came along, that support was carried over along with the text-to-speech software and new voices.
Say What You Will
The simplest command to play with is the say command. Although it is in the Standard Additions dictionary, it is still a necessary part of the AppleScript speech tools. After all, when you write a cool script that uses speech recognition, do you still want to have your Mac respond with a dialog box? Or a spoken reply?
Here’s a fun one that will make your friends and family do a double-take. Save the script below as “Thank You” in your Speakable Items folder (~/Library/Speech/Speakable Items/). This is where you will always save additions to speech recognition; the file name will be the words you must say to start the script.
set theOptions to {"You are very welcome.", "You're welcome.", "No problem, dude!", "Don't mention it.", "Forget it.", "I hope you tip well!"}
set theChoice to some item of theOptions
say theChoice displaying theChoice with waiting until completion
You’ve taught your Mac some manners! When you say, “Thank you,” your Mac will choose a reply from the list and say it back to you. Try it sometime when there’s someone looking over your shoulder, right after you have asked your Mac to perform some command.
You may have used say previously for giving feedback. The two optional clauses I’ve added above are displaying and waiting until completion. These two are only meaningful if speech recognition is on. The first displays the spoken phrase (or some other text, if you choose) above the round speech recognition floating doodad (I won’t say “widget,” since those belong on the Dashboard). The second one determines if the speech recognition waits until the spoken string is finished before continuing to listen for new input. And don’t forget that you can add the using clause to select a different voice for speaking.
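As a quick illustration of the using clause, here is a minimal sketch. The voice name "Victoria" is only an assumption; substitute any voice that appears in your own Speech preferences.

```applescript
-- "Victoria" is an assumed voice name; use one listed in your Speech preferences.
set theReply to "You are very welcome."
-- speak the reply in the chosen voice and show the same text above the speech feedback window
say theReply using "Victoria" displaying theReply
```

As noted above, the displaying part only has a visible effect while speech recognition is turned on.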
Brave New Word
If, like the folks I mentioned above, you’ve played with speech recognition and couldn’t find a use for it, take a look at what you can do with a little help from AppleScript. Let’s start with a simple script:
tell application "SpeechRecognitionServer"
set theResponse to listen for {"yes", "no"} with prompt "Hello. Do you like me?"
if theResponse is "yes" then
say "I like you, too."
else
say "I don't care whether you like me or not."
end if
end tell
The Speech Recognition Server is another Apple “helper” application like System Events, Image Events and Database Events, designed specifically for use with AppleScript. It has only three commands, but within those commands lies the power to create lots of speech-driven fun and usefulness.
This script uses the simplest of the Speech Recognition Server’s commands, listen for. It listens for any item in a list of phrases, words, or numbers and returns the item that was spoken. When you run the script, you will be presented with the speech recognition doodad. You’ll need to press the escape key to get the Mac to “listen” to you (unless you’ve customized your Speech Preferences, in which case use your setup as you usually do). The Mac will wait for your answer and won’t respond to any speech except the two words we asked it to listen for, “yes” and “no.”
Here’s a more practical example. I often forget to eject my iPod before I quit iTunes. And I hate having to use Exposé to find the desktop and then right-click the iPod to use the context menu to eject it. There had to be an easier way, and here it is:
--get mounted disks
set theDisks to list disks
set filteredDisks to {}
--filter the list for only ejectable items
repeat with aDisk in theDisks
tell application "Finder"
if ejectable of disk (contents of aDisk) then set end of filteredDisks to (contents of aDisk)
end tell
end repeat
set theCount to count items of filteredDisks
--if only 1 item, we'll eject without question
if theCount = 1 then
set ejectMe to item 1 of filteredDisks
else if theCount > 1 then
--otherwise, we'll ask which one to eject
tell application "SpeechRecognitionServer"
set ejectMe to (listen for filteredDisks with prompt "Which disk do you want to eject?" displaying filteredDisks)
end tell
else
say "No ejectable disks." displaying "No ejectable disks."
--nothing to eject, so stop here instead of falling through to the eject step
return
end if
tell application "Finder" to eject disk ejectMe
delay 2
say "Ejected disk " & ejectMe displaying "Ejected disk " & ejectMe
Save this as “Eject a disk” in the Speakable Items folder.
If you want to show your user the acceptable responses, you can use the displaying {list of string} addition to the listen for command. However, the user will only see the list if the Speech Commands window is open. And if you only want to wait a short time for a response and then go on, you can use giving up after, much like you do with display dialog.
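Here is a minimal sketch that combines displaying with a time limit. The prompt text and the ten-second limit are arbitrary choices for illustration. I am also assuming that a timeout (or a cancel) reaches the script as an error, so the try block simply treats any error as "no answer" and falls back to a default.

```applescript
tell application "SpeechRecognitionServer"
	try
		-- wait at most 10 seconds for one of the listed words
		set theAnswer to listen for {"yes", "no"} with prompt "Shall I continue?" displaying {"yes", "no"} giving up after 10
	on error
		-- assumption: a timeout or cancel surfaces as an error, so fall back to a default
		set theAnswer to "no"
	end try
end tell
say "You chose " & theAnswer
```

Catching every error is deliberately blunt. If you need to tell a timeout apart from other failures, check the error number in your own testing rather than relying on this sketch.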
Apple’s speech recognition is only designed for executing commands or scripts and not for dictation or data entry. But using some scripting, you can create scripts that fill in things for you. Here’s an example of a number entry script:
--set up variables
property numList : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, "hundred", "thousand", "million", "negative", "minus", "done"}
set theText to {}
set negative to 1
set theChoice to ""
--give feedback that we're listening
say "number"
tell application "SpeechRecognitionServer"
--loop until we're done
repeat until theChoice is "done"
--listen for input
set theChoice to listen for numList
--accumulate our input for later processing
if theChoice is in {"negative", "minus"} then
set negative to -1
else if theChoice is not "done" then
set end of theText to theChoice
end if
end repeat
end tell
--figure out what I just said!
set theInput to interpretNumber(theText, negative)
--now input it in the current document
tell application "System Events"
keystroke (theInput as text)
end tell
on interpretNumber(theList, negative)
local hundreds, thousands, millions
set hundreds to 0
set thousands to 0
set millions to 0
--loop through the items
repeat with anItem in items of theList
if anItem < 99 then
set hundreds to hundreds + (anItem as integer)
else if anItem as text is "hundred" then
--{2,"hundred"} would be 2*100
set hundreds to hundreds * 100
else if anItem as text is "thousand" then
--bump the hundreds number to thousands
set thousands to hundreds
set hundreds to 0
else if anItem as text is "million" then
--same idea as above - allows {1,"hundred","million"} to work right
set millions to hundreds
set hundreds to 0
else
display dialog "Don't understand " & (quoted form of anItem as text) & "."
end if
end repeat
return (negative * ((millions * 1000000) + (thousands * 1000) + hundreds))
end interpretNumber
Save this as “Number Input” or something you’ll remember that is not similar to another file in the Speakable Items folder. Not a terribly long script, but now you’ve given yourself the ability to input numbers – to dictate to your Mac! If you have specialized input problems like entering passwords or other data that you hate, you can also script it to work in the same way as the above script. Use the speech recognition software to input data and System Events to input it to your current document.
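To see how the accumulation in interpretNumber works without needing a microphone, here is a condensed stand-in you can run in Script Editor on a fixed list. It is a sketch, not the handler itself: it leaves out "million" and the sign, and it tests the class of each item instead of comparing it to 99, which is my substitution because this list holds real integers rather than recognized speech.

```applescript
-- Condensed copy of the accumulation idea from interpretNumber,
-- run on a fixed list standing in for "two thousand three hundred forty five".
set spoken to {2, "thousand", 3, "hundred", 40, 5}
set hundreds to 0
set thousands to 0
repeat with anItem in spoken
	set w to contents of anItem
	if class of w is integer then
		-- plain numbers are added to the running group: 2, then 3, then +40, then +5
		set hundreds to hundreds + w
	else if w is "hundred" then
		-- "hundred" multiplies the running group: 3 becomes 300
		set hundreds to hundreds * 100
	else if w is "thousand" then
		-- "thousand" moves the running group up and starts a new one
		set thousands to hundreds
		set hundreds to 0
	end if
end repeat
-- thousands is 2 and hundreds is 345, so the result is 2345
return (thousands * 1000) + hundreds
```

Stepping through the list by hand this way is a handy check before you add new words, such as "billion", to numList.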
Now, if you’ve opened the dictionary for the recognition application, you’ll see that it also contains listen continuously for and stop listening for identifier. These two are used together. The difference between listening “continuously” and just listening is this: While both cause your script to pause during execution and wait for a phrase to be said, the “continuously” also allows other phrases to be spoken while your script is in “listening” mode.
Why would you want this? If you are already using speech recognition, you may already understand why: Speech recognition has commands for menus, front windows, the current application, etc. all resident at the same time. If you want NO OTHER commands to be executed while your script waits, then don’t use “continuously.” If, on the other hand, you create a “stay open” script application that will execute custom commands you write, and you still want all the regular speech commands available, then use listen continuously for.
This command also requires an identifier, even though it’s shown as optional in the dictionary. The identifier is used so that when you want to quit using the list of phrases you were listening for, you can tell your script to stop listening for that list of phrases. If you want your phrases listed in the speech commands window, use with section title.
Here’s an example of a stay open script application that you might use to input things you use frequently, saving you some typing:
on run
repeat --keep listening until we're done
tell application "SpeechRecognitionServer"
--listen for phrases
set theChoice to listen continuously for {"insert date", "insert my home address", "insert my email address", "close my info"} with identifier "mine" with section title "My Info"
end tell
if theChoice is "insert date" then
--insert date
tell application "System Events"
keystroke (current date) as text
end tell
else if theChoice is "insert my home address" then
--insert address
set myAddress to "123 Main St." & return & "Independence, MO 64055"
tell application "System Events"
keystroke myAddress
end tell
else if theChoice is "insert my email address" then
--insert email addr.
tell application "System Events"
keystroke "kevinb@macscripter.net"
end tell
else
exit repeat
end if
end repeat
--"close my info" was spoken: quit so the quit handler can stop listening
quit
end run
on quit
-- stop listening
tell application "SpeechRecognitionServer"
stop listening for identifier "mine"
end tell
--remember to continue quit!
continue quit
end quit
As you can see, by combining speech recognition with System Events, you can build some really powerful timesavers!
The Last Word
I hope I’ve shown you some fun things to think about and maybe try. Apple’s speech tools are pretty darned nice, considering that for a long time no other PC OS had anything like them. Furthermore, if you enable the entire set of vocal commands (look under the “Commands” tab in the “Speech Recognition” tab of the Speech system prefs), you get a whole host of abilities, like adding appointments to iCal, getting information from Address Book, the ability to use the menu system verbally, and others.
The best piece of advice I can give you is to get a good microphone if you want to use the speech recognition. I have a set of USB headphones known as the “C-Media USB Headphones” with a nice, very sensitive microphone. Even with the sound input ratcheted down to almost zero, it hears me just fine. I do recommend either a noise-cancelling mic or a very quiet place to experiment; otherwise, your TV will start launching programs!
'Til next time, have fun. Crunch code!