The Voice Recognition Software Paradox

 
Voice Software Really Is Coming

There are some major moves in the voice recognition software marketplace. IBM Research has announced that it is unleashing powerful speech software that will make speech technologies part of everyday life.

As part of an effort to bring IBM Research’s vaunted intellectual capital directly into the marketplace, the opening of IBM Research’s speech assets will make it easier for IBM customers and partners to gain a competitive advantage in new markets by integrating the industry’s most advanced speech recognition software.

Dragon NaturallySpeaking has also launched version 10, with appreciable improvements in speed and accuracy. As one reviewer put it:

Never mind the PC’s keyboard, here’s Dragon NaturallySpeaking
Dragon NaturallySpeaking 10, the new voice recognition program from Nuance Communications Inc., is fast enough and accurate enough to render my keyboard obsolete. According to Nuance, NaturallySpeaking 10 is about 20 percent more accurate than its predecessor. Better still, it’s about 50 percent faster, so there’s very little delay between the sound of my voice and the appearance of text.

All the other big players on the Internet scene, in particular Google and Microsoft, are beavering away as well. The reason is that much of the Internet will be moving to cell phones as the mobile web becomes the online space of choice.

If there is a paradox, it is that this is not a hot topic of discussion among consumers. You can check that by using the new Google tool, Insights for Search. A search for voice recognition software shows the following trend:

Voice Recognition Software Trend

In interpreting such a picture, there is a major caution to be made. Google normalizes these results against all the searches that are done. Here is how Google explains the normalization process.

Key Points Of Normalization:

*New York doesn’t appear on the list for the term ‘haircut.’ Does this mean that people in New York don’t search for this term at all?

Remember, Google Insights for Search shows the likelihood of users in a particular area to search for a term on Google on a relative basis. So, just because New York isn’t on the top regions list for haircut doesn’t necessarily mean that people there don’t search for that term at all. Consider the following scenarios. It could be that people in New York:

  • don’t use Google to find a barber or hair salon
  • use a different term for haircut-related searches
  • search for so many other topics unrelated to haircuts, that searches for haircut comprise a small portion of the search volume from New York as compared to other regions

Since Googling for answers is now the preferred research method, the total number of searches is growing rapidly. Measured against that growing total, even keywords that are growing modestly in popularity will show a downward trend. Unless Google reveals the total number of keyword searches that are done, one cannot determine whether consumers are more or less interested in voice recognition software. What is quite certain is that, although consumers may not presently show massive interest in it compared with YouTube or Facebook, within 12 months that picture will be changing.
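The effect of normalization can be sketched with a few lines of Python. This is only an illustration: the yearly counts below are invented (Google does not publish absolute volumes), and the function name `normalized_trend` is hypothetical. It simply divides a term's searches by all searches and scales the peak to 100, in the spirit of Google's description.

```python
# Hypothetical yearly search counts, invented purely for illustration.
# Google does not publish absolute volumes, so none of these numbers are real.
term_searches = [100_000, 110_000, 121_000]             # the term grows 10% a year
total_searches = [10_000_000, 15_000_000, 22_500_000]   # all searches grow 50% a year


def normalized_trend(term_counts, total_counts):
    """Return each period's share of all searches, scaled so the peak is 100."""
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]


trend = normalized_trend(term_searches, total_searches)
print(trend)  # [100, 73, 54]
```

The term is searched more often every year, yet its normalized line falls from 100 to 54 because total searching grows faster.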

Related: Insights Into Google Search


Voice Recognition Improves Hospital ER Care

Voice to text software cuts reporting delays.

Tiffani Mozingo, administrator for the Picture Archiving and Communication System (PACS) at Sky Lakes Medical Center in Klamath Falls, Ore., had some dramatic news to report at a recent Medicexchange conference on patient safety. The PACS expedited diagnosis and treatment by getting radiological information into physicians’ hands more quickly and with greater accuracy.

Use of this technology enabled Sky Lakes to reduce the average turnaround time (TAT) for normal reports to the emergency room from almost 78 hours in November 2006 to 5.4 minutes in November 2007. Abnormal reports take less than 15 minutes. The reporting system also yielded a substantial economic benefit. Once Sky Lakes went “live” with the reporting suite, the hospital was able to reassign all 10 of its radiology transcriptionists.

This represents an astonishing improvement, and shows the potential of speech technology for immediacy and efficiency. It suggests rapid growth in a wide variety of uses for voice recognition software.


Keep Your Snoopy Eyes On The Road Ahead

 
You don’t need maps for driving now.

Given the safety concerns about using a cell phone while driving, the words of that old Paul Evans song from 1959 (Seven Little Girls Sitting In The Back Seat) may strike a chord, or perhaps a ringtone is now more fitting.

Nuance Mobile and TeleNav have just made it a whole lot easier. They’re encouraging you to throw your maps away. They do this by delivering speech-enabled GPS navigation to mobile phones. Here is part of their description of what they offer:

Using the Nuance Mobile Speech Platform on select devices, TeleNav now provides the ability to enter destinations for driving directions and business category searches by voice, so that subscribers no longer need to use a telephone keypad. Users can simply state the destination address, or select from a database of more than 10 million points of interest by stating a category, such as “pharmacy,” or by stating a specific business name, such as “Walgreens.”

Mobile users can conveniently access TeleNav GPS Navigator on a device they already carry with them and receive information in real time based on their current location. Voice destination entry, which makes navigation services easier to use on mobile phones, has the ability to significantly enhance an already fast-growing market for mobile navigation. Industry analyst firm In-Stat predicts that the total number of mapping and navigation mobile phone subscribers could exceed 70 million worldwide by 2012.

It certainly seems to be the right time for such voice technology.


Voice Search On Mobile Phones For A Better User Experience

 
Fingers do not walk well on cell phones.

Bill Meisel, president of TMA Associates and the non-profit Applied Voice Input/Output Society, arranged the Voice Search Conference held in San Diego, California, March 10-12. One question posed there, according to Usability News, was: Will Voice Search be THE Usability Breakthrough for Mobile Phones?

The dilemma, according to Meisel, is:

It’s not unusual for user interfaces to get “stuck” on one model. The layout of keyboards hasn’t changed for decades, for example, despite some efforts to make it easier to use (by putting oft-used letters under the strongest fingers). The telephone’s 12-button keypad is similarly persistent. Persistence of the user interface is a major barrier to increased use of mobile devices beyond communication.

He believes that “voice search” will come to dominate mobile phones. Some powerful companies share his view: just think of Google, Microsoft and Nuance, to name but three. This is an idea whose time has come.


Voice Recognition Technology To Stop Hospital Cross-Infections

Hospitals, like our homes, are becoming increasingly filled with computers. There are clearly many benefits, but the computers bring with them certain risks. As a recent article by Steven Reinberg pointed out, Stomach Flu Is Spread By Contaminated Computer Keyboards. As a report by U.S. health officials stated, the highly contagious norovirus, often called the stomach flu, can be passed from one person to another through contact with commonly shared items such as computer keyboards and computer mice.

Steven Davidson and Gregg Malkary even go so far as to call mobile computers Dangerous Devices. They advise that:

Hospitals should explore opportunities to invest in mobile computing devices that can be more easily cleaned and sanitized at point of care with standard commercial cleansers. These devices ideally would be water resistant and hermetically sealed to prevent the entry of microorganisms.

Developing such devices presents some real challenges. Mobile computers in hospitals are in some cases called COWs, which stands for computers on wheels. Perhaps there’s more in the acronym than meets the eye.

 
Let your voice do the walking.

Another medical announcement this week may suggest a complementary approach. The Pembroke Regional Hospital Board recently approved the purchase of a new voice recognition dictation system for its diagnostic imaging department. Catherine Junop, vice-president of human resources and organizational services at the hospital, said such systems are becoming the standard in hospitals for radiologists filing their reports. The chief advantage, of course, is that the system speeds medical information sharing. It also incidentally means that fewer fingers need to touch keyboards. Perhaps a small side benefit of voice recognition technology in hospitals will be less opportunity for the transfer of infectious diseases.


Fingerless Cell Phones

 
Lose the keyboard then the display.

Engadget Mobile often has some intriguing glimpses of what may be in our future. On the left below is their picture of what is rumored to be the Samsung i900. In their opinion, this is a phone that takes the keypadless, finger-friendly formula that’s oh-so-popular these days and injects some Windows Mobile 6.1 into the equation. They feel it will be strong competition for a similar phone, the LG KS20.

Using a touchscreen clearly is becoming more popular with the Apple iPhone. However finger-friendly this way of working may be, it’s still not a very precise way of controlling your phone. Perhaps this is only a step towards what we have suggested before, the ultra-simple keyless cell phone. Cell phones are above all devices for handling sound. Why use any other kind of input? The image below on the right shows how simple it might be.

Samsung i900
keyless cell phone


Speech Technology Will Be Really Big – Watch Google

 
Phonemes wanted – talk to Google

If you want confirmation that speech technology is the next big technical and economic opportunity, then keep an eye on Google. This year they encouraged the formation of the Open Handset Alliance. This undermines the walled gardens created by the existing telecom companies. The result is a much more level and competitive playing field.

It is interesting to see how Google is now developing its own stake in what will be a highly profitable marketplace. Marissa Mayer, Google’s vice president of Search Products & User Experience, in an interview (Google wants your phonemes) revealed one part of the effort.

You may have heard about our [directory assistance] 1-800-GOOG-411 service. The reason we really did it is because we need to build a great speech-to-text model.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up, we can (understand) with high accuracy.

This approach is adopted because Google Is All About Large Amounts of Data. Peter Norvig, director of research at Google, believes the following:

The way to get better understanding of text is through statistics rather than through handcrafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.

We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time.

Google is certainly in a privileged position to gain access to large amounts of data that can be used to improve other services. However it seems somewhat paradoxical to be using number crunching to better understand language and speech.
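How number crunching can help with speech can be shown with a minimal sketch. It assumes a tiny invented corpus and a hypothetical `score` function that counts how often a candidate's adjacent word pairs have been seen before. It is a toy illustration of the statistical idea Norvig describes, not Google's actual method; real systems use probabilities, acoustic evidence and vastly more data.

```python
from collections import Counter

# A tiny invented corpus standing in for the "large amounts of data".
corpus = [
    "it is hard to recognize speech",
    "software can recognize speech well",
    "we walked on a nice beach",
    "people recognize speech easily",
]


def bigrams(words):
    """Return the adjacent word pairs in a list of words."""
    return list(zip(words, words[1:]))


# Count every adjacent word pair seen in the corpus.
bigram_counts = Counter()
for sentence in corpus:
    bigram_counts.update(bigrams(sentence.split()))


def score(candidate):
    """Sum how often each of the candidate's word pairs was seen in the corpus."""
    return sum(bigram_counts[pair] for pair in bigrams(candidate.split()))


# Two transcriptions that sound alike; the counts decide between them.
candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=score)
print(best)  # recognize speech
```

No grammar rules are written by hand here. The preference for one transcription over the other comes entirely from what people in the sample actually said, which is why more data tends to mean better results.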

Others take a different view. For example, Powerset is building a consumer search engine based on breakthrough natural language processing technology licensed from PARC and developed internally. The search engine aims to leverage the structure and nuances of natural language to ultimately transform the way humans interact with computers.

It will be interesting to see which approach wins out.

Related: Can You Hear The Future?


Sound Will Drive Mobile Web Growth

Until now mobile Web growth has been slow

Although cell phones are ubiquitous, only limited numbers of individuals use them to surf the mobile Web. The telecom companies have not helped mobile Web growth, since they often charge excessive rates for data transmission. In addition, the cell phone is limited by its small display screen and tiny keys for input. Although the slow rate of growth might appear unsurprising, there is a better way.

The Logic For Sound Is Strong

Whereas the keyboard interface with a cell phone is far from ideal, the cell phone is designed expressly to handle sound. In parts of the developing world, the cell phone will be the only device that many will use to surf the mobile web. The huge potential of these developing world markets will well justify the investment needed to develop strong voice recognition systems.

Market Readiness Is Poor

As yet, market readiness for sound-based mobile systems does not seem to be strong. This may be caused by two factors. Consumers may not be aware of what is possible with voice recognition technology and are therefore not expressing a desire for mobile devices that involve it. The other reason is that systems developers have grown up with systems that use keyboard inputs. These can be so effective that developers may not wish to be challenged by what they see as a noisier method of communicating with the device.

Heartening Signs This Month

Starting with GOOG-411 and CALL-411 voice-based search systems, there have been a number of announcements that herald signs of growth.

  • Voice on the Go Inc. announced that their Voice on the Go(R) service for the Apple iPhone (PDF) is now available. It enables hands-free access to all of the features on the iPhone using voice recognition software, making it a lot easier to use your iPhone while driving a car. (Tips of the hat to Edward Kirk and Patrick Moore)
  • TravellingWave, a developer of next-generation mobile user-interface technology, announced an “always-listening mode” (PDF) in which mobile users will no longer be required to press a button to switch modes between Keypad and Voice-Recognition. The TravellingWave VoicePredict product is a highly accurate speech interface with an always-listening mode and high levels of noise reduction. This enables users to speak-and-type text into a mobile device faster and more easily. The VoicePredict interface further enables users to speak from the same distance at which they usually type, so that users do not need to hold the device close to the mouth or use noise-cancelling microphones. (Tip of the hat to Carolyn Mathas)
  • Traffic.com launched TrafficOne Mobile, which consists of easy-to-integrate, packaged solutions to deliver high-quality, personalized real-time traffic information across popular mobile platforms. Via the Mobile Web, a Wireless Application Protocol (WAP) site provides quick and easy real-time traffic via any wireless browser. There is also an Interactive Voice Recognition (IVR) hotline for cell phone users giving automated toll-free and hands-free access to traffic information.
  • Vlingo has launched the industry’s first developer program for mobile application providers that offers a way to integrate a voice user interface into their applications with no up-front costs. (Tip of the hat to Fraser MacInnes)

What are the majors doing?

Given Bill Gates’s involvement in sound-based technology, Microsoft clearly is the one to watch. Microsoft shipped Windows Vista to consumers last January with a heavy-duty voice recognition system that allows it to do far more than just recognize simple voice commands. More recently, Live Search Maps v2, the so-called Gemini version, has come out. With this you can even talk to your map, meaning that client-side voice input is now available with Live Search Mobile.

An equally strong player is Nuance Communications. Nuance, well known for automatic speech recognition and speech synthesis software, has acquired a whole series of companies this year to allow diversification in mobile and vertical markets like healthcare. On the mobile front, Nuance bought two interesting companies back in August, namely Tegic and VoiceSignal. Tegic is the company responsible for T9, the predictive text system used on many mobile phones. VoiceSignal has embedded speech technology that can be found inside handsets from Blackberry and Palm today. It also powers mobile phone technology for the visually impaired, with the ability to speak the text from a mobile screen so that you know where you are in the menus. (Hat tip to Martyn Davies)

To an extent the unknown quantity here is Google. The Gphone rumour is never completely quashed and it may well be that they come out with a very cheap and innovative mobile device suited for the huge Asian markets.

It is quite clear that these three companies are highly attracted by the potential of the sound-based mobile Web. Their competitive efforts to gain the lion’s share will certainly boost the rate of growth of this marketplace.


Free 411: GOOG or CALL

Both Google and Microsoft have now introduced voice-based local search services for your phone. These services are currently available only in the United States. Even though I live less than 20 miles from the US border, neither service works here. Last week it was a little clearer. Calling GOOG-411 (1-800-466-4411), I was told that the service was not available. Calling CALL-411 (1-800-225-5411), I got a busy signal. This week it is less satisfactory. The Google service gives me information on Langley, Washington when I ask for Langley, British Columbia. The Microsoft service tells me repeatedly, “I didn’t get that”. Presumably it’s only a matter of time until both services are available here.
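For anyone wondering how the service names relate to the numbers quoted above, the letters are simply read off the standard telephone keypad. The following minimal Python sketch shows the conversion; the function name `vanity_to_digits` is a hypothetical helper, and the 1-800 prefix is taken from the numbers given above.

```python
# Standard letter groups on a telephone keypad.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Invert the keypad so each letter points to its digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def vanity_to_digits(name):
    """Replace each letter with its keypad digit; keep digits, drop punctuation."""
    digits = []
    for ch in name.upper():
        if ch in LETTER_TO_DIGIT:
            digits.append(LETTER_TO_DIGIT[ch])
        elif ch.isdigit():
            digits.append(ch)
    return "".join(digits)


print(vanity_to_digits("1-800-GOOG-411"))  # 18004664411
print(vanity_to_digits("1-800-CALL-411"))  # 18002255411
```

Both results match the numbers dialed above once the hyphens are removed.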

Tom Spring of PC World was able to do a matched comparison. In his opinion, the result currently is a draw. Both services delivered the correct result, with Google taking a little longer since it repeats the request for confirmation. He has an interesting comment that the Google service is much simpler while the Microsoft service is slightly confusing in offering more choices. It’s perhaps no coincidence that this mirrors how both approach the regular keyword search. Google has that beautifully simple search page. Microsoft usually offers search within a portal page that flags the other services they have available.

It would be interesting to know whether either or both do user tests in deciding which format they will follow. Usability, or the science of creating satisfactory user experiences, regrettably does not receive the attention it should. Watching how typical users complete tasks as they use a particular service or website is an easy way to confirm that the best choices are being made. Given the expense of creating such services, it would seem foolhardy not to spend the limited extra dollars involved in checking whether the users think you have it right.

Related:
GOOG-411, A Harbinger Of The Mobile Web
GOOG-411 or CALL-411 – Voice-actuated Mobile Web
BTW, Live Search 411 Is Taking On GOOG 411


Microsoft Outdoes Google In Voice Mobile Search

Although Google may have got more of the headlines this week for its mobile search with GOOG-411 coming out of the labs, if anything Microsoft has been doing more. It’s not just CALL-411, announced this week. The Microsoft Live Search blog gives details of the latest innovations. They now have Voice Input and Gas Prices in Live Search for Windows Mobile. They’ve also introduced a Beta version of Live Search for BlackBerry devices. In addition, they have improved mapping and directions in Live Search via a mobile browser.

Perhaps it’s not surprising to see this flurry of activity. In a recent interview, Bill Gates confirmed that he will be involved in only a handful of priority projects from now on. However, voice technology is one on which he puts particular emphasis. That’s good news. The mobile web is to an extent held back by the poor usability of mobile devices. Most mobile devices handle voice well, so voice recognition technology is a strong contender for delivering better user experiences.

