December 2011 - AI as AT: Artificial Intelligence Technology with Crossover Potential
Dr. McCoy is a computer science professor doubling as a linguistics professor. Her areas of research encompass AI, natural language generation, understanding discourse phenomena, rehabilitation engineering, augmentative and alternative communication (AAC) – and assistive technology. As with other researchers with comparable backgrounds, her research on artificial intelligence has drawn her closer to AT and to communication issues confronted by people who are blind and by people who have disabilities affecting their ability to speak. Dr. McCoy is currently aiding in the ongoing development of Interactive SIGHT (Summarizing Information Graphics Textually), a system that provides contextual descriptions of information graphics such as complex charts and graphs in newspaper and magazine articles. Although designed for blind students, the system has the potential to help sighted students with cognitive disabilities as well as neurotypical students. She believes that the crossover potential of other AI-based graphics systems and AT-type devices is only now being tapped by interdisciplinary researchers and designers.
Kathleen McCoy, Ph.D., Speaks
“I was very proficient in math in middle school and high school. Yet in college I didn’t want to major in math. Then one of my math instructors whispered ‘computer science’ in my ear. I had no idea what computer science was but quickly learned to love it because it was a discipline in which I was able to use my math skills to obtain direct results. I loved psychology and learning about what made people tick.” The confluence of those three interests made artificial intelligence a natural avocation, she recalls. The pivotal moment that sealed her permanent connection to AI and linguistics occurred in a psychology class. “We were discussing the use of computers to model human behavior, a process that included a language component. Language was another of my interests. I was immediately hooked on the potential of the total package -- math, computer science, psychology and language.”
Her link with AT was forged when Patrick Demasco, an AT professional who ran an AAC center affiliated with the university, encouraged her to assist him in developing AAC devices for non-speakers. “I was most intrigued by individuals who were very proficient cognitively but who were unable to speak. I thought, ‘What a useful application for the work I’ve been doing for years, trying to understand language, modes of communication and how important communications capabilities are to those who can’t communicate!’ At that point I realized I had the ability to make a positive difference in the lives of these individuals by enabling them to communicate much more effectively. This quickly became my mission – and it still is.”
We invite you to read, and share with others, the research-based perspective of Dr. McCoy and her insights regarding the positive impact of artificial intelligence and related technologies on the communication capabilities of individuals with disabilities.
AI as AT: Accelerating the Communication Rate
An Interview with Kathleen F. McCoy, Ph.D.
Non-speakers, she explains, “communicate via speech generating devices which allow users to enter text which is then synthesized as speech – but the process of inputting text is often very slow, especially for users who have limited motor skills.” The slow rate of communication, she adds, stems from users’ need to type their messages via a keyboard. Accelerating that communication rate, Dr. McCoy states, has been a mission made easier thanks to the advent of artificial intelligence (AI) as assistive technology combined with evolving techniques in natural language generation. Defined as the production, in writing, of an explanation in a context that is understandable to a specific audience, natural language generation has been the object of much of Dr. McCoy’s research, with AI eventually serving as the processing mechanism.
AI and Natural Language Processing Are Compensatory Tools
Modeling, she adds, is the key to natural language generation. “If I have a model of how an individual will perform a task or how he/she will need to speak in certain situations, I can program a device to stay a step ahead of the user. If the user becomes stuck somewhere in the process the device can provide a clue as to what an appropriate utterance might be. My job when modeling for natural language generation is to recognize which model is most appropriate to meet the language processing needs of a specific person in a specific situation.” To this point, however, most of her work has been in an experimental vein during a 25-year foundation-building research continuum.
AI and Natural Language Processing: the First Brush
This project, Dr. McCoy notes, “utilized computerized speech and natural language generation technology to put syntactic tissue around words like apple, eat, John in order to produce a phrase or sentence like, ‘The apple was eaten by John.’” She explains her thought process as she devised syntactic support: “I know what apple means; I know what eat means and I know what John means – and I know how those words can fit together in a sentence. Apples don’t eat. People eat -- food items. What’s needed is a full sentence that can be generated by the AAC user’s device so that listeners can understand the meaning in a way that would be appropriate for the user to have uttered it, without the speaker having to perform time-consuming keyboard work to generate it, which disrupts conversation flow.” In that example of early natural language processing, she says, “I used semantics about the meaning of individual words; I used ‘eat’ and the nouns that ‘eat’ expected semantically. I also used syntax such as adding ‘The’ before ‘apple’ and adding a tense. I used ‘eaten’, a past participle verb, in order to produce a full sentence to be generated in the correct context.” In cases where the user’s original input was ambiguous or the word order could not be maintained, the system presented the user with several utterances to choose from. Dr. McCoy and her project research partner, Patrick Demasco, found the verbal knowledge base required to fuel the system too cumbersome. “Our prototype system in 1992 took us quite a way toward our goal but was never actually affixed to an AAC user’s wheelchair.”
SceneTalker: Speeding Communication by Storing Whole Utterances
Her vehicle for stored utterances is SceneTalker, an utterance-based AAC system that facilitates the selection of full messages.
Devices such as SceneTalker, she explains, “have the potential to significantly speed up the communication rate, but they pose interesting challenges, including: anticipating text needs, remembering what text is stored, and accessing desired text when needed.” Moreover, she adds, “using such systems has profound pragmatic implications, as a pre-stored message may or may not capture exactly what the user wishes to say in a particular discourse situation.” SceneTalker, she notes, is a prototype research-driven, utterance-based system designed to speed the communication rate for speech-impaired individuals by storing whole utterances for use in contextually appropriate situations that require multiple responses. She cites going to a restaurant as an example. “Most of us carry in our heads a script for what transpires when we visit a restaurant. The typical scenario calls for us to ask for a table, be seated, order appetizers and drinks, order a main course and dessert, interact with the waiter throughout and pay at meal’s end. Easily accessible utterances would be stored for each of those scenes.” Although the potential users of this system are non-speakers, she says, it could also be effective for individuals with cognitive disabilities by enabling them to step through prescribed actions supported by appropriate contextual language, a process, she says, that qualifies as a cognitive aid.
Interactive SIGHT: Accessing Graphics in Popular Media
SIGHT, which is implemented as a browser extension, works on simple bar charts. Once launched by a keystroke combination, SIGHT first provides a brief initial summary that conveys the underlying message of the bar chart along with the chart’s most significant features. The system then generates history-aware follow-up responses that provide further information upon user request. According to Dr. McCoy, SIGHT user evaluations with sighted and visually impaired users revealed that “the initial summary and follow-up responses are effective in conveying the informational content of graphics and that the system interface is easy to use.” Currently, SIGHT provides contextual synopses of simple bar charts and line graphs found in popular media, but its vision component is somewhat limited. Work continues to extend the capability to include other types of graphics, such as grouped bar charts or line graphs with multiple lines. SIGHT is delivered via a browser helper object (BHO), a component of Microsoft’s Internet Explorer web browser application. The BHO, she explains, is an add-in designed to provide or expand the functionality of the browser and allow developers to improve the web browser with new features.
Declares Dr. McCoy: “When people who are blind ‘read’ an article, the graphics incorporated into the article are largely inaccessible. If the article is on the web the graphic is supposed to contain ALT text, which enables text written by the graphic’s author to be displayed by the browser when the cursor is run over the graphic. But the ALT text is often absent.” According to Dr. McCoy, there have been some attempts to make graphics accessible for people who are blind. “Much of that effort has been focused on what I call ‘scientific graphics’, i.e. graphics utilized for visualization purposes; when I run an experiment and want to understand what happened, I’ll graph my data, a process that lets me see the relationship between the items I’m graphing.”
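The interaction pattern described for SIGHT (a brief initial summary, then history-aware follow-up responses on request) can be illustrated with a toy sketch. This is not SIGHT's actual method: the class names `BarChart` and `ChartSession`, the "tallest bar is the message" heuristic, and the sample data are all assumptions invented for illustration. The only idea taken from the article is that the system remembers what it has already said, so each follow-up adds something new.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class BarChart:
    """A simple bar chart: a title, a unit, and one value per labeled bar."""
    title: str
    unit: str
    bars: Dict[str, float]


@dataclass
class ChartSession:
    """Remembers which facts were already conveyed so follow-ups never repeat."""
    chart: BarChart
    told: Set[str] = field(default_factory=set)

    def initial_summary(self) -> str:
        # Toy heuristic (an assumption): treat the tallest bar as the message.
        bars = self.chart.bars
        top = max(bars, key=bars.get)
        self.told.add("max")
        return (f"This bar chart, titled '{self.chart.title}', shows that "
                f"{top} has the highest value at {bars[top]:g} {self.chart.unit}.")

    def _facts(self) -> List[Tuple[str, str]]:
        # Candidate follow-up facts, ordered from most to least salient.
        bars, unit = self.chart.bars, self.chart.unit
        low = min(bars, key=bars.get)
        high = max(bars, key=bars.get)
        listing = ", ".join(f"{k} ({v:g})" for k, v in bars.items())
        return [
            ("min", f"{low} has the lowest value at {bars[low]:g} {unit}."),
            ("gap", f"The gap between {high} and {low} is "
                    f"{bars[high] - bars[low]:g} {unit}."),
            ("all", f"All bars in {unit}: {listing}."),
        ]

    def follow_up(self) -> str:
        # History-aware: return the first fact the user has not yet heard.
        for key, sentence in self._facts():
            if key not in self.told:
                self.told.add(key)
                return sentence
        return "No further information is available for this chart."


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    chart = BarChart("Coffee consumed per week", "cups",
                     {"Ann": 12, "Ben": 7, "Cara": 21})
    session = ChartSession(chart)
    print(session.initial_summary())
    print(session.follow_up())
    print(session.follow_up())
```

Keeping the conversation history in a small set is the simplest design that satisfies "history-aware"; a real system would presumably also weigh the designer's communicative signals, such as a differently colored bar or the caption, when choosing the initial message.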
Dr. McCoy’s team chose information graphics in popular media (such as USA Today) as the SIGHT experimental vehicle because such publications, she notes, make use of information graphics to enhance the content of an article. In USA Today, for example, “there’s a daily graphic on the bottom of the front page. My eyes travel immediately to that stand-alone graphic. I look at it; I don’t study it. If I’m interested in the content I’ll examine it more closely.
“We find that in popular media each graphic has a purpose. Each graphic provides a message or a nugget of information that is sometimes not repeated or referred to in the story text. Someone who is blind and does not take note of that graphic and its content misses that part of the article. Those are the graphics we focus on.”
Such graphics, she states, can serve multiple purposes. “When I read those graphics I don’t examine every data point. I glance at the graphic to get a sense of what it’s trying to tell me. Why? Because the graphic designers designed that graphic to illustrate a single point. The designers may have elected to use a bar chart because they wanted to highlight a specific message. They may have chosen to color one of the bars differently than the others so that they could emphasize that bar.
“There are choices that are made in design. We as a team try to recognize the specific message, the underlying intention, of a graphic. We generate a piece of text that clearly presents the graphic’s major point of emphasis by analyzing the communicative signals that the graphic designer employed.”
That emphasis, she continues, “can be in the form of colors or the type of graphic used. There’s information in the caption, although a caption alone rarely tells a reader the entire message. Recognizing that there are graphics to which the reader returns due to piqued interest, SIGHT enables the reader to obtain more information.
We highlight a graphic’s main thrust -- but more information is also available for the interested reader.”
Depending on the severity of the vision loss, vision-impaired users, Dr. McCoy explains, might use a screen magnifier or, for those with more severe vision loss, a screen reader such as JAWS to access documents. “We’ve conducted evaluations of our system with individuals who are blind or visually impaired and who use screen readers and screen magnifiers. Screen magnifiers can be effective for individuals with some vision impairment because the device can adequately enlarge the type size to facilitate readability. Often a graphic can be seen with a screen magnifier but not understood because it has to be enlarged so much that only a small portion of it can fit on the screen at a time. Our system is helpful for these individuals.
“When the screen reader device encounters a graphic, it simultaneously encounters ALT text that the SIGHT system has added, which indicates the presence of an information graphic (bar chart or line graph). The ALT text instructs the reader to press
“GPS for the Blind” and Other Emerging AT Trends
Wayfinding – “Some members of the Association for Computing Machinery (ACM) are becoming increasingly interested in wayfinding (http://www.ap.buffalo.edu/idea/udny/section4-1c.htm), the ways in which people and animals orient themselves in physical space and navigate from place to place.
“If a blind person is in a building with which he/she is unfamiliar and is attempting to locate a specific room in that building, he/she not only wants to reach the right destination but desires to do so without colliding with objects such as walls, doors and furniture. He/she needs to find a route, even if the individual is not only blind but also in a wheelchair – in which case locating a stairway won’t be helpful, but an elevator, on the other hand, would be.
In the formative stages are small devices that can help plan an internal or external route and disseminate that route in a way that will be helpful. It’s like GPS for the blind.” Wayfinding, she says, continues to be assessed for its usefulness among those with cognitive impairments. “Such a device would, for example, aid a child in finding the correct bus or help a child if the child has boarded the wrong bus by providing proper directions and cues.” There are wayfinding devices in the prototype phase, she points out.
Games for the blind – “Guitar Hero was a very popular game that still has fans,” Dr. McCoy recalls. In Guitar Hero, players are provided with their cues on-screen. “To date, individuals who are blind have not been able to participate. Now, however, researchers are rigging finger sensors which provide a pulse that indicates to blind players which key should be pressed.”
Speech synthesizers – “Blind people read articles through screen readers. Often, however, a reader who is blind and wants to forge ahead quickly through the material accelerates the presentation so that the words are crunched together. While that accelerated presentation is incomprehensible to sighted people, blind individuals who are accustomed to it can understand it. There was a study presented this year on how understandable different voices are to people who are blind when used at this fast speed. This would be helpful in choosing a voice for a screen reader. This selection process is important for sighted designers to consider and understand. An understandable voice reading at what sighted people consider to be a normal pace might not be effective when accelerated. This highlights the importance of analyzing and evaluating the effectiveness of technology with people with disabilities as they might use technology in unique ways.
This study is a reminder to the field to ascertain how individuals with disabilities utilize technology and that what is deemed effective in one situation might be far less effective in another situation.” The study, she continues, is a lesson for the computer societies that are considering the application of computer science for individuals with disabilities. “We need to engage the target users in order to ascertain how they use the technology. If I invent a device that’s effective for someone without disabilities it may not be effective for the target population because the target population will use the technology differently.”
Sign language and mobile devices – For the past several years, Dr. McCoy says, ACM conferences have focused on the needs of individuals who are deaf and use sign language. “Sign language recognition has been a hot topic. Systems are under development that can recognize sign language and transmit sign language via video and even mobile phones. Speed is an issue. The typical video runs about 30 frames a second. That’s much too slow for signing, which might involve very quick movements. Here, too, issues of comprehensibility are being examined. We’re asking, ‘What’s the most important part of this video to get across?’”
Sign language avatars – Animated figures that sign are being designed to transform spoken words into sign language. “Researchers are trying to ascertain what is most important to capture from that avatar. It’s not just the hands and the hand shapes; the face and facial expression are also factors whose significance is being judged. At this year’s ACM SIG (special interest group) ACCESS conference there was a full-day workshop on sign language translation and avatar technology.”
Storytelling for non-speaking children – A child-friendly system on a mobile device is being designed to accommodate the following scenario: “There’s a non-speaking child in a school who wants to tell his/her parents about what’s occurred in school each day. This is a very difficult task for a non-speaking child. Inputting a daily account into a device would require too much time and effort on the child’s part even if the child was able to master the input process.” One solution calls for the child’s school-based caregiver to input a message for the parents. Researchers are working on a system that would combine messages spoken by teachers and caregivers during the day with other information to allow a child to tell a story of interesting events that occurred that day.
For instance, she says, “a system that would recognize bar codes on swipe cards and classroom doors that log in a child’s entrance or departure. This system is connected offline to the child’s class schedule. Say the child enters a math class which is taught by a substitute teacher. The system would assemble a message that reads, ‘I was in math class today but Mrs. Smith was absent.’ The child could then use his/her device to relay that message to the parents. The system also enables a caregiver to take a photo with an accompanying voice message that also becomes part of the child’s daily story for parental consumption.”
Phonic Stick – Another system for non-speakers, Phonic Stick (http://www.dundee.ac.uk/research/media/D373PhonicStick.pdf), operates without the need of a visual interface and generates phonemes, the smallest phonetic unit in a language capable of conveying a distinction in meaning, such as, in English, the m in ‘mat’ or the b in ‘bat’. “When fully developed,” Dr. McCoy says, “the device will enable users to blend phonics into any words by positioning a joystick and to ‘speak’ these words using a speech synthesizer.”
AI for people with aphasia – Under development is artificial intelligence technology designed to help people with aphasia who have lost the ability to articulate ideas or comprehend spoken words. “A speech generating device I saw this year at a conference emulated the way someone with aphasia speaks by inputting a normal spoken sentence and transforming it. Aphasia is an area of hot interest in which several organizations are developing devices. Lingraphica (http://www.aphasia.com/?gclid=CJTrn7-_0K0CFUPd4AodfRdYlw) has been active in this sector for some years but now others are active as well.”
Yesterday’s Magic, Today’s Technology
“Years ago, when the movie appeared, AI was thought to be a world-changing field.
What we’re finding is that much of the AI capability highlighted in the movie is incrementally becoming part of everyday technology. For example, Google can be utilized as a translation vehicle when visiting abroad. I used it often when I visited Italy recently. The appearance of AI features in technology – including AT -- is now becoming almost commonplace.” Because the technology has become more ubiquitous, the public, she contends, may no longer view today’s artificial intelligence as exotic and dramatic. “But it’s the same technology that intrigued Stanley Kubrick and Spielberg when they conceived the scenario for AI. The magic of artificial intelligence is that the magic isn’t magic anymore.”
RESOURCES
ARTICLES
Utterance-based AAC systems have the potential to significantly speed communication rates for individuals who rely on speech generating devices for communication. The challenges in designing such systems include: anticipating text needs, remembering what text is stored, and accessing desired text when needed. Among the pragmatic implications associated with these systems is whether a pre-stored message accurately and appropriately captures what the user wishes to say during a discourse session. The authors describe a prototype of an utterance-based AAC system whose design choices are based on findings from theoretically driven studies concerning pragmatic choices with which the user of such a system is faced. These findings are coupled with cognitive theories to make choices for system design.
MyTalkTools.com Geek SLP
The Language Stealers
KNOWLEDGE NETWORK MEMBERS
Adaptive Technology Center for New Jersey Colleges
Alexander Graham Bell Association for the Deaf and Hard of Hearing (AG Bell)
U.S. Society for Alternative and Augmentative Communication (USSAAC)
Center for Hearing and Communication (CHC)
Funding provided by the US Department of Education under grant number H327F080003
Project Officer: Jo Ann McCann
Project Director: Jacqueline Hess
Newsletter Editor: Thomas H. Allen
Design & Distribution: Ana-Maria Gutierrez