Existing ways of interacting with websites
The computer mouse is the simplest and most familiar way to interact with a website. It's so standard that even devices that were never meant to have one try to emulate it somehow. Another interaction model is represented by touch devices, promoted above all by Apple. Although the majority of mobile devices today are touch-sensitive, this interaction style is far less common on desktop PCs.
There is also the Pointer Events specification, which combines three of the most widely used input instruments: the computer mouse, the stylus (pen), and touch. In other words, a pointer is an abstraction that lets users switch between all of these instruments as and when needed.
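To illustrate the abstraction, here is a minimal sketch (not from the talk): a single pointerdown listener handles mouse, pen, and touch input and tells them apart via pointerType.

```typescript
// One Pointer Events listener covers mouse, stylus, and touch input;
// event.pointerType reports which instrument produced the event.
document.addEventListener('pointerdown', (event: PointerEvent) => {
  switch (event.pointerType) {
    case 'mouse':
      console.log('Mouse click at', event.clientX, event.clientY);
      break;
    case 'pen':
      console.log('Pen tap with pressure', event.pressure);
      break;
    case 'touch':
      console.log('Finger touch at', event.clientX, event.clientY);
      break;
  }
});
```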
Apple, one of the biggest players on the tech market, supports the Touch Events specification and ignores Pointer Events, which largely explains why the latter remains relatively unpopular.
Another interaction method is the gamepad, also known as a joystick. Gamepads enjoy good browser support, which makes it easy to use them to work with websites. Another universal input instrument is the keyboard. Like the computer mouse, it is so prevalent that keyboardless devices still include a virtual keyboard or simulate it in other ways. Moreover, the keyboard is part of another important website interaction method that involves screen readers, an assistive technology for people with special needs.
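As a rough illustration of driving a page with a gamepad, here is a hedged sketch based on the standard Gamepad API; the button-to-action mapping is purely illustrative.

```typescript
// Polling the Gamepad API once a gamepad is connected.
// The button-to-action mapping below is illustrative only.
window.addEventListener('gamepadconnected', () => {
  function poll(): void {
    const pad = navigator.getGamepads()[0];
    if (pad) {
      if (pad.buttons[0].pressed) {
        console.log('Primary button pressed: activate the focused element');
      }
      if (pad.axes[1] < -0.5) {
        window.scrollBy(0, -20); // push the left stick up to scroll up
      }
    }
    requestAnimationFrame(poll); // keep polling every frame
  }
  poll();
});
```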
Accessibility (A11y)
Alongside usability, accessibility is becoming another major concern for website developers. Accessibility usually means making a website usable for people with special needs. For example, traditional website layouts can be uncomfortable for people with visual impairments. But making websites accessible proves to be an extremely complex task: developers not only have to weave accessibility into the composition and layout of their pages, but also make significant changes to markup and attributes.
The other issue is that accessibility does not yield direct profits, which makes it less of a priority in the eyes of the people commissioning the websites. But Alexey Okhrimenko argues that there is a way to address this problem: voice interaction will not only improve website accessibility, but also make interaction much more interesting for all users, which in turn has positive implications for a website's commercial success.
“Voice interaction will become the industry standard. It will be as prevalent as the computer mouse and keyboard are today, because voice is an integral part of being human and helps us in our daily lives,” says the expert.
WebSpeech API: an unconventional way of improving accessibility
The WebSpeech API consists of two basic parts: SpeechSynthesis, which is responsible for speech generation, and SpeechRecognition, which makes human speech comprehensible to computers. While the former is widely available and supported on about 80% of devices, the latter is far less omnipresent, which keeps it less explored and limits its use. To help with this, Alexey Okhrimenko created a presentation hand-out (available in Russian) with detailed information on how to use these specifications.
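A minimal sketch of both halves of the API, assuming a Chromium-based browser where SpeechRecognition is still exposed under the webkit prefix:

```typescript
// Speech generation: SpeechSynthesis reads a phrase aloud.
const utterance = new SpeechSynthesisUtterance('Welcome to the site');
utterance.lang = 'en-US';
window.speechSynthesis.speak(utterance);

// Speech recognition: SpeechRecognition turns a spoken command into text.
// In Chromium-based browsers the constructor is still webkit-prefixed.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

if (SpeechRecognitionImpl) {
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = 'en-US';
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    const transcript = event.results[0][0].transcript;
    console.log('Heard:', transcript);
  };

  recognition.start();
}
```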
The NLP obstacle and how to overcome it
Since we still have no instruments that make speech recognition fast and reliable, developers have to make do with Natural Language Processing (NLP), the science of working with language algorithmically. But fully mastering it will take more than 20 years. So developers came up with a ‘dirty trick’, which is now actively used by tech giants such as Amazon, Apple, Yandex, and Google in their virtual voice assistants (Amazon Alexa, Apple Siri, Google Assistant, and Yandex Alisa).
Sequential dialog (Intent)
This ‘dirty trick’ is the sequential dialog technique. All voice assistants significantly limit the scope of the incoming conversation, reducing it to a series of commands. Humans interact with the computer by communicating their objectives, or intents, through a specifically programmed form of interaction. Algorithms learn to understand these objectives by converting them into abstractions that are easy to process, and this requires only simple algorithms and simple search.
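A toy sketch of this idea, with intent names and phrases that are illustrative rather than from the talk: once the vocabulary is restricted to a few commands, matching the recognized transcript to an intent needs nothing more than simple pattern matching.

```typescript
// Limiting the conversation to a few commands means the recognized
// transcript only needs to be matched against simple patterns.
type Intent =
  | { name: 'search'; query: string }
  | { name: 'open_cart' }
  | { name: 'unknown' };

function matchIntent(transcript: string): Intent {
  const text = transcript.toLowerCase().trim();

  const search = text.match(/^(?:find|search for)\s+(.+)$/);
  if (search) {
    return { name: 'search', query: search[1] };
  }
  if (/^(?:open|show)\s+(?:my\s+)?cart$/.test(text)) {
    return { name: 'open_cart' };
  }
  return { name: 'unknown' };
}

console.log(matchIntent('find wireless headphones'));
// -> { name: 'search', query: 'wireless headphones' }
```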
How to organize a dialog
SpeechGrammar is an instrument that makes it possible to describe such dialogs. The underlying JSpeech Grammar Format (JSGF) specification was created by Sun Microsystems (now part of Oracle), whose developers issued a public statement allowing everyone to use it.
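For illustration, here is a hedged sketch of attaching a small JSGF grammar to SpeechRecognition through SpeechGrammarList; the command vocabulary is made up, and in practice many engines accept the grammar but largely ignore it, which is part of the problem described next.

```typescript
// Attaching a tiny JSGF grammar to SpeechRecognition via SpeechGrammarList.
// Both constructors are webkit-prefixed in Chromium; the command
// vocabulary below is invented for illustration.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
const SpeechGrammarListImpl =
  (window as any).SpeechGrammarList || (window as any).webkitSpeechGrammarList;

const grammar = `#JSGF V1.0;
grammar commands;
public <command> = open <page> | search <query>;
<page> = cart | profile | help;
<query> = headphones | keyboards | monitors;`;

const recognition = new SpeechRecognitionImpl();
const grammarList = new SpeechGrammarListImpl();
grammarList.addFromString(grammar, 1); // weight 1 = highest priority
recognition.grammars = grammarList;
recognition.lang = 'en-US';
recognition.start();
```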
“The more we use SpeechRecognition, the more problems we encounter. For example, what happens if someone accidentally says a number instead of a name, or gives the parameters in the wrong order? In such cases even the SpeechGrammar specification is of little help when building a dialog. But there is a solution: the Google Home Mini, a loudspeaker equipped with a microphone. All we have to do is connect it to Wi-Fi and start talking,” explains Alexey Okhrimenko.
Building a Google Home Mini-based website interaction system takes up to six hours of work (not counting the time spent reading the set-up guidelines), the free Google Assistant app, and the device itself, which costs from 3,000 to 5,000 rubles depending on the shop. Another essential is the Dialogflow service: it helps set up Google Assistant dialog apps and handles the maintenance chores, such as dealing with recognition mistakes and correcting word order. Dialogflow also works with Yandex Alisa. Finally, you’ll need a backend that acts as the connecting link between the assistant and the website; a minimal sketch of such a webhook is shown below. Here you can learn more about how the Google Home Mini system works.
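As an illustration of that backend “connecting link”, here is a minimal sketch of a Dialogflow fulfillment webhook in Node/Express; the intent name and parameter are hypothetical, and the real agent is configured in the Dialogflow console.

```typescript
// A minimal Dialogflow fulfillment webhook acting as the backend link.
// The intent name 'search-products' and the 'query' parameter are
// hypothetical placeholders.
import express from 'express';

const app = express();
app.use(express.json());

app.post('/dialogflow-webhook', (req, res) => {
  const intent = req.body.queryResult?.intent?.displayName;
  const params = req.body.queryResult?.parameters ?? {};

  let reply = "Sorry, I didn't catch that.";
  if (intent === 'search-products') {
    reply = `Searching the site for ${params.query}.`;
    // Here the backend could also forward the query to the open web page.
  }

  // The assistant speaks back whatever is returned in fulfillmentText.
  res.json({ fulfillmentText: reply });
});

app.listen(3000, () => console.log('Webhook listening on port 3000'));
```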
“We realized that the WebSpeech API can help us create a comprehensive voice interface that enables more effective voice communication between the user and the website. We’re also working on making the interface more accessible: we have full control over how it looks and can have it read out in different ways. We’ve also developed an alternative option that lets us take a peek into the future, when SpeechRecognition has spread to all the platforms out there,” comments the expert.
He added that, apart from helping people with special needs, voice interaction makes developers’ work more interesting by pushing them to think outside the box. So it’s a win-win for both users and website creators.
“Essentially, it’s a new toy for developers. It may be that in 10-20 years we’ll get used to it and it’ll become hackneyed and boring, but right now everyone’s really intrigued. And for good reason: it makes website interaction so much easier. You won’t have to go through the faff of opening your favorite website, looking for the search bar, typing something, and then painstakingly scrolling through the results. You’ll simply ask the website to find what you need, and it will react immediately. The WebSpeech approach isn’t new; it has been in use for 2-3 years already. But you can only find it in small demos or toys, and WebSpeech voice-driven interfaces are far from mainstream. The SpeechSynthesis and SpeechRecognition specifications still aren’t good enough to enable smooth voice interaction, so we have to work around this with alternative methods. That’s why tech companies put so much effort into promoting their voice assistants,” concludes Alexey Okhrimenko.
You can watch the highlights of Alexey Okhrimenko’s and other Web Standard Days speakers' presentations by following this link.