Why should you consider Embedded Speech Technology?


Written by Aurélien Chapuzet

Aurélien Chapuzet is the Digital Marketing Manager at Vivoka, leading content creation and marketing strategies.

10 September 2021

Embedded speech technology, or at least the term “embedded”, may be familiar to you. While the tech world touts connectivity for its many benefits, voice technologies that run on-device, without server calls, are also popular for many reasons. In this article, we break down what embedded voice is, what it brings, which sectors and equipment it suits, and what its possible alternatives and extensions are. “Less is more” is often true, and embedded is a good illustration of it!


What is an embedded system?


What is called “embedded” in the field of voice technologies corresponds to the broader definition of the term. Embedded systems are hardware-based systems (e.g. built on microprocessors or microcontrollers) designed for specific tasks. They can be independent systems or components of a larger device. The word “embedded” refers to where the technology is housed: in this case, directly in the device.


The operating principle of embedded systems


Embedded technology differs from cloud technology in its operating principle. In this context, all computing processes are carried out in the “host” device, based on the technological resources available. The complexity of embedded systems, both in terms of constraints and potential, is therefore highly variable. Indeed, from a simple microcontroller to a complex assembly of microprocessors and several peripherals, the possibilities are quite different. These compositions obviously vary according to the intended purpose of the system.


A principle adapted to embedded speech technology


Voice technologies have their place in the embedded world. At a time when the cloud is attracting attention for its capabilities, the embedded world is also growing. Most existing voice technologies are available in embedded form: ASR, TTS, Wake-up Word, Language Processing, Voice Biometrics, etc. These solutions are built on much lighter models, since embedded technology must work with the resources that the system is able to provide. However, fewer resources does not mean less performance. With the growing appeal of embedded technologies, and in particular the integration of machine learning into our smartphones and other mobile devices, the research world is focusing as much on optimising machine learning models (to reduce training costs, model size and inference times) as on the accuracy of those models. Efficiency is becoming as important as the quality of the result.


Why choose Embedded Speech Technology?


Embedded is Private by design


The on-device operating principle of embedded speech technology gives it a key feature in today’s tech world: Privacy by Design. What does this mean? Privacy is built in from the design stage: embedded systems are imagined and built to process everything locally, which makes them naturally immune to the issue of private data. Put simply, data does not circulate; everything is processed locally, without calling on remote servers. This capability is in high demand today for several reasons.

Firstly, when products and services are offered to individual end-users, the processing of personal data is all the more sensitive. This personal data can identify individuals and falls within the scope of their privacy. Cloud technologies face serious challenges on this front today. The GDPR (General Data Protection Regulation) has further increased the sensitivity of such data processing; to learn more, you can read our blog post dedicated to this topic. Even with data anonymisation or encryption protocols, as long as data can be transferred over a connection, it is not totally secure.

This is where embedded technology offers great promise to manufacturers. It allows them to completely get rid of these issues and offer total data security. For certain sectors of activity, which we will discuss later, this is not simply a matter of user comfort but an essential prerequisite.

To qualify this, it is important to remember that data is a key resource for companies of all sizes. Hybrid models exist, mixing embedded processing with optional connectivity for certain functionalities, which we will discuss later in the article.


Embedded is not dependent on an Internet connection


Directly linked to the previous point on Privacy by Design, the dependence on an internet connection is an additional constraint to be taken into account. When creating a product or service, it is necessary to ensure that it is accessible to users, given their environment and access to the Internet.

Today, it seems unthinkable that the internet is not available to everyone. However, the reality is quite different. On the one hand, internet coverage of the planet is not complete. For example, a cloud-based voice assistant in an off-road vehicle designed to cross mountains or deserts would have great difficulty functioning, because Internet access is rare in this type of environment. On the other hand, in many cases products and services are deliberately designed without connectivity for policy reasons. The defence sector, for example, favours embedded systems because the connectivity of tools is a security issue.

Embedded technology removes the need for external cloud services to carry out its computation. This is one of the reasons why embedded speech technology is used in specific applications where the conditions of use must meet strict specifications. As mentioned earlier, machine learning models today are sufficiently powerful and optimised to support hundreds of use cases.


Embedded Speech Technology has a small technology footprint


Because they depend on the resources provided by the device in which they are hosted, embedded technologies need to be optimised. What looks like a constraint at first glance quickly turns into a key advantage for many manufacturers. Embedded is often equated, out of habit, with the absence of connectivity. Originally, however, the term refers to functional systems with a very small technology footprint. These solutions must run on small electronic components while offering the best possible functionality.

This is why the models used to develop these technologies have always taken hardware constraints into account. Before microprocessors became widely accessible, many players used microcontrollers in the manufacturing process. These components, which have reduced hardware capabilities, have not hindered the integration of embedded voice technologies. With the current state of R&D, it is possible to run complex speech recognition or speech synthesis engines with minimal use of CPU and RAM. This capability is referred to as “seamless” integration, meaning that it alters the system in which the modules are implemented very little.


Embedded technology helps control deployment costs


Integrating voice functionality into a product or service involves not only the design and development of the solution, but also budgeting. On the business model side, Cloud technology providers generally operate on a cost-per-request model (each call to the server is charged, in the order of a few cents) or on a subscription basis (rarer, or only for finished solutions). The main problem: how do you estimate the monthly budget when the number of requests cannot be controlled? Today, ensuring the viability of a voice project is just as important as the design of the solution itself. Embedded systems, by their very nature, lend themselves to business models that companies can control much more easily. The following models are generally found:

  • Subscription: Periodic cost associated with the commercial operation of the embedded voice technology. This method allows the cost of the technology to be incorporated into the cost of the product.
  • Perpetual: As the embedded technology is intended to remain in the hardware, it can be sold as a finished product, as a “one-shot”. In parallel, a royalty system is established for the production of new units equipped with voice solutions.

This makes it much easier for an organisation to have visibility over its cost management. Before being an experience and an ergonomic tool, the integration of embedded voice technology must be a profitable project for the organisation undertaking it.
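
To make the budgeting difference concrete, here is a minimal Python sketch comparing a pay-per-request cloud bill with a flat per-device embedded subscription. Every figure (2 cents per request, 100 cents per device per month, 1,000 devices) and both function names are illustrative assumptions, not actual Vivoka or cloud provider pricing. Amounts are kept in cents to avoid rounding surprises.

```python
def cloud_monthly_cost_cents(requests_per_device_per_day: int,
                             devices: int,
                             price_per_request_cents: int,
                             days: int = 30) -> int:
    """Pay-per-request model: the bill grows with every call to the server."""
    return requests_per_device_per_day * devices * price_per_request_cents * days


def embedded_monthly_cost_cents(devices: int,
                                subscription_per_device_cents: int) -> int:
    """Flat subscription model: the bill depends only on the number of devices."""
    return devices * subscription_per_device_cents


if __name__ == "__main__":
    devices = 1_000  # hypothetical fleet size
    for daily_requests in (10, 20, 40):  # usage is hard to predict in advance
        cloud = cloud_monthly_cost_cents(daily_requests, devices,
                                         price_per_request_cents=2)
        embedded = embedded_monthly_cost_cents(devices,
                                               subscription_per_device_cents=100)
        print(f"{daily_requests:>2} requests/day: "
              f"cloud {cloud / 100:,.2f} vs embedded {embedded / 100:,.2f}")
```

In this sketch, doubling usage doubles the cloud bill while the embedded figure stays flat. That predictability is the point, whichever model ends up cheaper for a given project.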


Who is Embedded Speech Technology for?

Embedded voice technologies can be adapted to all uses


It is true that embedded technology is, by its very nature, geared towards certain areas that have constraints to which it responds. However, any sector can take full advantage of it. Indeed, the contributions of embedded technology, particularly in terms of private data and cost control, to name but two advantages, are perceptible at all levels.

Today, even pro-Cloud players such as the GAFAMs are turning to embedded technology for certain functions. Even at the expense of collecting personal data, the main resource of their business model, these technology giants are finding embedded technologies more suitable than the Cloud for some tasks, navigation for example. The boundary between Cloud and embedded is no longer as sharp as it was in the past, and the agility of the technologies brings these two methods together to create high-performance solutions.

Among the users of embedded speech technology, we find companies that decide to develop their own solution with their own internal resources and experience. However, creating a complete voice interface is not (yet) within the reach of all players. This is why engineering firms such as Witekio are building expertise in this type of development, to support complex projects that must be integrated into richer systems.


Case study: Witekio, embedded specialist


Witekio is an expert in embedded software with a system-level approach to engineering and integrating intelligent systems software for any device, from hardware to the cloud. The company provides technical expertise to a wide range of customers to develop solutions that fit the uses of specific target markets while integrating the best available technologies.
With the increasing number of projects incorporating embedded voice technologies, Witekio uses the Voice Development Kit, Vivoka’s specialised voice software development kit, to meet the demand.

While learning the Vivoka development tool, Witekio experimented with different use cases, drawn either from their own experience or from customer requests. In this context, a voice-assisted crane, a vending machine and a coffee machine were created. These devices, which are very different in their composition, all integrate automatic speech recognition and speech synthesis engines. As mentioned above, the resources available for each are very different, from the NXP i.MX 8M microprocessor to the electronic boards natively built into the devices.




Which devices integrate Embedded Speech Technology? (i.MX 8M, STMicroelectronics…)


Embedded voice depends on suitable electronic components. Many players today design and produce versatile microprocessors capable of powering all types of embedded systems. Among these silicon vendors (component manufacturers), we find companies such as NXP (with the i.MX 8M mentioned earlier), STMicroelectronics or Texas Instruments, for example.


These components are increasingly integrated into the various devices produced, making embedded voice integration accessible. Indeed, microprocessors offer good performance, with powerful CPUs and storage capacities large enough to host complex voice engines. Added to this are the optimisation capabilities offered by the technologies themselves. For example, the voice synthesizers offered in the Voice Development Kit come in 4 qualities (compact, high, pro and premium), ranging from 10 MB of storage (6 MB of RAM) to 558 MB of storage (198 MB of RAM). It is therefore easy today to find a compromise between technical constraints and quality of results.
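
As an illustration of that compromise, the following Python sketch picks the richest voice quality that fits a device's free storage and RAM. It only encodes the two footprints quoted above, on the assumption that they belong to the compact and premium qualities respectively; the figures for high and pro are not given here, so they are left out rather than guessed. The `VoiceTier` type and `pick_voice_tier` function are hypothetical helpers, not part of the Voice Development Kit.

```python
from typing import NamedTuple, Optional


class VoiceTier(NamedTuple):
    name: str
    storage_mb: int
    ram_mb: int


# Ordered from lightest to richest. Only the two footprints quoted in the
# article are listed; "high" and "pro" are omitted because their figures
# are not given. Mapping the figures to these two names is an assumption.
TIERS = [
    VoiceTier("compact", storage_mb=10, ram_mb=6),
    VoiceTier("premium", storage_mb=558, ram_mb=198),
]


def pick_voice_tier(free_storage_mb: int, free_ram_mb: int) -> Optional[VoiceTier]:
    """Return the richest tier that fits the device, or None if none fits."""
    best = None
    for tier in TIERS:
        if tier.storage_mb <= free_storage_mb and tier.ram_mb <= free_ram_mb:
            best = tier  # later entries are richer, so keep the last match
    return best


if __name__ == "__main__":
    print(pick_voice_tier(free_storage_mb=64, free_ram_mb=32))
    print(pick_voice_tier(free_storage_mb=1024, free_ram_mb=512))
    print(pick_voice_tier(free_storage_mb=8, free_ram_mb=4))
```

A small microcontroller-class budget lands on the compact voice, a roomy microprocessor board can afford the premium one, and a device below the smallest footprint gets no voice at all.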

What also needs to be taken into account is that embedded voice technologies do not only fit into powerful microprocessors. Any embedded system, whether from an industrial or custom manufacturer, can be used. There will of course be specification constraints, just as the presence of an on-board or remote microphone will be a necessary element for the use of voice commands.


Is it possible to mix Embedded Speech Technology and the cloud?


Embedded technology has many advantages when it comes to voice technologies: privacy, low resource requirements, cost control and more. However, it is not necessarily suited to all use cases or all objectives. It must be borne in mind that an embedded system is designed to operate independently. It does not communicate with any other device.
Embedded voice solutions are powerful tools provided they are specialised in a specific field, with known, defined and anticipated users and uses. This is why the hybrid method is often studied in client projects.

This architecture consists of deploying technology engines that operate on-board, on-device, while maintaining connectivity with suitable devices. This connectivity can be a remote link with a Cloud server or an On-Premise design, i.e. with a local server.

However, the use of the Cloud will compromise the Privacy by Design aspect of the system. This is not to say that privacy in general is undermined: anonymising data with well thought-out protocols keeps privacy a priority. In exchange for this sacrifice, the system gains access to the Cloud and its many advantages:


Accessing external services


Cloud connectivity allows access to the APIs of many external services. These gateways can drastically increase the scope of a voice system by providing it with multimedia databases or third-party applications. For example, being able to consult online encyclopaedias such as Wikipedia or use streaming services (YouTube, Spotify, etc.) are uses reserved for connected solutions.


Enabling data analysis


In contrast to 100% embedded operation, Cloud technologies use and circulate data between different services. In this network, data can be collected, archived and labelled for several reasons. First of all, it can be used for analysis to evaluate the service and help make decisions about improvements and corrections. The data can also be used to train the speech recognition models in a philosophy of continuous improvement of the service.


Remote update and maintenance management


Just as data can be sent to remote servers for processing, it can be sent directly to the connected system. In this way, it is possible for the company owning the product or service to make updates or patches remotely and on a large number of units. Predictive maintenance processes should also be taken into account. Indeed, being able to remotely control the life cycle of a product allows companies to ensure its durability, which in turn contributes to improving the user experience.


Ensuring wide accessibility to the service


Cloud technologies, because of their connectivity, are easily adaptable to the user. In the case of voice assistants, speech recognition engines can easily switch from one language to another depending on the user. In the case of embedded devices, this needs to be anticipated, as the solution is more difficult to adapt afterwards. Accessibility, especially in technology, is a key factor today. The cloud allows for high responsiveness and control over large numbers of units, which makes it attractive to many companies.


What are the alternatives to embedded solutions?


A hybrid construction, between cloud and embedded


Embedded is an incredibly powerful and practical tool in many contexts, provided that specialisation is respected, but it also has shortcomings that the Cloud overcomes. This is where hybrid solutions come in. They make it possible to get the best of both worlds. On the one hand, there are on-device voice engines that operate independently, covering the use cases planned and offered to users for the agreed purpose.

On the other hand, there is access to external services, data communication and remote interaction with cloud servers. In short, combining the two types of technologies is an effective way to compensate for their respective shortcomings.
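
The hybrid idea can be sketched in a few lines of Python: try the on-device engine first, and only call the cloud for requests it cannot serve, when a connection is available. This is a simplified illustration under stated assumptions, not a description of any real product: the command table, the `handle_locally` and `handle_request` functions and the `cloud_handler` callback are all hypothetical, and real recognition engines are replaced by plain text matching.

```python
from typing import Callable, Optional, Tuple

# Hypothetical on-device command set: the specialised, anticipated use cases.
LOCAL_COMMANDS = {
    "start": "Starting.",
    "stop": "Stopping.",
}


def handle_locally(utterance: str) -> Optional[str]:
    """Stand-in for an embedded engine: answers only known commands."""
    return LOCAL_COMMANDS.get(utterance.strip().lower())


def handle_request(utterance: str,
                   cloud_handler: Optional[Callable[[str], str]] = None,
                   online: bool = False) -> Tuple[str, str]:
    """Route a request: embedded first, cloud only as an optional extension."""
    local_answer = handle_locally(utterance)
    if local_answer is not None:
        return ("embedded", local_answer)  # nothing leaves the device
    if online and cloud_handler is not None:
        return ("cloud", cloud_handler(utterance))  # external services
    return ("embedded", "Sorry, this request is not available offline.")


if __name__ == "__main__":
    fake_cloud = lambda text: f"Cloud result for: {text}"
    print(handle_request("Start"))
    print(handle_request("play some music", fake_cloud, online=True))
    print(handle_request("play some music", fake_cloud, online=False))
```

The design choice worth noting is the order of the checks: the core use cases keep working with no connection, and connectivity only widens what the system can do.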


On-premise installation with the use of local servers


Beyond the cloud, there are also local server-based operations. This solution comes close to the strengths of embedded, because user data does not travel across large-scale systems but stays on smaller network infrastructures, at the scale of a business unit. The local server can also be given wider cloud connectivity, allowing for different communication methods between the devices and the local server, while still enjoying the benefits of the internet.

This on-premise operation is very versatile and will suit many companies that require certain specifications in their usage.


Conclusion on Embedded Speech Technology


Embedded voice technologies are a more than viable solution. They are well suited to use cases that require what embedded technology can provide, such as system independence, privacy or control of deployment costs. However, there is no single right answer between cloud and embedded. It is up to you, whether you are a software publisher, an electronics manufacturer or any other professional, to define what best suits your needs. These technologies are not intended to be in conflict with each other, but to complement each other and cover the expectations of companies wishing to integrate voice into their processes.

For successful development and integration, we recommend using specialised tools for which you can get support from their creators. Vivoka offers the Voice Development Kit to develop embedded voice technologies via a powerful SDK with documentation to facilitate its use. For more complex projects where your resources are lacking, integrators like Witekio are the right people to assist you.