Embedded vs. Cloud voice technology? You decide.

Embedded voice technology, or at least the term “embedded”, may be familiar to you. While the tech world touts connectivity for its many benefits, voice technologies that run on-device, without server calls, are also popular for many reasons. In this article, we break down what embedded voice is, what it brings, which sectors and hardware it suits, and what its alternatives and extensions are. “Less is more” could well be the motto of embedded technology!

What is an embedded system?

“Embedded”, in the field of voice technology, keeps the broader meaning of the term. Embedded systems are hardware-based systems (built around microprocessors or microcontrollers, for example) designed for specific tasks. You can consider them as independent systems or as components of a larger device. The word “embedded” refers to where the technology is housed: in this case, directly in the device.

The operating principle of embedded systems

Embedded technology differs from cloud technology in its operating principle. Here, the “host” device carries out all computing itself, using the hardware resources it has available. The complexity of embedded systems, both in terms of constraints and potential, is highly variable: a simple microcontroller and a complex assembly of microprocessors with several peripherals offer very different possibilities. These configurations naturally vary according to the intended purpose of the system.

A principle adapted to embedded voice technology

Voice technologies are a good fit for the embedded world. At a time when the cloud is attracting attention for its capabilities, the embedded world is also growing. Most existing voice technologies are available in embedded form: ASR (automatic speech recognition), TTS (text-to-speech), wake-up word, language processing, voice biometrics, etc. The difference is that these solutions are built on much lighter models. Remember, an embedded system must work with the resources the device is able to provide. However, fewer resources does not mean lower performance. Researchers are focusing as much on optimising machine learning models (to reduce training costs, model size and inference times) as on their accuracy. Indeed, with the growing appeal of embedded technologies and the integration of machine learning into mobile devices, efficiency is becoming as important as the quality of the result.

Why choose Embedded Voice Technology?


Embedded is Private by design

Offline voice technology runs on-device, which gives it a key feature in today's tech world: Privacy by Design. What does this mean? Embedded systems are imagined and built, from the design stage, on the principle of processing data locally. That makes them naturally immune to the issue of private data. It is simple: data does not circulate and everything is processed locally, without calling on remote servers. This capability is very much in demand today, for several reasons. Firstly, when companies offer products and services to individual end-users, the processing of personal data is all the more sensitive. Personal data can identify individuals and falls within the scope of their privacy. Today, these considerations are a serious challenge for Cloud technologies. The GDPR (General Data Protection Regulation) has increased the sensitivity of such data processing. To learn more, you can read our blog post dedicated to this topic. Even with anonymisation or encryption protocols, data is not totally secure as long as it can be transferred over a connection. This is where embedded technology offers great promise to manufacturers. It allows them to avoid these issues entirely, since the data never leaves the device. For certain sectors of activity, this is not simply a matter of user comfort but an essential prerequisite. To qualify this, it is important to remember that data is a key asset for companies of all sizes. Hybrid models exist, mixing embedded processing with optional connectivity for certain functionalities; we discuss them later in the article.

Embedded is not dependent on an Internet connection

Directly linked to the previous point on Privacy by Design, dependence on an internet connection is an additional constraint to take into account. When creating a product or service, you need to ensure that it is accessible to users, given their environment and their access to the Internet. Today, it may seem unthinkable that the Internet is not available to everyone. However, the reality is quite different. For one thing, network coverage is far from complete across the planet: Internet access is rare in mountains or deserts, for example. There, a cloud-based voice assistant in an off-road vehicle would have great difficulty functioning. For another, products and services are often deliberately kept offline for security reasons. The defence sector, for example, favours embedded systems because connected tools are a security risk. In these cases, embedded voice technology removes the need for external cloud services to carry out its computing. This is one of the reasons you may prefer embedded speech technology in specific applications where the conditions of use must meet strict specifications. As mentioned earlier, machine learning models today are sufficiently powerful and optimised to cover hundreds of use cases.

Embedded Voice Technology has a small technology footprint

Because they depend on the resources of the device hosting them, embedded technologies need to be optimised. What looks like a constraint at first glance quickly turns into a key advantage for many manufacturers. Out of habit, we often equate embedded with the absence of connectivity. Originally, however, the term referred to functional systems with a very small technology footprint. These solutions must run on small electronic components while offering the best possible functionality. As a result, the models used to develop these technologies have always taken hardware constraints into account. Before microprocessors became widely available, many players used microcontrollers in their products. These components have reduced hardware capabilities, but that has not hindered the integration of embedded voice technologies. With the current state of R&D, it is possible to run complex speech recognition or speech synthesis engines with minimal use of CPU and RAM. This capability is referred to as “seamless” integration, meaning that the modules have very little impact on the system in which they are implemented.

Embedded voice-first technology helps control deployment costs

Integrating voice functionality into a product or service involves not only the design and development of the solution, but also its budgeting. On the business model side, Cloud technology providers generally operate on a cost-per-request model (each call to the server is billed, in the order of a few cents) or on a subscription basis (rarer, or only for finished solutions). The main problem: how do you estimate the monthly budget when the number of requests is out of your control? Today, ensuring the financial viability of a voice project is just as important as the design of the solution itself. Embedded systems tend to be distributed under business models that are much easier for companies to control. The following models are generally found:

  • Subscription: Periodic cost associated with the commercial operation of the embedded voice technology. This method allows the incorporation of the cost of the technology into the cost of the product.
  • Perpetual: As the embedded technology is intended to remain in the hardware, it can be sold as a finished product, as a “one-shot”. In parallel, a royalty system is established for the production of new units equipped with the voice technology.

This makes it much easier for an organisation to keep its costs visible and under control. Before being a user experience or an ergonomic tool, the integration of embedded voice technology must be a profitable project for the organisation undertaking it.
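To make the budgeting argument concrete, here is a minimal sketch in Python comparing the pricing models described above. All function names and all prices are hypothetical and chosen purely for illustration; they are not actual Vivoka or cloud-provider rates. Amounts are expressed in cents to keep the arithmetic exact.

```python
def cloud_monthly_cost_cents(devices: int,
                             requests_per_device: int,
                             price_per_request_cents: int) -> int:
    """Cost-per-request model: every call to the server is billed."""
    return devices * requests_per_device * price_per_request_cents


def subscription_monthly_cost_cents(devices: int,
                                    fee_per_device_cents: int) -> int:
    """Embedded subscription: a periodic fee that does not depend on usage."""
    return devices * fee_per_device_cents


def perpetual_total_cost_cents(licence_cents: int,
                               royalty_per_unit_cents: int,
                               units: int) -> int:
    """Embedded perpetual licence: one-off purchase plus a royalty per unit produced."""
    return licence_cents + royalty_per_unit_cents * units


# Hypothetical fleet: 1,000 devices, 2 cents per cloud request.
low_usage = cloud_monthly_cost_cents(1000, 100, 2)    # 100 requests per device
high_usage = cloud_monthly_cost_cents(1000, 1000, 2)  # 1,000 requests per device
flat_fee = subscription_monthly_cost_cents(1000, 300)
```

With these assumed figures, the cloud bill is multiplied by ten when usage goes from 100 to 1,000 requests per device, whereas the subscription amount stays the same whatever the usage. That predictability is what the embedded business models offer.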

Who is Embedded Voice Technology for?

Embedded speech technologies are adaptable to all uses

It is true that embedded technology is a particularly good answer in areas with specific constraints. However, any sector can take full advantage of it. The benefits of embedded technology, particularly in terms of private data and cost control, are felt at all levels. Today, even pro-Cloud players such as the GAFAM are turning to embedded technology for certain functions, even though this means collecting less personal data, the main resource of their business model. These technological giants are finding embedded technologies more suitable than the Cloud for navigation tasks, for example. The boundary between Cloud and embedded is no longer as sharp as it was in the past, and the flexibility of the technologies brings the two approaches together to create high-performance solutions. Among the users of embedded speech technology, we find companies that decide to develop their own solution with their own internal resources and experience. However, creating a complete voice interface is not (yet) within the reach of all players. This is why engineering firms such as Witekio are building expertise in this type of development, to support complex projects that integrate voice into richer systems.

Case study: Witekio, embedded specialist

Witekio is an expert in embedded software with a system-level approach to engineering and integrating intelligent systems software for any device, from hardware to the cloud. The company provides technical expertise to a wide range of customers to develop solutions that fit the uses of specific target markets while integrating the best available technologies. With the increasing number of projects incorporating embedded voice technologies, Witekio uses the Voice Development Kit, Vivoka’s specialised voice software development kit, to meet the demand. While learning the Vivoka development tool, Witekio experimented with different use cases, drawn either from their own experience or from customer requests. In this context, they created a voice-assisted crane as well as a vending machine and a coffee machine. These devices, although very different in their composition, all integrate automatic speech recognition and speech synthesis engines. As mentioned above, the resources available for each are very different, from the NXP Imxm8 microprocessor to the electronic boards natively built into the devices.



Which devices integrate Embedded Speech Technology? (Imxm8, STMicroelectronics …)

Embedded voice depends on the presence of suitable electronic components. Many players today design and produce versatile microprocessors capable of powering all types of embedded systems. Among these silicon vendors (component manufacturers), we find companies such as NXP (Imxm8, which we mentioned earlier), STMicroelectronics or Texas Instruments, for example. More and more devices integrate these components, which makes embedded voice integration accessible. Indeed, microprocessors offer good performance, with powerful CPUs and enough storage to host complex voice engines. In addition, the technologies themselves offer optimisation options. For example, the voice synthesizers in the Voice Development Kit come in 4 quality levels (compact, high, pro and premium), ranging from 10 MB of storage (6 MB of RAM) to 558 MB of storage (198 MB of RAM). It is therefore easy today to find a compromise between technical constraints and quality of results. Keep in mind, too, that embedded voice technologies do not only fit on powerful microprocessors. You can use any embedded system, whether from an industrial or custom manufacturer. There will be specification constraints, and an on-board or remote microphone will of course be necessary for voice commands.
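As an illustration of this compromise, the sketch below picks the richest voice that fits a device's free storage and RAM. It is written in Python for readability and is not the Voice Development Kit API: the `VoiceFootprint` type and the `best_fitting_voice` function are hypothetical names. Only the two footprints quoted above (compact and premium) are listed, because the figures for the intermediate qualities are not given here.

```python
from typing import NamedTuple, Optional


class VoiceFootprint(NamedTuple):
    quality: str
    storage_mb: int
    ram_mb: int


# Only the two endpoints quoted in the article; the "high" and "pro"
# footprints are not stated, so they are deliberately left out.
KNOWN_FOOTPRINTS = [
    VoiceFootprint("compact", storage_mb=10, ram_mb=6),
    VoiceFootprint("premium", storage_mb=558, ram_mb=198),
]


def best_fitting_voice(free_storage_mb: int,
                       free_ram_mb: int) -> Optional[VoiceFootprint]:
    """Return the largest known voice that fits the device, or None if none fits."""
    fitting = [
        voice for voice in KNOWN_FOOTPRINTS
        if voice.storage_mb <= free_storage_mb and voice.ram_mb <= free_ram_mb
    ]
    # The largest footprint is used as a stand-in for the highest quality.
    return max(fitting, key=lambda voice: voice.storage_mb, default=None)
```

For instance, a device with 600 MB of free storage but only 100 MB of free RAM would fall back to the compact voice, since the premium voice needs 198 MB of RAM.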

Is it possible to mix Embedded Voice Technology and the Cloud?

Embedded voice technology has many advantages: privacy, low resource requirements, cost control, and more. However, it is not necessarily suited to every use case or every objective. Bear in mind that an embedded system is designed to operate independently: on its own, it does not communicate with any other device. Embedded voice solutions are powerful tools provided they are specialised in a specific field, with known, defined and anticipated users and uses. This is why the hybrid method is often studied in client projects. This architecture consists of deploying engines that run on-device while maintaining connectivity with other systems. This connectivity can be a remote link to a Cloud server or an on-premise design, i.e. with a local server. However, using the Cloud compromises the Privacy by Design aspect of the system. This is not to say that privacy in general is undermined: anonymising data with well-thought-out protocols keeps privacy a priority. In exchange for this trade-off, the system gains access to the Cloud and its many advantages:

Accessing external services

Cloud connectivity allows access to the APIs of many external services. These gateways can drastically increase the scope of a voice system by providing it with multimedia databases or third-party applications. For example, consulting online encyclopaedias such as Wikipedia or using streaming services (YouTube, Spotify, etc.) is only possible with connected solutions.

Enabling data analysis

In contrast to 100% embedded operation, Cloud technologies use and circulate data between different services. In this network, data can be collected, archived and labelled for several reasons. First of all, decision-makers can use it for analysis to evaluate the service and make improvements and corrections. They can also use the data to train the speech recognition models in a philosophy of continuous improvement of the service.

Remote update and maintenance management

Just as data can be sent to remote servers for processing, it can be sent directly to the connected system. In this way, it is possible for the company owning the product or service to make updates or patches remotely and on a large number of units. You should also consider predictive maintenance processes. Indeed, being able to remotely control the life cycle of a product allows companies to ensure its durability, which contributes to improving the user experience.

Ensuring wide accessibility to the service

Because of their connectivity, Cloud technologies adapt easily to the user. In the case of voice assistants, speech recognition engines can easily switch from one language to another depending on the user. With embedded devices, the solution is harder to adapt after deployment, so you will need to anticipate these needs in advance. Accessibility, especially in technology, is a key factor today. The cloud allows for high responsiveness and control over large numbers of units, which makes it attractive to many companies.

What are the alternatives to embedded solutions?


A hybrid construction, between cloud and embedded

Embedded is an incredibly powerful and practical tool in many contexts, provided that it stays specialised. But it also has shortcomings that the Cloud overcomes. This is the context in which hybrid solutions are emerging. They make it possible to get the best of both worlds:

  • On the one hand, on-device voice engines that operate independently, according to the use cases planned and proposed to users for the agreed purpose;
  • On the other hand, access to external services, data communication and remote interaction possibilities with cloud servers.

Basically, the marriage of the two types of technologies is an effective solution to compensate for their respective shortcomings.
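The hybrid principle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a reference design: the `HybridVoiceRouter` class is hypothetical, the transcript is assumed to come from an on-device speech recognition engine, and the cloud client is a placeholder function standing in for any external service.

```python
from typing import Callable, Dict, Optional


class HybridVoiceRouter:
    """Serve planned commands on-device; use the cloud only for everything else."""

    OFFLINE_FALLBACK = "Sorry, this request is not available offline."

    def __init__(self,
                 local_commands: Dict[str, Callable[[], str]],
                 cloud_client: Optional[Callable[[str], str]] = None) -> None:
        self.local_commands = local_commands  # use cases planned in advance
        self.cloud_client = cloud_client      # hypothetical external service

    def handle(self, transcript: str, online: bool) -> str:
        command = transcript.strip().lower()
        handler = self.local_commands.get(command)
        if handler is not None:
            # Planned use case: processed locally, nothing leaves the device.
            return handler()
        if online and self.cloud_client is not None:
            # Unplanned request: delegated to the cloud when it is reachable.
            return self.cloud_client(command)
        return self.OFFLINE_FALLBACK
```

In this sketch, a coffee machine would keep answering "start brewing" with no network at all, while a request outside its planned commands is only served when a connection and a cloud service are available.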

On-premise installation with the use of local servers

Beyond the cloud, there are also local server-based setups. This solution comes close to the strengths of embedded. Indeed, user data does not travel across large-scale systems but over smaller network infrastructures, at the scale of a business unit. You can also equip the local server with wider cloud connectivity. This allows for different communication methods between the devices and the local server, while still enjoying the benefits of the internet. This on-premise setup is very versatile and will suit many companies with specific requirements.


Conclusion on Embedded Voice Technology

Embedded voice technologies are a more than viable solution. They are well suited to use cases that require what embedded technology provides, such as system independence, privacy or control of deployment costs. However, there is no single right answer in the choice between cloud and embedded. It is up to you, whether you are a software publisher, an electronics manufacturer or any other professional, to define what best suits your needs. These technologies are not meant to compete with each other, but to complement each other and cover the expectations of companies wishing to integrate voice into their processes. For successful development and integration, we recommend using specialised tools backed by support from their creators. Vivoka offers the Voice Development Kit to develop embedded voice technologies via a powerful SDK, with documentation to facilitate its use. For more complex projects where you lack the resources, integrators like Witekio are the right people to assist you.
