Natural Language Understanding: What You Need to Know Before Diving In

Natural Language Understanding (NLU) is a subfield of Natural Language Processing (NLP). While NLP aims to make human-machine communication as “natural” as possible, NLU focuses on making machines understand human language. If you have already used ChatGPT, you may agree that, had you not known it was a computer, you would feel like you were talking with a human being who understands your questions well, even complex ones. That is the goal of NLU, and it is known to be a complex task. Nowadays, the need for NLU keeps growing with the spread of smart systems such as voice assistants and the Internet of Things (IoT). In this post, we will explain what NLU is, how it works, and what its challenges are.

 

Natural Language Understanding Intuition

 

An NLU engine is software that extracts the meaning of human language from text or speech. People usually start appreciating an NLU engine only when it reaches a level of understanding similar to that of a human. What makes building NLU complex is, in essence, the difficulty of making software that imitates human behavior and thinks like a human.

 

Human Language

 

Human language is said to be around 150,000 years old. On the one hand, over all these millennia, human understanding (or perception) skills have never stopped evolving. On the other hand, human communication skills have kept evolving at the same time, shaped by cultural, geographic, political, environmental, and many other factors. Understanding and communication skills must evolve in sync: this is what keeps people “up to date” or “modern” in the way they speak and understand.

Several questions are essential to understanding how Natural Language Understanding works:

  • “How can a computer acquire the speaking and language understanding skills of a human?” 
  • “How can we adapt human language into a format that a computer can understand?”
  • “How can NLU handle the evolution of human communication?”

Human–Machine Analogy

 

There are three main theories on how humans build their understanding skills and acquire their knowledge:

  • The learning theory considers that language is acquired through reinforcement, much like conditioning.
  • The nativist theory assumes that language is something we are born ready to learn. It posits a “language acquisition device” in our brain and holds that all languages share universal basic elements such as grammar, nouns, and verbs.
  • More recently, researchers have suggested that, instead of a language-specific processing mechanism, language development is influenced by different factors of genetics AND the environment. These factors need to interact in order for a child to learn how to speak properly.

 

All of these theories agree that “understanding” is something humans learn through training and must practice through communication: in other words, they need examples. In this context, the quality of the learning process and the quality of the examples become crucial. This is what inspired NLU: nowadays, technological progress allows industry (and research) to reproduce the same process on computers, especially thanks to Machine Learning (ML), since ML is built on the very concepts of “neural networks”, “learning”, and “examples”. According to the state of the art, the best-performing NLU engines are based on ML.
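
To make the “learning from examples” idea concrete, here is a minimal sketch of an intent classifier trained on a handful of labeled sentences. It assumes scikit-learn is available; the sentences and intent labels are purely illustrative, not any real engine’s training set.

```python
# Minimal intent classifier learned from example sentences (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative training examples: each sentence is labeled with its intent.
sentences = [
    "Turn on the lights", "Please turn on the lights", "Put on the lamps",
    "Turn off the lights", "Switch the lamps off", "Lights off please",
]
intents = ["Turn_on", "Turn_on", "Turn_on", "Turn_off", "Turn_off", "Turn_off"]

# Bag-of-words features plus a linear classifier: learning purely from examples.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(sentences, intents)

print(model.predict(["Could you turn the lamps on?"]))  # expected: ['Turn_on']
```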

 

The Rise of Natural Language Understanding

 

Research on natural language understanding started in the 1960s, a decade that saw the release of Weizenbaum’s ELIZA, a chatbot that attempted to incorporate an NLU component. ELIZA had very limited use cases, and the project was eventually abandoned due to the lack of data at the time, the computational complexity involved, and the need for powerful hardware.

Over the last decade, with the rise of GPUs, data centers, and the ubiquity of digital data, ML has become more and more popular. This has allowed NLU to approach human-like performance, which is why conversational agents like Siri and Alexa, and more recently ChatGPT, have become widespread.

 

Fundamentals of Natural Language Understanding

 

NLU can be summarized as two tasks: Intent Identification and Entity Recognition.
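
Concretely, for a given input sentence, an NLU engine returns the result of both tasks at once. Below is a hypothetical example of such an output; the field names and the confidence value are illustrative, not any specific engine’s format.

```python
# Hypothetical NLU output for one sentence: an intent identification result
# plus an entity recognition result (field names are illustrative).
nlu_result = {
    "text": "Turn on the lights",
    "intent": {"name": "Turn_on", "confidence": 0.97},  # intent identification
    "entities": [
        # entity recognition, with character offsets into the text
        {"value": "lights", "type": "device", "start": 12, "end": 18},
    ],
}
```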

 

Intent

 

An intent is the general meaning that can be inferred from a given sentence. It can be expressed explicitly, as in “Turn on the lights”, where the requested intent is “Turn_on”, or implicitly, as in “I can’t see anything”, which expresses the same intent differently. Let’s take “Turn on the lights” as the reference sentence of this intent, together with six variations, as shown in Table 1 in a smart-home context.

| Intent  | Reference          | Variation                    | Entities       |
|---------|--------------------|------------------------------|----------------|
| Turn_on | Turn on the lights | 1. Turn the lights on        | lights: device |
|         |                    | 2. Please turn on the lights | lights: device |
|         |                    | 3. Put on the lamps          | lamps: device  |
|         |                    | 4. I can’t see anything      |                |
|         |                    | 5. It’s very dark here       |                |
|         |                    | 6. Let there be light        |                |

Table 1. Illustration of one intent, its seven examples (one reference plus six variations), and their entities.
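
In code, the content of Table 1 typically becomes part of an NLU engine’s training data. Here is a minimal sketch of such an intent definition as a plain Python dictionary; the structure and field names are illustrative, not a specific engine’s format.

```python
# Illustrative intent definition mirroring Table 1 (field names are hypothetical).
turn_on_intent = {
    "intent": "Turn_on",
    "reference": "Turn on the lights",
    "variations": [
        {"text": "Turn the lights on",        "entities": {"lights": "device"}},
        {"text": "Please turn on the lights", "entities": {"lights": "device"}},
        {"text": "Put on the lamps",          "entities": {"lamps": "device"}},
        {"text": "I can't see anything",      "entities": {}},
        {"text": "It's very dark here",       "entities": {}},
        {"text": "Let there be light",        "entities": {}},
    ],
}
```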

 

Variation and Similarity

 

Table 1 shows that the reference example and Variation (1) share the same words and (mostly) the same syntactic structure. This means they are “close”: they deal with the same topic, and their intents are “similar”. The same holds for Variation (2), since the optional word “Please” does not change the meaning; the intent is the same with or without it. Usually, similar sentences have the same intent, and the more words two sentences share, the closer they are, as the sketch below illustrates.
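
A crude way to capture the “more shared words means closer sentences” intuition is word-overlap similarity, for example the Jaccard index over the sets of words in each sentence. This is a minimal, illustrative sketch, not what production NLU engines actually use:

```python
# Word-overlap (Jaccard) similarity: shared words divided by all distinct words.
def jaccard(a: str, b: str) -> float:
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

reference = "turn on the lights"
print(jaccard(reference, "turn the lights on"))         # 1.0: same words, other order
print(jaccard(reference, "please turn on the lights"))  # 0.8: one optional extra word
print(jaccard(reference, "i can't see anything"))       # 0.0: no shared words
```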

 

When speaking naturally, people can express the same intent with different words, depending on the vocabulary they use, which varies with their culture, age, language skills, and so on. In Table 1, for example, Variation (3) has the same intent as the reference even though they do not share the same words. This is explained by the similarity of the words used: they are synonyms. Here the NLU engine, like a human, should be able to guess that “put on” is a synonym of “turn on” and “the lamps” is a synonym of “the lights”. Synonyms provide a variety of expressions of the same intent and allow people to communicate naturally. The more synonyms two sentences share, the closer they are.
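
A naive way to account for synonyms in the word-overlap sketch above is to map words to a canonical form through a synonym dictionary before comparing. The tiny dictionary below is purely illustrative:

```python
# Hypothetical synonym dictionary mapping words to a canonical form.
SYNONYMS = {"put": "turn", "lamps": "lights", "lamp": "lights"}

def normalize(sentence: str) -> set:
    """Lowercase and split the sentence, mapping each word to its canonical form."""
    return {SYNONYMS.get(word, word) for word in sentence.lower().split()}

def synonym_jaccard(a: str, b: str) -> float:
    sa, sb = normalize(a), normalize(b)
    return len(sa & sb) / len(sa | sb)

# "Put on the lamps" becomes {"turn", "on", "the", "lights"}: a perfect match.
print(synonym_jaccard("turn on the lights", "put on the lamps"))  # 1.0
```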

 

Another thing Variations (1), (2), and (3) have in common is that they express the intent explicitly: the verbs “turn on” and “put on” convey the desired intent, and the words “lights” and “lamps” target the desired device. One could imagine representing this explicitness (or clarity, or directness) as a score: the higher the score, the more explicit the sentence.

However, the NLU literature has traditionally been more interested in cases with very low explicitness: the “ambiguous” cases. Take the word “passage” in “I will look at the passage”: does “the passage” mean “a section in a book” or “a channel”? People do sometimes express intents in this ambiguous way. In Table 1, for example, Variations (4), (5), and (6) share neither words nor synonyms with the reference. Here, context becomes essential for understanding not only the meaning of a word, but also the meaning of the whole sentence and its intent. Variation (6) would be interpreted quite differently in another context.
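
When two sentences share neither words nor synonyms, modern NLU engines typically rely on sentence embeddings, which encode contextual meaning as vectors that can be compared numerically. A minimal sketch, assuming the sentence-transformers library and one of its public models:

```python
# Semantic similarity via sentence embeddings (assumes sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small public embedding model

reference = "Turn on the lights"
variations = ["I can't see anything", "It's very dark here", "Let there be light"]

# Encode sentences into vectors, then compare them with cosine similarity.
ref_vec = model.encode(reference, convert_to_tensor=True)
var_vecs = model.encode(variations, convert_to_tensor=True)
print(util.cos_sim(ref_vec, var_vecs))  # nonzero scores despite zero word overlap
```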

 

Entity

 

Another crucial concept in NLU is entities: the words targeted by the intent in the sentence at hand. For example, “lights” in Variation (1), “light” in Variation (6), and “lamps” in Variation (3) are all targeted as “devices” by the intent. Synonyms are considered the same entity, and the name given to such an entity, like the intent name, is a user’s personal choice. In addition, there are standard entities, illustrated in the sketch after this list:

  • Named Entities: categories such as person names, companies, locations, brands, etc.;
  • Numeric Entities: numbers, currencies, percentages;
  • Dates: date expressions such as “Friday” or “June 3rd”.
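
Standard entities like these are exactly what off-the-shelf named entity recognizers extract. Here is a minimal sketch assuming spaCy and its small English model are installed; the exact labels depend on the model.

```python
# Named entity recognition with spaCy (assumes: pip install spacy
# && python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Send 20 dollars to Alice in Paris on Friday")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Typically: '20 dollars' MONEY, 'Alice' PERSON, 'Paris' GPE, 'Friday' DATE
```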

 

Challenges and Biases

 

Even if NLU as a whole is no longer out of reach, since researchers have unlocked many of its “secrets”, there are still challenges and biases to handle. On the one hand, language is complex: people understand subtlety and metaphors, for example. On the other hand, people may use uncommon or foreign words that introduce biases for NLU, especially regarding entities.

 

Final Thoughts on Natural Language Understanding

 

You get it: Natural Language Understanding is an impressive technology that is revolutionizing conversational AI. But it cannot be used by itself for voice, because NLU only processes text data. Therefore, to match cloud-level understanding abilities while staying on the device, you will need to pair an automatic speech recognition (ASR) system with the NLU. Your system will then be able to handle voice commands however your users choose to phrase them.

 

Finally, the best-performing NLU engines today are greedy in terms of data preparation and memory consumption. This restricts most of them to the Cloud and excludes them from embedded system architectures… until Vivoka challenged this.

 

This article was written in collaboration with Firas Hmida, PhD, from Vivoka’s R&D department, an expert in Machine Learning and Natural Language Processing.
