Since its launch in November 2022, ChatGPT has become a hot topic and has taken up more and more space in the media sphere. More and more domains are integrating Large Language Models (LLMs) into their services. However, the question of whether AI's expanding presence in our lives is beneficial or detrimental remains a subject of debate. Today, the trends show that most people are excited to reap the benefits of LLMs and conversational AI. But what exactly are large language models? How are they created, and why are they so powerful? You'll find the answers to these questions in this blog post!
Before diving into the world of LLMs, it is important to understand the intuition behind them and their limitations. This article aims to equip you with the knowledge you need to understand the technology and shed some light on the (in-)consistencies of ChatGPT.
This article was written in collaboration with our R&D team members: Firas Hmida, PhD in Machine Learning & NLP, and Nora Lindvall, NLP master's student.
Language Model Intuition
Before answering the question “does ChatGPT really rely on NLU?”, let’s see how LMs work and the intuition behind their technology.
If you ask people whether "It is raining cats and mice" sounds natural, most would answer "no, it should be 'it is raining cats and dogs'". They would be partially right. From an NLP point of view, this word sequence is simply not frequent/probable. More precisely, humans speak using words that "often" occur together in a well-defined order. Even if we only provide the first part of the sentence, "it is raining", a native English speaker tends to produce "cats and dogs" because it is a frequent word combination in English. This is how Language Models (LMs) work.
Current LMs are neural networks trained on texts that humans produced (and which are therefore considered "ground truth"). The mechanism used during training aims to teach the language model to guess the next word for a given sequence of words. For example, the model takes as input "Once upon a [blank]" and should fill the [blank] with "time", not with "region". We tend to talk about language models in terms of probability: a "natural" (humanly acceptable) utterance is a sequence of words with high probability.
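To make this concrete, here is a deliberately tiny illustration, not part of any real LLM, just a toy bigram counter over a made-up corpus, of how "frequent word combinations" translate into next-word probabilities:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the human-written texts an LM learns from.
corpus = [
    "it is raining cats and dogs",
    "she has two cats and a dog",
    "cats and dogs are common pets",
    "mice are afraid of cats",
]

# Count how often each word follows another (a simple bigram model).
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1

def next_word_probability(prev, candidate):
    """Estimate P(candidate | prev) from raw counts."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return counts[candidate] / total if total else 0.0

# "and" is followed by "dogs" far more often than by "mice" in this corpus,
# so the model finds "cats and dogs" the more natural continuation.
print(next_word_probability("and", "dogs"))  # 0.666...
print(next_word_probability("and", "mice"))  # 0.0
```

Real LLMs replace these raw counts with a neural network and condition on much longer contexts, but the underlying idea of scoring possible continuations is the same.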
To simplify, at each step the language model assigns a probability to every possible next word, and these candidates are usually ranked by how likely they are to occur in the given utterance. This means the LM must handle all the words of a given language: the more words a language includes, the larger the LM.
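As a sketch of what that ranking looks like in practice (assuming the open-source Hugging Face transformers library and the small, publicly available GPT-2 checkpoint rather than ChatGPT itself, whose weights are not public), one can ask a neural LM for its scores over the whole vocabulary and sort them:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Once upon a", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# Turn the scores for the last position into probabilities and rank them.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode(int(token_id))), round(float(prob), 3))
# " time" should come out near the top; "region" sits far down the ranking.
```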
Large Language Models
We call them "Large" Language Models (LLMs) because these models have high memory and size requirements. They reach several gigabytes, due to the inclusion of billions of parameters. Parameters can be thought of as adjustable settings that allow the model to learn; with more parameters, the model can grasp more complex concepts. LLMs like GPT-3, GPT-4, and ChatGPT, which are used in production, rely on numerous supercomputers running in data centers.
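To give an order of magnitude, here is how one can count the parameters of an openly available model (GPT-2 small, via the Hugging Face transformers library; OpenAI does not publish ChatGPT's weights, so this is an illustration rather than its actual size):

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Sum the number of trainable values across all of the model's weight matrices.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 124 million for the smallest GPT-2;
                                   # GPT-3 is reported to have about 175 billion
```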
Over time, these models are trained on ever more massive datasets, leading to continuous growth in size and a significant increase in power. Thanks to the large amount of data used during training, LLMs can perform a wide range of tasks with limited or no human guidance: essay writing, answering questions about science and technology, summarizing documents, and even coding. However, their fundamental purpose is to predict the next word in a sentence, similar to the autocomplete feature when composing an email.
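A minimal way to see this autocomplete behaviour for yourself (again using the small open GPT-2 model as a stand-in, since ChatGPT itself is only accessible through its API) is a few lines of Python:

```python
from transformers import pipeline

# Load a small open model and let it "autocomplete" a prompt, one word at a time.
generator = pipeline("text-generation", model="gpt2")

result = generator("The easiest way to summarize a document is", max_new_tokens=20)
print(result[0]["generated_text"])
```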
Why are LLMs so Powerful?
Even if LLMs like ChatGPT and others merely predict which word comes next in a given sentence, it is crucial to understand that, from a human point of view, this constitutes a highly specialized form of "reasoning" or "thinking", albeit only one way of thinking.
In fact, the initial concept of language models was introduced by Claude Shannon in the 1950s. What is new today is the scale of computing power made available by data center servers, and its combination with machine learning algorithms.
So why are they so strong?
There are two essential components that contribute to the success of these models:
- The first is their ability to blend word contexts in a manner that greatly enhances their proficiency in predicting the next word;
- The second lies in the training methodology. Large Language Models undergo training using massive quantities of data gathered from various online sources. These sources encompass books, blogs, news websites, Wikipedia articles, forum discussions, and conversations from social media.
Throughout training, we provide a batch of text sourced from one of these platforms and ask the model to predict the next word. If the model's prediction is incorrect, we make slight adjustments to the model until it produces the correct answer. When considering the objective of training an LLM, it aims to generate text that could feasibly have been found on the internet. Since it cannot memorize the entirety of the internet, it relies on encoded representations to make compromises. This may occasionally result in slight inaccuracies, albeit hopefully not significant ones.
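The following is a heavily simplified, hypothetical sketch of that loop in PyTorch: the model here is a toy stand-in rather than a real transformer, but the steps (predict the next word, measure how wrong the prediction was, nudge the parameters) mirror what the paragraph above describes.

```python
import torch
import torch.nn.functional as F

vocab_size = 50_000
# Toy stand-in for a real transformer: embeds each word, then scores the vocabulary.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 128),
    torch.nn.Linear(128, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(token_ids):
    """One step: predict each next word, compare with the real text, adjust slightly."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]  # shift by one word
    logits = model(inputs)                                 # scores over the vocabulary
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # measure how wrong the predictions were
    optimizer.step()  # make the "slight adjustments" mentioned above
    return loss.item()

# A batch of token ids standing in for text scraped from books, blogs, Wikipedia, etc.
batch = torch.randint(0, vocab_size, (2, 16))
print(training_step(batch))
```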
Will ChatGPT take over?
Don't let the human-like interaction fool you – ChatGPT may seem to have a life of its own, but it is just an illusion. Behind the scenes, it simply generates output based on the human-written texts it was trained on, predicting the next word from extensive context. It is by no means conscious, nor does it have a will of its own. Contrary to what we see in movies, there is no need to worry about ChatGPT suddenly turning against humanity and seeking world domination. Dry as it may sound, it is just a model spitting out predictions. One last thing: if you want to know the opinion of Yann LeCun, one of the pioneering researchers behind deep learning, we suggest you check this interview out.