Attention, Please! How ChatGPT uses the power of attention
Exploring AI Attention and its similarities and differences with Human Attention
Attention is a critical ability for living creatures. It is through attention that we learn from and respond to the environment around us. Without the ability to pay attention, we would be, for all intents and purposes, dysfunctional.
But what if I told you that the ability to pay attention is what makes AI models such as GPT-4 (the model that powers ChatGPT) and LaMDA (the model that powers Google Bard) so incredibly powerful and efficient at understanding your queries and generating answers? Yes, you read that right. Large Language Models (LLMs) have the ability to pay attention!
In fact, the ability of LLMs to pay attention to the syntax and semantics of textual data is, in some respects, far superior to that of human beings. That's why they are so insanely good at understanding us and at generating amazing (often mind-boggling) responses to our queries.
In this article, we'll explore the fascinating world of AI attention and how it compares to human attention. We'll cover topics such as how machine learning models pay attention, the differences, similarities, benefits and limitations of AI and human attention, and what the future of AI attention might look like. Whether you're a seasoned expert in AI or just getting started, this article will provide valuable insights into one of the most important cognitive functions of the modern age. Let's dive in.
Understanding Human attention
Human attention is a complex cognitive process that allows us to focus our mental resources on specific information or stimuli while filtering out distractions. Attention plays a critical role in virtually every aspect of human cognition, including perception, memory, learning, decision-making, and problem-solving.
There are different types of attention that humans can use, depending on the task at hand. One of the most common types is selective attention, which allows us to focus on a specific stimulus while ignoring irrelevant ones. For example, if you're trying to read a book in a noisy coffee shop, selective attention allows you to tune out the background noise and concentrate on the text.
Another type of attention is sustained attention, which allows us to maintain focus on a task over a prolonged period of time. This is important for activities that require prolonged mental effort, such as studying for an exam or writing a lengthy report.
Divided attention is a third type of attention, which involves the ability to focus on multiple stimuli at the same time. Closely related to task switching, this type of attention is important for multitasking, such as listening to a lecture while taking notes.
Human attention is not a single process, but rather a combination of different neural systems that work together to allow us to focus our mental resources on specific information. Some of the key brain regions involved in attention include the prefrontal cortex, parietal cortex, and thalamus.
The prefrontal cortex is involved in the executive control of attention, which includes processes such as goal setting, planning, and monitoring. The parietal cortex is involved in spatial attention, which allows us to focus on stimuli in a specific location in space. The thalamus acts as a relay station for sensory information, sending it to different parts of the brain for further processing.
Attention is a limited resource, meaning that we can only focus our mental resources on a limited amount of information at any given time. This is known as the attentional bottleneck. For example, if you're trying to listen to two people talking to you at the same time, you'll likely find it difficult to understand either of them because your attentional resources are divided between the two stimuli.
Understanding AI attention
Explaining attention, or self-attention as it is called in the context of Large Language Models (LLMs), can be quite complex. I would like to simplify it for you using as little technical jargon as possible, and the best way to do that is with a real example.
So I asked ChatGPT (the chatbot powered by the GPT-3.5 and GPT-4 models) the following question: "What is the capital of Japan? Is it beautiful?"
There are two stages in which ChatGPT responds to this query. The first is the natural language understanding phase, where the bot tries to understand what is being asked. The second is the natural language generation phase, in which the bot produces an optimal response to the query.
The new-age LLMs use a deep learning architecture known as the transformer to achieve this. The original transformer has two major parts: an encoder and a decoder. The encoder analyses each piece of the input text and builds a rich representation of its meaning, and the decoder uses that representation to generate the response, one word at a time. (Models in the GPT family actually use only the decoder half of this architecture, but the attention mechanism at their core is the same.)
The first phase involves a complex set of tasks, which entails tokenising the user input and applying a series of transformations to it.
To understand this in the context of our example: ChatGPT takes my query, "What is the capital of Japan? Is it beautiful?", and breaks it down into individual tokens, i.e., 'What', 'is', 'the', 'capital'… and so on.
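Here is a toy sketch of that tokenisation step. Note that this is a deliberate simplification: real models like GPT-4 use learned subword tokenisers (such as byte-pair encoding), so a single word can be split into several sub-word tokens. Splitting on whitespace and punctuation is just an illustration.

```python
# Toy tokenisation: split on whitespace and peel punctuation off
# into its own token. Real tokenisers work on learned subword units.
query = "What is the capital of Japan? Is it beautiful?"

tokens = []
for word in query.split():
    if word[-1] in "?.,!":
        tokens.append(word[:-1])  # the word itself
        tokens.append(word[-1])   # the trailing punctuation mark
    else:
        tokens.append(word)

print(tokens)
# ['What', 'is', 'the', 'capital', 'of', 'Japan', '?', 'Is', 'it', 'beautiful', '?']
```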
It then converts each of these tokens into what is known as a word embedding. A word embedding is a vector representation of a word: a list of numbers, often hundreds or even thousands of them, that together capture the word's meaning in the context of the sentence.
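A minimal sketch of an embedding lookup is shown below. The vocabulary, the 4-dimensional vectors, and the random values are all made up for illustration; a real model uses a learned embedding table with hundreds or thousands of dimensions per token.

```python
import random

random.seed(0)  # make the toy vectors reproducible

# A tiny, made-up vocabulary mapped to random 4-dimensional vectors.
# In a real LLM these vectors are learned during training.
EMBED_DIM = 4
vocab = ["What", "is", "the", "capital", "of", "Japan", "beautiful", "?"]
embedding_table = {
    token: [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    for token in vocab
}

def embed(tokens):
    """Look up the vector for each token in the sequence."""
    return [embedding_table[t] for t in tokens]

vectors = embed(["capital", "of", "Japan"])
print(len(vectors), len(vectors[0]))  # 3 tokens, each a 4-dimensional vector
```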
Now comes the fun part, the self-attention that we have been talking about all through this article.
For each word in the sequence, the self-attention mechanism calculates a score for every other word in the sequence, including itself. These scores represent how relevant each word is to the current word, given the context of the sentence. This is known as the self-attention calculation. For example, in our input, the word 'capital' is more relevant to the word 'Japan' than to the word 'beautiful'.
Using the scores calculated by the self-attention mechanism, the encoder calculates a weighted sum of the embeddings for each word in the sequence. The weights are proportional to the relevance scores, so words that are more relevant to the current word will have a higher weight. This produces a set of context-aware representations of each word in the sequence.
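The two steps above can be sketched in a few lines of Python. This is a stripped-down version of the idea: real transformers first project each embedding into separate query, key, and value vectors, scale the scores, and run many such "attention heads" in parallel. Here we compute raw dot-product scores, turn them into weights with a softmax, and take the weighted sum directly over the embeddings.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Simplified self-attention over a list of token vectors."""
    outputs = []
    for q in embeddings:
        # Relevance of every token to the current token (dot product).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in embeddings]
        weights = softmax(scores)
        # Weighted sum of all embeddings -> context-aware representation.
        context = [
            sum(w * v[d] for w, v in zip(weights, embeddings))
            for d in range(len(q))
        ]
        outputs.append(context)
    return outputs

# Three toy 2-dimensional embeddings stand in for three tokens.
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(embs)
print(len(contextual), len(contextual[0]))  # same shape as the input: 3 x 2
```

Each output vector now mixes in information from the whole sequence, weighted by relevance, which is exactly the "context-aware representation" described above.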
Finally, we come to the natural language generation stage, where the model produces a response to our query. The decoder takes the context-aware representations generated by the encoder and produces the output sequence, one word at a time. As it generates each word, the decoder uses the attention mechanism to focus on different parts of the input sequence.
Here’s how you can visualise this process:
→ The capital of Japan + [word with the most optimal representation] = is
→ The capital of Japan is + [word with the most optimal representation] = Tokyo.
→ The capital of Japan is Tokyo. + [word with the most optimal representation] = And yes,
→ The capital of Japan is Tokyo. And yes, + [word with the most optimal representation] = many people
→ The capital of Japan is Tokyo. And yes, many people + [word with the most optimal representation] = ………..
…………………………………………………………
→ [FINAL OUTPUT]: The capital of Japan is Tokyo. And yes, many people consider Tokyo to be a beautiful city with a lot to offer in terms of culture, cuisine, and entertainment.
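The word-by-word loop visualised above can be sketched as follows. The `pick_next_word` function here is a stand-in: a real model scores every token in its vocabulary using the attention-based decoder and samples from that distribution, whereas this toy version simply replays a pre-scripted continuation for illustration.

```python
# Toy sketch of the decoder's autoregressive loop: the sequence so far
# is fed back in, and one new word is appended at each step.
CONTINUATION = ["is", "Tokyo.", "And", "yes,", "many", "people",
                "consider", "Tokyo", "to", "be", "a", "beautiful", "city."]

def pick_next_word(sequence_so_far, step):
    """Stand-in for the model: return a pre-scripted next word.
    A real decoder would score the whole vocabulary here."""
    return CONTINUATION[step]

sequence = ["The", "capital", "of", "Japan"]
for step in range(len(CONTINUATION)):
    sequence.append(pick_next_word(sequence, step))

print(" ".join(sequence))
# The capital of Japan is Tokyo. And yes, many people consider Tokyo to be a beautiful city.
```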
Similarities between Human and AI attention
When you analyse how LLMs work, you discover some striking similarities between certain aspects of AI attention and human attention. That makes sense. After all, AI is designed by humans, so it is only logical that some of its attention-related characteristics resemble our own. Listed below are the top four similarities between human attention and AI attention.
Selecting relevant information: The query we used in the example above was rather simplistic. But as you may already be aware, ChatGPT and Google Bard can handle much more complex queries, such as writing lengthy essays or computer code. As humans, we respond to complex queries by selectively focusing on certain aspects of the query and deriving the relevant information. That is exactly how AI does it: it focuses on selective aspects of the query and retrieves information that serves those aspects.
Contextual knowledge: A lot of our discussions as humans are based on contextual knowledge. For instance, if you are speaking with a friend who lives in the same city as you, you have a preset context for the discussion. For one, you are friends, so you already know a lot about each other. Additionally, you live in the same city, so you share that knowledge as well. LLMs do the same thing, albeit in a more limited manner. The answers that ChatGPT or Google Bard provide are based on the context of the discussion you have been having in the thread.
Using feedback loops: Both humans and AI use feedback loops to adjust their focus of attention. If I asked you, 'What is the best pasta?', you might answer 'bolognese'. But if I then said, 'No, I was asking where the best place to get pasta in your city is', you would immediately shift your focus based on my feedback and recommend a restaurant you know. Large language models adjust their attention in a similar way, based on the feedback the user provides. This feedback can come as follow-up text within the conversation, as well as more explicit signals such as ratings (you may have noticed the thumbs-up and thumbs-down buttons next to every response in ChatGPT and Bard), which are used to improve the models over time.
Improvement with practice and exposure: The more things we pay attention to, the more things we learn. As humans, that’s how we grow. We practice different skills and gain exposure to different forms of knowledge. AI is no different in this regard. The more conversations you have, the better the LLMs get. The more topics and subjects they get exposure to, the better their understanding gets of the user’s queries.
Differences between Human and AI attention
There are similarities in how humans and AI pay attention. But if you have read the sections above, you will by now realise that the mechanism and architecture of AI attention are fundamentally different from those of human attention. This leads to many disparities between the abilities of humans and AI. The following are the top five, in my opinion.
Processing speed: One of the main differences between AI attention and human attention is processing speed. AI can process large amounts of information at a much faster rate than humans. LLMs can analyse enormous volumes of text in seconds, while humans are limited to a much slower reading rate.
Consistency and Focus: Another difference is that AI attention is more consistent and focused than human attention. Humans have limited attention spans and can get distracted easily, which can impact their ability to concentrate and focus on a task. On the other hand, an AI can maintain a consistent level of attention and focus on the task at hand for an extended period of time.
Ability to handle complex information: LLMs can handle complex information and make connections between different pieces of information with ease. They can also identify patterns and make predictions based on these patterns. Humans, on the other hand, may struggle to make sense of complex information and may need more time and effort to connect different pieces of information.
Limited understanding of context: While LLMs can understand the context of words and sentences to some extent, their understanding is still limited compared to humans. Humans can understand the meaning of words and sentences based on their knowledge of the world, culture, and language. They can also pick up on subtle cues like sarcasm or irony, which LLMs may struggle with.
Creativity: Humans have a greater capacity for creativity and imagination compared to AI. While LLMs can generate text that appears to be creative, they are still limited by the data they have been trained on and can only generate responses based on this data. Humans, on the other hand, can generate original and novel ideas based on their experiences and imagination.
The future of AI attention
The future of self-attention in AI language models looks promising. It has proven particularly effective for tasks like language modelling, where the model needs to capture complex relationships between different parts of the input.
With the advent of GPT-4, it has become quite evident that the future lies in multi-modal AI models, which combine text with other forms of data such as images, video and audio. The self-attention mechanism will continue to be the backbone of such models, providing coherence and comprehension across different mediums of data.
We will soon see major improvements in the evolution of AI attention, with new capabilities being added every day. And although 2023 already feels like the start of the 'decade of AI', we are only getting started. If you are an AI researcher, I highly recommend focussing on the development of AI attention. If done in an ethical and regulated manner, AI attention could prove to be one of the most useful inventions for mankind yet.