Can GPT-4 Pass the Turing Test?
Discussing the Limitations of the Turing Test and How It Could Be Enhanced to Test Contemporary AI Tools
I have a fun quiz to start this article. I assume you all know the famous science communicator and media personality Neil deGrasse Tyson, right?
If you don’t, you can always Google him, or not; either way, it doesn’t matter for this quiz. All you need to know is that he is an astrophysicist and a famous science communicator, in the vein of Bill Nye and others.
Quiz
So here’s the quiz:
Below are two excerpts. One is from an essay by Neil deGrasse Tyson titled “On Being Round”. The other was written by GPT-4 in the style of Tyson. Can you identify which one is AI-generated?
Excerpt 1
Apart from crystals and broken rocks, not much else in the cosmos naturally comes with sharp angles. While many objects have peculiar shapes, the list of round things is practically endless and ranges from simple soap bubbles to the entire observable universe. Spheres tend to take shape from the action of simple physical laws. So prevalent is this tendency that often we assume something is spherical in a mental experiment just to glean basic insight even when we know that an object is decidedly non-spherical. In short, if you do not understand the spherical case, then you cannot claim to understand the basic physics of the object.
Excerpt 2
As I ponder the vast expanse of the cosmos, it strikes me that the elegance of gravity, which crafts stars and planets into perfect spheres, tells only part of the story when we venture beyond these familiar shapes. Here I am, gazing into the depths of the universe, where galaxies stretch out in pizza-shaped spirals millions of light-years across, teeming with stars, gas, and cosmic dust, while potato-shaped moons orbit their planets with a peculiar defiance of gravity's spherical rule. This cosmic diversity beckons me to explore the interplay of forces—gravity, rotational dynamics, and the chaotic ballet of cosmic collisions—that sculpt the universe's myriad forms. Through this lens, I embark on a journey to understand how the simplicity of spheres coexists with the complexity of nature's boundless imagination, revealing a universe far more intricate and fascinating than meets the eye.
If your answer was Excerpt 1, then sadly, you are wrong. It is the introductory paragraph of Dr. Tyson’s 1997 essay for Natural History magazine, in which he discussed the following question:
Thanks to gravity, stars and planets are spherical. So why are there pizza-shaped galaxies and potato-shaped moons?
If your answer was Excerpt 2, then you are right! This paragraph was generated by GPT-4 in response to the following prompt:
GPT-4 Prompt
In the style of Neil DeGrass Tyson, write a one paragraph introduction to an essay on the following question: "Thanks to gravity, stars and planets are spherical. So why are there pizza-shaped galaxies and potato-shaped moons?"
The Turing Test
I am sure you are all familiar with the popular Turing Test, proposed by Alan Turing in his famous 1950 paper, “Computing Machinery and Intelligence”.
In this paper, Turing described a hypothetical game called the “Imitation Game”. It involves three participants: a man (A), a woman (B), and an interrogator (C), who sits in a separate room from the other two. The interrogator's task is to determine which of the two is the woman. The man (A) tries to trick C into making the wrong identification, while the woman (B) tries to help C get it right. Turing then suggests replacing the man (A) with a machine and asks, "What will happen when a machine takes the part of A in this game?"
In this revised game, the interrogator's job is to determine which participant is the human and which is the machine. The interrogator communicates with both A (the machine) and B (the human) through a text-based medium to prevent the interrogator from deriving clues from their voices.
Turing proposed that if the interrogator could not reliably tell the machine from the human, the machine could be said to have passed the test. In other words, if the machine could mimic human responses well enough that a human judge could not distinguish between them based on their responses alone, it could be considered to exhibit intelligent behaviour.
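Turing did not pin down what “reliably” means. One common way to make it concrete, and purely an assumption on my part rather than anything from Turing’s paper, is to run many rounds in which the interrogator must name the machine, then ask whether their hit rate is better than the 50% expected from guessing. The Python sketch below does that with an exact one-sided binomial test; the function names and the 5% threshold are hypothetical choices for illustration.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided binomial tail: P(X >= correct) if the judge were only guessing."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def passes_imitation_game(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """Hypothetical pass rule: the machine 'passes' when the judge's hit rate
    is not significantly better than chance at significance level alpha."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    return p_value_at_least(correct, trials) >= alpha


if __name__ == "__main__":
    # A judge who picks the machine correctly in 16 of 30 rounds vs 25 of 30.
    for hits in (16, 25):
        p = p_value_at_least(hits, 30)
        print(hits, round(p, 4), passes_imitation_game(hits, 30))
```

For example, a judge who is right in 16 of 30 rounds is consistent with guessing (the tail probability is about 0.43), so the machine “passes”; a judge who is right in 25 of 30 rounds is not (about 0.0002). Failing to beat chance is weak evidence of indistinguishability rather than proof, especially with only a handful of rounds.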
The Missing Element
I believe there is a critical aspect that Alan Turing did not take into account when designing the test.
Turing assumes that the interrogator has had no prior exposure to the machine and has no context whatsoever about how it behaves.
This is by no means a takedown of the Turing Test. For its time, it was an incredibly smart way of assessing the mimetic abilities of artificial intelligence. Fast-forward to 2024, however, and I believe this missing element plays a critical role if we are to conduct a proper Turing Test today.
As of December 2023, ChatGPT had over 180 million users, 100 million of whom use the platform weekly. That means more and more people in the world are becoming context-aware of AI chatbots. Humans are incredibly good at pattern recognition, so once a person gets used to a chatbot, it becomes extremely difficult for the bot to camouflage itself. We start to recognise its patterns and can easily tell, say, a YouTube script written by a tool like ChatGPT from one written by a human.
As someone who actively uses ChatGPT, Bard, and other such AI tools, I find it increasingly easy to recognise AI-generated YouTube scripts, articles, social media posts, and emails.
In other words, I am context-aware. If you put me in a Turing Test today with GPT-4 or the like in the other room, my chances of identifying the AI improve massively.
Enhanced Turing Test
The quiz at the beginning of this article was my rough attempt at designing an enhanced version of the Turing Test. It takes the following elements into account:
The interrogator (you) is most likely context-aware, i.e., has used an AI chatbot before.
Instead of just trying to sound human, the AI camouflages itself by mimicking the style of a well-known person (Neil deGrasse Tyson, in this case). This makes it harder for the context-aware interrogator (you) to identify the AI.
Obviously, the quiz I shared is by no means a bulletproof, lab-designed test, and I am sure it has drawbacks. My only intention with this article was to draw attention to the importance of being context-aware in such scenarios. Knowing how an AI behaves makes a huge difference: if you know how an AI responds, you can identify it in most cases. This raises the question of whether machine intelligence can only be defined through the paradigm of mimesis. If being more ‘human-like’ is the measure of achieving intelligence, or AGI in the broader sense, then technologies such as GPT-4 are miles away.
What do you think about the Turing Test? What are its strengths and drawbacks? How would you improve the test to meet modern-day requirements? I would love to hear your thoughts in the comments.
I think I've written about the Turing test twice now, and it's always fun to revisit the concepts. Just an observation: I'm pretty sure that Turing would have been quite satisfied with today's computers as "exhibiting intelligence", whatever that slippery phrase really means. I think he would see this and say, "yes, this is exactly what I'm talking about."
It has only been in the ensuing decades that our own cultural perceptions about AI have evolved, with us demanding ever more before we say, "yes, this is convincingly human."
I think Turing would have rejected ELIZA, but thumbs-upped ChatGPT.