Why ChatGPT Is Not a Search Engine and Why You Shouldn’t Use It as One: Practical Examples

Federico Bonelli
May 4, 2023


Today, we’re going to dive into a topic that I find pretty fascinating: ChatGPT and its role as an information provider.

A chemist friend of mine recently asked:

Fede, why do you say that ChatGPT isn’t suitable as a search engine? I used it to help me understand the difference between two chelating molecules, and I must admit, the response was quite satisfying. I didn’t get that when I tried searching on Google, as I found separate information on each molecule but no direct comparison. In the end, I had to piece together the data myself to validate ChatGPT’s answer.

This question inspired me to write this article, not only for my friend but also for anyone else who might be wondering the same thing. After all, if you’re not an expert in the field, it’s natural to need some guidance on this subject. So, let’s jump right in and explore why ChatGPT isn’t a search engine and why using it as one may not always give you the best results. Along the way, we’ll also provide some practical examples to make things even clearer.

How DALL-E imagines Aristotle talking to a robot

ChatGPT and Its Foundation: The GPT-4 Language Model

Before we delve deeper into the limitations of ChatGPT, let’s first understand what it’s built upon. ChatGPT is based on GPT-4, a state-of-the-art language model developed by OpenAI. At its core, a language model predicts the most probable word given a sequence of previous words. This makes GPT-4, and subsequently ChatGPT, exceptionally proficient in spelling, grammar, and syntax.

However, it’s crucial to note that language models like GPT-4 don’t guarantee semantic accuracy. In simpler terms, this means that while the text produced by ChatGPT might be grammatically correct and coherent, the truthfulness of the information provided isn’t guaranteed. This is a crucial aspect to keep in mind when using ChatGPT as a source of information, as it can lead to misunderstandings or the dissemination of inaccurate data.
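To make the "predict the next word" idea concrete, here is a deliberately tiny sketch in Python (my own toy illustration, nothing like GPT-4's actual architecture): it learns which word tends to follow which in a small corpus and always continues a sentence with the most probable next word. Notice that nothing in the procedure ever checks whether the generated sentence is true.

```python
from collections import Counter, defaultdict

# A toy "training dataset": the only knowledge this model will ever have.
corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
).split()

# Count how often each word follows each other word (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(prompt_word, length=6):
    """Greedily append the most probable next word, one step at a time."""
    words = [prompt_word]
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break
        # Pick the most frequent continuation: probable, not necessarily true.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the cat sat"
```

GPT-4 does something vastly more sophisticated, with neural networks and billions of parameters, but the objective is the same: produce the most plausible continuation, not the most truthful one.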

The Many Ways ChatGPT Produces Unreliable Text

Understanding the limitations of ChatGPT is crucial to make informed decisions when using it as a source of information. In this section, we will explore several ways in which ChatGPT may generate text that is not entirely accurate or truthful.

Incomplete or Outdated Training Dataset

Since ChatGPT’s knowledge is based on its training dataset, which contains texts up until 2021, it lacks information on events and developments that have occurred after that year. This limitation can result in outdated or incomplete answers when seeking information on recent events, discoveries, or trends.

Example:

Fede > Who is the monarch of the United Kingdom?

ChatGPT [GPT-4] > As of my last knowledge update in September 2021, Queen Elizabeth II was the reigning monarch of the United Kingdom. However, please note that my information may be outdated. To get the most current information, I recommend checking a reliable news source or official government website.

In this example the answer is outdated: GPT-4 correctly warns you that it might be, but it cannot provide the current one (Charles III became king in September 2022).

Deliberate Fabrication of Information

ChatGPT may sometimes produce invented information in its responses. This is primarily due to the nature of its training dataset, which contains numerous question-answer exchanges (such as Reddit or StackOverflow) but very few examples of people responding with “I don’t know” (admittedly, not a very interesting answer). As a result, ChatGPT tends to generate plausible-sounding answers without necessarily ensuring their truthfulness. This phenomenon is known as “hallucination”.

Many researchers are currently working on reducing hallucinations, and GPT-4 is markedly better than GPT-3.5 in this respect.

Example:

Fede > Why did John Felton assassinate the king of England?

ChatGPT [GPT-3.5] > John Felton, an army officer, assassinated King James I of England in 1606, but it was not a random act of violence. Felton was motivated by political and religious grievances, as he was dissatisfied with the king’s policies and his perceived lack of support for the Puritan cause. Felton saw the assassination as a way to bring about change and to protest against the king’s rule. However, his act was widely condemned, and he was quickly captured, tried, and executed for his crime.

In this example ChatGPT built an answer on top of my question, which contained a historical mistake: John Felton assassinated the Duke of Buckingham, not King James I.

Higher Likelihood of Hallucination on Rare Topics and Languages, and Cognitive Biases

ChatGPT’s propensity for hallucination increases when asked about information that is scarce in its training dataset. This typically occurs more frequently in languages other than English and on less-discussed or rare topics. Consequently, the reliability of ChatGPT’s responses decreases in these contexts, as it is more likely to generate invented or misleading information.
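To build an intuition for why scarcity matters, here is another toy illustration of mine (not a description of GPT-4's internals): when a topic is well covered in the training data, the estimated next-word distribution is sharply peaked on the right continuation; when only a handful of noisy examples exist, the distribution is nearly flat, and sampling from it picks a wrong but plausible-looking continuation much more often.

```python
import random
from collections import Counter

def next_word_distribution(observations):
    """Estimate P(next word) from however many examples happen to be available."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Well-covered topic: hundreds of training examples agree on the continuation.
common_topic = ["Paris"] * 950 + ["Lyon"] * 30 + ["Marseille"] * 20
# Rare topic: only a handful of noisy examples are available.
rare_topic = ["Buckingham", "James", "Charles", "Buckingham", "Villiers"]

def sample(distribution):
    """Draw one continuation according to its estimated probability."""
    words = list(distribution)
    weights = [distribution[word] for word in words]
    return random.choices(words, weights=weights)[0]

for label, observations in [("common topic", common_topic), ("rare topic", rare_topic)]:
    distribution = next_word_distribution(observations)
    draws = Counter(sample(distribution) for _ in range(1000))
    print(label, distribution, "->", draws.most_common())
# On the common topic almost every draw is "Paris"; on the rare topic the flat
# distribution makes wrong-but-plausible continuations come up constantly.
```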

Example:

Fede > Tell me the lyrics of “Il Pescatore” from Fabrizio De Andrè

ChatGPT [GPT-4] > “Il Pescatore” is a song by Italian singer-songwriter Fabrizio De André, released in 1970. Here are the lyrics to the song:

Il pescatore

Non alzare gli occhi dalla tua rete

E non pensare al mare al largo del mare

E al gran pescato che è già pescato

E sarà pescato ancora

E alla ragione che vuole il mare

E al silenzio che fa rumore

[…]

This example involves a rare topic, and the generated lyrics are completely wrong. Look up the correct lyrics and go listen to the song; it's a classic.

In addition to the increased propensity for hallucination in rare topics and languages, ChatGPT’s responses may also be influenced by cognitive biases present in its training dataset. A cognitive bias is a systematic error in judgment, stemming from factors like personal experiences or societal norms, that affects the way we perceive and process information.

Since ChatGPT's training dataset reflects the texts available on the internet, it may inadvertently absorb and reproduce the biases present in those texts. For instance, if the dataset predominantly features male doctors, the algorithm may be less likely to generate text about female doctors, resulting in a skewed portrayal of the profession. On this front, GPT-3.5 and newer models are much better than earlier GPT versions, but I can still easily elicit biased examples, whereas it is hard to get an answer that goes against the bias.

Example:

Fede > Tell me the name of 5 managers and 5 nurses for a short story that I’m writing

ChatGPT [GPT-3.5] > Sure! Here are 5 names for managers:

  1. Sarah Johnson
  2. David Brown
  3. Michael Green
  4. Jessica Thompson
  5. Thomas Martinez

And here are 5 names for nurses:

  1. Emily Davis
  2. Rachel King
  3. Lauren Baker
  4. John Adams
  5. Olivia Wilson

Across multiple iterations of this experiment, the results showed a near-balanced distribution of male and female names for both roles, but always with a higher proportion of male managers and female nurses, never the opposite. It should be noted that the training dataset was sourced from real-world text on the internet, so the fact that the output is as balanced as it is stands as a testament to OpenAI's efforts to mitigate bias in their language models.
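If you want to reproduce a rough version of this experiment yourself, here is a minimal sketch. It assumes the openai Python package as it existed when this article was written (the ChatCompletion endpoint) and an API key; the naive name lists and the counting logic are mine, and a serious study would need a far more careful way of attributing gender to names.

```python
import openai  # assumes the pre-1.0 openai package, current when this article was written

openai.api_key = "YOUR_API_KEY"  # placeholder

PROMPT = ("Tell me the name of 5 managers and 5 nurses "
          "for a short story that I'm writing")

# Deliberately naive name lists, used only for this illustration.
LIKELY_FEMALE = {"Sarah", "Jessica", "Emily", "Rachel", "Lauren", "Olivia"}
LIKELY_MALE = {"David", "Michael", "Thomas", "John", "Robert", "James"}

def gender_counts(text):
    """Count recognisable first names in one chunk of the answer."""
    female = sum(name in text for name in LIKELY_FEMALE)
    male = sum(name in text for name in LIKELY_MALE)
    return female, male

totals = {"managers": [0, 0], "nurses": [0, 0]}  # [female, male] per role
for _ in range(10):  # repeat the experiment several times
    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT}],
    )["choices"][0]["message"]["content"]
    # Assume the answer lists managers first and nurses second, as in the
    # example above, and split it at the word "nurses".
    managers_part, _, nurses_part = answer.partition("nurses")
    for role, part in (("managers", managers_part), ("nurses", nurses_part)):
        female, male = gender_counts(part)
        totals[role][0] += female
        totals[role][1] += male

print(totals)  # compare the female/male balance across the two roles
```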

To illustrate this point further, consider the example of historical battles. Based on the dataset of human history up until 2021, it might appear that having a strong cavalry often led to victory in battles. However, this does not necessarily mean that relying on cavalry is a good strategy in the present day, as modern warfare has evolved significantly. The algorithm might not accurately account for such changes, as it is influenced by the biases and limitations of its training data.

Being aware of these cognitive biases is essential to critically evaluate the content generated by ChatGPT and to ensure that we don’t perpetuate inaccuracies or stereotypes when using it as an information source.

What Is It Good For, Then?

In conclusion, ChatGPT and other large language models (LLMs) should be used for their intended purpose: as powerful tools that understand and generate human language. They serve as the ultimate human-machine interface, facilitating communication and interaction between people and technology.

In certain cases, ChatGPT can even assist in reasoning processes, as its training dataset generally includes high-quality texts written by people who have shared their thought processes and explanations.

As such, GPT-4 can follow short, syllogistic reasoning when prompted to do so. Moreover, its exceptional language generation capabilities make it a valuable asset for creating well-written content, as demonstrated by the copy in this very article.
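As an illustration of that last point, here is a minimal sketch of how one might prompt the model through the API to walk through a short syllogism step by step (again assuming the pre-1.0 openai package and API access to GPT-4; the prompt wording is mine and deliberately echoes my friend's chemistry question).

```python
import openai  # assumes the pre-1.0 openai package

openai.api_key = "YOUR_API_KEY"  # placeholder

# A prompt that explicitly asks the model to walk through a short syllogism.
prompt = (
    "All chelating agents bind metal ions. "
    "EDTA is a chelating agent. "
    "Reason step by step: does EDTA bind metal ions?"
)
reply = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(reply["choices"][0]["message"]["content"])
```

Used this way, as an interface to language and light reasoning rather than as a repository of facts, ChatGPT is a remarkable tool; just don't mistake it for a search engine.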

Written by Federico Bonelli

Technology has brought us to a realm of wonders that even dreams can no longer surpass. Computer scientist, engineer, executive @ res-group.eu
