The era of GPT-X
As we step into the era of industrial AI, the general public is gaining access to cognitive tools that were once out of reach. Unlike earlier technologies that primarily facilitated knowledge storage, retrieval, and search, these new tools actively assist with thinking and cognitive tasks. However, it is essential to be precise when describing the kind of “intelligence” they exhibit. A deeper understanding of their functionality and limitations helps prevent the misconception that they are “conscious,” “sentient,” or “alive.”
Background:
Language lies at the heart of how we understand ourselves and the world. Across disciplines such as philosophy, linguistics, cognitive science, and artificial intelligence, one of the fundamental questions concerns the relationship between human thought, intelligence, and language. It is no surprise that Alan Turing’s renowned test for intelligence, the “Turing Test,” is conducted entirely through language.
The pursuit of modeling language as a natural phenomenon has long been central to the quest for understanding human intelligence. Influenced by major breakthroughs in formal systems within mathematics and logic in the early 20th century, the effort to develop formal models of natural language gained momentum. This endeavor culminated in Noam Chomsky’s Syntactic Structures, marking a pivotal moment in the field.
By the 1960s, linguists had drawn a sharp distinction between syntax and semantics. Noam Chomsky famously illustrated this distinction with syntactically correct yet semantically nonsensical sentences, such as “Colorless green ideas sleep furiously.” Syntax appeared so fundamental that it seemed almost independent of a language’s statistical properties. However, in everyday communication, we do not always strictly follow syntactic rules or rigid grammatical structures.
Despite these deviations, syntax was considered so essential that Chomsky proposed the existence of a universal grammar — an innate linguistic framework possibly rooted in biological or genetic factors. Setting aside debates over innate grammar and the poverty of stimulus argument, the idea was that if the correct syntactic rules were established (regardless of their origin), one could theoretically generate all possible texts in a given language through algorithmic processes. The real challenge, however, lies in ensuring that these generated sentences are not only grammatically valid but also meaningful and accurate.
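To make the generative idea concrete, here is a minimal sketch of a toy context-free grammar. The rules and vocabulary are invented for illustration; the point is that a purely mechanical process can churn out syntactically well-formed strings with no regard for meaning, very much in the spirit of “colorless green ideas sleep furiously”:

```python
import random

# A toy context-free grammar; the rules and words are illustrative only.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Adj", "Adj", "N"]],
    "VP":  [["V", "Adv"]],
    "Adj": [["colorless"], ["green"], ["furious"]],
    "N":   [["ideas"], ["rivers"], ["windows"]],
    "V":   [["sleep"], ["melt"], ["argue"]],
    "Adv": [["furiously"], ["quietly"]],
}

def generate(symbol="S"):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:            # terminal: an actual word
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    words = []
    for sym in production:
        words.extend(generate(sym))
    return words

# Every output is grammatical; none of it is guaranteed to mean anything.
print(" ".join(generate()))   # e.g. "colorless green ideas sleep furiously"
```

Syntactic validity turns out to be the cheap part; meaning is where the trouble starts.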
The issue of semantics is far from simple, making it clear that traditional approaches alone were insufficient. This realization spurred a growing interest in knowledge representation. The kind of digital knowledge representation we encounter daily — such as Wikipedia or similar sources — barely scratches the surface of the broader “knowledge representation” problem.
In reality, capturing and structuring knowledge proved to be an immense challenge. No manually curated knowledge base could ever encompass the vast scope of human information, particularly common sense knowledge: the things most people understand intuitively but that are rarely documented explicitly in conventional knowledge repositories. Some examples you are unlikely to find in a typical knowledge base:
- If a glass of water is knocked over, the water will spill.
- People need to breathe to stay alive.
- Turning off the lights in a room will make it darker.
- Objects fall to the ground if they are not supported.
- People generally feel cold when the temperature drops.
Each piece of common sense knowledge also appeared to have exceptions, making the task a never-ending rabbit hole. Early attempts to write everything down, in projects like Cyc or YAGO, fell far short of their ambitions despite consuming years of effort and millions of dollars.
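A hypothetical miniature of the problem (the rules and exceptions below are invented for illustration) shows why manual curation never converges: every rule demands an exception list, and every exception invites more exceptions.

```python
# A toy hand-curated rule base; rules and exceptions are invented examples.
RULES = {
    "unsupported objects fall": ["helium balloons", "objects in orbit"],
    "knocked-over glasses spill": ["empty glasses", "sealed glasses"],
    "turning off lights darkens a room": ["rooms with daylight"],
}

def rule_applies(rule, situation):
    """A rule holds unless the situation matches a listed exception."""
    return situation not in RULES[rule]

print(rule_applies("unsupported objects fall", "a dropped cup"))    # True
print(rule_applies("unsupported objects fall", "helium balloons"))  # False
# ...and what about kites, bubbles, parachutes, airplanes, geese?
```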
Another challenge is reasoning, particularly common sense reasoning: the human-like ability to make assumptions about the nature of the typical situations we encounter daily. Folk psychology, for instance, refers to our inherent capacity to interpret other people’s emotions and intentions, connecting them to motives and behaviors. For example:
- He’s frowning, so he must be angry.
- She didn’t tell the truth because she knew it would hurt his feelings.
- He’s in a bad mood, so he’ll probably snap at someone.
- People from that city are always rude.
Additionally, naive physics represents our everyday understanding of the physical world (which may not always be accurate in terms of actual physics). Examples include:
- What goes up must come down.
- A dropped object falls straight down.
- A solid object cannot pass through another solid object.
These abilities also help us comprehend cause and effect and perform temporal reasoning. Like common sense knowledge representation, common sense reasoning has proved difficult. Early efforts often involved extensions of formal logic, while more recent approaches have used Bayesian inference. The kind of reasoning we employ daily cannot be easily formalized, and it has long been acknowledged, particularly in psychology, that we do not follow classical logical systems in everyday reasoning. One intriguing line of research is abductive reasoning: generating hypotheses from observations in a way that fits neither inductive nor deductive logic. Suppose you return home to find a broken window. You automatically generate multiple hypotheses to explain the observation: a burglar broke in, a ball from children playing hit it, or a bird flew into it.
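As a rough sketch of the Bayesian flavor mentioned above (all probabilities here are invented purely for illustration), abduction can be framed as scoring candidate explanations by prior plausibility times how well each one explains the observation:

```python
# Abduction as inference to the best explanation, Bayesian style.
# P(hypothesis | observation) is proportional to
# P(observation | hypothesis) * P(hypothesis).
# All numbers below are made up for illustration.

observation = "the window is broken"

hypotheses = {
    # name: (prior, likelihood of a broken window given the hypothesis)
    "a burglar broke in":   (0.01, 0.70),
    "a kids' ball hit it":  (0.05, 0.60),
    "a bird flew into it":  (0.10, 0.20),
}

scores = {h: prior * lik for h, (prior, lik) in hypotheses.items()}
total = sum(scores.values())

for h, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {s / total:.2f}")
# The ranking of explanations, not a single deduction, is what we jump to.
```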
Language models sidestep both problems by reducing everything to a seemingly trivial task: predicting the next word in a text. Language models are generative in nature. Like a grammar, they can generate sentences, but without any hard-coded grammatical rules built in. They not only learn the syntax of the language but go far beyond simple grammar, and they begin to show reasoning abilities without any explicit reasoning engine embedded in them.
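A deliberately tiny sketch of that “trivial” objective (the corpus is a toy example): even raw bigram counts, with no grammar anywhere in sight, start to encode which words may follow which.

```python
from collections import Counter, defaultdict

# A minimal bigram language model over a toy corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def predict_next(word):
    """Most likely next word and its estimated probability."""
    counts = bigrams[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("sat"))  # ('on', 1.0): learned from data, not rules
print(predict_next("the"))  # ('cat', 0.25): 'the' precedes many nouns
```

Modern transformers replace these counts with billions of learned weights, but the training signal remains the same: predict the next token.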
ChatGPT demonstrates that language models can generate text that maintains proper syntax while also conveying accurate semantics and knowledge. These models challenge the traditional notion of requiring distinct knowledge bases and inference engines; instead, they fold knowledge base, inference engine, and syntax model together into the transformer’s weights. Seen this way, what ChatGPT does is a fusion of all the efforts in “knowledge representation” and “knowledge inference,” and nothing more. From this perspective, ChatGPT is not “conscious” or “sentient”; rather, it is an incredibly sophisticated tool unlike any we have had before. Its uncanny ability to imitate can easily deceive us into believing that it possesses consciousness.
We must recognize that, despite being an incredibly advanced system in terms of scale and connectivity, ChatGPT does not create new knowledge. It cannot venture into the world and formulate fresh conclusions. Instead, ChatGPT generates pieces of information, which may not exist explicitly anywhere, based on its prior learning. In essence, the generation process produces information that is inherently present but has not yet been articulated. For example, you might hold the two pieces of information below without ever considering the conclusion that logically follows from them:
Piece of information 1: You know that plants need sunlight to grow and thrive, as they use it for photosynthesis.
Piece of information 2: You are aware that the intensity of sunlight varies depending on the time of day, with the strongest sunlight usually occurring around noon.
From these two pieces of information, you could infer that plants placed in an area with direct sunlight during the peak hours of the day would potentially grow faster and healthier due to the increased intensity of sunlight available for photosynthesis.
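As a toy sketch of this “implicit but unarticulated” point (the facts and the rule below are paraphrases of the example above), a simple forward-chaining loop derives a conclusion that was always latent in the premises:

```python
# Toy forward chaining: derive conclusions already implicit in stated facts.
facts = {
    "plants need sunlight for photosynthesis",
    "sunlight is most intense around noon",
}

rules = [
    ({"plants need sunlight for photosynthesis",
      "sunlight is most intense around noon"},
     "plants in direct noon sunlight can grow faster"),
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # new to us, yet nothing truly "new"
            changed = True

print(facts)
```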
This observation is intriguing, and it does not suggest that ChatGPT is incapable of generating compelling content or engaging in what we ambiguously call “thinking.” On the contrary, it can produce highly valuable information that enhances our understanding. In many cases, this aligns with what is commonly considered “thinking” or even “original thinking” in humans. However, it is certainly not the full extent of human cognition.
We possess other faculties that, at least for now, remain beyond the reach of current AI — most notably observation, curiosity, and a more profound intuition. The overemphasis on cognition has led many to regard it as the most fundamental function of the brain. While this perspective has been challenged, it continues to exert significant influence across various disciplines, including computer science and artificial intelligence.
Another aspect to consider is that ChatGPT had access during training to an immense body of knowledge, more than any single person could read in a lifetime. So even if its logical reasoning or its ability to extract correlations is imperfect or somewhat noisy compared to a human’s, it can compensate for these shortcomings with the incredible wealth of knowledge stored in its network connections.
Data, data, and more data:
It’s remarkable how much knowledge can be acquired about the world simply by accessing text! However, it is important to note that language models do not capture the full scope of “language.” Language encompasses more than written text; it is a complex phenomenon that includes text, speech, phonology, action, and embodied cognition, all learned through interactions with the external world, to which these models have never been exposed.
The guiding principle of current NLP research can be summarized by a quote from renowned linguist John Rupert Firth:
“You shall know a word by the company it keeps” (Firth, J. R. 1957:11)
For example, take the word “bank” in the following contexts:
- She deposited money in the bank.
- The floodwaters reached the top of the bank.
In the first case, “bank” clearly refers to a financial institution, while in the second it refers to the land alongside a river. The idea may seem natural but limited; in fact, every word tells the same story.
Firth’s ideas gained traction among his peers, leading to the emergence of the ‘London School’ of linguistics. His thoughts later influenced “cognitive semantics” developed by George Lakoff and others, which eventually gave rise to the concept of “embodied cognition.”
Distributional semantics serves as the foundation for this approach, asserting that “linguistic items with similar distributions have similar meanings.” This principle laid the groundwork for embedding methods to represent word semantics and facilitated the integration of language into deep learning frameworks. Firth’s approach to semantics was different from many of his contemporaries and predecessors, who primarily focused on identifying systems of analytical, logical, and categorical structures. In contrast, Firth abandoned creating metaphysical systems and emphasized contextualism.
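Firth’s slogan translates directly into arithmetic. In this minimal sketch (the four-sentence corpus is invented for illustration), each word is represented by the counts of words appearing near it, and words that keep similar company end up with similar vectors:

```python
import numpy as np

# Distributional semantics in miniature: a word is represented by the
# counts of the words that co-occur with it in a small context window.
sentences = [
    "she deposited money in the bank",
    "she deposited cash in the bank",
    "the river bank was muddy",
    "the muddy river overflowed",
]

vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
vec = np.zeros((len(vocab), len(vocab)))

for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        for c in words[max(0, i - 2): i + 3]:   # +/- 2-word window
            if c != w:
                vec[idx[w], idx[c]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "money" and "cash" keep identical company; "money" and "river" do not.
print(cosine(vec[idx["money"]], vec[idx["cash"]]))   # 1.0
print(cosine(vec[idx["money"]], vec[idx["river"]]))  # ~0.35: little overlap
```

Embedding methods like word2vec, and ultimately the representations inside large language models, are far more sophisticated descendants of this same counting idea.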
From a deeper philosophical perspective, semantics arises from relationships within the world, a concept highlighted by Wittgenstein when he stated that the meaning of a word lies in its application. Here, “application” extends beyond mere contextualism in text; it encompasses an individual’s entire interaction with the world in order to comprehend the meaning of words and sentences.
Determining the extent to which “application” can be derived from text alone is challenging, but the impressive performance of these models suggests that they possess substantial knowledge about how words are used across contexts and real-world situations. This remarkable intuition even extends to an understanding of the world’s physics: GPT-4, for example, can accurately navigate an imaginary map in a world with physical properties such as direction.
Other examples include its grasp of how physical objects fall, and abductive reasoning, which requires a deep understanding of common sense and of how the strong generative models in our brains infer and fill in the gaps of our world.
There is still much to explore, and these advancements continue to impress us. However, this approach is highly inefficient. Despite being trained on nearly all available text in the world, these models only approximate human-level common sense — something a human baby typically acquires after just a short period of real-world interaction.
No human needs access to the entirety of the internet’s text — including all news, books, articles, and blogs — to understand fundamental concepts like how a piece of paper works, the nature of human affection, or the function of a government. Yet, it is important to recognize that in certain domains, large language models surpass human understanding.
In short, these incredible capabilities come at a cost: they require exposure to vast amounts of data just to grasp even the simplest things!
Exploring the New Frontiers
From what we have seen, large language models (LLMs) can comprehend numerous aspects of the world and possess an advantage no individual has: they are trained on a vast portion of recorded human knowledge. This distinctive property can help us explore research possibilities within the realm of all potential new information. There are many gaps in our understanding of the world, partly because we cannot examine the vast amount of already collected data, observations, and theories. In other words, we, as a collective species, may be oblivious to the inferences that can be drawn from the information we already possess. For instance, it was recently discovered that some ancient images resemble fractals, even though we have known about fractals for decades; no one had connected the dots to see that these two long-held pieces of information could help us uncover something new. Imagine if LLMs could delve into the knowledge we’ve accumulated from various sources and reveal entirely new insights that have eluded us for a long time.
This possibility unveils perhaps the most thrilling aspect of LLMs as computational engines capable of extracting knowledge at a level we have never experienced before.