Often when I engage in dialogue, I get stuck on the meaning of words. I generally spend far more time than is socially appropriate insisting on laying out their definitions. It may seem trivial, but it is not. Language is very powerful. The words we use give us the ability to transcend space and time, manipulate and organize the world, and represent reality as Truth or bend it with lies. Words are the cornerstone of civilization, so I try to choose and use them carefully.
Recently, I have become interested in modelling human language in a way that computers can interpret. This pursuit will yield exciting technological advancements to enjoy and harrowing externalities to plan for. The optimistic technologist and the pessimistic naturalist are both right and both wrong here, for the future is and always will be terrible and wonderful. However, I do not wish to attempt any specific predictions of the future in this essay; thankfully, we have Black Mirror for that.
What I am interested in is what can be learned from the process of deconstructing and rebuilding language in a way that other potential intelligences can understand. The more I learn about language, the more I realize that I have no idea what language actually is. Language is full of contradictions that we rarely consider. One thing always shares meaning with many other things. Context is fluid across the span of a sentence, paragraph, story, or lifetime. Words are subjective tools used to describe objects to other subjects, which just may be the messiest conundrum in all of existence. Fortunately, one of the best ways to learn is to teach. I hope that the process of teaching natural language to machines (or, more appropriately, enabling them to learn it) will allow us to learn what language really is and, ultimately, who we are.
The above pursuit generally falls into two fields, natural language processing (NLP) and natural language understanding (NLU). NLP makes sense to me, and it is a field rich in progress. It encapsulates techniques and theories used to reliably ‘process’ language to achieve certain tasks. For example, sentences can be analyzed for sentiment by computing the frequency of positive/negative words according to an expert-created lexicon. Or, words can be converted into vectors and fed into machine learning algorithms such as Latent Dirichlet Allocation, which automatically organizes them into topics based on the density of their clusters when mapped into n-dimensional space. These things, albeit technically complicated, make sense.
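The lexicon approach can be sketched in a few lines. The tiny hand-made lexicon below is an illustrative stand-in for the expert-created ones used in practice, which contain thousands of scored words:

```python
# A minimal sketch of lexicon-based sentiment scoring. The lexicon here
# is a toy assumption; real systems use large, expert-curated lexicons.

LEXICON = {"good": 1, "great": 1, "happy": 1,
           "bad": -1, "awful": -1, "sad": -1}

def sentiment(sentence: str) -> int:
    """Sum the polarity of each known word; the sign gives the sentiment."""
    words = sentence.lower().split()
    return sum(LEXICON.get(w, 0) for w in words)

print(sentiment("what a great and happy day"))  # -> 2 (positive)
print(sentiment("an awful and sad affair"))     # -> -2 (negative)
```

Words absent from the lexicon simply contribute nothing, which is both the method's simplicity and its blind spot.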
What does not make sense is the second field, NLU. What does understanding mean? The Oxford Dictionary defines understanding as ‘to perceive the intended meaning of’. This, however, like most definitions, is not very helpful. I do not perceive the intended meaning of other people’s words all the time; just ask my wife, family, friends, colleagues, and so on. If we accept this definition, which we probably should because it’s from Oxford, I would posit that humans often misunderstand one another even when speaking the same language. That being the case, how could we teach machines to understand language? Clearly, the idea of language needs to be better understood. To gain clarity, I ask the question: what is language?
Language is many things.
Language is a map. It maps symbolic abstractions onto objects as well as commensurable affective states perceived by the subject.
Language is a storage device. Human lives are finite. When we die, all of the unarticulated knowledge we amass in our lifetime disappears. If we convert such knowledge into language, it lives on, provided it is remembered or, better yet, written down.
Language allows consciousness. Consciousness relies on abstraction, specifically the ability to abstract one’s self as a distinct entity, separate from the surrounding world. Abstraction, because it is divorced from actual things, relies on a proxy. This proxy is language, the representation of things.
Language is a tool. Sharper than any sword, language can be used as the most powerful tool to manipulate and shape the world. With language, we can build things, methodologies, and ideas and communicate them intrapersonally and interpersonally. Buildings are constructed out of an array of materials (wood, nails, plaster, iron, etc.) and a variety of tools (saws, hammers, trowels, welders, etc.). Ideas are constructed out of words, which are both the material and the tool. Thousands of years ago, the mighty Roman Empire built great buildings, most of which have fallen. At the same time, the Jews they oppressed built an idea and wrote it down. It still stands today.
Language is a paradox. Just like culture, which can only be understood from within culture, language can only be understood with language. Furthermore, we assume language describes things as they are, but it really describes things as they appear in relation to us.
All of the above points are meaningful paths to explore as we try to discover what language is. However, it is this last point that speaks most clearly to why language is such a difficult phenomenon to study, codify and make machine interpretable.
In order to build intelligent agents capable of human-level communication, language must be machine interpretable. If language were just symbolic representation, such as speech and text, then it could be understood statistically as soon as our algorithms and computational power were up to the task. If such were the case, NLU could be solved.
I will attempt to gain clarity on this issue by using an example. The word ‘hammer’ is a word we are all familiar with. It generally maps onto an object with a long handle and a blunt, heavy, flat apex used to strike and apply significant force onto a concentrated area, like the head of a nail. If this were all that it was, then we could convert the alphabetical characters h, a, m, m, e, and r into a unique float vector. We could add this vector to a massive lookup table holding all the words and their unique vectors, eagerly awaiting statistical manipulation. We could even analyze millions of images of hammers, convert them into matrices and aggregate them together, yielding a unique, machine-readable ‘meta hammer’ matrix that we map to the word vector. Whenever a machine receives this unique word vector, it could output an image of a hammer, and vice-versa.
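The lookup-table idea can be sketched like this. Both the hashing scheme for deriving word vectors and the 2x2 “images” are purely illustrative assumptions, not a real vision or NLP pipeline:

```python
# A toy sketch of the word-vector lookup table and the averaged
# "meta hammer" matrix. The hash-derived vectors and tiny matrices are
# illustrative stand-ins for learned embeddings and real image data.

import hashlib

def word_to_vector(word: str, dim: int = 4) -> list:
    """Derive a deterministic float vector from a word's characters."""
    digest = hashlib.sha256(word.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# Pretend these are pixel matrices extracted from hammer photographs.
hammer_images = [[[0.9, 0.1], [0.8, 0.2]],
                 [[0.7, 0.3], [0.6, 0.4]]]

# Element-wise mean of all images: the "meta hammer" matrix.
meta_hammer = [[sum(img[i][j] for img in hammer_images) / len(hammer_images)
                for j in range(2)] for i in range(2)]

# Map the word vector to the aggregated image representation.
lookup = {tuple(word_to_vector("hammer")): meta_hammer}
print(lookup[tuple(word_to_vector("hammer"))])  # the averaged matrix
```

The scheme is fully reversible in both directions, which is exactly why it seems, at first, like the whole problem.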
Unfortunately, language is not that simple. Anyone who has tried to pound tent stakes into the ground when camping knows that a grapefruit-sized rock is also a hammer. The ‘meta hammer’ we previously described and this rock appear very different, but they share a common characteristic. Both objects can be grasped by a human hand, are light enough to be tossed about, heavy enough to transfer significant force, have a surface large enough to strike the desired target, and are rigid enough to maintain structure upon contact for optimal force transfer.
As it turns out, a hammer is anything that meets the above criteria. This means that to truly understand what a hammer is, one must understand the capabilities of human hands and intuitive physics (in relation to human bodies), as well as have the ability to specify a defined goal that can be achieved through smashing an object.
No problem, one might say. With enough time, all hammer-like things can be catalogued, for there is no rule that a thing can only be one thing. All hammer-like things can be appended to the hammer category, such as sideways-held tennis rackets and miniature baseball bats. However, what is a hammer to one is not always a hammer to another. For example, with years to develop my fine motor skills, I can use an empty glass bottle as a hammer, should the accepting surface be soft enough. A child, with less refined motor ability, would likely break the bottle while attempting this task. A thing that breaks into shards of glass when struck upon a target is not a hammer.
I think the process of discovering alternative meanings of words is inexhaustible. I only need to mention the existence of Thor’s hammer to prove this. Thor’s hammer is very clearly a hammer, but it is used to smash the chaotic forces of nature, often represented by dragons and giants, into order and Truth. This is a different kind of hammer, but definitely still a hammer.
The trickiness of the problem detailed above lies in two areas, prior knowledge and subjectivity. Prior knowledge is all of the known information affecting a phenomenon that is not explicit. Humans have a lot of prior knowledge and machines essentially have none (although some Bayesian probabilistic models allow space for conditional learning). In other words, machines are naïve to everything except for what they are explicitly told. Unless humans have specifically included conditional information in the feature space for training a machine learning model, the machine will remain completely oblivious to such information.
However, encoding the necessary human-level prior knowledge to solve such a problem would be impossible. This is because while humans are naïve to everything we don’t know, we are particularly naïve to everything we don’t know we know. We do not explicitly know that the hammer we used to drive a nail into the wall to hang a picture is also, metaphorically, the same thing that Thor uses to smash chaos into order. For some weird reason, we can make the connection.
Machines struggle to make this connection. Word embeddings, which are algorithms trained on massive text corpora that map vectorized words into n-dimensional space based on word associations, do a decent job of approximating it. For example, using word2vec, a popular word embedding model, one can subtract the vector for ‘man’ from ‘king’ and add ‘woman’, and the result would be ‘queen’. This is absolutely amazing. It works so well it’s almost incomprehensible. Despite these advancements, attempts to codify all prior knowledge would be thwarted by sheer volume, the fluidity of context, and most of all, our own ignorance.
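The king − man + woman arithmetic can be illustrated with a toy embedding. The 3-dimensional vectors below are hand-made stand-ins (real word2vec embeddings are learned from massive corpora and have hundreds of dimensions); only the vector arithmetic and nearest-neighbour search are the point:

```python
# A toy sketch of the king - man + woman ~= queen analogy. The vectors
# are invented for illustration: dimensions loosely encode "royalty",
# "maleness", and "femaleness".

import math

emb = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.1, 0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman, element-wise.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Nearest neighbour among words not used in the query.
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(emb[w], target))
print(best)  # -> queen
```

In real embeddings the same arithmetic works because the offset between ‘man’ and ‘woman’ approximates the offset between ‘king’ and ‘queen’, a regularity learned purely from co-occurrence statistics.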
The next and even more challenging problem is that of subjectivity. As we discovered earlier, an object is not defined by itself. Instead, an object is defined by its relation to the subject. The delicate glass bottle is not a hammer to the clumsy child. The rock is not a hammer to those with hands too small to grasp it. Thor’s hammer is only a hammer to Thor (or Chris Hemsworth), who alone can lift it and smash things. Not only can objects hold multiple interpretations depending on the subject, objects can only be represented linguistically by subjects who call them into being. A hammer is not just the thing. A hammer is a thing with contextually dependent properties to be used for smashing things. Even if we could get past (unlikely) the contextually dependent properties conundrum, the fact that a hammer is only a hammer if it is to be used for smashing things presents a whole host of difficulties.
To know what a hammer is, one needs to know what it is to want to smash something. It doesn’t take very many months of life for a human to learn how useful (or at least interesting) it is to smash things. Smashing is part of the repertoire of potential human action. To make this point, let’s use dogs as an example. Smashing is not part of the repertoire of potential dog action. Dogs do not have the physical ability to grasp a hammer or swing one. Therefore, dogs have no need to desire smashing or to understand the concept of a hammer. Digging? Sure, that is a potential dog action. Shovels, no, because their paws do not allow the use of such a tool. At first glance, dogs appear to understand simple words, like ball. However, a simple thought experiment illustrates that they do not, for they do not accurately perceive the intended meaning of the word (if we’re sticking with the Oxford definition). To dogs, a ball is a thing to chase. To humans, a ball is a thing to throw. These are different things. We can meet the dogs halfway because we also understand the concept of chasing, but they do not, and will not, understand the concept of throwing. No other species can, and that is for the better. I do not care to imagine a world where squirrels could not only drop nuts randomly over unsuspecting victims of chance, but precisely aim and throw them. Chimps, with their highly mobile shoulder joints, come close. However, their relative lack of hip rotation prevents any reasonable generation of throwing power, thus rendering the action useless and unincorporated.
My point is that words represent actions and states as perceived by creative, manipulative, and specifically embodied creatures. Our entire lexicon is built around the experience of operating a human body in a specific environment. To truly understand human language, one must be a human. For example, one of the most human actions is specifying goals. We assess the present, fantasize about a preferred future, and then create a plan of action to manipulate the present into the desired future. Sometimes, smashing is an action required in this process, so we use a hammer. A machine, however, can only carry out the process of smashing if it is programmed to do so. A machine can only carry out any action if it is specifically programmed to do so. To understand a word, one must be capable of the action represented by the word. The machine must be capable of hammering to understand the word hammer. Carrying this idea all the way implies that for a machine to understand all words, it must be able to perform all actions.
My aim is not to be too esoteric. I am writing this piece not as a theoretical exploration, but as a way to collect my thoughts and clarify my approach to NLP/NLU. I believe this exercise has aided in that goal, and I hope you’ll agree. While the task of understanding language blew up in complexity before me (and I only scratched the surface), I believe I found some pieces worth collecting. I am pretty sure I have learned that the concept of NLU is ridiculous (please challenge me if you disagree, I want to learn). However, NLP still has immense room to grow. In NLP, our goal should not be to help machines ‘understand’ our language, but rather to reflect back to us information about ourselves that we didn’t know, or didn’t know that we already knew and took for granted. So much progress can still be made with algorithms like the previously mentioned sentiment analysis or topic modeling. Furthermore, we are just entering the uncharted territory of a language renaissance with the development of powerful transformer models like BERT. This is very exciting.
I do not think NLU and actual human-level conversation with other intelligent agents is possible, or that it should even be our goal. If anything, I am amazed that humans are capable of it with other humans. However, I do see NLP as an incredibly powerful and growing field with great promise. While human-level conversation may not be theoretically possible, NLP may yield machines capable of human-like conversation. With this goal, an intelligent agent may be capable of conversing by reflecting our own use of language back at us with great sophistication. In this light, I envision a future where we appear to converse with machines, but actually converse with our conscious and subconscious selves. Such conversations could reveal to us a deeper understanding of our own use of language, who we are, and ultimately, how we think. This process could guide us towards our virtues, shine light onto our dark spots, and break down the walls of semantic traps that bind our perception of the world. While such a machine could not technically understand a conversation, it may not even matter if it could pass the Turing test (when a human cannot tell if they are interacting with a computer or another human). Once we cross that bridge, I do not think we get to decide what is real and what is not.