2.1 It needs to understand the differences between utterances, sentences and propositions.
Spoken language can be broken down into three significant layers: utterance, sentence and proposition. The most concrete of the three is the utterance, which refers to the act of speaking. It involves a specific person, time and place but does not encode any special form of content. An utterance is any stretch of speech by a single person with distinct silence before and after it. It need not be grammatically correct, and it can be meaningful or meaningless.
A sentence refers to the ‘abstract grammatical elements obtained from utterances’.5 Unlike utterances, sentences cannot be defined by a specific time or place. They are grammatically complete and correct strings of words that express a complete idea or thought. For example, suppose someone asks, “Would you like a cup of tea?” The reply could be “Yes, I would like a cup of tea.” or “No, I would not like a cup of tea.” These two utterances can be considered sentences, as both are grammatically complete and correct and express a complete thought. However, if the reply were either “No, thank you.” or “That would be nice.”, it would be considered an utterance but not a sentence: the first is not grammatically complete, and the second does not express a complete thought on its own. The person who asked the question, however, would still understand the reply perfectly.
These non-sentence utterances play a large role in our day-to-day communication, and while they cannot be considered sentences, they still contain the abstract idea of a sentence. This brings us to our third layer of language, propositions. Propositions are concerned with the meanings behind non-sentence utterances as well as whole sentences. A proposition can be defined as the meaning behind an utterance: a claim or idea about the world that may or may not be true.
As suggested by the Chinese Room Argument, computers are able to recognise utterances and sentences through a series of commands and are even able to respond accordingly. However, computers seem to face difficulty understanding and constructing propositions.
2.2 It needs to understand that the spoken language is able to encode meaning in different ways, such as patterns and unspoken meanings.
Speech is also able to encode meaning in ways that written language cannot, such as through tone, context and shared knowledge. It allows things to be left unsaid and indirectly implied. For example, suppose someone asks, “Have you finished writing your essay?” Your response could be “I started writing it but…” and end there. In written language, a ‘but’ would indicate that there is more to come in the sentence. In spoken language, the listener would understand that the sentence is finished and would fully comprehend the reply from the facial expressions that accompany the utterance. True language comprehension requires computers to identify hidden and context-dependent relationships between words and clauses in texts.
2.3 It needs to have the ability to understand concepts, mental representations and abstract relations.
The semiotic triangle shows the relationship between concepts, symbols and real world objects. Together, they form the building blocks of language. Words contain representations of the world, as well as abstract, relational concepts embodied by the words. Concepts are the abstract ideas that these words represent. As such, they possess perceptual qualities as well as sensations through association. The word “dog” can possess the following representations: mammal, furry, barks, has a tail, man’s best friend, canine. The language processor must be able to know what each word represents, and be able to put it together to know what it is referring to even in the absence of immediate stimuli.
If a programmer can encode the meaning of each word as the multiple representations of its concept, much as a dictionary does, together with the various grammatical and syntactic rules of language, then we can say that the computer would be able to learn the basic building blocks of human language. This, however, limits the computer’s abilities to the data and code it has been programmed with.
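The dictionary-style encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LEXICON` table and the function names are hypothetical, the “dog” entry reuses the representations listed earlier, and the “cat” entry is invented purely for comparison.

```python
# Hypothetical hand-built lexicon: each word maps to the set of
# representations a programmer chose to associate with it.
# The "dog" entry follows the essay; the "cat" entry is an assumption.
LEXICON = {
    "dog": {"mammal", "furry", "barks", "has a tail",
            "man's best friend", "canine"},
    "cat": {"mammal", "furry", "meows", "has a tail", "feline"},
}


def representations(word):
    """Return the stored representations for a word.

    A word that was never programmed in yields an empty set: the
    program has nothing to say about it.
    """
    return LEXICON.get(word.lower(), set())


def shared_representations(word_a, word_b):
    """Return the representations two words have in common."""
    return representations(word_a) & representations(word_b)


if __name__ == "__main__":
    print(sorted(representations("dog")))
    print(sorted(shared_representations("dog", "cat")))
    print(representations("puppy"))  # not in the lexicon
```

Looking up “puppy” returns nothing, even though any English speaker would relate it to “dog”. The sketch knows only what was typed into the table, which is the limitation noted above.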
2.4 It needs to have the ability to understand that others have different perspectives from itself.
As we have seen, language is very much influenced by thought. We can express words, sentences, utterances, but only if we are able to conceptualize them first. Do machines possess this mental aspect of language – the faculty of imagination and rational thought? It seems like the language that they know and use to communicate with one another, or with humans, is based upon only an algorithm.
This leads straight into the Theory of Mind (ToM), which is the ability to attribute mental states to oneself and others, and to understand that others have different perspectives from one’s own.6 The ToM has been found in non-human primates and humans, and is closely connected to the way we empathize with others. It explains our emotional intelligence as well as our intuitive desires to understand other humans and why they think the way that they do.
Our minds have sometimes been likened to machines (based on their abilities to process information, solve mathematical problems, and store important data), yet can we switch the nouns around and say that the machine functions like a mind? If we are on the side that is compelled to say that computers understand language, then we must say that they have a mind, since understanding language is a trilateral feat in which the mind plays an important role.
However, even though a computer may seem to understand basic utterances whose meanings are entrenched in the structures of the world, it seems to meet considerable difficulty when it comes to understanding nuances. Aspects of speech like sarcasm, dry humour, and puns may not be understood by the computer unless it has been programmed to understand them, and even this programming might require technology far beyond that of our time. But this reveals an important fact: sans programming, sans deliberate action in its creation, the computer simply cannot adapt to these aspects of human language.
2.5 It needs to have the ability to understand when different languages are being used.
Computer programmes can already identify which national language is being used through profiling algorithms. These pick out the words used in a text and match them against the most commonly used words of each candidate language to identify the language of the text. It becomes trickier, however, when code-switching or hybrid languages such as Singlish or Spanglish are used. These mixed languages combine vocabulary, grammatical and syntactic rules from at least two different languages. In the case of Singlish, a relatively simple sentence could combine the lexicon and grammar of six different Singaporean languages and dialects. Decoding this would require a highly sophisticated computer, if it is possible at all.
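A minimal sketch of such a profiling approach, in Python, is given below. It is not any particular system’s method. The word lists in `COMMON_WORDS` are tiny, hand-picked assumptions rather than real frequency profiles, and the function names are hypothetical.

```python
import re

# Assumed toy profiles: a handful of very common words per language.
# Real profiling systems use far larger, corpus-derived lists.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "you", "that"},
    "spanish": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "malay": {"yang", "dan", "di", "ini", "itu", "saya", "tidak", "ada"},
}


def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def language_scores(text):
    """Count, per language, how many tokens appear in its profile."""
    tokens = tokenize(text)
    return {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in COMMON_WORDS.items()
    }


def identify_language(text):
    """Return the best-scoring language, or None if nothing matches."""
    scores = language_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("The cat is in the garden"))
    print(identify_language("El perro es de la casa"))
    # A code-switched sentence splits its score across two profiles.
    print(language_scores("I want to go to la casa de mi madre"))
```

For a code-switched sentence the scores are spread across several profiles, so a single-label answer becomes arbitrary. A sentence drawing on several languages and dialects at once would fragment the scores further.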
2.6 It needs to adapt to the constantly changing and evolving spoken language.
Some linguists consider that the nuances contained in language make it impossible for computers ever to learn how to interpret it. In addition, these nuances are not static but constantly evolving: their application relies not only on syntactic, phonetic and semantic rules, but also on social convention and current events.7 The word “bitch” could mean different things when spoken by different racial groups, in different social situations, and with different tones.
2.7 It needs to process speech, which has added complexity beyond written language.
It is difficult for computers to achieve good speech recognition. Spoken language differs widely from written language, and there is wide variation in spoken language between individuals, such as differences in dialect.8 Researchers are looking into identifying which language is being spoken based on the phonetics of human speech. However, phonemes and individual words are not as easy to distinguish when spoken, particularly when speakers differ in accent, tonality, timbre, pitch of voice, pauses and pronunciation. One way to overcome this is through machine learning, in which the computer is given massive amounts of data from which to discern patterns.
In the following section, we look at some of these methods of machine learning that have been developed to enhance the language comprehension skills of a computer.