Qualitative Emergence: The Paradox of Statistical AI in Language Comprehension

9 Jul 2024

The latest generation of generative artificial intelligence models, since the release of OpenAI’s ChatGPT, has struck most users as stunning, even “magical.” In particular, the ability of these conversational programs to produce intelligible, coherent, and often accurate content has amazed even the most skeptical, despite criticisms of excessive refusals to respond and of systematic censorship aimed at adult users.

Within the AI community, others have pointed to the statistical nature of training on pre-selected data, which risks producing average, mediocre output simply because average characteristics are the most frequent in the initial datasets. However, the algorithms that generalize over semantic occurrences are not based on simple calculations of mean, frequency, or standard deviation.

The mathematical functions involved incorporate a significant degree of randomness, so that the neural networks they model, further shaped by reinforcement learning, rely on a genuinely empirical method whose processes we do not fully understand.
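The distinction made above, between averaging and randomized exploration, can be sketched concretely. The following is a minimal, illustrative example of temperature-based sampling, a standard technique in text generation: the model does not return the mean or the most frequent token, but draws one at random, weighted by a learned probability distribution. All names and values here are hypothetical, not taken from any real model.

```python
# Illustrative sketch: generation as weighted random sampling,
# not as a calculation of mean, frequency, or standard deviation.
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution.
    Lower temperature sharpens it; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["cat", "dog", "carrot"]         # hypothetical vocabulary
logits = [2.0, 1.0, 0.1]                 # hypothetical model scores
print(sample_token(vocab, logits, temperature=0.7))
```

Run repeatedly, the call usually prints "cat" but sometimes "dog" or "carrot": the same inputs can yield different outputs, which is the random exploration the paragraph above describes.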

This randomized statistical exploration gives conversational agents their sensational, impressive quality, with applications in scientific research as well as in more pragmatic uses, such as guiding the movement of physical robots through space.

AI modeling of previously unknown materials in chemistry has made headlines; earlier, the Go-playing machine surprised the world’s best players with unprecedented moves that humans had never played, having never calculated all the probabilities of success.

The random combination process allows a form of statistical generalization from large amounts of data; it does not, however, allow the generation of inventive, innovative concepts in the sense of a profound break with prior knowledge.

From a monumental amount of information, the statistical systematization of random occurrences through positive and negative reinforcement gives rise to a qualitative emergence within the generated output, which must be distinguished from the production of meaning.

That is to say, language models have become capable of explaining and reasoning from predefined arguments, but not of understanding in themselves what is being discussed. These programs remain devoid of subjectivity, mind, or soul.

Unlike human psychology, AI produces qualitative evaluation from quantitative analysis, whereas the human psyche builds quantitative cognitive schemas of understanding on qualitative percepts. To learn to count, for example, the human brain must first acquire one-to-one correspondence, around three or four years of age. Children first learn to see that each little horse gets one carrot; then they discover that each can have two, then three, four, and so on.
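One-to-one correspondence is, mathematically, a pairing without counting: two collections match if they run out together. A small sketch, mirroring the hypothetical horses-and-carrots scenario above, shows the idea without ever computing a number.

```python
# Illustration of one-to-one correspondence: decide whether every
# horse gets exactly one carrot by pairing items off, never counting.

def pairs_exactly(horses, carrots):
    """Pair items one by one; True iff both collections run out together."""
    horses = list(horses)
    carrots = list(carrots)
    while horses and carrots:
        horses.pop()    # give this horse ...
        carrots.pop()   # ... this carrot
    return not horses and not carrots

print(pairs_exactly(["h1", "h2", "h3"], ["c1", "c2", "c3"]))  # True
print(pairs_exactly(["h1", "h2"], ["c1", "c2", "c3"]))        # False
```

The point of the sketch is that the qualitative judgment "same amount" precedes, and grounds, the quantitative act of counting.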

In young humans, the construction of the cognitive function of counting therefore rests on qualitative analogical reasoning, while for language models it is the quantitative accumulation of random occurrences, and their analysis, that leads to qualitatively interpretive output.

The algorithmic ability to systematize reasoned arguments rests on the fundamental properties of language, which fixes in advance the arbitrary conventions linking signifiers to signifieds. In other words, the structure of human language is fixed and stable enough to provide a qualitative calibration of sense that resists the random parameters of generative training algorithms, while remaining flexible enough to allow rich, recursive combinatorial variation.

The complexity of the initial data and the depth of statistical analysis, by simulating human neural networks and learning through conditioning, provoke a qualitative emergence beyond a threshold of comprehensiveness that depends on the training data and the user’s prompts.

This threshold of comprehensiveness is also vulnerable to numerous biases, disseminated through AI models via quantitative distortions: budgetary objectives, stereotypes frequent within a given socio-cultural population, and commercial uses ill-suited to the deeply linguistic nature of conversational agents.

Artificial intelligence is above all a tool for rewriting, analogical combination, and reasoned evaluation; it is a form of qualitative assistance that is neither intelligent in itself nor radically different from “natural” human cognitive processes. Cultural belonging is an essential human need, and language models answer that natural need through textual, graphic, audio, and video generation. In this sense, these programs belong to a drive to systematize human cultural production that is not really artificial, but rather virtual, or potential.

Language models will be tools for comparing case-specific arguments against a corpus of case law, or for combining multiple debates and so increasing our capacity for collective decision-making, but they will not be infallible judges or kings by divine right.

Rather, qualitative assistance with potentially comprehensive coverage will be a tool of coordination and competition for human actors, whose intellectual development will be relieved of the tedious work of systematic compilation and of doctrinal or jurisprudential verification.

Applied to the legal field, these technologies will carry a positivist vision of the law, given how deeply case analysis is woven into the training algorithms. This reinforcement, operating through systematized self-criticism and user feedback, requires precisely the transparency and neutrality facilitated by contributory, participatory models, such as the sociocratic approach in politics or Web 2.0 in the field of networks.