Language and AI: Guy Emerson on using AI with language data

This article is shared from Cambridge Language Sciences.

It's a funny time to be working on language and AI. A year ago, this might have seemed like a niche interest, but these days ‘large language model’ is practically a household term. I recently had the chance to speak about language and AI at an event run by the Centre for Science and Policy, and I thought I would write a summary, because it seems a lot of people are interested in learning about how AI can be used with language data.

Different practices of AI

It might help to start with a more general question: what is AI? In his book ‘Artificial Dreams’, Hamid Ekbia contrasts three different things that are referred to as ‘AI’. One is AI as an engineering practice: building computational systems to automatically carry out tasks. Another is AI as a scientific practice: developing computational models of intelligent human behaviour. But these two practices pull in quite different directions, leaving a gap that is filled by AI as a discursive practice: using psychological terms to describe engineered systems.

This discursive practice can be misleading, particularly when it comes to language. Normally, the only kind of language we come across is language produced by humans. This means we naturally view text in human terms, leading to a temptation to anthropomorphise systems that output text. This is a strong psychological effect (including for AI researchers), and it gives false intuitions about the kinds of mistakes such a system would make. The dangers of anthropomorphising ‘AI’ have been discussed by many researchers, including Melanie Mitchell and Murray Shanahan. In many ways, the two of them have quite different perspectives, but they agree on the importance of talking about AI systems in plain language.

So, in plain language, what is the engineering practice of AI? The dominant paradigm is machine learning, which involves training a model on data to perform a task. Why do we want to do this? That depends on whether we care about the data, the task, or the model. Better understanding the data is the usual goal in the digital humanities and computational social science. High performance at the task is the usual goal in machine translation and indeed most of NLP (Natural Language Processing). Finally, using a machine learning model as part of a scientific model is the goal in areas like computational linguistics and computational neuroscience.

Focus on goals, not tools

Being clear about the goal makes it easier to cut through the hype. Vague assertions like a model being ‘powerful’ are meaningless unless we can clarify what functionality the system is intended to have, and how that functionality has been evaluated. In this light, large language models (LLMs) can look suspicious. If our focus is on understanding the data, we are faced with an opaque relationship between the LLM and the data it was trained on. If our focus is on a task, we first need to be clear about what the task is, and what would constitute good performance. Finally, if our focus is on cognitive modelling, a model is implausible if it's trained on so much text that a person would need multiple lifetimes to read it all.

That's at a high level, but what about specific applications of LLMs? One of the most talked-about applications is search. However, if we think about what users want to achieve, and how we could design a system for those user-oriented goals, then we're unlikely to want a system that regularly generates false information – which is exactly what LLM-powered chatbots will do.

There has already been a case of a lawyer unwittingly using ChatGPT as a ‘super search engine’, only to be presented with entirely fabricated results. The lawyer was rightly sanctioned for not doing their due diligence, but the fact remains that a search engine that fabricates results is not fit for purpose. For an in-depth look at the many facets of search, and how we might design intelligent and effective systems, I can recommend the recent survey paper by Chirag Shah and Emily M. Bender, ‘Situating Search’.

Leaving the hype and speculation to one side, there is still plenty of exciting work on language and AI being carried out in Cambridge, across all three kinds of goal mentioned above, and I'd like to share a few examples with you.

Understanding data – new opportunities to work at scale

With understanding data as the goal, NLP tools are opening up opportunities to work at a scale that would be otherwise infeasible. A great example comes from the Africa's Voices Foundation, a non-profit spun out of the University of Cambridge, where I had the opportunity to work on a project a few years ago. Africa's Voices have pioneered a methodology for citizen engagement using interactive radio, where listeners are asked for their views and invited to respond via SMS. The sheer quantity of responses means that tools for organising and exploring the data are invaluable. The process ultimately relies on human interpretation, but the technology facilitates working at scale.

A second example was highlighted in a keynote at our Annual Symposium in 2020. Ruth and Sebastian Ahnert analysed networks of correspondence in the Tudor period, looking at 130,000 letters between 20,000 people archived in the British State Papers. To make sense of so much data, they used tools from network science to understand the macroscopic structure of the network. This can indicate who is playing an important role in the network, and which letters might merit closer reading. For example, they identified well-connected women who have been largely overlooked in traditional accounts. A book on their findings, ‘Tudor Networks of Power’, is coming out in a couple of months.
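To give a flavour of what a network-based view of correspondence involves, here is a minimal sketch in Python. It is not the Ahnerts' method: the letters are invented placeholders, the function names are my own, and counting distinct correspondents (degree) is only the simplest of many possible measures of how well connected someone is.

```python
from collections import defaultdict

# Invented toy data: (sender, recipient) pairs, NOT drawn from the State Papers.
letters = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "D"),
    ("D", "E"),
    ("C", "E"),
]


def build_network(pairs):
    """Build an undirected graph mapping each person to their set of correspondents."""
    neighbours = defaultdict(set)
    for sender, recipient in pairs:
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return neighbours


def degree_ranking(neighbours):
    """Rank people by number of distinct correspondents, highest first (ties alphabetical)."""
    return sorted(neighbours, key=lambda person: (-len(neighbours[person]), person))


network = build_network(letters)
ranking = degree_ranking(network)
print(ranking[0], len(network[ranking[0]]))
```

In this toy network, person C has the most correspondents and would be a natural starting point for closer reading. With real archives, the interesting cases are often people whom a simple count like this would miss, which is one reason more sophisticated measures are used.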

Automating tasks – progress and challenges

Turning to task-based goals, a classic example is machine translation. Unlike the data-focused examples we've just looked at, we're not interested in the original training data, but rather in how the model performs when applied to new data. Furthermore, we don't need the model to carry out the task in the way a human would – just as a bike is biologically unnatural but still useful for human transport, a machine translation system might be a poor cognitive model but still useful for human communication.

There has been dramatic progress in machine translation in the past few years, with online systems becoming increasingly reliable for an increasing range of use cases. However, there are still plenty of outstanding challenges. The amount of ongoing research is enormous, but I can share a couple of examples of work in Cambridge. On the more technical side, Stahlberg and Byrne have found that current models are forced to strike a delicate balance between different kinds of error, and there is unlikely to be a simple solution. On the more practical side, machine learning models are prone to exaggerating any biases in the training data, which can lead to systematic errors. To mitigate gender-biased mistranslations, Saunders et al. have developed a method for constraining a system to guide it towards appropriately gendered outputs.

Cognitive modelling – when the model itself is the goal

Last but not least, we have models as the goal. This includes my own research on distributional semantics, where the aim is to model how people can learn the meanings of words from the contexts in which they appear. For example, if you don't know the word ‘kueh’ but you hear me say ‘we served kueh to our guests’, you might immediately have a strong intuition about the meaning (perhaps something edible and delicious?), just based on that one short sentence. Modelling this process at a realistic scale requires computational tools. In just one year, a person will hear or read roughly ten million words, which is too much text to process by hand (but still a small amount from the perspective of large language models). If you want to read more, I've written a survey paper on distributional semantics, highlighting how there is often a trade-off between linguistic expressiveness and computational tractability.
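The basic intuition can be sketched in a few lines of Python. This is a deliberately crude illustration, not the kind of model used in my research: the corpus is invented, the window size of two words is an arbitrary assumption, and the function names are my own. Each word is represented by counts of the words that appear near it, and two words are compared using the cosine similarity of those counts.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus; 'kueh' is the unfamiliar word from the example above.
corpus = [
    "we served kueh to our guests",
    "we served cake to our guests",
    "we served tea to our friends",
    "the dog chased the cat",
]


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(corpus)
sim_cake = cosine(vectors["kueh"], vectors["cake"])
sim_dog = cosine(vectors["kueh"], vectors["dog"])
print(sim_cake > sim_dog)
```

Because ‘kueh’ and ‘cake’ appear in the same contexts in this toy corpus, they come out as more similar to each other than ‘kueh’ and ‘dog’. Simple counts like these are computationally cheap but linguistically very limited, which is one side of the trade-off mentioned above.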

As a final example, which we featured on our website a few months ago, Burridge and Vaux developed a model of speech processing, which converts a low-level audio signal to high-level features. For example, vowels can be described in terms of the tongue position (high or low, front or back) but this is not immediately visible in the audio signal. They trained a machine learning model to do this conversion, improving over more traditional signal processing methods, and closely matching human perception. As an example application of the model, they looked at dialect variation in American English. According to dialect surveys, the words ‘merry’, ‘marry’, and ‘Mary’ are pronounced identically across most of the US, except for an area around New York. Applying the trained system to recordings of speakers from New York and elsewhere, the contrast could be clearly seen.

So that was a whistle-stop tour of language and AI, encompassing three kinds of goal: the data, the task, or the model. I hope I've also shared a sense of how varied the research is – all of which is welcome at Cambridge Language Sciences!