
Section C

Theoretical and Applied Linguistics

 

LI18: Computational Linguistics

This paper is available for the academic year 2022-23.

This paper introduces computational linguistics, covering the fundamental techniques used to model linguistic phenomena computationally at the levels of morphology, syntax, semantics and pragmatics. Students are taught how these techniques are implemented, evaluated and applied to natural language processing (NLP) tasks. The course also surveys how the techniques are used in practice and introduces several applications (e.g. machine translation, sentiment analysis and dialogue systems). By the end of the course, students will understand basic computational linguistics techniques, along with their limitations and current performance levels when applied to linguistic research and to real-world tasks.

The course will follow the main textbook used for Computational Linguistics worldwide: Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (2008, Second Edition, Prentice-Hall). This book will be accessible to all those taking the paper. More specialised reading is listed in each chapter of the book. These and other relevant readings will be introduced to students during the lectures. Relevant readings are freely available on the Web (and will be downloadable as PDF documents). Additionally, we will draw on updated topics introduced in the new draft version of Jurafsky and Martin, available online at https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf. Material for these topics will be summarised in the lecture notes.

Aims

  • To introduce the fundamental techniques of natural language processing (NLP)
  • To develop an understanding of the possibilities and limitations of those techniques
  • To understand the framework within which NLP continues to develop
  • To gain insights into current and future applications
  • To develop practical skills for solving NLP problems

Scope

  • Focus on basic natural language processing techniques at the levels of morphology, syntax, semantics and pragmatics
  • Focus on text (rather than speech) processing
  • No prerequisite courses in computational linguistics or computer science are required. This is an entry-level course accessible to any undergraduate student in linguistics, and it does not require any prior programming skills.

Topics:

Proposed lecture schedule/topics to be covered:

Michaelmas Term

1. Introduction: broad overview of NLP research, language models, complexity of language applications
2. Regular expressions, text normalization and edit distance
3. Finite state techniques
4. N-gram language models
5. Naïve Bayes and sentiment classification 
6. Sequence labelling for part of speech and named entities
7. Constituency grammars and treebanks
8. Constituency parsing

Lent Term

9. Compositional semantics
10. Distributional semantics
11. Neural networks and neural language models
12. Word senses and WordNet
13. Computational discourse
14. Dialogue systems and chatbots
15. Information extraction and question answering
16. Machine translation

Preparatory reading: 

Daniel Jurafsky and James Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edn, available online: https://www.cs.colorado.edu/~martin/slp.html

Teaching and learning: 

The first part of the course provides an introduction and covers morphological and syntactic processing of language. The second part focuses on computational semantics and pragmatics, and introduces some well-known NLP applications.

There are sixteen one-hour lectures in total, eight in Michaelmas Term and eight in Lent Term. You will also have seven supervisions, normally three during Michaelmas Term, three in Lent Term and one in Easter Term. Additionally, there are six two-hour Python practical labs held through Michaelmas and Lent.

The paper's Moodle site can be found here.

Assessment: 

80% of the assessment is based on a take-home written examination; the remaining 20% is based on the Python for Computational Linguists lab, which is assessed in week 7 of both Michaelmas and Lent terms.

Course Contacts: 
Dr Nigel Collier