This paper is available for the academic year 2024-25.
This paper provides an introduction to computational linguistics, covering the fundamental techniques that can be used to model linguistic phenomena computationally at the levels of morphology, syntax, semantics and pragmatics. Students are taught how these techniques are implemented, evaluated and applied to natural language processing (NLP) tasks. The paper also gives an overview of how the techniques are used in practice, along with an introduction to several applications (e.g. machine translation, sentiment analysis and dialogue systems). By the end of the course, students will understand basic computational linguistics techniques, as well as their limitations and current performance levels when applied to linguistic research and to real-world tasks.
The course will follow the main textbook used for computational linguistics worldwide: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (2nd edn, Prentice Hall, 2008). The book will be accessible to all those taking the paper. More specialised reading is listed in each chapter of the book; these and other relevant readings will be introduced during the lectures. Relevant readings are freely available on the Web as downloadable PDF documents. Additionally, we will draw on updated topics introduced in the new draft edition of Jurafsky and Martin, available online at https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf. Material from the draft edition will be summarised in the lecture notes.
Aims
- To introduce the fundamental techniques of natural language processing (NLP)
- To develop an understanding of the possibilities and limitations of those techniques
- To understand the framework within which NLP continues to develop
- To gain insights into current and future applications
- To develop practical skills for solving NLP problems
Scope
- Focus on basic natural language processing techniques at the levels of morphology, syntax, semantics and pragmatics
- Focus on text (rather than speech) processing
- No prerequisite courses in computational linguistics or computer science are required. The course is entry-level, accessible to any undergraduate student in linguistics, and does not require any prior programming experience.
Proposed lecture schedule/topics to be covered:
Michaelmas Term
1. Introduction: broad overview of NLP research, language models, complexity of language applications
2. Regular expressions, text normalization and edit distance
3. Finite state techniques
4. N-gram language models
5. Naïve Bayes and sentiment classification
6. Sequence labelling for parts of speech and named entities
7. Constituency grammars and treebanks
8. Constituency parsing
Lent Term
9. Compositional semantics
10. Distributional semantics
11. Neural networks and neural language models
12. Word senses and WordNet
13. Computational discourse
14. Dialogue systems and chatbots
15. Information extraction and question answering
16. Machine translation
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edn, Prentice Hall, 2008. Available online: https://www.cs.colorado.edu/~martin/slp.html
The first part (Michaelmas Term) will introduce the course and cover morphological and syntactic processing of language. The second part (Lent Term) will focus on computational semantics and pragmatics, and introduce some well-known NLP applications.
There are sixteen one-hour lectures in total, eight in Michaelmas Term and eight in Lent Term. You will also have seven supervisions, normally three in Michaelmas Term, three in Lent Term and one in Easter Term. Additionally, there are six two-hour Python practical labs held during Michaelmas and Lent Terms.
The paper has a dedicated Moodle site.
Assessment is by a combination of an in-person assessment and a practical tasks assessment:
(i) In-person three-hour assessment (80%)
(ii) Practical tasks assessment (20%): Practical (laboratory) tasks involve submitting a Python script and explaining the answers to an examiner at a sign-up session held in Week 7 of Michaelmas and Lent Terms. Students will have 20 minutes to demonstrate and explain their answers, followed by a further 5 minutes for feedback.
Dr Nigel Collier