LI0056: Using Computers for Linguistics

(Also known as COMPUTATIONAL LINGUISTICS 1 Half course,
RULE-WRITING FOR SYNTAX AND MORPHOLOGY,
and USE OF COMPUTER CORPORA)

Prerequisites for the course

No prior programming expertise is required. Familiarity with basic UNIX commands, X-windows, an editor (such as EMACS) and email will be indispensable. Competence in these basics can be acquired through (a) the department's own computer induction course early in Term I, and (b) practice.

Structure of the course

The course has a practical and a theoretical component.

Practical component

The practical work of this course is designed to familiarize linguists with some of the computational tools that are available for help in conducting linguistic research. The tools surveyed here fall into several (sub)categories:

We will spend about three weeks each on GDE, KIMMO, and corpora. For the first two of these, students will be expected to compose working sets of rules which cover interesting linguistic data (of their own choice, to be negotiated with Prof. Hurford). These rule-sets, and the accompanying commentary, will constitute part of the assessable work of the course (see below). The work on corpora will explore various available corpora and discuss how they might be used. For the section on corpora, the following book will be found useful:
McEnery, Tom, and Andrew Wilson, 1996, Corpus Linguistics, Edinburgh Textbooks in Empirical Linguistics, Edinburgh University Press.

Theoretical component

This will consist of weekly lectures by Prof. Hurford, presenting a survey of mechanical techniques for parsing natural language sentences, in relation to the kinds of grammars specified by linguists. The survey will describe, in terms suitable for non-programmers, such concepts as bottom-up and top-down, deterministic and nondeterministic techniques, and will cover finite state machines, recursive transition networks, augmented transition networks, definite clause grammars, and chart parsers. There will also be substantial discussion of the application of these computational techniques to tasks such as speech recognition, morphological analysis, machine translation, and automatic question answering. Reading for this part of the course will be assigned from week to week.
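
As a rough illustration of what "top-down" and "nondeterministic" mean in this context, the following short Python sketch recognizes sentences of a toy grammar by starting from the symbol S and trying each alternative rule in turn, backtracking when one fails. The grammar, the vocabulary, and the function names (expand, recognize) are hypothetical and invented for this example; they are not course materials, and the course itself requires no programming.

```python
# Toy context-free grammar (an assumption made up for this illustration).
# Each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "Name": [["Kim"]],
    "V": [["saw"], ["slept"]],
}


def expand(symbols, words):
    """Return True if the symbol sequence can derive exactly the word list."""
    if not symbols:
        # Success only if every word has been consumed.
        return not words
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Top-down step: rewrite the leftmost nonterminal.
        # Nondeterminism: try every alternative, backtracking on failure.
        return any(expand(alt + rest, words) for alt in GRAMMAR[first])
    # Terminal symbol: it must match the next word of the input.
    return bool(words) and words[0] == first and expand(rest, words[1:])


def recognize(sentence):
    """Recognize a whitespace-separated sentence, starting from S."""
    return expand(["S"], sentence.split())
```

With this grammar, recognize("the dog saw Kim") succeeds, while recognize("the saw dog") fails. The sketch works only because the toy grammar has no left-recursive rules; a rule such as NP -> NP PP would make this naive top-down strategy loop forever, which is one of the motivations for chart parsers.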

Class Hours

To be arranged.

Assessment

Assessment will be by two small practical projects and an essay, each worth one third of the assessment for the course:

  1. A set of GDE rules, for some interesting linguistic data, with accompanying commentary. This set of rules should cover about a page or two of A4 paper, and the commentary should likewise run to a page or two.

  2. A set of KIMMO rules, for some interesting linguistic data, with accompanying commentary. As with the GDE exercise, the rules should cover about a page or two of A4 paper, and the commentary should likewise run to a page or two.

  3. An essay of about 2000 words (precise topic to be negotiated) dealing either with some aspect of corpus linguistics or with the theoretical component of the course.

The deadlines for these pieces of work will be:

  1. GDE exercise: end of Week 4.
  2. KIMMO exercise: end of Week 7.
  3. Essay: end of Term I.