LING 073 / CPSC 013 — Spring 2021
Computational Linguistics

Professor: Jonathan North Washington
Office: Pearson 105
Office phone: x6134
Office hours: M 13:30-15:00 & by appointment (message me on Google Chat/Hangouts for Zoom link)
Meeting time: TTh 14:00-15:15
Alternative lab hours: T 21:45-23:00, F 10:40-11:55
Meeting modality: Online synchronous
Classroom: Gather (see Moodle for meeting URL)
Course website:
Course wiki:
IRC channel:
Course Piazza site: LING 073
Course Moodle site: LING073-01-S21


This course is designed to give students an understanding of the main concepts in the field of Computational Linguistics (as distinguished from Natural Language Processing), and impart the skills needed to solve the types of problems encountered in this field. Here's the official description:

This course explores the possibilities for creating computational resources for languages for which vast collections of text don't exist. Students will choose a language lacking in computational resources and develop tools for it. The focus will be on creating nuanced symbolic representations of the language that can be employed by computers, to the benefit of both language researchers who wish to test grammatical models, and language communities which lack the social capital to benefit from corporately developed resources. Topics covered include input methods and spell-checking, morphological analysis and disambiguation, syntactic parsing, building corpora, and rule-based machine translation, with an emphasis on open source technologies.
Prerequisites: LING 001 (or equivalent), or CPSC 021 (or equivalent), or permission of the instructor.

The primary goal of the course is for students to learn to develop computational models for languages lacking in existing resources. Additionally, students will:

Towards these goals, and related to particular course activities, students will

The general structure of the course will be centred around student projects. At the beginning of the course, each student will choose an under-resourced language to work on (in consultation with the professor), and will spend the semester developing materials for the language as lab assignments. In general, we will spend two days on each topic (Tuesday and Thursday), where the first day (Tuesday) will be more focussed on discussing the topic (overview, general issues and solutions, etc.) and the second day (Thursday) will be a lab day dedicated to guided lab work on the problem. The week's assignment will generally be due after the lab day (by the end of the day Friday), so the lab day provides an opportunity to get started on the lab and get assistance from the professor or course assistant on difficult areas.

This course has a prerequisite of LING 001 (or equivalent), or CPSC 021 (or equivalent), or the permission of the instructor. Any background beyond the introductory level in either computer science or linguistics (or both!) will give students an advantage, but nothing beyond a previous intro to at least one of them is necessary. All required skills will be imparted throughout the course. There will be no conventional programming required of students, but we will be using command-line tools and several different types of declarative syntax. No previous knowledge of linguistics is required for students with CS background, but a focus of the course will be coming to understand linguistic phenomena by implementing models of them computationally. The challenge of the course is less about learning the computational formalisms or understanding the patterns in the languages from a formal linguistics perspective (though skills in both fields will be strengthened by the course), and more in learning to use the formalisms to implement models of the linguistic patterns computationally. It is expected that some students will grasp these different aspects of the course with different levels of ease, which provides a great opportunity for students to share knowledge and skills with one another.

Required Materials

No textbook is required, but you will need to have access to the following resources:

We'll be using Gather (listed above, and linked to from the Moodle course) for our meetings.

We will be using Swarthmore GitHub for a number of purposes. Most assignments will be submitted by a script that automatically clones the relevant repo as class begins.

Also, you'll need to be able to access Moodle. Some materials we use for the course will be available there (readings, etc.), as will your grades, so make sure you can access it as soon as possible. If you have any trouble with it, notify me as soon as you can. Non-Swarthmore TriCo students may not have access to Moodle immediately at the beginning of the semester; let me know if this is the case for you, and I will make sure you have access to resources in some other way.

We'll be using Piazza (listed above, and linked to from the Moodle course) for responding to readings. You can also use it as a way to ask questions about assignments and other course content.

The course website (listed above, and linked to from the Moodle course) contains the schedule for the semester, which will be updated regularly with links to various resources. It's recommended to check the website at least a couple of times per week. I will make announcements about any major changes.

The course wiki will be where we organise the resources developed in the course. It is there not just for your professor and classmates, but for anyone in the world to access, and so may end up attracting the attention of speakers of the languages or other people interested in them. It will also be a model for future students of this course to look to.

Office Hours

I hold regular office hours (listed above), and can be available at other times by appointment—just send me an e-mail letting me know when you might prefer to meet.

If you are having any trouble with class, such as with understanding a concept or completing an assignment, please don't hesitate to ask me for help. I'm here to help you learn, so I encourage you to take advantage of my availability.

But even if you're not having trouble, it never hurts to come to office hours from time to time. We can discuss course content, ideas for the final project, or whatever's on your mind in a more relaxed atmosphere.

Lab hours

You are required to attend at least one lab session, either at the scheduled time (usually a Thursday) or one of the alternative times (listed above). The course schedule says "LAB" for the lab days. Please let me know whether you plan to attend that meeting or one of the other times.


Course etiquette

Show up on time and silence cell phones. Food and drinks are generally not allowed in lab, per the policies for the room. However, I don't mind as long as you don't damage the equipment or disturb your classmates. If you need to step out of the class for any reason (bathroom, emergency phone call, etc.), please do so with minimum disruption (i.e., don't ask for permission).

Due to the nature of the course, we will be using computers in almost every class. This brings about the potential for a number of distractions, so please use the computer only for relevant classroom activities. In other words, please refrain from any sort of non-class-related activities, including messaging (e-mail, social media, etc.), homework for other courses, or even catching up on course reading. Even the best multitaskers are still not participating fully when they're engaging in unrelated endeavours. If it's too difficult to avoid the temptation of these other distractions, you may try strategies like disabling the computer's internet connection, using a filter for web usage, or similar.

Note on pronouns: if you'd like to be referred to by a pronoun that you think I might not guess correctly or if you notice me referring to you by some other pronoun than what you'd prefer, please let me know so that I can get it right.

Class material

All material covered during course-related activities—including assigned readings, quizzes, and labs—should be assumed to be required course content, and will be assumed background for later activities. It is each student's responsibility to attend all classes to learn the material covered. If you must miss a class (e.g., for an athletic or religious reason), it is courteous to notify your professor ahead of time if at all possible, but it will be your responsibility to learn about missed material from classmates. It is not my responsibility to make up for your absence or re-teach the material. (That said, let me know if you're having trouble making something up, and we'll figure something out.) With so few class meetings dedicated to each topic and the cumulative nature of the topics, missing one day can be a very big deal—so I really recommend trying not to miss class.

The assigned readings are to be read in advance of the class dates they're assigned for. The readings complement in-class activities and provide the necessary background; however, you should not assume that they will be fully summarized or reviewed in class. Students should be prepared to evaluate, integrate, or respond to the readings in class discussions.

Any excuse for missing any course-related activities will need to be handled by your class dean. Please see the Medical Excuse Policy, and remember to contact your class dean as soon as you can so that they can work with you.

Turning in assignments on time

Assignments will generally be due at the beginning of class on Thursdays. Work on the assignment must be complete in order to move on to the next topic, so it is essential that assignments be submitted on time.

You will submit assignments almost exclusively on GitHub and the course wiki (each assignment will say explicitly how to submit it), both of which keep timestamps. Both methods also allow for incremental submissions, so you may commit and push (GitHub) and save your work (wiki) often as you work on it. This means both that I can see exactly what was there at the deadline and that partial work may be there as of the deadline.

Any work submitted between the deadline and when the assignments are graded (usually not before the next day) will receive only half credit—e.g., if you submit about 75% of the assignment before the deadline and 100% of the assignment is there when it is graded, you can at most receive 87.5% on the assignment.
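
The half-credit rule above amounts to simple arithmetic. Here is a minimal sketch in Python, assuming that both fractions are measured against the full assignment and that late work counts at exactly half weight. The function name `max_late_credit` is hypothetical and not part of any course tooling.

```python
def max_late_credit(on_time: float, at_grading: float) -> float:
    """Upper bound on assignment credit under the half-credit rule.

    on_time    -- fraction of the assignment submitted before the deadline (0 to 1)
    at_grading -- fraction present when the assignment is graded (0 to 1)
    """
    if not 0.0 <= on_time <= at_grading <= 1.0:
        raise ValueError("expected 0 <= on_time <= at_grading <= 1")
    late = at_grading - on_time   # work that arrived after the deadline
    return on_time + 0.5 * late   # late work earns only half credit


# The example from the text: 75% on time, 100% present at grading.
print(max_late_credit(0.75, 1.0))  # 0.875
```

This is only an upper bound: the actual grade also depends on the correctness of what was submitted.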

Academic Integrity

Using words or ideas from another source without attribution constitutes plagiarism, and misrepresenting another student's work as your own (or allowing another student to misrepresent your work as their own) is cheating. Please see the student handbook for the College's policies on academic misconduct. Suspected cases of academic misconduct will be pursued to the full extent of College policy, including referral to the College Judicial Committee.

You are always expected to do your own work on assignments. You may (and are encouraged to) ask one another for and provide one another with assistance on assignments. If you are providing assistance, you must not provide the solution—you may only provide guidance that will help the other student(s) find the solution on their own. If the work in this course were a real-world FOSS project, providing the solution would be okay (and perhaps even encouraged), but the requirement that each student be evaluated on their own work is incompatible with this model (at least on the surface).

With every assignment you should include an AUTHORS file in the top directory, just like you might find in an open source project. If you receive assistance on any assignment from anywhere (a classmate, a website, a native speaker of the language, a stranger on the internet, etc.), please acknowledge them in the AUTHORS file.

In some instances you may work with your classmates. For lab assignments where you are working in a group or with a partner, you may divide the work as appropriate, within the parameters of the assignment. Ideally this means team-coding, where each person takes turns at the keyboard, with alternation also between who's committing the code. You may also discuss generalities of lab assignments with your classmates, such as what is expected from you. And of course, any discussion of course materials is strongly encouraged.

In short, submitting work that is not your own or providing a classmate with a solution will be considered academic misconduct and will be addressed as such (see above-mentioned policies). So please just be honest. And if you have any questions about what's considered acceptable, ask me first.

Online teaching considerations

Engagement. I expect everyone to engage with the course (see "Engagement" below), but I recognise that engagement will look different for every student. This is no different from how I conduct teaching in person (only the range of what engagement will look like is different), but it's worth mentioning explicitly here. See also the section on course etiquette above.

Social time. The course meeting platform should allow students to join any time, even when I am not present. I encourage everyone to arrive a few minutes early if possible to just hang out and get to know your classmates better. Furthermore, the meeting is recurring (meaning we'll use the same link for all our meetings this semester), so you may use the meeting any time outside of class time as well, e.g. to discuss assignments with one another or similar. At the end of each class, I will also wait to leave the meeting until everyone else has left. This is to encourage you to stick around if there's anything you'd like to talk about.

Communication during class. The Gather chat will be available for use during class. We may leverage it for certain purposes, but I don't expect most of our interactions to take place through that modality. If you'd like to speak and can't find a moment to interject into whatever is going on, please raise your hand physically so that I can see in the video, or use the Gather hand-raising feature. I may not notice the latter, so you may also simply interrupt if needed. There will probably be a lot of awkwardness around these issues, and that's okay.

Privacy: cameras. No one is required to turn on their camera if they don't want to. I do hope that most of you will become comfortable turning on your cameras in most environments, and encourage you to build your comfort with this if you haven't yet. It is especially okay to disable your camera if the class is being recorded.

Feedback. There's a form on Moodle for anonymous feedback on the course. I encourage everyone to periodically consider what's going well and what they would like to see changed about the course and let me know via the form.

Office hours. To join regular office hours, please first message me at my Swarthmore email address on Google Hangouts or Google Chat, and I'll hop into the virtual classroom. Consider this the equivalent of knocking on my office door. I won't be sitting on Gather waiting for students to join the meeting, but I will be available and will get a notification if you message me on one of those services. We can also conduct the entire conversation through chat if you'd like. This is an option outside of regular office hours too, though I cannot guarantee an immediate response at other times. And as always, if regular office hours are not convenient, I'm happy to schedule a meeting at another time—just send me a message (by email or one of the chat services) and let me know what might be convenient for you.


Most of your assignments will be graded with a fine-grained measure of completion and correctness based on normal letter grades and grade points (A = 4.0, B = 3.0, C = 2.0, D = 1.0, and F = 0.0), with the standard modifiers + (one-third of a grade point higher) and - (one-third of a grade point lower). In addition, intermediate grades using parentheses or a slash may be used, giving the following correspondence between letter grade and grade points:


What you will be graded on for each assignment varies in specifics, but generally it will include mastery of the relevant linguistic generalisations and interaction with the language and existing resources, mastery of the computational formalism(s), evaluation of what you did, organisation and cleanliness of your code, documentation (both on the wiki and e.g. in your README), and completion of the assignment.

Course Grade Components

The grade in this course is broken down into the following components. Each component is expounded upon following the table.
Lab assignments: 70%
Midterm demonstrations: 5%
Final project: 10%
Engagement: 15%
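
One way to read the breakdown above is as a weighted average of component grades on the 4.0 grade-point scale. Here is a minimal sketch in Python under that assumption. The syllabus does not state the exact computation, the Engagement weight of 15% is taken from that component's section heading further down, and the names `WEIGHTS` and `course_grade_points` are hypothetical.

```python
# Component weights; assumed to be applied as a simple weighted average.
WEIGHTS = {
    "lab_assignments": 0.70,
    "midterm_demonstrations": 0.05,
    "final_project": 0.10,
    "engagement": 0.15,
}


def course_grade_points(component_points: dict) -> float:
    """Weighted average of per-component grade points (0.0 to 4.0 scale)."""
    if set(component_points) != set(WEIGHTS):
        raise ValueError("expected exactly one grade per component")
    return sum(WEIGHTS[name] * component_points[name] for name in WEIGHTS)


# Illustrative (made-up) grades: a B on labs and an A on everything else.
example = {
    "lab_assignments": 3.0,
    "midterm_demonstrations": 4.0,
    "final_project": 4.0,
    "engagement": 4.0,
}
print(round(course_grade_points(example), 2))  # 3.3
```

Because labs carry 70% of the weight, the lab grade dominates the result in this reading.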

Lab assignments (70%)

Lab assignments will be due nearly every week of class. Each assignment will be a new tool (or analysis) for the language you are working with throughout the course.

Usually at least one class session will be dedicated to working on the assignment, so you can get a head start on it, and work through any problems that might come up during the assignment.

Some labs may not be entirely applicable to all languages; these labs will include an alternate option, with data for another language provided. You may only submit this alternate assignment for credit if you've consulted with the professor first. It's your responsibility to start each assignment early enough to consult with the professor in time to do the alternative assignment if your language will not work for the assignment. Such assignments will make it clear what's necessary for you to identify in the language, and the professor is available to help you figure out how your language fits the requirements.

Midterm demonstrations (5%)

Your midterm demonstration will be a short presentation clearly outlining what you have developed so far in the semester, how well it performs, and some examples of one or two issues unique to your language that you find particularly interesting (whether solved yet or not). The amount of time available for this presentation will be announced ahead of time and will depend on how many students are in the class—it will probably be very short (on the order of a couple of minutes). You'll be expected to use the time efficiently and not go over. A short question-answer section may also be included.

Final project (10%)

The final project will expand your work throughout the semester into one final domain, to be chosen from among the topics discussed over the last days of class, or another topic relevant to the course.

You should consider ahead of time what you might be interested in—that may be either interesting to work on or useful for a language community—and speak with the professor about how to approach the problem. Several options will be provided which include some guidelines for how to complete them; there will be some options both for those who are less technologically adventurous but are willing to do difficult work with a language and for those who are more technologically capable but not as interested in doing linguistic analyses. If you're not sure what might be a good idea given your background and strengths in the course up to that point, please talk with the professor.

If the project involves a translation pair, then you may collaborate in groups as you did when working on translation pairs for lab assignments. Your project should include, among other things, an evaluation component (i.e., test how well what you did works), and should be released publicly with an open source license (even if not fully useful [yet]). During our final exam time, you will give a poster presentation on your project. The content of the project and the presentation will each constitute half of your final project grade. More information on the project will be provided later in the semester.

Engagement (15%)

I do not grade on attendance, but you will be graded on engagement in the class, and this requires attendance. Beyond simply showing up and participating, you're encouraged to contribute to discussions by asking questions, answering prompts, making relevant comments, working with classmates on in-class activities, etc. You will not be ridiculed for asking even simple questions—I want to make sure everyone grasps the concepts, and many are not as straightforward as they may first seem (or as I think they are).

You are also expected to come to class prepared for discussion; this includes having completed readings and assignments due by class.

You are encouraged to engage in relevant discussion electronically as well—e.g., via Piazza and in issues posted on GitHub. The course will also have an IRC channel, which you're encouraged to be logged into when you can. This is a good way to get support from your classmates (and your professor!) outside of class. Just be sure not to share solutions to assignments!

Assigned readings for a class period are included in engagement. You should read the assigned reading, respond to the prompt on Piazza at least one day (24 hours) before the class meeting, and respond to two classmates' responses (with an attempt to make sure everyone's response is responded to at least once) by the class meeting.


If you believe you need accommodations for a disability or a chronic medical condition, please contact Student Disability Services via email to arrange an appointment to discuss your needs. As appropriate, the office will issue students with documented disabilities or medical conditions a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact Student Disability Services as soon as possible. For details about the accommodations process, visit the Student Disability Services website. You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged, in advance, through Student Disability Services.

Schedule (subject to adjustment)

Each entry gives the week and date, followed by the topics and what is due / to read (by class).

Week 1, 11 Feb
Topics: Introductions, syllabus; What (and why) is CL (and NLP)?; Environment setup

Week 2, 16 Feb
Topics: Linguistic communities; Models of development, FOSS; Resource identification; Corpus assembly
To read: Long (2007) - Chilean Mapuches in language row with Microsoft
Due: language selection

Week 2, 18 Feb

Week 3, 23 Feb
Topics: Input methods
To read: Lebedev (2004) - Where once was a comma
Due: lab 1 - documentation of resources + Initial corpus assembly

Week 3, 25 Feb
Due: lab 2 - keyboard layout (due on Friday)

Week 4, 2 Mar
Topics: Morphological typology; Grammar documentation
To read: Bird (2009) - Natural Language Processing and Linguistic Fieldwork

Week 4, 4 Mar
Due: lab 3 - Grammar documentation (due on Friday)

Week 5, 9 Mar
Topics: FSTs and morphology; Analyser evaluation
To read: Kornai (2013) - Digital Language Death

Week 5, 11 Mar
Due: lab 4 - Basic morphological analyser (due on Friday)

Week 6, 16 Mar
Topics: FSTs and phonology; Generator evaluation
To read: Pedersen (2008) - Empiricism Is Not A Matter of Faith

Week 6, 18 Mar
Due: lab 5 - Basic morphological generator (due on Friday)

Week 7, 23 Mar
Topics: Morphological disambiguation; Manual disambiguation; Disambiguator evaluation
To read: Moshagen & Trosterud (2019) - Rich Morphology, No Corpus – And We Still Made It. The Sámi Experience

Week 7, 25 Mar
Spring break!

Week 8, 30 Mar
Topics: Other applications; Corpus-based approaches
Due: lab 6 - Basic CG disambiguator

Week 8, 1 Apr
Topics: midterm project demos

Week 9, 6 Apr
Topics: Machine translation; Lexical transfer
To read: Forcada et al. (2011) - Apertium: a free/open-source platform for rule-based machine translation

Week 9, 8 Apr
Due: lab 7 - Lexical transfer (due on Friday)

Week 10, 13 Apr
Topics: Lexical selection
To read: Mahelona (2020) - Te reo Māori Speech Recognition: A Story of Community, Trust, and Sovereignty

Week 10, 15 Apr
Due: lab 8 - Lexical selection (due on Friday)

Week 11, 20 Apr
Topics: Contrastive grammars
To read: Romero (2016) - Bill Gates speaks Kʼicheeʼ! The corporatization of linguistic revitalization in Guatemala

Week 11, 22 Apr
Due: lab 9 - Contrastive grammar (due on Friday)

Week 12, 27 Apr
Topics: Structural transfer
To read: Bird (2020) - Decolonising Speech and Language Technology

Week 12, 29 Apr
Due: lab 10 - Structural transfer (due on Friday)

Week 13, 4 May
Topics: RBMT evaluation
To read: Janhunen & Gruzdeva (2016) - Bringing the orthography of an indigenous language to the digital age: The case of Nivkh in the Russian Far East

Week 13, 6 May
Due: lab 11 - Polished basic RBMT system (due on Friday)

Week 14, 20 May
Topics: Final project presentation (19:00–22:00 UTC-4)