Reflected Text Analysis beyond Linguistics
From September 9 to 13, I will be giving a class on Reflected Text Analysis beyond Linguistics, as part of the DGfS-CL fall school 2019 at the IMS at Stuttgart University. The class is also part of the CRETA Coaching.
This post serves as the course page and contains the material, the agenda, etc.
Agenda
| Day | 14:00-15:30 | | 16:00-17:30 |
|---|---|---|---|
| Monday | Introduction, Overview, Annotation | ☕ | Annotation exercise, Inter-Annotator Agreement |
| Tuesday | Machine learning overview and evaluation, algorithms | ☕ | Algorithms |
| Wednesday | Introduction to the shared task, hands-on session | ☕ | Hands-on session |
| Thursday | Excursion to the German Literature Archive, Marbach (starting at 1pm!) | | |
| Friday | Hands-on session, shared task evaluation | ☕ | What to do next, closing discussion |
Material
Participants are asked to install the following on their computers (this can be done during the first day of the class):
Python
- Python: If your computer already has Python 2, there is no need to update. If you’re installing Python from scratch, please use Python 3.
- pip: The Python package manager
- The Python libraries nltk and requests
Detailed instructions for Windows, Mac OS X and Linux can be found here (PDF file). The file test_install.py can be used to test the installation.
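Independently of test_install.py (whose contents are not reproduced here), a quick manual check could look like the following sketch. The function name check_modules is made up for this illustration; the script only reports whether the two libraries can be located by the Python interpreter and does not import or exercise them.

```python
import importlib.util
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if Python can locate it.

    check_modules is a hypothetical helper for this sketch, not part of the
    course material.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the interpreter version and the status of the two required libraries.
print("Python version:", sys.version.split()[0])
for name, found in check_modules(["nltk", "requests"]).items():
    if found:
        print(name, "found")
    else:
        print(name, "missing, try: pip install " + name)
```

If a library is reported as missing, installing it with pip and running the check again should be enough.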
Text Editor
For editing Python files, participants will need a plain text editor. We recommend the following:
Slides
Monday
- Slides
- Example annotation guidelines: STTS tag set (German parts of speech), Penn Treebank tag set (English parts of speech)
- Texts for annotation exercise: Lewis Carroll: Alice in Wonderland, chapter 11, Jules Verne: Around the World in 80 Days, chapter 13, Mary Rowlandson: Narrative of the Captivity and Restoration of Mrs. Mary Rowlandson
Tuesday
Wednesday
- Slides on shared tasks and hackatorial
- Hackatorial package: Please download the zip file and extract it into a directory on your drive. The zip file contains:
  - Data with annotated entity references (sub directory data)
  - Code for training, testing and uploading (sub directory code)
  - Resources used for feature extraction (sub directory static)
- List of implemented features
Friday
- Slides on Hackatorial evaluation
- Slides on what to do next
- Hackatorial results
Projects (for ECTS credit points)
If you’re interested in getting ECTS credit points for taking part in this class, you’ll need to conduct a small project, according to the following recipe (unless we agreed on a different plan):
- Pick a task (e.g., part of speech tagging)
- Pick a non-standard text that is not too long (e.g., a poem)
- Create a gold standard by applying the annotation guidelines for the task
- Apply an existing tool for the task
- Evaluate the tool against your annotations
- Either
  - Develop hypotheses for improving/adapting the tool, or
  - Retrain the tool on existing training data and your own corpus
- Re-evaluate it after adding your own data
- Write a brief report on this and send it to me
Your project should be finished (and the report sent to me) before October 14.
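For the evaluation step of the recipe, a minimal sketch is given below. It assumes that the gold standard and the tool output are available as two equally long lists of tags, one per token; the function name evaluate_tags and the toy tag sequences are invented for this illustration and are not taken from the course material. It computes overall accuracy as well as precision and recall per tag.

```python
from collections import Counter


def evaluate_tags(gold, predicted):
    """Compare two equally long tag sequences token by token.

    Returns the overall accuracy and a dict mapping each tag to a
    (precision, recall) pair. Hypothetical helper for this sketch.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")

    gold_counts = Counter(gold)        # how often each tag occurs in the gold standard
    pred_counts = Counter(predicted)   # how often the tool assigned each tag
    correct = Counter()                # per tag: tokens where tool and gold agree
    for g, p in zip(gold, predicted):
        if g == p:
            correct[g] += 1

    accuracy = sum(correct.values()) / len(gold) if gold else 0.0
    per_tag = {}
    for tag in sorted(set(gold) | set(predicted)):
        precision = correct[tag] / pred_counts[tag] if pred_counts[tag] else 0.0
        recall = correct[tag] / gold_counts[tag] if gold_counts[tag] else 0.0
        per_tag[tag] = (precision, recall)
    return accuracy, per_tag


# Toy example: five tokens, the tool mislabels the last noun as a verb.
gold = ["DT", "NN", "VBD", "DT", "NN"]
predicted = ["DT", "NN", "VBD", "DT", "VBD"]
accuracy, per_tag = evaluate_tags(gold, predicted)
print("accuracy:", accuracy)
for tag, (precision, recall) in per_tag.items():
    print(tag, "precision:", precision, "recall:", recall)
```

Per-tag figures are often more informative than accuracy alone on a short, non-standard text, because a single frequent tag can dominate the overall score.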