Projects:ALFA

From NLPWiki



About the project

Annotated corpora have proven useful in many applications in Natural Language Processing and in the Humanities. At present, the ALFA (Active Learning for Annotation) project focuses on part-of-speech annotation. Manually labeling each word in a text with a part-of-speech (POS) tag is expensive and tedious. Alternatively, we can train a tagging model that labels text with high accuracy, provided we have enough labeled data to train it on. But what should we do when there is not enough annotated data to train such a machine annotator? Members of the ALFA project are implementing a system, built in the framework of active learning, that needs only small amounts of data hand-annotated by human annotators. In active learning, the system asks human annotators to label the data it judges especially useful for improving its own annotation ability, while the easy cases are typically left to the machine. The learned model then improves substantially with these additional examples.
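This page does not say how ALFA decides which data is "especially useful." One widely used criterion in active learning is uncertainty sampling: ask the human about the items the current model is least sure of. The sketch below illustrates that idea under stated assumptions. It pretends the tagger already outputs a probability distribution over POS tags for each token; the names `token_entropy`, `sentence_uncertainty`, `select_query`, `split_work`, the 0.5-bit threshold, and the toy `pool` data are all hypothetical and are not taken from the ALFA system.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical stand-in for a tagger's output: for one token,
# a probability distribution over POS tags (tag -> probability).
TagDist = Dict[str, float]


def token_entropy(dist: TagDist) -> float:
    """Shannon entropy (in bits) of one token's tag distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def sentence_uncertainty(dists: List[TagDist]) -> float:
    """Average per-token entropy; higher means the model is less sure."""
    return sum(token_entropy(d) for d in dists) / len(dists)


def select_query(pool: Dict[str, List[TagDist]]) -> str:
    """Uncertainty sampling: return the id of the least certain sentence."""
    return max(pool, key=lambda sid: sentence_uncertainty(pool[sid]))


def split_work(dists: List[TagDist], threshold: float = 0.5) -> Tuple[List[int], List[int]]:
    """Split token indices into (auto-tag, ask a human) by an assumed entropy threshold."""
    auto, ask = [], []
    for i, d in enumerate(dists):
        (ask if token_entropy(d) > threshold else auto).append(i)
    return auto, ask


# Toy, made-up pool of unlabeled sentences with model tag distributions.
pool = {
    "s1": [{"DT": 0.99, "NN": 0.01}, {"NN": 0.95, "VB": 0.05}],
    "s2": [{"NN": 0.5, "VB": 0.5}, {"DT": 0.98, "NN": 0.02}],
}

chosen = select_query(pool)
print(chosen)                     # s2: its first token is a coin flip between NN and VB
print(split_work(pool[chosen]))   # ([1], [0]): the machine keeps token 1, a human gets token 0
```

In a full loop, the human's answers would be added to the training data and the tagger retrained before the next query is selected. Averaging entropy over tokens is only one design choice; summing it instead would favor longer sentences.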

Part of this project involves determining efficient ways of asking a human annotator for the correct tags. The ALFA user study addresses this question, and we need your help.

Possible applications

POS-annotated data is used in many applications, including machine translation, natural language parsing, named entity recognition, and the construction of natural language concordances.

Want to help?

We are currently annotating a corpus of English news articles, and the study is now open to participants. You can help by taking the time to annotate a set of sentences. You will be presented with one sentence at a time and will be asked to annotate either a single word or the entire sentence. We expect the average participant to spend less than an hour on the task. Thank you in advance for participating!

Begin the study now!

Get Updates

For updates on the status of the study and results from the study, please subscribe to the Google Group.

Questions?

Please contact Eric Ringger or visit the Natural Language Processing research lab in room 3346 TMCB.
