On-ramp
From NLPWiki
... to the ALFA project.
April 28, 2009
(continued on April 30, 2009)
- Topic: Welcome
- Topic: Intro. to Probability Theory
- Presenter: Eric Ringger
- Reading assignment: Manning & Schuetze 2.1, 2.2, 3, 4
- Reading assignment: Russell & Norvig 14.1-14.4
- Homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.1
- Optional homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.2
May 5, 2009
- Topic: Word Sense Disambiguation as motivation for Feature Engineering
- Presenter: Eric Ringger
- Topic: Feature Engineering Console
- Presenter: Josh Hansen
- Topic: Maximum Entropy Models
- Presenter: Peter McClanahan
- Reading assignment: M&S 7, M&S 16
- Optional reading assignment: Berger's MaxEnt tutorial
- Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_2.2
  - BUT: Use the Feature Engineering Console! http://nlp.cs.byu.edu/mediawiki-private/index.php/Feature_Engineering_Console (on the private wiki: BYU NLP only, requires authentication)
  - Write as little extra code as possible. Possible exceptions: new feature templates/extractors.
  - Work with Josh Hansen if you want to improve the FEC itself.
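For reference, here is a minimal sketch of how a feature template feeds a maximum entropy classifier, using the word sense disambiguation example. The template turns the context of a target word into feature strings, and the model scores each sense with p(y | x) = exp(Σ_k λ_k f_k(x, y)) / Z(x). The function names, the ±2-word window, and the (feature, label) weight keys are illustrative assumptions. This is not the Feature Engineering Console's API, and real weights would come from training rather than being set by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Weights are keyed by (feature string, label): an indicator feature f_k(x, y)
# fires when the feature string is present in x and the candidate label is y.
Weights = Dict[Tuple[str, str], float]


def context_features(words: Sequence[str], i: int, window: int = 2) -> List[str]:
    """Hypothetical WSD feature template: the target word plus nearby words."""
    feats = [f"target={words[i].lower()}"]
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(words):
            feats.append(f"w[{offset}]={words[j].lower()}")
    return feats


def maxent_probs(feats: Sequence[str], labels: Sequence[str],
                 weights: Weights) -> Dict[str, float]:
    """p(y | x) = exp(sum_k lambda_k * f_k(x, y)) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    # Subtracting the largest score before exp() avoids overflow; it cancels in Z(x).
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}
```

For example, `context_features(["the", "river", "bank", "flooded"], 2)` yields `["target=bank", "w[-2]=the", "w[-1]=river", "w[1]=flooded"]`. With the hand-set weight `{("w[-1]=river", "RIVER"): 2.0}`, the RIVER sense gets probability 1 / (1 + e^-2), about 0.88, against FINANCE. Writing a new template means writing another function like `context_features`.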
May 12, 2009
- Topic: Active Learning
- Presenter: Robbie Haertel
- Reading assignment: Survey of Active Learning by Burr Settles
- Homework:
  - Implement one active learning selection function
  - Reference: http://nlp.cs.byu.edu/mediawiki/index.php/Using_the_active_learner
  - Plot the learning curve for the chosen function versus random selection, using Gnuplot or Excel
  - Work with Robbie Haertel to bring the plotting code back to life
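One common selection function in the active learning literature, and one covered in the Settles survey, is uncertainty sampling with a least-confidence score. Below is a minimal sketch. It assumes the model can return a probability for each label of an unlabeled item. `least_confidence_select`, `random_select`, and the `PredictProba` interface are hypothetical names, not the interface of the lab's active learner, so adapt them to what the reference page describes.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: the model maps an unlabeled item to {label: probability}.
PredictProba = Callable[[str], Dict[str, float]]


def least_confidence_select(pool: Sequence[str], predict_proba: PredictProba,
                            batch_size: int) -> List[str]:
    """Uncertainty sampling: pick the items whose best label is least probable."""
    # Confidence = probability of the model's single most likely label.
    scored = sorted((max(predict_proba(x).values()), i) for i, x in enumerate(pool))
    # Lowest confidence first; ties fall back to pool order via the index.
    return [pool[i] for _, i in scored[:batch_size]]


def random_select(pool: Sequence[str], batch_size: int,
                  rng: random.Random) -> List[str]:
    """Random baseline for the learning-curve comparison."""
    return rng.sample(list(pool), batch_size)
```

With `probs = {"a": {"X": 0.9, "Y": 0.1}, "b": {"X": 0.55, "Y": 0.45}, "c": {"X": 0.7, "Y": 0.3}}`, calling `least_confidence_select(["a", "b", "c"], probs.__getitem__, 2)` returns `["b", "c"]`. Passing a seeded `random.Random` to the baseline keeps the random runs reproducible.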
May 19, 2009
- Topic: Sequence Labeling
- Presenter: George Busby
- Reading assignment: M&S 9, M&S 10
- Reading assignment: Paper by Toutanova & Manning on MEMMs
- Optional reading assignment: Paper on TnT by Brants
- Homework:
  - Continue the active learning experiments
  - Focus on the PNP classification task
  - Plot the mean of multiple (around 5) random runs against the number of iterations.
  - It would also be interesting to plot the variance of those random runs against the number of iterations.
  - Try the POS tagging task with a small batch size B and a number of iterations N such that N x B is approximately 300.
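Below is a minimal sketch of the averaging step for the random runs. The input format is an assumption: one accuracy-per-iteration list for each run. The sketch computes the per-iteration mean and sample variance and emits whitespace-separated columns that Gnuplot can read. Both function names are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def learning_curve_stats(runs: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-iteration (mean, sample variance) of accuracy across several runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a variance")
    if len({len(r) for r in runs}) != 1:
        raise ValueError("every run must cover the same number of iterations")
    # zip(*runs) groups the scores that all runs reached at the same iteration.
    return [(mean(col), variance(col)) for col in zip(*runs)]


def gnuplot_rows(stats: Sequence[Tuple[float, float]]) -> str:
    """Whitespace-separated columns: iteration, mean, variance."""
    lines = ["# iteration mean variance"]
    lines += [f"{i} {m:.4f} {v:.4f}" for i, (m, v) in enumerate(stats, start=1)]
    return "\n".join(lines) + "\n"
```

After saving the output to a file such as `curve.dat`, `plot "curve.dat" using 1:2 with lines` draws the mean curve. `plot "curve.dat" using 1:2:(sqrt($3)) with yerrorbars` shows one standard deviation around it.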
May 26, 2009
- Topic: Intro. to StatNLP Code-base
- Presenter: Robbie Haertel
- Reading assignment: our ALFA Paper at the LAW
- Reading assignment: Tomanek et al., "An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data"
- Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_3.1
May 31 - June 6, 2009
June 9, 2009
- Topic: Named Entity Recognition
- Presenter: Eric Ringger
- Reading assignment: Klein et al. paper from 2003: "Named Entity Recognition with Character-Level Models"
- Reading assignment: McCallum et al. paper in CoNLL 2003: "Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons"
- Reading assignment: Ratinov and Roth paper in CoNLL 2009: "Design Challenges and Misconceptions in Named Entity Recognition"
- Homework:
  - Data: CoNLL 2003 Named Entity shared task data set: http://www.cnts.ua.ac.be/conll2003/ner/
  - Baseline: dictionary look-up method on the CoNLL named entity recognition shared task data
    - The dictionary is simply the list of named entities extracted from the training set.
  - Baseline: MEMM for named entity recognition on the CoNLL data
    - Improve on this by doing error analysis and feature engineering, as you did for the POS tagging task.
  - Run both methods (dictionary look-up and MEMM) on the noisy OCR data
    - Coordinate with Thomas Packer for the noisy OCR data (esp. the labeled dev test set)
    - Private wiki site for the noisy OCR data: http://nlp.cs.byu.edu/mediawiki-private/index.php/Ancestry_dot_Com
  - Pick one 3rd-party tool (different from the tools other students pick) from the list of open source tools on Wikipedia: http://en.wikipedia.org/wiki/Named_entity_recognition
    - Prefer one of the following:
      - Stanford Named Entity tagger
      - CCG group at UIUC: Named Entity + semantic role-labeling tagger
      - Mallet from U. Mass. Amherst
    - Run the 3rd-party tool on the CoNLL data and the noisy OCR data
    - Report results
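Below is a minimal sketch of the dictionary look-up baseline. It assumes the CoNLL files have already been read into sentences of (token, NE tag) pairs. The raw files have four columns: token, POS tag, chunk tag, and NE tag. The sketch collects entity spans from the training set, keeps the most frequent type for each entity string, and tags new text greedily from left to right, preferring the longest match. The longest-match rule and the most-frequent-type rule are design assumptions, not part of the assignment. Output tags are IOB2, and the span extraction also accepts the IOB1-style `I-` starts used in the original CoNLL 2003 release.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# One sentence as (token, NE tag) pairs, e.g. [("John", "B-PER"), ("Smith", "I-PER")].
Sentence = Sequence[Tuple[str, str]]
Phrase = Tuple[str, ...]


def extract_entities(sentence: Sentence) -> List[Tuple[Phrase, str]]:
    """Collect (token tuple, entity type) spans from an IOB-tagged sentence."""
    entities: List[Tuple[Phrase, str]] = []
    tokens: List[str] = []
    etype = ""
    for token, tag in sentence:
        # A B- tag, or an I- tag whose type differs from the open span, starts a new entity.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if starts_new or tag == "O":
            if tokens:
                entities.append((tuple(tokens), etype))
            tokens, etype = ([token], tag[2:]) if starts_new else ([], "")
        else:  # I- tag continuing the open entity
            tokens.append(token)
    if tokens:
        entities.append((tuple(tokens), etype))
    return entities


def build_dictionary(training: Sequence[Sentence]) -> Dict[Phrase, str]:
    """Map every entity string seen in training to its most frequent type."""
    counts: Dict[Phrase, Counter] = defaultdict(Counter)
    for sentence in training:
        for phrase, etype in extract_entities(sentence):
            counts[phrase][etype] += 1
    return {phrase: c.most_common(1)[0][0] for phrase, c in counts.items()}


def dictionary_tag(tokens: Sequence[str], dictionary: Dict[Phrase, str]) -> List[str]:
    """Greedy left-to-right tagging that prefers the longest dictionary match."""
    max_len = max((len(p) for p in dictionary), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = dictionary.get(tuple(tokens[i:i + n]))
            if etype is not None:
                tags[i:i + n] = ["B-" + etype] + ["I-" + etype] * (n - 1)
                i += n
                break
        else:  # no dictionary phrase starts at token i
            i += 1
    return tags
```

For example, after training on one sentence containing "John Smith" (PER) and "New York" (LOC), tagging `["Mary", "met", "John", "Smith", "in", "New", "York"]` gives `["O", "O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC"]`. "Mary" stays `O` because it never appeared as an entity in training. This is exactly the weakness the MEMM baseline should address.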
June 16, 2009
- Topic: User Study and Regression Results
- Presenter: Kevin Seppi
- Reading assignment: our LREC Paper
- Reading assignment: Shilpa Arora et al.'s paper at ALNLP 2009
- Homework:
  - Reproduce the regression results from the LREC paper using R
  - Apply the methods from Arora et al.'s study to our data using SVM regression:
    - overall cost model
    - per-subject cost model
    - per-subject-type cost model
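The three cost models differ mainly in how the annotation data is grouped before a model is fit. Below is a minimal sketch. It assumes each record holds a subject id, a subject type, one numeric feature, and an observed cost; this layout is hypothetical. To stay dependency-free, it swaps in one-feature ordinary least squares for the SVM regression the homework calls for, so it illustrates only the grouping, not Arora et al.'s method.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical record layout: (subject id, subject type, feature x, observed cost y),
# e.g. x = sentence length in words and y = annotation time in seconds.
Record = Tuple[str, str, float, float]


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; a stand-in for SVM regression."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx else 0.0  # flat line if every x in the group is identical
    return my - b * mx, b


# The three granularities differ only in how records are grouped before fitting.
GROUPINGS: Dict[str, Callable[[Record], Hashable]] = {
    "overall": lambda r: "all",
    "per-subject": lambda r: r[0],
    "per-subject-type": lambda r: r[1],
}


def fit_cost_models(records: Sequence[Record],
                    grouping: str) -> Dict[Hashable, Tuple[float, float]]:
    """Fit one (intercept, slope) cost model per group."""
    groups: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    key = GROUPINGS[grouping]
    for r in records:
        groups[key(r)].append((r[2], r[3]))
    return {g: fit_line(pts) for g, pts in groups.items()}
```

To match the homework, replace `fit_line` with a support vector regressor such as scikit-learn's `sklearn.svm.SVR`, fit on a 2-D feature array for each group. The grouping logic stays the same.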
