Main Page
Welcome to the Natural Language Processing Lab of the Brigham Young University Computer Science Department.
If you are looking for a private wiki where lab members can coordinate on unbaked projects, please use the Private NLPWiki.
Overview
Members of the Natural Language Processing Lab work on text mining problems: discovering structure and patterns in large collections of documents with little or no human intervention. We are also learning to annotate lesser-studied languages to aid scholarship on documents written in those languages; our approaches include probabilistic models of structure and cost-conscious active learning methods. In particular, we use these methods to facilitate the annotation of ancient documents written in Syriac, a dying Semitic language in which many documents of early Christianity were written. We are also interested in learning new and difficult tasks from both data and expert knowledge by combining active learning, feature engineering, Bayesian models, and methods of advice-giving.
News
- Bill Lund received the award for Best Student Paper at JCDL 2009, the Joint Conference on Digital Libraries, in Austin, TX. The paper, co-authored with Eric Ringger and titled "Improving Optical Character Recognition through Efficient Multiple System Alignment", was also a finalist for the best full paper at the conference.
- Eric Ringger, Robbie Haertel, and Katrin Tomanek (Jena University) organized the NAACL HLT 2009 Workshop on Active Learning for NLP!
- Eric Ringger served as Publications Co-chair for NAACL 2009, together with Christy Doran of MITRE.
- Advanced NLP course on Text Mining for Winter 2009: CS 601R, sec. 004, Advanced Natural Language Processing: Text Mining. The course covers Text Classification, Clustering, Summarization, Topic Modeling, and Visualization.
Projects
- Active Learning for Annotation:
  - CCASH: Cost-Conscious Annotation Supervised by Humans
  - ALFA short course: on-ramp into the project
  - Syriac Corpus: Syriac morphological analysis using active learning for the construction of a labeled corpus of classical Syriac texts.
  - Data-driven morphological analysis.
- Text Mining:
  - Reducing error rates in Optical Character Recognition
  - Document clustering and cluster evaluation
  - Topic modeling
Others
- Language Identification: spoken language identification.
- Paraphrase: sentential paraphrase.
- Intelligent Newsreader: intelligent newsreader, including keyword extraction.
- MayaWiki: Robbie Haertel's MayaWiki.
- Pedagogical Software and Speech Technologies (PSST)
Technical Reports
- BYU NLP Lab Tech Report #2: "Generating Paraphrases with Greater Syntactic Variation using Syntactic Phrases"
- BYU NLP Lab Tech Report #1: "Improving Classification in Phone-Based Language Recognition with Maximum Entropy Models"
Courses
- CS601R Winter 2009 course on Text Mining: Text Classification, Clustering, Topic Modeling, and Visualization. Course Web Page
- CS601R Fall 2008 course on Topics in Statistical Machine Learning. (In practice, a 700-level readings and research course.)
- CS401R Fall 2008 course on Statistical Natural Language Processing. Course Web Page; CS401R Wiki
- This course will be offered again in Fall 2009 (September-December 2009).
People
Faculty
- Eric Ringger, Director
- Deryle Lonsdale, Linguistics
- Kevin Seppi, Machine Learning
Students
- PhD
  - Robbie Haertel
  - Dan Walker
  - Bill Lund
- MS
  - George Busby
  - Peter McClanahan
  - Aaron Davis
  - Kevin Cook
  - Paul Felt
- Linguistics MA
  - Marc Carmen
- BS
  - Josh Hansen
  - Jeremy Sandberg
  - Owen Merkling
  - Warren Lemmon
  - Joshua Lutes
  - Jeremy Schone
Alumni
- Irene Langkilde Geary
- Michael Goulding - Microsoft
- Rebecca Madsen
- Dan Su
- Thomas Packer - BYU CS Data Engineering Lab
- Mark Gulbrandsen - Amazon.com
- Rob Van Dam
- Nathan Ekstrom
- Scott Chun
- Kalli Hansen
- Adam Teichert - University of Utah
- Xingfu Wang - Chongqing University, China
- Brandon Carroll
Contact
- 3346 TMCB; Computer Science Department; Brigham Young University; Provo, Utah 84602
- Map
- Phone: 801-422-7615
Resources
- Lab Meeting: weekly lab meeting.
- Paper reading list idea: http://spreadsheets.google.com/ccc?key=r8QAzUzlc7lMm9m66Uj_XTg
- Reading Group: weekly Reading Group focusing on Bayesian approaches to NLP.
- NLP Mailing List (Archive)
- Provo, Utah BYU Computer Science Department
- Subversion
- NLP Lab Subversion Commit List
- Trac
- NLP Lab Administration List
- Upcoming NLP conferences and deadlines
Tools
List of tools installed on the NLP Lab server
