Main Page
Welcome to the Natural Language Processing Lab of the Brigham Young University Computer Science Department.
If you are looking for a private wiki where lab members can coordinate on unbaked projects, please use the Private NLPWiki.
Overview
Members of the Natural Language Processing lab work on text mining problems that involve discovering structure and patterns in large collections of documents with little or no human intervention. We also work on learning to annotate lesser-studied languages to aid scholarship on documents written in those languages; our approaches include probabilistic models of structure and cost-conscious active learning methods. In particular, we are using these methods to facilitate the annotation of ancient documents written in Syriac, a dying Semitic language in which many documents of early Christianity were written. We are also interested in learning new and difficult tasks from both data and expert knowledge in harmonious ways, using active learning, feature engineering, Bayesian models, and methods of advice-giving.
News
- Text Mining course for Winter 2010: CS 679 - Advanced Natural Language Processing. The focus is Text Mining, including Text Classification, Clustering, Summarization, Topic Modeling, and Visualization.
- Bill Lund received the award for Best Student Paper at JCDL 2009, the Joint Conference on Digital Libraries, in Austin, TX. The paper, co-authored with Eric Ringger and titled "Improving Optical Character Recognition through Efficient Multiple System Alignment", was also a finalist for the best full paper at the conference.
- Eric Ringger, Robbie Haertel, and Katrin Tomanek (Jena University) organized the NAACL HLT 2009 Workshop on Active Learning for NLP.
- Eric Ringger served as Publications Co-chair for NAACL 2009. Christy Doran of MITRE also served as Co-chair.
Projects
- Active Learning for Annotation:
- CCASH: Cost-Conscious Annotation Supervised by Humans
- ALFA short course: on-ramp into the project
- Syriac Corpus: Syriac morphological analysis using active learning for the construction of a labeled corpus of classical Syriac texts.
- Data-driven morphological analysis.
- Text Mining:
- Document clustering and Cluster evaluation
- Topic modeling
- Processing Noisy OCR Data
- Reducing error rates in Optical Character Recognition
- Recognizing names in noisy OCR data
Others
- Spoken Language Identification
- Sentential paraphrase
- Robbie Haertel's MayaWiki
- Pedagogical Software and Speech Technologies (PSST)
Technical Reports
- BYU NLP Lab Tech Report #2: "Generating Paraphrases with Greater Syntactic Variation using Syntactic Phrases"
- BYU NLP Lab Tech Report #1: "Improving Classification in Phone-Based Language Recognition with Maximum Entropy Models"
Courses
- CS 679: Winter 2010 (Jan.-April) course on Text Mining: Text Classification, Clustering, Topic Modeling, and Visualization. Course Web Page
- CS 479: Fall 2009 course on Natural Language Processing. Course Web Page
People
Faculty
- Eric Ringger, Director
- Deryle Lonsdale, Linguistics
- Kevin Seppi, Machine Learning
Students
PhD
MS
- George Busby
- Peter McClanahan
- Kevin Cook
- Paul Felt
Linguistics MA
- Marc Carmen
BS
- Josh Hansen
- Owen Merkling
- Joshua Lutes
- Jeremy Schone
Alumni
Contact
- 3346 TMCB; Computer Science Department; Brigham Young University; Provo, Utah 84602
- Map
- Phone: 801-422-7615
Resources
- Lab Meeting: weekly lab meeting.
- Paper reading list idea: http://spreadsheets.google.com/ccc?key=r8QAzUzlc7lMm9m66Uj_XTg
- NLP Mailing List (Archive)
- Provo, Utah BYU Computer Science Department
- Subversion
- NLP Lab Subversion Commit List
- Trac
- NLP Lab Administration List
- Upcoming NLP conferences and deadlines
