CS601R Main Page




CS 601R, Section 001, Winter 2008

Advanced Natural Language Processing: Text Classification, Text Clustering, and Topic Identification

Description and Objectives

Welcome to Advanced NLP! A reasonable alternative title for the course would be "Text Mining". Text mining has attracted significant interest in recent years, as enormous collections of text have become available on the web, behind firewalls on corporate networks, and on our own PCs. One side of the problem is information retrieval, epitomized by web search. Another is the selective extraction of structured nuggets of information from unstructured text. This course focuses on a third aspect of the problem: exploratory data analysis of large text collections, with particular emphasis on techniques for text classification, text clustering, and topic identification.


The learning objectives for the course are as follows:

  • acquire experience conducting exploratory data analysis on large collections of text
  • gain in-depth experience with and understanding of approaches to document classification, feature engineering, feature selection, sentiment classification, document clustering, and unsupervised topic identification
  • build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis
  • obtain experience with techniques for evaluating the results of unsupervised learning processes

In addition to teaching the concepts and techniques of statistical NLP, this course aims to help students build real tools, prepare for careers in the field, and jump into NLP research.


Course Links

Instructor: Dr. Eric Ringger

Lecture location: 241 MSRB

Lecture time: MWF 9:00-9:50pm

Weekly Schedule (including instructor and TA hours)

Syllabus

Schedule

Text: Research papers and a selected chapter or two from related texts -- see the Schedule

Announcements: See the BYU BlackBoard page for this course. Please check for announcements regularly.

Grades: On BYU BlackBoard


Project Guidelines


How To: Technical Details

Use the links in this section to get up and running with the programming assignments. Check back often: links that are not yet active will soon lead to useful content.

How to prepare your system

Get a copy of the code

Running lab code

Create a classifier

Run code on the supercomputer

Using command-line parameters

Add a command-line parameter

Train/passoff multiple models


About Data

Data splits/organization

The Data Sets


About Code

Attribution

Organization

Documentation


Questions and Answers

If you have a question, check the FAQ first; another student may already have asked it and received an answer:

FAQ


Old List of Suggested Clustering Algorithms

Clustering Algorithms
