CS601R:Project 3 Guidelines

Project #3: Document Sentiment Classification

Deadlines

  • Early: 2/20/07
  • Due Date: 2/22/07

Objectives

This assignment is designed to:

  • give another perspective on document classification through sentiment rather than topic
  • provide hands-on experience with specific feature selection methods
  • provide additional experience with classification using Maximum Entropy classifiers, Support Vector Machines, or another classifier of your choice that you think might be competitive and is amenable to feature engineering and feature selection
  • engage in the feature engineering process to identify features that are most useful for high accuracy sentiment classification (at least for our movie review data set)
  • give you a chance to compete for honor and glory in the Project #3 Hall of Fame

Setup

1. You should have a working version of the class codebase as a result of your work on the earlier projects. If necessary, consult the following directions to get going:

How to prepare your system

Get a copy of the code

2. You will also need a copy of the Movie Review data set for this assignment. Retrieve the latest copy of the data from the following URL into a directory adjacent to your code, and follow the directions given above to unpack the data directory.

Data

3. For this lab, you have two options for classification. You may use either (a) a classifier implemented in our class codebase (e.g., MaxEnt) or (b) a classifier implemented outside of our class codebase that consumes feature vectors produced by our codebase (e.g., libsvm for SVMs as in Project #2). Note that Naïve Bayes is only an option in a limited setting; see below for more details.

(a) If you choose the first option, then you can work entirely in our codebase. One build target should build the codebase, read the data, extract features, train your classifier on the training set, and run this simple classifier on the specified test set (e.g., “dev”). Out of the box, the target will simply train a MaxEnt classifier. If successful, you will see accuracy figures and a confusion matrix; this will serve as your baseline.

ant lab3 -DDATA=<path to MovieReviews> -DSPLIT=<path to MovieReviews>/indices -DTEST=dev
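
For reference, the accuracy and confusion matrix that the baseline reports can be recomputed from any pair of gold and predicted label lists. The following is a minimal Python sketch that is independent of the class codebase; the function name and the label strings ("pos", "neg", "neu") are illustrative assumptions, not names taken from the codebase.

```python
from collections import Counter

def confusion_matrix(gold, predicted, labels):
    """Return (accuracy, matrix); matrix[g][p] counts items whose gold
    label is g and whose predicted label is p."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = Counter(zip(gold, predicted))
    matrix = {g: {p: counts[(g, p)] for p in labels} for g in labels}
    correct = sum(counts[(label, label)] for label in labels)
    accuracy = correct / len(gold) if gold else 0.0
    return accuracy, matrix

# Hypothetical labels for five reviews.
gold = ["pos", "pos", "neg", "neu", "neg"]
pred = ["pos", "neg", "neg", "neu", "pos"]
accuracy, matrix = confusion_matrix(gold, pred, ["pos", "neg", "neu"])
print(f"accuracy = {accuracy:.2f}")
for gold_label, row in matrix.items():
    print(gold_label, row)
```

Reading the off-diagonal cells row by row shows which classes are being confused with which, and that is usually a better guide for error analysis than the accuracy figure alone.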

(b) If you choose the second option, then you will need to export feature vectors as was necessary in Project #2. One command on the command-line should build the codebase, read the data, extract features, and serialize the labeled feature vectors for subsequent use by a separate machine learning tool, such as libsvm.
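
As a reminder of what the serialized output should look like, libsvm reads one example per line in the sparse form "label index:value index:value ...", with integer feature indices in ascending order. The sketch below, in Python and independent of the class codebase, shows one way to produce such lines. The function name, the `grow` parameter, and the integer label mapping are assumptions made for illustration.

```python
def to_libsvm_line(label, features, feature_index, grow=True):
    """Format one labeled sparse feature vector as a libsvm line.

    label:         integer class label (the mapping from sentiment
                   classes to integers is up to you)
    features:      dict mapping feature name -> numeric value
    feature_index: dict mapping feature name -> 1-based integer index
    grow:          if True, unseen feature names get a new index;
                   if False (e.g. for test data), they are dropped
    """
    indexed = {}
    for name, value in features.items():
        if value == 0:
            continue  # the sparse format omits zero-valued features
        if name not in feature_index:
            if not grow:
                continue  # feature never seen in training
            feature_index[name] = len(feature_index) + 1
        indexed[feature_index[name]] = value
    # Indices must appear in ascending order on each line.
    parts = [f"{i}:{indexed[i]:g}" for i in sorted(indexed)]
    return " ".join([str(label)] + parts)

index = {}
train_line = to_libsvm_line(1, {"great": 2, "boring": 0, "plot": 1}, index)
test_line = to_libsvm_line(-1, {"plot": 1, "unseen": 4}, index, grow=False)
print(train_line)
print(test_line)
```

Building the feature index from the training set only, and freezing it when exporting the development and blind test sets, keeps the index assignments consistent across files.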

Background

In this project, you are working with positive, negative, and neutral movie reviews. Your goal is to do as well as possible on sentiment detection (framed as a classification problem). You have the option to use maximum entropy classifiers, support vector machines, or another classifier of your choice that you think might be competitive and is amenable to feature engineering and feature selection (e.g., a hybrid decision tree / logistic regression implementation in Weka). Furthermore, you will implement a feature selection technique. You have the option of implementing any feature selection algorithm which we have discussed and which is not already implemented in the course codebase, according to your preference. Options include:

  • any of the pre-processing feature selection mechanisms, such as
    • distributional mutual information
    • chi-squared
    • term-frequency
    • document frequency
    • some combination of term frequency and document frequency
    • ...
  • distributional word clustering à la Baker and McCallum (Note: if you choose to use this feature selection mechanism, then you may use Naïve Bayes as your classifier implementation; otherwise, Naïve Bayes is not an option)
  • feature selection in the learning loop à la Berger, Della Pietra, and Della Pietra. You can adapt the idea behind their approach for many learners, or you may dig into the MaxEnt implementation and do precisely as they suggest.
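
To make one of the pre-processing options concrete, here is a minimal Python sketch of chi-squared feature scoring, independent of the class codebase. It uses the standard 2x2 contingency-table form of the statistic on document presence or absence, and it takes the maximum over classes to get a single score per feature. That choice of maximum (rather than, say, a weighted average) is an assumption, as are the function names and the toy data.

```python
from collections import Counter, defaultdict

def chi_squared_scores(documents):
    """Score each feature by its maximum chi-squared statistic over classes.

    documents: list of (label, tokens) pairs. Only the presence or absence
    of a token in a document is used, not its frequency.
    """
    n = len(documents)
    class_counts = Counter(label for label, _ in documents)
    df = Counter()                      # documents containing the feature
    df_in_class = defaultdict(Counter)  # the same, broken down by class
    for label, tokens in documents:
        for t in set(tokens):
            df[t] += 1
            df_in_class[t][label] += 1

    scores = {}
    for t in df:
        best = 0.0
        for c, n_c in class_counts.items():
            n11 = df_in_class[t][c]    # contains t, in class c
            n10 = df[t] - n11          # contains t, not in class c
            n01 = n_c - n11            # lacks t, in class c
            n00 = n - n11 - n10 - n01  # lacks t, not in class c
            denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
            if denom > 0:
                best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
        scores[t] = best
    return scores

def select_top_k(scores, k):
    """Return the k highest-scoring features, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

docs = [
    ("pos", ["great", "film"]),
    ("pos", ["great", "plot"]),
    ("neg", ["awful", "film"]),
    ("neg", ["awful", "plot"]),
]
scores = chi_squared_scores(docs)
print(select_top_k(scores, 2))
```

In this toy data, "great" and "awful" each occur in only one class and receive the highest score, while "film" and "plot" are spread evenly across classes and score zero.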


Since you have better classification tools at your disposal, and since you have more data, we should probably expect to match or exceed the Pang, Lee, and Vaithyanathan results.


Note that in addition to implementing and experimenting with your own feature selection (or dimensionality reduction) technique, you may also compare your work with results using the other existing feature selection techniques.


In the course codebase, we provide a Reader for the data to read in the “split” of the movie reviews data set, including both the training and development test sets. As discussed in class, the training set is for training models, data inspection, and for feature engineering; the development test set is for error analysis and evaluation. Once you are satisfied with your results on the development test set, you will evaluate your models on the blind test set. We trust you to not even look at the blind test set prior to your final evaluation.


As in previous projects, you’re engaging in supervised learning, so all of the data available for training is labeled. The Reader creates a Collection of LabeledDatum objects from the provided review files. Each LabeledDatum object represents a single movie review. The features of these Datums are the ordered lists of tokens comprising the review. Each token is one of the following: a contiguous sequence of alphabetic characters, two contiguous sequences of alphabetic characters separated by a hyphen, a single punctuation character, a contiguous sequence of alphabetic characters and asterisks (which represent the stars in the rating scales of some reviews), or a contiguous sequence of numeric digits. Consequently, in this data set, punctuation appears in your basic feature set. This may be useful in establishing the scope of certain words (e.g., “not”). Original text case (upper and lower) is also preserved.
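
The token classes described above, and the idea of using punctuation to bound the scope of "not", can be illustrated with a short Python sketch. This is an approximation written for illustration, not the Reader's actual tokenizer. The regular expression, the function names, the negation word list, and the "NOT_" prefix are all assumptions.

```python
import re

# One alternative per token class, tried in this order:
# hyphenated word, letters and/or asterisks, digits, single punctuation mark.
TOKEN = re.compile(r"[A-Za-z]+-[A-Za-z]+|[A-Za-z*]+|[0-9]+|[^\sA-Za-z0-9]")

NEGATIONS = {"not", "no", "never"}  # illustrative list only

def tokenize(text):
    """Split text into tokens resembling the classes described above."""
    return TOKEN.findall(text)

def is_punctuation(token):
    """A single character that is neither a letter nor a digit."""
    return len(token) == 1 and not token.isalnum()

def mark_negation_scope(tokens):
    """Prefix tokens between a negation word and the next punctuation
    mark with NOT_, leaving all other tokens unchanged."""
    marked = []
    in_scope = False
    for tok in tokens:
        if is_punctuation(tok):
            in_scope = False  # punctuation closes the negation scope
            marked.append(tok)
        elif tok.lower() in NEGATIONS:
            in_scope = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if in_scope else tok)
    return marked

tokens = tokenize("A well-made film, but not great. ***1/2")
print(tokens)
print(mark_negation_scope(tokens))
```

With this marking, "great" and "NOT_great" become distinct features, so a classifier can learn opposite weights for them.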


As in Project #2, you may want to spend time with the Reader (edu.byu.cs.nlp.fileio.MovieReviewParser) and the FeatureTransformer(s) in the course codebase in order to extract additional desired features. If you are working with libsvm, you will need to export feature vectors for subsequent processing outside of this codebase.

Feature Engineering

In order to achieve noteworthy accuracies with sentiment classification, you will need to engage in some sort of feature engineering process. During our class discussion, you came up with several ideas worth pursuing. Consult the online copy of the lecture notes for the list we constructed together. You may consider using n-grams, part of speech taggers, phrase chunkers, sentence breakers, whole sentence parsers, or anything else you have at your disposal. If these tools are foreign to you but you would like to use them, then you may want to consult with the instructor or TA for pointers to available code. As always, you are also encouraged to confer, brainstorm, and share ideas with classmates in order to come up with ideas and derive a feature set that captures your ideas. The highest accuracy will win a title in the hall of fame.
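
As a starting point for the n-gram idea mentioned above, the following Python sketch counts all n-grams up to a chosen length in a token list. It is independent of the class codebase. The function name, the space-joined feature naming, and the lowercasing default are assumptions made for illustration.

```python
from collections import Counter

def ngram_counts(tokens, n_max=2, lowercase=True):
    """Count every n-gram of length 1..n_max in a list of tokens.

    Each n-gram is represented as its tokens joined by single spaces.
    """
    if lowercase:
        tokens = [t.lower() for t in tokens]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

features = ngram_counts(["Not", "bad", "not", "bad"])
print(features)
```

Higher-order n-grams enlarge the feature space quickly, which is one reason to pair them with a feature selection method. Whether to lowercase is itself a feature engineering decision, since the data preserves the original case.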


Report

In addition to documenting your approaches to classification and feature selection, you also need to document the feature engineering process in your report. Furthermore, address the following questions:

  • What features did you add and why?
  • How did the feature engineering process improve accuracy on this dataset?
  • What features did you try in order to deal with the subtleties of sentiment?


Turn in a clear, well-structured report that discusses your implementations, describes your experiments with appropriate tables or graphs and accompanying interpretation, and addresses the above questions. There is no set length requirement, but roughly 4 pages is a reasonable target. Your report will be graded according to the rubric below.


Rubric

Project #3:  Document Sentiment Classification

Name ______________________

Date _______________________


100 points total:

______ of 30	Discussion and experimental results involving your approaches to classification and feature selection.

______ of 30	Discussion and experimental results from the feature engineering process.

______ of 10	Presentation and interpretation of classification results on the development test set.

______ of 10	Discussion of questions.

______ of 10	Presentation and interpretation of classification results on the blind test set.

______ of 10	Clear writing.


Total:
	______ of 100


Other Feedback:








Notes:

______ Early credit earned on this project

______ Late days used on this project

______ Total late days used as of the grading of this project
