Running lab code

From NLPWiki

For each of the first three labs, the included build.xml file currently defines at least two targets. The first trains the classifier(s) for that lab, evaluates the trained classifier against the development test set, and then serializes it to the models directory. This target is called labX, where X is the number of the lab you are currently working on. It takes two parameters, DATA and SPLIT. For example, to run this target for Lab 1, execute the following on the command line:

ant lab1 -DDATA=path_to_20_newsgroups -DSPLIT=path_to_20_newsgroups/indices/[reduced|full]_set

The other target is a test target. It performs evaluation only, using a classifier that has already been trained and serialized. It has the same name as the training target, with the word "Test" appended. It requires that you specify whether the development or blind test set should be used for evaluation. For example, assuming that you have already executed the command listed above (which actually trains, evaluates, and serializes two different classifiers), you may execute the test target as follows:

ant lab1Test -DDATA=path_to_20_newsgroups -DSPLIT=path_to_newsgroupsCS601R/indices/[reduced|full]_set -DTEST=dev

For your final run, before you begin writing your report, substitute blind for dev. This gives a fair evaluation of your code, because you will not have been iterating to improve performance on the blind set. Never run against the blind test set while you are still debugging or tuning your code.
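
As a small convenience, the test command could be assembled by a shell function that rejects any test set other than dev or blind. This is only a sketch: lab_test_cmd is a hypothetical helper, not part of the lab code, and it only prints the ant command rather than running it. It assumes the target naming (labXTest) and the DATA, SPLIT, and TEST parameters described above, and it does not quote paths that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the lab code): prints the ant command
# that evaluates a serialized classifier, without running it.
# Arguments: lab number, DATA path, SPLIT path, test set (dev or blind).
lab_test_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: lab_test_cmd LAB DATA SPLIT dev|blind" >&2
        return 2
    fi
    # Only the two test sets named on this page are accepted.
    case "$4" in
        dev|blind) ;;
        *)
            echo "TEST must be 'dev' or 'blind', got '$4'" >&2
            return 1
            ;;
    esac
    printf 'ant lab%sTest -DDATA=%s -DSPLIT=%s -DTEST=%s\n' "$1" "$2" "$3" "$4"
}
```

For example, lab_test_cmd 1 path_to_20_newsgroups path_to_20_newsgroups/indices/reduced_set dev prints the corresponding ant lab1Test command, which you can inspect before running it yourself. Keeping the test set as an explicit argument makes the switch from dev to blind a deliberate, visible step.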