How to prepare your system

From NLPWiki

To complete the programming assignments for this course, you will want to make sure you have the following software installed and configured properly.

Java 5 JDK (or higher)

The support code and framework used in the course are all in Java, so it is recommended that projects also be completed in Java, both to exploit this resource and to be able to receive some technical support from the TA. It would technically be possible (although potentially much more difficult) to complete some of the labs in another language, as long as the implemented solution can be tested in the same manner as a Java solution (using the ant commands explained in the passoff section). If you feel strongly about using another language, please clear it with Dr. Ringger first, so that you can be apprised of the potential difficulties that this would entail. The 5th and 6th labs, however, must be written in Java using the existing framework, as your implementations may be adapted for inclusion in future versions of the lab's NLP research.

The code framework uses Java 1.5 features such as generics, and therefore requires a JDK of version 1.5 or higher. You may download a copy of the Java 5 JDK from this page, or the latest Java 6 JDK here.
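One quick way to confirm that your JDK is recent enough is to compile and run a tiny program that uses generics. The following is only a minimal sketch: the class name JdkCheck and the method labNames are made up for this illustration and are not part of the course support code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check; not part of the course support code.
class JdkCheck {

    // Builds a typed list. Generic types such as List<String> were
    // introduced in Java 5, so an older compiler rejects this file.
    static List<String> labNames(int count) {
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            names.add("lab" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        // java.version is a standard system property; on a suitable
        // installation it should report 1.5 or something newer.
        System.out.println("Java version: " + System.getProperty("java.version"));

        // The enhanced for loop is another Java 5 language feature.
        for (String name : labNames(2)) {
            System.out.println(name);
        }
    }
}
```

Save it as JdkCheck.java, then compile with `javac JdkCheck.java` and run with `java JdkCheck`. If the compiler complains about the angle brackets, the JDK on your path is probably older than 1.5.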

Ant

Ant is an open source build tool, similar to make, produced by the Apache Software Foundation. It is installed on all of the CS department lab machines and can be installed automatically using the package management utilities of most modern Linux distributions, as well as Fink on OS X. In addition, binary and source distributions of Ant, along with documentation and support, are available at the project's home page. The latest version is 1.7.0 and can be obtained here.

An Ant script file, pre-configured with targets for running most of the labs, is supplied with the support code. If you have questions about this script, or would like help extending it to add functionality, you can look at the Ant documentation or ask Dan for help.

Eclipse

Eclipse isn't necessary, but it is a fairly nice IDE. Here are some notes on working with your project under Eclipse. When running from Eclipse, you will have to interact more directly with the command-line parameter facilities of the code-base, so you might want to read this page about command-line parameters in the code-base.

Running from inside the Eclipse IDE

  • Add a new run configuration
  • On the Arguments tab, add the following Program Arguments: "-lz:\Reuters\data\reduced_set -dz:\Reuters -tMFL -omodels/model0.serialized" (replacing the directories with those that match your particular setup).
    • -t is the identifier for your Classifier
    • -o is the name of the output model
    • -M will load a serialized model (in which case you wouldn't need the other parameters)
  • Run the program normally
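In the Program Arguments above, each value is written directly after its one-letter flag, with no space in between (for example, -tMFL). The sketch below shows one way such arguments could be split into flag and value. It is an illustration only: the class name ArgSketch and the method parse are hypothetical, and the code-base's real command-line parser may behave differently, so consult the page about command-line parameters for the actual rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the code-base's real parser may differ.
class ArgSketch {

    // Splits each "-Xvalue" argument into the one-letter flag "X" and the
    // remaining text as its value. Arguments that do not start with '-'
    // are ignored in this sketch.
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<String, String>();
        for (String arg : args) {
            if (arg.length() >= 2 && arg.charAt(0) == '-') {
                options.put(arg.substring(1, 2), arg.substring(2));
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // The same arguments as in the run configuration above.
        String[] example = {
            "-lz:\\Reuters\\data\\reduced_set",
            "-dz:\\Reuters",
            "-tMFL",
            "-omodels/model0.serialized"
        };
        for (Map.Entry<String, String> option : parse(example).entrySet()) {
            System.out.println(option.getKey() + " = " + option.getValue());
        }
    }
}
```

Because the flag and its value share a single argument, a path containing spaces would have to be quoted in the Program Arguments box so that it is not split into two arguments.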

Running an Ant task from inside Eclipse

  • Run > External Tools > External Tools... (there is also a button on the toolbar)
  • Select "Ant Build" in the left pane and click the new button (the icon that is a piece of paper with a plus sign)
  • Give it an appropriate name (perhaps lab0, lab1, etc.)
  • Click the "Browse Workspace..." button, highlight build.xml, and click ok
  • Add the following arguments: "-DSPLIT=Z:\Reuters\data\reduced_set -DDATA=Z:\Reuters -Do=models/model1.serialized" (replacing the directories as appropriate for your setup). If you add your own parameters, you will need to pass them on to ant using the same -D syntax as above.
  • Click on the "Targets" tab
  • Select the appropriate target (e.g. lab0 or lab1).

Eclipse comes with a respectable build.xml editor to facilitate the changes you will need to make to that file.
