My Weka page

Here are some small programs that show the versatility of the Weka data mining/machine learning system and what it can do. I will not explain everything (in fact, I will not explain very much at all). At the Weka site http://www.cs.waikato.ac.nz/~ml/weka/index.html you can read more about the system and download it.

Weka is a standalone Java application with a very nice GUI and a lot more to tweak than these applets indicate, so you will definitely enjoy Weka more if you install and use the whole package yourself.

Many of the things shown in these applets are explained in Chapter 8: Nuts and bolts: Machine learning algorithms in Java (PDF file here) from the excellent book Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (written by Ian H. Witten and Eibe Frank). For more information about the book, see http://www.cs.waikato.ac.nz/~ml/weka/book.html.

The programs (so far...)

Please note: these programs were written in 2002/2003 for older versions of Weka (mostly as a "proof of concept" of how to "appletize" Weka), and have later been fixed to work with the current Weka version 3.5.2 (mostly by correcting the class names of the classifiers). The programs were compiled using the Sun Java compiler version 1.5 and Weka version 3.5.2.
  • ExpandFreqField.java: a Weka filter that copies each instance a number of times according to a frequency field. To use it with the Weka Explorer (the Weka Java source code and a recompilation of Weka are required):
    • Copy the file ExpandFreqField.java to the Weka directory weka/filters/unsupervised/instance
    • Add the following line to the file weka/gui/GenericObjectEditor.props, together with the other filters.unsupervised.instance filters:
      weka.filters.unsupervised.instance.ExpandFreqField,\
      (don't forget the trailing "\").
    • Compile the Java file
    • Start Weka Explorer
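
    To make "copying instances according to a frequency field" concrete, here is a minimal plain-Java sketch of the idea. It is not the ExpandFreqField source and uses no Weka classes. The class name ExpandFreqSketch, the representation of a row as a String array, and the choices to keep the frequency column and to drop rows with a frequency of zero or less are all assumptions made for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Illustration only: a hypothetical stand-in, not the actual Weka filter.
    final class ExpandFreqSketch {

        // Repeats each row as many times as the integer stored in column freqIndex.
        // Rows whose frequency is zero or negative produce no output rows
        // (an assumption of this sketch, not documented filter behaviour).
        static List<String[]> expand(List<String[]> rows, int freqIndex) {
            List<String[]> out = new ArrayList<>();
            for (String[] row : rows) {
                int freq = Integer.parseInt(row[freqIndex].trim());
                for (int i = 0; i < freq; i++) {
                    out.add(row.clone()); // copy, so the output rows are independent
                }
            }
            return out;
        }

        public static void main(String[] args) {
            List<String[]> rows = new ArrayList<>();
            rows.add(new String[] {"sunny", "yes", "3"});
            rows.add(new String[] {"rainy", "no", "1"});
            // The third column (index 2) is treated as the frequency field.
            for (String[] row : expand(rows, 2)) {
                System.out.println(String.join(",", row));
            }
        }
    }
    ```

    With the two made-up rows above, the first row is emitted three times and the second once, so a weighted table becomes a plain table that any classifier can read.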

    Some of my other pages about Weka

    Also see the following pages here on my site mentioning Weka.

    ARFF data files

    The data file normally used by Weka is in the ARFF file format, which consists of special tags that indicate different things in the data file (foremost: attribute names, attribute types, attribute values and the data).
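
    As a rough illustration of those tags, the following plain-Java sketch embeds a tiny ARFF text and lists its attribute names. The data set (toy_weather) is invented for this example, and the class ArffSketch and its method attributeNames are hypothetical helpers, not part of Weka. The parsing is deliberately naive: it assumes unquoted attribute names without spaces.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Illustration only: a naive reader for the ARFF header, not Weka's own loader.
    final class ArffSketch {

        // A tiny made-up ARFF document. Lines starting with % are comments,
        // @relation names the data set, @attribute declares a column and its type,
        // and everything after @data is one comma-separated instance per line.
        static final String SAMPLE = String.join("\n",
            "% toy data set, invented for this sketch",
            "@relation toy_weather",
            "@attribute outlook {sunny, rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes, no}",
            "@data",
            "sunny,25,yes",
            "rainy,12,no");

        // Collects the names from the @attribute lines that precede @data.
        static List<String> attributeNames(String arff) {
            List<String> names = new ArrayList<>();
            for (String line : arff.split("\n")) {
                String trimmed = line.trim();
                String lower = trimmed.toLowerCase();
                if (lower.startsWith("@data")) {
                    break; // the header is finished, the rest is data
                }
                if (lower.startsWith("@attribute")) {
                    String[] parts = trimmed.split("\\s+", 3);
                    if (parts.length >= 2) {
                        names.add(parts[1]);
                    }
                }
            }
            return names;
        }

        public static void main(String[] args) {
            System.out.println(attributeNames(SAMPLE));
        }
    }
    ```

    In the toy file, outlook and play are nominal attributes (their allowed values are listed in braces) and temperature is numeric.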

    Here is a list of some ARFF files you can use; many are standard data sets often used in the machine learning community. Most of them are available from the Weka site. Many of them are also described at, and downloadable from, http://www.ics.uci.edu/~mlearn/MLRepository.html.

    If you click on a link in the list below you can see for yourself what the data set looks like. Please note that some files are quite big, and some algorithms will take a long time on them (often a very long time!). The number in parentheses is the size in bytes. Some of the files contain quite good comments about the data set; others have no explanation at all (those were probably converted from some other source by me).

    One more thing: The class attribute (i.e. the attribute we want to learn) must be the last attribute. The following data sets are quite large:

    ARFF versions of DASL data

    DASL - The Data and Story Library is a great collection of data sets, with background stories and some analysis. For ARFF versions of these data sets, see ARFF versions of DASL data sets.
    Related pages:
    • My Eureqa page: Eureqa is a great tool for symbolic regression
    • My JGAP page: I have written my own symbolic regression program using JGAP (Java)

    Back to my homepage
    Created by Hakan Kjellerstrand hakank@bonetmail.com