My Weka page
Here are some small programs intended to show the versatility of the Weka
data mining/machine learning system and what it can do. I will not explain everything (in fact, I will not explain very much at all). At the Weka site
you can read more about the system and download it.
All the MOOC videos are on YouTube:
As Weka (Explorer) is a standalone Java application with a very nice GUI and a lot more
to tweak than these applets indicate, you will definitely enjoy Weka more if
you use the whole package on your own.
Chapter 8, "Nuts and bolts: Machine learning algorithms in Java" (PDF file here), is
from the excellent book Data Mining: Practical Machine Learning Tools and Techniques with Java
(written by Ian H. Witten and Eibe Frank). For more information about the book, see http://www.cs.waikato.ac.nz/~ml/weka/book.html
The programs (so far...)
These programs were written in 2002/2003 for older
versions of Weka (mostly as a "proof of concept" of how to
"appletize" Weka), and were later fixed to work with the
current Weka version 3.5.2 (mostly by correcting the class names of the
classifiers). They were compiled using the Sun Java compiler
version 1.5 and Weka version 3.5.2.
ExpandFreqField: a filter for copying instances a number of times according to a
frequency field. To use it with Weka Explorer (the Weka Java source code and a
recompilation of Weka are required):
- Copy the file ExpandFreqField.java to the Weka directory
- Add the following line to the file
weka/gui/GenericObjectEditor.props, together with the
other filters.unsupervised.instance filters:
(don't forget the trailing "\").
- Compile the Java file
- Start Weka Explorer
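To illustrate what "copying instances according to a frequency field" means, here is a minimal stand-alone sketch in plain Java. It is not the source of ExpandFreqField and does not use the Weka API; the class name FreqExpandSketch and the method expand are made up for this example. It also assumes that the frequency is a non-negative integer and that the frequency column is kept in the copies, which may differ from what the real filter does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of frequency expansion; NOT the ExpandFreqField filter itself. */
final class FreqExpandSketch {

    /**
     * Returns a new list in which every row appears as many times as the
     * integer stored in its column freqIndex. A frequency of 0 drops the row.
     * Assumption: the frequency column is kept unchanged in each copy.
     */
    static List<String[]> expand(List<String[]> rows, int freqIndex) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            int freq = Integer.parseInt(row[freqIndex].trim());
            for (int k = 0; k < freq; k++) {
                out.add(row.clone()); // independent copy of the row
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"sunny", "yes", "3"});
        rows.add(new String[] {"rainy", "no", "1"});
        for (String[] row : expand(rows, 2)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

A data set stored as value combinations plus counts (for example a contingency table) can be turned into one row per observation this way, which is the form most learning algorithms expect.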
Some of my other pages about Weka
Also see the following pages on my site mentioning Weka.
ARFF data files
The data file normally used by Weka is in the ARFF file format, which consists
of special tags that mark the different parts of the data file
(mostly: attribute names, attribute types, attribute values and the data).
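To make the tags concrete, here is a minimal sketch of a reader for the simplest kind of ARFF file. The tags @relation, @attribute and @data and the "%" comment marker belong to the ARFF format; the class ArffSketch, its fields, and the small weather-style sample are made up for illustration. The sketch ignores quoted names, sparse data and other details that Weka's own loader handles, so it should not be taken as a replacement for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal ARFF reader for illustration; not Weka's own loader. */
final class ArffSketch {
    String relation = "";
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    /** Parses unquoted, dense ARFF text; tags are matched case-insensitively. */
    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank line or comment
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Everything after @data is one comma-separated instance per line.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                arff.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type or {value list}>"
                String[] parts = line.split("\\s+", 3);
                arff.attributeNames.add(parts[1]);
                arff.attributeTypes.add(parts.length > 2 ? parts[2] : "");
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "% A made-up example data set",
            "@relation weather",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature numeric",
            "@attribute play {yes, no}",
            "@data",
            "sunny,85,no",
            "overcast,83,yes");
        ArffSketch arff = parse(sample);
        System.out.println(arff.relation + ": " + arff.attributeNames
            + ", " + arff.rows.size() + " rows");
    }
}
```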
Here is a list of some ARFF files you can use; many are standard data sets often
used in the machine learning community. Most of them are available from the
Weka site. Many of them are also described at, and downloadable from, http://www.ics.uci.edu/~mlearn/MLRepository.html.
If you click on a link in the list below you can see for yourself
what the data set looks like. Please note that some files are
quite big, and some algorithms will take a long time on them (often a very long time!). The number in parentheses is the size in bytes. Some of the
files have quite good comments about the data set; others have no explanation
at all (those were probably converted from some other source by me).
One more thing: The class attribute (i.e. the attribute we want to learn) must
be the last.
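If a data set has its class attribute somewhere else, the columns have to be reordered before the file is used here. The following is a minimal sketch of that reordering for a single row of values, in plain Java without the Weka API; the class ClassLastSketch and the method moveToLast are hypothetical names. The same move must of course also be applied to the attribute declarations in the ARFF header.

```java
/** Hypothetical helper that moves the class column of a row to the last position. */
final class ClassLastSketch {

    /**
     * Returns a copy of row in which the value at classIndex comes last
     * and all other values keep their relative order.
     */
    static String[] moveToLast(String[] row, int classIndex) {
        String[] out = new String[row.length];
        int j = 0;
        for (int i = 0; i < row.length; i++) {
            if (i != classIndex) {
                out[j++] = row[i];
            }
        }
        out[row.length - 1] = row[classIndex];
        return out;
    }

    public static void main(String[] args) {
        // The class value "yes" starts in the first column here.
        String[] row = {"yes", "sunny", "85"};
        System.out.println(String.join(",", moveToLast(row, 0)));
    }
}
```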
The following data sets are quite large:
ARFF versions of DASL data
DASL - The Data and Story
Library is a great collection of data sets, with background
stories and some analysis. For ARFF versions of these data sets, see ARFF versions of
DASL data sets.
- My Eureqa page: Eureqa is a great tool for symbolic regression
- My JGAP page: I have written my own symbolic regression program using JGAP (Java)
Created by Hakan Kjellerstrand email@example.com