Training a Random Forest with Weka
Sometimes, it is useful to utilise graphical tools like Weka to train machine learning models. In this tutorial we are going to learn how to prepare a dataset for Weka, how to train a Weka Random Forest over that dataset, and how to import and export models to use them in the future.
- Generate a dataset for Weka
- Train the model
- Export and import models
1. Generating a dataset for Weka
You can obtain a dataset to test your machine learning algorithms from different sources, including the UCI (University of California, Irvine) Machine Learning Repository and Kaggle. In this case, we are going to obtain a dataset from the UCI Machine Learning Repository.
The dataset used in this tutorial is the Iris Dataset. This dataset contains 150 instances of three different types of iris plant (Iris Setosa, Iris Versicolor and Iris Virginica). Each instance is described by the following attributes:
- Sepal length (cm)
- Sepal width (cm)
- Petal length (cm)
- Petal width (cm)
So, our task consists of determining the type of an iris plant given the four measured attributes. As an example, a plant with sepal length = 5.1 cm, sepal width = 3.5 cm, petal length = 1.4 cm and petal width = 0.2 cm is an Iris Setosa.
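As a minimal sketch of what one instance looks like in code, the snippet below parses a single comma-separated row into a typed record. The class name IrisInstance, the field names and the function parse_row are illustrative choices made for this example, not part of the dataset or of Weka.

```python
from dataclasses import dataclass


@dataclass
class IrisInstance:
    """One row of the dataset: four measurements in cm plus the class label."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


def parse_row(line: str) -> IrisInstance:
    """Parse one comma-separated row: four numbers followed by the class label."""
    sl, sw, pl, pw, species = line.strip().split(",")
    return IrisInstance(float(sl), float(sw), float(pl), float(pw), species)


# The example plant from the text, written as a row of the data file.
example = parse_row("5.1,3.5,1.4,0.2,Iris-setosa")
print(example)
```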
The Iris Dataset can be downloaded from this link. The file needed for this tutorial is iris.data. It is essentially a CSV (comma-separated values) file containing the 150 instances of the dataset, so let’s rename it to iris-data.csv.
The file iris-data.csv does not have a header, so let’s add one by inserting a row that names the four attributes and the class at the beginning of the file.
The resulting CSV file can be downloaded here.
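If you prefer to add the header with a script instead of a text editor, the following sketch shows one way to do it. The exact header row is not reproduced in this text, so the column names in HEADER are assumptions; replace them with the names you want. The helper names with_header and add_header_to_file are also hypothetical.

```python
from pathlib import Path

# Assumed column names (not taken from the tutorial); adjust as needed.
HEADER = "sepal_length,sepal_width,petal_length,petal_width,class"


def with_header(csv_text: str, header: str = HEADER) -> str:
    """Return csv_text with header as its first line and blank lines removed."""
    lines = [line for line in csv_text.splitlines() if line.strip()]
    if lines and lines[0] == header:
        # The header is already there, so do not add it twice.
        return "\n".join(lines) + "\n"
    return "\n".join([header] + lines) + "\n"


def add_header_to_file(csv_path: str, header: str = HEADER) -> None:
    """Rewrite the file at csv_path in place so that it starts with header."""
    path = Path(csv_path)
    path.write_text(with_header(path.read_text(encoding="utf-8"), header),
                    encoding="utf-8")
```

Calling add_header_to_file("iris-data.csv") from the folder that contains the file would rewrite it in place. Running it twice is harmless because an existing header is detected.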
Now we have a proper CSV file with a descriptive header. In order to make it work with Weka, we have to convert it to an ARFF file. Luckily, there are online tools like this to do it for us.
Click the link, upload the file iris-data.csv, and enter the delimiter (in this case ‘,’). Click ‘submit’ and select the option ‘First row contains labels’. Finally, click the button ‘Generate my ARFF’ and download the result.
The resulting ARFF file can be downloaded here.
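The conversion can also be done locally. The sketch below is not the online tool's implementation; it is a simplified converter, written under the assumption that the CSV has a header row, that a column is numeric when every value parses as a number, and that every other column is nominal. It does not quote values containing spaces or commas, which ARFF would require. The function name csv_to_arff and the relation name are illustrative.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if value can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text: str, relation: str = "iris") -> str:
    """Convert CSV text with a header row into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(_is_number(value) for value in column):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values between braces.
            labels = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{labels}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


sample = "sepal_length,class\n5.1,Iris-setosa\n7.0,Iris-versicolor\n"
print(csv_to_arff(sample, relation="demo"))
```

To convert the real file, read iris-data.csv into a string, pass it to csv_to_arff, and write the result to iris.arff.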
Now, the dataset is ready to use in Weka.
2. Training the model
Download Weka for your OS, install it and run it.
Click ‘Explorer’ to open the Weka Explorer. Then click ‘Open file…’ and select the dataset file iris.arff. Information about the dataset is then shown in the Explorer, as in the image above.
Click the ‘Classify’ tab and click the ‘Choose’ button to select the classifier ‘weka/classifiers/trees/RandomForest’. You can change the parameters of the classifier by clicking ‘RandomForest’, to the right of the ‘Choose’ button.
Here, we are going to use the default parameters of the classifier, so just click the ‘Start’ button to train the Random Forest on the iris.arff dataset. After clicking ‘Start’, the results are shown in the ‘Classifier output’ section.
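Using the default parameters, we have obtained decent results. The model was evaluated with stratified cross-validation (10 folds is Weka's default test option), and 143 of the 150 instances (95.33%) were classified correctly. Note that cross-validation is the evaluation method, not the training algorithm: the data is split into folds, and each fold is used once for testing while the model is trained on the remaining folds.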
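To make the idea of stratified folds concrete, here is a simplified sketch. It is not Weka's implementation (Weka also shuffles the data with a random seed before splitting); it only deals the instances of each class round-robin across the folds so that every fold keeps roughly the same class proportions. The function name stratified_folds is illustrative, the class labels are written as they appear in the UCI file, and the example relies on the Iris Dataset having 50 instances per class. The last lines reproduce the accuracy arithmetic reported above.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign instance indices to k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# 150 labels, 50 per class, as in the Iris Dataset.
labels = (["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50
          + ["Iris-virginica"] * 50)
folds = stratified_folds(labels, k=10)
print([len(fold) for fold in folds])  # each fold holds 15 instances

# Accuracy reported in the text: 143 correct out of 150.
accuracy = 143 / 150
print(f"{accuracy:.2%}")  # prints 95.33%
```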
As can be seen, training a model in Weka is pretty straightforward. In the following section we are going to see how the trained model can be exported for future use, and how an existing model can be imported to classify new data.
3. Export and import models
We can export the model we trained in the previous section by right-clicking our model (trees.RandomForest) in the Results List section of the GUI and clicking ‘Save Model’. After choosing the path and the name for the model, a file with the extension .model is created. Weka can only export models as .model files.
The resulting model file can be downloaded here.
Similarly, to load an existing model we can right-click anywhere in the Results List section and click ‘Load Model’. The formats accepted by Weka are Weka models (.model) and PMML models (.xml).