Getting started with SMILE with Kotlin

I started out using Python and scikit-learn for my machine-learning projects, but soon ran out of patience with Python. In my humble opinion it is a toy language that you can use for toy projects, but if you want to build something real you want to use a real language, with types, and that is where Kotlin and SMILE come in.

SMILE is a machine-learning library written in Java, with a convenience API in Kotlin. (There is also an API for Scala, and that might actually be the preferred way of using SMILE, but I don't use it.) SMILE was created by Haifeng Li and was first released in 2014.

SMILE comes with extensive documentation, but I found it quite hard to get started, perhaps because I was used to scikit-learn and expected SMILE to work the same way. So in this post I will describe how to get started with SMILE and Kotlin.

Finding documentation

There are four places to look for answers about SMILE:

  • SMILE main website. This contains general information on how to use SMILE, how the machine-learning algorithms work, etc. For example, if you want to use regression in your project, you might want to start by reading the general section on regression in SMILE.
  • SMILE API. This is one section of the SMILE main website, and a good place to find out exactly how to call the functions. Note that the Kotlin API is very small, and although you might want to start there, you'll very soon want to call the Java API from your Kotlin code.
  • The SMILE GitHub site. The documentation here is quite limited, but you can easily get to the source code from here. (The API documentation is sometimes not as detailed as you would like, and you may need to read the source code to understand the API.)
  • SMILE GitHub Issues. The GitHub site has a section where users can report problems with the library, and it has become something close to a discussion forum for SMILE. If you have a problem you can search here, and chances are someone else has already asked that question and got an answer. (Just make sure that the search field is cleared before you enter your search term; it took me a while to find this gold mine of answers because the default filter only looks at open issues.)

DataFrames

The data structure used to pass data to the machine-learning functions in SMILE is called DataFrame, and it corresponds to the pandas DataFrame you would use alongside scikit-learn in Python. It is described in the SMILE documentation, but you probably want to look into the Java API directly too.

You create instances of DataFrame in two main ways:

And please note: SMILE DataFrames are immutable. Yes. You cannot change them once they have been created. Coming from pandas and scikit-learn this is really weird, because there you add and remove columns all the time as you work your way towards the end result, but not so in SMILE. To some extent you can work around this by creating new instances from old ones, adding some detail, but my impression (so far) is that this is not the way DataFrames are intended to be used.

However, I have not yet figured out how DataFrames are intended to be used in SMILE, and I ended up creating my own DataSet class, with methods to convert to and from SMILE DataFrames. That might have been overkill, since there are other DataSet/DataFrame libraries out there and you could use any of them instead, but my point is that this is not the place to learn how to use SMILE DataFrames, because I sure don't know. Hopefully I can return to the subject in a later post.
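A minimal sketch of the "keep your own mutable table and convert at the end" idea follows. It is not the DataSet class described above and not part of SMILE: the class name DataSet and its methods are hypothetical, chosen for illustration. It stores named columns, lets you add and drop them freely, and exports the row-major Array<DoubleArray> that the DataFrame.of overload in the first example below accepts.

```kotlin
// Hypothetical helper, not part of SMILE: a small mutable, column-oriented
// table that can be edited freely and then exported row by row.
class DataSet {
    // Column name -> values; LinkedHashMap keeps columns in insertion order.
    private val columns = LinkedHashMap<String, DoubleArray>()

    val names: List<String> get() = columns.keys.toList()
    val rowCount: Int get() = columns.values.firstOrNull()?.size ?: 0

    // Adds (or replaces) a column; all columns must have the same length.
    fun addColumn(name: String, values: DoubleArray) {
        require(columns.isEmpty() || values.size == rowCount) {
            "Column $name has ${values.size} rows, expected $rowCount"
        }
        columns[name] = values.copyOf()
    }

    fun removeColumn(name: String) {
        columns.remove(name)
    }

    // Returns a defensive copy so callers cannot mutate the stored column.
    fun column(name: String): DoubleArray =
        columns[name]?.copyOf() ?: throw NoSuchElementException("No column $name")

    // Row-major export: one DoubleArray per row, columns in `names` order.
    fun toRows(): Array<DoubleArray> {
        val cols = columns.values.toList()
        return Array(rowCount) { r -> DoubleArray(cols.size) { c -> cols[c][r] } }
    }

    companion object {
        // Inverse of toRows(): build a DataSet from row-major data and column names.
        fun fromRows(rows: Array<DoubleArray>, names: List<String>): DataSet {
            val ds = DataSet()
            names.forEachIndexed { c, name ->
                ds.addColumn(name, DoubleArray(rows.size) { r -> rows[r][c] })
            }
            return ds
        }
    }
}

fun main() {
    val ds = DataSet()
    ds.addColumn("y", doubleArrayOf(1.0, 2.0, 3.0))
    ds.addColumn("x1", doubleArrayOf(10.0, 20.0, 30.0))
    // Derive a new column and drop the old one, pandas-style.
    ds.addColumn("x1_squared", ds.column("x1").map { it * it }.toDoubleArray())
    ds.removeColumn("x1")
    println(ds.names)                 // [y, x1_squared]
    println(ds.toRows()[2].toList())  // [3.0, 900.0]
}
```

Handing the result to SMILE would then use the same DataFrame.of overload as the first example, e.g. DataFrame.of(ds.toRows(), *ds.names.toTypedArray()). Storing columns rather than rows is a deliberate choice here: adding or dropping a column is a map operation, and the row-major copy is only built when SMILE needs it.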

A first example

OK, we are ready to dive into SMILE with some ridge regression. I use Kotlin with Gradle, and my build.gradle.kts file looks like this:

plugins {
    // Apply the org.jetbrains.kotlin.jvm Plugin to add support for Kotlin.
    kotlin("jvm") version "1.4.21"
    kotlin("kapt") version "1.4.21"

    // Apply the application plugin to add support for building a CLI application in Java.
    application
}

tasks.withType<org.jetbrains.kotlin.gradle.tasks.KotlinCompile>().configureEach {
    kotlinOptions.jvmTarget = "1.8"
}

repositories {
    // Use Maven Central for resolving dependencies (JCenter has since been deprecated).
    mavenCentral()
}

dependencies {
    // Align versions of all Kotlin components
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))

    // Use the Kotlin JDK 8 standard library.
    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")

    // This dependency is used by the application.
    implementation("com.google.guava:guava:29.0-jre")

    // Smile
    implementation("com.github.haifengl:smile-core:2.6.0")
    implementation("com.github.haifengl:smile-kotlin:2.6.0")
    
    // These are needed for regression in Smile (native libraries for macOS on Intel).
    implementation(group="org.bytedeco", name="javacpp", version="1.5.3", classifier="macosx-x86_64")
    implementation(group="org.bytedeco", name="openblas", version="0.3.9-1.5.3", classifier="macosx-x86_64")
    implementation(group="org.bytedeco", name="arpack-ng", version="3.7.0-1.5.3", classifier="macosx-x86_64")

}

application {
    // Define the main class for the application.
    mainClass.set("se.magnusgunnarsson.smileblog.FirstExampleKt")
}

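Please note the extra org.bytedeco dependencies. They provide the native OpenBLAS and ARPACK libraries, loaded through JavaCPP, that SMILE's linear algebra relies on, and we need them for regression. The classifier macosx-x86_64 selects the binaries for macOS on Intel, so it has to match the machine you run on.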
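As a sketch, assuming a Linux x86-64 machine instead of a Mac, the same three dependencies would keep their versions and only swap the JavaCPP classifier (linux-x86_64 is JavaCPP's platform name for that target):

```kotlin
// build.gradle.kts fragment, assuming Linux on x86-64 instead of macOS.
// Only the classifier changes; the versions are the ones used above.
dependencies {
    implementation(group="org.bytedeco", name="javacpp", version="1.5.3", classifier="linux-x86_64")
    implementation(group="org.bytedeco", name="openblas", version="0.3.9-1.5.3", classifier="linux-x86_64")
    implementation(group="org.bytedeco", name="arpack-ng", version="3.7.0-1.5.3", classifier="linux-x86_64")
}
```

JavaCPP also publishes "-platform" artifacts (for example org.bytedeco:openblas-platform) that bundle the natives for all supported platforms at the cost of a much larger download; check Maven Central for the version matching your SMILE release before relying on one.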

SMILE uses static methods in interfaces, and calling those from Kotlin requires compiling for JVM target 1.8 rather than 1.6, which is the default in Kotlin 1.4. I'm sure there are many ways of achieving this, and the tasks.withType block in the file above is just one way.

OK, here is my Kotlin code:

package se.magnusgunnarsson.smileblog

import smile.data.DataFrame
import smile.data.formula.Formula
import smile.regression.*

fun main(args: Array<String>) {
    // Create some data.
    val data = arrayOf(
        doubleArrayOf(57.3142861, 45.0, 14.0, 164900.0, 116910.0, 48.7392861, 133.930123),
        doubleArrayOf(23.0073691, 43.0, 12.0, 138633.333, 116910.0, 30.9357145, 138.55321),
        doubleArrayOf(43.8676314, 66.0, 21.0, 151266.667, 120633.333, 57.3142861, 139.144051),
        doubleArrayOf(20.0705358, 47.0, 10.0, 128500.0, 120633.333, 23.0073691, 123.355951),
        doubleArrayOf(27.6794644, 40.0, 9.0, 148766.667, 122181.667, 43.8676314, 122.667478),
        doubleArrayOf(27.4578573, 146.0, 66.0, 138566.667, 126548.333, 20.0705358, 109.957522),
        doubleArrayOf(15.9874061, 118.0, 71.0, 134733.333, 128621.667, 27.4578573, 83.9544647),
        doubleArrayOf(14.8142858, 138.0, 63.0, 127533.333, 137826.333, 15.9874061, 43.041923),
        doubleArrayOf(15.6678573, 150.0, 86.0, 109466.667, 141869.667, 14.8142858, 35.7447188),
        doubleArrayOf(14.3601192, 128.0, 72.0, 96800.0, 141120.333, 15.6678573, 37.8162068),
        doubleArrayOf(14.2407408, 145.0, 81.0, 97166.6667, 144084.333, 14.3601192, 45.3637191),
        doubleArrayOf(13.6261905, 134.0, 70.0, 98900.0, 141082.333, 14.2407408, 39.4444231),
        doubleArrayOf(14.1465714, 135.0, 72.0, 112233.333, 127618.0, 13.6261905, 30.6519913)
    )
    val features = arrayOf("x1","x2","x3","x4","x5","x6")

    // Convert the data into a SMILE DataFrame
    val df = DataFrame.of(data, "y", *features) // The first column is the target value (y), and the other columns are the features.

    // Make a formula
    val frm = Formula.of("y", *features)

    // Regression
    val lm = ridge(frm, df, lambda=1.0)

    // Print some results
    println("\nIntercept: ${"%.2e".format(lm.intercept())}")
    println("Coefficients:")
    val coefficients = lm.coefficients()
    for (ix in features.indices) {
        println("\t${features[ix]}: ${"%.2e".format(coefficients[ix])}")
    }

    // Make predictions
    val predictions = lm.predict(df)
    println("\nActual\tPredicted")
    for (ix in data.indices) {
        println("${"%.2f".format(data[ix][0])}\t${"%.2f".format(predictions[ix])}")
    }
}

And the output is as follows (String.format uses the default locale, which is why the numbers have decimal commas):

Intercept: -1,97e+01
Coefficients:
        x1: 3,51e-02
        x2: -8,79e-03
        x3: 1,99e-04
        x4: -2,60e-06
        x5: 5,06e-01
        x6: 2,05e-02

Actual  Predicted
57,31   41,61
23,01   27,43
43,87   44,01
20,07   21,24
27,68   35,57
27,46   24,45
15,99   25,86
14,81   18,54
15,67   14,42
14,36   11,73
14,24   11,80
13,63   11,68
14,15   13,89
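The Actual/Predicted table is easier to judge when summarised in a number or two. A minimal, dependency-free sketch of two common summaries, root-mean-squared error and R², follows. The functions rmse and rSquared are hypothetical helpers, not SMILE API; SMILE has evaluation utilities of its own in its smile.validation package that you may prefer.

```kotlin
import kotlin.math.sqrt

// Root-mean-squared error: sqrt(mean((actual - predicted)^2)).
fun rmse(actual: DoubleArray, predicted: DoubleArray): Double {
    require(actual.size == predicted.size && actual.isNotEmpty()) {
        "Arrays must be non-empty and of equal length"
    }
    var sse = 0.0
    for (i in actual.indices) {
        val e = actual[i] - predicted[i]
        sse += e * e
    }
    return sqrt(sse / actual.size)
}

// Coefficient of determination: 1 - SSE / SST, with SST taken around the mean of actual.
fun rSquared(actual: DoubleArray, predicted: DoubleArray): Double {
    require(actual.size == predicted.size && actual.isNotEmpty()) {
        "Arrays must be non-empty and of equal length"
    }
    val mean = actual.average()
    var sse = 0.0
    var sst = 0.0
    for (i in actual.indices) {
        sse += (actual[i] - predicted[i]).let { it * it }
        sst += (actual[i] - mean).let { it * it }
    }
    require(sst > 0.0) { "R² is undefined when all actual values are equal" }
    return 1.0 - sse / sst
}

fun main() {
    val actual = doubleArrayOf(1.0, 2.0, 3.0, 4.0)
    val predicted = doubleArrayOf(1.5, 2.0, 2.5, 4.0)
    println(rmse(actual, predicted))      // about 0.354
    println(rSquared(actual, predicted))  // 0.9
}
```

With the first example's variables this would be called as rmse(data.map { it[0] }.toDoubleArray(), predictions). Keep in mind that those predictions are made on the same 13 rows the model was fitted on, so the numbers will be optimistic compared with performance on unseen data.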

In my next post I will write about finding a good lambda, choosing an algorithm, and comparing scikit-learn with SMILE as regards regression.
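For reference, the lambda=1.0 in the example is the weight of the penalty term in ridge regression. In its textbook form, with n observations, p features, rows x_i and coefficients β, ridge regression solves:

$$
(\hat\beta_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0,\, \boldsymbol\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \mathbf{x}_i^\top \boldsymbol\beta \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

With λ = 0 this is ordinary least squares; larger values of λ shrink the coefficients towards zero, and the intercept β₀ is conventionally left unpenalised. This is the standard formulation rather than a transcription of SMILE's code; whether SMILE centres or scales the features before applying the penalty is worth checking in its RidgeRegression source, since that changes what a given λ means.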


