Integrating Python machine learning libraries into a Scala stack
This post walks you through integrating Python machine-learning libraries into your existing Scala stack.
Before we proceed further, let us answer the basic question:
Why not use Scala-native libraries?
Primarily because the options available in Scala are very limited: essentially Spark MLlib and Deeplearning4j.
The Python ML ecosystem, by contrast, is huge; scikit-learn alone is enough for most use cases. With the recent surge in deep learning, most of the established and stable libraries, such as Keras, TensorFlow and Theano, are available in Python.
Our goal is to get the best of both worlds: Scala’s strict type system and Python’s plethora of libraries at our disposal.
from python import x
While approaching this problem, we investigated several libraries. Here is a brief summary of each of them:
Jython: Jython is a re-implementation of Python in Java. Although it includes most of the standard Python modules, it lacks support for C extension modules. This essentially renders Jython useless for most libraries in the ML ecosystem, since nearly all of them rely on C extensions to speed up processing.
JyNI: (Jython Native Interface) JyNI is a compatibility layer whose goal is to enable Jython to use native CPython extensions such as NumPy or SciPy. However, JyNI doesn’t currently support the entire Python C-API, so it is not yet in a state where we can use it for libraries built using Cython.
Jep: (Java Embedded Python) Jep takes a different route and embeds CPython in Java using JNI. Long story short, if you need to use CPython modules (such as NumPy), Jep is the way to go.
How does Jep work?
Jep uses JNI and the CPython API to start up the Python interpreter inside the JVM. When you create a Jep instance in Java, a sub-interpreter will be created for that Jep instance and will remain in memory until the Jep instance is closed with jep.close(). Have a look at Jep’s documentation for further details.
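To make that lifecycle concrete, here is a minimal sketch in Scala. It assumes Jep 3.9 or later (where the Interpreter interface and the SubInterpreter class are available; on older releases new Jep() plays the same role), that the Jep jar is on the classpath, and that the native Jep library is installed and visible to the JVM. The object name JepLifecycleSketch and the Python variable names n and total are made up for illustration.

```scala
import jep.{Interpreter, SubInterpreter}

// Hypothetical example object; only the jep.* classes come from the library.
object JepLifecycleSketch {
  def main(args: Array[String]): Unit = {
    // Creating the instance starts a Python sub-interpreter inside the JVM.
    // Assumption: the native Jep library can be found by the JVM.
    val interp: Interpreter = new SubInterpreter()
    try {
      // Pass a JVM value into the Python namespace.
      interp.set("n", Integer.valueOf(10))

      // Run a Python statement inside the sub-interpreter.
      interp.eval("total = sum(range(n))")

      // Read the result back; treat it as a generic Number because the
      // exact boxed type Jep returns can differ between versions.
      val total = interp.getValue("total").asInstanceOf[Number].longValue()
      println(s"sum(range(10)) computed in Python: $total")
    } finally {
      // The sub-interpreter stays in memory until it is closed.
      interp.close()
    }
  }
}
```

The try/finally block mirrors the point above: the sub-interpreter lives until close() is called, so closing it in finally avoids leaking it when the Python code throws. Jep also expects an interpreter to be used only from the thread that created it, which is why everything here happens inside main.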