Using scripts to improve your workflow

Development these days is complicated. You have to contend with many different technologies: source code management, Maven or Gradle build scripts, testing of all sorts, deployment to various destinations. The permutations are endless. CI/CD (continuous integration and continuous deployment) tools like Jenkins can handle several of these challenges, but sometimes all you need is a simple script to do the job.

For example, I’ve written different versions of the same script. They all sit on my desktop and I just click on the one I need when I need it. I may have one that runs Maven to perform unit tests on the whole suite, or one that only runs tests on a subset. I have also written a script that builds and deploys to a JBoss server without running the unit tests first. This can save a lot of time when you’ve already been through endless test, fix code, test again iterations and don’t want to run those tests all over again. So scripts can be handy for all kinds of tasks.
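
As a rough sketch of what one of those deploy-without-tests scripts might look like (the project path, WAR name, and JBoss directory here are just placeholder assumptions), the whole thing can be as short as:

# Build the application but skip the unit tests
cd /c/dev/projects/my-app
mvn clean install -DskipTests

# Copy the resulting WAR into the JBoss deployments folder
cp target/my-app.war /c/dev/servers/jboss/standalone/deployments/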

Apache Spark

I’m currently learning Apache Spark running in a VirtualBox VM on a Windows host. Needless to say, scripting here is indispensable. I’ve written a script to start the VM in headless mode with a static IP address. Once the VM is up the script launches an SSH session for me.
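
A minimal sketch of that startup script might look like the following (the VM name and the boot wait are assumptions; the IP address and key match the ones used further down):

# Start the VM without opening the VirtualBox GUI
VBoxManage startvm "spark-vm" --type headless

# Give the guest a moment to boot and bring up its static IP
sleep 30

# Open an SSH session to the VM
ssh -i /c/Users/Don/.ssh/id_rsa spark@192.168.0.50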

What is really priceless is the example below of a BASH script that I put together to deploy an application to the Spark server. This script works well because I just need to double-click on it and it runs. What the script does is nothing miraculous, but it simplifies a convoluted process that gets performed often.

As with this example, all my scripts are written in BASH and run in Git BASH from the Windows desktop. The script below first builds the application from the currently checked-out code, then uploads the resulting JAR file to the VM using the SCP command. Once the JAR is on the server, it is submitted to Spark using the SSH command to perform a remote CLI call. Finally, once that process is done, another SCP call retrieves the resulting CSV file.

#!/bin/bash
# Stop at the first error so a failed build is never deployed
set -e

# Change to the project directory
cd /c/dev/projects/apache-spark

# Build the application into a JAR file
mvn clean install

# Upload the file to the VM
scp -i /c/Users/Don/.ssh/id_rsa -P 22 /c/dev/projects/apache-spark/target/jar/SparkApp.jar spark@192.168.0.50:/home/spark/deployment/SparkApp.jar

# Submit the JAR to the Spark server
ssh -i /c/Users/Don/.ssh/id_rsa -p 22 spark@192.168.0.50 '/home/spark/spark/bin/spark-submit -v --class com.sparkinaction.App --master local /home/spark/deployment/SparkApp.jar'

# Retrieve result and copy to project folder
scp -i /c/Users/Don/.ssh/id_rsa -P 22 spark@192.168.0.50:/home/spark/result/sightings.csv /c/dev/projects/apache-spark/target/spark-result/result.csv

For those who don’t work in Linux much, here are two commands that I find indispensable.

SCP – This command copies files between computers using SSH. It’s secure and allows me to log in using an SSH key.

SSH – Everyone knows SSH, but did you know you can use it to execute commands remotely? That’s what I did above to submit the Java application to the Spark server.
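
In their simplest form (the hostname, user, and paths here are just placeholders), the two commands look like this:

# Copy a local file to a remote machine over SSH
scp report.csv user@example-host:/home/user/reports/

# Run a single command on the remote machine and see its output locally
ssh user@example-host 'ls -l /home/user/reports'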

Taking on Machine Learning

About two months ago I took steps to jump on the Machine Learning bandwagon. It was tough to take that first step for many reasons. The first was deciding which programming language to learn. Did I want to stick to the JVM and Java, choose another JVM language, take up Python, or learn something else? This article from KDnuggets made that decision much harder. Luckily, due to circumstances from a recent project, I decided to turn to Scala. And so far, I haven’t been disappointed.