sbt

Interactive Build Tool for Apache Spark

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

Intro to sbt

  • The interactive build tool
  • Automation tool
  • Use Scala to set up your Scala projects
  • Running tasks in parallel from the shell
  • The latest version at the time of this talk: 0.13.15
  • Available from http://www.scala-sbt.org

build.sbt

The configuration file of your Spark project

organization := "pl.japila.spark"
name         := "my-spark-app"
version      := "1.0"

scalaVersion := "2.11.11"

val sparkVer = "2.1.1"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVer
libraryDependencies += "org.apache.spark" %% "spark-sql"   % sparkVer

project directory (1 of 2)

project/build.properties pins the version of sbt used to build the project

sbt.version = 0.13.15

Tasks in sbt

Commands

Directory Layout

Convention over Configuration

sbt Plugins

Extensions

Usage

  • Plugins are (mostly) a set of tasks and commands
  • Install plugins
    • project/*.sbt per project
    • $HOME/.sbt/0.13/plugins/*.sbt globally
  • Advice: use plugin name as the file name
    • e.g. project/assembly.sbt
  • The plugins command lists available and installed plugins
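
As a minimal sketch of the per-project installation described above, a plugin is added with a single addSbtPlugin line in its own file under project/. The version number is an assumption, not something stated on the slides; check the plugin's home page for the current release.

```scala
// project/assembly.sbt
// One file per plugin, named after the plugin (as advised above).
// NOTE: the version below is an assumption for illustration only.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
```

The same line placed in a file under $HOME/.sbt/0.13/plugins/ would make the plugin available to every sbt 0.13 build on the machine.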

sbt-assembly plugin

  • Creates a fat JAR from your project's classes including all of the library dependencies
    • Except provided-scoped
  • assembly task to assemble an application JAR
  • Only needed when the application has dependencies that Spark does not already include (aka non-Spark dependencies)
    • Dependencies that Spark already includes should be marked provided instead
  • Install in project/assembly.sbt
  • Home page: https://github.com/sbt/sbt-assembly
  • Trade-off: bundling dependencies in the fat JAR gives up the deployment flexibility of spark-submit --packages
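
To illustrate the provided scope mentioned above, here is a hedged variant of the earlier build.sbt dependency lines. The Typesafe Config library stands in as a hypothetical non-Spark dependency; it is not part of the original build.

```scala
// build.sbt (fragment, a variant of the dependency lines shown earlier)
val sparkVer = "2.1.1"

// Spark is already on the classpath when the application is started
// with spark-submit, so it is marked provided and left out of the fat JAR.
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVer % "provided"

// A non-Spark dependency (hypothetical example) stays in the default
// compile scope, so the assembly task bundles it into the fat JAR.
libraryDependencies += "com.typesafe" % "config" % "1.3.1"
```

With the plugin installed, running sbt assembly should write the fat JAR under target/scala-2.11/, assuming the default output settings are unchanged.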

sbt-docker plugin

sbt-native-packager plugin

  • Builds application packages in native formats
    • zip, tar.gz, xz
    • deb, rpm, dmg, msi
    • docker
  • The stage task stages your app so you can run it locally without packaging it
  • Could be considered superior to sbt-assembly and sbt-docker
  • Home page: https://github.com/sbt/sbt-native-packager
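
A minimal sketch of enabling the plugin, assuming the JavaAppPackaging archetype fits the application. The plugin version is an assumption, and the file name follows the one-file-per-plugin advice from earlier.

```scala
// project/native-packager.sbt
// NOTE: the version below is an assumption for illustration only.
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.2.0")

// build.sbt (a separate file in the project root)
// Enables packaging of a JVM application with generated launcher scripts.
enablePlugins(JavaAppPackaging)
```

With this in place, sbt stage lays the application out under target/universal/stage, and sbt universal:packageBin builds the zip archive.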

sbt-coursier plugin

sbt-updates plugin

sbt Demos

Demo: Standalone Scala Application

  1. Create a new Scala/sbt project using IntelliJ IDEA
  2. Pin the sbt version in project/build.properties
  3. Install sbt-coursier
  4. Run sbt package to create the application's package
  5. Run the Scala application (using spark-submit)
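
For steps 4 and 5, a minimal sketch of an application that could be packaged and submitted. The file name, the object name SparkApp, and the trivial job are all hypothetical; the slides do not show the demo's source.

```scala
// src/main/scala/SparkApp.scala (hypothetical file and object name)
import org.apache.spark.sql.SparkSession

object SparkApp {
  def main(args: Array[String]): Unit = {
    // The master URL is deliberately not set here; spark-submit supplies it.
    val spark = SparkSession.builder().appName("my-spark-app").getOrCreate()
    try {
      // A trivial job so there is something to run: count a small range.
      val count = spark.range(5).count()
      println(s"Counted $count rows")
    } finally {
      // Always release the session, even if the job fails.
      spark.stop()
    }
  }
}
```

Assuming the build.sbt shown earlier, sbt package should produce target/scala-2.11/my-spark-app_2.11-1.0.jar, which can then be run with spark-submit --class SparkApp --master local[*] followed by the path to that JAR.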