DataSource API

Apache Spark 2 / Spark SQL

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

Agenda

  1. DataSource API Overview
    • Input and Output
    • File Formats (JSON, Text, CSV, Parquet, ORC)
    • Hive Tables
    • JDBC
  2. Using DataSource API in Scala
    • Reading data from PostgreSQL (JDBC)
    • Running an ML experiment with a JSON data source
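
As a rough preview of the input and output side of the agenda, here is a minimal sketch of writing and reading a DataFrame through `DataFrameWriter` and `DataFrameReader`. It assumes Spark 2.x with `spark-sql` on the classpath and a local master. The sample rows, the temporary path, and the `readFromPostgres` helper (with its `url`, `table`, `user` and `password` parameters) are hypothetical and only for illustration. The JDBC helper is defined but not called, since it would need a running PostgreSQL instance and the PostgreSQL JDBC driver.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object DataSourceSketch {

  // Hypothetical helper: loads a table over JDBC.
  // Needs a reachable PostgreSQL server and its JDBC driver on the classpath.
  def readFromPostgres(spark: SparkSession, url: String, table: String,
                       user: String, password: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", url)          // e.g. a jdbc:postgresql://host:port/db URL
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("datasource-sketch")
      .master("local[*]")          // assumption: local run, no cluster
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, not taken from the talk.
    val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

    // Temporary output location, so the sketch does not touch real data.
    val path = Files.createTempDirectory("datasource-sketch")
      .resolve("people.json").toString

    // Output: DataFrameWriter with an explicit format and save mode.
    people.write.format("json").mode(SaveMode.Overwrite).save(path)

    // Input: DataFrameReader with the same format; the schema is inferred.
    val fromJson = spark.read.format("json").load(path)
    fromJson.printSchema()
    fromJson.show()

    spark.stop()
  }
}
```

Swapping `"json"` for `"csv"`, `"parquet"`, `"orc"` or `"text"` selects the other built-in file formats, although each one has its own options and schema constraints (the text format, for example, expects a single string column).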

Questions?