Develop a standalone Spark SQL application (using IntelliJ IDEA) that uses your own upper user-defined function (e.g. my_upper).
Protip™: Use Scala’s StringOps.toUpperCase
Use your UDF in SQL, i.e. in spark.sql.
Use the callUDF standard function to call your UDF.
Module: Spark SQL
Duration: 30 mins
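The steps of the exercise can be sketched as follows. This is only one possible shape of a solution, not the reference one. The object name MyUpperApp, the view name words, the sample data and the local[*] master are assumptions made for illustration; only my_upper, spark.sql and callUDF come from the exercise itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{callUDF, col}

// Hypothetical application object; the name is an assumption.
object MyUpperApp {

  // The logic is a plain Scala function, so it can be tested without Spark.
  // String arguments can be null in Spark, hence the guard.
  val myUpper: String => String = s => if (s == null) null else s.toUpperCase

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MyUpperApp")
      .master("local[*]") // assumption: run locally from the IDE
      .getOrCreate()
    import spark.implicits._

    // Registering makes the function visible to SQL under the name my_upper
    // and also returns a UserDefinedFunction for the Dataset API.
    val myUpperUdf = spark.udf.register("my_upper", myUpper)

    // Illustrative sample data
    val words = Seq("hello", "spark").toDF("word")
    words.createOrReplaceTempView("words")

    // 1. The UDF in SQL
    spark.sql("SELECT word, my_upper(word) AS upper_word FROM words").show()

    // 2. The UDF called by name with callUDF
    //    (newer Spark releases may report callUDF as deprecated in favour of call_udf)
    words.select(col("word"), callUDF("my_upper", col("word")).as("upper_word")).show()

    // 3. The UDF used directly as a Column function
    words.select(myUpperUdf($"word").as("upper_word")).show()

    spark.stop()
  }
}
```

Keeping the function separate from its registration is a design choice: the same value can be registered under a name for SQL and unit-tested as ordinary Scala code.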
Think about using non-deterministic “features” like the current timestamp or a random number. What happens when you use such “features” in your UDFs?
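// Use .asNondeterministic to see the change
// (written for spark-shell; a standalone application also needs
// import org.apache.spark.sql.functions.udf and import spark.implicits._)
val randgen = udf { (n: Long) => util.Random.nextInt() }
spark
  .range(1)
  .withColumn("randgen", randgen('id))
  .select(randgen('id) === 'randgen)
  .show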
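To see the change that the comment in the snippet above hints at, the same query can be run twice, once with the UDF as defined and once with the variant marked through asNondeterministic. The following is a minimal sketch under assumptions: the object name NondeterministicUdfDemo, the helper compare and the local[*] master are hypothetical and not part of the exercise. It deliberately prints the plans with explain rather than stating an expected result, since the outcome depends on how the optimizer treats each variant.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Hypothetical demo object; the name is an assumption.
object NondeterministicUdfDemo {

  // Same UDF as in the exercise: ignores its input and returns a random Int.
  // Spark treats a UDF as deterministic unless told otherwise.
  val randgen: UserDefinedFunction = udf { (n: Long) => scala.util.Random.nextInt() }

  // The same function, explicitly marked as non-deterministic.
  val randgenNondet: UserDefinedFunction = randgen.asNondeterministic()

  // Hypothetical helper: builds the query from the exercise for a given UDF.
  def compare(spark: SparkSession, f: UserDefinedFunction): DataFrame = {
    import spark.implicits._
    spark
      .range(1)
      .withColumn("randgen", f($"id"))
      .select((f($"id") === $"randgen").as("same"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NondeterministicUdfDemo")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    // Compare the optimized plans and the results of the two variants.
    val deterministicQuery = compare(spark, randgen)
    deterministicQuery.explain(true)
    deterministicQuery.show()

    val nondeterministicQuery = compare(spark, randgenNondet)
    nondeterministicQuery.explain(true)
    nondeterministicQuery.show()

    spark.stop()
  }
}
```

When reading the output, check whether the two projections are merged into one in the optimized plan and how many times the UDF appears in each case.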