Develop a standalone Spark SQL application (using IntelliJ IDEA) that, for each value of one column (word), finds the ids of the rows whose array column (words) contains that value.
Protip™: Use the split and explode standard functions.
Module: Spark SQL
Duration: 30 mins
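The protip's two functions are what multiply the rows. split turns each comma-separated words string into an array, and explode emits one row per array element. The sketch below models only that step in plain Python (no Spark), using the rows of this exercise's dataset, so the effect is easy to inspect. The helper name split_and_explode is hypothetical. In the Spark application itself, the corresponding expression would be along the lines of explode(split(col("words"), ",")) from org.apache.spark.sql.functions.

```python
# Spark-free model of split + explode, for illustration only.
# split_and_explode is a hypothetical helper, not a Spark API.
ROWS = [
    (1, "one,two,three", "one"),
    (2, "four,one,five", "six"),
    (3, "seven,nine,one,two", "eight"),
    (4, "two,three,five", "five"),
    (5, "six,five,one", "seven"),
]


def split_and_explode(rows):
    """Return one (id, w) pair per element of each row's comma-separated words."""
    exploded = []
    for row_id, words, _word in rows:
        # split(words, ","): string -> array of strings
        items = words.split(",")
        # explode(array): one output row per array element
        for w in items:
            exploded.append((row_id, w))
    return exploded


pairs = split_and_explode(ROWS)
print(len(pairs))  # 16
print(pairs[:4])   # [(1, 'one'), (1, 'two'), (1, 'three'), (2, 'four')]
```

Five input rows become 16 rows: 3 + 3 + 4 + 3 + 3 words.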
Input dataset:
+---+------------------+-----+
| id|             words| word|
+---+------------------+-----+
|  1|     one,two,three|  one|
|  2|     four,one,five|  six|
|  3|seven,nine,one,two|eight|
|  4|    two,three,five| five|
|  5|      six,five,one|seven|
+---+------------------+-----+
The same dataset in CSV format:
id,words,word
1,"one,two,three",one
2,"four,one,five",six
3,"seven,nine,one,two",eight
4,"two,three,five",five
5,"six,five,one",seven
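The words values are wrapped in double quotes because they contain the comma that also separates fields. A CSV reader therefore has to honour quoting to keep each list in a single column. Spark's CSV data source uses the double quote as its default quote character, so spark.read.option("header", "true").csv(path) should produce the three columns shown above, where path is a placeholder. All three columns are strings, because no schema is given and inferSchema is off by default. As a check outside Spark, Python's standard csv module parses the same text. The CSV is embedded inline because the exercise does not name a file.

```python
import csv
import io

# The exercise's CSV, embedded as a string (no file path is assumed).
CSV_TEXT = '''id,words,word
1,"one,two,three",one
2,"four,one,five",six
3,"seven,nine,one,two",eight
4,"two,three,five",five
5,"six,five,one",seven
'''

# DictReader takes column names from the first line (like Spark's header option),
# and its default quotechar '"' keeps "one,two,three" together as one field.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
records = [(int(r["id"]), r["words"], r["word"]) for r in reader]

for record in records:
    print(record)
```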
Result:
+-----+------------+
|    w|         ids|
+-----+------------+
| five|   [2, 4, 5]|
|  one|[1, 2, 3, 5]|
|seven|         [3]|
|  six|         [5]|
+-----+------------+
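Here is one possible route to the result, not necessarily the intended solution:

1. Explode the words as in the protip.
2. Inner-join the exploded rows with the distinct values of the word column. This is why eight, which appears in no words list, is missing from the result.
3. Group by the word and collect the ids.

The plain-Python model below follows those steps on the exercise data. The helper name ids_per_word is hypothetical, and no Spark code is involved.

```python
from collections import defaultdict

# Spark-free model of: explode(split(words)) -> join on word -> group by w, collect ids.
# ids_per_word is a hypothetical helper used only for illustration.
ROWS = [
    (1, "one,two,three", "one"),
    (2, "four,one,five", "six"),
    (3, "seven,nine,one,two", "eight"),
    (4, "two,three,five", "five"),
    (5, "six,five,one", "seven"),
]


def ids_per_word(rows):
    """Map each word-column value to the sorted ids of rows whose words contain it."""
    # Distinct values of the word column: the right-hand side of the join.
    targets = {word for _id, _words, word in rows}
    grouped = defaultdict(list)
    for row_id, words, _word in rows:
        for w in words.split(","):          # split + explode
            if w in targets:                # inner join on w == word
                grouped[w].append(row_id)   # group by w, collect ids
    # Sort ids within each group and order groups by word, as in the result table.
    return {w: sorted(ids) for w, ids in sorted(grouped.items())}


result = ids_per_word(ROWS)
for w, ids in result.items():
    print(w, ids)
```

In Spark, the grouping step would typically be groupBy(...).agg(collect_list(...)). The element order produced by collect_list is not guaranteed. If the output has to match the table exactly, sort each array (for example with array_sort) and order the rows by w.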
The word “one” is in the rows with the ids 1, 2, 3 and 5.
The word “seven” is in the row with the id 3.