Develop a Spark MLlib application that uses Logistic Regression for email classification, i.e. deciding which emails are spam and which are not.
Module: Spark MLlib
Duration: 45 mins
Label 0 is OK while 1 is SPAM. Use the when standard function to turn the prediction into a readable status:
val status = when('prediction === 0, "OK").otherwise("SPAM").as("status")
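To see the status column in action, here is a minimal sketch that applies it to a tiny hand-made DataFrame. The object name StatusColumnSketch, the helper withStatus, and the two sample rows are assumptions made for illustration and are not part of the exercise; col("prediction") is used in place of 'prediction so the helper does not depend on spark.implicits.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object StatusColumnSketch {

  // Hypothetical helper: appends a "status" column derived from "prediction"
  def withStatus(predictions: DataFrame): DataFrame = {
    val status = when(col("prediction") === 0, "OK").otherwise("SPAM").as("status")
    predictions.select(col("*"), status)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("status-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up predictions standing in for the output of a fitted model
    val predictions = Seq((1L, 0.0), (2L, 1.0)).toDF("id", "prediction")

    withStatus(predictions).show()

    spark.stop()
  }
}
```

Anything other than 0 falls through to otherwise and is reported as SPAM, which matches a binary classifier that only ever predicts 0 or 1.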
Use Online Generate Test Data to generate a CSV dataset with fake emails and the columns: id, body, and label.
id,body,label
1,Zushad zam fawo gur licidtug zar honepru zolor muahada lep pired ciuvi.,0
2,Elfi ez lirde vizavbak depmapav us piwojaw sihhib novo luzkut de teb apemimi hezotce rubumzer mowja jowte.,1
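One possible shape for the application is sketched below; it is a starting point under stated assumptions, not the reference solution. It assumes the dataset is saved as emails.csv (a hypothetical file name) with a header row, and it picks Tokenizer and HashingTF to turn the body text into a feature vector, which is only one of several reasonable choices. The names EmailClassificationSketch and classify are made up for this sketch.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object EmailClassificationSketch {

  // Hypothetical helper: fits a pipeline on `emails` and returns the same
  // rows with a readable status. Expects columns id, body and label.
  def classify(emails: DataFrame): DataFrame = {
    // Make sure the label is numeric (double) before training
    val training = emails.withColumn("label", col("label").cast("double"))

    // body -> words -> hashed term-frequency vector
    val tokenizer = new Tokenizer().setInputCol("body").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    // Uses the default "features" and "label" column names
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

    val status = when(col("prediction") === 0, "OK").otherwise("SPAM").as("status")
    // Predicting on the training rows is for illustration only
    model.transform(training).select(col("id"), col("body"), col("label"), status)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("email-classification-sketch")
      .getOrCreate()

    // Assumed location of the generated dataset; pass another path as the first argument
    val path = args.headOption.getOrElse("emails.csv")
    val emails = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

    classify(emails).show(truncate = false)

    spark.stop()
  }
}
```

For a meaningful evaluation, split the dataset with randomSplit and predict on the held-out part rather than on the rows used for training.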