- (Prerequisites) Install the `scala` and `Play framework` plugins
- (Prerequisites) Java version should be 8 (Java 15 also works on Windows)
- Open `build.sbt` as a project
- Open the `sbt` panel on the right-hand side, then click `tasks -> clean` and `tasks -> compile` to recompile the project
- Edit run configurations, add a new `Play 2 application` config, and set `Play 2 Module` to `csye7200_project`
- Run the application
`/spark/train`
- Starts training the ML models and saves them to files once complete
- The training process is wrapped in a `Future`, so the response is returned immediately. Actual training status can be checked in the application logs (by default `logs/application.log`)
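As an illustration, the training endpoint can be triggered with a short Python script. The base URL below is an assumption (Play's default development port is 9000) and the HTTP method is assumed to be GET; check `conf/routes` and the Postman collection for the actual route definition.

```python
from urllib.request import urlopen

# Assumed base URL: Play's default development port is 9000.
BASE_URL = "http://localhost:9000"

def api_url(path: str) -> str:
    """Build a full API URL from a route path such as /spark/train."""
    return BASE_URL + path

def trigger_training() -> str:
    """Call /spark/train. The response comes back immediately because
    training runs inside a Future on the server side."""
    with urlopen(api_url("/spark/train")) as resp:
        return resp.read().decode("utf-8")

# trigger_training()  # then watch logs/application.log for training progress
```

Because training is asynchronous, a successful response only means the job was started, not that the models are ready.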
`/spark/infer/lr`
- Logistic regression inference API. The input format can be checked in Postman.
- Output will be `0` (not popular) or `1` (popular)
`/spark/infer/rf`
- Random forest inference API. The input format can be checked in Postman.
- Output will be `0` (not popular) or `1` (popular)
`/spark/infer_batch/lr`
- Logistic regression batch inference API. The input should be a `.csv` file.
- Output will be a list of `0` (not popular) or `1` (popular) values
`/spark/infer_batch/rf`
- Random forest batch inference API. The input should be a `.csv` file.
- Output will be a list of `0` (not popular) or `1` (popular) values
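All four inference endpoints share the same output convention (`0` for not popular, `1` for popular), so a small hypothetical helper like the following can decode both single and batch responses:

```python
# Map the numeric labels returned by the inference APIs to readable names.
LABELS = {0: "not popular", 1: "popular"}

def decode(prediction: int) -> str:
    """Decode a single prediction from /spark/infer/lr or /spark/infer/rf."""
    return LABELS[prediction]

def decode_batch(predictions: list) -> list:
    """Decode the list returned by /spark/infer_batch/lr or /spark/infer_batch/rf."""
    return [LABELS[p] for p in predictions]
```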
To use the GUI, please follow the instructions here
We provide 2 methods to test the web APIs:
- Postman
  - Scripts are available at `postman_test/spark.postman_collection.json`. Import the collection into Postman to start testing.
- REST Client
  - Scripts are available at `restclient_test/request.rest`. Open it in `VsCode` (with the REST Client extension) to start testing.
- When you start the application for the first time, you must call the `/spark/train` API before requesting any other APIs
The application configuration file is `conf/application.conf`.

`spark` - all Spark-related configuration
- `isLocal` - Boolean. Indicates whether the Spark session runs locally
- `LRModelPath` - String. Path where the logistic regression pipeline model is saved
- `RFModelPath` - String. Path where the random forest pipeline model is saved
- `useCsv` - Boolean. Indicates whether the processed `.csv` file is used as the training data input
- `h5FolderPath` - String. If `useCsv` is set to false, all `.h5` files under `h5FolderPath` will be used as training data
- `csvPath` - String. Path to the processed `.csv` file. Only used if `useCsv` is set to true
- `masterIp` - String. When running on a cluster, the IP address of the master node
- `executorMem` - String. Memory size assigned to Spark on each cluster executor, e.g. `'12g'`
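Putting these keys together, the `spark` section of `conf/application.conf` might look like the following sketch. All values here are illustrative assumptions, not the project's actual defaults:

```hocon
spark {
  isLocal = true                    # run the Spark session locally
  LRModelPath = "models/lr"         # where the logistic regression pipeline model is saved
  RFModelPath = "models/rf"         # where the random forest pipeline model is saved
  useCsv = true                     # train from the processed .csv file
  csvPath = "data/processed.csv"    # used only when useCsv = true
  h5FolderPath = "data/h5"          # used only when useCsv = false
  masterIp = "127.0.0.1"            # master node IP when running on a cluster
  executorMem = "12g"               # memory per cluster executor
}
```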
Library used for reading HDF5 files in Scala: https://github.com/jamesmudd/jhdf