Spark with PySpark
A simple usage example of PySpark with spark-submit, including:
- passing arguments
- creating a SparkContext and an SQLContext
- loading your project source code (the src directory)
- loading pip modules (from a simple requirements file)
First, from a terminal, bundle the pip modules and your source code into archives:
pip install -r ./requirements.txt -t ./pip_modules && jar -cvf pip_modules.jar -C ./pip_modules .
jar -cvf src.jar -C ./src .
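The requirements file itself is ordinary pip format; hypothetical contents might look like the following. Note that only pure-Python packages import cleanly from a zip archive, so dependencies with compiled extensions (e.g. numpy) are better installed on the cluster nodes directly.

```
# requirements.txt (hypothetical contents)
requests
simplejson
```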
Then, submit the job:
spark-submit main.py --some_arg=some_value --pip_modules=pip_modules.jar --src=src.jar
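Everything after main.py is passed to the application rather than to spark-submit, so main.py is responsible for parsing these flags and registering the archives itself. The original does not include main.py; a minimal sketch of such a driver, assuming argparse-style parsing:

```python
# main.py -- a minimal sketch of the driver assumed by the command above.
# The flag names mirror the example; the parsing strategy (argparse) is
# an assumption, not something the original specifies.
import argparse

from pyspark import SparkContext
from pyspark.sql import SQLContext

parser = argparse.ArgumentParser()
parser.add_argument("--some_arg")       # ordinary application argument
parser.add_argument("--pip_modules")    # path to pip_modules.jar
parser.add_argument("--src")            # path to src.jar
args = parser.parse_args()

sc = SparkContext(appName="pyspark-example")
sqlContext = SQLContext(sc)

# Distribute both archives to the executors and put them on the
# driver's sys.path so `import` finds them.
sc.addPyFile(args.src)
sc.addPyFile(args.pip_modules)

# Imports from src/ or the pip modules must happen *after* addPyFile,
# e.g. (hypothetical): from mypackage import jobs
```

This works because jar files are zip archives and Python's zipimport loads modules from any zip on sys.path, regardless of the file extension.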
Alternatively, to load the same archives from the interactive shell:
- run: pyspark
- from within the pyspark interactive shell, run the following:
sc.addPyFile("src.jar")
sc.addPyFile("pip_modules.jar")