Apache Spark and Python
- GRoot
- Dec 31, 2019
- 1 min read
There is a Python package that gives you access to Apache Spark.
First you need to install Spark itself (a separate process altogether).
Then install Python, either directly or through Anaconda/Miniconda.
(I prefer the direct and Miniconda methods because Anaconda is so large and it sometimes lags behind in updating available packages.)
Once this is done you can install the Python package called PySpark.
Use the Conda console to run "conda install pyspark" (conda commands and package names are lowercase).
This will install the main package plus a Java support package, Py4J.
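The conda route above can be sketched as follows; the environment name "spark-env" is just an illustration, and creating a dedicated environment is optional:

```shell
# Optional: create and activate a dedicated environment (name is arbitrary)
conda create -n spark-env python
conda activate spark-env

# Install PySpark; this also pulls in Py4J, the Java bridge it depends on
conda install pyspark
```

Note that PySpark still needs a working Java runtime and your separate Spark installation to actually run jobs.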
For a regular Python install you can use pip and try "pip install pyspark".
This process may fail; if so, install pypandoc first with "pip install pypandoc".
After that you should be able to rerun the pip command to install PySpark.
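Once either install route finishes, a quick sanity check can confirm the prerequisites are in place. This is a minimal sketch using only the standard library; the function name check_prereqs is my own, not part of PySpark:

```python
import importlib.util
import shutil

def check_prereqs():
    """Report whether the two PySpark prerequisites are visible."""
    # PySpark drives a JVM, so a `java` executable must be on PATH
    have_java = shutil.which("java") is not None
    # True if the pyspark package is importable in this environment
    have_pyspark = importlib.util.find_spec("pyspark") is not None
    return {"java": have_java, "pyspark": have_pyspark}

print(check_prereqs())
```

If either value comes back False, revisit the corresponding install step above before trying to create a Spark session.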