Appendix A. Installing Apache Spark
Although we provide a VM image where Spark is already installed, we also wanted to give you step-by-step instructions on how to install Apache Spark as it would be done in the real world. This appendix contains instructions for the following:
- Installing Java (JDK)
- Downloading, installing, and configuring Apache Spark
If you aren’t using Ubuntu, we suggest that you install the VirtualBox hardware-virtualization software and create an Ubuntu VM (www.wikihow.com/Install-Ubuntu-on-VirtualBox).
Let’s get started. From now on, we’ll assume that you’re logged in to your Ubuntu OS.
If you aren’t sure whether you already have the JDK installed and set up correctly, open your terminal (Ctrl-Alt-T) and issue the following command (you can paste the command in the Ubuntu terminal with Ctrl-Shift-V and, if needed, copy from the terminal with Ctrl-Shift-C):
$ which javac
Note
Skip the dollar sign ($) when you enter commands—that’s just the standard way of designating that commands should be entered into the terminal.
The which command prints the path of the executable file on your filesystem that would run if you were to execute the javac command (javac is the Java compiler, which comes only with the JDK).
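If you'd rather have the terminal spell out the result, the check described above can be wrapped in a small shell function. This is only a sketch: the function name check_jdk and the messages it prints are our own, chosen for illustration, and it uses the POSIX shell built-in command -v in place of which, because command -v behaves the same way across shells.

```shell
#!/bin/sh
# check_jdk is a hypothetical helper name, not part of Java or Spark.
# It reports whether the javac compiler (shipped only with the JDK)
# can be found on the PATH.
check_jdk() {
    # command -v prints the path of the executable and succeeds if it
    # exists; it prints nothing and fails if it does not.
    if javac_path=$(command -v javac); then
        echo "JDK found: $javac_path"
    else
        echo "JDK not found: javac is not on the PATH"
        return 1
    fi
}

# Usage: run the check and print a reminder if the JDK is missing.
check_jdk || echo "Install a JDK before continuing."
```

A path in the output means a JDK compiler is already reachable from your terminal. The "not found" message means javac isn't on the PATH, so the JDK is either missing or not set up.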