How to compile Hadoop from source

Hadoop is a great (and complex) software framework with many dependencies and a lot of configuration to get right. One way to get up to speed with it is to download the ready-made release binaries and install them (you can find the relevant links at http://hadoop.apache.org/releases.html).

But if you want the latest source code (and probably want to work on it), you will need to check it out from Apache Hadoop's git repository. Before you can build and run Hadoop this way, you will need to set up your system. The setup includes:
1. Installing the required OS packages (autoconf, cmake, libtool, …)
2. Installing a JDK and Maven
3. Installing the correct version of Protocol Buffers
4. Installing the Hadoop Maven plugins
5. Compiling and installing Hadoop
6. Setting up password-less SSH and some environment variables
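
Before starting a long Maven build, it can help to confirm that steps 1 to 3 actually succeeded. The following is a minimal Python sketch of such a prerequisite check. It is not part of the repository linked below. The command names in `REQUIRED_TOOLS` are assumptions: for example, on Debian/Ubuntu the libtool package installs `libtoolize`. The expected protoc version is also an assumption. 2.5.0 is the version Hadoop 2.x builds expected, but the `BUILDING.txt` file in your own checkout is the authority.

```python
import re
import shutil
import subprocess

# Assumed command names for steps 1-3 (OS build tools, JDK, Maven, protoc).
REQUIRED_TOOLS = ["autoconf", "cmake", "libtoolize", "java", "javac", "mvn", "protoc"]

# Assumption: the protoc version your Hadoop checkout expects (see BUILDING.txt).
REQUIRED_PROTOC = (2, 5, 0)


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


def parse_protoc_version(output):
    """Turn output such as 'libprotoc 2.5.0' into the tuple (2, 5, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", output)
    if match is None:
        raise ValueError(f"no version number in {output!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_protoc_version():
    """Run `protoc --version` and parse it; return None if protoc is absent."""
    if shutil.which("protoc") is None:
        return None
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_protoc_version(result.stdout)


def check():
    """Print every problem found and return True only if there were none."""
    ok = True
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing: {tool}")
        ok = False
    version = installed_protoc_version()
    if version is not None and version != REQUIRED_PROTOC:
        found = ".".join(map(str, version))
        wanted = ".".join(map(str, REQUIRED_PROTOC))
        print(f"protoc {found} found, but {wanted} expected")
        ok = False
    return ok


if __name__ == "__main__":
    print("all prerequisites found" if check() else "fix the items above first")
```

The `which` parameter is injectable so the PATH lookup can be replaced in tests. The version is compared exactly rather than as a minimum because a protoc that is too new can break the build just as an old one can.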

I have prepared a 'Vagrantfile' that provisions a Vagrant machine and performs the steps above. All of the steps are explained in the repository's README file: https://github.com/mm-binary/hadoop-src-getting-started
