How to compile Hadoop from source

Published
1 min read

Hadoop is a great (and complex) software framework with many dependencies and configuration options. One way to get up to speed with it is to download the ready-made release binaries and just install them (you can find the relevant links at http://hadoop.apache.org/releases.html).
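For example, fetching and unpacking a binary release might look like the following (the version number is a placeholder; pick a current one from the releases page):

```shell
# Download and unpack a binary release (3.3.6 is a placeholder version)
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
cd hadoop-3.3.6

# Sanity check: print the version of the unpacked distribution
bin/hadoop version
```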

But if you want the latest source code (and perhaps to work on it), you will need to check it out from Apache Hadoop's git repository. Before you can run Hadoop this way, you will need to set up your system. This setup includes:
1. Installing required OS packages (autoconf, cmake, libtool, …)
2. Installing JDK and Maven
3. Installing protocol buffers with the correct version
4. Installing Hadoop maven plugins
5. Compiling and installing Hadoop
6. Setting up password-less SSH and some environment variables
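On a Debian/Ubuntu machine, the steps above might be sketched roughly as follows. Package names, version numbers, and paths here are assumptions; check BUILDING.txt in the Hadoop source tree for the exact requirements, especially the required protocol buffers version:

```shell
# 1. OS build tools (assumed Debian/Ubuntu package names)
sudo apt-get update
sudo apt-get install -y autoconf automake libtool cmake g++ \
    zlib1g-dev pkg-config libssl-dev

# 2. JDK and Maven from the distribution repositories
sudo apt-get install -y openjdk-8-jdk maven

# 3. Protocol buffers -- Hadoop is picky about the exact version;
#    verify the version in BUILDING.txt before building (2.5.0 below
#    is an example for older Hadoop releases)
wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -xzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0 && ./configure && make && sudo make install && cd ..

# 4 & 5. Check out Hadoop, install the Maven plugins, and build;
#    a top-level install builds hadoop-maven-plugins along with the rest
git clone https://github.com/apache/hadoop.git
cd hadoop
mvn install -DskipTests
mvn package -Pdist -DskipTests -Dtar   # optional: build a distribution tarball

# 6. Password-less SSH to localhost, plus environment variables
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # path is an assumption
export HADOOP_HOME="$PWD"
```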

I have prepared a 'Vagrantfile' that provisions a Vagrant machine to carry out the steps above. All required steps are explained in the README file of the repository: https://github.com/mm-binary/hadoop-src-getting-started
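Assuming Vagrant and a provider such as VirtualBox are already installed, using the repository looks roughly like this:

```shell
# Clone the repository and bring up the provisioned machine
git clone https://github.com/mm-binary/hadoop-src-getting-started.git
cd hadoop-src-getting-started
vagrant up     # provisions the VM per the Vagrantfile
vagrant ssh    # log in to the prepared machine
```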

