How to set up Hadoop pseudo-distributed mode?
To set up Hadoop in pseudo-distributed mode (all daemons running on a single machine), follow these steps:
- Install Java: Make sure that Java has been installed on your machine and that the JAVA_HOME environment variable has been set up.
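A quick way to check both conditions is a small shell snippet (the JDK path in the hint is only an example, not a required location):

```shell
# Check that JAVA_HOME is set and points at a working JDK.
# The path in the hint below is an example; adjust for your system.
if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME looks good: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or invalid; set it, e.g.:"
    echo "  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64"
fi
```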
- Download Hadoop: Download a release of Hadoop from the official website (https://hadoop.apache.org/releases.html) and extract the tarball into the directory where you want to install it.
- Configure Hadoop: Navigate to the installation directory of Hadoop, locate the etc/hadoop directory, and edit the hadoop-env.sh file to set the JAVA_HOME environment variable to the path where Java is installed.
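The edit can also be scripted. A minimal sketch, assuming Hadoop was extracted to $HOME/hadoop-3.3.6 and an OpenJDK 8 path (both are placeholders; substitute your actual locations):

```shell
# Append the JAVA_HOME export to Hadoop's environment file.
# HADOOP_HOME and the JDK path are assumptions; adjust both.
HADOOP_HOME=${HADOOP_HOME:-$HOME/hadoop-3.3.6}
mkdir -p "$HADOOP_HOME/etc/hadoop"    # already exists on a real install
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' \
    >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
grep 'export JAVA_HOME' "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```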
- Configure the Hadoop core by editing the etc/hadoop/core-site.xml file and setting the following property:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- Configure HDFS by editing the hdfs-site.xml file located in the etc/hadoop directory:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/path/to/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/path/to/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Please make sure to replace /path/to/hadoop_data with the directory where you want to store Hadoop data.
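Those storage directories must exist before you format HDFS. For example, using $HOME/hadoop_data as an assumed location (match whatever paths you put in hdfs-site.xml):

```shell
# Create the NameNode and DataNode storage directories referenced in
# hdfs-site.xml. $HOME/hadoop_data is an assumed location.
mkdir -p "$HOME/hadoop_data/hdfs/namenode" \
         "$HOME/hadoop_data/hdfs/datanode"
ls "$HOME/hadoop_data/hdfs"
```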
- Configure YARN by editing the yarn-site.xml file located in the etc/hadoop directory:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
- Configure MapReduce by editing the mapred-site.xml file located in the etc/hadoop directory (on Hadoop 2.x, copy it from mapred-site.xml.template if the file does not exist):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
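On Hadoop 3.x, MapReduce jobs may fail to find their classes unless the application classpath is also configured; the official single-node setup guide adds a property along these lines (verify against the documentation for your exact version):

```xml
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```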
- Format HDFS: Run the following command from the Hadoop installation directory to format the NameNode's storage directory (do this only once; reformatting wipes the HDFS metadata):
$ bin/hdfs namenode -format
- Start the Hadoop cluster: Execute the following commands from the Hadoop installation directory to start the HDFS and YARN daemons (the start scripts connect over ssh, so passwordless ssh to localhost must be configured):
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
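You can check that the daemons came up with jps, which ships with the JDK. On a healthy pseudo-distributed node you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself):

```shell
# List running JVM processes; falls back to a hint if jps is missing.
jps || echo "jps not found: it ships with the JDK, check JAVA_HOME/bin"
```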
- Verify in the browser: the NameNode web UI is at http://localhost:50070 on Hadoop 2.x (on Hadoop 3.x it moved to http://localhost:9870).
Now that you have successfully set up a pseudo-distributed cluster for Hadoop, you can use Hadoop command line tools or write MapReduce programs to process and analyze data.
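As a quick smoke test of the command line tools, you can create a home directory in HDFS, upload a file, and list it (run from the Hadoop installation directory; the target path is just an example):

```shell
# Create an HDFS home directory, upload a config file, and list it.
bin/hdfs dfs -mkdir -p /user/$USER
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/$USER/
bin/hdfs dfs -ls /user/$USER
```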