How do you set up Hadoop in pseudo-distributed mode?

To set up a Hadoop pseudo-distributed cluster, you need to follow these steps:

  1. Install Java: Make sure that Java has been installed on your machine and that the JAVA_HOME environment variable has been set up.
  2. Download Hadoop: Download the latest stable release of Hadoop from the official website (https://hadoop.apache.org/releases.html) and extract the archive into the directory where you want to install it.
  3. Configure Hadoop: Navigate to the installation directory of Hadoop, locate the etc/hadoop directory, and edit the hadoop-env.sh file to set the JAVA_HOME environment variable to the path where Java is installed.
  4. Configure the core Hadoop files by editing the etc/hadoop/core-site.xml file and setting the following properties:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
  5. Configure HDFS: Edit the etc/hadoop/hdfs-site.xml file and set the following properties:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/path/to/hadoop_data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/path/to/hadoop_data/hdfs/datanode</value>
    </property>
</configuration>

Please make sure to replace /path/to/hadoop_data with the directory where you want to store Hadoop data.
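
The directories named in hdfs-site.xml can be created before formatting so that permission problems surface early. A minimal shell sketch, assuming a hypothetical helper named make_hdfs_dirs and the same hdfs/namenode and hdfs/datanode layout used in the properties above:

```shell
# Hypothetical helper (not part of Hadoop): create the NameNode and
# DataNode storage directories under the base directory given as $1.
make_hdfs_dirs() {
    base="$1"
    # -p creates missing parent directories and is harmless if they already exist
    mkdir -p "$base/hdfs/namenode" "$base/hdfs/datanode"
}

# Example: make_hdfs_dirs /path/to/hadoop_data
```

Run it as the same user that will start Hadoop, so that the daemons own the directories they write to.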

  6. Configure YARN: Edit the etc/hadoop/yarn-site.xml file and set the following properties:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
  7. Configure MapReduce: Edit the etc/hadoop/mapred-site.xml file and set the following property:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
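  8. Format HDFS: Run the following command from the Hadoop installation directory to format the HDFS file system. Do this only once, before the first start, because formatting erases existing HDFS metadata: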
$ bin/hdfs namenode -format
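  9. Start the Hadoop cluster: Run the following commands to start the HDFS and YARN daemons. The scripts use SSH to launch the daemons, so passwordless SSH to localhost must already work (check with ssh localhost).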
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
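  10. Verify the cluster: Open the NameNode web UI in a browser to confirm that HDFS is running. The URL is http://localhost:9870 on Hadoop 3.x and http://localhost:50070 on Hadoop 2.x.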
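
One way to confirm from the terminal that the daemons started is to inspect the output of jps, the JDK tool that lists running Java processes. The sketch below assumes a hypothetical helper named missing_daemons and the five daemons a default pseudo-distributed setup normally starts (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager). It prints any that are absent:

```shell
# Hypothetical helper (not part of Hadoop): read `jps` output on stdin
# and print the name of each expected daemon that is not listed.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <ClassName>"; compare field 2 exactly so that
        # "SecondaryNameNode" is not mistaken for "NameNode"
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example: jps | missing_daemons
```

Empty output means all five daemons are running. If a name is printed, the matching log file in the logs directory of the Hadoop installation is the usual place to look for the cause.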

Now that you have successfully set up a pseudo-distributed cluster for Hadoop, you can use Hadoop command line tools or write MapReduce programs to process and analyze data.
