How to set up Hadoop pseudo-distributed mode?
To set up Hadoop in pseudo-distributed mode (all daemons running on a single machine), follow these steps:
- Install Java: Make sure that Java has been installed on your machine and that the JAVA_HOME environment variable has been set up.
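A quick way to check both conditions is a small shell snippet (the JDK path in the hint is only an example, not a required location):

```shell
# Check that JAVA_HOME is set and points at a working JDK.
# The path in the hint below is an example; adjust for your system.
if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME looks good: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or invalid; set it, e.g.:"
    echo "  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64"
fi
```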
- Download Hadoop: Download a release of Hadoop from the official website (https://hadoop.apache.org/releases.html) and extract the tarball into the directory where you want to install it.
- Configure Hadoop: Navigate to the installation directory of Hadoop, locate the etc/hadoop directory, and edit the hadoop-env.sh file to set the JAVA_HOME environment variable to the path where Java is installed.
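The edit can also be scripted. A minimal sketch, assuming Hadoop was extracted to $HOME/hadoop-3.3.6 and an OpenJDK 8 path (both are placeholders; substitute your actual locations):

```shell
# Append the JAVA_HOME export to Hadoop's environment file.
# HADOOP_HOME and the JDK path are assumptions; adjust both.
HADOOP_HOME=${HADOOP_HOME:-$HOME/hadoop-3.3.6}
mkdir -p "$HADOOP_HOME/etc/hadoop"    # already exists on a real install
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' \
    >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
grep 'export JAVA_HOME' "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```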
- Configure the Hadoop core by editing the etc/hadoop/core-site.xml file and setting the following property:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- Configure HDFS by editing the hdfs-site.xml file located in the etc/hadoop directory:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/path/to/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/path/to/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
Please make sure to replace /path/to/hadoop_data with the directory where you want to store Hadoop data.
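Those storage directories must exist before you format HDFS. For example, using $HOME/hadoop_data as an assumed location (match whatever paths you put in hdfs-site.xml):

```shell
# Create the NameNode and DataNode storage directories referenced in
# hdfs-site.xml. $HOME/hadoop_data is an assumed location.
mkdir -p "$HOME/hadoop_data/hdfs/namenode" \
         "$HOME/hadoop_data/hdfs/datanode"
ls "$HOME/hadoop_data/hdfs"
```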
- Configure YARN by editing the yarn-site.xml file located in the etc/hadoop directory:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
- Configure MapReduce by editing the mapred-site.xml file located in the etc/hadoop directory (on Hadoop 2.x, copy it from mapred-site.xml.template if the file does not exist):
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
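On Hadoop 3.x, MapReduce jobs may fail to find their classes unless the application classpath is also configured; the official single-node setup guide adds a property along these lines (verify against the documentation for your exact version):

```xml
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```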
- Format HDFS: Run the following command from the Hadoop installation directory to format the NameNode's storage directory (do this only once; reformatting wipes the HDFS metadata):
$ bin/hdfs namenode -format
- Start the Hadoop cluster: Execute the following commands from the Hadoop installation directory to start the HDFS and YARN daemons (the start scripts connect over ssh, so passwordless ssh to localhost must be configured):
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
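You can check that the daemons came up with jps, which ships with the JDK. On a healthy pseudo-distributed node you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself):

```shell
# List running JVM processes; falls back to a hint if jps is missing.
jps || echo "jps not found: it ships with the JDK, check JAVA_HOME/bin"
```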
- Verify in the browser: the NameNode web UI is at http://localhost:50070 on Hadoop 2.x (on Hadoop 3.x it moved to http://localhost:9870).
Now that you have successfully set up a pseudo-distributed cluster for Hadoop, you can use Hadoop command line tools or write MapReduce programs to process and analyze data.
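As a quick smoke test of the command line tools, you can create a home directory in HDFS, upload a file, and list it (run from the Hadoop installation directory; the target path is just an example):

```shell
# Create an HDFS home directory, upload a config file, and list it.
bin/hdfs dfs -mkdir -p /user/$USER
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/$USER/
bin/hdfs dfs -ls /user/$USER
```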