Deploying a Hadoop Cluster

Environment

  • Single virtual machine (single-node cluster)
  • Pseudo-Distributed Operation
  • Debian 11
  • Hadoop 3.3
  • Java 11

The main reference is Apache Hadoop 3.3.1 – Hadoop: Setting up a Single Node Cluster.

Compatibility

See the upstream Hadoop/Java version compatibility notes: Hadoop 3.3 supports Java 8 as well as Java 11 (runtime only). We use the latest 3.3 release with Java 11.

Configure the hostname

Make sure /etc/hosts is consistent with the current hostname; hostnames should be unique across nodes.
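
A minimal sketch of /etc/hosts for a node named hadoop-node-0 at 192.168.1.10 (both placeholders); mapping the hostname to the node's real address rather than 127.0.0.1 also avoids the WebHDFS redirect issue covered in the troubleshooting section below.

127.0.0.1      localhost
# Placeholder address and hostname; substitute the node's real IP and name
192.168.1.10   hadoop-node-0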

Install the JRE

apt install openjdk-11-jre

Install basic packages

apt-get install ssh
apt-get install pdsh

Install Hadoop

Download Hadoop

Download the release from Index of /hadoop/common/hadoop-3.3.1 (apache.org), extract the archive, and cd into it.
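
A sketch of the download and unpack steps, assuming the Apache archive mirror and version 3.3.1 (use whichever mirror is closest):

# Mirror and version are assumptions; adjust as needed
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
tar -xzf hadoop-3.3.1.tar.gz
cd hadoop-3.3.1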

Configure Hadoop

Edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
  export JAVA_HOME=/usr
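
Setting JAVA_HOME to /usr works because Hadoop invokes $JAVA_HOME/bin/java and the alternatives system places java under /usr/bin. If you would rather point at the actual JDK directory on Debian, a sketch for finding it:

# Resolve the directory behind /usr/bin/java (managed by the alternatives system)
dirname "$(dirname "$(readlink -f "$(which java)")")"
# Typically prints /usr/lib/jvm/java-11-openjdk-amd64 for openjdk-11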

Try the following command:

bin/hadoop

This will display the usage documentation for the hadoop script.

Pseudo-distributed deployment (for learning)

Use the following:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Format the filesystem

bin/hdfs namenode -format
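
With the default configuration, the NameNode metadata lands under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}; a quick way to confirm the format succeeded, assuming you run as root and kept the defaults, is:

# Assumes default hadoop.tmp.dir and dfs.namenode.name.dir, running as root
ls /tmp/hadoop-root/dfs/name/current
# A successful format leaves VERSION and fsimage_* files here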

Next, start the daemons (the NameNode daemon and the DataNode daemon).

First, configure the user variables: edit both sbin/start-dfs.sh and sbin/stop-dfs.sh and add the following lines:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

To avoid the error pdsh@hadoop-node-0: hadoop-node-0: connect: Connection refused, add the following to ~/.bashrc:

export PDSH_RCMD_TYPE=ssh

Set up the SSH public/private key pair. (Remember to restrict permissions, e.g. chmod 700 ~/.ssh and chmod 600 ~/.ssh/id_rsa, otherwise ssh reports Load key "/root/.ssh/id_rsa": bad permissions.)
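
For passwordless login to localhost, the usual key setup (essentially what the upstream single-node guide suggests; paths assume the root user) is:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys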

Then run:

sbin/start-dfs.sh

Once the services are running, visit the machine's HTTP port 9870 to see the NameNode web UI.

Make the HDFS directories required to execute MapReduce jobs:

  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>
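
To sanity-check the whole pipeline, you can run one of the bundled example jobs, as the upstream guide does; a sketch assuming the 3.3.1 jar name and that the /user/<username> directory above exists:

  # Jar name matches the 3.3.1 release downloaded above; adjust for other versions
  $ bin/hdfs dfs -mkdir input
  $ bin/hdfs dfs -put etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input output 'dfs[a-z.]+'
  $ bin/hdfs dfs -cat output/*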

YARN

YARN is Hadoop's resource management and job scheduling layer.

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

As with HDFS, before starting the daemons, edit sbin/start-yarn.sh and sbin/stop-yarn.sh and add:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Start YARN:

sbin/start-yarn.sh
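
Once YARN is up, the ResourceManager web UI listens on port 8088 by default; a quick local check (just a sketch) is:

# ResourceManager web UI defaults to port 8088
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/cluster
# Expect 200 once YARN is running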

Troubleshooting

Couldn’t upload the file …

The browser's Network tab shows a redirect to the datanode at http://localhost:9864/, which is clearly wrong: it should not be localhost. The fix comes from:

hadoop - webhdfs always redirect to localhost:50075 - Stack Overflow

Well, it’s a /etc/hosts mistake.

The /etc/hosts on datanodes was:

127.0.0.1   localhost datanode-1

Changing it to:

127.0.0.1   datanode-1 localhost

fixes the problem.

CORS issues

The relevant settings go in core-site.xml; see Apache Hadoop 3.3.1 – Authentication for Hadoop HTTP web-consoles.

core-site.xml

    <property>
         <name>hadoop.http.cross-origin.enabled</name>
         <value>true</value>
    </property>

403 errors

https://www.codestudyblog.com/cs2112bgc/1222045205.html

https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Set the following in hdfs-site.xml (the property and its default are documented in hdfs-default.xml); it disables HDFS permission checking, which is acceptable for a throwaway learning cluster:

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

Checks

Run ps aux | grep java; you should see the five related processes (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager).
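
jps gives a cleaner view of the same daemons; note that jps ships with the JDK rather than the JRE installed earlier, so it may require installing e.g. openjdk-11-jdk-headless (package name is an assumption for Debian 11):

  $ jps
  # Expected daemons (PIDs omitted): NameNode, DataNode, SecondaryNameNode,
  # ResourceManager, NodeManager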

Restart everything with sbin/stop-all.sh followed by sbin/start-all.sh.

References

Hadoop 集群的部署(一) - SegmentFault 思否: apart from the official docs, this was the main reference, though some of its paths are wrong.

hadoop 3.0 集群部署,超详细 - Ali0th - 掘金 (juejin.cn): deployment across multiple machines.