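ZooKeeper Cluster Deployment

Download and Extract

Download the tarball directly on the server:

wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Alternatively, upload a copy that was already downloaded on the local Windows machine:

scp "C:\Users\i\Downloads\Compressed\zookeeper-3.4.6.tar.gz" root@ecs-2019211379-0001:~

Then, on the server, move the tarball to /usr/local, extract it, and create a version-independent symlink:

mv zookeeper-3.4.6.tar.gz /usr/local/
cd /usr/local/
tar -zxvf zookeeper-3.4.6.tar.gz
ln -s zookeeper-3.4.6 zookeeper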

Set Environment Variables

Open /etc/profile:

vi /etc/profile

and append:

export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Then reload the profile in the current shell:

. /etc/profile

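Configure the Nodes

cd /usr/local/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

In zoo.cfg, set the data directory:

dataDir=/usr/local/zookeeper/tmp

Vim tip: cc clears the current line without removing the line itself.

Then add one server entry per node:

server.1=ecs-2019211379-0001:2888:3888
server.2=ecs-2019211379-0002:2888:3888
server.3=ecs-2019211379-0003:2888:3888
server.4=ecs-2019211379-0004:2888:3888

Back in the shell, create the data directory and the myid file:

mkdir /usr/local/zookeeper/tmp
touch /usr/local/zookeeper/tmp/myid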

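Copy to the Other Machines

If you copy the versioned directory (zookeeper-3.4.6) instead, you have to create the symlink manually on each target machine afterwards.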

scp -r /usr/local/zookeeper root@ecs-2019211379-0002:/usr/local
scp -r /usr/local/zookeeper root@ecs-2019211379-0003:/usr/local
scp -r /usr/local/zookeeper root@ecs-2019211379-0004:/usr/local
scp /etc/profile root@ecs-2019211379-0002:/etc/profile
scp /etc/profile root@ecs-2019211379-0003:/etc/profile
scp /etc/profile root@ecs-2019211379-0004:/etc/profile
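
The six scp commands above follow the same pattern for every host, so they can also be generated in a loop. This is only a sketch: the copy_commands helper and the NODES variable are hypothetical names introduced here, and the function prints the commands instead of running them so that they can be reviewed first.

```shell
# Hypothetical helper (not part of the original notes): print the scp
# commands for every node other than ecs-2019211379-0001.
NODES="ecs-2019211379-0002 ecs-2019211379-0003 ecs-2019211379-0004"

copy_commands() {
    for host in $NODES; do
        # Copy the ZooKeeper directory (including conf/zoo.cfg and tmp/) ...
        echo "scp -r /usr/local/zookeeper root@$host:/usr/local"
        # ... and the profile that defines ZOOKEEPER_HOME.
        echo "scp /etc/profile root@$host:/etc/profile"
    done
}

copy_commands
```

Once the printed commands look right, they can be executed with copy_commands | sh.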

On each machine, run source /etc/profile.

On machine number $i, write $i into myid:

echo $i > /usr/local/zookeeper/tmp/myid

That is, on ecs-2019211379-0001:

mkdir -p /usr/local/zookeeper/tmp
echo 1 > /usr/local/zookeeper/tmp/myid

On ecs-2019211379-0002:

mkdir -p /usr/local/zookeeper/tmp
echo 2 > /usr/local/zookeeper/tmp/myid

On ecs-2019211379-0003:

mkdir -p /usr/local/zookeeper/tmp
echo 3 > /usr/local/zookeeper/tmp/myid

On ecs-2019211379-0004:

mkdir -p /usr/local/zookeeper/tmp
echo 4 > /usr/local/zookeeper/tmp/myid
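
Because the hostnames in this cluster end in the node number, the id can also be derived from the hostname instead of being typed by hand. The sketch below assumes hostnames of the form ecs-2019211379-000N; myid_for_host is a hypothetical helper name, not a ZooKeeper tool.

```shell
# Hypothetical helper: derive the ZooKeeper id from a hostname whose last
# dash-separated field is a zero-padded node number (e.g. ...-0003).
myid_for_host() {
    suffix=${1##*-}                                      # text after the last '-'
    id=$(printf '%s' "$suffix" | sed 's/^0*//')          # strip leading zeros
    echo "${id:-0}"                                      # "0000" would become 0
}

myid_for_host ecs-2019211379-0003

# On a real node (assumption: `hostname` returns the ecs-... name):
#   myid_for_host "$(hostname)" > /usr/local/zookeeper/tmp/myid
```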

Start

Start ZooKeeper on each machine:

cd /usr/local/zookeeper/bin
./zkServer.sh start

Check the status:

./zkServer.sh status

The cluster is healthy if one node reports Mode: leader and the other three report Mode: follower.

To restart:

./zkServer.sh restart
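
The "one leader, three followers" check can be automated once the status output of all four nodes has been collected. The following is a sketch under assumptions: ensemble_ok is a hypothetical helper, and it only relies on the "Mode: ..." line that zkServer.sh status prints.

```shell
# Hypothetical helper: read the concatenated `zkServer.sh status` output of
# all nodes on stdin; succeed only with exactly 1 leader and 3 followers.
ensemble_ok() {
    awk '/^Mode: leader/   { leaders++ }
         /^Mode: follower/ { followers++ }
         END { exit !(leaders == 1 && followers == 3) }'
}

# Example with canned output from four nodes:
if printf 'Mode: follower\nMode: leader\nMode: follower\nMode: follower\n' | ensemble_ok; then
    echo "ensemble looks healthy"
fi

# On the real cluster (assumption: passwordless ssh as root is set up):
#   for h in ecs-2019211379-0001 ecs-2019211379-0002 ecs-2019211379-0003 ecs-2019211379-0004; do
#       ssh root@$h /usr/local/zookeeper/bin/zkServer.sh status
#   done | ensemble_ok
```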

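Problems Encountered

  • Mode: standalone. The server entries in zoo.cfg are wrong. Check that the entries are correct and that this machine's id (myid) is correct.

  • Error contacting service. It is probably not running. Start the server in the foreground with ./zkServer.sh start-foreground to see the output: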

    ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.net.BindException: Address already in use
    

    So stop the running instance with ./zkServer.sh stop and start again, which shows:

    Cannot open channel to 3 at election address ecs-2019211379-0003/192.168.0.203:3888
    java.net.ConnectException: Connection refused (Connection refused)
    

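    This shows that name resolution works (the hostname resolved to 192.168.0.203), but the connection itself is refused: the nodes cannot reach each other's election port. The fix is to change, on each machine, the address in its own server entry to 0.0.0.0 so that it listens on all network interfaces:

    upgit_20220420_1650438083.png

Expected state: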
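
Since the screenshot is not reproduced here, the resulting server list can be sketched as follows. This assumes the four hostnames and ports from the configuration above; server_lines is a hypothetical helper that prints the entries for a given node id, with that node's own entry replaced by 0.0.0.0.

```shell
# Hypothetical helper: print the zoo.cfg server entries for the node whose
# id is $1, using 0.0.0.0 for the node's own entry.
server_lines() {
    self=$1
    for i in 1 2 3 4; do
        if [ "$i" -eq "$self" ]; then
            echo "server.$i=0.0.0.0:2888:3888"
        else
            echo "server.$i=ecs-2019211379-000$i:2888:3888"
        fi
    done
}

# Entries for node 3:
server_lines 3
```

Each node therefore ends up with a slightly different zoo.cfg: only its own entry uses 0.0.0.0, while the entries for the other nodes keep their hostnames.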

upgit_20220420_1650438188.png

upgit_20220420_1650438248.png

HBase Deployment

Download

wget https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz

Alternatively, upload from the local Windows machine:

scp "C:\Users\i\Downloads\Compressed\hbase-2.0.2-bin.tar.gz" root@ecs-2019211379-0001:/usr/local

Copy to the Other Nodes

From ecs-2019211379-0001:

scp /usr/local/hbase-2.0.2-bin.tar.gz root@ecs-2019211379-0002:/usr/local
scp /usr/local/hbase-2.0.2-bin.tar.gz root@ecs-2019211379-0003:/usr/local
scp /usr/local/hbase-2.0.2-bin.tar.gz root@ecs-2019211379-0004:/usr/local

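Extract

cd /usr/local
tar -zxvf hbase-2.0.2-bin.tar.gz
ln -s hbase-2.0.2 hbase

Set Environment Variables

Open /etc/profile:

vim /etc/profile

and append:

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin:$HBASE_HOME/sbin

Reload the profile, then edit hbase-env.sh:

source /etc/profile
cd $HBASE_HOME/conf
vim hbase-env.sh

Set the following in hbase-env.sh (the JAVA_HOME shown here is only an example path):

export JAVA_HOME=/usr/local/jdk8u252-b09
export HBASE_MANAGES_ZK=false
export HBASE_LIBRARY_PATH=/usr/local/hadoop/lib/native

To find the Java path on your own machine: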

update-java-alternatives -l

Output:

java-1.8.0-openjdk-amd64       1081       /usr/lib/jvm/java-1.8.0-openjdk-amd64

So on this machine the hbase-env.sh settings are:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HBASE_MANAGES_ZK=false
export HBASE_LIBRARY_PATH=/usr/local/hadoop/lib/native
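
On systems without update-java-alternatives, JAVA_HOME can usually be derived from the resolved path of the java binary. The sketch below is an assumption-laden alternative, not the method used above: java_home_from_binary is a hypothetical helper that simply strips the trailing /bin/java (and /jre, if present) from a path.

```shell
# Hypothetical helper: turn the full path of a java binary into a JAVA_HOME.
java_home_from_binary() {
    path=${1%/bin/java}     # drop the trailing /bin/java
    echo "${path%/jre}"     # drop a trailing /jre (JDK 8 layouts)
}

# Example with the JDK 8 layout used on this machine:
java_home_from_binary /usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin/java

# On a real machine:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```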

Configure hbase-site.xml

vim hbase-site.xml

Insert the following configuration:

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ecs-2019211379-0001:8020/HBase</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/hbase/tmp</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ecs-2019211379-0002:2181,ecs-2019211379-0003:2181,ecs-2019211379-0004:2181</value>
    </property>
</configuration>

Vim tip: run :set paste before pasting to keep the indentation from being mangled, then :set nopaste afterwards.

Configure regionservers

vim regionservers

Replace the contents with:

ecs-2019211379-0002
ecs-2019211379-0003
ecs-2019211379-0004

Then link Hadoop's hdfs-site.xml into the HBase conf directory:

ln -s /root/modules/hadoop-3.3.2/etc/hadoop/hdfs-site.xml /usr/local/hbase/conf/hdfs-site.xml

Start

Start HBase on node 1:

/usr/local/hbase/bin/start-hbase.sh

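Problems Encountered

  • ecs-2019211379-0002: regionserver running as process 3406. Stop it first. Nothing needs fixing here: start-hbase.sh should not be run on the other nodes.

  • PleaseHoldException: Master is initializing. First clear ZooKeeper's data (use with caution, this deletes everything stored under the data directory):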

    /usr/local/zookeeper/bin/zkServer.sh stop
    rm -rf /usr/local/zookeeper/tmp/version-2/*
    /usr/local/zookeeper/bin/zkServer.sh start
    /usr/local/zookeeper/bin/zkServer.sh status
    

    Then check the log:

    root@ecs-2019211379-0004:/usr/local/hbase/conf# tail ../logs/hbase-root-regionserver-ecs-2019211379-0004.log
    Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Invalid argument: ecs-2019211379-0001/192.168.0.93:16000
            at org.apache.hbase.thirdparty.io.netty.channel.unix.Errors.throwConnectException(Errors.java:107)
            at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.connect(Socket.java:255)
            at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel.doConnect0(AbstractEpollChannel.java:758)
            at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel.doConnect(AbstractEpollChannel.java:743)
            at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.connect(AbstractEpollChannel.java:585)
            ... 15 more
    Caused by: java.net.ConnectException: connect(..) failed: Invalid argument 
            ... 20 more
    2022-04-20 16:05:14,228 WARN  [regionserver/ecs-2019211379-0004:16020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
    2022-04-20 16:05:17,229 INFO  [regionserver/ecs-2019211379-0004:16020] regionserver.HRegionServer: reportForDuty to master=ecs-2019211379-0001,16000,1650441887334 with port=16020, startcode=1650441888289   
    2022-04-20 16:05:17,231 WARN  [regionserver/ecs-2019211379-0004:16020] regionserver.HRegionServer: error telling master we are up
    
    root@ecs-2019211379-0001:~# netstat -nltp | grep 16000
    tcp6       0      0 192.168.0.93:16000      :::*                    LISTEN      4423/java
    

    Looking this up, 16000 is the default port of the HBase Master, so the Master address registered in ZooKeeper is ecs-2019211379-0001/192.168.0.93:16000.

    root@ecs-2019211379-0004:~# telnet ecs-2019211379-0001 16000
    

    telnet connects fine.

    root@ecs-2019211379-0001:~# /usr/local/hbase/bin/stop-hbase.sh
    root@ecs-2019211379-0001:~# /usr/local/hbase/bin/start-hbase.sh
    

    Restarting did not help.

    Check the master log:

    root@ecs-2019211379-0001:~# vi /usr/local/hbase/logs/hbase-root-master-ecs-2019211379-0001.log
    2022-04-20 16:08:19,536 INFO  [Thread-14] client.RpcRetryingCallerImpl: Call exception, tries=19, retries=46, started=209164 ms ago, cancelled=false, msg=Call to ecs-2019211379-0004/192.168.0.101:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: ecs-2019211379-0004/192.168.0.101:16020, details=row 'hbase:namespace' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ecs-2019211379-0004,16020,1650439274835, seqNum=-1
    2022-04-20 16:08:19,629 WARN  [master/ecs-2019211379-0001:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
    

    In other words, the master cannot reach the other nodes. Test it:

    root@ecs-2019211379-0001:~# telnet ecs-2019211379-0004 16020
    Trying 192.168.0.101...
    telnet: Unable to connect to remote host: Connection refused
    

    Confirmed.

    It turns out the RegionServer is listening on a loopback address (127.0.1.1), which other machines cannot reach:

    root@ecs-2019211379-0004:~# netstat -nltp | grep 16020
    tcp6       0      0 127.0.1.1:16020         :::*                    LISTEN      5478/java
    

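    Final fix: edit /etc/hosts (vi /etc/hosts) and comment out the following line:

    upgit_20220420_1650442501.png

    PS: Annoyingly, Huawei Cloud restores this line on every reboot even though I had commented it out.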

    root@ecs-2019211379-0001:~# /usr/local/hbase/bin/stop-hbase.sh
    root@ecs-2019211379-0001:~# /usr/local/hbase/bin/start-hbase.sh
    

    Problem solved.
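
Because the cloud image restores /etc/hosts on reboot, the edit may need to be repeated, so it is convenient to script it. The sketch below rests on an assumption: the screenshot is not reproduced here, so the offending line is taken to be the one that starts with 127.0.1.1, the address seen in the netstat output. comment_out_loopback_alias is a hypothetical helper that prints the modified file to stdout rather than editing in place.

```shell
# Hypothetical helper: print the given hosts file with every active line
# that starts with 127.0.1.1 commented out. The input file is not modified.
comment_out_loopback_alias() {
    sed 's/^[[:space:]]*127\.0\.1\.1[[:space:]]/#&/' "$1"
}

# Demonstration on a temporary copy instead of the real /etc/hosts:
demo=$(mktemp)
printf '127.0.0.1 localhost\n127.0.1.1 ecs-2019211379-0004 ecs-2019211379-0004\n' > "$demo"
comment_out_loopback_alias "$demo"
rm -f "$demo"

# On a real node, after reviewing the output:
#   comment_out_loopback_alias /etc/hosts > /tmp/hosts.new && cp /tmp/hosts.new /etc/hosts
```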

Check

Expected jps output:

upgit_20220420_1650439561.png

On the worker nodes:

upgit_20220420_1650439581.png

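Using HBase in Practice

Basic Concepts

  • Column family: each column family contains multiple columns. The figure below makes this clear: Personal and Work are two column families.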

upgit_20220420_1650440723.png

Common Commands

Create a table with column families:

create 'employee', {NAME => 'Personal', VERSIONS => 1}, {NAME => 'Work', VERSIONS => 1}

On success this prints: Created table employee

A shorter form also works:

create 'employee', 'Personal', 'Work'

Since the table already exists, this prints: ERROR: Table already exists: employee!

hbase shell

Create a table:

create '2019211379_zzj', 'cf1'

Insert data:

put '2019211379_zzj', 'rk001', 'cf1:keyword', 'applicate'
put '2019211379_zzj', 'rk002', 'cf1:keyword', 'Nokia Lumia'
put '2019211379_zzj', 'rk002', 'cf1:keyword', 'iPhone X'

Scan the data:

scan '2019211379_zzj'

upgit_20220420_1650442858.png

Programming Practice

The code is as follows:

upgit_20220420_1650446813.png

upgit_20220420_1650446770.png

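Code Walkthrough

  1. We create a Scan and call addColumn to restrict which column of the column family is read.
  2. speculative.execution controls speculative execution: when a task runs too slowly on one node, a backup copy of it is launched on another node. We turn this feature off here.
  3. We then create a Job named Member Test1. initTableMapperJob ties together the table, the Mapper, and the Job.
  4. Finally, we set the output path to tmpIndexPath and wait for the job to complete.

After packaging, scp the JAR to the server and run: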

hadoop jar MyHBase.jar org.zzj2019211379.hbase.inputSource.Main

Wait for the job to finish:

upgit_20220420_1650445920.png

View the result:

upgit_20220420_1650446675.png

Problems Encountered

  • Application is added to the scheduler and is not yet activated. Queue’s AM resource limit exceeded

    Diagnostics:	[Wed Apr 20 16:59:05 +0800 2022] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:4096, vCores:1>; User AM Resource Limit of the queue = <memory:4096, vCores:1>; Queue AM Resource Usage = <memory:4096, vCores:4>;
    

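    Analysis: not enough memory is available for ApplicationMasters (AMs). The AMs already running in the queue use 4096 MB, which equals the queue's AM limit, so the new 2048 MB AM request cannot be activated.

    Fix: set the following in yarn-site.xml: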

    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.5</value>
    </property>
    

    This allows ApplicationMasters to use up to 50% of the queue's resources (the default is 0.1).
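
The numbers in the diagnostics can be checked with simple arithmetic. The sketch below is a deliberately simplified model, not the scheduler's actual logic: it ignores vCores and other CapacityScheduler rules, am_fits is a hypothetical helper, and the 8192 MB limit in the second call is a made-up value standing in for "a higher limit after raising the percentage".

```shell
# Hypothetical helper: does a new AM request fit under the queue's AM limit?
# Arguments: LIMIT_MB USED_MB REQUEST_MB (memory only, simplified model).
am_fits() {
    limit_mb=$1; used_mb=$2; request_mb=$3
    [ $((used_mb + request_mb)) -le "$limit_mb" ]
}

# Values from the diagnostics: limit 4096, already in use 4096, request 2048.
if am_fits 4096 4096 2048; then echo "activated"; else echo "pending"; fi

# Hypothetical higher limit after raising maximum-am-resource-percent.
if am_fits 8192 4096 2048; then echo "activated"; else echo "pending"; fi
```

With the values from the diagnostics the request stays pending, because 4096 + 2048 exceeds 4096; with the larger hypothetical limit it fits.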