Installing the Software
To make things convenient for everyone on the cluster, the software used here lives under /opt, with access restricted through a hadoop user group.
Download and install the JDK
This guide uses graalvm-ce-java8-20.3.2, the latest LTS build at the time of writing.
```bash
wget https://github.com/graalvm/graalvm-ce-builds/releases/download/vm-20.3.2/graalvm-ce-java8-linux-amd64-20.3.2.tar.gz
tar -xzf ./graalvm-ce-java8-linux-amd64-20.3.2.tar.gz
mkdir /opt/java
mv ./graalvm-ce-java8-20.3.2 /opt/java/
```
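A quick sanity check that the JDK is usable from its new location (the path matches the install above):

```bash
/opt/java/graalvm-ce-java8-20.3.2/bin/java -version
# Expect output identifying GraalVM CE 20.3.2 on Java 8
```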
Download and install Hadoop
Configure everything on the namenode machine first, then distribute the finished directory to the other nodes with scp.
```bash
wget https://mirror.sjtu.edu.cn/apache/hadoop/core/hadoop-3.3.1/hadoop-3.3.1.tar.gz
tar -xzf ./hadoop-3.3.1.tar.gz
mkdir /opt/hadoop
mv ./hadoop-3.3.1 /opt/hadoop/
groupadd hadoop
usermod -aG hadoop <user_name>
# Grant the hadoop group access through an ACL rule
setfacl -R -m g:hadoop:rwx /opt/hadoop/
```
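To double-check the permission setup, one can inspect the ACL; note that a group added with usermod -aG only takes effect at the next login:

```bash
getfacl /opt/hadoop   # should list group:hadoop:rwx
newgrp hadoop         # or re-login, to pick up the new group in the current session
```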
Configure Hadoop
hadoop-env.sh (under $HADOOP_HOME/etc/hadoop/, same as the XML files below)
```bash
export JAVA_HOME="/opt/java/graalvm-ce-java8-20.3.2"
```
core-site.xml
Note that the hostname used here must be resolvable, so /etc/hosts needs updating on every node (see the example after this config).
```xml
<configuration>
    <!-- Address of the namenode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://131-198:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///opt/hadoop/tmp</value>
    </property>
</configuration>
```
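Since fs.defaultFS points at the host 131-198, every node needs a matching /etc/hosts entry. Judging from the scp commands later in this post, the mapping is presumably:

```
192.168.131.198    131-198
```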
hdfs-site.xml
```xml
<configuration>
    <!-- Number of replicas HDFS keeps for each block -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Where the namenode stores its metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/hadoop/tmp/dfs/name</value>
    </property>
    <!-- Where each datanode stores its block data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///media/moosefs/hadoop/data</value>
    </property>
</configuration>
```
Some scripting work
On the namenode
```bash
mkdir -p /opt/hadoop/tmp/dfs/name
```
On each datanode
```bash
# Pull the distributed files from the namenode
scp -r <user_name>@192.168.131.198:/opt/java /opt/
scp -r <user_name>@192.168.131.198:/opt/hadoop /opt/
mkdir -p /media/moosefs/hadoop/data
groupadd hadoop
usermod -aG hadoop <user_name>
chown -R root:hadoop /opt/hadoop
chown -R root:hadoop /media/moosefs/hadoop
setfacl -R -m g:hadoop:rwx /opt/hadoop
setfacl -R -m g:hadoop:rwx /media/moosefs/hadoop
```
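With root SSH access to the datanodes, a small loop can apply these steps everywhere instead of logging in node by node. This is only a sketch: datanode-setup.sh stands for a file holding the commands above, and dn1/dn2 are placeholder hostnames.

```bash
# Hypothetical: datanode-setup.sh contains the setup commands above;
# dn1 and dn2 are placeholders for the real datanode hostnames
for node in dn1 dn2; do
    scp ./datanode-setup.sh root@"$node":/tmp/
    ssh root@"$node" 'bash /tmp/datanode-setup.sh'
done
```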
Starting the nodes
On the namenode
```bash
export HADOOP_HOME="/opt/hadoop/hadoop-3.3.1"
# "ada" is presumably the cluster name used for this deployment
$HADOOP_HOME/bin/hdfs namenode -format ada
$HADOOP_HOME/bin/hdfs --daemon start namenode
# With passwordless SSH set up, the whole cluster can be started from the namenode:
$HADOOP_HOME/sbin/start-dfs.sh
```
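To confirm the namenode is actually up, jps should show a NameNode process, and dfsadmin gives a cluster-wide summary once the datanodes join:

```bash
jps                                     # should list NameNode (plus SecondaryNameNode after start-dfs.sh)
$HADOOP_HOME/bin/hdfs dfsadmin -report  # capacity, live datanodes, per-node usage
```

The namenode web UI (port 9870 by default in Hadoop 3.x) shows the same information.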
On each datanode
```bash
export HADOOP_HOME="/opt/hadoop/hadoop-3.3.1"
sudo chown -R <user_name>:hadoop /media/moosefs/hadoop/
$HADOOP_HOME/bin/hdfs --daemon start datanode
```
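A quick check that the daemon survived startup; if it dies right away, the log usually names the cause (often the hosts issue mentioned below):

```bash
jps                                                   # should list a DataNode process
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log  # recent startup messages
```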
Remember to update the hosts file, otherwise the daemons fail with a "Name or service not known" error.
Testing HDFS
```bash
cd $HADOOP_HOME
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.1-tests.jar TestDFSIO -write -nrFiles 10 -size 100MB
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.1-tests.jar TestDFSIO -read -nrFiles 10 -size 100MB
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.1-tests.jar TestDFSIO -clean
```
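Besides the job output, TestDFSIO appends its throughput and IO-rate figures to a local results file (TestDFSIO_results.log in the working directory, unless overridden with -resFile):

```bash
cat TestDFSIO_results.log
```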