A Hadoop walkthrough on Windows 10

Summary: Does Hadoop support Windows? Yes, although installing and running it on Linux is recommended. This post walks through running Hadoop on Windows 10.

Category: API | Published: 2020-03-24 18:57:44

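First, download the prebuilt Hadoop binary from the official site: Download -> choose a version -> binary.

* Note: to avoid problems in the later steps, download version 2.9.2.

The archive contains very long paths, which Windows extraction tools handle poorly, so download Git and install Git Bash.

Open Git Bash.

Assume the archive was downloaded to G:/hadoop-2.9.2.tar.gz

The bash commands are then: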

cd /g

tar -zxvf hadoop-2.9.2.tar.gz

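Extraction is complete.

Make a working copy (create the target directory first, because cp does not create it):

mkdir -p deploy
cp -r hadoop-2.9.2/* deploy/

All remaining steps are carried out in g:/deploy.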
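The copy step can also be wrapped in a small Git Bash function so that a missing source directory is caught early. This is only a sketch: the name copy_dist is hypothetical, and it relies on nothing beyond the standard mkdir and cp commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an extracted Hadoop tree into a working directory.
# Usage: copy_dist <source-dir> <target-dir>
copy_dist() {
  local src="$1" dest="$2"
  # Stop early if the extracted directory is not where we expect it.
  [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
  # cp does not create the target directory, so create it first.
  mkdir -p "$dest"
  # "$src"/. copies the contents of src (including dotfiles), not src itself.
  cp -r "$src"/. "$dest"/
}
```

For the layout used here, the call would be: cd /g && copy_dist hadoop-2.9.2 deploy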

The configuration files are in g:\deploy\etc\hadoop

Edit hadoop-env.cmd and append the following lines to the end of the file:

set HADOOP_PREFIX=g:\deploy
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
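If you would rather append these lines from Git Bash than edit the file by hand, something like the following could work. It is a sketch under assumptions: the function name append_hadoop_env and the marker comment are made up for this post, and the lines are written with CRLF endings because that is the usual convention for .cmd files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the four "set" lines to hadoop-env.cmd once.
# Usage: append_hadoop_env <path-to-hadoop-env.cmd> <windows-style-prefix>
append_hadoop_env() {
  local file="$1" prefix="$2"
  # Made-up marker so that a second run does not append the block again.
  local marker='@rem hadoop-windows-walkthrough settings'
  if grep -qF "$marker" "$file" 2>/dev/null; then
    return 0
  fi
  {
    # Leading CRLF guards against a file that lacks a final newline.
    printf '\r\n%s\r\n' "$marker"
    printf 'set HADOOP_PREFIX=%s\r\n' "$prefix"
    # In a printf format, %% prints a literal % and \\ a literal backslash.
    printf 'set HADOOP_CONF_DIR=%%HADOOP_PREFIX%%\\etc\\hadoop\r\n'
    printf 'set YARN_CONF_DIR=%%HADOOP_CONF_DIR%%\r\n'
    printf 'set PATH=%%PATH%%;%%HADOOP_PREFIX%%\\bin\r\n'
  } >> "$file"
}
```

Example call from Git Bash (single quotes keep the backslash intact): append_hadoop_env /g/deploy/etc/hadoop/hadoop-env.cmd 'g:\deploy'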

Edit core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
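To confirm that the edit was saved, the value can be read back from Git Bash. The helper below is a rough sketch: get_property is a hypothetical name, and it uses line-oriented awk matching rather than a real XML parser, so it assumes each <name> and <value> element sits on a single line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
# Usage: get_property <xml-file> <property-name>
get_property() {
  local file="$1" name="$2"
  awk -v want="$name" '
    # Remember the most recent property name seen.
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    # When a value appears, print it if it belongs to the wanted name.
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$file"
}
```

With the configuration shown above, get_property /g/deploy/etc/hadoop/core-site.xml fs.default.name should print hdfs://0.0.0.0:19000. The same helper works for the other *-site.xml files in this post.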

Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Make sure the slaves file contains the line:

localhost

Edit or create mapred-site.xml:

<configuration>

   <property>
     <name>mapreduce.job.user.name</name>
     <value>%USERNAME%</value>
   </property>

   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>

  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>

  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>

</configuration>

Edit or create yarn-site.xml:

<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>

  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>

  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>

  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>

  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>

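* winutils.exe

Download it from GitHub: https://codeload.github.com/cdarlint/winutils/zip/master

Copy the bin directory for your Hadoop version into the deploy directory, merging it with the existing deploy\bin.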
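Before moving on, it may be worth checking from Git Bash that the native files actually landed in deploy\bin. A minimal sketch follows. The function name check_native_bin is hypothetical, and the file list (winutils.exe and hadoop.dll) is an assumption about what the downloaded bin directory provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the Windows native files are present.
# Usage: check_native_bin <bin-dir>
check_native_bin() {
  local bin_dir="$1" missing=0 f
  # Assumed file list; adjust it if your download contains other names.
  for f in winutils.exe hadoop.dll; do
    if [ ! -f "$bin_dir/$f" ]; then
      echo "missing: $bin_dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when every file was found, 1 otherwise.
  return "$missing"
}
```

Example: check_native_bin /g/deploy/bin && echo "native files found"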


* Run cmd as administrator

g:
cd deploy
bin\hdfs namenode -format

The output should include:

Saving image file

Next, in cmd:

sbin\start-dfs.cmd

Create a text file named test.txt in the root of drive G containing the text Hello hadoop

Next, in cmd:

bin\hdfs dfs -put g:\test.txt /

Next, start YARN:

sbin\start-yarn.cmd

Then run the job:

bin\yarn jar g:\deploy\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.9.2.jar wordcount /test.txt /out

* All of the commands above are run from the deploy directory on drive G, i.e. switch to g: and then cd deploy.
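To get a feel for what the wordcount job should produce, the same count can be approximated locally in Git Bash. This is not the Hadoop implementation, only a sketch with standard Unix tools: the function name local_wordcount is hypothetical, and it assumes words are separated by whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate the wordcount example on a local file.
# Prints one "word<TAB>count" line per distinct word.
# Usage: local_wordcount <text-file>
local_wordcount() {
  # Put each whitespace-separated word on its own line, drop empty lines,
  # sort by byte value, count the duplicates, then print word before count.
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For a test.txt that contains Hello hadoop, this prints Hello and hadoop, each followed by a tab and the count 1. Once the job has finished, the real result can be compared with bin\hdfs dfs -cat /out/part-r-00000 (part-r-00000 is the usual name of the first reducer's output file).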


Open http://localhost:8088/cluster/apps to see the application status.

Reference: https://cwiki.apache.org/confluence/display/HADOOP2/Hadoop2OnWindows