Hadoop - Install for Windows (installation and setup)

  • Extract the Hadoop archive
  • Add environment variables
  • Verify the installation
  • HDFS configurations
  • YARN configurations
  • Initialize environment variables
  • Format the file system
  • Start HDFS daemons
  • Start YARN daemons
  • Download the configuration files

Extract the Hadoop archive

Run your archive utility as administrator.

[Run as administrator]

Extract the downloaded archive file.

[Extract]

Select the CodeLab folder >> [OK]

Add environment variables

Create a new environment variable:

Variable name:  HADOOP_HOME
Variable value: C:\CodeLab\hadoop-3.1.3

Add the Hadoop bin directory to the PATH:

Variable name:  PATH
Value to add:   %HADOOP_HOME%\bin

Verify the installation

hadoop -version

C:\CodeLab>hadoop -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

C:\CodeLab>

Note that hadoop -version prints the Java version (the flag is forwarded to Java); run hadoop version (without the dash) to print the Hadoop release itself.

HDFS configurations

hadoop-env.cmd

Open hadoop-env.cmd, located in:

C:\CodeLab\hadoop-3.1.3\etc\hadoop

Add the following four lines at the very bottom of the file.

set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin

core-site.xml

Open core-site.xml, located in:

C:\CodeLab\hadoop-3.1.3\etc\hadoop

Add a property element inside the configuration tags. (fs.default.name is the deprecated alias of fs.defaultFS; Hadoop 3.x still accepts it.)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
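All of Hadoop's *-site.xml files share this name/value layout, so the edits in this guide can be sanity-checked programmatically. A minimal sketch (read_site_xml is a hypothetical helper, not part of Hadoop) that parses a site file into a dict:

```python
import xml.etree.ElementTree as ET

def read_site_xml(text: str) -> dict:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name] = value
    return props

# Example: the core-site.xml fragment from this guide.
core_site = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    conf = read_site_xml(core_site)
    print(conf["fs.default.name"])  # hdfs://0.0.0.0:19000
```

The same helper works unchanged on hdfs-site.xml, mapred-site.xml, and yarn-site.xml below.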

 

 

 

Create the data folder

Create a data folder under C:\CodeLab\hadoop-3.1.3.

C:\CodeLab\hadoop-3.1.3

Inside C:\CodeLab\hadoop-3.1.3\data, create a "namenode" folder and a "datanode" folder.

C:\CodeLab\hadoop-3.1.3\data
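The two folders above can also be created from a script instead of Explorer; a minimal sketch using Python's standard library (make_hdfs_dirs is a hypothetical helper; the base path is this guide's install location):

```python
import os

def make_hdfs_dirs(base: str) -> list:
    """Create the data/namenode and data/datanode folders under a Hadoop install."""
    created = []
    for sub in ("namenode", "datanode"):
        path = os.path.join(base, "data", sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created

# Usage (on the Windows machine from this guide):
#   make_hdfs_dirs(r"C:\CodeLab\hadoop-3.1.3")
```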

 

 

 

 

hdfs-site.xml

Open hdfs-site.xml, located in:

C:\CodeLab\hadoop-3.1.3\etc\hadoop

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///C:/CodeLab/hadoop-3.1.3/data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///C:/CodeLab/hadoop-3.1.3/data/datanode</value>
  </property>
</configuration>

The namenode metadata (namespace and logs) and the datanode block files are stored in these directories. (dfs.name.dir and dfs.data.dir are deprecated aliases of dfs.namenode.name.dir and dfs.datanode.data.dir; both forms work in Hadoop 3.x.)

YARN configurations

mapred-site.xml

Open the mapred-site.xml file (in the same etc\hadoop directory).

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>

yarn-site.xml

Open the yarn-site.xml file, located in:

C:\CodeLab\hadoop-3.1.3\etc\hadoop

<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>

  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>

  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>

  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>

  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>

</configuration>

Initialize environment variables

Move to the directory containing hadoop-env.cmd, the script that sets the environment variables:

C:\CodeLab\hadoop-3.1.3\etc\hadoop

C:\CodeLab>cd C:\CodeLab\hadoop-3.1.3\etc\hadoop

C:\CodeLab\hadoop-3.1.3\etc\hadoop>

C:\CodeLab\hadoop-3.1.3\etc\hadoop>dir *.cmd
 Volume in drive C has no label.
 Volume Serial Number is CEC6-6B66
 
 Directory of C:\CodeLab\hadoop-3.1.3\etc\hadoop
 
10/27/2019  08:14 PM             4,154 hadoop-env.cmd
09/12/2019  01:11 PM               951 mapred-env.cmd
09/12/2019  01:06 PM             2,250 yarn-env.cmd
               3 File(s)          7,355 bytes
               0 Dir(s)  147,772,735,488 bytes free
 
C:\CodeLab\hadoop-3.1.3\etc\hadoop>

 

Run hadoop-env.cmd to apply the variables to the current session:

C:\CodeLab\hadoop-3.1.3\etc\hadoop>hadoop-env.cmd

C:\CodeLab\hadoop-3.1.3\etc\hadoop>

Format the file system

C:\CodeLab\hadoop-3.1.3\etc\hadoop>hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
2019-10-27 20:46:46,750 WARN util.Shell: Did not find winutils.exe: {}
java.io.FileNotFoundException: Could not locate Hadoop executable: C:\CodeLab\hadoop-3.1.3\bin\winutils.exe -see https://wiki.apache.org/hadoop/WindowsProblems
        at org.apache.hadoop.util.Shell.getQualifiedBinInner(Shell.java:620)
        at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:593)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:690)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:78)
        at org.apache.hadoop.hdfs.server.common.HdfsServerConstants$RollingUpgradeStartupOption.getAllOptionString(HdfsServerConstants.java:127)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<clinit>(NameNode.java:324)
2019-10-27 20:46:46,996 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = CODEMASTER/xxx.xxx.xxx.xxx
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.1.3
STARTUP_MSG:   classpath = C:\CodeLab\hadoop-3.1.3\etc\hadoop;C:\CodeLab\hadoop-3.1.3\share\hadoop\common;C:\CodeLab\hadoop-3.1.3\share\hadoop\common\lib\accessors-smart-1.2.jar;C:\CodeLab\hadoop-3.1.3\share\hadoop\common\lib\animal-sniffer-annotations-1.17.jar;C:\CodeLab\hadoop-3.1.3\share\hadoop\common\lib\asm-5.0.4.jar;C:\CodeLab\hadoop-3.1.3\share\hadoop\common\lib\audience-annotations-0.5.0.jar;C:\CodeLab\hadoop-3.1.3\share\hadoop\common\lib\avro-1.7
 
... 생략
 
STARTUP_MSG:   build = https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579; compiled by 'ztang' on 2019-09-12T02:47Z
STARTUP_MSG:   java = 1.8.0_65
************************************************************/
2019-10-27 20:46:47,347 INFO namenode.NameNode: createNameNode [-format]
2019-10-27 20:46:47,779 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-2a188cb9-4f77-4c04-ba4c-1307eb61d1d5
2019-10-27 20:46:50,687 INFO namenode.FSEditLog: Edit logging is async:true
2019-10-27 20:46:51,052 INFO namenode.FSNamesystem: KeyProvider: null
2019-10-27 20:46:51,054 INFO namenode.FSNamesystem: fsLock is fair: true
2019-10-27 20:46:51,056 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2019-10-27 20:46:51,158 INFO namenode.FSNamesystem: fsOwner             = codedragon (auth:SIMPLE)
2019-10-27 20:46:51,159 INFO namenode.FSNamesystem: supergroup          = supergroup
2019-10-27 20:46:51,160 INFO namenode.FSNamesystem: isPermissionEnabled = true
2019-10-27 20:46:51,164 INFO namenode.FSNamesystem: HA Enabled: false
2019-10-27 20:46:51,236 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2019-10-27 20:46:51,311 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2019-10-27 20:46:51,311 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2019-10-27 20:46:51,322 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2019-10-27 20:46:51,323 INFO blockmanagement.BlockManager: The block deletion will start around 2019 Oct 27 20:46:51
2019-10-27 20:46:51,340 INFO util.GSet: Computing capacity for map BlocksMap
2019-10-27 20:46:51,340 INFO util.GSet: VM type       = 64-bit
2019-10-27 20:46:51,362 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
2019-10-27 20:46:51,362 INFO util.GSet: capacity      = 2^21 = 2097152 entries
2019-10-27 20:46:51,385 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2019-10-27 20:46:51,394 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2019-10-27 20:46:51,394 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2019-10-27 20:46:51,394 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2019-10-27 20:46:51,395 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2019-10-27 20:46:51,396 INFO blockmanagement.BlockManager: defaultReplication         = 1
2019-10-27 20:46:51,397 INFO blockmanagement.BlockManager: maxReplication             = 512
2019-10-27 20:46:51,397 INFO blockmanagement.BlockManager: minReplication             = 1
2019-10-27 20:46:51,397 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2019-10-27 20:46:51,398 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2019-10-27 20:46:51,398 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2019-10-27 20:46:51,398 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2019-10-27 20:46:51,457 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
2019-10-27 20:46:51,497 INFO util.GSet: Computing capacity for map INodeMap
2019-10-27 20:46:51,497 INFO util.GSet: VM type       = 64-bit
2019-10-27 20:46:51,498 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
2019-10-27 20:46:51,499 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2019-10-27 20:46:51,501 INFO namenode.FSDirectory: ACLs enabled? false
2019-10-27 20:46:51,501 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2019-10-27 20:46:51,502 INFO namenode.FSDirectory: XAttrs enabled? true
2019-10-27 20:46:51,503 INFO namenode.NameNode: Caching file names occurring more than 10 times
2019-10-27 20:46:51,512 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2019-10-27 20:46:51,515 INFO snapshot.SnapshotManager: SkipList is disabled
2019-10-27 20:46:51,542 INFO util.GSet: Computing capacity for map cachedBlocks
2019-10-27 20:46:51,542 INFO util.GSet: VM type       = 64-bit
2019-10-27 20:46:51,543 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
2019-10-27 20:46:51,544 INFO util.GSet: capacity      = 2^18 = 262144 entries
2019-10-27 20:46:51,559 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2019-10-27 20:46:51,559 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2019-10-27 20:46:51,560 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2019-10-27 20:46:51,564 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2019-10-27 20:46:51,564 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2019-10-27 20:46:51,567 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2019-10-27 20:46:51,567 INFO util.GSet: VM type       = 64-bit
2019-10-27 20:46:51,567 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
2019-10-27 20:46:51,567 INFO util.GSet: capacity      = 2^15 = 32768 entries
2019-10-27 20:46:51,639 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2118333937-xxx.xxx.xxx.xxx-1572176811632
2019-10-27 20:46:51,752 INFO common.Storage: Storage directory C:\CodeLab\hadoop-3.1.3\data\namenode has been successfully formatted.
2019-10-27 20:46:51,838 INFO namenode.FSImageFormatProtobuf: Saving image file C:\CodeLab\hadoop-3.1.3\data\namenode\current\fsimage.ckpt_0000000000000000000 using no compression
2019-10-27 20:46:51,993 INFO namenode.FSImageFormatProtobuf: Image file C:\CodeLab\hadoop-3.1.3\data\namenode\current\fsimage.ckpt_0000000000000000000 of size 394 bytes saved in 0 seconds .
2019-10-27 20:46:52,190 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-10-27 20:46:52,230 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2019-10-27 20:46:52,230 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CODEMASTER/xxx.xxx.xxx.xxx
************************************************************/
 
C:\CodeLab\hadoop-3.1.3\etc\hadoop>

The path from your HDFS configuration appears in the output: the storage directory C:\CodeLab\hadoop-3.1.3\data\namenode is reported as successfully formatted.
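The WARN near the top of the format output means winutils.exe (the Hadoop native helper for Windows) was not found in the bin folder; formatting still succeeds, but the daemons generally need it. A minimal sketch (missing_win_helpers is a hypothetical helper) that reports which Windows helper binaries are absent:

```python
import os

def missing_win_helpers(hadoop_home: str) -> list:
    """Return the Windows helper binaries missing from HADOOP_HOME\\bin.

    winutils.exe and hadoop.dll come from a Windows build of Hadoop,
    not from the Apache source tarball.
    """
    bin_dir = os.path.join(hadoop_home, "bin")
    helpers = ["winutils.exe", "hadoop.dll"]
    return [h for h in helpers if not os.path.isfile(os.path.join(bin_dir, h))]

# Usage (on the machine from this guide):
#   missing_win_helpers(r"C:\CodeLab\hadoop-3.1.3")
```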

 

 

 

Start HDFS daemons

C:\CodeLab\hadoop-3.1.3\etc\hadoop>%HADOOP_HOME%\sbin\start-dfs.cmd

C:\CodeLab\hadoop-3.1.3\etc\hadoop>

Two command windows open: one for the namenode and one for the datanode.

When the Windows firewall prompt appears, click [Allow access].

Start YARN daemons

%HADOOP_HOME%\sbin\start-all.cmd

C:\CodeLab\hadoop-3.1.3\etc\hadoop>%HADOOP_HOME%\sbin\start-all.cmd
This script is Deprecated. Instead use start-dfs.cmd and start-yarn.cmd
starting yarn daemons

C:\CodeLab\hadoop-3.1.3\etc\hadoop>

(As the message notes, start-all.cmd is deprecated; start-yarn.cmd starts just the YARN daemons.)

Four processes are now running:

Hadoop Namenode

Hadoop Datanode

YARN Resource Manager

YARN Node Manager
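One way to confirm the four daemons are actually up is the JDK's jps tool, which lists running JVMs by their main-class name. A minimal sketch (missing_daemons is a hypothetical helper) that checks jps output for the expected names:

```python
# The main-class names jps prints for the four Hadoop daemons.
EXPECTED = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output: str) -> set:
    """Given the text printed by `jps`, return the expected daemons not running."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # lines look like "1234 NameNode"
            running.add(parts[1])
    return EXPECTED - running

# Usage:
#   import subprocess
#   out = subprocess.run(["jps"], capture_output=True, text=True).stdout
#   missing_daemons(out)   # empty set when all four daemons are up
```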

 

 

 

Open the Resource Manager

You can check job status through the YARN web UI:

http://localhost:8088
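Beyond the browser, the Resource Manager exposes the same information over its REST API; a minimal sketch (assuming the default port 8088 and a running cluster) that reads the cluster state from /ws/v1/cluster/info:

```python
import json
import urllib.request

def parse_cluster_state(payload: str) -> str:
    """Extract the cluster state from a /ws/v1/cluster/info response body."""
    return json.loads(payload)["clusterInfo"]["state"]

def cluster_state(base_url: str = "http://localhost:8088") -> str:
    """Query the YARN Resource Manager REST API for the cluster state."""
    with urllib.request.urlopen(base_url + "/ws/v1/cluster/info") as resp:
        return parse_cluster_state(resp.read().decode("utf-8"))

# Usage (only meaningful while the YARN daemons are running):
#   cluster_state()   # "STARTED" on a healthy cluster
```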

 

 

 

Download the configuration files

hadoop-configurations.zip
Download