Configure the hadoop user to run crontab

1. Add the user to /etc/cron.allow.
vi /etc/cron.allow
hadoop

2. Crontab configuration
crontab -u hadoop -e
0 * * * * shell_script_absolute_path

3. Crontab Restart
/etc/init.d/crond restart

4. Write the shell script (a minimal sketch follows below)
source ~/.bash_profile (loads the PATH settings from the profile, since cron runs with a minimal environment)
hadoop jar ......
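
A minimal sketch of such a script (the jar path, main class, and log file below are placeholders, not values from this setup):

#!/bin/bash
# Cron starts with a near-empty environment, so load PATH/HADOOP settings first.
source ~/.bash_profile

# Hypothetical job invocation; replace the jar and main class with the real ones.
hadoop jar /home/hadoop/jobs/example-job.jar com.example.ExampleJob \
  >> /home/hadoop/logs/example-job.log 2>&1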

5. Crontab Log
cat /var/log/cron

6. Error messages
If crontab fails with messages like the following, the hadoop account has expired:
User account has expired.
You (hadoop) are not allowed to access to (crontab) because of pam configuration.
Fix it by extending the account's expiry date:
chage -E mm/dd/yy {userId}
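
For example (run as root; the expiry date below is only an illustration):

# Check the hadoop account's current expiry settings.
chage -l hadoop

# Push the expiry date out (or remove it entirely with: chage -E -1 hadoop).
chage -E 2099-12-31 hadoop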

[Flume Agent]

vi flume.conf

#########################################################
#### Configure for Flume Agent (Gateway Log Agent)
#########################################################

agentA.sources = gwSource
agentA.channels = gwChannel
agentA.sinks = gwSink

##########################################
#### Configure for Source
##########################################

# For each one of the sources, the type is defined
agentA.sources.gwSource.type = exec
agentA.sources.gwSource.command = tail -Fs 180 /home/yusbha/nginx/logs/access.log
agentA.sources.gwSource.restart = true
agentA.sources.gwSource.restartThrottle=1000
agentA.sources.gwSource.interceptors = i1
agentA.sources.gwSource.interceptors.i1.type = timestamp
agentA.sources.gwSource.channels = gwChannel


##########################################
#### Configure for Channel
##########################################

# Each channel's type is defined.
agentA.channels.gwChannel.type = file
agentA.channels.gwChannel.dataDirs = /home/yusbha/flume/file-channel/data/01
agentA.channels.gwChannel.checkpointDir = /home/yusbha/flume/file-channel/checkpoint/01
agentA.channels.gwChannel.maxFileSize=524288000
agentA.channels.gwChannel.checkpointInterval=10000
agentA.channels.gwChannel.transactionCapacity=1000

##########################################
#### Configure for Sink
##########################################
# Each sink's type must be defined
agentA.sinks.gwSink.type = avro
agentA.sinks.gwSink.hostname = 172.27.106.48
agentA.sinks.gwSink.port = 35853
agentA.sinks.gwSink.channel = gwChannel

Flume Agent start command

bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agentA -Dflume.root.logger=INFO,console
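
To keep the agent running after logout, one option is to start it with nohup and redirect the console output to a file (a sketch; the log file name is arbitrary):

nohup bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agentA \
    -Dflume.root.logger=INFO,console > flume-agentA.log 2>&1 &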

[Flume Collector]

vi flume.conf

#########################################################
#### Configure for Flume Agent (Gateway Log Collector)
#########################################################

collector.sources = collectorSource
collector.channels = collectorChannel
collector.sinks = HDFS

##########################################
#### Configure for Source : 172.27.106.48
##########################################

# For each one of the sources, the type is defined.
collector.sources.collectorSource.type = avro
collector.sources.collectorSource.bind = 0.0.0.0
collector.sources.collectorSource.port = 35853
collector.sources.collectorSource.channels = collectorChannel

##########################################
#### Configure for Channel
##########################################
collector.channels.collectorChannel.type = memory

#collector.channels.collectorChannel.type = file
#collector.channels.collectorChannel.dataDirs = /home/hadoop/flume/file-channel/data/01
#collector.channels.collectorChannel.checkpointDir = /home/hadoop/flume/file-channel/checkpoint/01
#collector.channels.collectorChannel.transactionCapacity = 1000
#collector.channels.collectorChannel.checkpointInterval = 30000
#collector.channels.collectorChannel.maxFileSize = 2146435071
#collector.channels.collectorChannel.minimumRequiredSpace = 524288000
#collector.channels.collectorChannel.keep-alive = 5
#collector.channels.collectorChannel.write-timeout = 10
#collector.channels.collectorChannel.checkpoint-timeout = 600
#collector.channels.collectorChannel.capacity = 500000

##########################################
#### Configure for Sink
##########################################
# Each sink's type must be defined
collector.sinks.HDFS.type = hdfs
collector.sinks.HDFS.hdfs.path = hdfs://name.odp.kt.com/logs
collector.sinks.HDFS.hdfs.filePrefix = %Y%m%d%H%M%S
collector.sinks.HDFS.hdfs.fileType = DataStream
collector.sinks.HDFS.hdfs.fileSuffix = .log
collector.sinks.HDFS.hdfs.inUseSuffix = .work

collector.sinks.HDFS.hdfs.maxOpenFiles = 200
collector.sinks.HDFS.hdfs.rollSize = 0
collector.sinks.HDFS.hdfs.rollInterval = 60
collector.sinks.HDFS.hdfs.rollCount = 0
collector.sinks.HDFS.hdfs.rollTimerPoolSize = 1
collector.sinks.HDFS.hdfs.batchSize = 100
collector.sinks.HDFS.hdfs.threadsPoolSize = 1
collector.sinks.HDFS.hdfs.callTimeout = 60000

collector.sinks.HDFS.hdfs.writeFormat = TEXT
collector.sinks.HDFS.serializer = text
collector.sinks.HDFS.serializer.appendNewline = true
collector.sinks.HDFS.channel = collectorChannel
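
The %Y%m%d%H%M%S escape sequences in hdfs.filePrefix are filled in from the timestamp header that the agent-side timestamp interceptor adds to each event. The same mechanism can partition output by date instead of writing everything into a flat /logs directory, for example (shown commented out; a sketch, not part of the setup above):

# collector.sinks.HDFS.hdfs.path = hdfs://name.odp.kt.com/logs/%Y%m%d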

Flume Collector start command

bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name collector -Dflume.root.logger=INFO,console
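
Once both processes are running, collected log files should appear under /logs in HDFS; a quick check (run as the hadoop user on the name node; the file name is only an example):

hadoop fs -ls /logs
hadoop fs -tail /logs/20130909120000.log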

Hadoop Clustering

[Common]
OS : CentOS 6.4 64-bit
Java : OpenJDK 1.7
Hadoop : 1.2.1

1. Installation servers
- Name Node : 172.27.106.48 (name.odp.kt.com)
- Data Node : 172.27.233.144 (data01.odp.kt.com)

2. Install OpenJDK
- Run as: root
- Target servers: all (name node, data node)
yum -y install java-1.7.0-openjdk*

3. Add the hadoop account
- Run as: root
- Target servers: all (name node, data node)
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop

4. Configure hosts
- Run as: root
- Target servers: all (name node, data node)
vi /etc/hosts (edit the hosts file)
== Append the following lines
172.27.106.48 name.odp.kt.com
172.27.233.144 data01.odp.kt.com

5. Firewall configuration
- Run as: root
- Target servers: all
service iptables stop
chkconfig iptables off
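
Stopping iptables entirely is the simplest option. If the firewall has to stay on, an alternative sketch is to open only the ports that appear in this setup (9000 HDFS, 9001 JobTracker, 50070 NameNode web UI, 35853 Flume avro source); other Hadoop daemon ports, such as the DataNode ports, would also need rules:

iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
iptables -I INPUT -p tcp --dport 35853 -j ACCEPT
service iptables save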

[NameNode Server Setup]
1. Create the data directories
- Run as: hadoop
- Target server: name.odp.kt.com
mkdir $HOME/data
mkdir $HOME/data/name

2. Configure SSH access
- Run as: hadoop
- Target server: name.odp.kt.com
== Generate an SSH key
ssh-keygen -t rsa

== Distribute the SSH key
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@data01.odp.kt.com
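
A quick check that passwordless login works before running the cluster scripts:

ssh hadoop@data01.odp.kt.com hostname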

3. Install Hadoop
- Run as: hadoop
- Target server: name.odp.kt.com
- Hadoop version: 1.1.2 or 1.2.1
tar xvf hadoop-1.x.x.tar.gz

4. Configure the Hadoop environment
- Run as: hadoop
- Target server: name.odp.kt.com
vi hadoop-env.sh

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
 
# export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
export HADOOP_OPTS=-server

vi core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://name.odp.kt.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/data/name,/home/hadoop/data/backup</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data/node01,/home/hadoop/data/node02</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>30</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.web.ugi</name>
    <value>hadoop,supergroup</value>
  </property>
  <property>
    <name>dfs.permissions.supergroup</name>
    <value>supergroup</value>
  </property>
  <property>
    <name>dfs.upgrade.permission</name>
    <value>0777</value>
  </property>
  <property>
    <name>dfs.umaskmode</name>
    <value>022</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>name.odp.kt.com:50070</value>
  </property>
</configuration>

vi mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://name.odp.kt.com:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/data/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/data/mapred/local</value>
  </property>
</configuration>

vi conf/masters <= no changes needed; this file normally lists the secondary name node.

vi conf/slaves
== Append the following line
data01.odp.kt.com

Distribute the Hadoop installation directory
scp -r /home/hadoop/hadoop-1.2.1 data01.odp.kt.com:/home/hadoop/hadoop-1.2.1

Distribute the configuration
rsync -av /home/hadoop/hadoop-1.2.1/conf hadoop@data01.odp.kt.com:/home/hadoop/hadoop-1.2.1

Run Hadoop
== Format the NameNode
./hadoop namenode -format

== Start Hadoop
./start-all.sh

== Check the cluster status
./hadoop dfsadmin -report
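
The NameNode web UI configured above (http://name.odp.kt.com:50070) should also list one live data node. A simple read/write smoke test (file names are arbitrary):

./hadoop fs -put /etc/hosts /hosts-test
./hadoop fs -cat /hosts-test
./hadoop fs -rm /hosts-test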

== Stop Hadoop
./stop-all.sh

[DataNode Server Setup]
- Run as: hadoop
- Target server: data01.odp.kt.com
mkdir $HOME/data
mkdir $HOME/data/node01
mkdir $HOME/data/node02
