Installing and Configuring Hadoop

1. Create the hadoop user (do this on every machine)

# useradd -m hadoop

# passwd hadoop

2. Enable passwordless SSH from the master to the slaves

On both the master and the slaves:

# vi /etc/ssh/sshd_config

RSAAuthentication yes

PubkeyAuthentication yes

# /etc/init.d/sshd restart

# su - hadoop

$ ssh-keygen -t rsa
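ssh-keygen will prompt for a key file location and a passphrase; press Enter at each prompt so the key gets an empty passphrase, which is what makes the login passwordless. If you prefer a non-interactive form (standard OpenSSH flags), the equivalent is:

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa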

Copy the master's public key to itself and to the slaves, and copy each slave's public key to the master.

On the master:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e4.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e5.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e6.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e7.ipt.aol.com

On the slaves:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e4.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e5.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e6.ipt.aol.com

$ ssh-copy-id -i ~/.ssh/id_rsa.pub aca712e7.ipt.aol.com
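If the keys were copied correctly, an SSH command from the master to any slave (and vice versa) should now run without a password prompt. A quick check from the master:

$ ssh aca712e5.ipt.aol.com hostname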

3. Download and install the JDK

$ wget http://download.oracle.com/otn-pub/java/jdk/7u3-b04/jdk-7u3-linux-i586.tar.gz

$ tar -xvf jdk-7u3-linux-i586.tar.gz

Sync the jdk1.7.0_03 directory: copy it to the same location on each slave node.

$ scp -r jdk1.7.0_03 aca712e5.ipt.aol.com:~

$ scp -r jdk1.7.0_03 aca712e6.ipt.aol.com:~

$ scp -r jdk1.7.0_03 aca712e7.ipt.aol.com:~

4. Download and install Hadoop (only needed on the master; the directory is synced to the slaves at the end)

$ wget http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

$ tar -xzf hadoop-0.20.2.tar.gz

$ mv hadoop-0.20.2 hadoop

5. Add the Java environment variables

$ vi .bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export JAVA_HOME=/home/hadoop/jdk1.7.0_03
export HADOOP_HOME=/home/hadoop/hadoop/
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

$ source ~/.bashrc

$ java -version
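Run java -version to verify the installation; with the JDK 7u3 tarball used above, it should print something like (the exact build string may vary):

java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)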

Sync the .bashrc file: copy it to the same location on each slave node.

$ scp .bashrc aca712e5.ipt.aol.com:~

$ scp .bashrc aca712e6.ipt.aol.com:~

$ scp .bashrc aca712e7.ipt.aol.com:~

6. Edit the files under conf/

In conf/hadoop-env.sh, set the HADOOP_PID_DIR variable:

export HADOOP_PID_DIR=${HADOOP_HOME}/pids

(You need to create the pids directory yourself, as shown below.)
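Creating the directory on the master is enough at this point, because the whole hadoop directory is synced to the slaves at the end of this step:

$ mkdir -p ${HADOOP_HOME}/pids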

Configure the NameNode's IP and port.

Add fs.default.name to conf/core-site.xml:

-bash-3.2$ cat core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.167.18.228:9000</value>   <!-- the NameNode's public IP address -->
  </property>
</configuration>

Configure the JobTracker's IP and port.

Add mapred.job.tracker to conf/mapred-site.xml:

-bash-3.2$ cat mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>172.167.18.228:9001</value>   <!-- the JobTracker's public IP address -->
  </property>
</configuration>

Configure the data replication factor and the storage directories.

Site configuration: conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>


Add the slave hostnames to conf/slaves:

-bash-3.2$ cat slaves

aca712e5.ipt.aol.com

aca712e6.ipt.aol.com

aca712e7.ipt.aol.com


Add the secondary master (the SecondaryNameNode host) to conf/masters:

-bash-3.2$ cat masters

aca712e4.ipt.aol.com

(Use a real hostname here rather than localhost, so the file can be copied to the other machines unchanged.)

Sync the hadoop directory: copy it to the same location on each slave node.

$ scp -r hadoop aca712e5.ipt.aol.com:~

$ scp -r hadoop aca712e6.ipt.aol.com:~

$ scp -r hadoop aca712e7.ipt.aol.com:~

7. Format HDFS on the master

-bash-3.2$ bin/hadoop namenode -format

Note: running the format a second time will fail. To reformat, first delete the NameNode's name directory and everything under it, and each DataNode's data directory and everything under it, as shown below.
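A minimal sketch of that cleanup, assuming the dfs.name.dir and dfs.data.dir paths configured in step 6:

# on the master (NameNode)
$ rm -rf /home/hadoop/name

# on every slave (DataNode)
$ rm -rf /home/hadoop/data

# then format again
$ bin/hadoop namenode -format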

8. Start HDFS and MapReduce on the master (the log files are generated automatically)

-bash-3.2$ bin/start-all.sh

Note: when you need to restart, stop everything first, and also delete the files under the logs directory on the NameNode and the DataNodes; see the sketch below.
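A restart sequence along those lines, assuming the logs live in the default ${HADOOP_HOME}/logs location:

$ bin/stop-all.sh
$ rm -rf ${HADOOP_HOME}/logs/*   # repeat on each DataNode as well
$ bin/start-all.sh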

Check the cluster status:

[hadoop@aca712e4 bin]$ ./hadoop dfsadmin -report

9. Access HDFS and the JobTracker through the web interface

http://172.167.18.228:50070/dfshealth.jsp

http://172.167.18.228:50030/jobtracker.jsp

10. Put WordCount.java in the master node's home directory

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

// The class is named WordCount to match the WordCount.java file name.
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);   // emit (word, 1) for every token
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));   // total count for this word
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        // summing is associative, so the reducer can also serve as the combiner
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

11. Compile WordCount.java and package the class files into wordcount.jar

$ mkdir wordcount_classes

$ javac -classpath hadoop/hadoop-0.20.2-core.jar -d wordcount_classes WordCount.java

$ jar -cvf wordcount.jar -C wordcount_classes/ .
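To sanity-check the packaging, list the jar contents; with the class above, the listing should include the three compiled classes (alongside the META-INF/ entries):

$ jar -tf wordcount.jar
org/myorg/WordCount.class
org/myorg/WordCount$Map.class
org/myorg/WordCount$Reduce.class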

12. Prepare the input directory

$ ./hadoop fs -mkdir input

$ echo "Hello World Bye World" >> file01

$ echo "Hello Hadoop Bye Hadoop" >> file02

$ ./hadoop fs -put file01 input/

$ ./hadoop fs -put file02 input/

$ ./hadoop fs -ls input/

Found 2 items
-rw-r--r--   2 hadoop supergroup         22 2011-11-28 19:10 /user/hadoop/input/file01
-rw-r--r--   2 hadoop supergroup         24 2011-11-28 19:11 /user/hadoop/input/file02

13. Run WordCount

$ ./hadoop jar /home/hadoop/wordcount.jar org.myorg.WordCount input output

14. Check the results

$ ./hadoop fs -ls output

Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2011-11-28 19:16 /user/hadoop/output/_logs
-rw-r--r--   2 hadoop supergroup         31 2011-11-28 19:16 /user/hadoop/output/part-00000

$ ./hadoop fs -cat output/part-00000

Bye     2

Hadoop  2

Hello   2

World   2
