
Simplifying Hadoop Development and Deployment with IBM's MapReduce Tools for Eclipse Plugin

Hadoop training · cdadata

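Using Cygwin to emulate a Linux environment means configuring ssh and authentication, which is quite tedious, although working through that process once does teach you a lot.

IBM's MapReduce Tools for Eclipse plugin greatly simplifies this configuration: you can develop, debug, and deploy as easily as running an ordinary Java class.

Download IBM's MapReduce Tools for Eclipse plugin from http://www.alphaworks.ibm.com/tech/mapreducetools. Once the download finishes, unzip it and copy the folders under its plugins directory into the plugins directory of your Eclipse installation. Start Eclipse, and after a little configuration you can develop, debug, and deploy Hadoop programs.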
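If you would rather script the copy step than do it by hand, here is a minimal sketch in plain Java. The class name PluginInstaller and both directory arguments are my own placeholders, not something the plugin ships with; it simply mirrors everything under the unzipped plugins folder into the Eclipse plugins folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper (not part of the plugin): copies the unzipped
// plugin folders into an Eclipse installation's plugins directory.
class PluginInstaller {

    // Copies every file and sub-folder under sourcePlugins into eclipsePlugins.
    static void copyPlugins(Path sourcePlugins, Path eclipsePlugins) throws IOException {
        try (Stream<Path> paths = Files.walk(sourcePlugins)) {
            // Files.walk visits a directory before its contents,
            // so target folders exist before files are copied into them.
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path relative = sourcePlugins.relativize(source);
                Path target = eclipsePlugins.resolve(relative.toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the unzipped download's plugins folder (assumed location)
        // args[1]: the plugins folder of your Eclipse installation
        copyPlugins(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Restart Eclipse afterwards so that it picks up the new plugin.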

Configuration steps:

Start Eclipse and choose Window -> Preferences; the dialog shown in the figure opens:

[Figure: the Eclipse Preferences dialog]

Set Hadoop Main Directory to the directory where you unpacked the Hadoop release you downloaded, then click "OK" to finish.

Create a new Project and choose MapReduce Project, as shown in the figure:

[Figure: choosing MapReduce Project when creating a new project]

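Continue through the wizard and enter a project name to finish creating the MapReduce Project. You can now start developing Hadoop programs.

For example, I copied the WordCount class that ships with Hadoop over unchanged, apart from the package name.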
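For readers who have not looked at WordCount before, the sketch below shows the idea in plain Java. It is not the Hadoop class and uses none of the Hadoop API; the class and method names are made up for illustration. The map step emits a (word, 1) pair for every whitespace-separated token, and the reduce step sums the pairs for each word.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the WordCount idea; no Hadoop classes involved.
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step over every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(countWords(Arrays.asList("a b a", "b c")));
    }
}
```

In the real job Hadoop runs the map step once per input split and groups the pairs by key before reducing; this sketch does both in one process to keep the logic visible.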

Next, set up the run configuration: choose Run As -> Open Debug Dialog and fill in the Arguments tab:

[Figure: the Arguments tab of the run configuration]

Enter two directories there, the data input directory and the output directory, separated by a space:

G:\hadoop-0.16.4\in G:\hadoop-0.16.4\myout
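The two values arrive in the program as args[0] and args[1] of main. The sketch below is a hypothetical pre-flight check in plain Java (JobArguments is my own name, not a Hadoop class) that you could run before launching a job. It assumes, as is my understanding of Hadoop's behaviour, that a job will not start if the output directory already exists.

```java
import java.io.File;

// Hypothetical helper: validates the "<input dir> <output dir>" pair
// typed into the Arguments tab before a job is launched.
class JobArguments {
    final File input;
    final File output;

    JobArguments(File input, File output) {
        this.input = input;
        this.output = output;
    }

    // Expects exactly two program arguments: input directory, then output directory.
    static JobArguments parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: <input dir> <output dir>");
        }
        return new JobArguments(new File(args[0]), new File(args[1]));
    }

    // Returns a description of the first problem found, or null if the paths look usable.
    String problem() {
        if (!input.isDirectory()) {
            return "input directory not found: " + input;
        }
        if (output.exists()) {
            // Assumption: the job refuses to overwrite an existing output directory.
            return "output directory already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = parse(args).problem();
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

If the check reports an existing output directory, delete it or pick a new name before running again.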

After that you can run it just like an ordinary Java program. The console prints the task execution information, as follows:

08/09/21 22:35:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/21 22:35:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/09/21 22:35:47 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/21 22:35:48 INFO mapred.JobClient: Running job: job_local_1
08/09/21 22:35:48 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:49 INFO mapred.JobClient: map 0% reduce 0%
08/09/21 22:35:50 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/a.txt:0+1957
08/09/21 22:35:50 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/09/21 22:35:50 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0000' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:50 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:50 INFO mapred.JobClient: map 100% reduce 0%
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/b.txt:0+10109
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0001' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0001' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:51 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/c.txt:0+1957
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0002' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0002' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:51 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:51 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/d.txt:0+1987
08/09/21 22:35:51 INFO mapred.TaskRunner: Task 'job_local_1_map_0003' done.
08/09/21 22:35:51 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0003' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:52 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/e.txt:0+1957
08/09/21 22:35:52 INFO mapred.TaskRunner: Task 'job_local_1_map_0004' done.
08/09/21 22:35:52 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0004' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:52 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/f.txt:0+1985
08/09/21 22:35:52 INFO mapred.TaskRunner: Task 'job_local_1_map_0005' done.
08/09/21 22:35:52 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0005' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:52 INFO mapred.MapTask: numReduceTasks: 1
08/09/21 22:35:53 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/g.txt:0+1957
08/09/21 22:35:53 INFO mapred.TaskRunner: Task 'job_local_1_map_0006' done.
08/09/21 22:35:53 INFO mapred.TaskRunner: Saved output of task 'job_local_1_map_0006' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:53 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/b.txt:0+10109
08/09/21 22:35:54 INFO mapred.JobClient: map 28% reduce 0%
08/09/21 22:35:54 INFO mapred.LocalJobRunner: file:/G:/hadoop-0.16.4/in/c.txt:0+1957
08/09/21 22:35:54 INFO mapred.LocalJobRunner: reduce > reduce
08/09/21 22:35:54 INFO mapred.TaskRunner: Task 'reduce_xk6d4v' done.
08/09/21 22:35:54 INFO mapred.TaskRunner: Saved output of task 'reduce_xk6d4v' to file:/G:/hadoop-0.16.4/myout
08/09/21 22:35:55 INFO mapred.JobClient: Job complete: job_local_1
08/09/21 22:35:55 INFO mapred.JobClient: Counters: 9
08/09/21 22:35:55 INFO mapred.JobClient:   Map-Reduce Framework
08/09/21 22:35:55 INFO mapred.JobClient:     Map input records=7
08/09/21 22:35:55 INFO mapred.JobClient:     Map output records=3649
08/09/21 22:35:55 INFO mapred.JobClient:     Map input bytes=21909
08/09/21 22:35:55 INFO mapred.JobClient:     Map output bytes=36511
08/09/21 22:35:55 INFO mapred.JobClient:     Combine input records=3649
08/09/21 22:35:55 INFO mapred.JobClient:     Combine output records=21
08/09/21 22:35:55 INFO mapred.JobClient:     Reduce input groups=7
08/09/21 22:35:55 INFO mapred.JobClient:     Reduce input records=21
08/09/21 22:35:55 INFO mapred.JobClient:     Reduce output records=7
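The counter lines at the end of the log are the quickest way to check a run. In this one, for instance, the combiner shrank 3649 map output records to 21 before the reduce. If you save the console output and want those numbers programmatically, a small parser is enough. The sketch below is plain Java; CounterLogParser is a hypothetical helper of mine, and the regular expression only assumes the "name=value" layout visible in the log above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): pulls "name=value" counters
// out of saved JobClient console lines.
class CounterLogParser {

    // Matches e.g. "... mapred.JobClient:     Map output records=3649".
    private static final Pattern COUNTER =
            Pattern.compile("mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    // Returns counters in the order they appear; non-counter lines are skipped.
    static Map<String, Long> parse(List<String> logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "08/09/21 22:35:55 INFO mapred.JobClient: Counters: 9",
                "08/09/21 22:35:55 INFO mapred.JobClient:     Map output records=3649",
                "08/09/21 22:35:55 INFO mapred.JobClient:     Combine output records=21");
        // prints {Map output records=3649, Combine output records=21}
        System.out.println(parse(sample));
    }
}
```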

This matches the output produced when running under the Cygwin emulation.

With the MapReduce Tools plugin, things really are much more convenient.

 
