Running the Hadoop wordcount example

root@hadoop1:/opt/hadoop# echo "hello hadoop world" > /tmp/test_file1.txt
root@hadoop1:/opt/hadoop# cat /tmp/test_file1.txt
hello hadoop world
root@hadoop1:/opt/hadoop# echo "hello hadoop world,I'm lpxuan" > /tmp/test_file2.txt
root@hadoop1:/opt/hadoop# cat /tmp/test_file2.txt
hello hadoop world,I'm lpxuan

root@hadoop1:/opt/hadoop# bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in

root@hadoop1:/opt/hadoop# bin/hadoop dfs -ls test-in
Found 2 items
-rw-r--r-- 1 root supergroup 0 2011-07-07 14:03 /user/root/test-in/test_file1.txt
-rw-r--r-- 1 root supergroup 0 2011-07-07 14:03 /user/root/test-in/test_file2.txt

root@hadoop1:/opt/hadoop# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount test-in test-out
11/07/07 14:27:40 INFO mapred.FileInputFormat: Total input paths to process : 2
11/07/07 14:27:40 INFO mapred.JobClient: Running job: job_201107071424_0002
11/07/07 14:27:41 INFO mapred.JobClient: map 0% reduce 0%
11/07/07 14:28:07 INFO mapred.JobClient: map 33% reduce 0%
11/07/07 14:28:54 INFO mapred.JobClient: map 66% reduce 0%
11/07/07 14:28:56 INFO mapred.JobClient: map 100% reduce 0%
11/07/07 14:29:06 INFO mapred.JobClient: map 100% reduce 11%
11/07/07 14:29:09 INFO mapred.JobClient: map 100% reduce 100%
11/07/07 14:29:13 INFO mapred.JobClient: Job complete: job_201107071424_0002
11/07/07 14:29:14 INFO mapred.JobClient: Counters: 16
11/07/07 14:29:14 INFO mapred.JobClient:   File Systems
11/07/07 14:29:14 INFO mapred.JobClient:     HDFS bytes read=56
11/07/07 14:29:14 INFO mapred.JobClient:     HDFS bytes written=46
11/07/07 14:29:14 INFO mapred.JobClient:     Local bytes read=97
11/07/07 14:29:14 INFO mapred.JobClient:     Local bytes written=290
11/07/07 14:29:14 INFO mapred.JobClient:   Job Counters
11/07/07 14:29:14 INFO mapred.JobClient:     Launched reduce tasks=1
11/07/07 14:29:14 INFO mapred.JobClient:     Launched map tasks=3
11/07/07 14:29:14 INFO mapred.JobClient:     Data-local map tasks=3
11/07/07 14:29:14 INFO mapred.JobClient:   Map-Reduce Framework
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce input groups=5
11/07/07 14:29:14 INFO mapred.JobClient:     Combine output records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Map input records=2
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce output records=5
11/07/07 14:29:14 INFO mapred.JobClient:     Map output bytes=77
11/07/07 14:29:14 INFO mapred.JobClient:     Map input bytes=49
11/07/07 14:29:14 INFO mapred.JobClient:     Combine input records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Map output records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce input records=7
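
The Map-Reduce Framework counters can be checked by hand against the two input lines. A minimal sketch, assuming each file holds exactly the echoed line plus the trailing newline that echo adds; `sample_input` and `expected_counters` are hypothetical helper names, not part of Hadoop. Note that the 49 input bytes only add up if the apostrophe in I'm is the plain ASCII ' character (a typographic ’ takes three bytes in UTF-8).

```shell
#!/bin/sh
# Hypothetical helper: the contents of test_file1.txt and test_file2.txt,
# one newline-terminated line each, as echo writes them.
sample_input() {
  printf '%s\n' 'hello hadoop world' "hello hadoop world,I'm lpxuan"
}

# Hypothetical helper: recompute the counters wordcount reports for this input.
expected_counters() {
  # One map input record per line of text.
  echo "Map input records=$(sample_input | wc -l | tr -d ' ')"
  # Bytes of both files, newlines included (19 + 30).
  echo "Map input bytes=$(sample_input | wc -c | tr -d ' ')"
  # One (word, 1) pair is emitted per whitespace-separated token.
  echo "Map output records=$(sample_input | wc -w | tr -d ' ')"
  # Each distinct word becomes one reduce group and one output line.
  echo "Reduce input groups=$(sample_input | tr -s ' ' '\n' | sort -u | wc -l | tr -d ' ')"
}

expected_counters
```

It prints Map input records=2, Map input bytes=49, Map output records=7 and Reduce input groups=5, matching the log. Combine output records also stays at 7 because neither file repeats a word on its own, so the combiner has nothing to merge before the reduce.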

root@hadoop1:/opt/hadoop# bin/hadoop dfs -ls test-out
Found 2 items
drwxr-xr-x - root supergroup 0 2011-07-07 14:27 /user/root/test-out/_logs
-rw-r--r-- 1 root supergroup 46 2011-07-07 14:29 /user/root/test-out/part-00000

root@hadoop1:/opt/hadoop# bin/hadoop dfs -cat /user/root/test-out/part-00000
hadoop   2
hello   2
lpxuan   1
world   1
world,I'm   1
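
This output can be reproduced without a cluster. As far as I know, the bundled WordCount example splits each line on whitespace with java.util.StringTokenizer, emits (word, 1) per token and sums the ones per word, with keys sorted by their raw bytes. That is why "world,I'm" is counted as one word, separate from "world". The sketch below approximates this with awk's default whitespace splitting and a byte-order sort; `wordcount_local` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical local stand-in for the wordcount job: split on whitespace,
# count each token, and print "word<TAB>count" in byte order, like part-00000.
wordcount_local() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
    LC_ALL=C sort
}

# Feed it the same two lines that were copied into test-in.
printf '%s\n' 'hello hadoop world' "hello hadoop world,I'm lpxuan" | wordcount_local
```

It prints the same five lines as part-00000. LC_ALL=C matters here: in other locales sort may order the lines differently from Hadoop's byte-wise key order.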

FAQ: an earlier attempt at the same job failed with this error:
root@hadoop1:/opt/hadoop# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount test-in test-out
11/07/07 14:18:12 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

11/07/07 14:18:12 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 4
11/07/07 14:18:12 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

11/07/07 14:18:12 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 3
11/07/07 14:18:13 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

11/07/07 14:18:13 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 2
11/07/07 14:18:15 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

11/07/07 14:18:15 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 1
11/07/07 14:18:18 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

11/07/07 14:18:18 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
11/07/07 14:18:18 WARN hdfs.DFSClient: Could not get block locations. Aborting...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

at org.apache.hadoop.ipc.Client.call(Client.java:696)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
Exception closing file /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)

Almost every Hadoop beginner runs into this error at some point. There are several causes that I know of. The most common is that the namenode has been reformatted, leaving HDFS in an inconsistent state: the datanodes still carry the old namespace ID and refuse to register with the new namenode, so there is no datanode to write the block to. The usual fix is to stop DFS, delete the contents of the DFS directories on every machine, run "hadoop namenode -format" on the master, and then restart DFS. That has consistently fixed the problem for me. It may be serious overkill, but it works.
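
A minimal sketch of that recovery procedure. It assumes (none of this comes from the post) that Hadoop lives in /opt/hadoop, that dfs.name.dir and dfs.data.dir are left at their defaults under hadoop.tmp.dir, and that hadoop.tmp.dir is /home/hadoop/hadoop-root, which is only inferred from the job.jar path in the log. `run` and `reset_hdfs` are hypothetical helpers; they print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# WARNING: with DRY_RUN=0 this deletes everything stored in HDFS.
# Assumed layout (check conf/hadoop-site.xml first):
#   HADOOP_HOME  defaults to /opt/hadoop
#   HADOOP_TMP   hadoop.tmp.dir, guessed from the log as /home/hadoop/hadoop-root
#   dfs.name.dir and dfs.data.dir left at their defaults under $HADOOP_TMP/dfs

# Print a command (the default, a dry run) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Hypothetical helper: stop dfs, clear the dfs directories, format, restart.
# Stops at the first step that fails.
reset_hdfs() {
  home=${HADOOP_HOME:-/opt/hadoop}
  tmp=${HADOOP_TMP:-/home/hadoop/hadoop-root}
  run "$home/bin/stop-dfs.sh" || return
  # Clears the name and data directories on this machine only;
  # repeat this step on every datanode in a multi-node cluster.
  run rm -rf "$tmp/dfs" || return
  run "$home/bin/hadoop" namenode -format || return
  run "$home/bin/start-dfs.sh"
}
```

Calling `reset_hdfs` only prints the four commands; `DRY_RUN=0 reset_hdfs` executes them. Printing by default is a deliberate choice for a step that cannot be undone.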

NOTE: This deletes everything stored in your HDFS file system.
This time, however, it did not solve my problem!

Another possible cause is that the namenode machine is short on free disk space, which is exactly the problem I ran into. Free up some disk space so that Hadoop has room to operate, and you are done.
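
One quick way to check for this is to look at the free space on the filesystem that holds the Hadoop directories. A minimal sketch using POSIX `df -P`; the helper names `free_kb` and `check_space`, the directory and the 1 GiB threshold in the usage line are assumptions, not values from the post or from Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the free space (KiB) of the filesystem holding DIR.
free_kb() {
  # With -P, line 2 is: Filesystem 1024-blocks Used Available Capacity Mounted-on
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: warn and fail when DIR's filesystem has < MIN_KB free.
check_space() {
  avail=$(free_kb "$1")
  # Give up if df could not report on the directory.
  [ -n "$avail" ] || return 2
  if [ "$avail" -lt "$2" ]; then
    echo "low disk space on $1: ${avail} KiB free" >&2
    return 1
  fi
  echo "ok: ${avail} KiB free on $1"
}
```

For example, `check_space /home/hadoop/hadoop-root 1048576` warns when less than 1 GiB is free there. On the cluster side, `bin/hadoop dfsadmin -report` lists each datanode together with its remaining DFS capacity.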

When reposting, please credit: 数据分析 (Data Analysis) » Running the Hadoop wordcount example