Programmatically reading the output of a Hadoop MapReduce program
This may be a basic question, but I could not find an answer on Google.
I have a map-reduce job that creates multiple output files in its output directory. My Java application executes the job on a remote Hadoop cluster and, after the job has finished, it needs to read the output programmatically using the org.apache.hadoop.fs.FileSystem API. Is that possible?
The application knows the output directory, but not the names of the output files generated by the map-reduce job. There seems to be no way to programmatically list the contents of a directory in the Hadoop file system API. How can the output files be read?
This seems like such a commonplace scenario that I am sure it has a solution. I must be missing something obvious.
The method you are looking for is called listStatus(Path). It returns all the files inside a Path as a FileStatus array. You can then loop over them, create a Path object for each one, and read it:
FileStatus[] fss = fs.listStatus(new Path("/"));
for (FileStatus status : fss) {
    Path path = status.getPath();
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    IntWritable key = new IntWritable();
    IntWritable value = new IntWritable();
    while (reader.next(key, value)) {
        System.out.println(key.get() + " | " + value.get());
    }
    reader.close();
}

For Hadoop 2.x you can set up the reader like this:
SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
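One practical detail: a MapReduce output directory usually also contains bookkeeping files such as _SUCCESS, which are not SequenceFiles and will break the reader. A minimal sketch of the whole flow that skips them, using the listStatus(Path, PathFilter) overload. The directory path /user/me/job-output, the class name ReadJobOutput, and the isDataFile helper are assumptions for illustration, not part of the original answer, and the IntWritable key/value types must match what your job actually emits:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class ReadJobOutput {

    // Bookkeeping files (_SUCCESS, _logs, .crc checksums) start with '_' or '.';
    // everything else in the output directory is a part-* data file.
    static boolean isDataFile(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder path; point this at your job's actual output directory.
        Path outputDir = new Path("/user/me/job-output");
        FileSystem fs = outputDir.getFileSystem(conf);

        // listStatus with a PathFilter returns only the data files.
        FileStatus[] fss = fs.listStatus(outputDir, new PathFilter() {
            public boolean accept(Path p) {
                return isDataFile(p.getName());
            }
        });

        for (FileStatus status : fss) {
            SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(status.getPath()));
            IntWritable key = new IntWritable();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key.get() + " | " + value.get());
            }
            reader.close();
        }
    }
}
```

If the job used TextOutputFormat instead of SequenceFileOutputFormat, the same listStatus loop applies, but you would read each file with fs.open(path) and a BufferedReader rather than SequenceFile.Reader.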