Programmatically reading the output of a Hadoop MapReduce program
This may be a basic question, but I could not find an answer on Google.
I have a map-reduce job that creates multiple output files in its output directory. My Java application executes the job on a remote Hadoop cluster and, after the job has finished, it needs to read the output programmatically using the org.apache.hadoop.fs.FileSystem API. Is that possible?
The application knows the output directory, but not the names of the output files generated by the map-reduce job. There seems to be no way to programmatically list the contents of a directory in the Hadoop file system API. How can the output files be read?
This seems like such a commonplace scenario that I am sure it has a solution. I must be missing something obvious.
The method you are looking for is called listStatus(Path). It returns all the files inside a Path as a FileStatus array. You can then loop over them, create a Path object for each one, and read it:
FileStatus[] fss = fs.listStatus(new Path("/"));
for (FileStatus status : fss) {
    Path path = status.getPath();
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    IntWritable key = new IntWritable();
    IntWritable value = new IntWritable();
    while (reader.next(key, value)) {
        System.out.println(key.get() + " | " + value.get());
    }
    reader.close();
}

For Hadoop 2.x you can set up the reader like this:
SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
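One practical detail: a MapReduce output directory usually also contains bookkeeping files such as _SUCCESS, which are not SequenceFiles and will break the reader. A minimal sketch of the whole flow that skips them, using the listStatus(Path, PathFilter) overload. The directory path /user/me/job-output, the class name ReadJobOutput, and the isDataFile helper are assumptions for illustration, not part of the original answer, and the IntWritable key/value types must match what your job actually emits:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class ReadJobOutput {

    // Bookkeeping files (_SUCCESS, _logs, .crc checksums) start with '_' or '.';
    // everything else in the output directory is a part-* data file.
    static boolean isDataFile(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder path; point this at your job's actual output directory.
        Path outputDir = new Path("/user/me/job-output");
        FileSystem fs = outputDir.getFileSystem(conf);

        // listStatus with a PathFilter returns only the data files.
        FileStatus[] fss = fs.listStatus(outputDir, new PathFilter() {
            public boolean accept(Path p) {
                return isDataFile(p.getName());
            }
        });

        for (FileStatus status : fss) {
            SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(status.getPath()));
            IntWritable key = new IntWritable();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key.get() + " | " + value.get());
            }
            reader.close();
        }
    }
}
```

If the job used TextOutputFormat instead of SequenceFileOutputFormat, the same listStatus loop applies, but you would read each file with fs.open(path) and a BufferedReader rather than SequenceFile.Reader.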