java - Map and Reduce with large datasets: how does it work in practice?


I would be thankful for advice:

http://en.wikipedia.org/wiki/mapreduce states: "...a large server farm can use MapReduce to sort a petabyte of data in only a few hours...", "...the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes..."

I do not understand how this works in practice. Say we have a SAN (storage) with 1 petabyte of data. How can that amount of data be distributed efficiently from the "master" to the slaves? That is what I cannot understand. Say the SAN has a 10 Gbit connection to the master, and the master has a 1 Gbit connection to each slave; then we can "spread" at most 10 Gbit at a time. How can petabytes be processed within several hours, if the data first has to be transferred to the "reducer/worker nodes"?
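The arithmetic behind the worry can be made concrete. A minimal back-of-the-envelope sketch (using the decimal definition of a petabyte, 10^15 bytes, and the link speeds named in the question):

```java
// Back-of-the-envelope check: how long would pushing 1 PB through a
// single 10 Gbit/s link take? (Numbers are the ones from the question.)
public class TransferTime {
    // transfer time in seconds for 'bytes' over a link of 'bitsPerSecond'
    static double transferSeconds(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double petabyte = 1e15;   // 1 PB in bytes (decimal definition)
        double tenGbit = 10e9;    // 10 Gbit/s
        double seconds = transferSeconds(petabyte, tenGbit);
        // 800,000 seconds, i.e. roughly 9.3 days -- nowhere near "a few hours"
        System.out.printf("%.0f seconds = %.1f days%n", seconds, seconds / 86400.0);
    }
}
```

So the concern is justified: if all the data had to flow through one master link, the quoted timings would be impossible, which is exactly what the answer below addresses.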

Thanks very much! Jens

Actually, in a full-blown map/reduce framework such as Hadoop, the data storage itself is distributed. Hadoop, for example, has HDFS, a distributed file storage system that provides both redundancy and high performance. The filesystem nodes can double as computing nodes, or can be dedicated storage nodes, depending on how the framework has been deployed.
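The key consequence is that each node can run the map function over only the partition it already stores, so no bulk data moves through the master at all. A toy, in-process sketch of that pattern (plain Java, not the Hadoop API; the partition strings here stand in for blocks stored locally on different nodes):

```java
import java.util.*;

// Toy word-count illustrating map over local partitions + a merge (reduce) step.
public class LocalMapReduce {
    // Map phase: each partition is counted independently
    // (in a real cluster, this runs on the node that stores the partition).
    static Map<String, Integer> mapPartition(String partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : partition.split("\\s+"))
            counts.merge(word, 1, Integer::sum);
        return counts;
    }

    // Reduce phase: merge the small per-partition results.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials)
            partial.forEach((w, c) -> total.merge(w, c, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        // Each string stands for the block of data already stored on one node.
        List<String> partitions = List.of("to be or not", "to be that is", "the question");
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String p : partitions)
            partials.add(mapPartition(p));   // only small count maps move, not raw data
        System.out.println(reduce(partials));
    }
}
```

Only the per-partition result maps (tiny compared to the input) cross the network, which is why the bandwidth of any single link stops being the bottleneck.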

Usually, when computing times are mentioned in this context, it is assumed that the input data already exists in the distributed storage of the cluster. The master node merely feeds the computing nodes the data ranges to process - not the data itself.
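What the master actually ships can be sketched as small split descriptors: a file name, an offset, and a length. A minimal illustration (the `Split` record and block size are invented for this sketch; HDFS's real default block size has varied between releases):

```java
import java.util.*;

// Sketch: the master plans tiny split descriptors and sends those to
// workers; the bytes themselves stay on the distributed filesystem.
public class SplitScheduler {
    // A split descriptor is a few dozen bytes, regardless of the data size.
    record Split(String file, long offset, long length) {}

    // Divide a file of 'fileSize' bytes into fixed-size splits (like HDFS blocks).
    static List<Split> plan(String file, long fileSize, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileSize; off += blockSize)
            splits.add(new Split(file, off, Math.min(blockSize, fileSize - off)));
        return splits;
    }

    public static void main(String[] args) {
        // A 1 GB file with 128 MB blocks yields 8 splits; the master sends
        // these 8 small descriptors, not the gigabyte itself.
        List<Split> splits = plan("/data/input.log", 1L << 30, 128L << 20);
        System.out.println(splits.size() + " splits");
        splits.forEach(System.out::println);
    }
}
```

Scaled up, a petabyte is just millions of such descriptors, and the scheduler can prefer to hand each descriptor to a worker that already holds that block locally.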

