java - Map and Reduce with large datasets: how does it work in practice?


I would be thankful for advice:

http://en.wikipedia.org/wiki/mapreduce states: "...a large server farm can use MapReduce to sort a petabyte of data in only a few hours...", "...the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes..."

I do not understand how this works in practice. Say we have a SAN (storage) with 1 petabyte of data. How can that amount of data be distributed efficiently from the "master" to the slaves? That is what I cannot understand. Say the SAN has a 10 Gbit connection to the master, and the master has a 1 Gbit connection to each slave; then we can "spread" at most 10 Gbit at a time. How can petabytes be processed within several hours, if the data first has to be transferred to the "reducer/worker nodes"?
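The arithmetic behind the worry can be made concrete. A minimal back-of-the-envelope sketch (using the decimal definition of a petabyte, 10^15 bytes, and the link speeds named in the question):

```java
// Back-of-the-envelope check: how long would pushing 1 PB through a
// single 10 Gbit/s link take? (Numbers are the ones from the question.)
public class TransferTime {
    // transfer time in seconds for 'bytes' over a link of 'bitsPerSecond'
    static double transferSeconds(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double petabyte = 1e15;   // 1 PB in bytes (decimal definition)
        double tenGbit = 10e9;    // 10 Gbit/s
        double seconds = transferSeconds(petabyte, tenGbit);
        // 800,000 seconds, i.e. roughly 9.3 days -- nowhere near "a few hours"
        System.out.printf("%.0f seconds = %.1f days%n", seconds, seconds / 86400.0);
    }
}
```

So the concern is justified: if all the data had to flow through one master link, the quoted timings would be impossible, which is exactly what the answer below addresses.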

Thanks very much! Jens

Actually, in a full-blown map/reduce framework such as Hadoop, the data storage itself is distributed. Hadoop, for example, has HDFS, a distributed file storage system that provides both redundancy and high performance. The filesystem nodes can double as computing nodes, or can be dedicated storage nodes, depending on how the framework has been deployed.
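The key consequence is that each node can run the map function over only the partition it already stores, so no bulk data moves through the master at all. A toy, in-process sketch of that pattern (plain Java, not the Hadoop API; the partition strings here stand in for blocks stored locally on different nodes):

```java
import java.util.*;

// Toy word-count illustrating map over local partitions + a merge (reduce) step.
public class LocalMapReduce {
    // Map phase: each partition is counted independently
    // (in a real cluster, this runs on the node that stores the partition).
    static Map<String, Integer> mapPartition(String partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : partition.split("\\s+"))
            counts.merge(word, 1, Integer::sum);
        return counts;
    }

    // Reduce phase: merge the small per-partition results.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials)
            partial.forEach((w, c) -> total.merge(w, c, Integer::sum));
        return total;
    }

    public static void main(String[] args) {
        // Each string stands for the block of data already stored on one node.
        List<String> partitions = List.of("to be or not", "to be that is", "the question");
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String p : partitions)
            partials.add(mapPartition(p));   // only small count maps move, not raw data
        System.out.println(reduce(partials));
    }
}
```

Only the per-partition result maps (tiny compared to the input) cross the network, which is why the bandwidth of any single link stops being the bottleneck.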

Usually, when computing times are mentioned in this context, it is assumed that the input data already exists in the distributed storage of the cluster. The master node merely feeds the computing nodes the data ranges to process - not the data itself.
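What the master actually ships can be sketched as small split descriptors: a file name, an offset, and a length. A minimal illustration (the `Split` record and block size are invented for this sketch; HDFS's real default block size has varied between releases):

```java
import java.util.*;

// Sketch: the master plans tiny split descriptors and sends those to
// workers; the bytes themselves stay on the distributed filesystem.
public class SplitScheduler {
    // A split descriptor is a few dozen bytes, regardless of the data size.
    record Split(String file, long offset, long length) {}

    // Divide a file of 'fileSize' bytes into fixed-size splits (like HDFS blocks).
    static List<Split> plan(String file, long fileSize, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long off = 0; off < fileSize; off += blockSize)
            splits.add(new Split(file, off, Math.min(blockSize, fileSize - off)));
        return splits;
    }

    public static void main(String[] args) {
        // A 1 GB file with 128 MB blocks yields 8 splits; the master sends
        // these 8 small descriptors, not the gigabyte itself.
        List<Split> splits = plan("/data/input.log", 1L << 30, 128L << 20);
        System.out.println(splits.size() + " splits");
        splits.forEach(System.out::println);
    }
}
```

Scaled up, a petabyte is just millions of such descriptors, and the scheduler can prefer to hand each descriptor to a worker that already holds that block locally.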

