Intel Parallel Studio timing inconsistencies
I have code that uses Intel TBB, and I'm running it on a 32-core machine. In the code, I use
parallel_for(blocked_range (2,left_image_width-2, left_image_width /32) ...
to spawn 32 threads that work concurrently. There are no race conditions, and each thread is given the same amount of work. I'm using clock_t to measure how long the program takes. For my image, it takes 19 seconds to complete.
Then I ran the code through Intel Parallel Studio, and it ran in 2 seconds. That is the result I was expecting, but I can't figure out why there's such a large difference between the two. Does clock_t sum the clock cycles on all cores? That doesn't make sense to me. Below is the snippet in question.
clock_t begin = clock();
create_threads_and_do_work();
clock_t end = clock();
double diffticks = end - begin;
double diffms = (diffticks * 1000) / CLOCKS_PER_SEC;
cout << "and time " << diffms << " ms" << endl;

Any advice is appreciated.
It isn't quite clear whether the difference in run time is the result of two different inputs (images) or two different run-time measuring methods (the clock_t difference vs. the Intel software's measurement). Furthermore, you aren't showing what goes on in create_threads_and_do_work(), and you didn't mention which tool within Intel Parallel Studio you are using. Is it VTune?
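Your clock_t difference method measures processor time rather than elapsed time, and what it counts depends on the platform. On POSIX systems such as Linux, clock() reports the CPU time of the whole process, summed over all of its threads, so a run that keeps many cores busy can report far more than the wall-clock time. With Microsoft's C runtime, clock() reports elapsed wall-clock time instead. It also matters whether create_threads_and_do_work() waits for its threads to complete before returning, or spawns them and returns before they finish processing. If all you do in the function is call parallel_for(), it returns only after the work is done, so a wall-clock timer around the call should yield the right result and should be no different from other run-time measurements.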
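To see how far the two kinds of measurement can drift apart, here is a minimal sketch that times the same multithreaded work with both clock() and std::chrono::steady_clock. It uses plain std::thread instead of TBB so it needs nothing beyond the standard library. The names busy_work, Timing and time_parallel_work are made up for illustration and are not from your code, and the busy loop is only a stand-in for your image processing.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-thread image work: a CPU-bound loop.
std::uint64_t busy_work(std::uint64_t iterations) {
    volatile std::uint64_t acc = 0;  // volatile keeps the loop from being optimized away
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc = acc + i;
    }
    return acc;
}

struct Timing {
    double wall_ms;  // elapsed real time from steady_clock
    double cpu_ms;   // whatever clock() reports on this platform
};

// Runs busy_work on num_threads threads, waits for all of them,
// and reports the interval as seen by both clocks.
Timing time_parallel_work(unsigned num_threads, std::uint64_t iterations) {
    assert(num_threads > 0);

    const auto wall_begin = std::chrono::steady_clock::now();
    const std::clock_t cpu_begin = std::clock();

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([iterations] { busy_work(iterations); });
    }
    for (auto& w : workers) {
        w.join();  // timing stops only after every worker has finished
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing result;
    result.wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_begin).count();
    result.cpu_ms =
        1000.0 * static_cast<double>(cpu_end - cpu_begin) / CLOCKS_PER_SEC;
    return result;
}
```

Calling time_parallel_work(32, 100000000) and printing both fields on your machine should show whether cpu_ms comes out as a multiple of wall_ms. If it does, the 19 seconds is most likely summed CPU time, and the steady_clock figure is the one to compare against the Intel tool. The actual ratio depends on the platform and on how many cores are free, so treat this as a diagnostic rather than a benchmark.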
Within Intel Parallel Studio there is a profiling tool called VTune. It is a powerful tool, and when you run your program through it you can view (in a graphically pleasing way) the processing time (as well as the number of times called) of each function in your code. I'm pretty sure that after doing this you'll figure it out.
One last idea: did the program run its full course when using the Intel software? I'm asking because VTune can collect data for some time and then stop without allowing the program to complete.