performance - How can I improve the speed of XML parsing in VBA -


i have large xml file needs parsed in vba (excel 2003 & 2007). there upwards of 11,000 'rows' of data in xml file each 'row' having between 10 , 20 'columns'. ends being huge task parse through , grab data (5 - 7 minutes). tried reading xml , placing each 'row' dictionary (key = row number, value = row attributes), takes long.

it taking forever traverse dom. there more efficient way?

dim xmldict     sub parsexml(byref rootnode ixmldomnode)         dim counter long         dim rowlist ixmldomnodelist         dim columnlist ixmldomnodelist         dim rownode ixmldomnode         dim columnnode ixmldomnode         counter = 1         set rowlist = rootnode.selectnodes("row")          each rownode in rowlist             set columnlist = rownode.selectnodes("col")             dim nodevalues string             each columnnode in columnlist                 nodevalues = nodevalues & "|" & columnnode.attributes.getnameditem("id").text & ":" & columnnode.text             next columnnode             xmldict.add counter, nodevalues             counter = counter + 1         next rownode     end sub 

you try using sax instead of dom. sax should faster when doing parsing document , document non-trivial in size. reference sax2 implementation in msxml here

i typically reach straight dom xml parsing in excel sax seems have advantages in situations. short comparison here might explain differences between them.

here's hacked-together example (partially based on this) using debug.print output:

add reference "microsoft xml, v6.0" via tools > references

add code in normal module

option explicit  sub main()  dim saxreader saxxmlreader60 dim saxhandler contenthandlerimpl  set saxreader = new saxxmlreader60 set saxhandler = new contenthandlerimpl  set saxreader.contenthandler = saxhandler saxreader.parseurl "file://c:\users\foo\desktop\bar.xml"  set saxreader = nothing  end sub 

add class module, call contenthandlerimpl , add following code

option explicit  implements ivbsaxcontenthandler  private lcounter long private snodevalues string private bgetchars boolean 

use left-hand drop-down @ top of module choose "ivbsaxcontenthandler" , use right-hand drop-down add stubs each event in turn (from characters startprefixmapping)

add code of stubs follows

explicitly set counter , flag show if want read text data @ time

private sub ivbsaxcontenthandler_startdocument()  lcounter = 0 bgetchars = false  end sub 

every time new element starts, check name of element , take appropriate action

private sub ivbsaxcontenthandler_startelement(strnamespaceuri string, strlocalname string, strqname string, byval oattributes msxml2.ivbsaxattributes)  select case strlocalname     case "row"         snodevalues = ""     case "col"         snodevalues = snodevalues & "|" & oattributes.getvaluefromname(strnamespaceuri, "id") & ":"         bgetchars = true     case else         ' nothing end select  end sub 

check see if interested in text data and, if are, chop off extraneous white space , remove line feeds (this may or may not desirable depending on document trying parse)

private sub ivbsaxcontenthandler_characters(strchars string)  if (bgetchars)     snodevalues = snodevalues & replace(trim$(strchars), vblf, "") end if  end sub 

if have reached end of col stop reading text values; if have reached end of row print out string of node values

private sub ivbsaxcontenthandler_endelement(strnamespaceuri string, strlocalname string, strqname string)  select case strlocalname     case "col"         bgetchars = false     case "row"         lcounter = lcounter + 1         debug.print lcounter & " " & snodevalues     case else         ' nothing end select  end sub 

to make things clearer, here full version of contenthandlerimpl al of stub methods in place:

option explicit  implements ivbsaxcontenthandler  private lcounter long private snodevalues string private bgetchars boolean  private sub ivbsaxcontenthandler_characters(strchars string)  if (bgetchars)     snodevalues = snodevalues & replace(trim$(strchars), vblf, "") end if  end sub  private property set ivbsaxcontenthandler_documentlocator(byval rhs msxml2.ivbsaxlocator)  end property  private sub ivbsaxcontenthandler_enddocument()  end sub  private sub ivbsaxcontenthandler_endelement(strnamespaceuri string, strlocalname string, strqname string)  select case strlocalname     case "col"         bgetchars = false     case "row"         lcounter = lcounter + 1         debug.print lcounter & " " & snodevalues     case else         ' nothing end select  end sub  private sub ivbsaxcontenthandler_endprefixmapping(strprefix string)  end sub  private sub ivbsaxcontenthandler_ignorablewhitespace(strchars string)  end sub  private sub ivbsaxcontenthandler_processinginstruction(strtarget string, strdata string)  end sub  private sub ivbsaxcontenthandler_skippedentity(strname string)  end sub  private sub ivbsaxcontenthandler_startdocument()  lcounter = 0 bgetchars = false  end sub  private sub ivbsaxcontenthandler_startelement(strnamespaceuri string, strlocalname string, strqname string, byval oattributes msxml2.ivbsaxattributes)  select case strlocalname     case "row"         snodevalues = ""     case "col"         snodevalues = snodevalues & "|" & oattributes.getvaluefromname(strnamespaceuri, "id") & ":"         bgetchars = true     case else         ' nothing end select  end sub  private sub ivbsaxcontenthandler_startprefixmapping(strprefix string, struri string)  end sub 

Comments

Popular posts from this blog

php - What is the difference between $_SERVER['PATH_INFO'] and $_SERVER['ORIG_PATH_INFO']? -

fortran - Function return type mismatch -

queue - mq_receive: message too long -