performance - How can I improve the speed of XML parsing in VBA -
i have large xml file needs parsed in vba (excel 2003 & 2007). there upwards of 11,000 'rows' of data in xml file each 'row' having between 10 , 20 'columns'. ends being huge task parse through , grab data (5 - 7 minutes). tried reading xml , placing each 'row' dictionary (key = row number, value = row attributes), takes long.
it taking forever traverse dom. there more efficient way?
dim xmldict sub parsexml(byref rootnode ixmldomnode) dim counter long dim rowlist ixmldomnodelist dim columnlist ixmldomnodelist dim rownode ixmldomnode dim columnnode ixmldomnode counter = 1 set rowlist = rootnode.selectnodes("row") each rownode in rowlist set columnlist = rownode.selectnodes("col") dim nodevalues string each columnnode in columnlist nodevalues = nodevalues & "|" & columnnode.attributes.getnameditem("id").text & ":" & columnnode.text next columnnode xmldict.add counter, nodevalues counter = counter + 1 next rownode end sub
you try using sax instead of dom. sax should faster when doing parsing document , document non-trivial in size. reference sax2 implementation in msxml here
i typically reach straight dom xml parsing in excel sax seems have advantages in situations. short comparison here might explain differences between them.
here's hacked-together example (partially based on this) using debug.print output:
add reference "microsoft xml, v6.0" via tools > references
add code in normal module
option explicit sub main() dim saxreader saxxmlreader60 dim saxhandler contenthandlerimpl set saxreader = new saxxmlreader60 set saxhandler = new contenthandlerimpl set saxreader.contenthandler = saxhandler saxreader.parseurl "file://c:\users\foo\desktop\bar.xml" set saxreader = nothing end sub add class module, call contenthandlerimpl , add following code
option explicit implements ivbsaxcontenthandler private lcounter long private snodevalues string private bgetchars boolean use left-hand drop-down @ top of module choose "ivbsaxcontenthandler" , use right-hand drop-down add stubs each event in turn (from characters startprefixmapping)
add code of stubs follows
explicitly set counter , flag show if want read text data @ time
private sub ivbsaxcontenthandler_startdocument() lcounter = 0 bgetchars = false end sub every time new element starts, check name of element , take appropriate action
private sub ivbsaxcontenthandler_startelement(strnamespaceuri string, strlocalname string, strqname string, byval oattributes msxml2.ivbsaxattributes) select case strlocalname case "row" snodevalues = "" case "col" snodevalues = snodevalues & "|" & oattributes.getvaluefromname(strnamespaceuri, "id") & ":" bgetchars = true case else ' nothing end select end sub check see if interested in text data and, if are, chop off extraneous white space , remove line feeds (this may or may not desirable depending on document trying parse)
private sub ivbsaxcontenthandler_characters(strchars string) if (bgetchars) snodevalues = snodevalues & replace(trim$(strchars), vblf, "") end if end sub if have reached end of col stop reading text values; if have reached end of row print out string of node values
private sub ivbsaxcontenthandler_endelement(strnamespaceuri string, strlocalname string, strqname string) select case strlocalname case "col" bgetchars = false case "row" lcounter = lcounter + 1 debug.print lcounter & " " & snodevalues case else ' nothing end select end sub to make things clearer, here full version of contenthandlerimpl al of stub methods in place:
option explicit implements ivbsaxcontenthandler private lcounter long private snodevalues string private bgetchars boolean private sub ivbsaxcontenthandler_characters(strchars string) if (bgetchars) snodevalues = snodevalues & replace(trim$(strchars), vblf, "") end if end sub private property set ivbsaxcontenthandler_documentlocator(byval rhs msxml2.ivbsaxlocator) end property private sub ivbsaxcontenthandler_enddocument() end sub private sub ivbsaxcontenthandler_endelement(strnamespaceuri string, strlocalname string, strqname string) select case strlocalname case "col" bgetchars = false case "row" lcounter = lcounter + 1 debug.print lcounter & " " & snodevalues case else ' nothing end select end sub private sub ivbsaxcontenthandler_endprefixmapping(strprefix string) end sub private sub ivbsaxcontenthandler_ignorablewhitespace(strchars string) end sub private sub ivbsaxcontenthandler_processinginstruction(strtarget string, strdata string) end sub private sub ivbsaxcontenthandler_skippedentity(strname string) end sub private sub ivbsaxcontenthandler_startdocument() lcounter = 0 bgetchars = false end sub private sub ivbsaxcontenthandler_startelement(strnamespaceuri string, strlocalname string, strqname string, byval oattributes msxml2.ivbsaxattributes) select case strlocalname case "row" snodevalues = "" case "col" snodevalues = snodevalues & "|" & oattributes.getvaluefromname(strnamespaceuri, "id") & ":" bgetchars = true case else ' nothing end select end sub private sub ivbsaxcontenthandler_startprefixmapping(strprefix string, struri string) end sub
Comments
Post a Comment