regex - Extract data from a Google Chrome bookmarks export with PHP
I want to build a database of my Google Chrome bookmarks. The first step is taking the .html file exported from Chrome into PHP and getting the data into variables. I am hoping for PHP code that can run over data like the sample below and extract the URL, add_date, icon, and link text into their own variables.
I know I need to use regex for this. Can anyone help? Thanks, I will add a bounty when time allows.
<a href="http://snipt.net/public/tag/css" add_date="1271801059" icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8/9haaactkleqvq4jxwss2gtursgvzszyasxpss2vhe2wosgilgvhydqzo2iioog+eikciiufntgjuovblwcikarfcsfi7hqflt4qqp10sk11mkbgk3smjsdjdnzj4s+0fb/ztml/3z8596jmkdelxcvytuwxs3+gu7o9dysqzvstvt8kfvnp9ddvbfrz3w3n197dqgaepv2ayeppuj9fdknguzbg68/dzo/hjcm/gl0dcqrs4kro9pznvt+edvudovdwr6lsksdyuefr39nhundp7n2kvnrzti21brf856eo7aloqagul40ihgx3ysqsonxp3znih/avp6yx2lsxwesdrvprfe2fnhqfd8bdsduviqzxq19mcxlawaxporwkkxwxiyqjwxdmzu1i2ytjutxskev2dllsvjmalgxo5ymrqymhe1zpjw6sbalqbsuxziyonzc9upk3qjarsfa7qjoil5ywx/15yqa6vyinc3m0vl2c0bejxukqqch6gu074miioiwjwhh55lipkiopdgpnvzt8un5agskgdrjml74yoowei2iighaa4iwqwd55prc1uo1r26p/yibek3e2kom+5hcgb8adtjsr2cc1oinxqz92anyvwaanngnygrmrdqylmc8cogqdviil5v7nrxg9craxbz17upztuqziojrnuyaqvmznedq0muyl76jg893hdt+y2jj+bumqeanxw5yjxs8d2ioigaqtant0tvf5mr7wu53rsox6zsevz62nqyeemoajduf1nvo4a2bqtlomhdbolxrxuv/figekfrfbm5vlfzffh66tvefgi6ouf0u7pt4a2pz47vffe4thwcqytlck9qy/npnz6vetzyzppp3m7cf6n8k+0vkjxba6xp6d/3poynbmjaed07afs4s+tmmt7gqwf/5fgmaewl1u/qpfaaaaabjru5erkjggg==" >snipt - public - css | share , store code or command snippets.</a> update
I liked user yc's recommendation of using SimpleXML instead of regex:
$s = '<a href="http://snipt.net/public/tag/css" add_date="1271801059" icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8/9haaactkleqvq4jxwss2gtursgvzszyasxpss2vhe2wosgilgvhydqzo2iioog+eikciiufntgjuovblwcikarfcsfi7hqflt4qqp10sk11mkbgk3smjsdjdnzj4s+0fb/ztml/3z8596jmkdelxcvytuwxs3+gu7o9dysqzvstvt8kfvnp9ddvbfrz3w3n197dqgaepv2ayeppuj9fdknguzbg68/dzo/hjcm/gl0dcqrs4kro9pznvt+edvudovdwr6lsksdyuefr39nhundp7n2kvnrzti21brf856eo7aloqagul40ihgx3ysqsonxp3znih/avp6yx2lsxwesdrvprfe2fnhqfd8bdsduviqzxq19mcxlawaxporwkkxwxiyqjwxdmzu1i2ytjutxskev2dllsvjmalgxo5ymrqymhe1zpjw6sbalqbsuxziyonzc9upk3qjarsfa7qjoil5ywx/15yqa6vyinc3m0vl2c0bejxukqqch6gu074miioiwjwhh55lipkiopdgpnvzt8un5agskgdrjml74yoowei2iighaa4iwqwd55prc1uo1r26p/yibek3e2kom+5hcgb8adtjsr2cc1oinxqz92anyvwaanngnygrmrdqylmc8cogqdviil5v7nrxg9craxbz17upztuqziojrnuyaqvmznedq0muyl76jg893hdt+y2jj+bumqeanxw5yjxs8d2ioigaqtant0tvf5mr7wu53rsox6zsevz62nqyeemoajduf1nvo4a2bqtlomhdbolxrxuv/figekfrfbm5vlfzffh66tvefgi6ouf0u7pt4a2pz47vffe4thwcqytlck9qy/npnz6vetzyzppp3m7cf6n8k+0vkjxba6xp6d/3poynbmjaed07afs4s+tmmt7gqwf/5fgmaewl1u/qpfaaaaabjru5erkjggg==" >snipt - public - css | share , store code or command snippets.</a>';
// Parse the single <a> element as XML ($s, not the undefined $s2).
$bookmarks = simplexml_load_string($s);
echo $bookmarks["href"]; //url
echo '<br>';
echo $bookmarks[0]; //name
echo '<br>';
echo $bookmarks['icon']; //icon
echo '<br>';
echo $bookmarks['add_date']; //add_date
However, I have not yet figured out how to make this work for multiple links in an HTML page or string.
I found the PHP DOMDocument class and seem to have it working with this:
$html = '<dt><a href="http://stackapps.com/questions/518/stacktack-a-javascript-widget-you-can-stick-anywhere" add_date="1301274664" icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8/9haaacy0leqvq4jx2ss0jvqrtgv//mnjnxgjctpbcfirgo2rrr06kirdtkeylwuj4gohdbzlfuemmp0n8wizxrimif25zu27hqiakiuhaze/93ni3yky/bwr3o/m43zddfghp14fpeo67hzwcx+onby8ur28s7caywqh0dp6g01lrruqb9lcdrcclamza0doqihnesrxqnmfbkghgetwd4eeoxhkfwwuoc7/lgxh0y2pd2xacn5znl2zbpurasjaghwauhyxwaegoacw7eevhjyapmqbqyo9k0ybkapgnbmqjjioksekorkk2hlcwpjutka2itizueeugqmeztgt3kazhgggsejbbrjjjvz4jpz4jp5yt6bvg2ugxogq97co7g/ea9u5uo5nk5cavvo9ianlg+s10nx/81a3r6ptd7n5zvmmsvbllcznpxvhdnwun9wxijmc0za2smgyvjyvtd1fa3s3j1kkmsemmcgvmgkfjrxoouxe3fmgybbd9pnjhwwf7dzl3odceshxaqi/jsqe6tezqnumnepvga50prene2ufjsxb/v5sjulaxkknyagihjgl8pv8vtajbtutpbh2fnra5mgyyxg3chuu4agfgt8nz5zzmssyiafje+qxpqhgae5/peovhj4z2drtq/6cv/g5tmsyvbsqoiqrtbsoay1olkvonkl34uhi40lvgoahnz78lhi8cwinaohjckgd947w+gelt3dtbykvn+2jjqrlxugdfm8vjjix//6/m1by9om+ltabcs73x4/fybntix0xaavn75nlonev8fqfl5fd/x4v4arzqwgyloddcaaaaasuvork5cyii=">app - stacktack, javascript widget can stick anywhere - stack apps</a> <dt><a href="https://chrome.google.com/extensions/detail/paoeolblihedcagbofkkkecjilmpehmo" add_date="1301275461" 
icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8/9haaadskleqvq4jw2tx2gbbqdgf5e7ngmtloz/xkvtbofwrjwuna6dob9ay4amqvgweprfrytbgepwwrcnfvb8erqn+cbs6xsovawzimmwottjtcz2bdpm69qkjfntxq7j3sv358ueyvzepvh9fc/fj/a/optipph6ymjzmq4pigh2h9o5tztpf33uypwjzc63vz8v9ptwrzj0wvvhcy9edrxumjnxwsuilmuhoj1ipp9vdjqtp+ryb4+tr39xx+vqqwpde9pjy99cog9lgihrjiham0e6rc1he9zglgytgbua1prtab3n8bzxb04ecaad7g8n35z0nad62fa5fpsuw0+d4nqsk6q/tplqb13xol70bwykwqdbper7su3pq0v5rrfg9nzjs9e+x4t2a49syxmlvxsflydde4z08dg+n0poitcbu4f5j0tdrsmx8hbivljzmrgoh13d50ferribdzrriqrb+ftacguto4tb++jv9qo5olis5bg9axsg27jhcrarnhkdsz1zkxikns43go3ynkle4arumeqngavsiyfryb4rrqw5mcxqxufmcxmkp7p8nowsmbgdkmpq+ygnvvkt3aatrzsjtuzqdtdc4i+xmzncpiorasvlfdevejvmg/iwlukgwa46yxkrilk1dk1u1/nmzew3a7qufxgzouem+zpj6tf2qh58lswrz/tqfzsdjwdqa40oio2kqqns1tpmqjonbzf8dm9kqsakinqnof10gooodr55qjge7iy6w1dg5ztqposqsqiawpyt8i3rqoqy5lz3o45g+zdkouwrcms1wtlw8rmcfx2r+bzxyfcntmna0+wolwbws5py8vjhzaboqw9/+9y5aerw+5hf3wpiauxv0qleph+hmaxkeqypogoodmuprt3kky/lr0ol/hab0m2/95bzzufjvqnii6/hekiz/tanqk6aucuwlgp1g1w1gb0mcwmimpx+epkikbbvlbkmfspr8uz6brg9ff0nbanomq55y0i2ejztqncl/kohv/xb7urhl+v3gfna/m+zgcbw3n97ydg8ei93tyx8cvuqkuvuklnz3p11tp0pyfwdfoh6fw+8jwmiafhaa9sakpahtohyfvgfh8p7963yqu4aaaaasuvork5cyii=">stackstalker - google chrome extension gallery</a> <dt><a href="http://stackapps.com/questions/319/phpstack-a-php-wrapper-to-the-se-api" add_date="1301276371" 
icon="data:image/png;base64,ivborw0kggoaaaansuheugaaabaaaaaqcayaaaaf8/9haaacy0leqvq4jx2ss0jvqrtgv//mnjnxgjctpbcfirgo2rrr06kirdtkeylwuj4gohdbzlfuemmp0n8wizxrimif25zu27hqiakiuhaze/93ni3yky/bwr3o/m43zddfghp14fpeo67hzwcx+onby8ur28s7caywqh0dp6g01lrruqb9lcdrcclamza0doqihnesrxqnmfbkghgetwd4eeoxhkfwwuoc7/lgxh0y2pd2xacn5znl2zbpurasjaghwauhyxwaegoacw7eevhjyapmqbqyo9k0ybkapgnbmqjjioksekorkk2hlcwpjutka2itizueeugqmeztgt3kazhgggsejbbrjjjvz4jpz4jp5yt6bvg2ugxogq97co7g/ea9u5uo5nk5cavvo9ianlg+s10nx/81a3r6ptd7n5zvmmsvbllcznpxvhdnwun9wxijmc0za2smgyvjyvtd1fa3s3j1kkmsemmcgvmgkfjrxoouxe3fmgybbd9pnjhwwf7dzl3odceshxaqi/jsqe6tezqnumnepvga50prene2ufjsxb/v5sjulaxkknyagihjgl8pv8vtajbtutpbh2fnra5mgyyxg3chuu4agfgt8nz5zzmssyiafje+qxpqhgae5/peovhj4z2drtq/6cv/g5tmsyvbsqoiqrtbsoay1olkvonkl34uhi40lvgoahnz78lhi8cwinaohjckgd947w+gelt3dtbykvn+2jjqrlxugdfm8vjjix//6/m1by9om+ltabcs73x4/fybntix0xaavn75nlonev8fqfl5fd/x4v4arzqwgyloddcaaaaasuvork5cyii=">library - phpstack - php wrapper se api - stack apps</a> ';
$dom = new DOMDocument;
$dom->loadHTML($html);
// Walk every <a> element and print its link text and attributes.
foreach ($dom->getElementsByTagName('a') as $node) {
    echo 'title = ' . $node->nodeValue . '<br>';
    echo 'url = ' . $node->getAttribute("href") . '<br>';
    echo 'icon = ' . $node->getAttribute("icon") . '<br>';
    echo 'date added = ' . $node->getAttribute("add_date") . '<br>';
    echo '<br>';
}
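Since the goal is to get each value into its own variable rather than print it, the same loop could fill an array instead. This is only a minimal sketch: the function name extractBookmarks, the array keys, and the shortened sample links are made up for illustration, and it assumes add_date is a Unix timestamp in seconds. The libxml_use_internal_errors() calls are there because a bookmarks export is loose HTML that tends to trigger parser warnings.

```php
<?php
// Hypothetical helper (not from the original post): collect every <a> element
// of a bookmarks export into an array of associative arrays.
function extractBookmarks(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $bookmarks[] = [
            'title'    => trim($node->nodeValue),
            'url'      => $node->getAttribute('href'),
            // getAttribute() returns an empty string when the attribute is missing.
            'icon'     => $node->getAttribute('icon'),
            // Assumption: add_date is a Unix timestamp in seconds.
            'add_date' => (int) $node->getAttribute('add_date'),
        ];
    }
    return $bookmarks;
}

// Shortened sample input; the URLs and icon data are placeholders.
$sample = '<dl><dt><a href="http://example.com/a" add_date="1301274664" icon="data:image/png;base64,AAAA">First link</a>'
        . '<dt><a href="http://example.com/b" add_date="1301275461">Second link</a></dl>';

foreach (extractBookmarks($sample) as $bookmark) {
    echo $bookmark['title'], ' | ', $bookmark['url'], ' | ',
         gmdate('Y-m-d', $bookmark['add_date']), PHP_EOL;
}
```

Each array entry can then be bound to an INSERT statement when the rows go into the database.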
Don't use regex: HTML, even when it is produced by Chrome, isn't a regular language.
Use an XML parser such as SimpleXML.
If the string above is $s:
$bookmarks = simplexml_load_string($s);
echo $bookmarks["href"]; //url
echo $bookmarks[0]; //name

object(SimpleXMLElement)#1 (2) {
  ["@attributes"]=>
  array(3) {
    ["href"]=>
    string(31) "http://snipt.net/public/tag/css"
    ["add_date"]=>
    string(10) "1271801059"
    ["icon"]=>
    string(1026) "data:image/png;base64,ivbh....="
  }
  [0]=>
  string(64) "snipt - public - css | share , store code or command snippets."
}
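simplexml_load_string() expects well-formed XML, which a single <a> element is but a page of several <dt><a ...> entries may not be. One possible way to keep the SimpleXML syntax for multiple links is to let DOMDocument parse the HTML first and pass the result to simplexml_import_dom(). The sketch below assumes that approach; the sample links and the $links array are placeholders, not part of the original export.

```php
<?php
// Placeholder input: two bookmark entries with unclosed <dt> tags.
$html = '<dl><dt><a href="http://example.com/a" add_date="1271801059">First link</a>'
      . '<dt><a href="http://example.com/b" add_date="1301275461">Second link</a></dl>';

// Let the forgiving HTML parser build the tree, keeping warnings quiet.
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Convert the DOM tree to SimpleXML and select every <a> element.
$xml = simplexml_import_dom($dom);

$links = [];
foreach ($xml->xpath('//a') as $a) {
    $links[] = [
        'name'     => (string) $a,             // link text
        'href'     => (string) $a['href'],     // attribute access as in the answer
        'add_date' => (string) $a['add_date'],
    ];
}

print_r($links);
```

The casts to string matter here: without them each value stays a SimpleXMLElement object rather than a plain string.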