python - Problems in scraping and saving file by eventlet

- June 15, 2014

I can use eventlet to scrape images from a website, but I fail to save them to a local directory. The code is below. Is anyone familiar with I/O operations in the tasklet model? Thanks.

    import pyquery
    import eventlet
    from eventlet.green import urllib2

    #fetch img urls............ works fine
    print "loading page..."
    html = urllib2.urlopen("http://www.meinv86.com/meinv/yuanchuangmeinvzipai/").read()
    print "parsing urls..."
    d = pyquery.PyQuery(html)
    count = 0
    urls = []
    url = ''
    for i in d('img'):
        count = count + 1
        print i.attrib["src"]
        urls.append(i.attrib["src"])

    def fetch(url):
        try:
            print "start fetching %s" % (url)
            urlfile = urllib2.urlopen(url)
            size = int(urlfile.headers['content-length'])
            print 'downloading %s, total file size: %d' % (url, size)
            data = urlfile.read()
            print 'download complete - %s' % (url)

            ##########################################
            #file save won't work
            f = open("/head2/" + url + ".jpg", "wb")
            f.write(body)
            f.close()
            print "file saved"
            ##########################################

            return data
        except:
            print "fail to download..."

    pool = eventlet.GreenPool()
    for body in pool.imap(fetch, urls):
        print "done"

Make sure that the URL is converted to a suitable filename, e.g.:

    import hashlib
    import os

    def url2filename(url, ext=''):
        return hashlib.md5(url).hexdigest() + ext  # the hex digest contains no '\' or '/'

    # ...
    with open(os.path.join("/head2", url2filename(url, '.jpg')), 'wb') as f:
        f.write(body)
    print "file saved"
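The snippet above is written for Python 2, where hashlib.md5() accepts a str directly. As a minimal sketch, assuming Python 3 (where md5 needs bytes), the same idea could look like this. The example URL is hypothetical and only there for illustration.

```python
import hashlib


def url2filename(url, ext=""):
    """Map a URL to a flat, filesystem-safe name (hex digest plus extension)."""
    # md5 needs bytes on Python 3, so encode the URL first.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return digest + ext


# Hypothetical URL, used only for illustration.
name = url2filename("http://example.com/images/photo.jpg", ".jpg")
print(name)
```

The point is that a raw URL contains "/" characters, which open() treats as directory separators, while a hex digest never does. The trade-off is that the original file name is no longer recognizable from the saved name.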

Note: you probably don't want to write files to a top-level directory such as '/head2'.
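One way to act on that note is to write into a directory you own and create it if it is missing. The following is a rough sketch under assumptions: Python 3, a hypothetical helper named save_image, and a temporary directory standing in for a real download folder. It also writes the bytes that were actually downloaded; in the question's fetch() those are held in `data`, while `body` only exists in the loop at the bottom, and the bare `except:` hides the resulting error.

```python
import hashlib
import os
import tempfile


def save_image(data, url, dest_dir):
    """Write downloaded bytes to dest_dir under a name derived from the URL."""
    os.makedirs(dest_dir, exist_ok=True)  # create the directory if it is missing
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".jpg"
    path = os.path.join(dest_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# Illustration only: a temporary directory stands in for a real download
# folder, and the bytes and URL are made up.
dest_dir = os.path.join(tempfile.mkdtemp(), "images")
saved = save_image(b"fake image bytes", "http://example.com/a/b.jpg", dest_dir)
print("file saved to", saved)
```

Inside fetch(), a call such as save_image(data, url, dest_dir) would take the place of the open/write/close lines, with dest_dir pointing at whatever directory suits your setup.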

You could also consider urllib.urlretrieve().
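As a small sketch of that suggestion, assuming Python 3, where the function lives at urllib.request.urlretrieve: it fetches a URL and writes it straight to a file, so no manual read/write step is needed. The helper name download_to is hypothetical, and a local file:// URL is used here only so the example runs without network access. Whether this call cooperates with eventlet's green threads depends on how the standard library is patched, which is not covered here.

```python
import os
import pathlib
import tempfile
from urllib.request import urlretrieve


def download_to(url, dest_path):
    """Fetch url and write it straight to dest_path; return the saved path."""
    saved_path, _headers = urlretrieve(url, dest_path)
    return saved_path


# Illustration only: a local file:// URL stands in for a real image URL.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.bin")
with open(source, "wb") as f:
    f.write(b"example payload")

target = os.path.join(workdir, "copy.bin")
result = download_to(pathlib.Path(source).as_uri(), target)
print("saved to", result)
```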
