python - Problems scraping and saving files with eventlet
I can use eventlet to scrape the images from a website, but saving them to a local directory fails. My code is below. Is anyone familiar with I/O operations in the tasklet model? Thanks.
```python
import pyquery
import eventlet
from eventlet.green import urllib2

# fetch img urls ............ works fine
print "loading page..."
html = urllib2.urlopen("http://www.meinv86.com/meinv/yuanchuangmeinvzipai/").read()
print "parsing urls..."
d = pyquery.PyQuery(html)
count = 0
urls = []
url = ''
for i in d('img'):
    count = count + 1
    print i.attrib["src"]
    urls.append(i.attrib["src"])

def fetch(url):
    try:
        print "start fetching %s" % (url)
        urlfile = urllib2.urlopen(url)
        size = int(urlfile.headers['content-length'])
        print 'downloading %s, total file size: %d' % (url, size)
        data = urlfile.read()
        print 'download complete - %s' % (url)
        ##########################################
        # file save won't work
        f = open("/head2/" + url + ".jpg", "wb")
        f.write(data)  # write the bytes downloaded above
        f.close()
        print "file saved"
        ##########################################
        return data
    except:
        print "fail download..."

pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print "done"
```
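Why the save step fails can be seen by building the same path by hand. Since urlopen() succeeds, the img src values are absolute URLs, so "/head2/" + url + ".jpg" puts ':' and '/' into the path. On a POSIX system open() then expects directories named http:, www.example.com and so on under /head2, which don't exist, and the bare except: reports that error only as "fail download...". A minimal sketch; the URL is a hypothetical example, not one from the scraped page.

```python
import posixpath

url = "http://www.example.com/images/a.jpg"  # hypothetical image URL
path = "/head2/" + url + ".jpg"               # the path fetch() passes to open()

print(path)                     # /head2/http://www.example.com/images/a.jpg.jpg
print(posixpath.dirname(path))  # /head2/http://www.example.com/images
```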
Make sure the URL is turned into a suitable filename, e.g.:
```python
import hashlib
import os

def url2filename(url, ext=''):
    # the MD5 hex digest contains no '\' or '/' characters
    return hashlib.md5(url).hexdigest() + ext

# ...
with open(os.path.join("/head2", url2filename(url, '.jpg')), 'wb') as f:
    f.write(data)
print "file saved"
```

Note: you probably don't want to write files to a top-level directory such as '/head2'.
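The same idea in Python 3, as a sketch: hashlib.md5() needs bytes there, so the URL is encoded first. Keeping the extension of the URL's path (falling back to '.jpg') and creating the target directory with os.makedirs() are additions of mine, and save_image and default_ext are hypothetical names, not part of the answer above.

```python
import hashlib
import os
import posixpath
from urllib.parse import urlsplit


def url2filename(url, default_ext=".jpg"):
    """Map a URL to a flat, filesystem-safe filename.

    The MD5 hex digest contains only 0-9 and a-f, so '/', ':' and '?' from
    the URL cannot leak into the path. The extension of the URL's path is
    kept when it has one (e.g. '.png'); otherwise default_ext is used.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    ext = posixpath.splitext(urlsplit(url).path)[1] or default_ext
    return digest + ext


def save_image(data, url, directory):
    """Write downloaded bytes under directory, creating it if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, url2filename(url))
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Hashing keeps names unique per URL and bounded in length; the trade-off is that the saved files no longer show where they came from, so you may want to log the URL-to-filename mapping.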
You could also consider urllib.urlretrieve().
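In Python 3 the function lives at urllib.request.urlretrieve() (documented as a legacy interface). Below is a sketch that combines it with hash-based naming, assuming Python 3. It swaps eventlet's GreenPool for the standard library's ThreadPoolExecutor so it runs without third-party packages; download, download_all and workers are hypothetical names.

```python
import hashlib
import os
import posixpath
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def url2filename(url, default_ext=".jpg"):
    # MD5 of the URL gives a flat name with no '/' or ':' in it
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    ext = posixpath.splitext(urlsplit(url).path)[1] or default_ext
    return digest + ext


def download(url, directory):
    """Fetch one URL straight to disk; return (url, path) or (url, None)."""
    path = os.path.join(directory, url2filename(url))
    try:
        urlretrieve(url, path)  # copies the response body into path
    except (OSError, ValueError) as exc:  # URLError/HTTPError subclass OSError
        # report the real error instead of a generic message
        print("failed to download %s: %s" % (url, exc))
        return url, None
    return url, path


def download_all(urls, directory, workers=8):
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as urls, like GreenPool.imap()
        return list(pool.map(lambda u: download(u, directory), urls))
```

Catching specific exceptions and printing them, rather than using a bare except:, is what would have exposed the bad path in the question's code.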