
Why does QNetworkReply.readAll() return zero bytes?

I am using QtWebKit in PyQt4 to download images through QNetworkReply:

import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import QWebPage

class dxBrowser(QWebPage):
    def __init__(self, url):
        QWebPage.__init__(self)
        self._url = url
        # Use the page's own network manager so every sub-request is reported
        self.manager = self.networkAccessManager()
        self.connect(self.manager, SIGNAL("finished(QNetworkReply *)"), self.onFinished)

    def crawl(self):
        self.mainFrame().load(QUrl(self._url))

    def onFinished(self, networkReply):
        # Called once for each finished request; only PNG replies are handled
        if networkReply.rawHeader('Content-Type') == 'image/png':
            print('find the image')
            l = int(networkReply.rawHeader('Content-Length'))
            print(l)
            byteArray = networkReply.readAll()
            print(byteArray.size())  # prints 0 even though Content-Length is non-zero
            im = QImage.fromData(byteArray)
            if not im.save('test.jpg', 'jpg'):
                print('image save error')


def main():
    app = QApplication(sys.argv)
    url = 'http://www.yiyaows.cn/DrsPath.do?kid=6666686E686E69673334333632303335&username=mylibddrz&spagenum=251&pages=50&fid=7534992&a=95cb07394dbf1d43c1fe61bdf6d4a36d&btime=2011-08-19&etime=2011-09-08&template=bookdsr1&firstdrs=http%3A%2F%2Fbook1.duxiu.com%2FbookDetail.jsp%3FdxNumber%3D000005609810%26d%3DA30222298F3C6715323B5476CB66D650'
    dx = dxBrowser(url)
    dx.crawl()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()

The Content-Length header is non-zero, but byteArray.size() is 0, so I can't save the image. Why? Can anyone help?

EDIT: I may have figured this out. QtWebKit may already have retrieved the content of the QNetworkReply. Since the reply is a QIODevice, its size would be 0 after readAll(). My guess is that QtWebKit, acting as a browser, has already read the data for rendering.
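The guess above can be illustrated without Qt. The sketch below is only an analogy: it uses Python's io.BytesIO as a stand-in for the reply (an assumption made for illustration, not how QNetworkReply is implemented) to show that once one consumer has drained a stream, a second read gets nothing. The helper name drain and the fake payload are hypothetical.

```python
import io


def drain(stream):
    """Read everything that is left in the stream (loosely like readAll())."""
    return stream.read()


# Stand-in for the reply body; the bytes are made up for the example.
reply = io.BytesIO(b"\x89PNG fake image bytes")

first = drain(reply)   # the "browser" reads the body for rendering
second = drain(reply)  # a later slot tries to read the same stream again

print("first read: %d bytes, second read: %d bytes" % (len(first), len(second)))
```

The second read comes back empty because the read position is already at the end, which matches the zero-size QByteArray seen in onFinished.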

Treper asked Jan 21 '26 17:01


1 Answer

> EDIT: Maybe I figured this out. QtWebKit may have retrieved the content of the QNetworkReply, a QIODevice, so its size would be 0 after readAll(). Maybe QtWebKit, as a browser, has read it for rendering, I guess.

Yes, and there is an easy workaround: add a QNetworkDiskCache to the manager (with QNetworkAccessManager.setCache) and retrieve the image from the cache in your onFinished slot.
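A minimal sketch of that workaround, assuming PyQt4 is installed. The helper names attach_disk_cache and read_cached_body and the cache directory argument are hypothetical; setCache, setCacheDirectory, cache(), data() and readAll() are the Qt calls being relied on. The PyQt4 import sits inside the first helper only so that the second one can be read and tried without Qt.

```python
def attach_disk_cache(manager, cache_dir):
    """Give a QNetworkAccessManager an on-disk cache (hypothetical helper)."""
    # Imported here so the rest of this sketch does not need Qt to load.
    from PyQt4.QtNetwork import QNetworkDiskCache

    cache = QNetworkDiskCache(manager)
    cache.setCacheDirectory(cache_dir)
    manager.setCache(cache)
    return cache


def read_cached_body(manager, url):
    """Return the cached body for url, or None if nothing was cached.

    manager is expected to behave like QNetworkAccessManager and url like
    the QUrl returned by networkReply.url().
    """
    cache = manager.cache()
    if cache is None:
        return None
    device = cache.data(url)  # a QIODevice, or None when the URL is not cached
    if device is None:
        return None
    return device.readAll()
```

With the question's class, this would mean calling attach_disk_cache(self.manager, 'cache') in __init__ and replacing networkReply.readAll() in onFinished with read_cached_body(self.manager, networkReply.url()), checking for None before passing the result to QImage.fromData.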

If the website sends "Pragma: no-cache" or a "Cache-Control" header to tell the browser not to save the file to disk, you will have to override the prepare method (and maybe updateMetaData) of QNetworkDiskCache so that it sets the saveToDisk flag before calling the original method(s).
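One way that override might look, as a sketch only. It is written as a small class factory (make_always_save_cache and AlwaysSaveDiskCache are hypothetical names) so that the base class is passed in instead of imported; the intended base is PyQt4.QtNetwork.QNetworkDiskCache, and setSaveToDisk is the QNetworkCacheMetaData setter for the flag mentioned above.

```python
def make_always_save_cache(base_cls):
    """Build a cache subclass that always stores replies on disk.

    base_cls is assumed to be PyQt4.QtNetwork.QNetworkDiskCache.
    """

    class AlwaysSaveDiskCache(base_cls):
        def prepare(self, meta_data):
            # Force the flag on before the original method inspects it,
            # so a "do not store" hint from the server is ignored.
            meta_data.setSaveToDisk(True)
            return base_cls.prepare(self, meta_data)

    return AlwaysSaveDiskCache
```

Usage would be along the lines of AlwaysSaveDiskCache = make_always_save_cache(QNetworkDiskCache), then creating an instance, calling setCacheDirectory on it and passing it to QNetworkAccessManager.setCache. If that is not enough for a given site, updateMetaData could be overridden the same way.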

alexisdm answered Jan 23 '26 06:01


