
Comments (7)

yudianguo commented on July 24, 2024

I'm not sure whether this approach would work.

from phpfetcher.

fanfank commented on July 24, 2024

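It should work. Did you run into garbled text when crawling content from tech.qq.com?
If you can give me a concrete failing example, it will be easier for me to fix.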

yudianguo commented on July 24, 2024

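When crawling tech.qq.com, all of the Chinese text turns into garbled characters. Checking with mb_detect_encoding, the actual encoding comes out as EUC-CN. You can try crawling http://tech.qq.com/ yourself; all of the Chinese on it is garbled.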

fanfank commented on July 24, 2024

OK, I'll take a look when I get home tonight.

fanfank commented on July 24, 2024

@yudianguo On my side the detection result is FALSE. Could you try this line in the source: https://github.com/fanfank/phpfetcher/blob/master/Phpfetcher/Page/Default.php#L363

Add var_dump(mb_detect_encoding($this->_strContent)); right after it and see what result you get.

yudianguo commented on July 24, 2024

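I'm now checking with mb_detect_encoding($this->_strContent, array("ASCII","UTF-8","GB2312","GBK")), and the result differs from what I got earlier today: this time the detected encoding is CP936. Maybe the site changed something on their end. I've also read that this way of detecting the encoding isn't recommended because it's inaccurate.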
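One likely reason the answers disagree: EUC-CN is the usual byte encoding of GB2312, and CP936 is Microsoft's code page for GBK, a superset of GB2312. Both names describe the same family of bytes, so which one a list-based detector reports depends mostly on the candidate list and its order, not on the page. The minimal sketch below illustrates this in Python (it is not phpfetcher code, and `detect_by_trial` is a hypothetical helper); it works in the same spirit as passing an ordered list to mb_detect_encoding, returning the first candidate that decodes the bytes without error.

```python
from typing import Optional, Sequence


def detect_by_trial(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate codec that decodes `data` without error.

    Hypothetical helper for illustration: any byte string that is valid in
    several encodings is reported as whichever candidate comes first.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: raises on invalid bytes
        except (UnicodeDecodeError, LookupError):
            continue  # invalid in this encoding, or unknown codec name
        return name
    return None  # no candidate accepted the bytes


if __name__ == "__main__":
    gbk_bytes = "中文".encode("gbk")
    # Same bytes, different "detected" encoding depending on list order.
    print(detect_by_trial(gbk_bytes, ["ascii", "utf-8", "gb2312", "gbk"]))  # gb2312
    print(detect_by_trial(gbk_bytes, ["ascii", "utf-8", "gbk", "gb2312"]))  # gbk
```

Pure-ASCII input is valid in nearly every candidate, so short or mostly-English samples are reported as "ascii" regardless of what the page declares.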

fanfank commented on July 24, 2024

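Yes, I tried this method before as well, and it didn't work well. I then decided that since the page declares its own encoding, the crawler should simply use that; if the actual encoding differs from the declared one, the page doesn't conform to the spec. I suspect the garbled text here is caused by some other problem; I'll keep looking.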
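The "use the declared encoding" approach can be sketched as follows, again in Python and not as phpfetcher's actual implementation; `declared_charset` and `decode_page` are hypothetical names. It reads `charset=` from the HTTP Content-Type header first, then from a `<meta>` tag near the top of the page, and decodes with that. One assumption worth flagging: labels such as gb2312 are decoded with gb18030 (a superset of GBK), following the WHATWG Encoding Standard's practice of treating the gb2312 label as GBK, because Chinese pages labelled gb2312 often contain GBK-only characters.

```python
import re
from typing import Optional

# Hypothetical sketch of "trust the declared charset"; not phpfetcher code.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)

# Assumption: labels the WHATWG Encoding Standard maps to GBK. Decoding them
# with gb18030 tolerates pages labelled gb2312 that use GBK-only characters.
_GBK_LABELS = {"chinese", "csgb2312", "csiso58gb231280", "gb2312", "gb_2312",
               "gb_2312-80", "gbk", "iso-ir-58", "x-gbk"}


def declared_charset(body: bytes, content_type: str = "") -> Optional[str]:
    """Return the charset from the Content-Type header, else from a <meta> tag."""
    m = _HEADER_CHARSET.search(content_type)
    if m:
        return m.group(1).lower()
    # Only scan the start of the document, where <meta> tags normally live.
    m = _META_CHARSET.search(body[:4096])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def decode_page(body: bytes, content_type: str = "") -> str:
    """Decode a fetched page using its declared charset, defaulting to UTF-8."""
    charset = declared_charset(body, content_type) or "utf-8"
    if charset in _GBK_LABELS:
        charset = "gb18030"
    try:
        # errors="replace" keeps going if the page breaks its own declaration.
        return body.decode(charset, errors="replace")
    except LookupError:  # unknown or misspelled charset label
        return body.decode("utf-8", errors="replace")
```

Checking the header before the `<meta>` tag roughly follows the precedence browsers use; whether a crawler should do the same is a design choice.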
