Comments (7)
I'm not sure whether this approach would work.
from phpfetcher.
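It should work. Did you run into garbled text while crawling content from tech.qq.com?
If you can share a specific failing example, it will be easier for me to fix.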
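When I crawl tech.qq.com, all the Chinese text comes out garbled. mb_detect_encoding reports the actual encoding as EUC-CN. You can try crawling http://tech.qq.com/ yourself; all the Chinese text on it is garbled.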
OK, I'll take a look when I get home tonight.
@yudianguo On my side the detection result is FALSE. Could you go to this line in the source: https://github.com/fanfank/phpfetcher/blob/master/Phpfetcher/Page/Default.php#L363
and add var_dump(mb_detect_encoding($this->_strContent)); right after it,
then tell me what it prints?
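I just checked with mb_detect_encoding($this->_strContent, array("ASCII","UTF-8","GB2312","GBK")); and the result differs from what I saw during the day: this time it returned CP936. Maybe their site changed something on its side. I have also read advice against detecting the encoding this way, because it is said to be inaccurate.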
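One way to make the check above more predictable is to validate the bytes against an ordered candidate list instead of asking for a guess. The following is only a sketch: `first_valid_encoding` is a hypothetical helper, not part of phpfetcher, and the candidate list is an assumption about which encodings the crawled sites use.

```php
<?php
// Hypothetical helper (not part of phpfetcher): return the first candidate
// encoding in which the byte string is fully valid, or null if none match.
// Order matters, because many byte sequences are valid in several encodings.
function first_valid_encoding(string $bytes, array $candidates = ['UTF-8', 'CP936']): ?string
{
    foreach ($candidates as $encoding) {
        // mb_check_encoding() validates the whole string against one encoding.
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return null;
}
```

Putting UTF-8 first is deliberate: GBK-encoded Chinese text is rarely valid UTF-8, so bytes that pass the UTF-8 check are very likely UTF-8.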
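Right, I tried that approach before and it did not work well. My reasoning afterwards was that since the page declares its encoding, we should simply use the declared one; if the actual encoding differs from the declaration, the page is out of spec. I suspect some other problem is causing the garbled text here. I'll keep looking.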
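The "use the declared encoding" idea can be sketched roughly as follows. This is not phpfetcher's actual code: `convert_using_declared_charset` and its label map are hypothetical. The map sends `gb2312` and `gbk` to mbstring's `CP936`, on the assumption that a page labelled gb2312 may really contain GBK characters (the WHATWG Encoding Standard treats the `gb2312` label as GBK for the same reason).

```php
<?php
// Hypothetical helper (not part of phpfetcher): convert a fetched page to
// UTF-8 based on the charset declared in its <meta> tag.
function convert_using_declared_charset(string $html): string
{
    // Assumed mapping from declared labels to mbstring encoding names.
    $labelMap = [
        'UTF-8'   => 'UTF-8',
        'UTF8'    => 'UTF-8',
        'GB2312'  => 'CP936',
        'GBK'     => 'CP936',
        'GB18030' => 'GB18030',
    ];

    // Matches <meta charset="gb2312"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    if (!preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)/i', $html, $m)) {
        return $html; // no declaration found, leave the bytes untouched
    }

    $label = strtoupper($m[1]);
    if (!isset($labelMap[$label])) {
        return $html; // unknown label, avoid passing it to mbstring
    }

    return mb_convert_encoding($html, 'UTF-8', $labelMap[$label]);
}
```

If the output is still garbled after a conversion like this, the declared charset is probably not the cause, which would fit the suspicion that something else is going wrong.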