Comments (53)
from newcrawler.
10. Publishing: support file uploads
11. Email notifications: support attachments
Original comment by [email protected]
on 19 Sep 2012 at 2:08
12. Add a filter interface and a filter plugin to scraping rules
Original comment by [email protected]
on 20 Sep 2012 at 3:34
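Item 12 does not describe what the filter interface looks like. A minimal sketch, assuming a filter receives one scraped field value and returns either a cleaned value or None to drop it; `FieldFilter`, `StripTags`, `DropIfContains`, and `apply_filters` are hypothetical names, not part of NewCrawler:

```python
import re
from typing import Optional, Protocol


class FieldFilter(Protocol):
    """Hypothetical filter interface: return a new value, or None to drop it."""

    def apply(self, value: str) -> Optional[str]: ...


class StripTags:
    """Example plugin: remove HTML tags and surrounding whitespace."""

    _TAG = re.compile(r"<[^>]+>")

    def apply(self, value: str) -> Optional[str]:
        return self._TAG.sub("", value).strip()


class DropIfContains:
    """Example plugin: drop the value if it contains a given word."""

    def __init__(self, word: str) -> None:
        self.word = word

    def apply(self, value: str) -> Optional[str]:
        return None if self.word in value else value


def apply_filters(value: str, filters: list[FieldFilter]) -> Optional[str]:
    # Run the filters in order and stop as soon as one rejects the value.
    current = value
    for f in filters:
        result = f.apply(current)
        if result is None:
            return None
        current = result
    return current
```

Chaining small plugins this way lets one rule combine several filters; for example, `apply_filters("<b> Hello </b>", [StripTags(), DropIfContains("advert")])` returns `"Hello"`.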
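13. Loop matching: when scraping rules are not allowed to be empty and some of the rules for one record match nothing, matching stops and no further records can be matched. Change the logic so that matching of the next record continues from the last matched position.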
Original comment by [email protected]
on 6 Nov 2012 at 9:27
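The fix in item 13 can be sketched as follows, assuming each scraping rule is a regular expression with one capture group, the first rule marks the start of a record, and a record's fields must lie before the start of the next record. `loop_match` is a hypothetical helper: a record with a missing field is dropped and matching resumes from the last matched position instead of stopping.

```python
import re


def loop_match(text: str, rules: list[re.Pattern[str]]) -> list[list[str]]:
    """Hypothetical loop matcher in which every rule is required.

    rules[0] marks the start of each record; the other rules are searched in
    order, but only up to the start of the next record, so a missing field
    cannot borrow a value from the following record.
    """
    if not rules:
        return []
    records: list[list[str]] = []
    pos = 0
    while True:
        first = rules[0].search(text, pos)
        if first is None:
            break  # no more records
        nxt = rules[0].search(text, first.end())
        limit = nxt.start() if nxt else len(text)
        fields = [first.group(1)]
        last_end = first.end()
        for rule in rules[1:]:
            m = rule.search(text, last_end, limit)
            if m is None:
                break  # required field missing: this record is incomplete
            fields.append(m.group(1))
            last_end = m.end()
        if len(fields) == len(rules):
            records.append(fields)
        # Resume from the last matched position instead of giving up;
        # max() guarantees progress if a match was zero-width.
        pos = max(last_end, first.start() + 1)
    return records
```

With `<h>(.*?)</h>` and `<p>(.*?)</p>` as the rules, a page whose second record has no `<p>` still yields the first and third records.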
14. Allow a pagination rule's index number to be the same as a scraping rule's index number
Original comment by [email protected]
on 9 Nov 2012 at 9:01
15. Bug: updating a newly added scraping rule fails with an error
Original comment by [email protected]
on 9 Nov 2012 at 9:10
16. Account management: language and time-zone settings
Original comment by [email protected]
on 26 Jan 2013 at 3:25
17. When a select value is changed, the select on every page should be updated
Original comment by [email protected]
on 26 Jan 2013 at 3:27
18. Add "tag combination" to scraping rules
Original comment by [email protected]
on 15 Apr 2013 at 5:59
19. Redesign logging: (1) store logs in memory (GAE); (2) store logs in the DB.
20. Allow the crawl rate of each site to be set from the front end.
Original comment by [email protected]
on 20 Jul 2013 at 8:57
21. Statistics for the task queue, crawled URLs (daily), and scraped data (daily)
22. View logs from the front end
Original comment by [email protected]
on 10 Aug 2013 at 1:29
23. Change how the scraping-rule area is displayed when the data source is another tag.
Original comment by [email protected]
on 29 Nov 2013 at 6:59
24. Improve asynchronous loading of tabs
25. Improve cookie settings for nested crawls
Original comment by [email protected]
on 18 Feb 2014 at 1:40
26. Show the run status of scraping rules in scheduled tasks
27. When reading an XPath, allow it to be added directly to a scraping rule
Original comment by [email protected]
on 13 Oct 2014 at 8:01
29. Scraped-data list page: add a query by storage date
30. Count statistics: keep the statistics-type field in sync
Original comment by [email protected]
on 13 Oct 2014 at 8:32
31. When a crawl test matches no data, report which rule failed to match
Original comment by [email protected]
on 16 Oct 2014 at 8:36
32. Bug: the Site Management > HTTP Request Configuration window cannot be opened
Original comment by [email protected]
on 17 Oct 2014 at 1:03
33. Layout problem with merged scraping-rule fields
Original comment by [email protected]
on 23 Oct 2014 at 3:27
34. JS dependency analysis fails
Original comment by [email protected]
on 23 Oct 2014 at 3:30
35. Close the loading mask when load throws an exception
Original comment by [email protected]
on 23 Oct 2014 at 6:36
36. Bug: wrong start index when querying the data list page
Original comment by [email protected]
on 23 Oct 2014 at 8:18
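37. Add execution logs to scheduled tasks
38. Add enqueue statistics and completion statistics to the "Automatic Data Collection" scheduled task.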
Original comment by [email protected]
on 6 Nov 2014 at 3:10
39. Change site encoding "Auto-detect" so that detection runs on every fetch
Original comment by [email protected]
on 6 Nov 2014 at 3:15
40. Rename the classes used by the XPath extraction tool to avoid class conflicts
Original comment by [email protected]
on 19 Nov 2014 at 3:46
41. Add outerHTML, innerHTML, and innerTEXT attributes to XPath matching
Original comment by [email protected]
on 25 Nov 2014 at 3:20
43. Add crawl-queue management, such as deleting, stopping, and running queues
Original comment by [email protected]
on 28 Nov 2014 at 3:37
44. Auto-refresh statistics data
Original comment by [email protected]
on 9 Dec 2014 at 6:57
45. Export to CSV
Original comment by [email protected]
on 9 Dec 2014 at 7:09
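A minimal sketch of item 45, assuming scraped records are available as dictionaries keyed by column name; `export_csv` is a hypothetical helper built on Python's standard `csv` module, which handles the quoting that scraped text usually needs:

```python
import csv
import io


def export_csv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Serialize scraped records to CSV text (hypothetical helper).

    csv.DictWriter quotes values that contain commas, quotes, or newlines.
    Missing fields become empty cells; fields not listed in `columns`
    are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

When the result is written to a file that will be opened in Excel, encoding it as `utf-8-sig` (UTF-8 with a byte-order mark) typically helps Excel detect UTF-8, which matters for Chinese text.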
47. Run the collector as a service and expose a crawl API that can return results either asynchronously or synchronously
Original comment by [email protected]
on 16 Dec 2014 at 1:44
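The two return modes in item 47 can be illustrated with a thread pool: the synchronous call blocks until the crawl finishes, while the asynchronous call returns a job id that the client polls later. This is a sketch under assumptions; `CrawlService`, `crawl_sync`, `crawl_async`, `get_result`, and the injected `fetch` function are hypothetical and not NewCrawler's actual API.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class CrawlService:
    """Hypothetical crawl service offering sync and async entry points."""

    def __init__(self, fetch: Callable[[str], str], workers: int = 4) -> None:
        self._fetch = fetch  # injected crawler function: url -> content
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Future[str]] = {}

    def crawl_sync(self, url: str, timeout: Optional[float] = None) -> str:
        # Synchronous mode: block until the crawler returns (or time out).
        return self._pool.submit(self._fetch, url).result(timeout=timeout)

    def crawl_async(self, url: str) -> str:
        # Asynchronous mode: start the crawl and return a job id at once.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._fetch, url)
        return job_id

    def get_result(self, job_id: str) -> Optional[str]:
        # None while the job is still running; KeyError for unknown ids.
        fut = self._jobs[job_id]
        return fut.result() if fut.done() else None
```

In a real service the job table would also need an expiry policy, since finished jobs stay in `_jobs` here until the process exits.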
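48. Add a "Maximum crawl queues" setting to Site Management; an empty value or a value less than 1 means no limit. When a scheduled task runs "Automatic Data Collection", it checks how many unfinished tasks the site has and does not start this crawl if the limit is exceeded. This prevents too many tasks from being started and exhausting system resources.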
Original comment by [email protected]
on 16 Dec 2014 at 7:00
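The check described in item 48 reduces to a small predicate. A sketch, assuming the setting is read as an optional integer and that reaching the limit already counts as full (starting one more crawl would exceed it); `should_start_crawl` is a hypothetical name:

```python
from typing import Optional


def should_start_crawl(max_queues: Optional[int], unfinished_tasks: int) -> bool:
    """Decide whether "Automatic Data Collection" may start a crawl for a site.

    max_queues is the hypothetical per-site "Maximum crawl queues" value:
    None (empty) or anything below 1 means no limit.
    """
    if max_queues is None or max_queues < 1:
        return True
    # Assumption: at the limit, one more crawl would exceed it, so skip.
    return unfinished_tasks < max_queues
```

The scheduler would call this once per site per run and simply skip sites that return False, leaving their unfinished tasks to drain before the next run.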
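49. Improve the web side:
1. Optimize response speed: CDN acceleration and multi-node synchronization (smart DNS acceleration)
2. Use a queuing mechanism for online GAE installation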
Original comment by [email protected]
on 24 Dec 2014 at 7:49
50. Queue SYNC_FULL needs logic to handle CPU timeouts
Original comment by [email protected]
on 14 Jan 2015 at 1:33
51. Bulk URL import
Change "Separate multiple URLs with '|$|'"
to
"Separate multiple URLs with a newline or '|$|'"
Original comment by [email protected]
on 16 Jan 2015 at 8:14
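The new separator rule in item 51 can be sketched as a small parser that accepts either a newline or `'|$|'`; `split_urls` is a hypothetical helper, and dropping blank entries and surrounding whitespace is an added assumption:

```python
def split_urls(raw: str) -> list[str]:
    """Split bulk-import text on newlines or the '|$|' separator.

    Surrounding whitespace and empty entries (e.g. blank lines) are dropped.
    """
    urls: list[str] = []
    for line in raw.splitlines():  # handles \n, \r\n and \r
        for part in line.split("|$|"):  # plain substring split, not a regex
            part = part.strip()
            if part:
                urls.append(part)
    return urls
```

A plain `str.split` is used deliberately: `|` and `$` are regular-expression metacharacters, so a regex-based split would need `re.escape("|$|")`.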
52. Implement password recovery
Original comment by [email protected]
on 19 Jan 2015 at 8:31
53. newcrawler.com: global server selection
Original comment by [email protected]
on 19 Jan 2015 at 8:33
54. Data publishing rules: hide them by default and add a Show button
Original comment by [email protected]
on 19 Jan 2015 at 8:36
55. Quick Start: add visual rule creation
56. Add a data query API that returns JSON and CSV.
57. Crawler pool configuration: implement load balancing
Original comment by [email protected]
on 12 Mar 2015 at 3:15
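Item 57 does not say which balancing policy the crawler pool uses. As one possible policy, here is a hypothetical round-robin sketch that skips crawlers marked as unavailable; `CrawlerPool`, `next_crawler`, and `down` are illustrative names only:

```python
import itertools
import threading


class CrawlerPool:
    """Hypothetical crawler pool with round-robin load balancing."""

    def __init__(self, crawlers: list[str]) -> None:
        if not crawlers:
            raise ValueError("crawler pool is empty")
        self._crawlers = list(crawlers)
        self._counter = itertools.count()
        self._lock = threading.Lock()
        self.down: set[str] = set()  # crawlers currently marked unavailable

    def next_crawler(self) -> str:
        # Try each crawler at most once per call, skipping unavailable ones.
        with self._lock:
            for _ in range(len(self._crawlers)):
                i = next(self._counter) % len(self._crawlers)
                crawler = self._crawlers[i]
                if crawler not in self.down:
                    return crawler
        raise RuntimeError("no crawler available")
```

Round robin assumes the crawlers have similar capacity; weighted or least-busy selection would fit a pool of uneven machines better.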
58. Show a loading image during asynchronous queries
Original comment by [email protected]
on 12 Mar 2015 at 3:26
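59. Allow a "Trigger fetch exception" setting for each site
After a page is fetched, its content is checked for exception text (for example, an anti-crawler CAPTCHA prompt). If the text is present, the system throws a fetch exception and, by default, retries the fetch once.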
Original comment by [email protected]
on 25 May 2015 at 8:57
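The check-and-retry behaviour in item 59 can be sketched as follows, assuming the site setting is a list of exception texts and the fetcher is an injected function; `FetchException` and `fetch_with_check` are hypothetical names, and `retries=1` mirrors the default "retry once":

```python
from typing import Callable


class FetchException(Exception):
    """Raised when a fetched page still contains a configured exception text."""


def fetch_with_check(
    fetch: Callable[[str], str],
    url: str,
    exception_texts: list[str],
    retries: int = 1,
) -> str:
    """Fetch url and retry while the page contains any exception text."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    hit = None
    for _ in range(retries + 1):
        content = fetch(url)
        # First configured exception text found in the page, if any.
        hit = next((t for t in exception_texts if t in content), None)
        if hit is None:
            return content
    raise FetchException(f"{url}: page contains {hit!r} after {retries + 1} attempts")
```

An immediate retry rarely clears a CAPTCHA page, so in practice the retry would probably be combined with a delay or a different proxy; the sketch leaves that out.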
60. Add custom crawl rates
Original comment by [email protected]
on 26 May 2015 at 1:18
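A per-site crawl rate (items 20 and 60, with the default rate of item 63) can be sketched as a limiter that keeps consecutive requests to one host at least `1/rate` seconds apart. `RateLimiter` and `SiteRateLimits` are hypothetical names, and keying sites by URL host name is an assumption:

```python
import threading
import time
from urllib.parse import urlsplit


class RateLimiter:
    """At most `rate` requests per second; call wait() before each request."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._next_time = 0.0
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_time)
            self._next_time = start + self._interval
        time.sleep(max(0.0, start - now))


class SiteRateLimits:
    """Custom per-host rates with a fallback default rate."""

    def __init__(self, default_rate: float) -> None:
        self._default = default_rate
        self._custom: dict[str, float] = {}
        self._limiters: dict[str, RateLimiter] = {}
        self._lock = threading.Lock()

    def set_rate(self, host: str, rate: float) -> None:
        with self._lock:
            self._custom[host] = rate
            self._limiters.pop(host, None)  # rebuilt with the new rate on next use

    def wait(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        with self._lock:
            limiter = self._limiters.get(host)
            if limiter is None:
                limiter = RateLimiter(self._custom.get(host, self._default))
                self._limiters[host] = limiter
        limiter.wait()
```

Reserving the slot before sleeping lets several worker threads share one limiter without exceeding the configured rate.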
61. Verify that the locale stored in the cookie matches the language currently selected in the system
Original comment by [email protected]
on 29 May 2015 at 1:45
62. Bug: crawler statistics do not take effect
Original comment by [email protected]
on 29 May 2015 at 1:45
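63. Allow a default crawl rate to be configured for each crawler
64. Callback check time. Description: the collector calls crawlers asynchronously. When a crawler fails to return a result for some reason, the URL must be crawled again; the callback check time defines how long a crawler may go without returning before a re-crawl is triggered.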
Original comment by [email protected]
on 25 Jun 2015 at 3:48
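The callback check time in item 64 can be sketched as a table of dispatch timestamps that a periodic job scans for overdue URLs. `CallbackTracker` and its methods are hypothetical names; the injectable clock exists only to make the behaviour easy to test:

```python
import time
from typing import Callable


class CallbackTracker:
    """Track URLs handed to crawlers and report those whose callback is overdue."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout  # the "callback check time", in seconds
        self._clock = clock
        self._pending: dict[str, float] = {}  # url -> time it was dispatched

    def dispatch(self, url: str) -> None:
        # Record when the URL was handed to a crawler.
        self._pending[url] = self._clock()

    def on_callback(self, url: str) -> None:
        # The crawler returned a result; stop tracking this URL.
        self._pending.pop(url, None)

    def expired(self) -> list[str]:
        # URLs silent for longer than the timeout; the caller re-crawls them,
        # so their timer is restarted here.
        now = self._clock()
        overdue = [u for u, t in self._pending.items() if now - t > self._timeout]
        for u in overdue:
            self._pending[u] = now
        return overdue
```

For example, with a 30-second timeout, a URL dispatched at t=0 with no callback is reported at t=31 and, if still silent, again at t=62.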
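65. Compare versions after login and show a prominent notice when an update is needed
66. Log viewer: right-align length and change its unit to KB; widen the lastmodified column
67. Add password authentication for remote crawler access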
Original comment by [email protected]
on 25 Jun 2015 at 9:14
68. Link "Help" on the login page to the wiki
Original comment by [email protected]
on 29 Jun 2015 at 6:05
69. Add a Terms of Service page
Original comment by [email protected]
on 29 Jun 2015 at 6:06