Comments (11)
I'm hit with it too. Recent version of Mac OS X.
from sitediff.
This is blocking me. :(
Thank you both for submitting the stack traces. We are developing on 0.0.4 and Ruby 2.5 now, but I have a hypothesis about what is causing this issue. I will see if I can reproduce it and update the issue with my findings.
This error is thrown by a third-party library, most likely Ruby's DBM or GDBM database binding. Are there any notable repro steps you can share with us? Are you crawling a site with large content or many pages? How large is cache.db?
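For anyone gathering the cache size asked about above, here is a minimal Ruby sketch that reports it in readable units. The helper names `human_size` and `cache_report` are hypothetical and not part of sitediff, and the path to cache.db is an assumption that depends on where sitediff was run.

```ruby
# Minimal sketch: report the size of a cache file in binary units.
# Helper names are illustrative only; they are not sitediff APIs.

# Convert a byte count into a string such as "32.0 GiB".
def human_size(bytes)
  units = %w[B KiB MiB GiB TiB]
  size = bytes.to_f
  unit = units.shift
  while size >= 1024 && !units.empty?
    size /= 1024
    unit = units.shift
  end
  format('%.1f %s', size, unit)
end

# Describe the file at +path+, or say that it is missing.
def cache_report(path)
  return "#{path}: not found" unless File.file?(path)

  "#{path}: #{human_size(File.size(path))}"
end

# Usage (the path is an assumption; adjust it to your working directory):
#   puts cache_report('cache.db')
```

Pasting the resulting line into the issue alongside the number of crawled paths should make it easier to tell whether the failure correlates with cache size.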
+1
HASH: Out of overflow pages. Increase page size
Traceback (most recent call last):
  26: from /usr/local/bin/sitediff:23:in
  25: from /usr/local/bin/sitediff:23:in `load'
  24: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/bin/sitediff:10:in `<top (required)>'
  23: from /usr/local/lib/ruby/gems/2.5.0/gems/thor-0.19.4/lib/thor/base.rb:444:in `start'
  22: from /usr/local/lib/ruby/gems/2.5.0/gems/thor-0.19.4/lib/thor.rb:369:in `dispatch'
  21: from /usr/local/lib/ruby/gems/2.5.0/gems/thor-0.19.4/lib/thor/invocation.rb:126:in `invoke_command'
  20: from /usr/local/lib/ruby/gems/2.5.0/gems/thor-0.19.4/lib/thor/command.rb:27:in `run'
  19: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/cli.rb:97:in `diff'
  18: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff.rb:109:in `run'
  17: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/fetch.rb:22:in `run'
  16: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/hydra/memoizable.rb:51:in `run'
  15: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/hydra/runnable.rb:15:in `run'
  14: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/multi/operations.rb:43:in `perform'
  13: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/multi/operations.rb:164:in `run'
  12: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/multi/operations.rb:151:in `check'
  11: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/easy/response_callbacks.rb:68:in `complete'
  10: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/easy/response_callbacks.rb:68:in `each'
   9: from /usr/local/lib/ruby/gems/2.5.0/gems/ethon-0.11.0/lib/ethon/easy/response_callbacks.rb:68:in `block in complete'
   8: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/easy_factory.rb:159:in `block in set_callback'
   7: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/request/operations.rb:35:in `finish'
   6: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/request/callbacks.rb:127:in `execute_callbacks'
   5: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/request/callbacks.rb:127:in `each'
   4: from /usr/local/lib/ruby/gems/2.5.0/gems/typhoeus-1.3.0/lib/typhoeus/request/callbacks.rb:128:in `block in execute_callbacks'
   3: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/uriwrapper.rb:104:in `block in typhoeus_request'
   2: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/fetch.rb:41:in `block (2 levels) in queue_path'
   1: from /usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/cache.rb:35:in `set'
/usr/local/lib/ruby/gems/2.5.0/gems/sitediff-0.0.3/lib/sitediff/cache.rb:35:in `[]=': dbm_store failed (DBMError)
My cache.db is 32 GB, and I am crawling a medium-sized site. I get this error when I run the diff, not during the initial crawl.
Our first attempt at repro did not work. Is anyone able to reproduce this issue on a public website?
Thanks to everyone who reported this. We are preparing to release a v.4 gem with a fix for this issue.
We released new code into master which fixes this issue. The gem version number is now 0.0.5. It requires Ruby 2.3 or above.