
Comments (12)

joeytwiddle commented on July 19, 2024

The behaviour I am currently experiencing is that an error message is displayed, but the crawl continues. But then at the end no output is produced. That is horrible!

Better might be to abort immediately after an error.

But I would prefer if it would simply display results even after errors have occurred. (Perhaps with a note at the top with a count of how many errors occurred during the crawl.)
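
A minimal sketch of the behaviour requested here: record each failure, keep crawling, and put an error count at the top of the report. The names `crawlAll`, `formatReport` and `visitPage` are hypothetical and are not taken from ucss; `visitPage` stands in for whatever fetches and analyses a single page.

```javascript
// Hypothetical sketch: none of these names come from ucss itself.

// Visit every URL, recording failures instead of letting them abort the run.
function crawlAll(urls, visitPage) {
  const results = [];
  const errors = [];
  for (const url of urls) {
    try {
      results.push({ url, data: visitPage(url) });
    } catch (err) {
      errors.push({ url, message: err instanceof Error ? err.message : String(err) });
    }
  }
  return { results, errors };
}

// Build the report text, with an error count at the top when there were failures.
function formatReport({ results, errors }) {
  const lines = [];
  if (errors.length > 0) {
    lines.push(`Note: ${errors.length} error(s) occurred during the crawl.`);
  }
  lines.push(`Pages visited successfully: ${results.length}`);
  return lines.join("\n");
}

// Usage: one page fails, but the report is still produced.
const outcome = crawlAll(
  ["http://example.com/a", "http://example.com/b.jpg"],
  (url) => {
    if (url.endsWith(".jpg")) throw new Error("not an HTML page");
    return "ok";
  }
);
console.log(formatReport(outcome));
```

A real crawler would be asynchronous, but the idea is the same: a failure on one page becomes an entry in the error list rather than the end of the run.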

from ucss.

oyvindeh commented on July 19, 2024

Yeah, there seems to be a bug: I've seen that a couple of times too, but I haven't had the time to investigate yet. I will look into it as soon as I can find some time for it. I've created a separate issue ( #32 ).

oyvindeh commented on July 19, 2024

Which error message(s) do you get: ETIMEDOUT and/or ESOCKETTIMEDOUT? Do you get a timeout when loading the CSS?

joeytwiddle commented on July 19, 2024

I did have a few timeouts, but I increased the "timeout" parameter in the config to compensate.

The errors I am getting now appear to be with certain binaries. I see them from .zip, .jpg, .png and .pdf files found during the crawl:

Visited:  http://...-engines-eng.jpg
Unable to load http://...-engines-eng.jpg: RangeError: Maximum call stack size exceeded
undefined

Occasionally I also get them from messy links!

_getHtmlAsString() failed to read javascript: void(0): ENOENT, no such file or directory 'javascript: void(0)'

It would be great if the report could be shown even when failures occur!

oyvindeh commented on July 19, 2024

Thanks for the info! I have fixed the problem with the timeouts (not released yet). As for the problem with the crawler following binaries, that's a known bug (#29). I am currently travelling, but I will look into all this when I get back in a few days, as well as release a bunch of other fixes.

oyvindeh commented on July 19, 2024

#29 has now been fixed.

joeytwiddle commented on July 19, 2024

Great, thanks, the fix for binaries is working here too. 👍

However, I am still getting a few errors from javascript: "links" and an ENOENT for an ftp: URL that does not exist.

I am also getting this at the end:

.../node_modules/ucss/node_modules/q/q.js:126
                    throw e;
                          ^
RangeError: Maximum call stack size exceeded

The result is that after a long crawl visiting 400 pages, I see no results about the CSS!

(Unfortunately stack overflows do not produce a stack-trace so it is harder to see where this came from, but I may try to run with a debugger at some point...)
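
For the javascript: and ftp: errors mentioned above, one possible guard is to check a link's scheme before trying to load it. This is only a sketch: `isCrawlable` is a hypothetical helper, not a ucss function, and it relies on the standard `URL` class available as a global in current Node.js versions.

```javascript
// Hypothetical helper, not part of ucss: decide whether a link found
// during a crawl should be fetched over HTTP at all.
function isCrawlable(href, baseUrl) {
  let parsed;
  try {
    // Relative links are resolved against the page they were found on.
    parsed = new URL(href, baseUrl);
  } catch (err) {
    return false; // unparseable link: skip it rather than crash
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

// Usage
const base = "http://example.com/index.html";
console.log(isCrawlable("/about", base));                   // relative link on the same site
console.log(isCrawlable("javascript: void(0)", base));      // script pseudo-link
console.log(isCrawlable("ftp://example.com/file.zip", base)); // non-HTTP scheme
```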

oyvindeh commented on July 19, 2024

I've fixed the "javascript:" and "ftp:" bug.

Looking into the other one as well. It's a bit tricky, so I cannot promise when a fix will be out. I don't know what kind of site you are trying to crawl, but if it resembles a product catalog (i.e. lots of pages with the same markup and CSS, but different content), you could add the whole subtree to an exclude list in your config, and then add a couple of its pages to the include list (I've updated the example config to show this).
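
To illustrate the idea of excluding a subtree while still including a couple of its pages, here is a small sketch. It is not how ucss implements its include and exclude lists, and the option names and URLs are assumptions made up for the example; the example config shipped with ucss is the place to look for the real syntax.

```javascript
// Hypothetical sketch of include/exclude filtering; the function, option
// names and URLs are illustrative and not taken from ucss.
function selectPages(urls, { include = [], exclude = [] }) {
  // A URL is excluded when it starts with any excluded prefix (a whole subtree).
  const isExcluded = (url) => exclude.some((prefix) => url.startsWith(prefix));
  const kept = urls.filter((url) => !isExcluded(url));
  // Explicitly included pages are always kept, even inside an excluded subtree.
  for (const url of include) {
    if (!kept.includes(url)) kept.push(url);
  }
  return kept;
}

// Usage: skip a whole catalog but still check one representative product page.
const pages = selectPages(
  [
    "http://example.com/",
    "http://example.com/products/1",
    "http://example.com/products/2",
  ],
  {
    exclude: ["http://example.com/products/"],
    include: ["http://example.com/products/1"],
  }
);
console.log(pages);
```

Since catalog pages share markup and CSS, checking a few representatives should give much the same picture of CSS usage with far fewer pages to visit.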

oyvindeh commented on July 19, 2024

Actually, it seems like I may have a fix working. Was just able to crawl 24k+ pages without any crash. Will clean it up somewhat, do some more test runs, and hopefully publish it later today.
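
The thread does not say what caused the "Maximum call stack size exceeded" error or how it was fixed. Deep recursion is one common way to hit that RangeError on large crawls, so as a purely illustrative sketch (hypothetical names, not the actual ucss fix), here is a crawl loop that keeps pending pages in an explicit list instead of making one nested call per page.

```javascript
// Hypothetical sketch; the thread does not describe the real fix.
// Pending pages live in an explicit list, so the call stack stays shallow
// no matter how many pages are discovered.
function crawlIteratively(startUrl, getLinks) {
  const pending = [startUrl];
  const seen = new Set(pending);
  let visited = 0;
  while (pending.length > 0) {
    const url = pending.pop();
    visited += 1;
    for (const link of getLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link); // never queue the same page twice
        pending.push(link);
      }
    }
  }
  return visited;
}

// Usage: a long chain of pages, each linking only to the next one.
const chainLength = 50000;
const nextInChain = (url) => {
  const n = Number(url);
  return n < chainLength ? [String(n + 1)] : [];
};
console.log(crawlIteratively("0", nextInChain));
```

A recursive version that calls itself once per discovered link would nest one stack frame per page on a chain like this, which is the kind of pattern that can exhaust the stack.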

oyvindeh commented on July 19, 2024

Published both fixes.

joeytwiddle commented on July 19, 2024

Many thanks oyvindeh. Your fixes have prevented all the errors here, and our crawl of 700 pages is now completing! It seems it was worth the wait too...

Total: 1446 (87 used, 1353 unused, 177 duplicates, 5 ignored)

I do wonder if the report would still be shown if an error did occur (e.g. if a webserver incorrectly returned a binary file as text/html).

oyvindeh commented on July 19, 2024

No problem, happy that I could help! 👍

That's a lot of unused CSS! (Be aware that if your page is JavaScript heavy, and classes are added using JavaScript, they will not be captured.)

There are errors that will crash the script. E.g., sometimes the modules used, or V8 itself, will just make the script exit. However, my error handling can improve as well. The case you mention is one where it can, and I have a TODO item about that. I will look further into this later, so I'll keep this issue open.
