Comments (5)

 avatar commented on September 2, 2024 1

What I interpret this as (and @sian1468, please correct me if wrong) is an option to only follow links if the links are in the same domain.

Normally I would think this is covered by domain restriction but I think what is being suggested is to ignore the "hops" restriction and keep following any links that are in the same domain. This is somewhat similar to "archive the whole site" provided the entire site is inter-linked from the starting seed.

You are correct, @machawk1.

I think Squidwarc could go beyond depth-limited capture and capture a whole site, with a setting to include offsite links (from a single page or up to a given depth) or to exclude them.
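
To make the idea concrete, here is a minimal sketch of a crawl that ignores the hop limit and keeps following links on the seed's host. It is only an illustration under stated assumptions: `same_domain_crawl`, `get_links`, and the toy `SITE` graph are hypothetical names, not part of Squidwarc, and a real crawler would load each page in a browser rather than read links from a dictionary.

```python
from collections import deque
from urllib.parse import urlparse


def same_domain_crawl(seed, get_links):
    """Breadth-first crawl with no hop limit that only follows same-host links.

    `get_links` is a hypothetical callable returning the outgoing links of a
    page; it stands in for loading the page in a browser.
    """
    seed_host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Skip offsite links and pages that are already queued.
            if urlparse(link).netloc == seed_host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for a real site (hypothetical URLs).
SITE = {
    "https://example.com/": ["https://example.com/a", "https://other.org/"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
}

pages = same_domain_crawl("https://example.com/", lambda u: SITE.get(u, []))
```

In this toy graph, `https://example.com/b` is two hops from the seed and is still visited, while `https://other.org/` is never queued. Including offsite links would amount to relaxing the host check for some pages.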

from squidwarc.

machawk1 avatar machawk1 commented on September 2, 2024

What I interpret this as (and @sian1468, please correct me if wrong) is an option to only follow links if the links are in the same domain.

Normally I would think this is covered by domain restriction but I think what is being suggested is to ignore the "hops" restriction and keep following any links that are in the same domain. This is somewhat similar to "archive the whole site" provided the entire site is inter-linked from the starting seed.

N0taN3rd avatar N0taN3rd commented on September 2, 2024

@sian1468 thank you for suggesting this. I believe your suggestion would be an excellent feature for Squidwarc.

I will be putting some thought into how to accomplish this nicely alongside the existing crawl modes.
Do you have any suggestions as to how you would like to be able to specify this crawl mode?

 avatar commented on September 2, 2024

@sian1468 thank you for suggesting this. I believe your suggestion would be an excellent feature for Squidwarc.

I will be putting some thought into how to accomplish this nicely alongside the existing crawl modes.
Do you have any suggestions as to how you would like to be able to specify this crawl mode?

Recursive crawl mode

I got inspiration from other archiving tools and software, e.g. Wpull, grab-site, and crocoite.

N0taN3rd avatar N0taN3rd commented on September 2, 2024

Implemented and merged into master via PR #47.
