Comments (10)

tacman commented on May 25, 2024

Also, I'd like to migrate this to PSR-4 and separate the classes into their own files. Or perhaps you should do that, since it's likely a BC break.

from phpscraper.

tacman commented on May 25, 2024

Sigh. Version 6 no longer includes a manager, so using it requires a cache and a mechanism for fetching and storing the rules.

It's not too difficult, but it's not a trivial syntax change either (though there are some namespace changes).

Thoughts on how to proceed?

spekulatius commented on May 25, 2024

Hey @tacman

Yeah, I remember there were some structural changes in the package. Do you think you can get it done? Namespaces can usually be replaced easily.

Cheers,
Peter

tacman commented on May 25, 2024

OK. There are 2 approaches. The easiest is to download the rules file, add it to the repo, and then load it. Of course, the rules will become stale.

The better approach requires a dependency on a cache. Then we can fetch the rules like this, which will refresh them every 24 hours:

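    use Pdp\Rules;
    use Pdp\Storage\PsrStorageFactory;
    use Symfony\Component\Cache\Adapter\FilesystemAdapter;
    use Symfony\Contracts\Cache\ItemInterface;

    // Inside the class; $this->client is assumed to be a Symfony HTTP client.
    public function getTldCollection(): Rules
    {
        $cache = new FilesystemAdapter(); // or some other cache.

        $rules = $cache->get('pdp_rules', function (ItemInterface $item) {
            // The callable will only be executed on a cache miss.
            $item->expiresAfter(3600 * 24);
            $response = $this->client->request(
                'GET',
                PsrStorageFactory::PUBLIC_SUFFIX_LIST_URI
            );
            return $response->getContent();
        });

        // Parse the raw list into a Rules instance.
        return Rules::fromString($rules);
    }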

I'll go ahead and implement this to make it functional, but I'm not sure how to code it so that the user can inject whatever cache they already have in their application.

tacman commented on May 25, 2024

Since you've tagged this as a new version, can we also bump to PHP 8?

tacman commented on May 25, 2024

I started down the rabbit hole...

If phpscraper needs a cache for the domain parser, a CacheInterface cache should probably be injected. But that means phpscraper should itself be a service that's injected, rather than instantiated with new phpscraper().

Alternatively, we can add a CacheAwareInterface, and add the cache via a method call.

Alas, I'm not as expert in this as I'd like to be!
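
The setter-injection idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RulesCache`, `ArrayRulesCache`, and `RulesLoader` are hypothetical names, not part of phpscraper or php-domain-parser, and the fetch step is passed in as a callable so the sketch stays independent of any HTTP client.

```php
<?php

declare(strict_types=1);

// Hypothetical minimal cache contract; an adapter could wrap whatever
// cache the application already has.
interface RulesCache
{
    public function get(string $key): ?string;

    public function set(string $key, string $value, int $ttlSeconds): void;
}

// Default in-memory cache, so plain `new` keeps working without any setup.
final class ArrayRulesCache implements RulesCache
{
    /** @var array<string, array{value: string, expiresAt: int}> */
    private array $items = [];

    public function get(string $key): ?string
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expiresAt'] <= time()) {
            return null; // cache miss or expired entry
        }
        return $item['value'];
    }

    public function set(string $key, string $value, int $ttlSeconds): void
    {
        $this->items[$key] = ['value' => $value, 'expiresAt' => time() + $ttlSeconds];
    }
}

// Hypothetical loader: calls the fetch callable only on a cache miss.
final class RulesLoader
{
    private RulesCache $cache;

    /** @var callable(): string */
    private $fetch;

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
        $this->cache = new ArrayRulesCache();
    }

    // Setter injection: applications with their own cache can swap it in.
    public function setCache(RulesCache $cache): self
    {
        $this->cache = $cache;
        return $this;
    }

    public function rawRules(): string
    {
        $cached = $this->cache->get('pdp_rules');
        if ($cached !== null) {
            return $cached;
        }
        $raw = ($this->fetch)();
        $this->cache->set('pdp_rules', $raw, 3600 * 24);
        return $raw;
    }
}
```

With this shape, simple projects can keep calling `new` and get the in-memory default, while framework users can call `setCache()` with a thin adapter around their existing cache.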

spekulatius commented on May 25, 2024

The question of the cache is what stopped me too. I was actually thinking of storing a file or set of files somewhere to avoid the integration questions, especially for simple vanilla PHP projects (where PHPScraper comes in handy for me most).
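
A file-based variant of that idea might look like the sketch below. It assumes a hypothetical helper named `loadRulesFile()` and a caller-supplied fetch callable; neither exists in PHPScraper. The local copy is refreshed only when it is missing or older than the given maximum age.

```php
<?php

declare(strict_types=1);

/**
 * Returns the raw rules, refreshing the local copy when it is missing or stale.
 *
 * @param string   $path   Where the local copy is stored (hypothetical location).
 * @param callable $fetch  Returns the fresh list as a string, e.g. via an HTTP client.
 * @param int      $maxAge Maximum age of the local copy in seconds.
 */
function loadRulesFile(string $path, callable $fetch, int $maxAge = 86400): string
{
    clearstatcache(true, $path);
    $isFresh = is_file($path) && (time() - filemtime($path)) < $maxAge;

    if (!$isFresh) {
        $raw = $fetch();
        // Write to a temporary file first, then rename it into place,
        // so concurrent readers never see a partially written file.
        $tmp = $path . '.' . uniqid('', true) . '.tmp';
        if (file_put_contents($tmp, $raw) === false || !rename($tmp, $path)) {
            throw new RuntimeException("Could not write rules file: {$path}");
        }
        return $raw;
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Could not read rules file: {$path}");
    }
    return $raw;
}
```

The returned string could then be handed to `Rules::fromString()`, as in the snippet earlier in this thread, without requiring any cache interface from the host application.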

spekulatius commented on May 25, 2024

Hey @tacman,

Have you made progress implementing a cache? I've seen this commit and was wondering if we can get a framework-agnostic solution working. I'd still try to avoid injecting a CacheInterface, as it is framework-dependent. Happy to hear your thoughts!

Cheers,
Peter

spekulatius commented on May 25, 2024

Alternatively, either spatie/url or thephpleague/uri could replace jeremykendall/php-domain-parser. I still need to confirm whether the libs are suitable for the job, though.

spekulatius commented on May 25, 2024

For now we are using league/uri for URL processing. As a result, the subdomain-specific filtering has been dropped.
