Comments (11)

kostiklv commented on July 17, 2024

Are you generating it dynamically (from a controller) or by using the dump command?

kl3ryk commented on July 17, 2024

Yes, I'm using the dumper command with listeners. I think it would be fine if additional files were generated whenever the constraints are exceeded, e.g.:

Here we have: https://github.com/prestaconcept/PrestaSitemapBundle/blob/master/Sitemap/XmlConstraint.php

const LIMIT_ITEMS = 49999;

When this constraint is exceeded, we create the first file, default.xml, and remove the whole url set from memory (50k objects). If we keep adding urls to the urlset and exceed the limit again, another file is created: default_0.xml.

:)

The possibility to change the limits in config.yml would also be a great feature :)
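
Roughly, the splitting I have in mind would look like this (just a sketch: the urlset methods used here, count(), writeToFile() and reset(), are placeholders, not the bundle's real API; only XmlConstraint::LIMIT_ITEMS comes from the bundle):

$fileIndex = null;
foreach ($urls as $url) {
    $urlset->addUrl($url);
    if ($urlset->count() >= XmlConstraint::LIMIT_ITEMS) {
        // flush the current batch to disk: default.xml, then default_0.xml, default_1.xml, ...
        $name = $fileIndex === null ? 'default.xml' : sprintf('default_%d.xml', $fileIndex);
        $urlset->writeToFile($name); // placeholder: serialize and write the current url set
        $urlset->reset();            // placeholder: drop the ~50k url objects from memory
        $fileIndex = $fileIndex === null ? 0 : $fileIndex + 1;
    }
}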

iamdey commented on July 17, 2024

I'll try to fix this ASAP ... in two weeks at the earliest, I think. You are free to suggest a PR :)

The limits defined as constants come from the specification at http://www.sitemaps.org/protocol.html.
These limits can differ between search-engine crawlers. For Google, you can provide 50 MB per file instead of 10 MB.

In conclusion, I want to keep the sitemap generation as decoupled from Symfony as possible (to be able to use it as an external library in the future), and I think these limits could be tweaked by overriding them in a custom class (e.g. GoogleXmlConstraint).
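
Something along these lines, sketched only (GoogleXmlConstraint does not exist in the bundle, the byte-limit constant name is an assumption, and this only works if the generator reads the limits via late static binding from the overriding class):

use Presta\SitemapBundle\Sitemap\XmlConstraint;

// Hypothetical subclass raising the per-file byte limit for Google.
// Only LIMIT_ITEMS is confirmed by Sitemap/XmlConstraint.php;
// the LIMIT_BYTES name and value are assumptions for illustration.
abstract class GoogleXmlConstraint extends XmlConstraint
{
    const LIMIT_ITEMS = 49999;    // unchanged: just under 50,000 urls per file
    const LIMIT_BYTES = 52428800; // 50 MB instead of the spec's 10 MB
}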

@kl3ryk can you provide more information about this bug? The number of urls and sitemaps intended to be generated, the decorators used, and any part of your listener that can help reproduce it.
Thanks.

kl3ryk commented on July 17, 2024

Here is the code I'm using to generate sitemaps: https://gist.github.com/kl3ryk/5720247. As you can see, there is an abstract class, because I have about 15 different sitemaps and I moved the common part into SitemapListener.

There is also pagination (https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L35). I added it to avoid memory-limit problems, and it worked until the number of urls grew.

So once again I made some improvements here: https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L48.

Now it works, but in a few days it will die again :).

The problem is that my data from the database and the whole urlset sit in memory at the same time. I tweaked every place where my data is used and I remove it from memory as soon as it is no longer needed.

The number of urls being generated is about 60,000, without decorators.
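
In essence the listener batches like this (a simplified sketch, not the gist itself: the Article entity, the findSlice() repository method and the getUrlContainer() call are assumptions based on the bundle's populate event, not code taken from the gist):

use Doctrine\ORM\EntityManagerInterface;
use Presta\SitemapBundle\Event\SitemapPopulateEvent;
use Presta\SitemapBundle\Sitemap\Url\UrlConcrete;

class ArticleSitemapListener
{
    private const BATCH_SIZE = 500;

    public function __construct(private EntityManagerInterface $em)
    {
    }

    public function populate(SitemapPopulateEvent $event): void
    {
        $offset = 0;
        do {
            // findSlice() is a hypothetical paginated repository query,
            // Article is a hypothetical entity
            $rows = $this->em->getRepository(Article::class)
                ->findSlice(self::BATCH_SIZE, $offset);

            foreach ($rows as $row) {
                $event->getUrlContainer()->addUrl(new UrlConcrete($row->getUrl()), 'articles');
            }

            $offset += self::BATCH_SIZE;
            // detach the fetched entities so PHP can free them
            $this->em->clear();
        } while (count($rows) === self::BATCH_SIZE);
    }
}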

kl3ryk commented on July 17, 2024

If this change is applied, this problem will no longer exist: https://github.com/prestaconcept/PrestaSitemapBundle/blob/master/Sitemap/Urlset.php#L63 :)

Koc commented on July 17, 2024

It would be nice if we could flush the sitemap:

$limit = 500;
$skip = 0;
do {
    // fetch the next slice of url data from storage
    $result = $db->retrieveDataSlice($limit, $skip);
    foreach ($result as $row) {
        $urlset->addUrl($row);
    }
    $skip += $limit;
    // dump the sitemap to file/cache, release the used memory, etc.
    $urlset->flush();
} while (count($result));

iamdey commented on July 17, 2024

@kl3ryk I tried to benchmark the dumper with a basic set of urls, and for now it seems pretty fast.
I like @Koc's idea (flush).
I will probably add this to the dumper command next Friday.

That said, I still think there is something to do in the listener:
https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L37 -> can you fetch the service outside of the loop?
Did you take a look at yield, which is meant to be faster than foreach?
etc.
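
For illustration, a yield-based provider could look roughly like this (the class, the table name and the url pattern are hypothetical; the point is that a generator hands out urls one at a time instead of building a big array first):

// Hypothetical provider: yields urls one by one so only the current row
// needs to stay in memory.
class UrlProvider
{
    private $db;

    public function __construct(\PDO $db)
    {
        $this->db = $db;
    }

    /** @return iterable<string> */
    public function urls(): iterable
    {
        $stmt = $this->db->query('SELECT slug FROM article');
        while ($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
            yield 'https://example.com/articles/' . $row['slug'];
        }
    }
}

// The loop consumes the generator lazily:
foreach ($provider->urls() as $url) {
    $urlset->addUrl($url); // whatever container the listener populates
}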

Koc commented on July 17, 2024

@esion when we dump the sitemap, is it stored in memory? Or does every addUrl write the changes to disk?

kostiklv commented on July 17, 2024

The dumper command was specifically designed to avoid storing anything in memory. It writes the sitemaps into temporary files line by line and then merges the files by reading them in small chunks.
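
The chunked merge is what keeps memory flat; roughly speaking, it works like this (a generic illustration with made-up file names, not the bundle's actual code):

// Append one temporary sitemap file to the final file in small chunks,
// so memory usage stays constant regardless of file size.
function appendFileInChunks(string $source, $targetHandle, int $chunkSize = 8192): void
{
    $sourceHandle = fopen($source, 'rb');
    while (!feof($sourceHandle)) {
        fwrite($targetHandle, fread($sourceHandle, $chunkSize));
    }
    fclose($sourceHandle);
}

$target = fopen('web/sitemap.default.xml', 'wb');
foreach (['/tmp/sitemap-part-0.xml', '/tmp/sitemap-part-1.xml'] as $part) {
    appendFileInChunks($part, $target);
}
fclose($target);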

Are you running the command with --env=prod --no-debug? If not, then I'm sure it's the Symfony loggers that eat the memory. If you are, then check your providers; maybe something is eating memory when preparing the URLs before the dumper runs?

We developed the dumper command specifically to be able to process large sitemaps (500K URLs), and it's working fine with standard memory limits.

kl3ryk commented on July 17, 2024

Nope, I was using it without those flags. Thanks, I will try it.

iamdey commented on July 17, 2024

Well, I'm closing this issue since we have no proof that the problem comes from the sitemap dumper.
@kl3ryk, feel free to re-open it if you find any clues.
