Comments (11)
Are you generating it dynamically (from a controller) or by using the dump command?
from prestasitemapbundle.
Yes, I'm using the dumper command with listeners. I think it would be fine if files were written out as soon as the constraints are exceeded. E.g.:
we have here: https://github.com/prestaconcept/PrestaSitemapBundle/blob/master/Sitemap/XmlConstraint.php
const LIMIT_ITEMS = 49999;
If this constraint is exceeded, we create the first file, default.xml, and remove the whole urlset from memory (50k objects). If we keep adding URLs to the urlset and exceed the limit again, another file is created: default_0.xml
:)
The possibility to change the limits in config.yml would also be a great feature :)
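The rollover proposed above could look roughly like the following. This is a minimal sketch, not the bundle's code: the class `ChunkedUrlsetWriter`, its methods, and its constructor arguments are hypothetical, and only the default limit of 49999 and the file names default.xml / default_0.xml are taken from the comment. It assumes PHP 8.0 or newer.

```php
<?php
// Hypothetical sketch: ChunkedUrlsetWriter is NOT part of PrestaSitemapBundle.
final class ChunkedUrlsetWriter
{
    /** @var string[] URLs buffered for the file currently being filled */
    private array $buffer = [];

    /** @var string[] paths of the files written so far */
    private array $files = [];

    public function __construct(
        private string $dir,
        private string $section = 'default',
        private int $limit = 49999 // same value as XmlConstraint::LIMIT_ITEMS
    ) {
    }

    public function addUrl(string $loc): void
    {
        $this->buffer[] = $loc;
        if (count($this->buffer) >= $this->limit) {
            $this->flush(); // limit reached: write a file and free the buffer
        }
    }

    public function flush(): void
    {
        if ($this->buffer === []) {
            return; // nothing pending, do not create an empty file
        }

        // First file is "<section>.xml", later ones "<section>_0.xml", "<section>_1.xml", ...
        $n = count($this->files);
        $name = $n === 0
            ? $this->section . '.xml'
            : sprintf('%s_%d.xml', $this->section, $n - 1);
        $path = $this->dir . '/' . $name;

        $fh = fopen($path, 'wb');
        fwrite($fh, '<?xml version="1.0" encoding="UTF-8"?>' . "\n");
        fwrite($fh, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");
        foreach ($this->buffer as $loc) {
            $escaped = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');
            fwrite($fh, '  <url><loc>' . $escaped . "</loc></url>\n");
        }
        fwrite($fh, "</urlset>\n");
        fclose($fh);

        $this->files[] = $path;
        $this->buffer = []; // release the URLs that were just written
    }

    /** @return string[] */
    public function getFiles(): array
    {
        return $this->files;
    }
}
```

A listener would call addUrl() for every URL and flush() once at the end, so that the last, partially filled file is written as well.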
from prestasitemapbundle.
I'll try to fix this ASAP ... I think in two weeks at the earliest. Feel free to suggest a PR :)
The limits defined as constants come from the specification at http://www.sitemaps.org/protocol.html
These limits can differ between search engine crawlers. For Google, you can provide 50MB per file instead of 10MB.
In conclusion, I want to keep the sitemap generation as far from Symfony as possible (so that it can be used as an external library in the future). I think these limits can be tweaked by overriding them in a custom class (e.g. GoogleXmlConstraint).
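As an illustration of the override idea, here is a minimal sketch. `XmlConstraintSketch` is a stand-in written for this example and may not match the bundle's real XmlConstraint class; its `add()` and `isFull()` methods and the `LIMIT_BYTES` constant name are assumptions. Only the values (49999 items, 10MB, 50MB for Google) come from the discussion above.

```php
<?php
// Stand-in base class for illustration only; the bundle's XmlConstraint may differ.
abstract class XmlConstraintSketch
{
    public const LIMIT_ITEMS = 49999;
    public const LIMIT_BYTES = 10000000; // 10MB (assumed constant name)

    protected int $items = 0;
    protected int $bytes = 0;

    /** Accounts for one serialized <url> fragment. */
    public function add(string $xmlFragment): void
    {
        $this->items++;
        $this->bytes += strlen($xmlFragment);
    }

    public function isFull(): bool
    {
        // static:: uses late static binding, so constants redefined
        // in a subclass are the ones that get checked here.
        return $this->items >= static::LIMIT_ITEMS
            || $this->bytes >= static::LIMIT_BYTES;
    }
}

final class DefaultXmlConstraint extends XmlConstraintSketch
{
}

final class GoogleXmlConstraint extends XmlConstraintSketch
{
    public const LIMIT_BYTES = 50000000; // 50MB, the figure quoted above for Google
}
```

The important detail is `static::` rather than `self::`: with `self::` the parent's limits would still apply even after a subclass redefines the constants.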
@kl3ryk can you provide more information about this bug? The number of URLs and sitemaps you expect to generate, the decorators used, and any part of your listener that could help to reproduce it.
Thanks.
from prestasitemapbundle.
https://gist.github.com/kl3ryk/5720247 here is the code I'm using to generate the sitemaps. As you can see, there is an abstract class. That is because I have about 15 different sitemaps, and I moved the common part to SitemapListener.
As you can see, there is also pagination (https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L35). I added it to avoid memory limit problems, and it worked until the URL count grew.
So I made some more improvements here: https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L48.
Now it is working again, but in a few days it will die again :).
The problem is that my data from the database and the whole urlset are held in memory together. I have tweaked every place where my data is used, and I release it from memory as soon as it is no longer needed.
The number of URLs being generated is about 60000, without decorators.
from prestasitemapbundle.
If this change is applied, this problem: https://github.com/prestaconcept/PrestaSitemapBundle/blob/master/Sitemap/Urlset.php#L63 will no longer exist :).
from prestasitemapbundle.
It would be nice if we could flush the sitemap.
$limit = 500;
$skip = 0;
do {
    // fetch the next page of rows from the database
    $result = $db->retrieveDataSlice($limit, $skip);
    foreach ($result as $row) {
        $urlset->addUrl($row);
    }
    $skip += $limit;
    $urlset->flush(); // dump sitemap to file/cache, release used memory etc.
} while (count($result));
from prestasitemapbundle.
@kl3ryk I tried benchmarking the dumper with a basic set of URLs, and so far it seems pretty fast.
I like @Koc's idea (flush).
I will probably add this to the dumper command next Friday.
I still think there is something to improve in the listener:
https://gist.github.com/kl3ryk/5720247#file-sitemaplistener-php-L37 -> can you fetch the service outside of the loop?
Did you take a look at yield, which is meant to be faster than foreach?
etc.
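For reference, a paginated data source can be wrapped in a generator along these lines. This is a minimal sketch: `iterateRows()` and the `$fetchPage` callable are hypothetical names, not part of the bundle or of the gist. Note that the main benefit of a generator here is lower memory use (only one page is held at a time) rather than raw speed.

```php
<?php
/**
 * Yields rows one by one while fetching them page by page.
 *
 * $fetchPage is a hypothetical callable with the shape
 * fn(int $limit, int $offset): array
 */
function iterateRows(callable $fetchPage, int $limit = 500): Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($limit, $offset);
        foreach ($rows as $row) {
            yield $row; // hand one row to the caller, keep nothing else around
        }
        $offset += $limit;
    } while (count($rows) === $limit); // a short (or empty) page means we reached the end
}
```

A listener could then loop with `foreach (iterateRows($fetchPage) as $row)` and add each URL as it arrives, without ever building the full result array.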
from prestasitemapbundle.
@esion when we dump the sitemap, is it stored in memory? Or does every addUrl write the changes to disk?
from prestasitemapbundle.
The dumper command was specifically designed to avoid storing anything in memory. It writes the sitemaps into temporary files line by line, and then merges the file by reading it in small chunks.
Are you running the command with --env=prod --no-debug? If not, then I'm sure it's the Symfony loggers that are eating the memory. If you are, then check your providers; maybe something is eating memory when preparing the URLs before they reach the dumper?
We developed the dumper command specifically to be able to process large sitemaps (500K URLs), and it's working fine with standard memory limits.
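The approach described above (write to temporary files line by line, then merge by reading small chunks) can be sketched as follows. This is an illustration only, not the bundle's dumper code: the function names `writeUrlsToTempFile()` and `mergeIntoSitemap()` and the chunk size are assumptions.

```php
<?php
// Hypothetical helpers for illustration; not the bundle's actual dumper code.

/** Writes each URL to a temporary file as soon as it is produced. */
function writeUrlsToTempFile(iterable $urls): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'sitemap');
    $fh = fopen($tmp, 'wb');
    foreach ($urls as $loc) {
        $escaped = htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        fwrite($fh, '<url><loc>' . $escaped . "</loc></url>\n");
    }
    fclose($fh);

    return $tmp;
}

/** Copies the temporary files into the final sitemap in fixed-size chunks. */
function mergeIntoSitemap(array $tmpFiles, string $target, int $chunkSize = 8192): void
{
    $out = fopen($target, 'wb');
    fwrite($out, '<?xml version="1.0" encoding="UTF-8"?>' . "\n");
    fwrite($out, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");

    foreach ($tmpFiles as $tmp) {
        $in = fopen($tmp, 'rb');
        while (!feof($in)) {
            // Only $chunkSize bytes are held in memory at any time.
            $chunk = fread($in, $chunkSize);
            if ($chunk !== false && $chunk !== '') {
                fwrite($out, $chunk);
            }
        }
        fclose($in);
        unlink($tmp); // the temporary file is no longer needed
    }

    fwrite($out, "</urlset>\n");
    fclose($out);
}
```

With this shape, memory use depends on the chunk size and on whatever the URL provider keeps alive, not on the total number of URLs.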
from prestasitemapbundle.
Nope, I was using it without those flags. Thanks, I will try it.
from prestasitemapbundle.
Well, I'm closing this issue since we have no proof that the problem comes from the sitemap dumper.
@kl3ryk, feel free to re-open it if you find any clues.
from prestasitemapbundle.