sitemap's Introduction

Sitemap

XML Sitemap and XML Sitemap Index builder.

Features

  • Create sitemap files: either regular or gzipped.
  • Create multi-language sitemap files.
  • Create sitemap index files.
  • Use a custom stylesheet.
  • Automatically create a new file when either the URL limit or the file size limit is reached.
  • Fast and memory efficient.

Installation

Installation via Composer is very simple:

composer require samdark/sitemap

After that, make sure your application autoloads Composer classes by including vendor/autoload.php.

How to use it

use samdark\sitemap\Sitemap;
use samdark\sitemap\Index;

// create sitemap
$sitemap = new Sitemap(__DIR__ . '/sitemap.xml');

// add some URLs
$sitemap->addItem('http://example.com/mylink1');
$sitemap->addItem('http://example.com/mylink2', time());
$sitemap->addItem('http://example.com/mylink3', time(), Sitemap::HOURLY);
$sitemap->addItem('http://example.com/mylink4', time(), Sitemap::DAILY, 0.3);

// set sitemap stylesheet (see example-sitemap-stylesheet.xsl)
$sitemap->setStylesheet('http://example.com/css/sitemap.xsl');

// write it
$sitemap->write();

// get URLs of sitemaps written
$sitemapFileUrls = $sitemap->getSitemapUrls('http://example.com/');

// create sitemap for static files
$staticSitemap = new Sitemap(__DIR__ . '/sitemap_static.xml');

// add some URLs
$staticSitemap->addItem('http://example.com/about');
$staticSitemap->addItem('http://example.com/tos');
$staticSitemap->addItem('http://example.com/jobs');

// set optional stylesheet (see example-sitemap-stylesheet.xsl)
$staticSitemap->setStylesheet('http://example.com/css/sitemap.xsl');

// write it
$staticSitemap->write();

// get URLs of sitemaps written
$staticSitemapUrls = $staticSitemap->getSitemapUrls('http://example.com/');

// create sitemap index file
$index = new Index(__DIR__ . '/sitemap_index.xml');

// set index stylesheet (see example in repo)
$index->setStylesheet('http://example.com/css/sitemap.xsl');

// add URLs
foreach ($sitemapFileUrls as $sitemapUrl) {
    $index->addSitemap($sitemapUrl);
}

// add more URLs
foreach ($staticSitemapUrls as $sitemapUrl) {
    $index->addSitemap($sitemapUrl);
}

// write it
$index->write();

Multi-language sitemap

use samdark\sitemap\Sitemap;

// create sitemap
// be sure to pass `true` as the second parameter to enable the XHTML namespace
$sitemap = new Sitemap(__DIR__ . '/sitemap_multi_language.xml', true);

// set the URL limit so that the file stays within the default limit of 50000 (default limit / number of languages)
$sitemap->setMaxUrls(25000);

// add some URLs
$sitemap->addItem('http://example.com/mylink1');

$sitemap->addItem([
    'ru' => 'http://example.com/ru/mylink2',
    'en' => 'http://example.com/en/mylink2',
], time());

$sitemap->addItem([
    'ru' => 'http://example.com/ru/mylink3',
    'en' => 'http://example.com/en/mylink3',
], time(), Sitemap::HOURLY);

$sitemap->addItem([
    'ru' => 'http://example.com/ru/mylink4',
    'en' => 'http://example.com/en/mylink4',
], time(), Sitemap::DAILY, 0.3);

// set stylesheet (see example-sitemap-stylesheet.xsl)
$sitemap->setStylesheet('http://example.com/css/sitemap.xsl');

// write it
$sitemap->write();

Options

The following methods configure a Sitemap instance:

  • setMaxUrls($number). Sets the maximum number of URLs written to a single file. Default is 50000, which is the limit set by the specification and used by most existing implementations.
  • setMaxBytes($number). Sets the maximum size of a single sitemap file. Default is 10MiB, which should be compatible with most current search engines.
  • setBufferSize($number). Sets the number of URLs kept in memory before they are written to the file. Default is 10. Larger values give only marginal benefits; on the other hand, when the file size limit is hit, the complete buffer must be written to the next file.
  • setUseIndent($bool). Sets whether the XML should be indented. Default is true.
  • setUseGzip($bool). Sets whether the resulting sitemap files will be gzipped. Default is false. The zlib extension must be enabled to use this feature.
  • setStylesheet($string). Sets the xml-stylesheet processing instruction. By default, it is not generated. See example-sitemap-stylesheet.xsl for an example.

The following methods configure an Index instance:

  • setUseGzip($bool). Sets whether the resulting index file will be gzipped. Default is false. The zlib extension must be enabled to use this feature.
  • setStylesheet($string). Sets the xml-stylesheet processing instruction. By default, it is not generated. See example-sitemap-stylesheet.xsl for an example.

Running tests

To run the tests, execute the following commands:

composer install
./vendor/bin/phpunit

sitemap's People

Contributors

atailouloute, bodograumann, craftyshaun, davidgoodwin, fr05t1k, jakubskrz, mougrim, paritoshbh, raulr, rdeanar, samdark, terales, theluk, wawan93, wintersilence, zinovyev

sitemap's Issues

XMLWriter::writePi()

I need to add an xml-stylesheet to the sitemap, but I can't access the $writer.
Is there a way to add a PI (processing instruction)?

Thanks!
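The Sitemap and Index classes now expose setStylesheet() for exactly this (see the Options section). For reference, here is a minimal sketch of the underlying XMLWriter call, writePi(), which emits a processing instruction before the root element. The helper name and stylesheet URL are illustrative, and the library's own implementation may differ in detail.

```php
<?php
// Hypothetical helper: build a tiny sitemap that starts with an
// xml-stylesheet processing instruction, using plain XMLWriter.
function sitemapWithStylesheet(string $xslUrl, array $urls): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    // The PI has to be written before the root element starts.
    $href = htmlspecialchars($xslUrl, ENT_XML1 | ENT_QUOTES);
    $writer->writePi('xml-stylesheet', 'type="text/xsl" href="' . $href . '"');
    $writer->startElement('urlset');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($urls as $url) {
        $writer->startElement('url');
        $writer->writeElement('loc', $url);
        $writer->endElement(); // </url>
    }
    $writer->endElement(); // </urlset>
    $writer->endDocument();
    return $writer->outputMemory();
}

echo sitemapWithStylesheet('http://example.com/css/sitemap.xsl', ['http://example.com/mylink1']);
```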

Dependency on XMLWriter

Hi,

please add the dependency on XMLWriter explicitly to your composer.json:

"ext-xmlwriter": "*",

Cyrillic error

When I add a URL in Cyrillic (in Russian), I get a URL validation error.

https://wxm/кекеё5

Why do you validate URLs? I think it should be removed. Validation shouldn't be this library's concern.

Fatal error: Uncaught InvalidArgumentException: The location must be a valid URL. You have specified: https://wxm/кекеё5. in ...vendor\samdark\sitemap\Sitemap.php:243
Stack trace:
#0 ...vendor\samdark\sitemap\Sitemap.php(297): samdark\sitemap\Sitemap->validateLocation('https://wxm..')
...vendor\samdark\sitemap\Sitemap.php(272): samdark\sitemap\Sitemap->addSingleLanguageItem('https://wxm_...', 1524153061, 'weekly', 0.3)
#2 .../index.php(13): samdark\sitemap\Sitemap->addItem('https://wxm/...', 1524153061, 'weekly', 0.3)
#3 {main}
...vendor\samdark\sitemap\Sitemap.php on line 243

Number of URLs in the sitemap

With a multi-language sitemap, the number of URLs is larger by a factor of the number of languages.

It should be:

$this->urlsCount = $this->urlsCount + count($location);
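A minimal sketch of the counting rule proposed above, assuming (as the issue and the README's setMaxUrls(25000) comment suggest) that a multi-language item is an array of language => URL and that every entry is written as its own <url> element. Both function names are hypothetical.

```php
<?php
// Assumption from this issue: each language entry of a multi-language item
// becomes its own <url> element, so it counts that many times toward the limit.
function urlEntriesForItem($location): int
{
    return is_array($location) ? count($location) : 1;
}

// How many multi-language items fit under a per-file URL limit,
// e.g. 50000 / 2 languages = 25000 (matches setMaxUrls(25000) in the README).
function maxItemsPerFile(int $languages, int $maxUrls = 50000): int
{
    return intdiv($maxUrls, max(1, $languages));
}

$urlsCount = 0;
$urlsCount += urlEntriesForItem('http://example.com/mylink1');
$urlsCount += urlEntriesForItem([
    'ru' => 'http://example.com/ru/mylink2',
    'en' => 'http://example.com/en/mylink2',
]);
echo $urlsCount, PHP_EOL; // 3
```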

Usage in Symfony 2 framework

Thanks for the well-coded library. I successfully installed it and am using it in the Symfony2 framework. I just want to share my code, which could be merged into the README and may be useful for someone else.

# app/config/services.yml
services:
    sitemap.generator:
        class: samdark\sitemap\Sitemap
        arguments: ["%kernel.root_dir%/../web/sitemap.xml"]

Use it as a service:
$sitemap = $this->getContainer()->get('sitemap.generator');

Unsupported german letters in URL

Sitemap.php line 171 throws an error if a URL contains any of ä, ü, or ö.

Unfortunately, there is no way to override this, so I would suggest extracting the validation into a separate method so that it can be overridden in a derived class.

Error when running composer require

Your requirements could not be resolved to an installable set of packages.

Problem 1
- Can only install one of: samdark/sitemap[2.0.0, dev-master].
- Can only install one of: samdark/sitemap[2.0.1, dev-master].
- Can only install one of: samdark/sitemap[2.0.2, dev-master].
- Can only install one of: samdark/sitemap[2.0.3, dev-master].
- Can only install one of: samdark/sitemap[2.0.4, dev-master].
- Can only install one of: samdark/sitemap[2.0.5, dev-master].
- Can only install one of: samdark/sitemap[2.0.6, dev-master].
- Can only install one of: samdark/sitemap[2.0.7, dev-master].
- Installation request for samdark/sitemap dev-master -> satisfiable by samdark/sitemap[dev-master].
- Installation request for samdark/sitemap ^2.0 -> satisfiable by samdark/sitemap[2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.0.7].

Installation failed, reverting ./composer.json to its original content.

Multiple writes to the same compressed XML file break it in browsers

Here is the difference between the files; there are two gzip start sequences in multi-time-write.gz:

[Screenshots: the two start sequences highlighted; Chrome v60 response; Firefox v54 response]

It works fine when decompressing with 7-Zip, but browsers can't read it. So I'm worried whether all search engines will handle it correctly, now and in the future.

Here is a sample repository to reproduce an issue:
https://github.com/terales/multi-writes-to-gzipped-file

I've prepared a workaround for my project that disables gzip for the Sitemap instances and compresses the already generated files afterwards:

<?php
    // Copy the finished, uncompressed sitemap into a temporary stream
    $tempfile = fopen('php://temp/', 'r+');

    $sitemapRead = fopen($path, 'r');
    stream_copy_to_stream($sitemapRead, $tempfile);
    fclose($sitemapRead);

    rewind($tempfile);

    // Rewrite the same path through the zlib wrapper in a single pass
    $sitemapWrite = fopen('compress.zlib://' . $path, 'w');
    stream_copy_to_stream($tempfile, $sitemapWrite);
    fclose($sitemapWrite);

    fclose($tempfile);

I can't figure out any way to fix this issue inside the library. Can you share your thoughts, so I can prepare a PR with a fix?
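A slightly shorter variant of the same workaround, assuming the finished sitemap fits in memory: read it once and write it compressed with gzencode() (requires the zlib extension). Writing the compressed file in one pass should yield a single gzip stream rather than several concatenated ones. The function name and target path are illustrative.

```php
<?php
// Compress a finished sitemap in one pass so the .gz file holds one gzip stream.
// Assumes the uncompressed file fits in memory and ext-zlib is enabled.
function gzipFinishedSitemap(string $path, string $target): void
{
    $xml = file_get_contents($path);
    if ($xml === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (file_put_contents($target, gzencode($xml, 9)) === false) {
        throw new RuntimeException("Cannot write $target");
    }
}

// Usage: compress a freshly written, uncompressed sitemap next to itself.
$source = tempnam(sys_get_temp_dir(), 'sitemap');
file_put_contents(
    $source,
    '<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
);
gzipFinishedSitemap($source, $source . '.gz');
```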

Appending doesn't work when generating the sitemap

The point is that when I have to generate a large sitemap in one go (more than 1 million lines), I have to generate it recursively and pass it to your script in portions of 25 thousand lines at a time, so that the script doesn't have to wait long for a response from the server. But with each portion it simply overwrites the file instead of adding to it. Could it be improved so that the script appends entries rather than rewriting them from scratch?

A special char (&#13;) is inserted in the sitemap.xml output

Hello,

I installed the package with composer (composer require samdark/sitemap) so I'm running version 2.2.0. My environment is Windows 7, PHP 7.2.13.

Here's the code I tested:

use samdark\sitemap\Sitemap;

$sitemap = new Sitemap(__DIR__ . '/sitemap.xml');
$sitemap->addItem('http://example.com/mylink1');
$sitemap->write();

And this is the output I get:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">&#13;
 <url>
  <loc>http://example.com/mylink1</loc>
 </url>
</urlset>

I think the &#13; is a bug... or did I miss something?

Sitemap minification

It would be good to have the sitemap minified into one long line without whitespace between tags. The idea is the same as JS, HTML, and CSS minification.
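The Options section lists setUseIndent($bool) for Sitemap (default true), so setUseIndent(false) is probably the closest existing switch. The sketch below uses plain XMLWriter with indentation disabled to show what such output looks like: everything after the XML declaration ends up on a single line. The helper name is hypothetical.

```php
<?php
// Build a sitemap without indentation or whitespace between tags.
function minifiedSitemap(array $urls): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(false); // no newlines or indentation between elements
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('urlset');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($urls as $url) {
        $writer->startElement('url');
        $writer->writeElement('loc', $url);
        $writer->endElement(); // </url>
    }
    $writer->endElement(); // </urlset>
    $writer->endDocument();
    return $writer->outputMemory();
}

echo minifiedSitemap(['http://example.com/mylink1', 'http://example.com/mylink2']);
```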

Special Chars in URL

I'm not sure, but this throws an exception because of special characters in the URL. Special characters seem to be very common now (I just asked myself when that changed...)

The location must be a valid URL. You have specified: https://example.com/künstliche-intelligenz

File: samdark/sitemap/Sitemap.php
Line: 243

(The original domain was: https://heartbeat.gmbh, which is a valid domain)
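URLs like this are rejected by PHP's FILTER_VALIDATE_URL, which only accepts ASCII, and the error message suggests the library validates in a similar way. One possible workaround, sketched here as an assumption rather than the library's behavior, is to percent-encode the non-ASCII bytes of the path before calling addItem(). Non-ASCII host names are a different matter and would need IDN conversion (for example idn_to_ascii() from ext-intl). The function name is hypothetical.

```php
<?php
// Percent-encode every non-ASCII byte so FILTER_VALIDATE_URL accepts the URL.
// Works on raw UTF-8 bytes (no /u modifier), so "ü" (0xC3 0xBC) becomes "%C3%BC".
// Only suitable for paths and query strings, not for non-ASCII host names.
function encodeNonAsciiUrl(string $url): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        static function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

$url = encodeNonAsciiUrl('https://example.com/künstliche-intelligenz');
echo $url, PHP_EOL; // https://example.com/k%C3%BCnstliche-intelligenz
var_dump(filter_var($url, FILTER_VALIDATE_URL) !== false); // bool(true)
```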

Why protected?

Why is the validateLocation function protected and not private?

Use of sitemap

Hi, how can I append new items to 'sitemap.xml' after it has been written?

Sorry, my English is not so good.
Thank you.

Auto detection last modified date

It would be nice if the last modified date of each sitemap were determined automatically. For the sitemap index, I mean...
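A minimal sketch, assuming the sitemap files live on local disk: read each file's modification time with filemtime(). The resulting timestamps could then be passed to the index, assuming Index::addSitemap() accepts an optional last-modified timestamp as its second argument (check the version you use). The helper name and URL layout are illustrative.

```php
<?php
// Hypothetical helper: map each sitemap file's public URL to its last
// modification time (Unix timestamp), taken from the filesystem.
function sitemapLastModified(array $files, string $baseUrl): array
{
    $result = [];
    foreach ($files as $file) {
        if (!is_file($file)) {
            continue; // skip missing files instead of triggering warnings
        }
        clearstatcache(true, $file); // stat results are cached within a request
        $result[rtrim($baseUrl, '/') . '/' . basename($file)] = filemtime($file);
    }
    return $result;
}

// Usage idea (the second addSitemap() argument is an assumption):
// $files = [__DIR__ . '/sitemap.xml', __DIR__ . '/sitemap_static.xml'];
// foreach (sitemapLastModified($files, 'http://example.com/') as $url => $mtime) {
//     $index->addSitemap($url, $mtime);
// }
```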

Regenerating the sitemap file

Currently I see that the script appends to the sitemap file if it already exists, right?
So I understand that I need to delete the sitemap files before regenerating, right?
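Deleting the previous output explicitly before regenerating makes the result independent of whatever is left on disk. A minimal sketch, assuming all generated files share a name prefix such as "sitemap" (the pattern and function name are illustrative; adjust them so they cannot match unrelated files). It checks is_file() and unlink()'s return value instead of suppressing errors with @.

```php
<?php
// Delete previously generated sitemap files in $dir that match $pattern
// (e.g. sitemap.xml, sitemap_2.xml, sitemap.xml.gz). Returns how many were removed.
function removeOldSitemaps(string $dir, string $pattern = 'sitemap*.xml*'): int
{
    $removed = 0;
    $files = glob(rtrim($dir, '/') . '/' . $pattern);
    foreach ($files ?: [] as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```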

Sometimes a sitemap contains more than $maxUrls URLs

The problem lies in the flush() function.

When a sitemap is truncated by size here:

if ($this->byteCount + $dataSize + $footSize > $this->maxBytes) {

the finishFile() function is called, which resets the urlsCount variable to zero, but right after that the remaining chunk (which contains up to $bufferSize URLs) is appended here:

$this->writerBackend->append($data);

The URLs in that chunk aren't counted anywhere, so the next sitemap overflows.
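A simplified, hypothetical model of the fix described above (not the library's code): treat each flushed buffer as a chunk, and when a chunk forces a new file, count that chunk toward the new file instead of losing it when the counters are reset. The URL limit is checked per chunk here; the real writer may check it per item.

```php
<?php
// Hypothetical model of a buffered writer. Each chunk is one flushed buffer
// of URLs; the result is a list of files, each a list of URLs.
function distributeChunks(array $chunks, int $maxUrls, int $maxBytes): array
{
    $files = [[]];
    $urlsCount = 0;
    $byteCount = 0;
    foreach ($chunks as $chunk) {
        $chunkBytes = strlen(implode('', $chunk));
        $wouldOverflow = $byteCount + $chunkBytes > $maxBytes
            || $urlsCount + count($chunk) > $maxUrls;
        // Roll over only if the current file already has content, so an
        // oversized single chunk is still written instead of looping on empty files.
        if ($wouldOverflow && $urlsCount > 0) {
            $files[] = [];
            $urlsCount = 0;
            $byteCount = 0;
        }
        $last = count($files) - 1;
        $files[$last] = array_merge($files[$last], $chunk);
        // The fix: count the chunk after a possible rollover, so the file it
        // actually lands in sees these URLs and bytes.
        $urlsCount += count($chunk);
        $byteCount += $chunkBytes;
    }
    return $files;
}
```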

Couldn't get it to run

Fatal error: Class 'samdark\sitemap\Sitemap' not found
I read similar reports in your issues, but I still don't understand why the error appears: the namespaces are correct and PhpStorm shows no errors. I used the example from your README.

Add support for filesystem adapters

This is based on discussion in #24 where I want to be able to write the sitemaps and index to Amazon S3.

I think a very simple interface like this may work:

interface FilesystemAdapterInterface
{
    /**
     * Create a file or update if exists.
     *
     * @param string $path     The path to the file.
     * @param string $contents The file contents.
     *
     * @return bool True on success, false on failure.
     */
    public function put($path, $contents);
}

And then a FilesystemAdapter could be passed into the constructor and used when writing files.
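A minimal sketch of how the proposed interface could look together with a local-disk implementation; an S3 adapter would implement the same put() method. Both the interface and LocalFilesystemAdapter are hypothetical here, not part of the library.

```php
<?php
// The interface proposed in this issue (hypothetical, not part of the library).
interface FilesystemAdapterInterface
{
    /**
     * Create a file or update it if it exists.
     *
     * @return bool True on success, false on failure.
     */
    public function put($path, $contents);
}

// Local-disk implementation; paths are resolved relative to $root.
final class LocalFilesystemAdapter implements FilesystemAdapterInterface
{
    private $root;

    public function __construct(string $root)
    {
        $this->root = rtrim($root, '/');
    }

    public function put($path, $contents)
    {
        // file_put_contents() creates the file or overwrites an existing one.
        return file_put_contents($this->root . '/' . ltrim($path, '/'), $contents) !== false;
    }
}
```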

Covering Units Tests for XML Stylesheets from #63

I'd love to see some unit tests validating behaviour but that's up to you. Could be merged w/o it.

I was planning to add these covering tests as part of v3 or as an additional issue post-merge.
I'll create an issue for the example sitemap and further tests (assuming it is not going to be re-written in 3.x)

Originally posted by @craftyshaun in #63 (comment)

Class not found

This is more a comment than an issue. I'm using this with Yii2.

I found I need:
use samdark\sitemap\Sitemap;
use samdark\sitemap\Index;

When I used:
use samdark\sitemap

classes Sitemap and Index were not found. Might be a stupid error on my part, but in case others are having the same issue, I thought I'd mention it.

What method should I use?

I was reading your code, and if I understand it correctly, there is no method that crawls the site (like a spider) to build the whole sitemap; you have to add URLs one by one?

Ability to update a sitemap

It would be useful to be able to update a sitemap. The use case is URLs being added dynamically. Currently, I can't easily append just one URL to the sitemap.
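Until something like this exists, one workaround is to edit the generated file directly. A minimal sketch using PHP's DOM extension, assuming an uncompressed sitemap small enough to load into memory; the function name is hypothetical, and the file-size and URL limits are not checked.

```php
<?php
// Append one <url><loc> entry to an existing sitemap document (as a string).
function appendUrlToSitemap(string $xml, string $location): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // lets formatOutput re-indent the result
    $doc->formatOutput = true;
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('The sitemap is not well-formed XML.');
    }
    $root = $doc->documentElement;
    $ns = $root->namespaceURI; // the sitemap namespace declared on <urlset>

    $url = $doc->createElementNS($ns, 'url');
    $root->appendChild($url);
    $loc = $doc->createElementNS($ns, 'loc');
    $url->appendChild($loc);
    // A text node escapes characters such as "&" automatically.
    $loc->appendChild($doc->createTextNode($location));

    return $doc->saveXML();
}

// Usage idea:
// $path = __DIR__ . '/sitemap.xml';
// file_put_contents($path, appendUrlToSitemap(file_get_contents($path), 'http://example.com/new-page'));
```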

How do I run it?

Hi,

I downloaded sitemap and updated it with the composer update command.

I'm an amateur and need your help running this script.

For example, I created index.php with this code:

addItem(new SitemapItem(
    'http://rmcreative.ru/', // URL
    time(), // last modification timestamp
    Item::DAILY, // update frequency
    0.7 // priority
));

// add more pages
foreach ($pages as $page){
    $sitemap->addItem(new SitemapItem(
        'http://rmcreative.ru/' . $page->url,
        $page->updatedOn,
        Item::MONTHLY
    ));
}

// generate sitemap.xml
$sitemap->writeToFile('sitemap.xml');

// or get it as string
$sitemapString = $sitemap->render();

After running it, I get this error:

Fatal error: Class 'samdark\sitemap' not found in C:\xampp\htdocs\samdark\index.php on line 9

How do I run it?

[BUG] @ should not be used

Hello,

Thank you for all this, you did a good job :)

In the future, it would be nice if you could remove the "@" from unlink (Sitemap.php, inside createNewFile()) to make your code compliant with strict development environments, for example:

if (is_file($filePath)) {
    unlink($filePath);
}

Best regards,

V

Support of sitemaps index file

Google supports multiple sitemap files and allows submitting them as one.
This requires a sitemap.index.xml file that lists the sitemap files.
Example:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

https://support.google.com/webmasters/answer/75712?hl=en
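The library's Index class now covers this use case. For reference, here is a minimal sketch of producing the same structure with plain XMLWriter, assuming a hypothetical array that maps each sitemap URL to a Unix timestamp (or null to omit lastmod). Dates are written with gmdate(DATE_W3C), which yields the full "+00:00" form used in the first entry of the example.

```php
<?php
// Build a sitemap index from [sitemap URL => Unix timestamp|null].
function buildSitemapIndex(array $sitemaps): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('sitemapindex');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($sitemaps as $location => $lastModified) {
        $writer->startElement('sitemap');
        $writer->writeElement('loc', $location);
        if ($lastModified !== null) {
            // W3C datetime in UTC, e.g. 2004-10-01T18:23:17+00:00
            $writer->writeElement('lastmod', gmdate(DATE_W3C, $lastModified));
        }
        $writer->endElement(); // </sitemap>
    }
    $writer->endElement(); // </sitemapindex>
    $writer->endDocument();
    return $writer->outputMemory();
}

echo buildSitemapIndex([
    'http://www.example.com/sitemap1.xml.gz' => gmmktime(18, 23, 17, 10, 1, 2004),
    'http://www.example.com/sitemap2.xml.gz' => null,
]);
```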

Lastmod is the current date/time

This makes Google think it's blackhat/spam if the lastmod date of every single post is the same date/time at which the site was compiled. I can't figure out how to avoid that other than simply removing the ", time()" argument from the generator calls.
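Passing time() stamps every entry with the generation time. A minimal sketch of the alternative: use each post's own last-update time, or leave lastmod out when it is unknown (the README shows addItem() being called with the URL alone). The $posts structure, its "updated_at" field, and the helper name are hypothetical.

```php
<?php
// Hypothetical post records; 'updated_at' is stored in UTC.
$posts = [
    ['url' => 'http://example.com/post-1', 'updated_at' => '2019-03-01 10:00:00'],
    ['url' => 'http://example.com/post-2', 'updated_at' => null], // unknown date
];

// Turn each post into [url, timestamp|null]; null means "omit lastmod".
function sitemapEntries(array $posts): array
{
    $entries = [];
    foreach ($posts as $post) {
        $timestamp = null;
        if ($post['updated_at'] !== null) {
            $date = new DateTimeImmutable($post['updated_at'], new DateTimeZone('UTC'));
            $timestamp = $date->getTimestamp();
        }
        $entries[] = [$post['url'], $timestamp];
    }
    return $entries;
}

// With the library, each entry could then be added as:
// foreach (sitemapEntries($posts) as [$url, $timestamp]) {
//     $timestamp === null ? $sitemap->addItem($url) : $sitemap->addItem($url, $timestamp);
// }
```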
