
apostrophe-site-map's Introduction

apostrophe-site-map

This module generates XML and plaintext sitemaps for sites powered by the Apostrophe CMS.

It serves two purposes: white-hat SEO and content strategy.

SEO with sitemaps

A frequently updated and accurate XML sitemap allows search engines to index your content more quickly and spot new pages immediately. But an out-of-date sitemap is worse than nothing and will damage your site's SEO.

This module generates a sitemap that includes all of the pages on your site that are visible to the public, including "pieces" such as events and blog posts. It does so dynamically, with a short cache lifetime, so your sitemap does not go out of date.

How to use it

  • Install the module.

npm install --save apostrophe-site-map

  • Configure it in app.js, as one of your modules.
{
  // You should configure `baseUrl` to ensure full URLs in your sitemap
  baseUrl: 'http://example.com',
  modules: {
    'apostrophe-site-map': {
      // array of doc types you do NOT want
      // to include, even though they are
      // accessible on the site. You can also
      // do this at the command line.
      excludeTypes: []
    }
  }
}

Alternative configuration

If you would rather not set or override the site-wide baseUrl, or prefer to leave the site without one, you can set baseUrl in this module's own configuration instead:

{
  // No baseUrl here
  modules: {
    'apostrophe-site-map': {
      baseUrl: 'http://example.com',
      excludeTypes: []
    }
  }
}

  • Launch your site as you normally would. In development that might simply be:

node app

  • Visit http://localhost:3000/sitemap.xml (in production, of course, the hostname is different).

AN IMPORTANT WARNING: if you ALREADY have a STATIC public/sitemap.xml file, THAT FILE WILL BE SENT INSTEAD. Remove it. Also, SITEMAPS ARE CACHED for one hour by default, so you won't see changes instantly. Read on for how to change the cache lifetime, and what you can realistically expect from Google.

Clearing the cache, and changing the cache lifetime

To better support multiple-server environments, this module now serves sitemaps directly and caches them in your database. That way there is no need to worry about whether a static file exists in a given environment, or to run the same task on every server.

By default sitemaps are cached for 1 hour. You can change this by specifying the cacheLifetime option to this module, in seconds. However, don't get too excited: Google usually does not check a sitemap more often than a few times a month.
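For example, to keep cached sitemaps for six hours instead of one, you might pass cacheLifetime in seconds. This is a minimal sketch of the module configuration only; the rest of app.js is omitted, and six hours is just an illustrative value.

```javascript
// Fragment of the `modules` section of app.js (illustrative values).
// cacheLifetime is expressed in seconds, so compute it explicitly
// rather than typing a bare number.
const SIX_HOURS_IN_SECONDS = 6 * 60 * 60;

const modules = {
  'apostrophe-site-map': {
    baseUrl: 'http://example.com',
    cacheLifetime: SIX_HOURS_IN_SECONDS
  }
};
```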

You can clear the cache at any time with this command line task:

node app apostrophe-site-map:clear

This will force a new sitemap to be generated on the next request.

Generating the sitemap ahead of time

You can use this command line task to update the sitemap in Apostrophe's cache at any time, rather than waiting for the cached copy to expire (after one hour by default) and be regenerated on the next request:

node app apostrophe-site-map:map --update-cache

If your site has many pages and pieces, generating the sitemap dynamically may take a long time, and with enough content, search engines may hang up before your sitemap is generated. Scheduling the above task via a cron job to run at least twice an hour (more often than the default one-hour cache lifetime) keeps the cached sitemap fresh, so a search engine is never forced to wait while it is generated.
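If you would rather drive the refresh from a long-running Node process than from cron, a sketch like the following runs the same task on an interval using Node's child_process.execFile. The 20-minute interval, the projectDir argument and the helper names are assumptions for illustration; a system cron entry remains the simpler option.

```javascript
const { execFile } = require('child_process');

// Three refreshes an hour, comfortably inside the default
// one-hour cache lifetime.
const REFRESH_INTERVAL_MS = 20 * 60 * 1000;

// Arguments for `node app apostrophe-site-map:map --update-cache`.
function updateCacheArgs() {
  return ['app', 'apostrophe-site-map:map', '--update-cache'];
}

// Hypothetical helper: runs the task once in projectDir, skipping the
// run if the previous one has not finished yet.
let refreshInProgress = false;
function refreshSitemapCache(projectDir, done) {
  if (refreshInProgress) {
    if (done) done(null);
    return;
  }
  refreshInProgress = true;
  execFile(process.execPath, updateCacheArgs(), { cwd: projectDir }, (err, stdout, stderr) => {
    refreshInProgress = false;
    if (err) {
      console.error('Sitemap refresh failed:', stderr || err.message);
    }
    if (done) done(err || null);
  });
}

// Hypothetical helper: refreshes now, then on an interval. Returns the
// timer so the caller can clearInterval() it on shutdown.
function scheduleSitemapRefresh(projectDir, intervalMs = REFRESH_INTERVAL_MS) {
  refreshSitemapCache(projectDir);
  return setInterval(() => refreshSitemapCache(projectDir), intervalMs);
}
```

The overlap guard matters on large sites: if one generation takes longer than the interval, the next tick is skipped instead of starting a second concurrent run.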

Generating sitemaps as static files

If you wish, you can generate a sitemap as a static file.

Just run this task:

node app apostrophe-site-map:map

When --update-cache is not given, this task generates an XML sitemap and displays it on the console. This is mostly useful for content strategy purposes. If your goal is to serve the sitemap to search engines, see above for a better way.

How to tell Google about your sitemap

Create a public/robots.txt file if you do not already have one and add a Sitemap line. Here is a valid example for a site that doesn't have any other robots.txt rules:

Sitemap: http://EXAMPLE.com/sitemap.xml

You can also have other robots.txt directives if you wish.

On Google's next crawl of your site it should pick up on the presence of the sitemap.
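If you prefer to build robots.txt from configuration rather than maintain it by hand (so that the Sitemap line follows your baseUrl, for instance), a small helper can produce the file body. buildRobotsTxt and its options are hypothetical, not part of this module; how you write or serve the result is up to you.

```javascript
// Hypothetical helper: builds a robots.txt body whose Sitemap line
// points at baseUrl + sitemapPath. Other directives are kept as-is.
function buildRobotsTxt(baseUrl, { sitemapPath = '/sitemap.xml', directives = [] } = {}) {
  // Strip trailing slashes so we never emit "//sitemap.xml".
  const stem = baseUrl.replace(/\/+$/, '');
  const lines = [...directives];
  if (lines.length) {
    lines.push(''); // blank line between rule groups and the Sitemap line
  }
  lines.push(`Sitemap: ${stem}${sitemapPath}`);
  return lines.join('\n') + '\n';
}

// Usage: a file with one rule group plus the Sitemap line.
console.log(buildRobotsTxt('http://example.com', {
  directives: ['User-agent: *', 'Disallow: /admin']
}));
```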

Changing the priority of pages and pieces

By default, an XML sitemap will assign a priority to a page based on its depth. The home page has a priority of 1.0 (the highest), a subpage of the home page 0.9, and so on.

Pieces receive a priority of 0.7. However, if a piece has a startDate property (i.e. it is an event), it is bumped up to 0.8 when that date is in the future and down to 0.6 when it is in the past.
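To make the defaults concrete, here is a sketch of the rules in this paragraph and the previous one as a standalone function. It is an illustration, not the module's source: the function name, the 0.1 floor for very deep pages, treating a numeric level as the mark of a page, and counting a startDate equal to "now" as past are all assumptions.

```javascript
// Sketch of the default priority rules (not the module's actual code).
function defaultPriority(doc, now = new Date()) {
  // An explicit siteMapPriority (set in "page settings" or the piece
  // editor) wins over any computed value.
  if (typeof doc.siteMapPriority === 'number') {
    return doc.siteMapPriority;
  }
  // Pages in the tree carry a numeric level (0 for the home page).
  // (10 - level) / 10 avoids floating point drift: 1, 0.9, 0.8, ...
  // The floor of 0.1 is an assumption.
  if (typeof doc.level === 'number') {
    return Math.max(1, 10 - doc.level) / 10;
  }
  // Pieces: 0.7, adjusted for events that have a startDate.
  if (doc.startDate) {
    return new Date(doc.startDate) > now ? 0.8 : 0.6;
  }
  return 0.7;
}
```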

You can also set the priority yourself. Once you install this module you will discover that there is a new "sitemap priority" field in "page settings," and when editing a piece via the edit dialog box. You can set this field to any number between 0.0 and 1.0, with 1.0 being the highest.

As of this writing, Google suggests that they may use the priority to rank the importance of pages relatively within your site. Please do not set all the priorities to 1.0. It will only hurt your chances of communicating which pages are most important to Google.

Content strategy

You can also use this module just to generate a map of your site for your own study:

node app apostrophe-site-map:map --format=text --indent

The result is a very informative depth-first list of pages. Note the use of leading spaces to indicate depth:

/
  /about
    /about/people
    /about/ducklings
  /products
    /products/cheesemaker

You'll want to pipe that to a text file and consider printing it.

The displayed "depth" of pieces won't always correspond directly to the pieces-pages that display them. You might want to exclude them when generating content strategy maps.

Warning: watch out for your custom stuff!

This module does the best it can.

It'll list your published pages, and your published pieces. And it'll rank future events higher than past events.

But it doesn't know anything about the custom URLs, independent of Apostrophe's usual mechanisms, that you're generating in your own creative and amazing modules.

If that's a concern for you, create lib/modules/apostrophe-site-map/index.js in your project to subclass the module, and override the custom method to output information about additional URLs. Note: if you have multiple locales via apostrophe-workflow, this method is called once per locale. If your method is written to accept three arguments, it receives req, locale and callback.

It's straightforward: all you have to do is pass Apostrophe page objects, or anything else with an _url property and a siteMapPriority property, to self.output.

For regular pages in the page tree, level starts at 0 (the home page) and increases by one for each level of nesting. The higher the level, the lower the priority in the XML sitemap. Keep that in mind if you output your own "pages" with a level property, or pass the siteMapPriority property explicitly.

This feature is not for changing priorities of existing pages and pieces. It is for your custom routes and dispatch URLs that the module cannot discover on its own. To set the priority of an ordinary page or piece, use the field in the "page settings" dialog box or in the piece's edit dialog box.

Here's a simple example. Note the use of self.host to get the "stem" of the URL (http://mysite.com).

// lib/modules/apostrophe-site-map/index.js, at project level, not in node_modules
module.exports = {
  construct: function(self, options) {
    self.custom = function(req, locale, callback) {
      // Discover something via the database, then...
      self.output({
        _url: self.host + '/myspecialplace',
        // Defaults to 0.5 if not set and a `level` property
        // cannot be used to infer it
        siteMapPriority: 0.9
      });
      return callback(null);
    };
  }
};

Note that req only has the same privileges as an anonymous site visitor. If you call find methods with it, you will only see what typical site visitors see. This is good, because you don't want Google to index restricted pages.

How to exclude stuff

"I don't want thousands of blog posts in my sitemaps." OK, so do this in app.js when configuring the module:

  {
    'apostrophe-site-map': {
      excludeTypes: [ 'apostrophe-blog-post' ]
    }
  }

You may specify multiple doc types to exclude. You may also exclude page types the same way by adding their doc type to the array, e.g., styleguide.

You can also do this at the command line, which is helpful when generating a map just for content strategy purposes:

node app apostrophe-site-map:map --format=text --indent --exclude-types=apostrophe-blog

Alternatively, you can set the sitemap option to false when configuring any module that extends apostrophe-custom-pages or apostrophe-pieces.

You can also explicitly set it to true if you wish to have sitemaps for a piece type that is normally excluded, like apostrophe-users. Of course this will only help if they have a _url property when fetched, usually via a corresponding module that extends apostrophe-pieces-pages.

Removing the siteMapPriority field globally

You may wish to not include the siteMapPriority field on any pieces or pages. To do this, add a noPriority option set to true when configuring apostrophe-site-map in your app.js:

  {
    'apostrophe-site-map': { noPriority: true }
  }

Integration with the apostrophe-workflow module

If you are using the apostrophe-workflow module, the sitemap module will automatically fetch content for the live versions of all configured locales.

By default, the result will be emitted as a single sitemap. According to Google, this is OK, although you must claim all of the sites under a single identity in the Google webmaster console. However, if you would prefer a separate sitemap file for each hostname found in the absolute URLs, you can set the perLocale option to true when configuring the module.

Or, if you're generating static sitemaps at the command line, you can pass the --per-locale option.

When you set the perLocale option, sitemaps are served by the module from /sitemaps/fr.xml, /sitemaps/en.xml, etc., and a sitemap index is served from /sitemaps/index.xml. Make sure you list /sitemaps/index.xml for your Sitemap directive in robots.txt.
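A minimal sketch of the perLocale setup, assuming the site's public host is https://example.com; the per-locale file names under /sitemaps/ come from your apostrophe-workflow locales.

```javascript
// Fragment of the `modules` section of app.js when using apostrophe-workflow.
const modules = {
  'apostrophe-site-map': {
    baseUrl: 'https://example.com',
    // Serve /sitemaps/<locale>.xml plus an index at /sitemaps/index.xml
    perLocale: true
  }
};

// The matching robots.txt line points at the index, not at /sitemap.xml.
const robotsSitemapLine = `Sitemap: ${modules['apostrophe-site-map'].baseUrl}/sitemaps/index.xml`;
```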

If you generate static files instead with the apostrophe-site-map:map task, a physical public/sitemap folder is created. IF YOU CHANGE YOUR MIND AND WISH TO LET THE MODULE SERVE SITEMAPS FOR YOU, REMOVE THIS FOLDER. Otherwise the static files will always "win."

If the perLocale option is set to true for the module or the --per-locale command line parameter is passed, the --file command line parameter is ignored unless --format=text is also present. This allows you to still use the module for content strategy.

Performance

If you have thousands of pieces, building the sitemap may take a long time. By default, this module processes 100 pieces at a time, to avoid using too much memory. You can adjust this by setting the piecesPerBatch option to a larger number. However, be aware that if you have many fields and joins, it is possible to use a great deal of memory this way.

modules: {
  'apostrophe-site-map': {
    piecesPerBatch: 500
  }
}
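To illustrate why piecesPerBatch bounds memory use, here is a generic sketch of batch-by-batch processing. fetchBatch is a hypothetical stand-in for a paginated database query, not this module's internal API; only one batch of items is held in memory at a time, so a larger batch size means fewer queries but more memory per batch.

```javascript
// Hypothetical sketch: fetch and handle items perBatch at a time.
// fetchBatch(skip, limit) must resolve to an array (empty when done).
async function forEachBatch(fetchBatch, perBatch, handle) {
  let skip = 0;
  for (;;) {
    const batch = await fetchBatch(skip, perBatch);
    if (!batch.length) {
      return;
    }
    for (const item of batch) {
      await handle(item);
    }
    if (batch.length < perBatch) {
      return; // a short batch means nothing is left to fetch
    }
    skip += batch.length;
  }
}
```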

Rewriting URLs

Normally the URLs output by this module are just what you'll want. However, if Apostrophe is acting as a headless backend, the URLs generated in the sitemap will point to that backend site and not necessarily to the right public URL. To customize the URLs, override the rewriteUrl method at project level, like this:

// in your lib/modules/apostrophe-site-map/index.js file at project level
// (do not alter it in node_modules)
module.exports = {
  construct(self, options) {
    self.rewriteUrl = url => {
      return url.replace('https://onesite.com', 'https://anothersite.com');
    };
  }
};

Getting the page tree programmatically

This module's primary purpose is creating a sitemap for Google and other search engines, but it is also useful in creating a sitemap for end users.

To build a sitemap page, you can use this module's self.getPageTree method. It returns the nested pages and pieces pages in the right order. For each page you can access the _children array recursively to render the page links at the right level.
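Here is a sketch of such a recursive renderer, written as a plain function so it can be called from a template helper or a route. It assumes each node has title, _url and an optional _children array, and that the tree is either one root page or an array of pages; check the actual shape returned in your version. renderTree and escapeHtml are hypothetical names.

```javascript
// Minimal HTML escaping for text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: renders pages and their _children recursively
// as nested <ul> lists, preserving the order of the tree.
function renderTree(nodes) {
  const list = Array.isArray(nodes) ? nodes : [nodes];
  if (!list.length) {
    return '';
  }
  const items = list.map(node => {
    const link = `<a href="${escapeHtml(node._url)}">${escapeHtml(node.title)}</a>`;
    const children = node._children && node._children.length ? renderTree(node._children) : '';
    return `<li>${link}${children}</li>`;
  });
  return `<ul>${items.join('')}</ul>`;
}
```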

This method has a large performance impact each time it is called on a site with a large page tree, or many pieces reachable via pieces-pages. Strongly consider caching the response for a period of time.
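One way to follow that advice is a small in-process cache with a time-to-live. This sketch wraps any promise-returning loader; how you call getPageTree inside the loader depends on its signature in your version, so the loader is a hypothetical stand-in. An in-memory cache is per process, so in a multi-server deployment each server builds its own copy.

```javascript
// Hypothetical helper: caches the result of an async loader for ttlMs.
// Concurrent callers during a refresh share one in-flight promise,
// and a failed load is not cached.
function cacheFor(ttlMs, loader, now = () => Date.now()) {
  let value;
  let expiresAt = 0;
  let pending = null;
  return function get() {
    if (now() < expiresAt) {
      return Promise.resolve(value);
    }
    if (!pending) {
      pending = Promise.resolve()
        .then(loader)
        .then(result => {
          value = result;
          expiresAt = now() + ttlMs;
          return result;
        })
        .finally(() => {
          pending = null;
        });
    }
    return pending;
  };
}
```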

It is possible to exclude some pages or pieces types only for the page tree, without impacting the normal sitemap.xml generation. The excludeTypes option will exclude types from the sitemap file and from the getPageTree method. The excludeTypesFromPageTree option will exclude types only from the getPageTree method.

  {
    'apostrophe-site-map': {
      excludeTypesFromPageTree: [ 'article' ]
    }
  }

apostrophe-site-map's People

Contributors

abea, bgantick, boutell, chcap, houmark, jsumnersmith, krissimon, nagy-norbie, plantainrain, timashev, valjed


apostrophe-site-map's Issues

Sitemap generated as plaintext instead of xml when using it with apostrophe-workflow

PLEASE NOTE: make sure the bug exists in the latest patch level of the project. For instance, if you are running a 2.x version of Apostrophe, you should use the latest in that major version to confirm the bug.

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Go to http://localhost:3000/sitemap.xml
  2. See the sitemap as a plaintext

Expected behavior

The sitemap should be generated as XML.

Describe the bug

When using this module with apostrophe-workflow, the sitemap is generated as plain text instead of as an XML file.

Screenshots

Here you have a sample project
apostrophe-boilerplate.zip


arrangeFields options ignored, always appears in its own `Info` group

Steps to reproduce:

  1. Install
  2. Create an apostrophe-pieces module
  3. Specify arrangeFields option and place the siteMapPriority field into some group
  4. Run project, open that pieces content type

Expected behavior

siteMapPriority field appears in group specified

Actual behavior

The siteMapPriority field is always added to the info group.

Sample apostrophe-pieces module configuration

// in this example, the field is added to the 'info' group
module.exports = {
  name: 'something',
  extend: 'apostrophe-pieces',
  arrangeFields: [{
    name: 'sitemap',
    label: 'Site Map XML',
    fields: ['siteMapPriority']
  }]
};

// in this example, the field is added to a duplicate 'info' group
module.exports = {
  name: 'something',
  extend: 'apostrophe-pieces',
  arrangeFields: [{
    name: 'info',
    label: 'Info',
    fields: ['published', 'slug', 'tags', 'siteMapPriority']
  }]
};

Other Notes

I've tried to fix this by modifying the way the fields are registered by this module, but with no effect.

Any thoughts?

Sitemap generation error on browser

To Reproduce

  1. Visit the site: https://mysite.com/sitemap.xml
  2. This error is shown in the browser:

XML Parsing Error: no root element found
Location: https://mysite.com/sitemap.xml
Line Number 1, Column 1:

  3. Visit the same URL again after waiting a few minutes.
  4. This error is shown in the browser:

This page isn’t working
www.mysite.com took too long to respond.
HTTP ERROR 504

  5. Visit the same URL a third time, after waiting a few more minutes.
  6. Now the sitemap is output correctly in the browser window.

Expected behavior

The sitemap file is not always output correctly on the browser window and many times the browser show different errors

Describe the bug

Not sure why this keeps happening. The error cannot be reproduced consistently, but it probably has to do with the way the module generates the sitemap. All default configuration settings were used, and I'm not sure whether the defaults include a cache setting. No command line tasks or cron jobs were used to generate the XML file.

Details

Server Operating System:
Ubuntu/nginx

Version of Node.js:
8.9.3

Version of Apostrophe
2.94.1

Version of Apostrophe site map
2.5.0

Additional context:

Although the errors differ each time I load the page, they always occur after the browser has been waiting a few seconds for the XML sitemap to be generated.

The problem is intermittent: on some occasions the XML sitemap is output correctly, and on many others it is not.


Include a self-referencing <link rel="hreflang"> tag for each <url> on localized sites

First of all, this module is a real gem – it works much as you'd expect right out of the box!

I noticed something that relates to the apostrophe-workflow integration however. It's the fact that Google's guidelines state the following:

Each <url> element must have a child element <xhtml:link rel="alternate" hreflang="supported_language-code"> that lists every alternate version of the page, including itself.

However, this module doesn't include a self-referencing <xhtml:link rel="alternate"> tag for each <url> tag; it only links to the other locales' variants of the same document.

I am happy to submit a PR for this, I just wanted to raise the issue before doing so.

sitemap: false not working for excluding piece from sitemap

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Add sitemap: false to a piece's options.

Expected behavior

Piece should be removed from sitemap.xml.

Describe the bug

Piece is not removed from sitemap.xml.

Details

Version of Node.js:
12.19.0

Server Operating System:
macOS 10.15.7 - Catalina

Additional context:
Using excludeTypes option in the site's app.js did work for me but the above method did not.

Screenshots
None

Remove the Priority and Changefreq tags

I received this yesterday from an SEO consultancy working on a client's SEO project in reference to the Apos site-map.

Remove the Priority and Changefreq tags included before each URL. Google pays no mind to these and removing will help decrease the page load burden.

I've requested more information and am starting this to get it in the discussion. "Load burden" does seem a bit much for most site-maps, but if it's big enough maybe it matters.

Works with apostrophe 2.1.2?

Sorry for polluting the lists ;)

Should site-map work with apostrophe 2.1.2? I get an error:

/node_modules/moog/index.js:313
          if (key.substr(0, 2) === '__') {
                  ^

TypeError: key.substr is not a function
    at /node_modules/moog/index.js:313:19
    at Function.forEach (/node_modules/moog/node_modules/lodash/dist/lodash.js:3298:15)
    at applyOptions (/node_modules/moog/index.js:309:11)
    at /build/node_modules/moog/index.js:226:15
    at iterate (/node_modules/moog/node_modules/async/lib/async.js:146:13)
    at Immediate._onImmediate (/node_modules/moog/node_modules/async/lib/async.js:157:25)
    at tryOnImmediate (timers.js:534:15)
    at processImmediate [as _immediateCallback] (timers.js:514:5)

I am using:
node --version
v5.11.1

Pages without a top-level parent are excluded from the site-map

A client mentioned that some pages are missing from the site-map. It turns out these are pages that were children of a top-level page that's unpublished. It's unpublished since it's only needed for site organization and the nav uses an unlinked drop-down.

Due to this, those second level pages are missing:

    self.findPages = function(req) {
      return self.apos.pages.find(req, { level: 0 }).children({ depth: 20 });
    };

I poked a bit to just change this to all published pages, but it's clearly going to be more complicated than that.

Set sitemap url in robots.txt dynamically

Hello Apostrophe-team,

I have used this package in a project to produce the sitemap.xml for SEO. I have also added the xml path in robots.txt.

Now I have a requirement to set the sitemap.xml path in the robots.txt file dynamically.

I have tried many ways to achieve this but haven't succeeded.

Please let me know how I can achieve this kind of functionality.

Ignore posts with publication date in future

Hello,

We just added this package to our Apostrophe installation. Almost everything worked out of the box: it skipped unpublished blogPosts, but published blogPosts with a publicationDate and publicationTime in the future are shown in the sitemap.

Are we using this wrong, or is there a bug?

Thanks again :)

The alternative configuration method for setting the baseUrl is not setting on the link/url structure in sitemap.xml

To Reproduce

Step by step instructions to reproduce the behavior:

  1. Set the baseUrl using the alternative configuration method, as explained in the guide: https://github.com/apostrophecms/apostrophe-site-map/#alternative-configuration
  2. Load http://localhost:3000/sitemap.xml. The baseUrl does not appear in the links.
  3. Also ran node app apostrophe-site-map:clear and repeated steps 1 and 2.

Expected behavior

When the baseUrl is set as an option of 'apostrophe-site-map', it should be included in the generated sitemap.xml as part of the link/URL structure. For example: 0.9 daily http://localhost:3000/about-us

Describe the bug

When the baseUrl is set as an option of 'apostrophe-site-map' (as per https://github.com/apostrophecms/apostrophe-site-map/#alternative-configuration), it is not included in the sitemap.xml as part of the link/URL structure. For example: 0.9 daily /about-us

Details

Version of Node.js:
v8.11.4

Server Operating System:
MacOS Mojave

Additional context:

When using the main baseUrl configuration, the baseUrl is prepended to the links in the sitemap.xml, for example: 0.9 daily http://localhost:3000/about-us
