
Comments (7)

sofam commented on July 21, 2024

Just an update: I upgraded to the latest version of Cm_RedisSession. If I switch one of the web nodes to use memcached as the session backend, that node gets more requests through (12/sec instead of 2/sec). Redis is configured not to save and not to appendfsync. I am using separate Redis instances for sessions and cache.

from cm_redissession.

colinmollenhour commented on July 21, 2024

Wow, that is a lot of nodes. I am really curious, how many key requests per second does your Redis server get?

Some notes:

I don't see any reason for max_concurrency to be so high. I don't remember writing the 10% guidance, but I think 5 is generally a good max. If you have some special need (e.g. loading a huge gallery of images that are generated in real time) you could raise it further. This variable mainly prevents one user from finding a slow page on your site and hammering it until it locks up your web server. Once they surpass max_concurrency they just start getting 503 errors.
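For context, max_concurrency is set in Magento's app/etc/local.xml; a sketch along the lines of the Cm_RedisSession sample config (the values here are illustrative, not recommendations beyond max_concurrency itself):

```xml
<config>
    <global>
        <session_save>db</session_save>
        <redis_session>
            <host>127.0.0.1</host>
            <port>6379</port>
            <db>0</db>
            <!-- cap on concurrent requests per session; ~5 per the advice above -->
            <max_concurrency>5</max_concurrency>
            <!-- seconds a frontend request waits for the session lock -->
            <break_after_frontend>5</break_after_frontend>
            <break_after_adminhtml>30</break_after_adminhtml>
        </redis_session>
    </global>
</config>
```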

Since you can switch a single node to memcached, it sounds like your cluster uses sticky sessions? If so, one possibility is to run a separate Redis process for each app node. E.g. if you have 16 app nodes with 16 GB of RAM each, run Redis on localhost on each one, dedicate 1 GB of RAM to it, and you have enough for around 11 million sessions. Of course, if one node goes down you lose 1/16th of your sessions.
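A per-node session instance along those lines might look like this in redis.conf (a sketch under the 1 GB assumption above; the eviction policy is my suggestion, not from the thread):

```
# redis.conf for a localhost-only session instance on each app node
bind 127.0.0.1
port 6379
maxmemory 1gb
# Assumption: evict only keys that carry a TTL (sessions do) when full
maxmemory-policy volatile-lru
```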

Serving 3.9 million sessions with no persistence would scare the crap out of me... IMO session persistence is an absolute must, since the cost of losing all of your sessions is annoyed customers and a lot of lost sales. If you aren't using SSDs, persisting to a location on an SSD should help. There are also some code tweaks you can make to Magento to remove some data duplication, which will shrink your storage cost and thereby reduce your time to persist to disk.
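If persistence is the concern, a once-per-second AOF on an SSD-backed path is a reasonable middle ground (a redis.conf sketch; the directory path is a placeholder):

```
appendonly yes
appendfsync everysec           # fsync at most once per second: bounded loss, low latency
no-appendfsync-on-rewrite yes  # don't block on fsync while an AOF rewrite runs
dir /var/lib/redis-sessions    # point this at an SSD-backed filesystem
save ""                        # rely on the AOF rather than RDB snapshots
```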

If your Redis backend is simply too large to persist to disk (without locking up your site) then you are approaching the limits of a single Redis node. Options:

  1. Shard using your load balancer's sticky sessions feature. That is, have each node connect to a separate Redis process whether those are on separate machines or one machine.
  2. Add a clustering feature using Credis_Cluster, run, say, 16 separate Redis processes, and have them persist to disk in alternating time increments.
  3. Port the Redis session code to a database that is better at persistence, such as MongoDB.
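The sharding in options 1 and 2 can be as simple as a stable hash of the session ID; a minimal sketch in Python (the shard list and helper are hypothetical illustrations, not part of Cm_RedisSession):

```python
import zlib

# Hypothetical pool of session Redis endpoints (one per process or machine).
SHARDS = [
    ("10.0.0.1", 6379),
    ("10.0.0.2", 6379),
    ("10.0.0.3", 6379),
    ("10.0.0.4", 6379),
]

def shard_for(session_id: str) -> tuple:
    """Map a session ID to one Redis endpoint via a stable CRC32 hash.

    The same ID always lands on the same shard, so no coordination
    between app nodes is needed (the same effect sticky sessions give
    you at the load-balancer level).
    """
    index = zlib.crc32(session_id.encode("ascii")) % len(SHARDS)
    return SHARDS[index]
```

One caveat with plain modulo hashing: changing the shard count remaps nearly every session, so growing the pool effectively logs everyone out; consistent hashing avoids that if you expect to resize.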

The log messages include a lock wait count. If the lock wait count is high (e.g. the number of seconds minus 1) then there is a lock contention issue. If it is low, then the delay is due to something other than locking.
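To make "lock wait count" concrete, here is a simplified sketch of the acquire loop (an illustration of the general pattern, not Cm_RedisSession's actual implementation; an in-memory dict stands in for Redis):

```python
import time

store = {}  # stand-in for Redis: lock key -> owning process id

def acquire_lock(session_id, pid, break_after=6, poll_interval=1.0):
    """Try to take the per-session lock, counting how many polls we waited.

    Returns (acquired, wait_count). A wait count close to
    break_after / poll_interval means another process held the lock
    the whole time, i.e. lock contention.
    """
    key = "lock:" + session_id
    waits = 0
    deadline = time.monotonic() + break_after
    while time.monotonic() < deadline:
        if store.setdefault(key, pid) == pid:  # SETNX-like: first writer wins
            return True, waits
        waits += 1
        time.sleep(poll_interval)
    # Gave up waiting; a real implementation would now break the stale lock.
    return False, waits
```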


parhamr commented on July 21, 2024

I added the 10% bit to the config, based upon some rough testing. It’s guidance, but I haven’t had the chance to verify if things go wrong when that’s set too high (I only know it’s problematic for search bots when set too low).


sofam commented on July 21, 2024

Hi, first of all thanks for the quick and in-depth reply. We're actually not using sticky sessions; the reason for 16 nodes is so we could scale out horizontally instead of running fewer, larger nodes.
We're also using Varnish in front of the web nodes, with the Turpentine module for Magento handling the ESI etc.

During peak hours I can see upwards of 4,500 instantaneous ops/sec on Redis. Is there another metric I should be looking at?

I'll try adjusting max_concurrency down to 5 and start from there.

Have you seen anything similar to this behaviour before? The Redis node isn't even that loaded, really, and momentarily disabling saving doesn't affect performance at all, so saving to disk isn't the issue.

All other backend services seem to be operating normally as well, but when you see a traffic peak it really starts hurting, and the only thing I've been able to find so far to point me in any direction is the redis session handling.


parhamr commented on July 21, 2024

Redis readily scales to hundreds of thousands of operations per second.

Here’s an example cluster with 11 web nodes that has been observed at a combined throughput of 11,500 operations per second (typically with more than 6,000,000 sessions in Redis).


sofam commented on July 21, 2024

Oh, and I of course missed the last part: the lock wait count is pretty high, typically around 4-11 and upwards of 60-90 at worst. What can I do to fix the lock contention?


colinmollenhour commented on July 21, 2024

Looking at the code, it seems the timing reported differed depending on your log level, so I just pushed a fix for this. If you were below debug level, it was reporting the write time as the total time since the class was instantiated rather than just the time to write the session.

Lowering max_concurrency is definitely a good idea IMO. Most bots (including GoogleBot) don't even use cookies, so max_concurrency won't apply to them. A user with a really fast finger and a really slow page load might be able to reach 5 concurrent requests with some practice.

Given that break_after_frontend is 6 in your config, I don't see how more than ~6 seconds of the request time can be attributed to the session backend. If the lock number is 90, then I would think that means there are 15 or more processes fighting for the lock. Reducing max_concurrency will result in some 503s for this user but should reduce lock contention issues. If your "attempts" number is greater than 6, then there is a bug.
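The arithmetic behind that estimate can be sketched as follows (illustrative only; it assumes the lock changes hands roughly every break_after_frontend seconds, so a wait count of N implies about N / break_after contenders ahead of you):

```python
def estimated_contenders(lock_wait_count, break_after=6):
    """Rough lower bound on processes fighting for one session's lock.

    If the lock changes hands about every break_after seconds, a wait
    count of N (seconds waited) implies roughly N / break_after other
    processes acquired the lock before this one got a turn.
    """
    return lock_wait_count // break_after

# e.g. the worst observed wait count of 90, with break_after_frontend = 6,
# suggests around 15 contending processes.
```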

The good thing is that these locking issues are mostly harmless. Unless you have a high percentage of issues to users I would just make the changes I suggested and ignore the lock issue reports.

