Giter Club home page Giter Club logo

Comments (12)

arm4b avatar arm4b commented on June 12, 2024

Stats:

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

https://circleci.com/gh/StackStorm/st2-packages/291
On el6 we had some logs:

last 20 lines from log file of service st2api
>>>
2016-01-27 15:35:19,160 140432054114048 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
    model_object = cls._get_impl().add_or_update(model_object)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
    instance.save()
  File "/usr/share/python/st2/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
    raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.sensor.process_exit" })
2016-01-27 15:35:19,170 140432054114048 ERROR log [-] Shadow trigger creation of "core.st2.sensor.process_exit" failed with uniqueness conflict.
Traceback (most recent call last):
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
    trigger_db = create_shadow_trigger(trigger_type_db)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
    return create_or_update_trigger_db(trigger)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
    trigger_db = Trigger.add_or_update(trigger_db)
  File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
    model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.sensor.process_exit" })

When st2api failed to start. Not sure if it's real reason, probably another ghost bug. Will investigate more.

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Ok, here: https://circleci.com/gh/armab/st2-packages/442 st2auth is reported to be not listening.
It's wheezy.

Was lucky to catch that via CircleCI SSH debug.

Service st2auth reports that everything is running:

root@06c3f4732296:/var/log# service st2auth status
[ ok ] st2auth is running.

No processes really for st2auth service at all:

root@06c3f4732296:/var/log# ps -aux | grep st2auth
warning: bad ps syntax, perhaps a bogus '-'?
See http://gitorious.org/procps/procps/blobs/master/Documentation/FAQ
root       1459  0.0  0.0   6308   596 ?        S+   18:00   0:00 grep st2auth

/var/log/st2auth.log:

2016-01-27 17:45:23,660 139834305980160 INFO log [-] Connecting to database "st2" @ "172.17.0.1:27017" as user "None".
2016-01-27 17:45:23,666 139834305980160 DEBUG log [-] Ensuring database indexes...
2016-01-27 17:45:23,766 139834305980160 DEBUG log [-] Ensuring indexes for model "UserDB"...
2016-01-27 17:45:23,785 139834305980160 DEBUG log [-] Ensuring indexes for model "TokenDB"...
2016-01-27 17:45:23,787 139834305980160 DEBUG log [-] Ensuring indexes for model "ApiKeyDB"...
2016-01-27 17:45:23,790 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionDB"...
2016-01-27 17:45:23,807 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,845 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,854 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,863 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,878 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,881 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,882 139834305980160 DEBUG log [-] Ensuring indexes for model "KeyValuePairDB"...
2016-01-27 17:45:23,897 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyTypeDB"...
2016-01-27 17:45:23,904 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyDB"...
2016-01-27 17:45:23,909 139834305980160 DEBUG log [-] Ensuring indexes for model "RuleDB"...
2016-01-27 17:45:23,924 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,925 139834305980160 DEBUG log [-] Ensuring indexes for model "SensorTypeDB"...
2016-01-27 17:45:23,934 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerTypeDB"...
2016-01-27 17:45:23,953 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerDB"...
2016-01-27 17:45:23,954 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerInstanceDB"...
2016-01-27 17:45:23,956 139834305980160 DEBUG log [-] Registering exchanges...
2016-01-27 17:45:23,984 139834305980160 DEBUG connection [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL.  See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright
2016-01-27 17:45:23,986 139834305980160 DEBUG connection [-] Open OK!
2016-01-27 17:45:23,986 139834305980160 DEBUG channel [-] using channel_id: 1
2016-01-27 17:45:23,989 139834305980160 DEBUG channel [-] Channel open
2016-01-27 17:45:23,990 139834305980160 DEBUG log [-] registered exchange st2.execution.
2016-01-27 17:45:23,992 139834305980160 DEBUG log [-] registered exchange st2.liveaction.
2016-01-27 17:45:23,994 139834305980160 DEBUG log [-] registered exchange st2.trigger.
2016-01-27 17:45:23,996 139834305980160 DEBUG log [-] registered exchange st2.trigger_instances_dispatch.
2016-01-27 17:45:23,997 139834305980160 DEBUG log [-] registered exchange st2.sensor.
2016-01-27 17:45:24,006 139834305980160 DEBUG channel [-] Closed channel #1
2016-01-27 17:45:24,007 139834305980160 DEBUG log [-] Registering internal trigger: st2.generic.actiontrigger
2016-01-27 17:45:24,020 139834305980160 DEBUG log [-] verified trigger and formulated TriggerDB=TriggerTypeDB(description="Trigger encapsulating the completion of an action execution.", id=None, name="st2.generic.actiontrigger", pac
2016-01-27 17:45:24,027 139834305980160 INFO log [-] Registered trigger: st2.generic.actiontrigger.
2016-01-27 17:45:24,128 139834305980160 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
    model_object = cls._get_impl().add_or_update(model_object)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
    instance.save()
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
    raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.generic.actiontrigger" })
2016-01-27 17:45:24,131 139834305980160 ERROR log [-] Shadow trigger creation of "core.st2.generic.actiontrigger" failed with uniqueness conflict.
Traceback (most recent call last):
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
    trigger_db = create_shadow_trigger(trigger_type_db)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
    return create_or_update_trigger_db(trigger)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
    trigger_db = Trigger.add_or_update(trigger_db)
  File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
    model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1  dup key: { : "core", : "st2.generic.actiontrigger" })

Some error in logs, it happens. So st2auth died on this.
It's interesting why service st2auth is reported to be running.
And why it couldn't restart st2auth + gunicorn by startup service.


Now I know, will test locally by killing manually those pink unicorns and see if they can still revive.

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Yeah, so those startup scripts need adjustments too. If child process died, service is considered as OK (running) and no restart/revive for dead (what is wrong).

$ service st2api status
[ ok ] st2api is running.

$ ps -aux | grep st2api
st2        1691  0.6  0.0 121416 40868 ?        S    19:56   0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
st2        1715  0.3  0.0 138244 45080 ?        S    19:56   0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
root       1775  0.0  0.0  11112   712 ?        S+   19:57   0:00 grep st2api

$ kill 1691

# no processes running for service
$ ps -aux | grep st2api

# service status is OK
$ service st2api status
[ ok ] st2api is running.

Will look tomorrow.

from st2-packages.

dennybaa avatar dennybaa commented on June 12, 2024

I confirm same error on local test runs https://gist.github.com/16950b151fc4f85a85ad

StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error collection: st2.trigger_d_b index: pack_1_name_1 dup key: { : "core", : "st2.key_value_pair.value_change" })

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

FYI it's good to know also that after service restart DB error is gone, so service starts normally at the 2nd run (probably because DB error appears randomly, don't really know).

It's clear then why we had this error under the wheezy, jessie, and never had it for trusty.
Because trusty uses upstart with respawn which restarts process on hang down automatically.

from st2-packages.

lakshmi-kannan avatar lakshmi-kannan commented on June 12, 2024

StackStorm/st2#2435 Let's see if that does it.

from st2-packages.

lakshmi-kannan avatar lakshmi-kannan commented on June 12, 2024

We (@lakshmi-kannan and @enykeev) couldn't repro this in last few builds. So we shall call this done until we can repro again.

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Yesterday @dennybaa could reproduce it not on CircleCI, but locally and had the same DB error, so it's not infra problem.
The bug is still there.

We can't close it, because it fails the builds. We can theoretically revert all gunicorn changes related to st2auth and st2api just to see if problem still persist (as part of investigation).
It started to appear only after moving to gunicorn.


Answering questions from Slack. DB is clean every time you start CircleCI build, everything is fresh.

I'll implement #104 later that will help us to investigate this problem better.

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Debugging help

@dennybaa @lakshmi-kannan I've implemented #104, now you can see logs via CircleCI Artifacts:
dvmzm2q 1

That helped to further debug our problem. You can find Artifacts section in CircleCI UI just near the Debug SSH button.

The upstream st2 fix

@lakshmi-kannan @manasdk https://github.com/StackStorm/st2/pull/2435/files I can confirm that fix localized the problem but we still have root issue in there.

Because of that st2 upstream fix, process is not dying anymore. See log (it's from CircleCI artifacts):

So the process never dies (service is running), but it reloads when that DB error is catched.

Questions

It's good that we now have green builds and services are running. But, to clarify what we should do else here, a few questions:

  • Is that OK for API service to be reloaded every few cycles?
  • Are we going to leave it as is and close the issue?
  • Are we going to investigate it further?

Just FYI: it appeared only after moving to gunicorn.
Did you have that error before? Make sure you don't get confused with different DEBUG errors that always were here like: dup key: { : "admin" }).

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Upstream st2 fix: StackStorm/st2#2437 in WIP

from st2-packages.

arm4b avatar arm4b commented on June 12, 2024

Closed by StackStorm/st2#2437 gunicorn fix, which is backported into 1.3 st2 version.

@enykeev FYI also, it would be interesting that bug we experienced was upstream st2 bug.

from st2-packages.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.