Comments (12)
Stats:
st2auth
failed onwheezy
: https://circleci.com/gh/StackStorm/st2/485st2api
failed onwheezy
: https://circleci.com/gh/StackStorm/st2-packages/287st2auth
failed onwheezy
: https://circleci.com/gh/StackStorm/st2-packages/289st2api
failed onel6
: https://circleci.com/gh/StackStorm/st2-packages/291 (even after liftingST2_WAITFORSTART
timeouts to15sec
)
from st2-packages.
https://circleci.com/gh/StackStorm/st2-packages/291
On el6
we had some logs:
last 20 lines from log file of service st2api
>>>
2016-01-27 15:35:19,160 140432054114048 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
model_object = cls._get_impl().add_or_update(model_object)
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
instance.save()
File "/usr/share/python/st2/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1 dup key: { : "core", : "st2.sensor.process_exit" })
2016-01-27 15:35:19,170 140432054114048 ERROR log [-] Shadow trigger creation of "core.st2.sensor.process_exit" failed with uniqueness conflict.
Traceback (most recent call last):
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
trigger_db = create_shadow_trigger(trigger_type_db)
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
return create_or_update_trigger_db(trigger)
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
trigger_db = Trigger.add_or_update(trigger_db)
File "/usr/share/python/st2/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1 dup key: { : "core", : "st2.sensor.process_exit" })
When st2api
failed to start. Not sure if it's real reason, probably another ghost bug. Will investigate more.
from st2-packages.
Ok, here: https://circleci.com/gh/armab/st2-packages/442 st2auth
is reported to be not listening.
It's wheezy
.
Was lucky to catch that via CircleCI SSH debug.
Service st2auth
reports that everything is running:
root@06c3f4732296:/var/log# service st2auth status
[ ok ] st2auth is running.
No processes really for st2auth service at all:
root@06c3f4732296:/var/log# ps -aux | grep st2auth
warning: bad ps syntax, perhaps a bogus '-'?
See http://gitorious.org/procps/procps/blobs/master/Documentation/FAQ
root 1459 0.0 0.0 6308 596 ? S+ 18:00 0:00 grep st2auth
/var/log/st2auth.log
:
2016-01-27 17:45:23,660 139834305980160 INFO log [-] Connecting to database "st2" @ "172.17.0.1:27017" as user "None".
2016-01-27 17:45:23,666 139834305980160 DEBUG log [-] Ensuring database indexes...
2016-01-27 17:45:23,766 139834305980160 DEBUG log [-] Ensuring indexes for model "UserDB"...
2016-01-27 17:45:23,785 139834305980160 DEBUG log [-] Ensuring indexes for model "TokenDB"...
2016-01-27 17:45:23,787 139834305980160 DEBUG log [-] Ensuring indexes for model "ApiKeyDB"...
2016-01-27 17:45:23,790 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionDB"...
2016-01-27 17:45:23,807 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,845 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,854 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,863 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,878 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,881 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionAliasDB"...
2016-01-27 17:45:23,882 139834305980160 DEBUG log [-] Ensuring indexes for model "KeyValuePairDB"...
2016-01-27 17:45:23,897 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "ActionExecutionStateDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "LiveActionDB"...
2016-01-27 17:45:23,898 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyTypeDB"...
2016-01-27 17:45:23,904 139834305980160 DEBUG log [-] Ensuring indexes for model "PolicyDB"...
2016-01-27 17:45:23,909 139834305980160 DEBUG log [-] Ensuring indexes for model "RuleDB"...
2016-01-27 17:45:23,924 139834305980160 DEBUG log [-] Ensuring indexes for model "RunnerTypeDB"...
2016-01-27 17:45:23,925 139834305980160 DEBUG log [-] Ensuring indexes for model "SensorTypeDB"...
2016-01-27 17:45:23,934 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerTypeDB"...
2016-01-27 17:45:23,953 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerDB"...
2016-01-27 17:45:23,954 139834305980160 DEBUG log [-] Ensuring indexes for model "TriggerInstanceDB"...
2016-01-27 17:45:23,956 139834305980160 DEBUG log [-] Registering exchanges...
2016-01-27 17:45:23,984 139834305980160 DEBUG connection [-] Start from server, version: 0.9, properties: {u'information': u'Licensed under the MPL. See http://www.rabbitmq.com/', u'product': u'RabbitMQ', u'copyright': u'Copyright
2016-01-27 17:45:23,986 139834305980160 DEBUG connection [-] Open OK!
2016-01-27 17:45:23,986 139834305980160 DEBUG channel [-] using channel_id: 1
2016-01-27 17:45:23,989 139834305980160 DEBUG channel [-] Channel open
2016-01-27 17:45:23,990 139834305980160 DEBUG log [-] registered exchange st2.execution.
2016-01-27 17:45:23,992 139834305980160 DEBUG log [-] registered exchange st2.liveaction.
2016-01-27 17:45:23,994 139834305980160 DEBUG log [-] registered exchange st2.trigger.
2016-01-27 17:45:23,996 139834305980160 DEBUG log [-] registered exchange st2.trigger_instances_dispatch.
2016-01-27 17:45:23,997 139834305980160 DEBUG log [-] registered exchange st2.sensor.
2016-01-27 17:45:24,006 139834305980160 DEBUG channel [-] Closed channel #1
2016-01-27 17:45:24,007 139834305980160 DEBUG log [-] Registering internal trigger: st2.generic.actiontrigger
2016-01-27 17:45:24,020 139834305980160 DEBUG log [-] verified trigger and formulated TriggerDB=TriggerTypeDB(description="Trigger encapsulating the completion of an action execution.", id=None, name="st2.generic.actiontrigger", pac
2016-01-27 17:45:24,027 139834305980160 INFO log [-] Registered trigger: st2.generic.actiontrigger.
2016-01-27 17:45:24,128 139834305980160 ERROR log [-] Conflict while trying to save in DB.
Traceback (most recent call last):
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 155, in add_or_update
model_object = cls._get_impl().add_or_update(model_object)
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/models/db/__init__.py", line 174, in add_or_update
instance.save()
File "/usr/share/python/st2/local/lib/python2.7/site-packages/mongoengine/document.py", line 286, in save
raise NotUniqueError(message % unicode(err))
NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1 dup key: { : "core", : "st2.generic.actiontrigger" })
2016-01-27 17:45:24,131 139834305980160 ERROR log [-] Shadow trigger creation of "core.st2.generic.actiontrigger" failed with uniqueness conflict.
Traceback (most recent call last):
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/triggers.py", line 40, in _register_internal_trigger_type
trigger_db = create_shadow_trigger(trigger_type_db)
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 253, in create_shadow_trigger
return create_or_update_trigger_db(trigger)
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/services/triggers.py", line 154, in create_or_update_trigger_db
trigger_db = Trigger.add_or_update(trigger_db)
File "/usr/share/python/st2/local/lib/python2.7/site-packages/st2common/persistence/base.py", line 167, in add_or_update
model_object=model_object)
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error index: st2.trigger_d_b.$pack_1_name_1 dup key: { : "core", : "st2.generic.actiontrigger" })
Some error in logs, it happens. So st2auth
died on this.
It's interesting why service st2auth
is reported to be running.
And why it couldn't restart st2auth + gunicorn
by startup service.
Now I know, will test locally by killing manually those pink unicorns and see if they can still revive.
from st2-packages.
Yeah, so those startup scripts need adjustments too. If child process died, service is considered as OK (running) and no restart/revive for dead (what is wrong).
$ service st2api status
[ ok ] st2api is running.
$ ps -aux | grep st2api
st2 1691 0.6 0.0 121416 40868 ? S 19:56 0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
st2 1715 0.3 0.0 138244 45080 ? S 19:56 0:00 /usr/share/python/st2/bin/python /usr/share/python/st2/bin/gunicorn_pecan /usr/share/python/st2/lib/python2.7/site-packages/st2api/gunicorn_config.py -k eventlet -b 127.0.0.1:9101 --threads 10 --workers 1
root 1775 0.0 0.0 11112 712 ? S+ 19:57 0:00 grep st2api
$ kill 1691
# no processes running for service
$ ps -aux | grep st2api
# service status is OK
$ service st2api status
[ ok ] st2api is running.
Will look tomorrow.
from st2-packages.
I confirm same error on local test runs https://gist.github.com/16950b151fc4f85a85ad
StackStormDBObjectConflictError: Tried to save duplicate unique keys (E11000 duplicate key error collection: st2.trigger_d_b index: pack_1_name_1 dup key: { : "core", : "st2.key_value_pair.value_change" })
from st2-packages.
FYI it's good to know also that after service restart DB error is gone, so service starts normally at the 2nd run (probably because DB error appears randomly, don't really know).
It's clear then why we had this error under the wheezy
, jessie
, and never had it for trusty
.
Because trusty
uses upstart with respawn which restarts process on hang down automatically.
from st2-packages.
StackStorm/st2#2435 Let's see if that does it.
from st2-packages.
We (@lakshmi-kannan and @enykeev) couldn't repro this in last few builds. So we shall call this done until we can repro again.
from st2-packages.
Yesterday @dennybaa could reproduce it not on CircleCI, but locally and had the same DB error, so it's not infra problem.
The bug is still there.
We can't close it, because it fails the builds. We can theoretically revert all gunicorn
changes related to st2auth
and st2api
just to see if problem still persist (as part of investigation).
It started to appear only after moving to gunicorn
.
Answering questions from Slack. DB is clean every time you start CircleCI build, everything is fresh.
I'll implement #104 later that will help us to investigate this problem better.
from st2-packages.
Debugging help
@dennybaa @lakshmi-kannan I've implemented #104, now you can see logs via CircleCI Artifacts:
That helped to further debug our problem. You can find Artifacts
section in CircleCI UI just near the Debug SSH
button.
The upstream st2
fix
@lakshmi-kannan @manasdk https://github.com/StackStorm/st2/pull/2435/files I can confirm that fix localized the problem but we still have root issue in there.
Because of that st2
upstream fix, process is not dying anymore. See log (it's from CircleCI artifacts):
- https://circle-artifacts.com/gh/StackStorm/st2-packages/326/artifacts/0/home/ubuntu/packages/jessie/log/st2/st2api.log
L104
: Our famousERROR
:NotUniqueError: Tried to save duplicate unique keys
(previously process died here)L113
: NewWARNING
with your fix, catching that Exception from: https://github.com/StackStorm/st2/pull/2435/files, so the process never dies now.L124
: The most important, entire service is reloaded from this point.
So the process never dies (service is running), but it reloads
when that DB error is catched.
Questions
It's good that we now have green builds and services are running. But, to clarify what we should do else here, a few questions:
- Is that OK for API service to be reloaded every few cycles?
- Are we going to leave it as is and close the issue?
- Are we going to investigate it further?
Just FYI: it appeared only after moving to gunicorn
.
Did you have that error before? Make sure you don't get confused with different DEBUG
errors that always were here like: dup key: { : "admin" })
.
from st2-packages.
Upstream st2
fix: StackStorm/st2#2437 in WIP
from st2-packages.
Closed by StackStorm/st2#2437 gunicorn fix, which is backported into 1.3
st2
version.
@enykeev FYI also, it would be interesting that bug we experienced was upstream st2
bug.
from st2-packages.
Related Issues (20)
- Bug in echo command on st2bootstrap-deb.sh HOT 4
- EL8: convert RabbitMQ install to EPEL rpm when released
- 'repoquery -y' fails on EL6 and EL7 HOT 2
- Move MongoDB 3.4 -> 4.0 for EL7/U16 due to EOL
- packagingrunner failed at ppc linux - exec user process caused "exec format error" HOT 2
- EWC OSS: RBAC assignments and definitions integration HOT 2
- (action file most likely doesn't exist or contains invalid syntax): No module named cx_Oracle HOT 1
- Move packaging into the st2 repo
- Remove the st2resultstracker service from packaging
- Migrate Ubuntu 16.04 LTS (Xenial) from py2 to py3 HOT 1
- The st2 port is configured in the config file. But directly fixed in files such as service, socket, client, etc. HOT 2
- Installing st2 v3.4dev does not wait for the y/n input to confirm the python3.6 installation form the deadsnakes repository
- Use bash -n to check scripts
- rabbitMQ version on EL8 hardcoded HOT 3
- Review build process for ldap and rbac HOT 2
- Drop Ubuntu 18.04 support from packaging
- Add RockyLinux 9 (RHEL9)
- Drop CentOS7 (RHEL7) support
- Add Ubuntu 22.04 support
- Remove all references of CentOS
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from st2-packages.