Comments (17)
Sure, I will create a PR tomorrow :)
from spiff-arena.
i guess you modified the docker compose file at spiff-arena/docker-compose.yml for postgres? here's another version that is known to work with postgres: https://github.com/sartography/arena-compose-postgres/. does it work for you? we're happy to add configuration options if they are needed, but we've seen postgres work without this option.
from spiff-arena.
I didn't notice that there was a docker compose file for postgres.
I'm indeed deploying arena in k8s. I translated the docker compose file into k8s resources (deployment, configmap, etc.) and provided an entrypoint that installs libpq5 on container startup (so I don't have to build a custom image and push it to a registry).
Here is the entrypoint:
#!/bin/bash
set -e
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends libpq5
exec /app/bin/boot_server_in_docker
Here are the environment variables set on the spiffworkflow-backend container:
SPIFFWORKFLOW_BACKEND_ENV: production # NOT SURE what values are allowed here?
FLASK_DEBUG: "0"
FLASK_SESSION_SECRET_KEY: "[REDACTED]"
SPIFFWORKFLOW_BACKEND_URL: "[REDACTED]"
SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR: /app/process_models
SPIFFWORKFLOW_BACKEND_CONNECTOR_PROXY_URL: http://spiffworkflow-connector
SPIFFWORKFLOW_BACKEND_DATABASE_TYPE: postgres
SPIFFWORKFLOW_BACKEND_DATABASE_URI: postgresql://USER:PASS@HOST/DB
SPIFFWORKFLOW_BACKEND_LOAD_FIXTURE_DATA: "false"
SPIFFWORKFLOW_BACKEND_LOG_LEVEL: "debug"
SPIFFWORKFLOW_BACKEND_OPEN_ID_CLIENT_ID: "[REDACTED]"
SPIFFWORKFLOW_BACKEND_OPEN_ID_CLIENT_SECRET_KEY: "[REDACTED]"
SPIFFWORKFLOW_BACKEND_OPEN_ID_SERVER_URL: "[REDACTED]" # I have a local deployment of dex: https://github.com/dexidp/dex
#SPIFFWORKFLOW_BACKEND_PERMISSIONS_FILE_NAME: example.yml
SPIFFWORKFLOW_BACKEND_PERMISSIONS_FILE_ABSOLUTE_PATH: /app/permissions.yaml
SPIFFWORKFLOW_BACKEND_PORT: "8000"
SPIFFWORKFLOW_BACKEND_RUN_BACKGROUND_SCHEDULER_IN_CREATE_APP: "true"
SPIFFWORKFLOW_BACKEND_UPGRADE_DB: "true"
SPIFFWORKFLOW_BACKEND_URL_FOR_FRONTEND: "[REDACTED]"
FORWARDED_ALLOW_IPS: "*"
I have examined arena-compose-postgres, and there seems to be no difference in the env variable settings, except that it builds an image with libpq-dev (I think libpq5 is enough; libpq-dev is for compiling software that depends on libpq, e.g. when installing psycopg2).
I will try arena-compose-postgres locally and report back later (the problem happens irregularly, so I have to wait some time to see whether it recurs).
from spiff-arena.
BTW, the documentation says the backend can be separated into three deployments (API, Background, and Celery Worker), but I didn't find any hints on how to deploy it like this.
I searched the scripts in spiffworkflow-backend/bin/ and found start_celery_worker, which I think starts the Celery Worker, but I didn't find any script to start Backend. Could you please provide some instructions?
from spiff-arena.
There is another problem. I'm using dex as the OpenID Connect provider. It works fine, but after some time (usually overnight), login fails and the /v1.0/login_return interface returns:
{
  "error_code": "invalid_token",
  "message": "Cannot decode token.",
  "status_code": 401
}
If I restart the spiffworkflow-backend container, it works again. I don't know the details of the OpenID auth flow and am not sure how to debug this problem. Can you provide some instructions? Is it caused by a misconfiguration?
from spiff-arena.
ok, will be interested to see if the arena compose postgres replicates the issue.
here's a command to start the background container, aka apscheduler: ["./bin/start_blocking_apscheduler"]
when you get Cannot decode token, the logs in the API container might be interesting. that's interesting that bouncing the spiff container fixes it.
from spiff-arena.
I searched the source code, and the error should be returned from here:
def _get_decoded_token(token: str) -> dict:
    try:
        decoded_token: dict = AuthenticationService.parse_jwt_token(_get_authentication_identifier_from_request(), token)
    except Exception as e:
        current_app.logger.warning(f"Received exception when attempting to decode token: {e.__class__.__name__}: {str(e)}")
        AuthenticationService.set_user_has_logged_out()
        raise ApiError(error_code="invalid_token", message="Cannot decode token.", status_code=401) from e
    ...
Then I searched the log and found this:
{"level": "WARNING", "message": "Received exception when attempting to decode token: StopIteration: ", "loggerName": "spiffworkflow_backend", "processName": "MainProcess", "processID": 149, "threadName": "ThreadPoolExecutor-1_1", "threadID": 139777974761152, "timestamp": "2024-07-09T10:35:14.537Z"}
{"level": "WARNING", "message": "Received exception: ApiError: Cannot decode token.. . Since we do not want this particular exception in sentry, we cannot use logger.exception or logger.error, so there will be no backtrace. see api_error.py", "loggerName": "spiffworkflow_backend", "processName": "MainProcess", "processID": 149, "threadName": "ThreadPoolExecutor-1_1", "threadID": 139777974761152, "timestamp": "2024-07-09T10:35:14.537Z"}
The StopIteration exception should be thrown from parse_jwt_token, but I have no idea how this exception can be raised. It is usually used inside iteration and almost never exposed outside of it.
I changed that logger.warning to logger.error(..., stack_info=True, exc_info=True), and will check whether anything interesting is logged the next time it happens.
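A minimal sketch of that logging change, using only the standard library. The logger name `demo` and the helper `decode` are made up for illustration and are not part of spiffworkflow-backend. Inside an `except` block, `exc_info=True` appends the traceback of the active exception to the log record, which should be enough to see where a bare `StopIteration` originates:

```python
import io
import logging


def decode(token: str) -> dict:
    # Hypothetical stand-in for a decoder that fails with a bare StopIteration.
    return next(k for k in [] if k == token)


def log_decode_failure(token: str) -> str:
    """Call decode() and return whatever the logger wrote when it failed."""
    stream = io.StringIO()
    logger = logging.getLogger("demo")
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        decode(token)
    except Exception as e:
        # exc_info=True appends the active exception's traceback;
        # stack_info=True also appends the stack of the logging call itself.
        logger.error(
            "Received exception when attempting to decode token: %s",
            e.__class__.__name__,
            exc_info=True,
            stack_info=True,
        )
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(log_decode_failure("abc"))
```

The printed record contains both a `Traceback (most recent call last):` section pointing into `decode` and a `Stack (most recent call last):` section for the logging call.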
from spiff-arena.
The first problem, psycopg2.OperationalError, also affects arena-compose-postgres, I believe.
I didn't wait for the error to happen; I just used psql to kill the connections from the db server side:
SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = 'spiffworkflow' AND pid <> pg_backend_pid();
Then I visited the arena frontend, and the error occurred.
In contrast, I made another deployment, modified /app/src/spiffworkflow_backend/config/__init__.py, and manually added the pool_pre_ping param:
app.config["SQLALCHEMY_ENGINE_OPTIONS"]["pool_pre_ping"] = True
After that, killing connections from the db side no longer causes any errors; everything works fine.
SQLAlchemy maintains a connection pool. Whenever a connection is required, SQLAlchemy checks one out from the pool; if that connection was terminated for some reason (usually an idle timeout, which I think is not specific to postgres: mysql has an 8-hour idle timeout by default), an error occurs. If pool_pre_ping is configured, SQLAlchemy issues a lightweight test statement (typically SELECT 1) on every checkout to check that the connection is healthy, and replaces it with a fresh one if it is not.
So I believe pool_pre_ping is a must-have option, especially for low-traffic sites.
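To illustrate the checkout behaviour described above, here is a toy model built on the standard library's sqlite3 module. It is not SQLAlchemy's implementation: `ToyPool`, `kill_idle_connections`, and `query_after_kill` are made-up names, and closing a connection only stands in for the server terminating it.

```python
import sqlite3
from collections import deque


class ToyPool:
    """Toy connection pool; illustrates pre-ping only, not SQLAlchemy internals."""

    def __init__(self, size: int, pre_ping: bool) -> None:
        self.pre_ping = pre_ping
        self._idle = deque(self._connect() for _ in range(size))

    @staticmethod
    def _connect() -> sqlite3.Connection:
        return sqlite3.connect(":memory:")

    def kill_idle_connections(self) -> None:
        # Simulates the database dropping idle connections behind our back.
        for conn in self._idle:
            conn.close()

    def checkout(self) -> sqlite3.Connection:
        conn = self._idle.popleft() if self._idle else self._connect()
        if self.pre_ping:
            try:
                conn.execute("SELECT 1")  # cheap liveness probe
            except sqlite3.Error:
                conn = self._connect()  # stale connection: replace it
        return conn


def query_after_kill(pre_ping: bool) -> str:
    """Kill all pooled connections, then try to run a query."""
    pool = ToyPool(size=2, pre_ping=pre_ping)
    pool.kill_idle_connections()
    conn = pool.checkout()
    try:
        conn.execute("SELECT 1").fetchone()
        return "ok"
    except sqlite3.Error as exc:
        return f"error: {type(exc).__name__}"


if __name__ == "__main__":
    print("without pre-ping:", query_after_kill(pre_ping=False))
    print("with pre-ping:   ", query_after_kill(pre_ping=True))
```

Without the probe, the first query after the kill fails in application code; with it, the stale connection is swapped out before the caller ever sees it, at the cost of one extra round trip per checkout.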
from spiff-arena.
I’m convinced, thank you for the research and doing the experiment. If you want to add the config option (maybe to default.py), we’d gladly accept a PR.
from spiff-arena.
I configured a local sentry and caught the StopIteration exception: key_id does not exist in any of jwks_configs["keys"], so next() raised StopIteration.
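A small standalone illustration of that behaviour (the function names and sample key ids are invented for this example): calling `next()` on a generator expression that yields nothing raises `StopIteration`, while passing a default as the second argument turns the miss into an ordinary return value.

```python
from typing import Optional


def find_key_strict(keys: list[dict], key_id: str) -> dict:
    # No default: if no entry matches, the generator is exhausted
    # immediately and next() raises StopIteration.
    return next(jk for jk in keys if jk["kid"] == key_id)


def find_key_or_none(keys: list[dict], key_id: str) -> Optional[dict]:
    # With a default, a missing key id becomes a plain None result.
    return next((jk for jk in keys if jk["kid"] == key_id), None)


if __name__ == "__main__":
    cached_keys = [{"kid": "old-1"}, {"kid": "old-2"}]
    print(find_key_or_none(cached_keys, "new-1"))
    try:
        find_key_strict(cached_keys, "new-1")
    except StopIteration:
        print("StopIteration escaped from next()")
```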
I'm really not familiar with openid authentication internals, so I cannot understand the problem. I just tried to follow the code.
@classmethod
def jwks_public_key_for_key_id(cls, authentication_identifier: str, key_id: str) -> dict:
    jwks_uri = cls.open_id_endpoint_for_name("jwks_uri", authentication_identifier)
    jwks_configs = cls.get_jwks_config_from_uri(jwks_uri)
    json_key_configs: dict = next(jk for jk in jwks_configs["keys"] if jk["kid"] == key_id)
    return json_key_configs
It tries to get the key config for the given key_id from the OpenID provider (which I guessed from the code). get_jwks_config_from_uri() maintains a cache and makes a request to the OpenID server only if the cache is empty.
Here, I believe the cache is not empty, so the code looks for key_id in the cached configs and fails to find it.
I manually made a request to jwks_uri (https://dex.my.domain/keys) and it returned 5 keys. I then compared them with the 5 keys logged in sentry: only 1 matches, the other 4 are different.
So I guess dex has a key rotation mechanism in which each key is only valid for a relatively short period. I then did some googling with the keywords jwks + rotation and found this answer, which indicates that the provider can revoke a key at any time for any reason.
So I believe spiffworkflow should either not cache jwks at all, or cache keys by key_id instead of caching a provider's keys by jwks_uri.
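A rough sketch of the second option, caching per key id and refetching on a miss. This is only an illustration of the idea, not the project's code: `JwksCache`, `FakeProvider`, and the injected `fetch_jwks` callable are hypothetical, and a real implementation would fetch over HTTP and probably also evict old keys and limit how often it refetches.

```python
from typing import Callable


class JwksCache:
    """Caches individual JWKS keys by (jwks_uri, kid); refetches on a miss."""

    def __init__(self, fetch_jwks: Callable[[str], dict]) -> None:
        self._fetch_jwks = fetch_jwks  # hypothetical: returns {"keys": [...]}
        self._keys: dict = {}

    def key_for(self, jwks_uri: str, key_id: str) -> dict:
        cache_key = (jwks_uri, key_id)
        if cache_key not in self._keys:
            # Unknown kid: the provider may have rotated its keys, so ask again.
            for jk in self._fetch_jwks(jwks_uri)["keys"]:
                self._keys[(jwks_uri, jk["kid"])] = jk
        if cache_key not in self._keys:
            raise LookupError(f"no JWKS key with kid={key_id!r} at {jwks_uri}")
        return self._keys[cache_key]


class FakeProvider:
    """Hypothetical stand-in for an OpenID provider that rotates keys."""

    def __init__(self) -> None:
        self.keys = [{"kid": "k1"}]
        self.fetch_count = 0

    def fetch(self, jwks_uri: str) -> dict:
        self.fetch_count += 1
        return {"keys": list(self.keys)}


if __name__ == "__main__":
    provider = FakeProvider()
    cache = JwksCache(provider.fetch)
    uri = "https://dex.my.domain/keys"
    cache.key_for(uri, "k1")           # first lookup fetches from the provider
    provider.keys = [{"kid": "k2"}]    # provider rotates its keys
    print(cache.key_for(uri, "k2"))    # miss triggers a refetch instead of failing
    print("fetches:", provider.fetch_count)
```

Known key ids are served from the cache, so the provider is only contacted when a token arrives that was signed with a key the backend has not seen yet.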
from spiff-arena.
we updated the jwks handling to hopefully handle this key rotation. it's in main. are you using :latest tags? if so, since we only bump :latest on release, perhaps you could switch to the latest timestamped main tag of backend based on this commit: https://github.com/sartography/spiff-arena/actions/runs/9880631567
from spiff-arena.
I have submitted a PR to add the pool_pre_ping option.
BTW, would you mind installing libpq5 in the spiffworkflow-backend docker image, so there is no need to build a dedicated image for the postgres use case?
from spiff-arena.
@StephenPCG sure, we added libpq5 to the backend docker image.
from spiff-arena.
@burnettk Wow! Thank you!
However, from the commit diff 42a3110, it seems only two lines of comments were added; libpq5 was not installed 😂
from spiff-arena.
@StephenPCG lol, yes; i've found it's harder to add new bugs if you only add comments. :D
thanks for catching! db8433d actually adds the package.
from spiff-arena.
It works like a charm now! Thank you very much!
from spiff-arena.
@StephenPCG great to hear. it's been lovely working with you. i hope you will keep us informed on your progress on here or discord.
from spiff-arena.