nicholaschiasson / ngx_upstream_jdomain
An asynchronous domain name resolution module for nginx upstream.
License: BSD 2-Clause "Simplified" License
Currently, we have hardcoded the cases of NGX_RESOLVE_FORMERR and NGX_RESOLVE_NXDOMAIN as errors for which we always use the fallback. It would be more interesting to allow the configuration to select which resolve errors should always use the fallback. This could possibly be an enhancement of the strict attribute; a rough sketch of such a configurable mapping follows the error list below.
See here for the list of errors: http://lxr.nginx.org/source/xref/nginx/src/core/ngx_resolver.h#27
#define NGX_RESOLVE_FORMERR 1
#define NGX_RESOLVE_SERVFAIL 2
#define NGX_RESOLVE_NXDOMAIN 3
#define NGX_RESOLVE_NOTIMP 4
#define NGX_RESOLVE_REFUSED 5
#define NGX_RESOLVE_TIMEDOUT NGX_ETIMEDOUT
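As a rough illustration only (the attribute name and parsing helper below are hypothetical, not part of the module today), the configurable mapping could be a bitmask built from the attribute value and consulted in the resolve handler:

/* Hypothetical: map a value such as "fallback_on=formerr,nxdomain,servfail"
 * to a bitmask of resolver error codes for which the fallback is always used. */
static ngx_uint_t
ngx_http_upstream_jdomain_parse_fallback_errors(ngx_str_t *value)
{
    ngx_uint_t  mask = 0;

    if (ngx_strnstr(value->data, "formerr", value->len)) {
        mask |= 1 << NGX_RESOLVE_FORMERR;
    }
    if (ngx_strnstr(value->data, "nxdomain", value->len)) {
        mask |= 1 << NGX_RESOLVE_NXDOMAIN;
    }
    if (ngx_strnstr(value->data, "servfail", value->len)) {
        mask |= 1 << NGX_RESOLVE_SERVFAIL;
    }

    return mask;
}

/* In the resolve handler, the hardcoded FORMERR/NXDOMAIN check would then
 * become something like: if (mask & (1 << ctx->state)) { use the fallback; }
 * (NGX_RESOLVE_TIMEDOUT is NGX_ETIMEDOUT, so it would need its own flag
 * rather than a bit shift.) */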
Add a linting job and integrate a style validation process into the automated workflow so that workflows which do not pass the checks fail.
What versions of nginx support this module? On the latest version (v1.21.6) I keep seeing the following error:
module "/etc/nginx/modules/ngx_http_upstream_jdomain_module.so" version 1021006 instead of 1016001 in /etc/nginx/nginx.conf:2
I'm not sure how best to troubleshoot this, or whether support for this module only extends up to a certain older version of nginx. I've tested this and it currently works on v1.18.0.
When applying the strict fallback, we only check whether the upstream has other servers, but not whether any of those other servers are actually up. This could be problematic if they go down (for example, if there are only bad jdomain servers left).
To solve this, instead of just looking at the count of servers, we should quickly loop through the servers and check server->down, breaking the moment we find one where it is false. A rough sketch of that loop is below.
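A minimal sketch of that check, assuming access to the upstream's ngx_http_upstream_srv_conf_t (the surrounding variable names are illustrative):

/* Consider the fallback usable only if at least one other server in the
 * upstream block is not marked down. */
ngx_uint_t                   i;
ngx_flag_t                   has_alternative = 0;
ngx_http_upstream_server_t  *servers = uscf->servers->elts;

for (i = 0; i < uscf->servers->nelts; i++) {
    if (!servers[i].down) {
        has_alternative = 1;
        break;
    }
}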
When using the jdomain upstream module, the upstream counters reported by the VTS module are all 0.
Can this problem be corrected by adjusting any parameters?
1. nginx config
upstream backend {
    jdomain x.x.net port=443;
    keepalive 300;
}
2. nginx-module-vts status
{
"upstreamZones": {
"https_backend": [
{
"server": "x.x.x.x:443",
"requestCounter": 0,
"inBytes": 0,
"outBytes": 0,
"responses": {
"1xx": 0,
"2xx": 0,
"3xx": 0,
"4xx": 0,
"5xx": 0
},
"requestMsecCounter": 0,
"requestMsec": 0,
"requestMsecs": {
"times": [],
"msecs": []
},
"requestBuckets": {
"msecs": [],
"counters": []
},
"responseMsecCounter": 0,
"responseMsec": 0,
"responseMsecs": {
"times": [],
"msecs": []
},
"responseBuckets": {
"msecs": [],
"counters": []
},
"weight": 1,
"maxFails": 1,
"failTimeout": 10,
"backup": false,
"down": false,
"overCounts": {
"maxIntegerSize": 18446744073709552000,
"requestCounter": 0,
"inBytes": 0,
"outBytes": 0,
"1xx": 0,
"2xx": 0,
"3xx": 0,
"4xx": 0,
"5xx": 0,
"requestMsecCounter": 0,
"responseMsecCounter": 0
}
}
],
Not 100% sure how to accomplish this yet, but the goal would be to be able to do something like the following:
upstream test {
    server 127.0.0.1:11111 backup;
    jdomain this-is-${some_variable}-an-example.com;
}
This domain name resolution would be guaranteed to fail on load, but I believe it should work at runtime using the context of each request. What makes this difficult, though not impossible, is state management: I believe we would need a growable hash of state objects, one per evaluation of the domain name with its variables.
Some documentation on how to achieve this:
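Separately from that documentation, a rough sketch of how the per-request evaluation might work using nginx's complex value API (the domain_template field and the surrounding instance structure are assumptions for illustration):

/* At configuration time: compile the domain string, which may contain
 * variables like ${some_variable}, into a complex value. */
ngx_http_compile_complex_value_t  ccv;

ngx_memzero(&ccv, sizeof(ngx_http_compile_complex_value_t));
ccv.cf = cf;
ccv.value = &raw_domain;                      /* e.g. "this-is-${some_variable}-an-example.com" */
ccv.complex_value = &instance->domain_template;

if (ngx_http_compile_complex_value(&ccv) != NGX_OK) {
    return NGX_CONF_ERROR;
}

/* Per request: expand the template and use the result as the key into a
 * growable hash of per-domain state objects. */
ngx_str_t  domain;

if (ngx_http_complex_value(r, &instance->domain_template, &domain) != NGX_OK) {
    return NGX_ERROR;
}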
I think we should try to prefer this newer health check module. It is noted that it is still in development; however, it has support for Prometheus output and also for the nginx stream module.
Update the ./.github/actions/nginx-module-toolbox/Dockerfile and ./scripts/build.sh files to make nginx build against this health check module rather than the current one. One thing to be careful of is the patches; I know this new repo uses a weird naming convention for those patches...
In t/004.compatibility_nginx.t, test 3 is supposed to use jdomain with the least_conn algorithm, but the test is actually quite weak, making it hard to determine whether the least_conn algorithm is even working properly.
The test only makes single connections at a time to the test nginx server, resulting in the load balancer effectively using round robin, which is indeed a valid behaviour of least_conn, but not representative of a live scenario where the peer is actually chosen based on the least connected peers.
This test case should be improved to simulate many simultaneous connections to the upstream block so we can really see least_conn in action with jdomain upstreams.
The semver version bump on merges is always a minor version (the default) due to the misuse of the github.head_ref variable in the workflows for merge commits on master.
github.head_ref is only useful in PR workflows, so we need to find a way to get the name of the ref that was merged into master from a workflow triggered by merging a PR to master... I would like to do it in a robust way if possible; otherwise I suppose there's no issue with just using a little scripting and parsing git log to determine what kind of semver bump to apply 🤷.
Inspired by wdaike/ngx_upstream_jdomain#10.
Contrary to that pull request, however, I believe this should only apply to the fixed-length switch attributes like retry_off and strict. It's an unnecessary check for the other attributes.
Splitting the image out of this repository would speed up the actions workflow since it wouldn't need to build per job.
We are currently using ngx_upstream_jdomain release 1.4.0 as a forwarding proxy for talking to some downstream services that have rotating IP addresses. We run this in Kubernetes clusters in multiple cloud regions. When IP addresses rotate for a subset of our downstreams (for example, today in 2/3 regions in 5/24 pods) we see
2023/08/29 08:32:14 [error] 12#12: ngx_http_upstream_jdomain_module: resolver failed, "www.example.com" (110: Operation timed out)
This issue does not recover by itself, and we stay stuck on a historic IP address for the service. Our jdomain config looks like
resolver 8.8.8.8 8.8.4.4;
upstream example {
    keepalive 32;
    keepalive_requests 100;
    keepalive_timeout 60s;
    jdomain www.example.com port=443 interval=60;
}
We've experienced the issue on both nginx-1.20.1 and 1.23.3.
Any help on recommended next steps or debugging would be appreciated. The error is a timeout when talking to the DNS server, so I did wonder: does ngx_upstream_jdomain try to re-establish a connection to the DNS servers when there's a connection issue?
The fallback becomes the peer to use in the case of DNS resolution failure. In many cases this can be due to a sporadic failure, so it is undesirable to use the fallback address for the full interval duration. Allowing one or a few errors is acceptable, but allowing errors for several seconds, possibly minutes, because the interval was configured to match a DNS TTL, can be very bad.
In the case where the fallback address is being used, we should keep attempting the DNS resolution until it succeeds. A rough sketch of the intended check is below.
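A minimal sketch of that behaviour, with illustrative field names (fallback_active, access, and interval are assumptions, not the module's actual fields):

/* If the last resolution failed and we are serving from the fallback,
 * retry on every opportunity instead of waiting out the full interval. */
if (instance->fallback_active
    || ngx_time() >= instance->access + instance->interval)
{
    /* start an asynchronous resolution here */
}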
After reading some more documentation, it seems it may be possible to trigger workflows from comments on pull requests.
We could add pull_request_comment to the list of on events, and then validate that the comment body matches some string like "retry" or something like that.
Hello,
There is a bug in how resolve.status is set.
When nginx actually does the lookup, ngx_http_upstream_jdomain_resolve_handler is called from polled events, which works fine.
But when the TTL is automatic (no valid argument on the resolver directive) and is very high (like 30+ seconds, maybe even less), ngx_http_upstream_jdomain_resolve_handler is called from cached values in ngx_resolve_name_locked at line 669.
As you can see from the function stack (bottom left), ngx_http_upstream_init_jdomain_peer directly calls ngx_http_upstream_jdomain_resolve_handler.
The bug occurs when the above happens: ngx_http_upstream_jdomain_resolve_handler sets NGX_JDOMAIN_STATUS_DONE for the instance's resolve status, then returns to ngx_http_upstream_init_jdomain_peer, which sets NGX_JDOMAIN_STATUS_WAIT as the last step in its loop.
This bug is irreversible: once this happens and the instance gets set to NGX_JDOMAIN_STATUS_WAIT, it never does another lookup again and stays stuck with the old peer address. A minimal sketch of a possible guard is below.
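A minimal sketch of one possible guard (the state field name is illustrative; the point is only that the peer init handler must not clobber a status the resolve handler already set synchronously):

/* In ngx_http_upstream_init_jdomain_peer, after starting the resolution:
 * only transition to WAIT if the resolve handler did not already complete
 * synchronously from the resolver cache. */
if (instance->state.status != NGX_JDOMAIN_STATUS_DONE) {
    instance->state.status = NGX_JDOMAIN_STATUS_WAIT;
}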
We should support a flag per jdomain instance indicating whether we explicitly want nginx to do the DNS lookup on startup or not.
This has implications on memory management of course, but this could be a very important improvement.
Currently, because nginx is forced to do a DNS lookup for each jdomain occurrence on startup, the startup time can become excessively long if the nginx config includes many jdomain directives (as is the case in my own production config now...).
If we were to allow nginx to start without doing the initial lookup, then nginx could start up very snappily as it usually does, and defer the lookups until later. This is effectively taking the fallback (backup server, as of jdomain 1.0) mechanism to the next step, so I think it shouldn't be that difficult to implement.
Hi, I've been trying to build nginx with this jdomain module but have been encountering the following errors.
I believe it's caused by an incompatibility with OpenSSL 1.1.1.
-o nginx-1.18.0/addon/src/ngx_http_upstream_jdomain.o
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c
In file included from src/core/ngx_core.h:60,
from /root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:8:
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c: In function ‘ngx_http_upstream_set_jdomain_peer_session’:
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:605:42: error: dereferencing pointer to incomplete type ‘SSL_SESSION’ {aka ‘struct ssl_session_st’}
605 | ssl_session ? ssl_session->references : 0);
| ^~
src/core/ngx_log.h:93:48: note: in definition of macro ‘ngx_log_debug’
93 | ngx_log_error_core(NGX_LOG_DEBUG, log, VA_ARGS)
| ^~~~~~~~~~~
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:600:2: note: in expansion of macro ‘ngx_log_debug2’
600 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP,
| ^~~~~~~~~~~~~~
make[1]: *** [nginx-1.18.0/Makefile:1578: nginx-1.18.0/addon/src/ngx_http_upstream_jdomain.o] Error 1
make[1]: Leaving directory '/root/nginx-1.18.0'
make: *** [Makefile:8: build] Error 2
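For what it's worth, SSL_SESSION became an opaque type in OpenSSL 1.1.x, so its fields can no longer be dereferenced directly. One possible fix, sketched under that assumption, is to log only the session pointer:

/* Possible fix sketch: drop the ssl_session->references dereference, which
 * is not accessible through the opaque OpenSSL 1.1.x type, and log only the
 * pointer itself. */
ngx_log_debug1(NGX_LOG_DEBUG_HTTP, pc->log, 0,
               "set session: %p", ssl_session);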
As an alternative to #9, it would be preferable to disregard the fallback usage in the case of hostname resolution failures (timeouts or other such network failures). That way, there would be no outage at all in exceptional cases, and the fallback would be used only on valid DNS lookups where the record no longer resolves to any peers. This could/should still be configurable, as described in this comment on #9.
Document the project better.
I got an error when building and I'm not sure why
-o objs/addon/src/ngx_http_upstream_jdomain_module.o \
../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c
../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c:110:15: error: missing initializer for field ‘sin_family’ of ‘struct sockaddr_in’ [-Werror=missing-field-initializers]
static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN = { };
^
In file included from /usr/include/bits/socket.h:151:0,
from /usr/include/sys/socket.h:39,
from src/os/unix/ngx_linux_config.h:44,
from src/core/ngx_config.h:26,
from ../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c:2:
/usr/include/netinet/in.h:242:5: note: ‘sin_family’ declared here
_SOCKADDR_COMMON (sin);
^
cc1: all warnings being treated as errors
make[1]: *** [objs/addon/src/ngx_http_upstream_jdomain_module.o] Error 1
make[1]: Leaving directory `/root/nginx-1.18.0'
make: *** [build] Error 2
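One possible workaround sketch: the empty braces leave every field "missing" as far as -Wmissing-field-initializers is concerned, so naming at least one field with a designated initializer (which most GCC versions do not warn about), or relying on implicit static zero-initialization, may avoid the error. This is an assumption about the compiler's behaviour, not a confirmed fix:

/* Workaround sketch: give the initializer at least one designated field. */
static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN = {
    .sin_family = AF_UNSPEC
};

/* Alternatively, declare it with no initializer at all (statics are
 * zero-initialized by the C standard), which sidesteps the warning entirely:
 * static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN; */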
From time to time, the test step, specifically when running prove, will come to a halt, causing the step to time out and fail.
This produces false failures and is very annoying when it occurs in a workflow on the master branch, since there's no way to re-run workflows to show the build actually was good.
Hi,
I have set up nginx with ngx_upstream_jdomain to point to a VPC endpoint exposed by the AWS Elasticsearch Service.
resolver 127.0.0.34;
upstream backend {
    jdomain xx.xx.xx.xx.com port=443 max_ips=1 interval=20 strict;
    keepalive 24;
}
location / {
    proxy_pass https://backend;
}
But when I change the data nodes of the cluster, triggering a blue/green deployment and assigning a new IP address to the VPC endpoint, nginx is not able to connect to the new endpoint.
I am using:
Nginx - 1.19.2
Jdomain - 1.1.5
Fix failing tests in t/002.upstream_dynamic.t.
We should get alternating responses; for example, in Test 1 the responses should be 201, 202, 201, 202, etc. Instead we see a clear pattern of 201, 201, 202, 202, etc.
e.g. weight, max_fails, fail_timeout
This does raise the question of how we specify these (equally?) for each server member.
Hey Folks,
We are seeing a weird issue after upgrading jdomain to the latest release.
We have a very dynamic upstream whose DNS record is updated quite often. We have been using jdomain to help with resolving the upstream IPs. After the upgrade, we are seeing requests continue to be sent to the old upstream address after a DNS update. We have no idea what could be wrong.
Any help would be appreciated.
Our upstream nginx config setup is pretty simple:
upstream upstream-upstream {
    jdomain xxx.xxx.xxx.xxx port=xx;
    keepalive 256;
}
We are using OpenResty version 1.19.3.1.
It would be nice to have some tooling for this project for executing jobs, such as building or running tests.
Consider npm, cargo-make, or alternatives.
It would be better to share jdomain state among all workers in order to save on redundant DNS queries and also to keep all workers in sync when an update occurs. A rough sketch of a shared memory approach is below.
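A rough sketch of what the shared state setup might look like, assuming a shared memory zone is registered at configuration time (the zone name, size, and init callback are illustrative, not existing module code):

/* In the configuration handler: register a shared zone for the resolved
 * peers so all workers read and update the same copy. */
static ngx_str_t  shm_name = ngx_string("jdomain_shared_state");
ngx_shm_zone_t   *shm_zone;

shm_zone = ngx_shared_memory_add(cf, &shm_name, 64 * ngx_pagesize,
                                 &ngx_http_upstream_jdomain_module);
if (shm_zone == NULL) {
    return NGX_CONF_ERROR;
}

shm_zone->init = ngx_http_upstream_jdomain_init_shm_zone;  /* hypothetical callback */
shm_zone->data = conf;

/* Workers would then lock the zone's slab mutex around reads and updates of
 * the cached addresses, so a resolution performed by one worker is
 * immediately visible to the others. */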
Clone of wdaike/ngx_upstream_jdomain#7.
I expect this could be a rather large change, as it represents changing the underlying way the module caches resolved IP addresses so that they are provided to the upstream via server. I think doing this opens the door to a bigger change: exposing the features of the server directive through the jdomain directive. This could potentially be an entire (breaking) redesign of the jdomain directive to effectively wrap the server directive, allowing jdomain to support all the same attributes server has, with the added functionality jdomain offers.
This module would really be cleaner if the trigger for the DNS query were a timer event. That way, all jdomain DNS records would be self-updating and would not require any traffic to an upstream just to keep it up to date with the DNS record. A rough sketch of the idea is below.
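A rough sketch of the timer-driven approach, using nginx's event timer API (the instance structure and its fields are illustrative assumptions):

/* Resolve on a timer instead of piggybacking on upstream traffic. */
static void
ngx_http_upstream_jdomain_timer_handler(ngx_event_t *ev)
{
    ngx_http_upstream_jdomain_instance_t  *instance = ev->data;

    /* start an asynchronous resolution for instance->domain here */

    ngx_add_timer(ev, instance->interval * 1000);  /* re-arm for the next cycle */
}

/* During worker initialization: */
instance->timer.handler = ngx_http_upstream_jdomain_timer_handler;
instance->timer.data = instance;
instance->timer.log = ngx_cycle->log;
ngx_add_timer(&instance->timer, instance->interval * 1000);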
We need to save the server address somewhere else, just like below:
static ngx_int_t
ngx_http_upstream_get_jdomain_peer(ngx_peer_connection_t *pc, void *data)
{
    ngx_http_upstream_jdomain_peer_data_t  *jp = data;
    ngx_http_request_t                      *r;
    ngx_http_upstream_t                     *u;
    ngx_int_t                                rc;
    ngx_str_t                               *addr;
    u_char                                  *p;
    size_t                                   len;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0, "get jdomain peer");

    rc = jp->original_get_peer(pc, jp->data);
    if (rc != NGX_OK) {
        return rc;
    }

    r = jp->request;
    u = r->upstream;

    /* Copy the peer name into the request pool so it survives a later
     * update of the resolved addresses. */
    len = pc->name->len;

    p = ngx_pnalloc(r->pool, len);
    if (p == NULL) {
        return NGX_ERROR;
    }

    ngx_memcpy(p, pc->name->data, len);

    addr = ngx_palloc(r->pool, sizeof(ngx_str_t));
    if (addr == NULL) {
        return NGX_ERROR;
    }

    addr->data = p;
    addr->len = len;
    pc->name = addr;

    return NGX_OK;
}
Depends on #48
Blocking mode? For real? Are you serious? I know, it sounds crazy, but I think it should be an option.
Add a directive attribute blocking which, when passed, will cause the peer init handler to use ngx_parse_url instead of the configured resolver.
The reasons I think this is important are a bit complicated and come down to differences in DNS resolution stability between differing resolvers at runtime... Using this blocking option would ensure the DNS resolution each interval uses the same resolver (I suppose the system one?) as the one used during initialization. A rough sketch is below.
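A minimal sketch of the blocking path, assuming the instance holds the configured domain and port (field names are illustrative); ngx_parse_url resolves the host synchronously through the system resolver:

/* Blocking resolution: bypass the configured resolver entirely. */
ngx_url_t  u;

ngx_memzero(&u, sizeof(ngx_url_t));
u.url = instance->domain;
u.default_port = instance->port;

if (ngx_parse_url(r->pool, &u) != NGX_OK || u.naddrs == 0) {
    return NGX_ERROR;  /* fall back to the backup peer */
}

/* u.addrs[0 .. u.naddrs - 1] now hold the freshly resolved sockaddrs to
 * copy into the instance's peer list. */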
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
Hi,
I want to know whether jdomain supports the following headers:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
or whether my config is wrong.
This is my nginx.conf:
load_module /usr/local/nginx/modules/objs/ngx_http_upstream_jdomain_module.so;

user nginx;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    client_max_body_size 0;
    resolver 127.0.0.11 valid=30s;

    server {
        listen 80;
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;
        server_name http://proxy;
        proxy_connect_timeout 1200s;
        proxy_send_timeout 1200s;
        proxy_read_timeout 1200s;
        fastcgi_send_timeout 1200s;
        fastcgi_read_timeout 1200s;

        location ~ ^/(?!(api/)) {
            set $test_arch_archivistica_ui http://test_arch_archivistica_ui:80;
            rewrite /(.*) /$1 break;
            proxy_pass $test_arch_archivistica_ui;
        }

        location /api/notifications_alert {
            rewrite /api/notifications_alert/(.*) /$1 break;
            set $test_notifications_alert test_notifications_alert;
            proxy_pass http://$test_arch_notifications_notifications_alert;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
            proxy_set_header Host $host;
        }
    }

    upstream test_notifications_alert {
        server 127.0.0.2 backup;
        jdomain test_arch_notifications strict interval=10 port=3001;
    }

    server {
        listen 127.0.0.2:80;
        return 502 'An error.';
    }
}
and the way I connect is:
var private_socket = io('192.168.0.227/private-alert', {path: "/test_arch/api/notifications_alert/socket.io"});
private_socket.on('connect', function(msg) {
    console.log('Usuario ${msg} conectado')
});
The error I get is:
WebSocket connection to 'ws://192.168.0.227/test_arch/api/notifications_alert/socket.io/?EIO=3&transport=websocket' failed:
thanks in advance :D
Hey @nicholaschiasson, firstly thanks for the module here.
I'm looking through the code, trying to understand the reason for this block of code below:
ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c, lines 157 to 160 in bcf71ff
What's the reason that the number of peers and max_ips have to be the same? What I'm seeing is that if an upstream server has 2 separate ports, such as below:
upstream test {
    jdomain backend_1 port=1000 max_ips=2;
    jdomain backend_1 port=2000 max_ips=2;
}
Starting up nginx will trigger the num peerps does not match max_ips error message.
Changing the upstream to max_ips=1 doesn't work either, e.g.
upstream test {
    jdomain backend_1 port=1000 max_ips=1;
    jdomain backend_1 port=2000 max_ips=1;
}
will also yield the same num peerps does not match max_ips error message.
Trying to understand if this is an intended design constraint or unintended behavior. I'm happy to contribute & help out if you'd like.