nicholaschiasson / ngx_upstream_jdomain
An asynchronous domain name resolution module for nginx upstream.
License: BSD 2-Clause "Simplified" License
Currently, we have hardcoded the cases of NGX_RESOLVE_FORMERR and NGX_RESOLVE_NXDOMAIN as errors for which we always use the fallback. It would be more interesting to allow the configuration to select which resolve errors should always use the fallback. This could possibly be an enhancement of the strict attribute; a rough sketch of such a configurable mapping follows the error list below.
See here for the list of errors: http://lxr.nginx.org/source/xref/nginx/src/core/ngx_resolver.h#27
#define NGX_RESOLVE_FORMERR 1
#define NGX_RESOLVE_SERVFAIL 2
#define NGX_RESOLVE_NXDOMAIN 3
#define NGX_RESOLVE_NOTIMP 4
#define NGX_RESOLVE_REFUSED 5
#define NGX_RESOLVE_TIMEDOUT NGX_ETIMEDOUT
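As a rough illustration only (the attribute name and parsing helper below are hypothetical, not part of the module today), the configurable mapping could be a bitmask built from the attribute value and consulted in the resolve handler:

/* Hypothetical: map a value such as "fallback_on=formerr,nxdomain,servfail"
 * to a bitmask of resolver error codes for which the fallback is always used. */
static ngx_uint_t
ngx_http_upstream_jdomain_parse_fallback_errors(ngx_str_t *value)
{
    ngx_uint_t  mask = 0;

    if (ngx_strnstr(value->data, "formerr", value->len)) {
        mask |= 1 << NGX_RESOLVE_FORMERR;
    }
    if (ngx_strnstr(value->data, "nxdomain", value->len)) {
        mask |= 1 << NGX_RESOLVE_NXDOMAIN;
    }
    if (ngx_strnstr(value->data, "servfail", value->len)) {
        mask |= 1 << NGX_RESOLVE_SERVFAIL;
    }

    return mask;
}

/* In the resolve handler, the hardcoded FORMERR/NXDOMAIN check would then
 * become something like: if (mask & (1 << ctx->state)) { use the fallback; }
 * (NGX_RESOLVE_TIMEDOUT is NGX_ETIMEDOUT, so it would need its own flag
 * rather than a bit shift.) */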
Add a linting job and integrate a style validation process into the automated workflow so that workflows which do not pass the checks fail.
What versions of nginx support this module? On the latest version (v1.21.6) I keep seeing the following error:
module "/etc/nginx/modules/ngx_http_upstream_jdomain_module.so" version 1021006 instead of 1016001 in /etc/nginx/nginx.conf:2
I'm not sure how best to troubleshoot this, or whether support for this module only extends up to a certain older version of nginx. I've tested this and it currently works on v1.18.0.
When applying the strict fallback, we only check whether the upstream has other servers, but not whether any of those other servers are actually up. This could be problematic if they go down (for example, if there are only bad jdomain servers left).
To solve this, instead of just looking at the count of servers, we should quickly loop through the servers and check server->down, breaking the moment we find one where it is false. A rough sketch of that loop is below.
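A minimal sketch of that check, assuming access to the upstream's ngx_http_upstream_srv_conf_t (the surrounding variable names are illustrative):

/* Consider the fallback usable only if at least one other server in the
 * upstream block is not marked down. */
ngx_uint_t                   i;
ngx_flag_t                   has_alternative = 0;
ngx_http_upstream_server_t  *servers = uscf->servers->elts;

for (i = 0; i < uscf->servers->nelts; i++) {
    if (!servers[i].down) {
        has_alternative = 1;
        break;
    }
}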
When using the jdomain upstream module, the upstream counters reported by the VTS module are all 0.
Can this problem be corrected by adjusting any parameters?
1. nginx config
upstream backend {
    jdomain x.x.net port=443;
    keepalive 300;
}
2. nginx-module-vts status
{
"upstreamZones": {
"https_backend": [
{
"server": "x.x.x.x:443",
"requestCounter": 0,
"inBytes": 0,
"outBytes": 0,
"responses": {
"1xx": 0,
"2xx": 0,
"3xx": 0,
"4xx": 0,
"5xx": 0
},
"requestMsecCounter": 0,
"requestMsec": 0,
"requestMsecs": {
"times": [],
"msecs": []
},
"requestBuckets": {
"msecs": [],
"counters": []
},
"responseMsecCounter": 0,
"responseMsec": 0,
"responseMsecs": {
"times": [],
"msecs": []
},
"responseBuckets": {
"msecs": [],
"counters": []
},
"weight": 1,
"maxFails": 1,
"failTimeout": 10,
"backup": false,
"down": false,
"overCounts": {
"maxIntegerSize": 18446744073709552000,
"requestCounter": 0,
"inBytes": 0,
"outBytes": 0,
"1xx": 0,
"2xx": 0,
"3xx": 0,
"4xx": 0,
"5xx": 0,
"requestMsecCounter": 0,
"responseMsecCounter": 0
}
}
],
Not 100% sure how to accomplish this yet, but the goal would be to be able to do something like the following:
upstream test {
    server 127.0.0.1:11111 backup;
    jdomain this-is-${some_variable}-an-example.com;
}
This domain name resolution would be guaranteed to fail on load, but I believe it should work at runtime using the context of each request. What makes this difficult, though not impossible, is state management: I believe we would need a growable hash of state objects, one per evaluation of the domain name with its variables.
Some documentation on how to achieve this:
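Separately from that documentation, a rough sketch of how the per-request evaluation might work using nginx's complex value API (the domain_template field and the surrounding instance structure are assumptions for illustration):

/* At configuration time: compile the domain string, which may contain
 * variables like ${some_variable}, into a complex value. */
ngx_http_compile_complex_value_t  ccv;

ngx_memzero(&ccv, sizeof(ngx_http_compile_complex_value_t));
ccv.cf = cf;
ccv.value = &raw_domain;                      /* e.g. "this-is-${some_variable}-an-example.com" */
ccv.complex_value = &instance->domain_template;

if (ngx_http_compile_complex_value(&ccv) != NGX_OK) {
    return NGX_CONF_ERROR;
}

/* Per request: expand the template and use the result as the key into a
 * growable hash of per-domain state objects. */
ngx_str_t  domain;

if (ngx_http_complex_value(r, &instance->domain_template, &domain) != NGX_OK) {
    return NGX_ERROR;
}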
I think we should try to prefer this newer health check module. It is noted that it is still in development; however, it has support for Prometheus output and also for the nginx stream module.
Update the ./.github/actions/nginx-module-toolbox/Dockerfile and ./scripts/build.sh files to make nginx build against this health check module rather than the current one. One thing to be careful of is the patches; I know this new repo uses a weird naming convention for those patches...
In t/004.compatibility_nginx.t, test 3 is supposed to use jdomain with the least_conn algorithm, but the test is actually quite weak, making it hard to determine whether the least_conn algorithm is even working properly.
The test only makes single connections at a time to the test nginx server, resulting in the load balancer effectively using round robin, which is indeed a valid behaviour of least_conn, but not representative of a live scenario where the peer is actually chosen based on the least connected peers.
This test case should be improved to simulate many simultaneous connections to the upstream block so we can really see least_conn in action with jdomain upstreams.
The semver version bump on merges is always a minor version (the default) due to the misuse of the github.head_ref variable in the workflows for merge commits on master.
github.head_ref is only useful in PR workflows, so we need to find a way to get the name of the ref that was merged into master from a workflow triggered by merging a PR to master... I would like to do it in a robust way if possible; otherwise I suppose there's no issue with just using a little scripting and parsing git log to determine what kind of semver bump to apply 🤷.
Inspired by wdaike/ngx_upstream_jdomain#10.
Contrary to that pull request, however, I believe this should only apply to the fixed-length switch attributes like retry_off and strict. It's an unnecessary check for the other attributes.
Splitting the image out of this repository would speed up the actions workflow since it wouldn't need to build per job.
We are currently using ngx_upstream_jdomain release 1.4.0 as a forwarding proxy for talking to some downstream services that have rotating IP addresses. We run this in Kubernetes clusters in multiple cloud regions. When IP addresses rotate for a subset of our downstreams (for example, today in 2/3 regions in 5/24 pods) we see
2023/08/29 08:32:14 [error] 12#12: ngx_http_upstream_jdomain_module: resolver failed, "www.example.com" (110: Operation timed out)
This issue does not recover by itself, and we stay stuck on a historic IP address for the service. Our jdomain config looks like
resolver 8.8.8.8 8.8.4.4;
upstream example {
    keepalive 32;
    keepalive_requests 100;
    keepalive_timeout 60s;
    jdomain www.example.com port=443 interval=60;
}
We've experienced the issue on both nginx-1.20.1 and 1.23.3.
Any help on recommended next steps or debugging would be appreciated. The error is a timeout when talking to the DNS server, so I did wonder: does ngx_upstream_jdomain try to re-establish a connection to the DNS servers when there's a connection issue?
The fallback becomes the peer to use in the case of DNS resolution failure. In many cases this can be due to a sporadic failure, so it is undesirable to use the fallback address for the full interval duration. Allowing one or a few errors is acceptable, but allowing errors for several seconds, possibly minutes, because the interval was configured to match a DNS TTL, can be very bad.
In the case where the fallback address is being used, we should keep attempting the DNS resolution until it succeeds. A rough sketch of the intended check is below.
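A minimal sketch of that behaviour, with illustrative field names (fallback_active, access, and interval are assumptions, not the module's actual fields):

/* If the last resolution failed and we are serving from the fallback,
 * retry on every opportunity instead of waiting out the full interval. */
if (instance->fallback_active
    || ngx_time() >= instance->access + instance->interval)
{
    /* start an asynchronous resolution here */
}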
After reading some more documentation, it seems it may be possible to trigger workflows from comments on pull requests.
We could add pull_request_comment to the list of on events, and then validate that the comment body matches some string like "retry" or something like that.
Hello,
There is a bug in how resolve.status is set.
When nginx actually does the lookup, ngx_http_upstream_jdomain_resolve_handler is called from polled events, which works fine.
But when the TTL is automatic (no valid argument on the resolver directive) and is very high (like 30+ seconds, maybe even less), ngx_http_upstream_jdomain_resolve_handler is called from cached values in ngx_resolve_name_locked at line 669.
As you can see from the function stack (bottom left), ngx_http_upstream_init_jdomain_peer directly calls ngx_http_upstream_jdomain_resolve_handler.
The bug occurs when the above happens: ngx_http_upstream_jdomain_resolve_handler sets NGX_JDOMAIN_STATUS_DONE for the instance's resolve status, then returns to ngx_http_upstream_init_jdomain_peer, which sets NGX_JDOMAIN_STATUS_WAIT as the last step in its loop.
This bug is irreversible: once this happens and the instance gets set to NGX_JDOMAIN_STATUS_WAIT, it never does another lookup again and stays stuck with the old peer address. A minimal sketch of a possible guard is below.
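A minimal sketch of one possible guard (the state field name is illustrative; the point is only that the peer init handler must not clobber a status the resolve handler already set synchronously):

/* In ngx_http_upstream_init_jdomain_peer, after starting the resolution:
 * only transition to WAIT if the resolve handler did not already complete
 * synchronously from the resolver cache. */
if (instance->state.status != NGX_JDOMAIN_STATUS_DONE) {
    instance->state.status = NGX_JDOMAIN_STATUS_WAIT;
}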
We should support a flag per jdomain instance indicating whether we explicitly want nginx to do the DNS lookup on startup or not.
This has implications on memory management of course, but this could be a very important improvement.
Currently, because nginx is forced to do a DNS lookup for each jdomain occurrence on startup, the startup time can become excessively long if the nginx config includes many jdomain directives (as is the case in my own production config now...).
If we were to allow nginx to start without doing the initial lookup, then nginx could start up very snappily as it usually does, and defer the lookups until later. This is effectively taking the fallback (backup server, as of jdomain 1.0) mechanism to the next step, so I think it shouldn't be that difficult to implement.
Hi, I've been trying to build nginx with this jdomain module but have been encountering the following errors.
I believe it's caused by an incompatibility with OpenSSL 1.1.1.
-o nginx-1.18.0/addon/src/ngx_http_upstream_jdomain.o
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c
In file included from src/core/ngx_core.h:60,
from /root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:8:
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c: In function ‘ngx_http_upstream_set_jdomain_peer_session’:
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:605:42: error: dereferencing pointer to incomplete type ‘SSL_SESSION’ {aka ‘struct ssl_session_st’}
605 | ssl_session ? ssl_session->references : 0);
| ^~
src/core/ngx_log.h:93:48: note: in definition of macro ‘ngx_log_debug’
93 | ngx_log_error_core(NGX_LOG_DEBUG, log, VA_ARGS)
| ^~~~~~~~~~~
/root/ngx_upstream_jdomain/src/ngx_http_upstream_jdomain.c:600:2: note: in expansion of macro ‘ngx_log_debug2’
600 | ngx_log_debug2(NGX_LOG_DEBUG_HTTP,
| ^~~~~~~~~~~~~~
make[1]: *** [nginx-1.18.0/Makefile:1578: nginx-1.18.0/addon/src/ngx_http_upstream_jdomain.o] Error 1
make[1]: Leaving directory '/root/nginx-1.18.0'
make: *** [Makefile:8: build] Error 2
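For what it's worth, SSL_SESSION became an opaque type in OpenSSL 1.1.x, so its fields can no longer be dereferenced directly. One possible fix, sketched under that assumption, is to log only the session pointer:

/* Possible fix sketch: drop the ssl_session->references dereference, which
 * is not accessible through the opaque OpenSSL 1.1.x type, and log only the
 * pointer itself. */
ngx_log_debug1(NGX_LOG_DEBUG_HTTP, pc->log, 0,
               "set session: %p", ssl_session);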
As an alternative to #9, it would be preferable to disregard the fallback usage in the case of hostname resolution failures (timeouts or other such network failures). That way, there would be no outage at all in exceptional cases, and the fallback would be used only on valid DNS lookups where the record no longer resolves to any peers. This could/should still be configurable, as described in this comment on #9.
Document the project better.
I got an error when building and I'm not sure why
-o objs/addon/src/ngx_http_upstream_jdomain_module.o \
../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c
../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c:110:15: error: missing initializer for field ‘sin_family’ of ‘struct sockaddr_in’ [-Werror=missing-field-initializers]
static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN = { };
^
In file included from /usr/include/bits/socket.h:151:0,
from /usr/include/sys/socket.h:39,
from src/os/unix/ngx_linux_config.h:44,
from src/core/ngx_config.h:26,
from ../ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c:2:
/usr/include/netinet/in.h:242:5: note: ‘sin_family’ declared here
_SOCKADDR_COMMON (sin);
^
cc1: all warnings being treated as errors
make[1]: *** [objs/addon/src/ngx_http_upstream_jdomain_module.o] Error 1
make[1]: Leaving directory `/root/nginx-1.18.0'
make: *** [build] Error 2
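One possible workaround sketch: the empty braces leave every field "missing" as far as -Wmissing-field-initializers is concerned, so naming at least one field with a designated initializer (which most GCC versions do not warn about), or relying on implicit static zero-initialization, may avoid the error. This is an assumption about the compiler's behaviour, not a confirmed fix:

/* Workaround sketch: give the initializer at least one designated field. */
static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN = {
    .sin_family = AF_UNSPEC
};

/* Alternatively, declare it with no initializer at all (statics are
 * zero-initialized by the C standard), which sidesteps the warning entirely:
 * static struct sockaddr_in NGX_JDOMAIN_INVALID_ADDR_SOCKADDR_IN; */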
From time to time, the test step, specifically when running prove, will come to a halt, causing the step to time out and fail.
This produces false failures and is very annoying when it occurs in a workflow on the master branch, since there's no way to re-run workflows to show the build actually was good.
Hi,
I have set up nginx with ngx_upstream_jdomain to point to a VPC endpoint exposed by the AWS Elasticsearch Service.
resolver 127.0.0.34;
upstream backend {
    jdomain xx.xx.xx.xx.com port=443 max_ips=1 interval=20 strict;
    keepalive 24;
}
location / {
    proxy_pass https://backend;
}
But when I change the data nodes of the cluster, triggering a blue/green deployment and assigning a new IP address to the VPC endpoint, nginx is not able to connect to the new endpoint.
I am using:
Nginx - 1.19.2
Jdomain - 1.1.5
Fix failing tests in t/002.upstream_dynamic.t.
We should get alternating responses; for example, in Test 1 the responses should be 201, 202, 201, 202, etc. Instead we see a clear pattern of 201, 201, 202, 202, etc.
e.g. weight, max_fails, fail_timeout
This does raise the question of how we specify these (equally?) for each server member.
Hey Folks,
We are seeing a weird issue after upgrading jdomain to the latest release.
We have a very dynamic upstream whose DNS record is updated quite often. We have been using jdomain to help with resolving the upstream IPs. After the upgrade, we are seeing requests continue to be sent to the old upstream address after a DNS update. We have no idea what could be wrong.
Any help would be appreciated.
Our upstream nginx config setup is pretty simple:
upstream upstream-upstream {
    jdomain xxx.xxx.xxx.xxx port=xx;
    keepalive 256;
}
We are using OpenResty version 1.19.3.1.
It would be nice to have some tooling for this project for executing jobs, such as building or running tests.
Consider npm, cargo-make, or alternatives.
It would be better to share jdomain state among all workers in order to save on redundant DNS queries and also to keep all workers in sync when an update occurs. A rough sketch of a shared memory approach is below.
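A rough sketch of what the shared state setup might look like, assuming a shared memory zone is registered at configuration time (the zone name, size, and init callback are illustrative, not existing module code):

/* In the configuration handler: register a shared zone for the resolved
 * peers so all workers read and update the same copy. */
static ngx_str_t  shm_name = ngx_string("jdomain_shared_state");
ngx_shm_zone_t   *shm_zone;

shm_zone = ngx_shared_memory_add(cf, &shm_name, 64 * ngx_pagesize,
                                 &ngx_http_upstream_jdomain_module);
if (shm_zone == NULL) {
    return NGX_CONF_ERROR;
}

shm_zone->init = ngx_http_upstream_jdomain_init_shm_zone;  /* hypothetical callback */
shm_zone->data = conf;

/* Workers would then lock the zone's slab mutex around reads and updates of
 * the cached addresses, so a resolution performed by one worker is
 * immediately visible to the others. */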
Clone of wdaike/ngx_upstream_jdomain#7.
I expect this could be a rather large change, as it represents changing the underlying way the module caches resolved IP addresses so that they are provided to the upstream via server. I think doing this opens the door to a bigger change: exposing the features of the server directive through the jdomain directive. This could potentially be an entire (breaking) redesign of the jdomain directive to effectively wrap the server directive, allowing jdomain to support all the same attributes server has, with the added functionality jdomain offers.
This module would really be cleaner if the trigger for the DNS query were a timer event. That way, all jdomain DNS records would be self-updating and would not require any traffic to an upstream just to keep it up to date with the DNS record. A rough sketch of the idea is below.
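A rough sketch of the timer-driven approach, using nginx's event timer API (the instance structure and its fields are illustrative assumptions):

/* Resolve on a timer instead of piggybacking on upstream traffic. */
static void
ngx_http_upstream_jdomain_timer_handler(ngx_event_t *ev)
{
    ngx_http_upstream_jdomain_instance_t  *instance = ev->data;

    /* start an asynchronous resolution for instance->domain here */

    ngx_add_timer(ev, instance->interval * 1000);  /* re-arm for the next cycle */
}

/* During worker initialization: */
instance->timer.handler = ngx_http_upstream_jdomain_timer_handler;
instance->timer.data = instance;
instance->timer.log = ngx_cycle->log;
ngx_add_timer(&instance->timer, instance->interval * 1000);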
We need to save the server address somewhere else, just like below:
static ngx_int_t
ngx_http_upstream_get_jdomain_peer(ngx_peer_connection_t *pc, void *data)
{
    ngx_http_upstream_jdomain_peer_data_t  *jp = data;
    ngx_http_request_t                      *r;
    ngx_http_upstream_t                     *u;
    ngx_int_t                                rc;
    ngx_str_t                               *addr;
    u_char                                  *p;
    size_t                                   len;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0, "get jdomain peer");

    rc = jp->original_get_peer(pc, jp->data);
    if (rc != NGX_OK) {
        return rc;
    }

    r = jp->request;
    u = r->upstream;

    /* Copy the peer name into the request pool so it survives a later
     * update of the resolved addresses. */
    len = pc->name->len;

    p = ngx_pnalloc(r->pool, len);
    if (p == NULL) {
        return NGX_ERROR;
    }

    ngx_memcpy(p, pc->name->data, len);

    addr = ngx_palloc(r->pool, sizeof(ngx_str_t));
    if (addr == NULL) {
        return NGX_ERROR;
    }

    addr->data = p;
    addr->len = len;
    pc->name = addr;

    return NGX_OK;
}
Depends on #48
Blocking mode? For real? Are you serious? I know, it sounds crazy, but I think it should be an option.
Add a directive attribute blocking which, when passed, will cause the peer init handler to use ngx_parse_url instead of the configured resolver.
The reasons I think this is important are a bit complicated and come down to differences in DNS resolution stability between differing resolvers at runtime... Using this blocking option would ensure the DNS resolution each interval uses the same resolver (I suppose the system one?) as the one used during initialization. A rough sketch is below.
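A minimal sketch of the blocking path, assuming the instance holds the configured domain and port (field names are illustrative); ngx_parse_url resolves the host synchronously through the system resolver:

/* Blocking resolution: bypass the configured resolver entirely. */
ngx_url_t  u;

ngx_memzero(&u, sizeof(ngx_url_t));
u.url = instance->domain;
u.default_port = instance->port;

if (ngx_parse_url(r->pool, &u) != NGX_OK || u.naddrs == 0) {
    return NGX_ERROR;  /* fall back to the backup peer */
}

/* u.addrs[0 .. u.naddrs - 1] now hold the freshly resolved sockaddrs to
 * copy into the instance's peer list. */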
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
Hi,
I want to know whether jdomain supports the following headers:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
or whether my config is wrong.
This is my nginx.conf:
load_module /usr/local/nginx/modules/objs/ngx_http_upstream_jdomain_module.so;

user nginx;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 1024;
    multi_accept on;
}

http {
    client_max_body_size 0;
    resolver 127.0.0.11 valid=30s;

    server {
        listen 80;
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;
        server_name http://proxy;
        proxy_connect_timeout 1200s;
        proxy_send_timeout 1200s;
        proxy_read_timeout 1200s;
        fastcgi_send_timeout 1200s;
        fastcgi_read_timeout 1200s;

        location ~ ^/(?!(api/)) {
            set $test_arch_archivistica_ui http://test_arch_archivistica_ui:80;
            rewrite /(.*) /$1 break;
            proxy_pass $test_arch_archivistica_ui;
        }

        location /api/notifications_alert {
            rewrite /api/notifications_alert/(.*) /$1 break;
            set $test_notifications_alert test_notifications_alert;
            proxy_pass http://$test_arch_notifications_notifications_alert;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
            proxy_set_header Host $host;
        }
    }

    upstream test_notifications_alert {
        server 127.0.0.2 backup;
        jdomain test_arch_notifications strict interval=10 port=3001;
    }

    server {
        listen 127.0.0.2:80;
        return 502 'An error.';
    }
}
and the way I connect is:
var private_socket = io('192.168.0.227/private-alert', {path: "/test_arch/api/notifications_alert/socket.io"});
private_socket.on('connect', function(msg) {
    console.log('Usuario ${msg} conectado')
});
The error I get is:
WebSocket connection to 'ws://192.168.0.227/test_arch/api/notifications_alert/socket.io/?EIO=3&transport=websocket' failed:
thanks in advance :D
Hey @nicholaschiasson, firstly thanks for the module here.
I'm looking through the code, trying to understand the reason for this block of code below:
ngx_upstream_jdomain/src/ngx_http_upstream_jdomain_module.c, lines 157 to 160 in bcf71ff
What's the reason that the number of peers and max_ips have to be the same? What I'm seeing is that if an upstream server has 2 separate ports, such as below:
upstream test {
    jdomain backend_1 port=1000 max_ips=2;
    jdomain backend_1 port=2000 max_ips=2;
}
Starting up nginx will trigger the num peerps does not match max_ips error message.
Changing the upstream to max_ips=1 doesn't work either, e.g.
upstream test {
    jdomain backend_1 port=1000 max_ips=1;
    jdomain backend_1 port=2000 max_ips=1;
}
will also yield the same num peerps does not match max_ips error message.
Trying to understand if this is an intended design constraint or unintended behavior. I'm happy to contribute & help out if you'd like.