Comments (12)

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Tested on kind. The issue is reproducible.

from deployments-k8s.

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Logs from my local run:

nsc-init.txt
nsc.txt

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

@edwarnicke Can we consider this issue ASAP?

edwarnicke avatar edwarnicke commented on June 23, 2024

@denis-tingaikin Yes

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Root cause:

  1. SPIRE issues a certificate that is valid for 1h.
  2. NSM schedules a refresh after 1h * 1/3.
  3. On refresh, SPIRE updates the certificates for all applications.
  4. At the moment of the refresh request, nsmgr has a new certificate from SPIRE, but the authInfo from gRPC still holds the old certificate from step 1.
  5. nsmgr updates the token with the new certificate.
  6. The client cannot validate the token from nsmgr, because the authInfo from gRPC still holds the old certificate from step 1. (The failure happens here.)

I haven't found a good solution for this issue yet; I've started looking into the gRPC source code.

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Tested two workarounds today:

  1. https://golang.org/pkg/crypto/tls/#RenegotiationSupport -- it did not help
  2. Removing connection caching in connect -- it works

Still looking for other solutions.

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

@edwarnicke

I've asked the SPIRE folks about the issue and got the following answer:

Andrew Harding 14 hours ago
This is expected. gRPC will reuse the existing connection when you issue RPCs unless you redial. Since no new TLS handshake takes place, the new client credential is never communicated.

Andrew Harding 14 hours ago
The x509source returns a channel from Updated() that callers can use to know when the SVID has been updated so they can re-establish a connection with the new credential.

Question: Can we modify the connect chain elements to wait for an SVID update and then re-dial?

Note: we can just pass an option with a channel to wait on, so that we don't depend on SPIRE functions.

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Currently we have the following options to fix the issue:

  1. Redial on SVID update, as the SPIRE folks suggested in #1929 (comment).
  2. Remove the "last token signed" policy.
  3. Keep and use the first certificates for the client and server when generating tokens.
  4. Your option.

Option 1 looks good to me.

@edwarnicke Please share your thoughts on these options.

edwarnicke avatar edwarnicke commented on June 23, 2024

I'm curious... are they saying that gRPC won't close existing connections whose TLS certificate expired after the connection was established?

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

Yes, as I understand it, the handshake is done once per dial.

edwarnicke avatar edwarnicke commented on June 23, 2024

@denis-tingaikin It looks like we need to do something that involves option 1 above... but let's try to keep it simple and natural :)

denis-tingaikin avatar denis-tingaikin commented on June 23, 2024

The root cause is fixed in networkservicemesh/sdk#1005.

But I found that the issue can still be reproduced because of unstable healing; it reproduces periodically.
I tested the fix networkservicemesh/sdk#1005 without heal, and it works fine in 100% of cases.
