Giter Club home page Giter Club logo

Comments (29)

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki are you able to reproduce it?

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

@Cellebyte I'm sorry for keeping you waiting. I will check this next week.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki not dramatic ^^ I‘am currently running with the QuickFix from the PR #209

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki are you now able to reproduce it? :D

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

@Cellebyte Actually, I can't. The dual-stack address pool is working fine in my kind cluster where ipFamily is dual.

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
networking:
  ipFamily: dual
  disableDefaultCNI: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

Could you elaborate on why vishvananda/netlink#576 could be related to this dual-stack problem, and the PR you sent us can be a solution?

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki as the issue creator elaborates the not specifically set cidr mask could lead to issues when assigning dual-stack ips via netlink as it wrongly guesses the subnetmask when you change the order of ip addresses to ipv6 first. It tries to squeeze v6 into v4 because we only pass the ip without a cidrmask to netlink. And netlink at least in el8 based distros will fail with the error messages above because the attribute size could get to big.

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

@Cellebyte

the not specifically set cidr mask could lead to issues when assigning dual-stack ips via netlink as it wrongly guesses the subnetmask when you change the order of ip addresses to ipv6 first.

The author of the vishvananda/netlink#576 reports that the mask passed into an address in 16-byte format derived from ParseIP causes the problem, and users can work around it by Mask: net.IPMask(net.ParseIP("255.255.255.0").To4()). He doesn't mention what you are saying if I understand it correctly.

Even if that causes the problem, netlink.NewIPNet which coil calls when it adds an IP to a link device sets a cidr mask accordingly. So I don't think that can cause the problem.

IPNet: netlink.NewIPNet(pn.hostIPv4),

https://github.com/vishvananda/netlink/blob/5e915e0149386ce3d02379ff93f4c0a5601779d5/netlink.go#L35

As for the PR you sent, I think your patch doesn't have any effects on IPs coil allocates. netutil.IPAdd returns an IP in 4-byte representation for IPv4 and in 16-byte representation for IPv6. Therefore ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't modify those IPs.

https://github.com/cybozu-go/netutil/blob/7b3ee6fd8aa95d6611bb03d0ef175cac2287f15f/calc.go#L11

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

It is not modifying the IPs but setting the correct mask on older netlink implementations.
Try a DualStack pool on an AlmaLinux System and you will see what I mean.

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

I mean ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't have any effects on the IPs after netutil.IPAdd. That always returns the same slice of bytes. Please elaborate on why/how this patch setting the correct mask on older netlink implementations, e.g. show us example code.

https://go.dev/play/p/MNcBHSK0s-m

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki the byte array length is not the issue. The mask itself is the problem because as the error message points out netlink things it is a smaller bytearray then it actually is. So it breaks in the C API. Will check if I find the corresponding code line in netlink.

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

Just to be clear, you could confirm that your patch fixed this problem in your environment, right? As I mentioned above, ipv4.Mask(net.CIDRMask(32, 32)) and ipv6.Mask(net.CIDRMask(128, 128)) won't have any effects on the IPs. Those always return the exact same slices of bytes. How can those affect the masks?

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki Yep I can confirm that it works now. I do not know. I think it is something outside the Go Code itself.

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

Does the problem occur without the patch? I can't imagine something outside the Go would fix a problem.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki without the patch I am not able to create any veth interfaces for my Pod.
netlink: 'coild': attribute type 2 has an invalid length. is the error message.

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

That's weird...

from coil.

ymmt2005 avatar ymmt2005 commented on July 26, 2024

@Cellebyte Are you running on a non x86 system such as ARM64?

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ymmt2005 nope x86_64 vm AlmaLinux8

from coil.

ymmt2005 avatar ymmt2005 commented on July 26, 2024

Hmm. I have no clue.
The proposed patch seems a no-op, as Yusuke is saying.

from coil.

ymmt2005 avatar ymmt2005 commented on July 26, 2024

I guessed that you could fix the problem because you rebuilt the Coil image.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ymmt2005 as the issue from the netlink library shows it could happen that it treats the IPAddress without a mask as /0 which could implement wrong routes.

from coil.

ymmt2005 avatar ymmt2005 commented on July 26, 2024

@Cellebyte That issue does not apply to Coil.
I'm aware of it, and Coil gives a proper netmask to each invocation of netlink.AddrAdd or netlink.RouteAdd.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ymmt2005 the question still remains. Why I was not able to use the current upstream version of coil with my AlmaLinux 8 Servers ^^

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

@Cellebyte
All I can say for now is that there is nothing to do with vishvananda/netlink#576 and your patch itself doesn't fix this problem. You might be able to confirm it, e.g. try a rebuilt coil without the patch.

Can you investigate what is actually happening in your environment? It would be helpful if you could do that since we don't normally use Alma Linux.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki yeah I could try rebuilding the image without the patch.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki you were right. I had now time to further debug the bug and it seems like after I disabled CGO I am able to successfully deploy pods without any issue.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki I found the issue in coil, can we merge the Pull Request?

from coil.

ysksuzuki avatar ysksuzuki commented on July 26, 2024

@Cellebyte Thank you for investigating it. Could you recreate the PR or squash and tidy up the commits? And I would appreciate it if you could share why disabling CGO fixed this problem.

from coil.

Cellebyte avatar Cellebyte commented on July 26, 2024

@ysksuzuki I added a little comment on the PR.

from coil.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.