Comments (29)
@ysksuzuki are you able to reproduce it?
from coil.
@Cellebyte I'm sorry for keeping you waiting. I will check this next week.
from coil.
@ysksuzuki not dramatic ^^ I‘am currently running with the QuickFix from the PR #209
from coil.
@ysksuzuki are you now able to reproduce it? :D
from coil.
@Cellebyte Actually, I can't. The dual-stack address pool is working fine in my kind cluster where ipFamily is dual.
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
networking:
ipFamily: dual
disableDefaultCNI: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
from coil.
Could you elaborate on why vishvananda/netlink#576 could be related to this dual-stack problem, and the PR you sent us can be a solution?
from coil.
@ysksuzuki as the issue creator elaborates the not specifically set cidr mask could lead to issues when assigning dual-stack ips via netlink as it wrongly guesses the subnetmask when you change the order of ip addresses to ipv6 first. It tries to squeeze v6 into v4 because we only pass the ip without a cidrmask to netlink. And netlink at least in el8 based distros will fail with the error messages above because the attribute size could get to big.
from coil.
the not specifically set cidr mask could lead to issues when assigning dual-stack ips via netlink as it wrongly guesses the subnetmask when you change the order of ip addresses to ipv6 first.
The author of the vishvananda/netlink#576 reports that the mask passed into an address in 16-byte format derived from ParseIP causes the problem, and users can work around it by Mask: net.IPMask(net.ParseIP("255.255.255.0").To4())
. He doesn't mention what you are saying if I understand it correctly.
Even if that causes the problem, netlink.NewIPNet which coil calls when it adds an IP to a link device sets a cidr mask accordingly. So I don't think that can cause the problem.
Line 331 in 4766966
https://github.com/vishvananda/netlink/blob/5e915e0149386ce3d02379ff93f4c0a5601779d5/netlink.go#L35
As for the PR you sent, I think your patch doesn't have any effects on IPs coil allocates. netutil.IPAdd returns an IP in 4-byte representation for IPv4 and in 16-byte representation for IPv6. Therefore ipv4.Mask(net.CIDRMask(32, 32))
and ipv6.Mask(net.CIDRMask(128, 128))
won't modify those IPs.
https://github.com/cybozu-go/netutil/blob/7b3ee6fd8aa95d6611bb03d0ef175cac2287f15f/calc.go#L11
from coil.
It is not modifying the IPs but setting the correct mask on older netlink implementations.
Try a DualStack pool on an AlmaLinux System and you will see what I mean.
from coil.
I mean ipv4.Mask(net.CIDRMask(32, 32))
and ipv6.Mask(net.CIDRMask(128, 128))
won't have any effects on the IPs after netutil.IPAdd
. That always returns the same slice of bytes. Please elaborate on why/how this patch setting the correct mask on older netlink implementations
, e.g. show us example code.
https://go.dev/play/p/MNcBHSK0s-m
from coil.
@ysksuzuki the byte array length is not the issue. The mask itself is the problem because as the error message points out netlink things it is a smaller bytearray then it actually is. So it breaks in the C API. Will check if I find the corresponding code line in netlink.
from coil.
Just to be clear, you could confirm that your patch fixed this problem in your environment, right? As I mentioned above, ipv4.Mask(net.CIDRMask(32, 32))
and ipv6.Mask(net.CIDRMask(128, 128))
won't have any effects on the IPs. Those always return the exact same slices of bytes. How can those affect the masks?
from coil.
@ysksuzuki Yep I can confirm that it works now. I do not know. I think it is something outside the Go Code itself.
from coil.
Does the problem occur without the patch? I can't imagine something outside the Go would fix a problem.
from coil.
@ysksuzuki without the patch I am not able to create any veth interfaces for my Pod.
netlink: 'coild': attribute type 2 has an invalid length.
is the error message.
from coil.
That's weird...
from coil.
@Cellebyte Are you running on a non x86 system such as ARM64?
from coil.
@ymmt2005 nope x86_64 vm AlmaLinux8
from coil.
Hmm. I have no clue.
The proposed patch seems a no-op, as Yusuke is saying.
from coil.
I guessed that you could fix the problem because you rebuilt the Coil image.
from coil.
@ymmt2005 as the issue from the netlink library shows it could happen that it treats the IPAddress without a mask as /0 which could implement wrong routes.
from coil.
@Cellebyte That issue does not apply to Coil.
I'm aware of it, and Coil gives a proper netmask to each invocation of netlink.AddrAdd
or netlink.RouteAdd
.
from coil.
@ymmt2005 the question still remains. Why I was not able to use the current upstream version of coil with my AlmaLinux 8 Servers ^^
from coil.
@Cellebyte
All I can say for now is that there is nothing to do with vishvananda/netlink#576 and your patch itself doesn't fix this problem. You might be able to confirm it, e.g. try a rebuilt coil without the patch.
Can you investigate what is actually happening in your environment? It would be helpful if you could do that since we don't normally use Alma Linux.
from coil.
@ysksuzuki yeah I could try rebuilding the image without the patch.
from coil.
@ysksuzuki you were right. I had now time to further debug the bug and it seems like after I disabled CGO I am able to successfully deploy pods without any issue.
from coil.
@ysksuzuki I found the issue in coil, can we merge the Pull Request?
from coil.
@Cellebyte Thank you for investigating it. Could you recreate the PR or squash and tidy up the commits? And I would appreciate it if you could share why disabling CGO fixed this problem.
from coil.
@ysksuzuki I added a little comment on the PR.
from coil.
Related Issues (20)
- addressblocks are not freed when scheduled on master nodes HOT 10
- Allow modifications to Egress destinations HOT 1
- AddressBlocks not auto-removed and not manually removable HOT 3
- Egress NAT Deployment does not rollout restart
- Egress traffics is disconnected for about 30 seconds when deleting an Egress Pod HOT 1
- Support Kubernetes 1.23 and update dependencies
- Coil-egress accidentally deletes a peer
- Enhance CNI delete delay implementation HOT 1
- Create PDB for Egress NAT pods
- Enhance the graceful termination for Egress NAT HOT 2
- Support Kubernetes 1.25 and update dependencies
- Fix the IP address allocation logic from AddressBlock HOT 3
- Support Kubernetes 1.26 and update dependencies
- useless replace usage left in go.mod HOT 1
- Coil egress has downtime due to the timing of updating coild and coil controller HOT 7
- Use encap-sport auto in FOU tunnel setting for coil-egress HOT 1
- Support Kubernetes 1.27 and update dependencies
- Remove the unnecessary code block for v1 migration
- CNI issue in kind-created cluster HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from coil.