Comments (5)
Yeah, I just hit this issue too, but on detach.
I assume the reason is that our pod has two mounted CSI volumes, and while the first one is detaching, the server is locked for a short time.
level=debug ts=2019-09-26T16:12:07.571214176Z component=grpc-server msg="handling request" req="volume_id:\"volume_id\" node_id:\"server_id_here\" "
level=info ts=2019-09-26T16:12:07.571564951Z component=api-volume-service msg="detaching volume from server" volume-id=volume_id server-id=server_id_here
level=info ts=2019-09-26T16:12:07.705827634Z component=api-volume-service msg="failed to detach volume" volume-id=volume_id err="cannot perform operation because server is locked (locked)"
level=error ts=2019-09-26T16:12:07.705971793Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to unpublish volume: cannot perform operation because server is locked (locked)"
@thcyron
I assume it should be handled here. Or do you have a better idea for fixing this bug for both the Detach and Attach actions?
csi-driver/driver/controller.go
Lines 196 to 207 in 8da058c
from csi-driver.
As far as I understand from this specification, the ABORTED code means that an operation has already been invoked and is still running for this volume. But the "server_locked" error doesn't mean an operation is already running for this volume; another volume may currently be attaching to or detaching from the server, causing it to be locked.
I have no experience in Go, but to me it looks like if we return the ABORTED code, the Publish/Unpublish action will not be retried, and the volume will be treated as detached.
I believe it should work as follows:
verify that the server isn't locked, waiting until it is unlocked -> unpublish/publish the volume -> if a "server locked" error is returned, go back to the beginning, with some meaningful retry limit of course.
A little test
I decided to test whether the error is reproducible using the hcloud CLI tool. For this purpose I created 1 server and 3 volumes in the empty project we use for testing (IP redacted because some time ago Hetzner assigned the IP of a previously destroyed instance to a new machine):
$ hcloud server list
ID        NAME                        STATUS    IPV4             IPV6                     DATACENTER
3360348   testing-volume-concurency   running   116.203.***.**   2a01:***:***:****::/64   nbg1-dc3
$ hcloud volume list
ID        NAME        SIZE    SERVER   LOCATION
3323542   test-vol1   10 GB   -        nbg1
3323544   test-vol2   10 GB   -        nbg1
3323545   test-vol3   10 GB   -        nbg1
Here is the snippet I used to verify that this error can occur due to concurrent operations running against the same instance:
hcloud volume attach --server <server_id> <vol_name_1> &
hcloud volume attach --server <server_id> <vol_name_2> &
hcloud volume attach --server <server_id> <vol_name_3> &
wait
And here is what I got in the output:
$ hcloud volume attach --server 3360348 test-vol1 &
[1] 8153
$ hcloud volume attach --server 3360348 test-vol2 &
[2] 8154
$ hcloud volume attach --server 3360348 test-vol3 &
[3] 8155
$ wait
hcloud: cannot perform operation because server is locked (locked)
hcloud: cannot perform operation because server is locked (locked)
[1] Exit 1 hcloud volume attach --server 3360348 test-vol1
[2]- Exit 1 hcloud volume attach --server 3360348 test-vol2
1s [====================================================================] 100%
Volume 3323545 attached to server testing-volume-concurency
[3]+ Done hcloud volume attach --server 3360348 test-vol3
Results: only test-vol3 got attached to the server; the attachments of test-vol1 and test-vol2 were rejected by Hetzner due to the "server is locked" error.
To keep this message short, I'll just mention that the same behavior is observed when detaching multiple volumes concurrently.
from csi-driver.
According to the code you posted, we already return error code aborted in case the server is locked:
case volumes.ErrLockedServer:
code = codes.Aborted
But that doesn’t seem to work. I would need to debug this.
from csi-driver.
I think you’re right. We don’t use the ABORTED error code correctly. I’ve created #63 to address this.
from csi-driver.
This was fixed with #84.
from csi-driver.