Giter Club home page Giter Club logo

Comments (17)

weizhouapache avatar weizhouapache commented on July 23, 2024 1

@trapgate thanks a lot.

per you suggestion, I add a line before libvirt.New(conn) to drop a byte.

conn.Read(make([]byte, 1))

it works fine now

from go-libvirt.

berrange avatar berrange commented on July 23, 2024 1

I've reproduced this with a go program that doesn't import go-libvirt. After the call to tls.Dial() connects successfully to the libvirt daemon, a call to conn.Read() with a 1-byte buffer will return 1 byte. Subsequent Read calls will block waiting for the daemon to send something.

This is either a bug in libvirt, or a bug in the go TLS library. The old version of go-libvirt masked this bug - it accidentally ignored that odd 1-byte packet.

This is not a bug, it is part of the libvirt RPC protocol when running over TLS.

https://gitlab.com/libvirt/libvirt/-/blob/1dddd444519717c785a9fd93f393c3313c87c342/src/rpc/virnetserverclient.c#L300

    /* Checks have succeeded.  Write a '\1' byte back to the client to
     * indicate this (otherwise the socket is abruptly closed).
     * (NB. The '\1' byte is sent in an encrypted record).
     */

And on the client side

https://gitlab.com/libvirt/libvirt/-/blob/1dddd444519717c785a9fd93f393c3313c87c342/src/rpc/virnetclient.c#L986

    /* At this point, the server is verifying _our_ certificate, IP address,
     * etc.  If we make the grade, it will send us a '\1' byte.
     */

...snip...

    len = virNetTLSSessionRead(client->tls, buf, 1);
    if (len < 0 && errno != ENOMSG) {
        virReportSystemError(errno, "%s",
                             _("Unable to read TLS confirmation"));
        goto error;
    }
    if (len != 1 || buf[0] != '\1') {
        virReportError(VIR_ERR_RPC, "%s",
                       _("server verification (of our certificate or IP "
                         "address) failed"));
        goto error;
    }

Yes, we can argue this extra byte is pretty much redundant, but its there in the protocol for coming on for 15 years, so we can't remove it due to backcompat.

A compliant client must read this single byte after TLS handshake is done and verify that it is the value 0x1.

from go-libvirt.

trapgate avatar trapgate commented on July 23, 2024

I think what's happening here is that the change from PR #86 has affected the listen() function in rpc.go. I'd suggest instrumenting that function to see what's happening. It may be that the call to pktlen() or extractHeader() is now returning an error, where it once returned a short read instead. That could cause the listening goroutine to exit, or just to ignore an important packet. This hasn't affected non-TLS connections, at least to localhost; those are still succeeding, at least for me.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

I've never even debugged a Go application, believe it or not. :-)

But, the debugger worked for me and luckily the code was pretty easy to track down.

All of this is in rpc.go as you surmise.

On line 158 of listen(), the packet returned length is 16MB. Which seems fishy. At some point, it then does a io.ReadFull(...) for that size buffer. Which never shows up.

Tracing into pktlen(...) I added comments for the return values, if that's useful:

func pktlen(r io.Reader) (uint32, error) {
	buf := make([]byte, constants.PacketLengthSize /* 4 bytes? */)

	// read exactly constants.PacketLengthSize bytes
	_, err := io.ReadFull(r, buf)
       /* tracing through ReadFull indicates this is 4 bytes, as would be expected */
	if err != nil {
		return 0, err
	}

	/* buf is 1,0,0,0 (index 0,1,2,3) */
	return binary.BigEndian.Uint32(buf), nil
}

from go-libvirt.

trapgate avatar trapgate commented on July 23, 2024

I agree that a returned packet length of 16MB is pretty fishy.

The listen() function works like this:

  1. Read 4 bytes from the input stream, decode that as an integer packet length.
  2. Read 24 bytes from the input stream, decode that as a header.
  3. Read (PacketLength - 28) bytes from the stream, that's the payload.

But, notice the error handling: if step 1 or 2 returns an error, the routine just does a continue. That's almost certainly not correct: if step 2 returns an error for some reason, then we will re-enter the loop and read the next 4 bytes as the length of the following packet, without having read the payload of the previous packet. I think something like this is happening in your case, so I'm curious whether either pktlen or extractHeader returned an error prior to the call where pktlen returned a 16MB packet.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

Running through the debugger, notes:

  • No errors at all.
  • Looks to be 1st packet ever exchanged

From listen():

  • listen = 16777216; the buffer in ptklen(...) is [1,0,0,0].
  • header h:
    • Program = 606077056
    • Version = 2248146944
    • Procedure = 16777216
    • Type = 1107296256
    • Serial = 16777216
    • Status = 16777216

... so the values are definitely funky.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

Ran the debug session again, pasted in the old loop into pktlen(...) and observed something interesting!

The loop in pktlen ran twice:

  1. Reads 1 byte with a value of 1. But the bytes read (1) is less than the cap(buf) (4).
  2. Reads 4 bytes with the values of 0,0,0,36. Bytes read (5) is larger than cap(buf).
  3. Returns length value of 36 which is much more sensible.

So... is packet length 4 or 5 bytes? Or is there a different part of the header (maybe a protocol version or something) hidden in there?

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

Finally, just in case this is useful, I created an Echo connection with a basic Read function like this:

func (e Echo) Read(b []byte) (n int, err error) {
	n, err = e.conn.Read(b)
	if n > 0 {
		fmt.Printf("b=")
		for i := 0; i < n; i++ {
			fmt.Printf("%02x ", b[i])
		}
		fmt.Printf("\n")
	}
	return n, err
}

The output is like this (byte sequences, each line is a Read):

$ go run -mod=vendor main.go
TCP connection to 'odin:16514' established
Libvirt client created
b=01 
b=00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 
^Csignal: interrupt

(it hung at the ^C)

Not certain why there is an initial 0x01 in the connection.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

As another experiment, I ran a simple program with a socket connection followed by the TLS connection. The echo output suggests that the initial 0x01 in the stream is the only one:

Connected to '/var/run/libvirt/libvirt-sock'
Libvirt client created
write: 00 00 00 38 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 0e 71 65 6d 75 3a 2f 2f 2f 73 79 73 74 65 6d 00 00 00 00 00 00 
read: 00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 
write: 00 00 00 38 20 00 80 86 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00 00 01 00 00 00 00 00 00 0e 71 65 6d 75 3a 2f 2f 2f 73 79 73 74 65 6d 00 00 00 00 00 00 
read: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 02 00 00 00 00 
Connected to libvirt
write: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 9d 00 00 00 00 00 00 00 03 00 00 00 00 
read: 00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 9d 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 3d 09 00 
Version: 4.0.0
write: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00 
read: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 04 00 00 00 00 


TLS connection to 'odin:16514' established
Libvirt client created
write: 00 00 00 38 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 0e 71 65 6d 75 3a 2f 2f 2f 73 79 73 74 65 6d 00 00 00 00 00 00 
read: 01 
read: 00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 
^Csignal: interrupt

If I then swallow that initial byte, it seems to work:

TLS connection to 'odin:16514' established
Libvirt client created
write: 00 00 00 38 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 00 00 00 00 01 00 00 00 00 01 00 00 00 00 00 00 0e 71 65 6d 75 3a 2f 2f 2f 73 79 73 74 65 6d 00 00 00 00 00 00 
read: 00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 42 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 
write: 00 00 00 38 20 00 80 86 00 00 00 01 00 00 00 01 00 00 00 00 00 00 00 02 00 00 00 00 01 00 00 00 00 00 00 0e 71 65 6d 75 3a 2f 2f 2f 73 79 73 74 65 6d 00 00 00 00 00 00 
read: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 01 00 00 00 01 00 00 00 02 00 00 00 00 
Connected to libvirt
write: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 9d 00 00 00 00 00 00 00 03 00 00 00 00 
read: 00 00 00 24 20 00 80 86 00 00 00 01 00 00 00 9d 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 3d 09 00 
Version: 4.0.0
write: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00 
read: 00 00 00 1c 20 00 80 86 00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 04 00 00 00 00 

Is this something with TLS I need to deal with? I can't figure out where that little bit of noise is coming from...

from go-libvirt.

trapgate avatar trapgate commented on July 23, 2024

It sure looks to me like that leftover byte comes from the TLS handshake, and should have been consumed by the TLS library before it completed. I did some googling to see if there were any bugs reported against the golang TLS library that looked like this, but I came up empty. What version of go are you using?

One approach would be to run tcpdump and observe the packets transferred during the TLS handshake. The last one would have to end in a byte with a value of 0x01, and that value would have to not be consumed by the TLS library.

Another possibility would be a bug in libvirtd, but such a bug would probably affect non-TLS connections too, so that seems unlikely.

Here's the documentation on the wire protocol libvirt uses: https://libvirt.org/internals/rpc.html.

Hope this helps!

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

I guess that's possible. But what confuses me is that prior to PR #86, everything works fine. However, I think my debugging might suggest the original code was buggy (instead of reading 4 bytes, that pktlen routine actually read 5). I'm sitting on the commit prior to that PR and I have observed no issues.

I don't think it's Libvirt, as virsh is happy with the TLS connection:

$ virsh -c qemu+tls://odin/system?pkipath=/home/rob/libvirt-cert
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1

virsh # list
 Id    Name                           State
----------------------------------------------------
 30    vm-e7ceede4-b0e6-4b3d-566e-8490c7168d22 paused

virsh # quit

As far as libraries, I'm using Go 1.12. Current version is:

$ go version
go version go1.12.9 linux/amd64

Originally, this was 1.12.6, but I updated based on the suggestion there may be a bug.

Is there sample code somewhere that I can review for making a correct TLS connection? Maybe I've got something wonked, but I don't see it.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

I tried Wireshark and was able to capture the sequence. There are 3 packets, the initiating packet, this 0x01 packet and then the expected packet. Uncertain what specifically to capture, but this packet comes after the initiation packet to Libvirt (includes the qemu:///system content) - so this appears to be all Libvirt and the first response after...

Frame 22: 96 bytes on wire (768 bits), 96 bytes captured (768 bits) on interface 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.1.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 16514, Dst Port: 45632, Seq: 1830, Ack: 1959, Len: 30
Secure Sockets Layer
    TLSv1.2 Record Layer: Application Data Protocol: Application Data
        Content Type: Application Data (23)
        Version: TLS 1.2 (0x0303)
        Length: 25
        Encrypted Application Data: 000000000000000118ff54e9f96357ef1c7cda384027ba8f...

Frame:
0000   00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00
0010   00 52 5f e7 40 00 40 06 db bc 7f 00 01 01 7f 00
0020   00 01 40 82 b2 40 8a 6b c9 08 ec f8 6b 83 80 18
0030   05 5d ff 46 00 00 01 01 08 0a 62 1f cd e8 f0 63
0040   3d 02[17/03 03/00 19/00 00 00 00 00 00 00 01 18  <== type, TLS 1.2, length, enc. packet
0050   ff 54 e9 f9 63 57 ef 1c 7c da 38 40 27 ba 8f 1b] <== continuation of encrypted packet

Decrypted:
0000   01

from go-libvirt.

trapgate avatar trapgate commented on July 23, 2024

I've reproduced this with a go program that doesn't import go-libvirt. After the call to tls.Dial() connects successfully to the libvirt daemon, a call to conn.Read() with a 1-byte buffer will return 1 byte. Subsequent Read calls will block waiting for the daemon to send something.

This is either a bug in libvirt, or a bug in the go TLS library. The old version of go-libvirt masked this bug - it accidentally ignored that odd 1-byte packet.

I should have some more time to look at this tomorrow.

from go-libvirt.

a2geek avatar a2geek commented on July 23, 2024

Good to hear I'm not crazy! (Although it would be easier if I just had a bug in my code.)

Let me know if you need me to test anything or look into something.

from go-libvirt.

benlemasurier avatar benlemasurier commented on July 23, 2024

It looks like @trapgate's investigation points to an upstream bug in libvirtd/tls. I'm going to close the issue as I don't think we should handle the problem here in go-libvirt. Please feel free to continue commenting.

from go-libvirt.

weizhouapache avatar weizhouapache commented on July 23, 2024

@a2geek @benlemasurier @trapgate

Gents, I face this issue recently.

I use latest go 1.15.6 and Ubuntu 20.04.
The following works

github.com/digitalocean/go-libvirt v0.0.0-20190626172931-4d226dd6c437

Any chance to test other OSes like centos ?
if same issue happens with some other OSes, it would be nice to find a workaround in go-libvirt.
Thanks !

from go-libvirt.

trapgate avatar trapgate commented on July 23, 2024

That older version of go-libvirt does avoid this particular bug, but it suffers from crashes when responses from libvirt are fragmented into multiple packets - that's the issue fixed by #86.

I have no reason to believe that this bug is OS-dependent.

The workaround to this problem would be to drain any packets that are received by the client after the tls.Dial call completes, before calling libvirt.New() with the tls connection.

from go-libvirt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.