Comments (24)

bmalinowsky commented on June 12, 2024

I did not go through the log in detail, and I also don't know exactly what loop you are running (the log excerpt doesn't seem to be the first read). It looks like an L4 problem: calimero sends immediate TL disconnects, meaning that your Destination won't accept any responses. On your second attempt the destination is not trying to disconnect, and therefore gets a positive ack from the remote endpoint (around second 18).

from calimero-core.

kgoderis commented on June 12, 2024

@bmalinowsky Do you know why calimero is sending the disconnects? And why would a Destination not accept any responses? I create my Destinations once, and do not tear them down in between sessions with the actors. Here is the actual source code:

    public byte[] readDeviceMemory(IndividualAddress address, int startAddress, int bytes, boolean hex) {

        boolean success = false;
        byte[] result = null;

        while (!success) {
            try {

                logger.debug("Trying to read {} bytes at memory location {} for {}",
                        new Object[] { bytes, startAddress, address });

                if (destinations.get(address) == null) {
                    this.createDestination(address);
                }

                result = mc.readMemory(destinations.get(address), startAddress, bytes);
                logger.debug("Reading {} bytes at memory location {} for {} yields {} bytes",
                        new Object[] { bytes, startAddress, address, result == null ? null : result.length });
                success = true;

            } catch (KNXTimeoutException e) {
                logger.error("An exception occurred while trying to read the memory for address '{}' : {}",
                        address.toString(), e.getMessage());
            } catch (KNXRemoteException e) {
                logger.error("An exception occurred while trying to read the memory for '{}' : {}", address.toString(),
                        e.getMessage());
            } catch (KNXDisconnectException e) {
                logger.error("An exception occurred while trying to read the memory for '{}' : {}", address.toString(),
                        e.getMessage());
            } catch (KNXLinkClosedException e) {
                logger.error("An exception occurred while trying to read the memory for '{}' : {}", address.toString(),
                        e.getMessage());
            } catch (KNXException e) {
                logger.error("An exception occurred while trying to read the memory for '{}' : {}", address.toString(),
                        e.getMessage());
            } catch (InterruptedException e) {
                logger.error("An exception occurred while trying to read the memory for '{}' : {}", address.toString(),
                        e.getMessage());
                e.printStackTrace();
            }
        }
        return result;

    }

The createDestination call merely does:

    private void createDestination(IndividualAddress address) {
        if (mc != null) {
            Destination destination = mc.createDestination(address, true, false, false);
            destinations.put(address, destination);
        }
    }

Am I right that 0xFFFF authorization is not really needed in this case? I tried to observe what ETS4 itself is doing when you query information from a device (I do not have an IP bridge that supports the Bus Monitor) and I saw that it authenticates with ReadProperties and ReadMemory variably.

Karel

bmalinowsky commented on June 12, 2024

Do you know why calimero is sending the disconnects?

No. As I said before, the log shows that disconnects are sent. Already line 2 of your log is a disconnect. I cannot know what happened before (at least not from your log).

And why would a Destination not accept any responses?

Because reading memory is L4 p2p connection-oriented. No connection, no accept.

Side-note (ignore if there is some other code foo going on I don't know about): on link-closed and remote exceptions there is no positive future outcome in continuing reading. Either you operate on illegal state (and for that I would then catch the RTEs), or the remote endpoint doesn't want you to read. Receiving memory data with a different length than requested is non-spec anyway.

bmalinowsky commented on June 12, 2024

Re authorization: non-authorized reads will execute on minimum access rights, i.e., as long as you receive data everything is ok. Otherwise, you'll get a KNXRemoteException. (Old devices might behave differently, though.)

(I'm not sure what 0xFFFF means, you mean the 4 byte key?)

kgoderis commented on June 12, 2024

Re authorization: non-authorized reads will execute on minimum access rights, i.e., as long as you receive data everything is ok. Otherwise, you'll get a KNXRemoteException. (Old devices might behave differently, though.)

(I'm not sure what 0xFFFF means, you mean the 4 byte key?)

0xffff is indeed the 4 byte key. I have not seen any other one on my KNX network so far, so my quick assumption is that it is a de facto standard key.

What I have seen however, is that after a bunch of misreads and other exceptions, the IP Gateway seems to be completely disconnected with reality, amongst other things reporting that it does not get Acks and so forth. I then have to reset the device (it is an MDT one).

kgoderis commented on June 12, 2024

Because reading memory is L4 p2p connection-oriented. No connection, no accept.

Side-note (ignore if there is some other code foo going on I don't know about): on link-closed and remote exceptions there is no positive future outcome in continuing reading. Either you operate on illegal state (and for that I would then catch the RTEs), or the remote endpoint doesn't want you to read. Receiving memory data with a different length than requested is non-spec anyway.

Ok - is the right way forward then to not cache Destinations, but rather .destroy() them and rebuild them at each usage (or at least on the link-closed and remote exceptions)? So far the exception handling was a copy/paste in order to be elaborate and catch them all, but it obviously needs refinement.

bmalinowsky commented on June 12, 2024

[...] after a bunch of misreads and other exceptions, the IP Gateway seems to be completely disconnected with reality [...]

but those should show up on L2 or in the tunneling protocol. L4 connections should not cause that; at least in the log the cEMI and tunneling stuff is fine, even though the memory read fails.

Ok - is the right way forward then to not cache Destinations [...]

You can cache the destinations for a single mgmt task, that's actually good. Accessing a remote endpoint ideally uses a single connect, send all the data (e.g., read/write memory, properties), then do a single disconnect. Idle destinations time out after 6 seconds without keep-alive, so if there is no special reason to keep them around, destroy them (also the remote endpoint will time out).
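A minimal sketch of the "single connect, all operations, single disconnect" pattern, for illustration only. `Transport`, `ManagementTask` and `RecordingTransport` are hypothetical names made up here and are not calimero API; with calimero, the close step would roughly correspond to Destination.destroy(). The explicit connect call is a simplification of what the library does on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the part of a management client used here; not calimero's API.
interface Transport {
    void connect(String device);
    byte[] readMemory(String device, int startAddress, int bytes);
    void disconnect(String device);
}

/** One management task: connect once, run all operations, disconnect once on close(). */
final class ManagementTask implements AutoCloseable {
    private final Transport transport;
    private final String device;

    ManagementTask(Transport transport, String device) {
        this.transport = transport;
        this.device = device;
        transport.connect(device);
    }

    byte[] readMemory(int startAddress, int bytes) {
        return transport.readMemory(device, startAddress, bytes);
    }

    @Override
    public void close() {
        // Single disconnect at the end of the task, even if an operation threw.
        transport.disconnect(device);
    }
}

/** Records every call so the connect/disconnect pairing can be inspected. */
final class RecordingTransport implements Transport {
    final List<String> calls = new ArrayList<>();

    @Override
    public void connect(String device) {
        calls.add("connect " + device);
    }

    @Override
    public byte[] readMemory(String device, int startAddress, int bytes) {
        calls.add("read " + device + " @" + startAddress);
        return new byte[bytes];
    }

    @Override
    public void disconnect(String device) {
        calls.add("disconnect " + device);
    }
}

class ManagementTaskDemo {
    public static void main(String[] args) {
        RecordingTransport transport = new RecordingTransport();
        try (ManagementTask task = new ManagementTask(transport, "1.1.11")) {
            task.readMemory(0x4001, 2);
            task.readMemory(0x4003, 2);
        }
        System.out.println(transport.calls);
    }
}
```

The try-with-resources block keeps the lifetime of the connection equal to the lifetime of the task, so nothing stays around long enough to run into the idle timeout.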

If the network link got closed, connection state is lost, and the mgmt client always gets detached and all destinations destroyed anyway.

It's not visible from your log, but I think you got a disconnect exception, meaning that both endpoints have disconnected (or the remote endpoint is still doing so). Network and devices are quite slow, allow them a little time to reach a consistent state. Trying to reconnect while there is still a disconnect msg on its way will certainly manifest itself like that.
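The advice so far (stop on link-closed and remote exceptions, and give both endpoints some time after a disconnect) could be sketched as a bounded retry loop. This is only a sketch under assumptions: the four exception classes are hypothetical stand-ins for calimero's checked exceptions so that the example compiles on its own, and treating timeouts and disconnects as the retryable cases is my reading of the discussion, not a rule taken from the library. The attempt limit and pause length are arbitrary.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-ins for calimero's checked exceptions, so the sketch compiles on its own.
class TimeoutEx extends Exception { TimeoutEx(String msg) { super(msg); } }
class DisconnectEx extends Exception { DisconnectEx(String msg) { super(msg); } }
class RemoteEx extends Exception { RemoteEx(String msg) { super(msg); } }
class LinkClosedEx extends Exception { LinkClosedEx(String msg) { super(msg); } }

final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op up to maxAttempts times. Only timeouts and disconnects are retried,
     * after a pause; remote and link-closed errors propagate on the first occurrence.
     */
    static <T> T run(Callable<T> op, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TimeoutEx | DisconnectEx e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // let both endpoints settle before trying again
                }
            }
        }
        throw last; // every attempt failed with a retryable error
    }
}

class BoundedRetryDemo {
    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = BoundedRetry.run(() -> {
            if (++calls[0] < 3) {
                throw new TimeoutEx("no response");
            }
            return "data";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Unlike a `while (!success)` loop, this gives up after a fixed number of attempts and never retries an error that cannot improve.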

kgoderis commented on June 12, 2024

[...] after a bunch of misreads and other exceptions, the IP Gateway seems to be completely disconnected with reality [...]
but those should show up on L2 or in the tunneling protocol. L4 connections should not cause that; at least in the log the cEMI and tunneling stuff is fine, even though the memory read fails.

They are indeed - they are not in the logs, it is a side effect currently

bmalinowsky commented on June 12, 2024

Ah ok, I misread -- thought it had the same cause :)

kgoderis commented on June 12, 2024

It's not visible from your log, but I think you got a disconnect exception, meaning that both endpoints have disconnected (or the remote endpoint is still doing so). Network and devices are quite slow, allow them a little time to reach a consistent state. Trying to reconnect while there is still a disconnect msg on its way will certainly manifest itself like that.

Ah! I was just in the process of testing things, and I regularly see the following sequence of tpdus:
80 - connect. that's good
42 02 40 01 or whatever, e.g. related to memory read or alike. good as well
a bunch of 81 - disconnects, but why a bunch? sometimes 4, sometimes 10+.
c2 - no clue

In an "operation", the 81 is also sent by the L4 (?) after the method (e.g. memory read) returned its byte[]. So, when this happens, and I have a fast sequence of operations (e.g. memory reads), it seems that the 81 of the previous operation is screwing up the next operation. In the logs I see that the next operation does not start with a tpdu 80, but simply tries to start to send the apdu, retrying until it triggers a KNXTimeoutException, and then on the next iteration, when things are cleared, a tpdu 80 is issued.

I know that the KNX bus is very very very timing sensitive. When I run ETS4 in a virtual machine on my Mac, just increasing the load of the machine may already generate troubles for ETS4 itself. I have an installation with 1 main and 4 lines, all connected with line couplers, and the IP gateway sitting on the main line. In total about 120 actors, and about 4000 GA's in use (many 12-fold actors). Just in order to be able to program a device, I have to shut down couplers in order to reduce the bus load on the main line, if not, ETS4 simply fails at the task.

kgoderis commented on June 12, 2024

then do a single disconnect

by that, I presume Destination.destroy()? So far, I have not triggered any disconnects; the tpdu 81s seem to be coming from the L4 layer itself.

bmalinowsky commented on June 12, 2024

If all of the disconnects you mention are logged as "send disconnect to x.y.z" (x.y.z being the same address you're talking to), then this is just the aftermath of being in disconnected state. On any received L4 data_connected it will do so. That's actually expected and not surprising.

That a sequence of quick sends does not issue a connect each time is also normal, because at that moment the destination still knows it's connected (it checks its state for that, you can also do that with .getState() ). Local L4 won't even send out a message if disconnected, but throws KNXDisconnectException.

So, the interesting question is, where the first disconnect comes from.

The disconnect that is sent after your method returned with a byte[] can locally only be caused by a timeout. What you can try to avoid sending that, is creating the destination with keep-alive set true. And see if that makes a difference. Otherwise the disconnect comes from the remote endpoint.

bmalinowsky commented on June 12, 2024

then do a single disconnect

by that, I presume Destination.destroy()

yes (after that, you have to create a new destination)

bmalinowsky commented on June 12, 2024

c2 are acks
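For reference, the first byte of the tpdus mentioned in this thread (80, 81, 42, c2) can be decoded with a small helper. This is a sketch based on my understanding of the KNX transport-layer control field (two fixed values for connect/disconnect, and a 4-bit sequence number in bits 5..2 for connected data, ack and nack); `Tpci` is a made-up class, not part of calimero, and the bit layout should be checked against the spec before relying on it.

```java
final class Tpci {
    private Tpci() {}

    /** Describes the first byte (transport control field) of a KNX TPDU. */
    static String describe(int firstByte) {
        int b = firstByte & 0xFF;
        if (b == 0x80) return "connect";
        if (b == 0x81) return "disconnect";
        int seq = (b >> 2) & 0x0F;               // 4-bit sequence number
        if ((b & 0xC3) == 0xC2) return "ack seq=" + seq;
        if ((b & 0xC3) == 0xC3) return "nack seq=" + seq;
        if ((b & 0xC0) == 0x40) return "data_connected seq=" + seq;
        return "other";
    }
}

class TpciDemo {
    public static void main(String[] args) {
        int[] seen = {0x80, 0x42, 0xC2, 0x81};
        for (int b : seen) {
            System.out.println(Integer.toHexString(b) + " -> " + Tpci.describe(b));
        }
    }
}
```

Only the control byte is interpreted here; the remaining bytes of a data_connected frame (the apdu) are left alone.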

kgoderis commented on June 12, 2024

On any received L4 data_connected it will do so. That's actually expected and not surprising.

So, just for my understanding: in L4, each time a piece of data is sent to the remote actor, it will by default disconnect? I.e., it is my side that triggers the first 81 being sent.

Does that also effectively mean that when doing multiple operations, you have to first wait until the Destination is finally in its disconnected state, before starting a new operation altogether?

It is a bit strange that after a "TL 192.168.0.10:3671: disconnected from 1.1.11", there are still disconnects being issued from the TL

I am currently delving into the calimero code base itself in order to understand all you write, but I sense the easiest solution, despite the overhead, is not to cache destinations, and on each operation create a new destination, and then destroy it to clean up things.

[I just tried my code with the keep alive and it does not change the outcome]

bmalinowsky commented on June 12, 2024

On any received L4 data_connected it will do so. That's actually expected and not surprising.

So, just for my understanding: in L4, each time a piece of data is sent to the remote actor, it will by default disconnect? I.e., it is my side that triggers the first 81 being sent.

no. if, and only if, an L4 connection was established, and for some reason, a disconnect happened (which is allowed to happen, btw), the L4 state machine will enforce that disconnect. Meaning, all subsequent received data_connected will be answered with a disconnect.

It is a bit strange that after a "TL 192.168.0.10:3671: disconnected from 1.1.11", there are still disconnects being issued from the TL

no, that's expected to happen. it would be wrong otherwise.

[I just tried my code with the keep alive and it does not change the outcome]

There are only so many ways a disconnect is triggered, being 1) failure to send data, 2) timeout, 3) reception from remote endpoint 4) receiving ack/nack/data while disconnected (we can ignore that). keep-alive eliminates the timeout (easy to verify), leaving send data or remote. in such case, send data always terminates with disconnect exception, also easy to verify.
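The behaviour described in this thread can be modelled as a tiny state machine, which also shows where "a bunch of 81" comes from: every frame that still arrives after the disconnect is answered with another disconnect. This is a toy model and not calimero's implementation; `L4Endpoint`, its methods and the `Cause` names are invented for illustration, and an IllegalStateException stands in for the KNXDisconnectException that the library throws on a local send while disconnected.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one connection-oriented L4 endpoint; not calimero code. */
final class L4Endpoint {
    enum State { DISCONNECTED, CONNECTED }

    /** The disconnect triggers listed above, excluding frames received while disconnected. */
    enum Cause { SEND_FAILED, TIMEOUT, REMOTE_DISCONNECT }

    private State state = State.DISCONNECTED;

    /** Frames this endpoint would put on the bus, in order. */
    final List<String> sent = new ArrayList<>();

    State state() {
        return state;
    }

    void connect() {
        sent.add("connect");
        state = State.CONNECTED;
    }

    /** Any of the causes moves the endpoint to DISCONNECTED. */
    void disconnect(Cause cause) {
        state = State.DISCONNECTED;
    }

    /** Local send: refused outright while disconnected, nothing goes out on the bus. */
    void sendData(String apdu) {
        if (state != State.CONNECTED) {
            throw new IllegalStateException("disconnected");
        }
        sent.add("data " + apdu);
    }

    /** Incoming ack, nack or data: answered with a disconnect while disconnected. */
    void onReceived(String frame) {
        if (state == State.DISCONNECTED) {
            sent.add("disconnect");
        }
        // While connected, the frame would be processed normally (omitted here).
    }
}

class L4EndpointDemo {
    public static void main(String[] args) {
        L4Endpoint endpoint = new L4Endpoint();
        endpoint.connect();
        endpoint.sendData("memory-read");
        endpoint.disconnect(L4Endpoint.Cause.TIMEOUT);
        endpoint.onReceived("data_connected"); // late response from the device
        endpoint.onReceived("ack");            // late ack
        System.out.println(endpoint.sent);
    }
}
```

In this model two late frames produce two disconnects, so a slow device that keeps answering after the local side has given up would produce a whole series of them.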

[...] on each operation create a new destination, and then destroy it to clean up things

yes, that's the easiest way

kgoderis commented on June 12, 2024

Btw, I tried to read out PID.IO_LIST on the Device Object Index, but it fails due to insufficient access rights. Probably due to the actor, but is there no way to instruct PropertyClient to authenticate before doing a getProperty()? My quick solution is to assume that all Object Indexes are the same on all devices (they are probably), so the Address Table sits at index 1 and so forth.

kgoderis commented on June 12, 2024

It is a bit strange that after a "TL 192.168.0.10:3671: disconnected from 1.1.11", there are still disconnects being issued from the TL
no, that's expected to happen. it would be wrong otherwise.

My mistake, I was mixing TL and the L layers. However, TL is still sending disconnects after it received a c2 ack (at least, it shows like that in the logs)

kgoderis commented on June 12, 2024

Btw, I tried to read out PID.IO_LIST on the Device Object Index, but it fails due to insufficient access rights. Probably due to the actor, but is there no way to instruct PropertyClient to authenticate before doing a getProperty()? My quick solution is to assume that all Object Indexes are the same on all devices (they are probably), so the Address Table sits at index 1 and so forth.

Ok - found this constructor in the code:

    public RemotePropertyServiceAdapter(final KNXNetworkLink link,
        final IndividualAddress remote, final PropertyAdapterListener l,
        final byte[] authorizeKey) throws KNXException, InterruptedException

kgoderis commented on June 12, 2024

@bmalinowsky Are there any concurrency limitations on the classes and methods in Calimero? When I try to unleash my code on two different actors in parallel, and even when I secure the calls to Calimero by making the methods of the "communications hub at my side" synchronized, after a while, for example, calls to PropertyClient.getProperty() trigger KNXDisconnectExceptions, even though the sequence of tpdus is correct, e.g. tpdu 80 executes well, and so a connection is supposed to be established.

kgoderis commented on June 12, 2024

@bmalinowsky Just a few questions for you, as I presume you have seen many KNX actor implementations: It seems that not all Actors adhere to the KNX specs as they should. I see some devices here in my network that do not implement the IO_LIST PID at all, but yet expose Address tables. Also, I have not a single device that implements Object Index 9 (mandatory, normally) that contains the configuration of the Group Objects. Have you seen that as well in your network?

bmalinowsky commented on June 12, 2024

Neither of those is mandatory; think older devices.

bmalinowsky commented on June 12, 2024

Closing, as it derailed completely into off-topic.

kgoderis commented on June 12, 2024

@bmalinowsky Are there any concurrency limitations on the classes and methods in Calimero? When I try to unleash my code on two different actors in parallel, and even when I secure the calls to Calimero by making the methods of the "communications hub at my side" synchronized, after a while, for example, calls to PropertyClient.getProperty() trigger KNXDisconnectExceptions, even though the sequence of tpdus is correct, e.g. tpdu 80 executes well, and so a connection is supposed to be established.

Fine, but this issue is a real one. Reverting to "direct" access to the underlying ManagementClient resolves it. There is something going on in the upper classes (PropertyClient, ...) with respect to concurrency.
