Comments (5)

matthadf commented on June 28, 2024

Hi fdominik,

A couple of questions to begin with:

The document you have linked is from a very early release. Are you using a version of the Intelligence Analysis Platform or i2 Analyze?

Are you deploying a DAOD connector in the standard Intelligence Portal? If so, it might be easier to implement a custom version of the IExternalDataAdapter which you can specify in the ApolloServerSettingsDaodMandatory.properties file.
For more information about the IExternalDataAdapter, see http://ibm-i2.github.io/Analyze/docs/com/i2group/apollo/externaldata/adapter/IExternalDataAdapter.html

If not, it would be great to get more context about what you are working on and why you are trying to use the SubsetExplorationService.

Thanks, Matt

fdominik commented on June 28, 2024

Hi,
I am using i2 Analyze. I have already implemented the whole functionality using IExternalDataAdapter. I implemented the fullTextSearch, dumbbellSearch, networkSearch and search methods (in expand I throw UnsupportedOperationException, because we don't need that one).

I should probably introduce you to the context. We are implementing a DAOD connector which connects to ArangoDB, where we have millions of entities and links (they are the output of another system, which collects them). The problem arises when a user clicks "Browse" in the Intelligence Portal. I limited the queries to ArangoDB to 1000 per entity type, but when I return around 18k entities and links (we have 10 types of entities and 8 types of links), it still takes around 50 seconds to process everything (25 seconds for DAOD and 30 seconds for some other processing...).

So I was thinking that a better approach might be to return only around 50-100 entities by default, together with some indication that there are more entities (or of how many entities there are in total). The remaining entities would be queried only once a user clicks "Next page".

I found that this would be possible if I implemented IExternalDataSubsetExplorationService from scratch, but I don't know whether there is another way to resolve the issue of a huge number of entities in the data source.

TonyJon commented on June 28, 2024

Hi Dominik

We do not recommend implementing Browse for a standard DAOD connector for exactly the reasons/issues that you are trying to find a solution to. We expect that DAOD should be used for more focused and targeted searching of an external data source which in general will be more likely to return a more manageable result set.

Browse functionality can be disabled for the DAOD connector by setting ScsBrowseSupported = "false" in your data source element of the topology file and re-deploying i2 Analyze.

It is possible for you to implement your own IExternalDataSubsetExplorationService from scratch, but you cannot get us to recognise what you are implementing by simply adding a new entry into our ApolloServerSettingsDaodMandatory.properties file.

The other entries we have in that file only work because their use is supported by GUICE Modules that we have written in order to make the integration of externally implemented code easier for you. (GUICE is a dependency injection system that we use throughout i2 Analyze).

We do not have such automated wire-up available for all of our APIs.

I understand your thinking on this but if you try and write your own IExternalDataSubsetExplorationService you will have a new problem which is that your code does not know how to interpret the contents of the SubsetToken that is contained within the ExternalDataSubsetIdentifier that we pass to each method on that service.

The structure of this token is not published and we can change it at any point, so the only way to safely implement IExternalDataSubsetExplorationService is to also implement IExternalDataSubsetCreationService, as that is the service that generates the token that you receive.

These complexities are generally hidden from a developer of DAOD connectors. The i2 Analyze Portal has a specific implementation of the IExternalDataSubsetCreationService that is called when you run a Search or Browse. It generates an ExternalDataSubsetIdentifier and passes it through to the corresponding method on our implementation of the IExternalDataSubsetExplorationService. That in turn calls a special implementation of the IExternalDataSubsetLocator, which knows how to decode the SubsetToken within the ExternalDataSubsetIdentifier and then calls the correct method on the IExternalDataAdapter API that you have implemented to actually perform the operation on the external data source.

The other complexity you have is that the operations that occur after the original Browse on a DAOD connector usually all run against the Lucene index that is created when you do the original Browse on the external system, because they are passed an ExternalDataSubsetIdentifier that matches the subset that is already cached.

Your implementation would have to do this differently and always go back out to the external data source. This would create a separate Lucene index and data cache whenever you clicked something in the Browse tree view and would not be ideal.

Another issue you would face is that, other than when using Show Context, our system is set up to return either data or an error message to the Portal, not both. If you were to provide a very restricted set of results to the user, you would not also be able to tell them, without significant effort, that more results were available. They could, however, be taught that seeing just ten items in the original Browse is not all that they can get, and that more will be available when they move to a specific type view, but the standard Portal UI will not tell them that.

All in all, I think that even though it might be possible to do what you are suggesting, this is not the best route forward here and you would be better to explain how DAOD is intended to be used to the client and disable Browse.

Cheers

fdominik commented on June 28, 2024

Hi Tony, thank you for the detailed explanation. We will probably implement a configuration property that holds the maximum number of results for a Browse element. This setting can be configured differently for each customer (based on the hardware provided and the maximum time the analyst is willing to wait, e.g. 5s, 10s, 30s...).
When we reach the maximum number of elements, we will just throw MaximumResultsSizeExceededRuntimeException.
I am not sure whether throwing this exception is supported here; I need to study the SDK for this.
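
To make the idea concrete, here is a minimal sketch of such a cap. Everything in it is an assumption for illustration only: the property name `browse.maxResults`, the default of 1000, the `BrowseResultLimiter` class and the `TooManyResultsException` stand-in are all hypothetical and not part of the i2 Analyze SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that caps how many items a Browse query may return. */
final class BrowseResultLimiter {
    // Hypothetical property name; this is not an i2 Analyze setting.
    static final String MAX_RESULTS_KEY = "browse.maxResults";
    static final int DEFAULT_MAX_RESULTS = 1000;

    /** Stand-in for whichever exception the SDK turns out to support. */
    static class TooManyResultsException extends RuntimeException {
        TooManyResultsException(String message) {
            super(message);
        }
    }

    private final int maxResults;

    BrowseResultLimiter(Properties config) {
        // Fall back to the default when the customer has not set a value.
        String raw = config.getProperty(MAX_RESULTS_KEY);
        this.maxResults = (raw == null) ? DEFAULT_MAX_RESULTS : Integer.parseInt(raw.trim());
    }

    int getMaxResults() {
        return maxResults;
    }

    /** Copies items from the source and fails as soon as the cap is exceeded. */
    <T> List<T> collect(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        for (T item : source) {
            if (out.size() >= maxResults) {
                throw new TooManyResultsException(
                        "Browse returned more than " + maxResults
                                + " items; please use a more targeted search.");
            }
            out.add(item);
        }
        return out;
    }
}
```

Failing while iterating, rather than after loading everything, means the connector stops reading from the data source as soon as the limit is passed.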

Another issue we are facing is very similar, but this time with Visual Search. One of our customers has around 70-150 million records in the data source. If an analyst runs a poorly constructed visual query, they can end up with around 45000 records or even more. I expect that paging the results is again not supported, as with the Browse functionality, and that we should throw the MaximumResults exception again...?

Thanks a lot

TonyJon commented on June 28, 2024

Hi Dominik

We do have paging on all of our returned results (other than getContext and getLatestItems) between the i2A Server and the Client (Portal) but, as you have seen, that only happens after the results have been initially sent via the DAOD connector to the Lucene Index. You are correct in assuming that this first part is not paged and therefore there will be an initial delay before results start to appear in the client.

If you want to raise an error so that the client can see some text that lets them know that too many results have been requested, then you would do this by using our out-of-band error handling mechanism, which is based on our ExternalDataSourceRuntimeException.

You can see the JavaDoc for this on GitHub here
http://ibm-i2.github.io/Analyze/docs/com/i2group/apollo/externaldata/ExternalDataSourceRuntimeException.html

There are several constructors for this. For example, you can just add your message in directly, or you can catch and include another error as the throwable cause, and so on.

If you throw our ExternalDataSourceRuntimeException, it will get passed correctly through to the Portal, and the user can then see your message when they click on the error dialog's "More info" button.
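
As a rough sketch of the two styles mentioned above (message only, and message plus cause), something like the following could work. Note the assumptions: the `ExternalDataSourceRuntimeException` declared here is only a local stand-in so that the example compiles without the SDK, its two constructor signatures are guesses that should be checked against the JavaDoc linked above, and `DaodErrors` is a hypothetical helper class.

```java
import java.util.concurrent.Callable;

// Local stand-in so the sketch compiles without the i2 Analyze SDK.
// The real class is com.i2group.apollo.externaldata.ExternalDataSourceRuntimeException;
// the two constructors below are assumptions to be checked against its JavaDoc.
class ExternalDataSourceRuntimeException extends RuntimeException {
    ExternalDataSourceRuntimeException(String message) {
        super(message);
    }

    ExternalDataSourceRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper methods for a DAOD connector. */
final class DaodErrors {
    private DaodErrors() {
    }

    /** Message-only style: rejects a result set that is larger than the limit. */
    static void checkResultCount(int count, int max) {
        if (count > max) {
            throw new ExternalDataSourceRuntimeException(
                    "The query matched " + count + " items, which exceeds the limit of "
                            + max + ". Please refine the query.");
        }
    }

    /** Message-plus-cause style: wraps any failure and keeps the original as the cause. */
    static <T> T callDataSource(Callable<T> call) {
        try {
            return call.call();
        } catch (ExternalDataSourceRuntimeException e) {
            throw e; // already the intended type, so rethrow unchanged
        } catch (Exception e) {
            throw new ExternalDataSourceRuntimeException(
                    "The external data source query failed: " + e.getMessage(), e);
        }
    }
}
```

With the real SDK on the classpath you would delete the stand-in class and import the real one instead, adjusting the constructor calls if the signatures differ.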

Cheers
