graphsense / graphsense-python

A Python client for the GraphSense REST interface.

License: MIT License

Python 93.26% Shell 0.16% Makefile 0.03% Mustache 6.55%

graphsense

graphsense-python's Introduction

Graphsense Website

This is based on https://github.com/nicolas-van/bootstrap-4-github-pages.

A Bootstrap 4 project for Github Pages and Jekyll.

A full Bootstrap 4 theme usable both on Github Pages and with a standalone Jekyll.
Recompiles Bootstrap from SCSS files, which makes it possible to customize Bootstrap's variables and use Bootstrap themes.
Full support of Bootstrap's JavaScript plugins.
Supports all features of Github Pages and Jekyll.

See the website for demonstration and documentation.

Development

With Docker installed, run make watch and point your browser to localhost:4000.

Statistics

Run make REST_ENDPOINT=http://example.com stats to fetch the latest Graphsense statistics and commit them.

graphsense-python's People

arberx dav009 ethicalsecurity-agency

graphsense-python's Issues

Add demo notebooks demonstrating Python interface

list_tags raises ApiTypeError

There seems to be an issue with list_tags:

with graphsense.ApiClient(configuration) as api_client:
    api_instance = tags_api.TagsApi(api_client) 
    try:
        # Returns address and entity tags associated with a given label
        api_response = api_instance.list_tags(label="sextortion", currency="btc")
        print(api_response)
    except graphsense.ApiException as e:
        print("Exception when calling TagsApi->list_tags: %s\n" % e)

ApiTypeError: Invalid type for variable 'received_data'. Required value type is list and passed type was dict at ['received_data']

Add bulk interface example to /examples

Add example code showcasing use of bulk interface for some common tasks, e.g., retrieving all entities for a list of addresses (represented as df column)

Request line too long in list_addresses

I received the following error when requesting more than 100 addresses in a single request:

Exception when calling AddressesApi: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Connection': 'close', 'Content-Type': 'text/html', 'Content-Length': '163'})
HTTP response body: <html>
  <head>
    <title>Bad Request</title>
  </head>
  <body>
    <h1><p>Bad Request</p></h1>
    Request Line is too large (4117 &gt; 4094)
  </body>
</html>

If there is a limit, the API should inform the user about the limit. Is there a code snippet that shows how to retrieve a large number of address objects, possibly with several list_addresses calls?
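
The error above suggests the server rejects request lines longer than 4094 bytes, so one possible client-side workaround is to batch addresses by the length they add to the query string rather than by a fixed count. The following is only a minimal sketch: `chunk_by_budget`, the budget of 3000 characters, and the per-id overhead are assumptions for illustration and are not part of graphsense-python.

```python
def chunk_by_budget(ids, budget=3000, overhead=1):
    """Yield lists of ids whose combined length stays within `budget`.

    `overhead` approximates the separator added per id in the query
    string (an assumption; the real encoding may need more characters).
    """
    chunk, used = [], 0
    for item in ids:
        cost = len(item) + overhead
        # Start a new chunk when adding this id would exceed the budget.
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
        chunk.append(item)
        used += cost
    if chunk:
        yield chunk


# Made-up ids of 34 characters each, purely for illustration.
example_ids = ["1" * 34 for _ in range(250)]
chunks = list(chunk_by_budget(example_ids))
print(len(chunks), [len(c) for c in chunks])
```

Each chunk could then be passed to a separate list_addresses call. The budget should leave room for the rest of the URL (host path, currency, other parameters).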

Update setup.py

NAME = "graphsense-python"
VERSION = "0.5.1.dev"  # remove "dev" when released / merged into master

setup(
    name=NAME,
    version=VERSION,
    description="GraphSense Python API",
    author="GraphSense Core Dev Team",
    author_email="[email protected]",
    url="https://github.com/graphsense/graphsense-python",
    keywords=["cryptoassets", "graphsense", "openAPI"],
    python_requires=">=3.6",
    install_requires=REQUIRES,
    packages=find_packages(exclude=["test", "tests"]),
    include_package_data=True,
    long_description="""\
    A Python client for the GraphSense REST interface.
    """
)

Change from travis to github actions

...for continuous integration tests.

Fix contact mail address in setup.py

It's currently a default one of the openapi generator.

gs-frontend-lib: Create a utilities lib around gs-python

The gs-python repository is a simple wrapper around the gs-rest interface. This is often not very convenient to use and results in a lot of boilerplate code in notebooks etc.

Shipping certain utility functions with a well defined interface with gs-python could vastly improve the workflow for notebook users or other people that want to integrate gs in their tools.

It is also a great way to ship cool new features to users.

Avoid conversion of ints to floats

Since the response type of the bulk is open json, the generated python client can only guess on deserialization and always chooses float as target type rather than int.
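
Until the deserialization itself is fixed, a client-side workaround might be to post-process the returned data and turn whole-number floats back into ints. This is a rough sketch, not the library's behavior: `restore_ints` and the field names in the sample row are hypothetical, and the approach also converts genuinely fractional fields that happen to hold a whole value.

```python
def restore_ints(value):
    """Recursively turn floats with no fractional part back into ints."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {k: restore_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [restore_ints(v) for v in value]
    return value


# Hypothetical row as it might come back from a bulk call.
row = {"no_incoming_txs": 12.0, "balance": {"value": 5000.0, "eur": 1.25}, "ids": [1.0, 2.0]}
cleaned = restore_ints(row)
print(cleaned)
```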

Move code to regenerate the python client into this repo

The code to reproduce the client classes based on the openapi spec is currently maintained in an internal (private) repo (https://github.com/iknaio/kong/) which makes development cumbersome.

Cleanup stats endpoint

Currently the endpoint delivers lots of legacy information and requires cleanup. I propose the following structure:

{
'version': 'r.0.5.1',
'request_timestamp': '2021-11-09 13:02:20',
'ledgers': [
    {'name': 'btc',
     'no_address_relations': ...,
     ... },
    {...}
],
}

Discrepancies in the logic for querying on Attribution Tags and possible bug in "list_tags_by_address"

The AddressesApi still mentions the option to return tags in the query result.
It says "Get an address, optionally with tags # noqa: E501"
However, the 'include_tags = True' Keyword Argument has disappeared.

Instead, it seems we are meant to use the "list_tags_by_address" but for me this throws an exception (See 3 below)

However:

1. The option to return tags is still included in the API 'list_address_neighbours' via the keyword argument 'include_labels = True'. I have not yet tested whether this returns a "best_address_tag" or an exhaustive list of tags.
2. The API 'get_address_entity' returns one (and only one) tag (this isn't optional, it always does). The result is in "best_address_tag". This raises another question: from a judicial investigation perspective, we would be better off with a list of tags and tags_sources so that we can conduct our own confirmations and cross-checks on the different sources.
3. For me, the API 'list_tags_by_address' unfortunately isn't functional and throws an exception due to a missing positional argument 'active' in "model_utils.py", line 1753, in get_allof_instances.

Error:
Invalid inputs given to generate an instance of 'Tag'. The input data was invalid for the allOf schema 'Tag' in the composed schema 'AddressTag'. Error=__init__() missing 1 required positional argument: 'active'
Traceback (most recent call last):
File "/Users/xxxx/GraphSense/graphsense/model_utils.py", line 1750, in get_allof_instances
allof_instance = allof_class(**model_args, **constant_args)

Be more verbose on bulk retrieval

Add another parameter, verbose, to the bulk method that displays an inline counter of received lines.
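
One way such a counter could work is a small generator that passes rows through unchanged and rewrites a running count on a single line. This is only an illustrative sketch: `with_counter` is a hypothetical helper, not an existing part of the bulk interface, and the sample rows are made up.

```python
import io
import sys


def with_counter(rows, verbose=False, out=sys.stderr):
    """Yield rows unchanged; if verbose, rewrite a running count in place."""
    count = 0
    for row in rows:
        count += 1
        if verbose:
            # "\r" returns to the line start so the counter updates inline.
            out.write(f"\rreceived {count} rows")
            out.flush()
        yield row
    if verbose and count:
        out.write("\n")


# Demonstration with an in-memory stream instead of the terminal.
buffer = io.StringIO()
rows = list(with_counter(iter([{"a": 1}, {"a": 2}, {"a": 3}]), verbose=True, out=buffer))
```

Writing to stderr by default keeps the counter out of any data piped from stdout.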

list_entity_neighbors raises Api Type Error

ApiTypeError: Invalid type for variable 'id'. Required value type is str and passed type was int at ['received_data']['neighbors'][0]['id']

Example

with graphsense.ApiClient(configuration) as api_client:
    api_instance = entities_api.EntitiesApi(api_client)
    try:
        # Retrieve the entity object (including tags)
        entity_neighbors_obj = api_instance.list_entity_neighbors('btc', 203719034, direction='in')
        # pprint(entity_neighbors_obj)
    except graphsense.ApiException as e:
        print("Exception when calling EntitiesApi: %s\n" % e)

Bulk: Add `num_pages` documentation

timeout/protocol error on large entities

This issue happens with the python client, not with curl:

Perhaps useful to know that I am encountering another issue with GraphSense that has been really difficult to work around so far.

I am trying to pull BTC transactions for specific entities through the bulk API. However, certain clusters have so many incoming/outgoing transactions (150k+) that GraphSense times out or gives the following error: ProtocolError: ('Connection broken: IncompleteRead(240923 bytes read)', IncompleteRead(240923 bytes read)).

Even when I limit my query to 1 entity for these bigger entities, it times out. Hence I don't know how to further optimise/decrease my query to have the output I like.

I am encountering the issue with the following entity IDs:
471682996, 106872794, 431705537

Feature Request: All Tx from a specific date

Hi!

I would like to have a feature that displays all transactions from a specific day. Is there any chance of getting it?
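
As an interim idea, transactions that have already been retrieved could be filtered on the client side by converting the day into a UTC timestamp range. The sketch below assumes each transaction is a dict with a Unix `timestamp` field; `day_bounds`, `txs_on_day`, and the sample data (including `tx_hash`) are hypothetical and do not describe an existing endpoint.

```python
from datetime import datetime, timedelta, timezone


def day_bounds(day):
    """Return (start, end) Unix timestamps for a UTC day given as YYYY-MM-DD."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def txs_on_day(txs, day):
    """Keep transactions whose timestamp falls in [start, end) of that day."""
    start, end = day_bounds(day)
    return [tx for tx in txs if start <= tx["timestamp"] < end]


sample = [
    {"tx_hash": "a", "timestamp": 1636416000},  # 2021-11-09 00:00:00 UTC
    {"tx_hash": "b", "timestamp": 1636502399},  # 2021-11-09 23:59:59 UTC
    {"tx_hash": "c", "timestamp": 1636502400},  # 2021-11-10 00:00:00 UTC
]
selected = txs_on_day(sample, "2021-11-09")
```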

Client generation via Docker fails

URL=https://github.com/graphsense/graphsense-openapi/blob/develop/graphsense.yaml
➜ graphsense-python git:(develop) ✗ docker run --rm \
    -v "${PWD}:/build" \
    openapitools/openapi-generator-cli:v5.1.1 \
    generate -i "$URL" \
    -g python \
    -o /build
[main] WARN i.s.parser.util.DeserializationUtils - Error snake-parsing yaml content
io.swagger.parser.util.DeserializationUtils$SnakeException: Exception safe-checking yaml content (maxDepth 2000, maxYamlAliasesForCollections 2147483647)

Transparent handling of address chunks

At the moment, list_addresses supports retrieval of up to 1000 (?) addresses per request, which means larger sets must first be divided into chunks on the client side. See the example below.

Transparent handling of chunks would certainly improve API usability.

import graphsense
from graphsense.api import addresses_api


def divide_chunks(items, n):
    """Yield successive n-sized chunks from items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


# df, configuration, and CURRENCY are assumed to be defined earlier.
address_chunks = list(divide_chunks(df['address'].values.tolist(), 1000))
address_details = {}  # maps address string -> address object

with graphsense.ApiClient(configuration) as api_client:
    api_instance = addresses_api.AddressesApi(api_client)
    try:
        # Iterate over all chunks
        for chunk in address_chunks:
            # Retrieve all addresses of this chunk in one request
            addresses = api_instance.list_addresses(CURRENCY, ids=chunk)
            for a in addresses.addresses:
                address_details[a.address] = a
            print("Finished chunk")
    except graphsense.ApiException as e:
        print("Exception when calling AddressesApi: %s\n" % e)

Request of ETH addresses not working

The following code snippet works fine for UTXO ledgers, but not for Ethereum:

with graphsense.ApiClient(configuration) as api_client:
    api = addresses_api.AddressesApi(api_client)
    try:
        entity = api.get_address_entity('eth', '0xFA8E3920daF271daB92Be9B87d9998DDd94FEF08')
        print(entity)
    except graphsense.ApiException as e:
        print(e)

Support for pandas dataframes

The API currently returns data as arrays of JSON objects. On the client side, people often work with pandas dataframes and must convert these arrays. Since this is repetitive, it would be great if the API could optionally return the retrieved data as a flattened dataframe. Here is the code I am using for the conversion:

from datetime import datetime

import pandas as pd

# address_details is assumed to map address strings to retrieved address objects.
lst = []
cols = ['address', 'total_received', 'balance', 'first_tx', 'last_tx', 'btc_senders', 'btc_recipients']
for a in address_details.values():
    lst.append([a['address'],
                a['total_received']['eur'],
                a['balance']['eur'],
                datetime.utcfromtimestamp(a['first_tx']['timestamp']).strftime('%Y-%m-%d %H:%M:%S'),
                datetime.utcfromtimestamp(a['last_tx']['timestamp']).strftime('%Y-%m-%d %H:%M:%S'),
                a['in_degree'],
                a['out_degree']
               ])

df1 = pd.DataFrame(lst, columns=cols)
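
A more generic variant of this conversion could flatten every nested field into a dotted column name, so the column list does not have to be maintained by hand. The sketch below assumes the records are plain nested dicts; `flatten` is a hypothetical helper and the sample record is made up. It deliberately uses only the standard library; the resulting list of dicts could then be handed to pd.DataFrame.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Made-up record shaped like the address objects used above.
records = [
    {"address": "addr1",
     "balance": {"eur": 1.5, "value": 100},
     "first_tx": {"timestamp": 1636416000}},
]
rows = [flatten(r) for r in records]
print(rows[0])
```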

Bulk request address entities

At the moment one can use addresses_api.get_address_entity to map an address to an entity.

Having a bulk interface for that would be very useful. Maybe:

/{currency}/addresses/entities?id=[...]
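
Until such an endpoint exists, the mapping can be approximated on the client side by looping over the addresses and collecting failures separately, so that one bad address does not abort the whole run. This is a minimal sketch under stated assumptions: `map_addresses_to_entities` is a hypothetical helper, and `lookup` stands for any callable wrapping a single-address call such as get_address_entity. The fake lookup below exists only to make the example runnable.

```python
def map_addresses_to_entities(addresses, lookup):
    """Call lookup(address) per address; return (results, errors) dicts."""
    entities, errors = {}, {}
    for address in addresses:
        try:
            entities[address] = lookup(address)
        except Exception as exc:  # keep going and record the failure
            errors[address] = exc
    return entities, errors


def fake_lookup(address):
    """Stand-in for a real API call; fails for one made-up address."""
    if address == "bad":
        raise ValueError("unknown address")
    return {"entity": len(address)}


entities, errors = map_addresses_to_entities(["abc", "bad", "abcde"], fake_lookup)
```

With the real client, `lookup` might be something like `lambda a: api.get_address_entity('btc', a)`, which issues one request per address and is therefore much slower than a true bulk endpoint would be.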