graphsense-python's Introduction

Graphsense Website

This is based on https://github.com/nicolas-van/bootstrap-4-github-pages.

A Bootstrap 4 project for Github Pages and Jekyll.

  • A full Bootstrap 4 theme usable both on Github Pages and with a standalone Jekyll.
  • Recompiles Bootstrap from SCSS files, which lets you customize Bootstrap's variables and use Bootstrap themes.
  • Full support of Bootstrap's JavaScript plugins.
  • Supports all features of Github Pages and Jekyll.

See the website for demonstration and documentation.

Development

With Docker installed, run make watch and point your browser to localhost:4000.

Statistics

Run make REST_ENDPOINT=http://example.com stats to fetch the latest Graphsense statistics and commit them.

graphsense-python's People

Contributors

matteoromiti, myrho, nkoorty, soad003

graphsense-python's Issues

list_tags raises ApiTypeError

There seems to be an issue with list_tags:

import graphsense
from graphsense.api import tags_api

# configuration is assumed to be a graphsense.Configuration set up beforehand
with graphsense.ApiClient(configuration) as api_client:
    api_instance = tags_api.TagsApi(api_client)
    try:
        # Returns address and entity tags associated with a given label
        api_response = api_instance.list_tags(label="sextortion", currency="btc")
        print(api_response)
    except graphsense.ApiException as e:
        print("Exception when calling TagsApi->list_tags: %s\n" % e)

ApiTypeError: Invalid type for variable 'received_data'. Required value type is list and passed type was dict at ['received_data']

Add bulk interface example to /examples

Add example code showcasing use of the bulk interface for some common tasks, e.g., retrieving all entities for a list of addresses (represented as a DataFrame column).

Request line too long in list_addresses

I received the following error when requesting more than 100 addresses in a single request:

Exception when calling AddressesApi: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Connection': 'close', 'Content-Type': 'text/html', 'Content-Length': '163'})
HTTP response body: <html>
  <head>
    <title>Bad Request</title>
  </head>
  <body>
    <h1><p>Bad Request</p></h1>
    Request Line is too large (4117 &gt; 4094)
  </body>
</html>

If there is a limit, the API should inform the user about the limit. Is there a code snippet that shows how to retrieve a large number of address objects, possibly with several list_addresses calls?
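One possible client-side workaround is to batch addresses by the length of the encoded request line instead of by a fixed count. The sketch below is library-independent and only illustrative: the 4094-byte limit comes from the error message above, while the helper name batch_by_request_line, the query parameter name ids, and the assumed shape of the request line are assumptions; the request line the client actually builds may differ.

```python
from urllib.parse import urlencode


def batch_by_request_line(ids, base_path, max_bytes=4094, param="ids"):
    """Yield lists of ids whose encoded request line stays within max_bytes.

    Assumption: the request line looks like
    'GET <base_path>?<param>=a&<param>=b HTTP/1.1'.
    """
    overhead = len(f"GET {base_path}? HTTP/1.1".encode())
    batch, size = [], overhead
    for item in ids:
        # Cost of one more '<param>=<value>' pair, plus '&' if not the first.
        cost = len(urlencode({param: item}).encode()) + (1 if batch else 0)
        if batch and size + cost > max_bytes:
            yield batch
            batch, size = [], overhead
            cost -= 1  # the first pair of a new batch needs no '&'
        batch.append(item)
        size += cost
    if batch:
        yield batch
```

Each yielded batch could then be passed to a separate list_addresses call and the results merged by the caller.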

Update setup.py

from setuptools import setup, find_packages

NAME = "graphsense-python"
VERSION = "0.5.1.dev"  # remove "dev" when released / merged into master

setup(
    name=NAME,
    version=VERSION,
    description="GraphSense Python API",
    author="GraphSense Core Dev Team",
    author_email="[email protected]",
    url="https://github.com/graphsense/graphsense-python",
    keywords=["cryptoassets", "graphsense", "openAPI"],
    python_requires=">=3.6",
    install_requires=REQUIRES,
    packages=find_packages(exclude=["test", "tests"]),
    include_package_data=True,
    long_description="""\
    A Python client for the GraphSense REST interface.
    """
)

gs-frontend-lib: Create a utilities lib around gs-python

The gs-python repository is a simple wrapper around the gs-rest interface. This is often not very convenient to use and results in a lot of boilerplate code in notebooks etc.

Shipping certain utility functions with a well-defined interface alongside gs-python could vastly improve the workflow for notebook users and for others who want to integrate gs into their tools.

It is also a great way to ship cool new features to users.

Avoid conversion of ints to floats

Since the response type of the bulk interface is open JSON, the generated Python client can only guess the target type on deserialization and always chooses float rather than int.
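Until the client handles this itself, integral floats can be converted back on the caller's side. The sketch below is one possible workaround and not part of the generated client: restore_ints is a hypothetical helper that walks a decoded JSON structure and turns floats without a fractional part into ints. It cannot tell a genuine float such as 1.0 from a converted int, so it is best applied only to fields known to hold integers.

```python
def restore_ints(value):
    """Recursively convert integral floats (e.g. 3.0) to ints (3)."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {key: restore_ints(item) for key, item in value.items()}
    if isinstance(value, list):
        return [restore_ints(item) for item in value]
    # Strings, bools, None, and non-integral floats are returned unchanged.
    return value
```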

Cleanup stats endpoint

Currently the endpoint delivers a lot of legacy information and requires cleanup. I propose the following structure:

{
'version': 'r.0.5.1',
'request_timestamp': '2021-11-09 13:02:20',
'ledgers': [
    {'name': 'btc',
     'no_address_relations': ...,
    ... },
    {...}
]
}


Discrepancies in the logic for querying on Attribution Tags and possible bug in "list_tags_by_address"

The AddressesApi still mentions the option to return tags in the query result.
It says "Get an address, optionally with tags # noqa: E501"
However, the 'include_tags = True' Keyword Argument has disappeared.

Instead, it seems we are meant to use the "list_tags_by_address" but for me this throws an exception (See 3 below)

However,

  1. this option to return tags is still included in the API 'list_address_neighbours' via the Keyword Argument 'include_labels = True'. I have not yet tested if this returns a "best_address_tag" or an exhaustive list of tags??
  2. the API 'get_address_entity' returns one (and only one) tag (this isn't optional, it always does). The result is in "best_address_tag". Which raises another question: from a judicial investigation perspective, we would be better off with a list of tags and tags_sources so we may conduct our own confirmations and cross-checks on the different sources.
  3. for me the API 'list_tags_by_address' unfortunately isn't functional and throws an exception due to a missing positional argument 'active' in "model_utils.py", line 1753, in get_allof_instances.
    Error:
    Invalid inputs given to generate an instance of 'Tag'. The input data was invalid for the allOf schema 'Tag' in the composed schema 'AddressTag'. Error=__init__() missing 1 required positional argument: 'active'
    Traceback (most recent call last):
    File "/Users/xxxx/GraphSense/graphsense/model_utils.py", line 1750, in get_allof_instances
    allof_instance = allof_class(**model_args, **constant_args)

list_entity_neighbors raises ApiTypeError

ApiTypeError: Invalid type for variable 'id'. Required value type is str and passed type was int at ['received_data']['neighbors'][0]['id']

Example

with graphsense.ApiClient(configuration) as api_client:
    api_instance = entities_api.EntitiesApi(api_client)
    try:
        # Retrieve the entity object (including tags)
        entity_neighbors_obj = api_instance.list_entity_neighbors('btc', 203719034, direction='in')
        # pprint(entity_neighbors_obj)
    except graphsense.ApiException as e:
        print("Exception when calling EntitiesApi: %s\n" % e)

timeout/protocol error on large entities

This issue happens with the python client, not with curl:

Perhaps useful to know that I am encountering another issue with GraphSense that has been really difficult to work around so far.

I am trying to pull BTC transactions for specific entities through the bulk API. However, certain clusters have so many incoming/outgoing transactions (150k+) that GraphSense times out or gives the following error: ProtocolError: ('Connection broken: IncompleteRead(240923 bytes read)', IncompleteRead(240923 bytes read)).

Even when I limit my query to 1 entity for these bigger entities, it times out. Hence I don't know how to further optimise/decrease my query to have the output I like.

I am encountering the issue with the following entity IDs:
471682996, 106872794, 431705537
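If the endpoint in question supports paging, fetching one small page at a time keeps each response short. The following is a sketch only: fetch_page is a hypothetical callable standing in for whichever client call is used, and its contract (it takes a page token, returns a pair of items and next page token, and signals the last page with None) is an assumption for illustration, not confirmed behaviour of the API.

```python
def iter_pages(fetch_page, max_pages=None):
    """Yield items page by page until fetch_page reports no further page.

    fetch_page(page_token) is assumed to return (items, next_page_token),
    with next_page_token set to None on the last page.
    """
    page, fetched = None, 0
    while max_pages is None or fetched < max_pages:
        items, page = fetch_page(page)
        fetched += 1
        yield from items
        if page is None:
            break
```

Because the function is a generator, results can be processed or written to disk as they arrive instead of being held in one large response.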

Client generation via Docker fails

URL=https://github.com/graphsense/graphsense-openapi/blob/develop/graphsense.yaml
➜ graphsense-python git:(develop) ✗ docker run --rm \
  -v "${PWD}:/build" \
  openapitools/openapi-generator-cli:v5.1.1 \
  generate -i "$URL" \
  -g python \
  -o /build
[main] WARN i.s.parser.util.DeserializationUtils - Error snake-parsing yaml content
io.swagger.parser.util.DeserializationUtils$SnakeException: Exception safe-checking yaml content (maxDepth 2000, maxYamlAliasesForCollections 2147483647)

Transparent handling of address chunks

At the moment, list_addresses supports retrieval of up to 1000 (?) addresses per request, which means larger sets must first be divided into chunks on the client side. See the example below.

Transparent handling of chunks would certainly improve API usability.

# Yield successive n-sized chunks from items.
def divide_chunks(items, n):
    for i in range(0, len(items), n):
        yield items[i:i + n]

address_chunks = list(divide_chunks(df['address'].values.tolist(), 1000))
address_details = {}  # maps address string -> address object

with graphsense.ApiClient(configuration) as api_client:
    api_instance = addresses_api.AddressesApi(api_client)
    try:
        # Iterate over all chunks
        for chunk in address_chunks:
            # Retrieve all addresses of the chunk in a single request
            addresses = api_instance.list_addresses(CURRENCY, ids=chunk)
            for a in addresses.addresses:
                address_details[a.address] = a
            print("Finished chunk")
    except graphsense.ApiException as e:
        print("Exception when calling AddressesApi: %s\n" % e)


Request of ETH addresses not working

The following code snippet works fine for UTXO ledgers, but not for Ethereum:

with graphsense.ApiClient(configuration) as api_client:
    api = addresses_api.AddressesApi(api_client)
    try:
        entity = api.get_address_entity('eth', '0xFA8E3920daF271daB92Be9B87d9998DDd94FEF08')
        print(entity)
    except graphsense.ApiException as e:
        print(e)

Support for pandas dataframes

The API currently returns data as arrays of JSON objects. On the client side, people often work with pandas dataframes and must convert these arrays. Since this is repetitive, it would be great if the API could optionally return the retrieved data as a flattened dataframe. Here is the code I am using for the conversion:

from datetime import datetime

import pandas as pd

lst = []
cols = ['address', 'total_received', 'balance', 'first_tx', 'last_tx', 'btc_senders', 'btc_recipients']
for a in address_details.values():
    # One row per address; fiat values are taken in EUR
    lst.append([a['address'],
                a['total_received']['eur'],
                a['balance']['eur'],
                datetime.utcfromtimestamp(a['first_tx']['timestamp']).strftime('%Y-%m-%d %H:%M:%S'),
                datetime.utcfromtimestamp(a['last_tx']['timestamp']).strftime('%Y-%m-%d %H:%M:%S'),
                a['in_degree'],
                a['out_degree']])

df1 = pd.DataFrame(lst, columns=cols)
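The conversion above can be generalised so that columns do not have to be listed by hand. As a sketch, the hypothetical helper flatten_record below turns one nested record into a flat dict with dotted keys; it assumes the records are plain nested dicts, which may not hold for the model objects the client returns.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing keys with the parent key.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

A list of such flat dicts can be passed directly to pd.DataFrame, which uses the keys as column names. For plain dicts, pandas also provides pd.json_normalize, which performs a similar flattening.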

Bulk request address entities

At the moment one can use addresses_api.get_address_entity to map an address to an entity.

Having a bulk interface for that would be very useful. Maybe

/{currency}/addresses/entities?id=[...]
