grants-etl's People

Contributors

ccerv1, davidgasquez, distributeddoge, ghostffcode


grants-etl's Issues

feat: add cGrants data

Ideally the same public data about projects that was displayed on grants explorer (and is now captured with allo)

Voters would only show the address, same as with allo

Add Passport data to support analysis

Here is some additional data that I think could enhance the ETL tool:
Passport score

  • could attach to the user table or have a separate table linked by address
  • Data can be found in the Indexer or through the APIs

Passport stamp data (indexer or API)

  • same as above, but pulling the full set of all credentials for a particular address
  • available through the APIs

Staking data

  • Current contract has all of the data on GTC staked on the passport address (stake) and staked on others (stake users)
  • Being able to build a Community staking social graph would be a great output
  • Total (current and historical) GTC staked on and by a passport
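A community staking graph like the one described above could be sketched roughly as follows. This is only an illustration: the `StakeEvent` shape and its field names are assumptions, not the actual contract or indexer schema.

```typescript
// Hypothetical shape of one staking record; field names are assumptions.
interface StakeEvent {
  staker: string; // address that staked GTC
  stakee: string; // address staked on (same as staker for a self-stake)
  amount: bigint; // amount in the token's smallest unit
}

type StakeGraph = Map<string, Map<string, bigint>>;

// Build a directed graph: staker -> stakee -> total amount staked.
function buildStakeGraph(events: StakeEvent[]): StakeGraph {
  const graph: StakeGraph = new Map();
  for (const e of events) {
    const from = e.staker.toLowerCase();
    const to = e.stakee.toLowerCase();
    const edges = graph.get(from) ?? new Map<string, bigint>();
    edges.set(to, (edges.get(to) ?? 0n) + e.amount);
    graph.set(from, edges);
  }
  return graph;
}

// Total staked on an address by anyone, including the address itself.
function totalStakedOn(graph: StakeGraph, address: string): bigint {
  const target = address.toLowerCase();
  let total = 0n;
  graph.forEach((edges) => {
    total += edges.get(target) ?? 0n;
  });
  return total;
}
```

Addresses are lowercased so that checksummed and non-checksummed forms of the same address collapse into one node.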

Historical data by address (only ceramic for now 😦)

prepackaged queries

from omni:

I can tell you straight up that one of the best ways to make this tool useful is to have prepackaged queries that represent common views that data scientists / sybil hunters will use to get familiar with the platform. Stuff like:

  • Simple select all queries for all projects, all users in a round, all projects in a round.
  • Basic summary stats about each round, sample query of voters for a project.

The first type of query lets new users jump right into understanding and analyzing the data without having to think much about the structure.

The second type of query shows users what can be done and gets them up to speed on what has already been done, so they don't waste time reinventing the wheel.

Both of these are issues when getting onboarded with Gitcoin data. People spend a bunch of time understanding the schema and how it relates to the platform, then they recreate a bunch of basic statistics which, unfortunately, are already known.
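As a rough idea of what a prepackaged query registry might look like, here is a minimal sketch. All table and column names (`project`, `vote`, `round_id`, and so on) are hypothetical placeholders rather than the real ETL schema, and the `$1`-style placeholders assume a PostgreSQL-style client.

```typescript
// Table and column names below are hypothetical; swap in the real schema.
interface PackagedQuery {
  description: string;
  sql: string; // uses $1, $2, ... positional parameters
  params: string[]; // names of the expected parameters, in order
}

const packagedQueries: Record<string, PackagedQuery> = {
  allProjectsInRound: {
    description: "Select all projects in a round",
    sql: "SELECT * FROM project WHERE round_id = $1",
    params: ["roundId"],
  },
  roundSummary: {
    description: "Basic summary stats for a round",
    sql:
      "SELECT COUNT(*) AS votes, COUNT(DISTINCT voter) AS unique_voters " +
      "FROM vote WHERE round_id = $1",
    params: ["roundId"],
  },
  votersForProject: {
    description: "Sample of voters for a project in a round",
    sql: "SELECT DISTINCT voter FROM vote WHERE round_id = $1 AND project_id = $2 LIMIT 100",
    params: ["roundId", "projectId"],
  },
};

// Look up a query by name and pair it with its argument values.
function bindQuery(name: string, values: string[]): { text: string; values: string[] } {
  const query = packagedQueries[name];
  if (!query) {
    throw new Error(`Unknown query: ${name}`);
  }
  if (values.length !== query.params.length) {
    throw new Error(`${name} expects ${query.params.join(", ")}`);
  }
  return { text: query.sql, values };
}
```

Keeping the SQL parameterized means the same registry can back both documentation and a small CLI without string concatenation of user input.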

Is the ETL compatible with Windows?

I'm getting this error on Windows when I run docker-compose:

pgadmin | postfix/postlog: starting the Postfix mail system
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Starting gunicorn 20.1.0
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Listening at: http://[::]:80 (1)
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Using worker: gthread
pgadmin | [2023-07-30 15:59:54 +0000] [81] [INFO] Booting worker with pid: 81
pgadmin | [2023-07-30 15:59:58 +0000] [81] [ERROR] Exception in worker process
pgadmin | Traceback (most recent call last):
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
pgadmin | worker.init_process()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 92, in init_process
pgadmin | super().init_process()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
pgadmin | self.load_wsgi()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
pgadmin | self.wsgi = self.app.wsgi()
pgadmin | ^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
pgadmin | self.callable = self.load()
pgadmin | ^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
pgadmin | return self.load_wsgiapp()
pgadmin | ^^^^^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
pgadmin | return util.import_app(self.app_uri)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
pgadmin | mod = importlib.import_module(module)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
pgadmin | return _bootstrap._gcd_import(name[level:], package, level)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
pgadmin | File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
pgadmin | File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
pgadmin | File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
pgadmin | File "<frozen importlib._bootstrap_external>", line 940, in exec_module
pgadmin | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
pgadmin | File "/pgadmin4/run_pgadmin.py", line 4, in <module>
pgadmin | from pgAdmin4 import app
pgadmin | File "/pgadmin4/pgAdmin4.py", line 104, in <module>
pgadmin | app = create_app()
pgadmin | ^^^^^^^^^^^^
pgadmin | File "/pgadmin4/pgadmin/__init__.py", line 477, in create_app
pgadmin | run_migration_for_sqlite()
pgadmin | File "/pgadmin4/pgadmin/__init__.py", line 452, in run_migration_for_sqlite
pgadmin | os.chmod(config.SQLITE_PATH, 0o600)
pgadmin | PermissionError: [Errno 1] Operation not permitted: '/var/lib/pgadmin/pgadmin4.db'
pgadmin | [2023-07-30 15:59:58 +0000] [81] [INFO] Worker exiting (pid: 81)
pgadmin | [2023-07-30 15:59:58 +0000] [1] [INFO] Shutting down: Master
pgadmin | [2023-07-30 15:59:58 +0000] [1] [INFO] Reason: Worker failed to boot.
pgadmin exited with code 0

Mismatch in results

[image: screenshot]

The screenshot shows chain 1 with 29 rounds, while we can count 30 in https://indexer-grants-stack.gitcoin.co/data/1/rounds/
Is there a reason for the difference?
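One way to narrow down a count mismatch like this is to compare the two sets of round IDs directly instead of just the totals. The sketch below assumes both sides can be exported as arrays of ID strings; `diffIds` is a hypothetical helper, not part of the ETL.

```typescript
// Compare IDs from the indexer with IDs loaded into the database.
// Both inputs are assumed to be plain arrays of round ID strings.
function diffIds(
  indexerIds: string[],
  dbIds: string[],
): { missingInDb: string[]; extraInDb: string[] } {
  const normalize = (id: string): string => id.trim().toLowerCase();
  const indexer = new Set(indexerIds.map(normalize));
  const db = new Set(dbIds.map(normalize));
  return {
    // Present in the indexer but never loaded.
    missingInDb: Array.from(indexer).filter((id) => !db.has(id)),
    // Present in the database but not in the indexer listing.
    extraInDb: Array.from(db).filter((id) => !indexer.has(id)),
  };
}
```

If `missingInDb` contains exactly one ID, that round is the one to inspect for a failed or skipped load.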

Feedback: Schema

From the perspective of someone who took part in a sybil-seeking hackathon, being provided a schema + db like this would have been pretty nice. On the other hand, if you are looking at a single round, a flat .csv file with votes is probably a great low-effort starting point to jump into doing analysis.

Possible additions to existing schema:

  • For each transaction that carries a vote, it would be nice to know the gas cost, i.e. gas price + gas spent.
  • Likewise, for each project that applied to a round on-chain, I would like to see the hash of the transaction used to apply (+ gas fee).
  • Reporting blockNumbers for votes is nice; it would be even nicer if there was also an approximate_timestamp for plotting time series.
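The two derived fields suggested above could be computed along these lines. This is a sketch under assumptions: the fee is taken as gas used times the price paid per unit of gas, and the timestamp is estimated by linear interpolation between two blocks whose timestamps are already known. The function and field names are illustrative only.

```typescript
// Fee paid for a transaction in wei: gas used times the price paid per gas unit.
function gasFeeWei(gasUsed: bigint, gasPriceWei: bigint): bigint {
  return gasUsed * gasPriceWei;
}

// A block with a known timestamp (unix seconds), used as a reference point.
interface BlockAnchor {
  blockNumber: number;
  timestamp: number;
}

// Estimate a block's timestamp by interpolating linearly between two anchors.
// Accuracy depends on how steady block times are between the anchors.
function approximateTimestamp(blockNumber: number, a: BlockAnchor, b: BlockAnchor): number {
  if (a.blockNumber === b.blockNumber) {
    throw new Error("anchors must be distinct blocks");
  }
  const secondsPerBlock = (b.timestamp - a.timestamp) / (b.blockNumber - a.blockNumber);
  return Math.round(a.timestamp + (blockNumber - a.blockNumber) * secondsPerBlock);
}
```

Fetching exact block timestamps from an RPC node is more precise; interpolation is a cheaper stand-in when only a time series plot is needed.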

The key usability friction for me is that I want to get clean, processed data from an authoritative source without having to re-run the pipelines myself.

Interesting external information about each voter/grant address: POAPs, ENS name history, Snapshot votes

bug: unable to complete etl run jobs

I'm trying to index GR18 data and encountering an error when I run the command below. It expects chainId to be an int, not a string.

yarn run etl --chain [chainId]

I also tried running the jobs directly by modifying index.ts (e.g., const chainId = argv.chainId ?? 10) and running yarn run etl, which generated this error:

Invalid value for argument `applicationsEndTime`: number too large to fit in target type. Expected big integer String.
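Both symptoms look like type normalization problems at the boundary: a CLI argument arriving as a string, and a very large integer that needs to be passed as a string. A minimal sketch of one way to handle both is shown below; `parseChainId` and `toBigIntString` are hypothetical helpers, not functions from the ETL codebase.

```typescript
// Turn a CLI argument (string or number) into an integer chain ID.
function parseChainId(raw: unknown): number {
  const text = String(raw).trim();
  if (!/^\d+$/.test(text)) {
    throw new Error(`Invalid chain id: ${text}`);
  }
  const id = Number(text);
  if (!Number.isSafeInteger(id)) {
    throw new Error(`Chain id out of range: ${text}`);
  }
  return id;
}

// Convert a large integer value to a decimal string without losing precision.
// Unsafe JS numbers are rejected because their digits are already lossy.
function toBigIntString(value: string | number | bigint): string {
  if (typeof value === "number" && !Number.isSafeInteger(value)) {
    throw new Error("number is not a safe integer; pass the original string instead");
  }
  return BigInt(value).toString();
}
```

For example, `parseChainId("10")` yields the number `10`, and a timestamp field such as `applicationsEndTime` can be passed through `toBigIntString` before it is written to the database.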

When I switched to chain IDs 1 or 424, I got a fetch error instead, e.g.:

details: 'fetch is not defined',
  docsPath: undefined,
  metaMessages: [
    'URL: https://rpc.publicgoods.network',
    'Request body: {"method":"eth_getLogs","params":[{"address":"0x222EA76664ED77D18d4416d2B2E77937b76f0a35","topics":["0xca792622046325e9cd4e24b490cb000ef72acea3a15284efc14ee709307a5e00","0x3532b7116e113d629a3d0a0364840f52c9d93f6b81b2ecc61b2cb228c39ee9fb"],"fromBlock":"0xf9700","toBlock":"0xf9701"}]}'
  ],
  shortMessage: 'HTTP request failed.',
  version: '[email protected]',
  body: { method: 'eth_getLogs', params: [ [Object] ] },
  headers: undefined,
  status: undefined,
  url: 'https://rpc.publicgoods.network'
}
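A `fetch is not defined` message usually means the runtime has no global `fetch`. One common cause is a Node.js version older than 18, where `fetch` is not available by default, though that is only a guess for this report. A small startup check, sketched here with a hypothetical `describeFetchSupport` helper, makes the problem easier to spot:

```typescript
// Report whether a global fetch implementation is available in the given scope.
// The scope and version are passed in so the check is easy to test.
function describeFetchSupport(scope: { fetch?: unknown }, runtimeVersion: string): string {
  if (typeof scope.fetch === "function") {
    return "ok";
  }
  return (
    `global fetch is missing (runtime ${runtimeVersion}); ` +
    "use a runtime that provides fetch or install a fetch polyfill"
  );
}
```

In a Node.js script this could be called as `describeFetchSupport(globalThis as unknown as { fetch?: unknown }, process.version)` before any RPC requests are made.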
