supermodularxyz / grants-etl

7.0 7.0 7.0 111.17 MB

License: MIT License

TypeScript 98.48% JavaScript 1.52%

grants-etl's People

ghostffcode distributeddoge ccerv1 aragard reneedaos leviscoffie farque65

grants-etl's Issues

feat: add cGrants data

Ideally, this would include the same public project data that was displayed on Grants Explorer (and is now captured with Allo).

Voter records would show only the address, the same as with Allo.

add metabase support for easy querying of the dataset and query reuse

https://www.metabase.com/

Add Passport data to support analysis

Here is some additional data that I think could enhance the ETL tool:

Passport score

This could attach to the user table or live in a separate table linked by address.
The data can be found in the Indexer or through the APIs.

Passport stamp data (Indexer or API)

Same as above, but pulling the full set of credentials for a particular address.
Available through the APIs.

Staking data

The current contract has all of the data on GTC staked on the passport address (stake) and staked on others (stake users).
Being able to build a community staking social graph would be a great output.
Total (current and historical) GTC staked on and by a passport.

Historical data by address (only Ceramic for now 😦)
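
To make the staking idea concrete, here is a minimal sketch of turning staking records into a social graph with per-address totals. The `StakeRecord` shape, its field names, and the use of plain numbers for GTC amounts are assumptions made for illustration; they are not taken from the staking contract or from this repository.

```typescript
// Hypothetical shape of one staking record; the field names are assumptions.
interface StakeRecord {
  staker: string; // address that staked GTC
  stakee: string; // passport address the GTC was staked on (may equal staker)
  amount: number; // GTC amount, kept as a plain number for illustration only
}

interface StakeTotals {
  stakedBy: number; // total GTC this address staked (on itself or on others)
  stakedOn: number; // total GTC staked on this address (by itself or by others)
}

interface StakeGraph {
  edges: Map<string, number>; // "staker->stakee" => total amount on that edge
  totals: Map<string, StakeTotals>;
}

function buildStakeGraph(records: StakeRecord[]): StakeGraph {
  const edges = new Map<string, number>();
  const totals = new Map<string, StakeTotals>();

  // Fetch the totals entry for an address, creating it on first use.
  const totalsFor = (address: string): StakeTotals => {
    let entry = totals.get(address);
    if (!entry) {
      entry = { stakedBy: 0, stakedOn: 0 };
      totals.set(address, entry);
    }
    return entry;
  };

  for (const record of records) {
    // Addresses are compared case-insensitively.
    const staker = record.staker.toLowerCase();
    const stakee = record.stakee.toLowerCase();
    const key = `${staker}->${stakee}`;
    edges.set(key, (edges.get(key) ?? 0) + record.amount);
    totalsFor(staker).stakedBy += record.amount;
    totalsFor(stakee).stakedOn += record.amount;
  }
  return { edges, totals };
}
```

A self-stake counts toward both `stakedBy` and `stakedOn` for the same address. Real on-chain amounts would need a big-integer type rather than `number`.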

prepackaged queries

I can tell you straight up that one of the best ways to make this tool useful is to ship prepackaged queries that represent the common views data scientists and sybil hunters will use to get familiar with the platform. Stuff like:

Simple select-all queries for all projects, all users in a round, and all projects in a round.
Basic summary stats about each round, plus a sample query of voters for a project.

The first type of query lets new users jump right into understanding and analyzing the data without having to think much about its structure.

The second type shows users what can be done and gets them up to speed on what has already been done, so they don't waste time reinventing the wheel.

Both of these are issues when onboarding with Gitcoin data. People spend a lot of time understanding the schema and how it relates to the platform, then recreate basic statistics that, unfortunately, are already known.
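
As a rough illustration of the second kind of query, the sketch below computes basic per-round summary stats and the list of voters for one project over in-memory rows. The `VoteRow` shape and its field names are hypothetical, since the ETL's actual schema is not shown here. The same aggregation could equally be written as a saved SQL query against the database.

```typescript
// Hypothetical vote row; the ETL's actual column names may differ.
interface VoteRow {
  roundId: string;
  projectId: string;
  voter: string;
  amountUSD: number;
}

interface RoundSummary {
  roundId: string;
  votes: number;
  uniqueVoters: number;
  uniqueProjects: number;
  totalUSD: number;
}

// Basic summary stats for each round, in order of first appearance.
function summarizeRounds(rows: VoteRow[]): RoundSummary[] {
  const acc = new Map<
    string,
    { votes: number; voters: Set<string>; projects: Set<string>; totalUSD: number }
  >();
  for (const row of rows) {
    let entry = acc.get(row.roundId);
    if (!entry) {
      entry = { votes: 0, voters: new Set(), projects: new Set(), totalUSD: 0 };
      acc.set(row.roundId, entry);
    }
    entry.votes += 1;
    entry.voters.add(row.voter.toLowerCase());
    entry.projects.add(row.projectId);
    entry.totalUSD += row.amountUSD;
  }
  const summaries: RoundSummary[] = [];
  acc.forEach((entry, roundId) => {
    summaries.push({
      roundId,
      votes: entry.votes,
      uniqueVoters: entry.voters.size,
      uniqueProjects: entry.projects.size,
      totalUSD: entry.totalUSD,
    });
  });
  return summaries;
}

// Sample "voters for a project" view: distinct, lowercased voter addresses.
function votersForProject(rows: VoteRow[], projectId: string): string[] {
  const voters = new Set<string>();
  for (const row of rows) {
    if (row.projectId === projectId) voters.add(row.voter.toLowerCase());
  }
  return Array.from(voters);
}
```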

experiment with julius/llm support

https://julius.ai/

Is the ETL compatible with Windows?

I'm getting this issue on Windows when I run docker-compose:

pgadmin | postfix/postlog: starting the Postfix mail system
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Starting gunicorn 20.1.0
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Listening at: http://[::]:80 (1)
pgadmin | [2023-07-30 15:59:54 +0000] [1] [INFO] Using worker: gthread
pgadmin | [2023-07-30 15:59:54 +0000] [81] [INFO] Booting worker with pid: 81
pgadmin | [2023-07-30 15:59:58 +0000] [81] [ERROR] Exception in worker process
pgadmin | Traceback (most recent call last):
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
pgadmin | worker.init_process()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 92, in init_process
pgadmin | super().init_process()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
pgadmin | self.load_wsgi()
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
pgadmin | self.wsgi = self.app.wsgi()
pgadmin | ^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
pgadmin | self.callable = self.load()
pgadmin | ^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
pgadmin | return self.load_wsgiapp()
pgadmin | ^^^^^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
pgadmin | return util.import_app(self.app_uri)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "/venv/lib/python3.11/site-packages/gunicorn/util.py", line 359, in import_app
pgadmin | mod = importlib.import_module(module)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
pgadmin | return _bootstrap._gcd_import(name[level:], package, level)
pgadmin | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pgadmin | File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
pgadmin | File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
pgadmin | File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
pgadmin | File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
pgadmin | File "<frozen importlib._bootstrap_external>", line 940, in exec_module
pgadmin | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
pgadmin | File "/pgadmin4/run_pgadmin.py", line 4, in <module>
pgadmin | from pgAdmin4 import app
pgadmin | File "/pgadmin4/pgAdmin4.py", line 104, in <module>
pgadmin | app = create_app()
pgadmin | ^^^^^^^^^^^^
pgadmin | File "/pgadmin4/pgadmin/__init__.py", line 477, in create_app
pgadmin | run_migration_for_sqlite()
pgadmin | File "/pgadmin4/pgadmin/__init__.py", line 452, in run_migration_for_sqlite
pgadmin | os.chmod(config.SQLITE_PATH, 0o600)
pgadmin | PermissionError: [Errno 1] Operation not permitted: '/var/lib/pgadmin/pgadmin4.db'
pgadmin | [2023-07-30 15:59:58 +0000] [81] [INFO] Worker exiting (pid: 81)
pgadmin | [2023-07-30 15:59:58 +0000] [1] [INFO] Shutting down: Master
pgadmin | [2023-07-30 15:59:58 +0000] [1] [INFO] Reason: Worker failed to boot.
pgadmin exited with code 0

Compose Up failing for me on Ubuntu 22.04

I am hitting this issue when I run docker compose up, and I am not sure where I should put the required JDBC drivers, if they are needed.

Mismatch in results

It says chain 1 has 29 rounds, but we can count 30 at https://indexer-grants-stack.gitcoin.co/data/1/rounds/
Is there a reason for the difference?
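
One way to narrow down a count mismatch like this is to diff the two lists of round IDs and see which entry is missing. The sketch below assumes both lists are already available as arrays of strings and that IDs can be compared case-insensitively. How each list is obtained, and the function name `diffRoundIds`, are illustrative assumptions rather than part of the ETL.

```typescript
interface IdDiff {
  missing: string[]; // present in `expected` but not in `actual`
  extra: string[];   // present in `actual` but not in `expected`
}

// Compare two lists of round IDs, ignoring case and duplicates.
function diffRoundIds(expected: string[], actual: string[]): IdDiff {
  const normalize = (id: string): string => id.toLowerCase();
  const expectedSet = new Set(expected.map(normalize));
  const actualSet = new Set(actual.map(normalize));

  const missing: string[] = [];
  const extra: string[] = [];
  expectedSet.forEach((id) => {
    if (!actualSet.has(id)) missing.push(id);
  });
  actualSet.forEach((id) => {
    if (!expectedSet.has(id)) extra.push(id);
  });
  return { missing, extra };
}
```

Here `expected` would hold the IDs listed by the indexer and `actual` the IDs stored by the ETL. A non-empty `missing` array points at the round to investigate.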

Feedback: Schema

From the perspective of someone who took part in a sybil-seeking hackathon, being provided a schema and database like this would have been pretty nice. On the other hand, if you are looking at a single round, a flat .csv file of votes is probably a great low-effort starting point for analysis.

Possible additions to the existing schema:

For each transaction that carries a vote, it would be nice to know the gas, i.e. gas price and gas spent.
Likewise, for each project that applied to a round on-chain, I would like to see the hash of the transaction used to do that (plus the gas fee).
Reporting blockNumbers for votes is nice; it would be even nicer if there were also an approximate_timestamp for plotting time series.

The key usability friction for me is that I want to get clean, processed data from an authoritative source without having to re-run the pipelines myself.

Interesting external information about each voter/grant address: POAPs, ENS name history, Snapshot votes
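
For the approximate_timestamp suggestion, one simple approach is linear interpolation between two reference blocks whose timestamps are known. This is only a sketch under that assumption: the `BlockRef` type and function name are hypothetical, and the estimate drifts wherever block times are uneven between the two references.

```typescript
// A block with a known timestamp (Unix seconds); hypothetical helper type.
interface BlockRef {
  number: number;
  timestamp: number;
}

// Estimate a block's timestamp by interpolating (or extrapolating)
// linearly between two reference blocks.
function approximateTimestamp(blockNumber: number, a: BlockRef, b: BlockRef): number {
  if (a.number === b.number) {
    throw new Error("reference blocks must have different numbers");
  }
  const secondsPerBlock = (b.timestamp - a.timestamp) / (b.number - a.number);
  return Math.round(a.timestamp + (blockNumber - a.number) * secondsPerBlock);
}
```

Exact timestamps would instead require reading each block header from an RPC node, which costs one extra request per distinct block.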

bug: unable to complete etl run jobs

I'm trying to index GR18 data and encountering an error when I run the command below. It expects chainId to be an int, not a string.

yarn run etl --chain [chainId]

I also tried running the jobs directly by modifying index.ts (e.g., const chainId = argv.chainId ?? 10) and running yarn run etl, which generated this error:

Invalid value for argument `applicationsEndTime`: number too large to fit in target type. Expected big integer String.

When I switched to chain ID 1 or 424, I got a fetch error, e.g.:

details: 'fetch is not defined',
  docsPath: undefined,
  metaMessages: [
    'URL: https://rpc.publicgoods.network',
    'Request body: {"method":"eth_getLogs","params":[{"address":"0x222EA76664ED77D18d4416d2B2E77937b76f0a35","topics":["0xca792622046325e9cd4e24b490cb000ef72acea3a15284efc14ee709307a5e00","0x3532b7116e113d629a3d0a0364840f52c9d93f6b81b2ecc61b2cb228c39ee9fb"],"fromBlock":"0xf9700","toBlock":"0xf9701"}]}'
  ],
  shortMessage: 'HTTP request failed.',
  version: '[email protected]',
  body: { method: 'eth_getLogs', params: [ [Object] ] },
  headers: undefined,
  status: undefined,
  url: 'https://rpc.publicgoods.network'
}
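
The three symptoms above can each be checked in isolation. The sketch below is not the repository's fix; the helper names are hypothetical. It shows (1) coercing a command-line chain ID, which arrives as a string, into an integer, (2) testing whether a decimal value fits in a signed 64-bit integer, which is one possible (unconfirmed) reason for a "number too large to fit in target type" error, and (3) detecting whether a global `fetch` exists, which Node.js provides by default from version 18.

```typescript
// Largest value of a signed 64-bit integer, as a decimal string.
const INT64_MAX = "9223372036854775807";

// Coerce a CLI argument (string or number) into a positive integer chain ID.
function parseChainId(raw: string | number): number {
  const text = String(raw).trim();
  if (!/^\d+$/.test(text)) {
    throw new Error(`invalid chain id: ${text}`);
  }
  const id = Number(text);
  if (!Number.isSafeInteger(id) || id === 0) {
    throw new Error(`invalid chain id: ${text}`);
  }
  return id;
}

// Check whether a non-negative decimal string fits in a signed 64-bit integer.
function fitsInInt64(decimal: string): boolean {
  if (!/^\d+$/.test(decimal)) {
    throw new Error(`not a non-negative integer: ${decimal}`);
  }
  const digits = decimal.replace(/^0+(?=\d)/, ""); // drop leading zeros
  if (digits.length !== INT64_MAX.length) {
    return digits.length < INT64_MAX.length;
  }
  // Same length, so string order matches numeric order.
  return digits <= INT64_MAX;
}

// True when the runtime exposes a global fetch function.
function hasGlobalFetch(): boolean {
  return typeof (globalThis as any).fetch === "function";
}
```

If `hasGlobalFetch()` returns false, upgrading Node.js or supplying a fetch polyfill before the RPC client runs would be the usual directions to explore.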
