paradigmxyz / paradigm-data-portal
A collection of open source crypto datasets for researchers and tool builders
Home Page: https://data.paradigm.xyz/
License: Apache License 2.0
$ pdp download ethereum_contracts
Downloading dataset: ethereum_contracts
downloading 2 files
using output_dir /Users/sotashi/Downloads/pdp
downloading https://datasets.paradigm.xyz/datasets/ethereum_contracts/README.md
2
The process exits immediately after writing "2" to stdout, with exit code 1.
Tried with other datasets and got the same result.
Probably need to add requests as a dependency in pyproject.toml.
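If that is the cause, it can be confirmed from the same environment before touching pyproject.toml. This is a minimal sketch using only the standard library. has_module is a hypothetical helper, and the idea that requests is the missing piece comes from the comment above, not from a traceback:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: pdp needs requests at download time but does not declare it.
    if has_module("requests"):
        print("requests is installed; the failure likely has another cause")
    else:
        print("requests is missing; try: pip install requests")
```

Run it with the same interpreter that pdp uses, since a different virtualenv can give a misleading answer.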
Hey, thanks for open sourcing the dataset.
I noticed a small issue.
The from_address and to_address fields are transposed.
I'm only using ethereum_native_transfers__v1_0_0__16600000_to_16799999.parquet; I haven't checked the others.
Example Tx: 0xb483dd868584c4d26eb5541d531c2bba92555cc9f55b756263bffac4dc985556
Moves 0.123 ETH as a bribe to the miner.
From the MEV bot (0xf8b721bff6bf7095a0e10791ce8f998baa254fd0) to the coinbase (0x5f927395213ee6b95de97bddcb1b2b1c0f16844f).
When I parse this transaction from the Parquet files, the addresses come out transposed.
To show it's not my parsing, here is the DataColumn read directly from the Parquet file:
Not super urgent, since I'm sure people who've noticed have just adjusted their parsers like I did, but worth reporting.
I believe I found two links on the About page that go to the wrong page.
Under "These files are too big. How am I supposed to use them?":
There are many options for processing large amounts of parquet data. In many cases, you don’t even need a database or a large amount of memory. See below for an overview of parquet, or see example dataset usage in [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/ethereum_contracts.ipynb).
This link 404s. I believe it should link to [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/explore_ethereum_contracts.ipynb) instead.
Under "About Parquet":
You can run efficient queries against parquet files without any database and without needing to fit the files in memory. See [this notebook](https://parquet.apache.org/) for example usage.
This points to the Apache Parquet homepage rather than a notebook. I believe it should link to the same notebook as above: [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/explore_ethereum_contracts.ipynb)
How to replicate:
pdp collect ethereum_contracts -b 16800000:16800100:10 -s -f parquet
import polars as pl
df = pl.scan_parquet('*.parquet')
# ComputeError: error while reading ethereum_contracts__v1_1_0__16800030_to_16800039.parquet: External format error: File out of specification: Repetition level must be defined for a primitive type
df = pl.scan_parquet('ethereum_contracts__v1_1_0__16800030_to_16800039.parquet')
# ArrowErrorException: ExternalFormat("File out of specification: Repetition level must be defined for a primitive type")
It appears that when a batch produces no results, no schema is written to its parquet file. That file then cannot be read even on its own, which also makes it impossible to load the whole set via a glob.