paradigm-data-portal's People

Contributors

sslivkoff

paradigm-data-portal's Issues

pdp download quits unexpectedly

$ pdp download ethereum_contracts  
Downloading dataset: ethereum_contracts
downloading 2 files

using output_dir /Users/sotashi/Downloads/pdp

downloading https://datasets.paradigm.xyz/datasets/ethereum_contracts/README.md
2

The process exits immediately with exit code 1 after writing "2" to stdout.
I tried other datasets and got the same result.
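
When reporting a failure like this, it can help to capture the exit code and both output streams separately, so it is clear whether the "2" really went to stdout. A minimal sketch using only the Python standard library; `run_and_capture` is a hypothetical helper, not part of pdp:

```python
import subprocess
from typing import List, NamedTuple


class RunResult(NamedTuple):
    returncode: int
    stdout: str
    stderr: str


def run_and_capture(cmd: List[str]) -> RunResult:
    """Run a command and return its exit code, stdout, and stderr separately."""
    # capture_output=True collects both streams; text=True decodes them to str.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return RunResult(proc.returncode, proc.stdout, proc.stderr)
```

Assuming `pdp` is on the PATH, `run_and_capture(["pdp", "download", "ethereum_contracts"])` would return the three values for the command shown above.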

Data Column Mismatch

Hey - Thanks for open sourcing the dataset.

Noticed a small issue.

The from_address and to_address fields are transposed.

I'm just using ethereum_native_transfers__v1_0_0__16600000_to_16799999.parquet; I've not checked the others.

Example Tx: 0xb483dd868584c4d26eb5541d531c2bba92555cc9f55b756263bffac4dc985556
Moves 0.123 ETH as a bribe to the miner.
From the MEV bot (0xf8b721bff6bf7095a0e10791ce8f998baa254fd0) to the coinbase (0x5f927395213ee6b95de97bddcb1b2b1c0f16844f).

Parsing this from the Parquet files, I get the addresses transposed.

[image]

To show it's not my incorrect parsing, here is the DataColumn direct from Parquet:
[image]

Not super urgent as I'm sure people who've noticed have just changed their parsers like myself, but worth giving the feedback.
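
The parser-side workaround mentioned above amounts to swapping the two fields after loading. A minimal sketch in plain Python, assuming each row has been loaded as a dict. Only `from_address` and `to_address` are column names taken from the report; `swap_addresses` and the sample row layout are hypothetical:

```python
from typing import Any, Dict

# Addresses from the example transaction in the report.
MEV_BOT = "0xf8b721bff6bf7095a0e10791ce8f998baa254fd0"
COINBASE = "0x5f927395213ee6b95de97bddcb1b2b1c0f16844f"


def swap_addresses(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with from_address and to_address exchanged."""
    fixed = dict(row)  # copy so the original row is left untouched
    fixed["from_address"] = row["to_address"]
    fixed["to_address"] = row["from_address"]
    return fixed


# Row as described in the report: the file lists the coinbase as the sender.
row_from_file = {"from_address": COINBASE, "to_address": MEV_BOT}
fixed_row = swap_addresses(row_from_file)
```

A swap like this should only be applied to files confirmed to have the problem, since the report checked a single file.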

Wrong links on https://data.paradigm.xyz/about

I believe I found two instances where the link goes to the wrong page on the about page.

Under "These files are too big. How am I supposed to use them?":

There are many options for processing large amounts of parquet data. In many cases, you don’t even need a database or a large amount of memory. See below for an overview of parquet, or see example dataset usage in [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/ethereum_contracts.ipynb).

This link 404s (see below). I believe it should link to [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/explore_ethereum_contracts.ipynb) instead.

[screenshot]

Under "About Parquet":

You can run efficient queries against parquet files without any database and without needing to fit the files in memory. See [this notebook](https://parquet.apache.org/) for example usage.

I believe this should link to the same notebook as above: [this notebook](https://github.com/paradigmxyz/paradigm-data-portal/blob/main/notebooks/explore_ethereum_contracts.ipynb)

small batches with no results make parquets not loadable via glob

how to replicate

$ pdp collect ethereum_contracts -b 16800000:16800100:10 -s -f parquet

import polars as pl
df = pl.scan_parquet('*.parquet')
# ComputeError: error while reading ethereum_contracts__v1_1_0__16800030_to_16800039.parquet: External format error: File out of specification: Repetition level must be defined for a primitive type

df = pl.scan_parquet('ethereum_contracts__v1_1_0__16800030_to_16800039.parquet')
# ArrowErrorException: ExternalFormat("File out of specification: Repetition level must be defined for a primitive type")

It appears that when a batch doesn't produce results, no schema is written to the parquet file, which makes it impossible to load the files via glob.
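
Until empty batches are written with a schema, one possible workaround is to probe each file individually and pass only the readable ones to the loader. A minimal sketch; `split_loadable` is a hypothetical helper, and `probe` stands for any callable that raises on an unreadable file:

```python
from typing import Callable, Iterable, List, Tuple


def split_loadable(
    paths: Iterable[str], probe: Callable[[str], object]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split paths into those probe() accepts and those it rejects.

    Returns (good, bad), where bad holds (path, error message) pairs.
    """
    good: List[str] = []
    bad: List[Tuple[str, str]] = []
    for path in paths:
        try:
            probe(path)
        except Exception as exc:  # any reader error marks the file as unreadable
            bad.append((path, str(exc)))
        else:
            good.append(path)
    return good, bad
```

With polars, `probe` could be `pl.read_parquet_schema`, the paths could come from `sorted(glob.glob('*.parquet'))`, and the readable files could then be combined with `pl.concat([pl.scan_parquet(p) for p in good])`. Whether the schema read fails on the empty files in the same way the scan does is an assumption worth checking.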
