Conda Package Download Data

This repository describes the conda package download data provided by Anaconda, Inc. It includes package download counts starting from January 2017 for the following download sources:

  • Anaconda Distribution: The default channels hosted on repo.anaconda.com (and historically on repo.continuum.io)
  • Select Anaconda.org channels: Currently this includes conda-forge and bioconda.

Check out an example notebook using this data on Binder.

Data Format

The download data is provided as one record for every unique combination of:

  • data_source: anaconda for Anaconda distribution, conda-forge for the conda-forge channel on Anaconda.org, and bioconda for the bioconda channel on Anaconda.org.
  • time: UTC time, binned by hour
  • pkg_name: Package name (Ex: pandas)
  • pkg_version: Package version (Ex: 0.23.0)
  • pkg_platform: One of linux-32, linux-64, osx-64, win-32, win-64, linux-armv7, linux-ppc64le, linux-aarch64, or noarch
  • pkg_python: Python version required by the package, if any (Ex: 3.7)
  • counts: Number of downloads for this combination of attributes
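As a rough illustration of how these records combine, the sketch below sums counts over a few toy records that use the field names listed above. The rows and their counts are made up for the example, and the helper name total_downloads is hypothetical; it is not part of the dataset or of any library.

```python
from collections import defaultdict

# Toy records using the documented field names.
# The values (especially counts) are invented for illustration only.
records = [
    {"data_source": "anaconda", "time": "2019-12-01 00:00:00",
     "pkg_name": "pandas", "pkg_version": "0.23.0",
     "pkg_platform": "linux-64", "pkg_python": "3.7", "counts": 10},
    {"data_source": "conda-forge", "time": "2019-12-01 00:00:00",
     "pkg_name": "pandas", "pkg_version": "0.23.0",
     "pkg_platform": "osx-64", "pkg_python": "3.7", "counts": 4},
    {"data_source": "anaconda", "time": "2019-12-01 01:00:00",
     "pkg_name": "numpy", "pkg_version": "1.15.0",
     "pkg_platform": "win-64", "pkg_python": "3.6", "counts": 7},
]


def total_downloads(rows, key="pkg_name"):
    """Sum the counts field over all rows that share the same value of key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["counts"]
    return dict(totals)


by_package = total_downloads(records)               # grouped by pkg_name
by_source = total_downloads(records, "data_source")  # grouped by data_source
```

Because each record is already a count for one combination of attributes, totals for any coarser grouping are obtained by summing counts, not by counting rows.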

The storage format is Parquet, one file per day, with SNAPPY compression. Files are hosted on S3, with the naming convention:

  • s3://anaconda-package-data/conda/[year]/[month]/[year]-[month]-[day].parquet
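The naming convention can be turned into a small helper for building the path of a given day. This is a minimal sketch: the function names daily_parquet_url and load_day are hypothetical, zero-padded month and day values are an assumption, and reading directly from S3 assumes that pandas, a Parquet engine, and s3fs are installed and that the bucket allows anonymous access.

```python
from datetime import date

# Prefix taken from the naming convention above.
BUCKET_PREFIX = "s3://anaconda-package-data/conda"


def daily_parquet_url(day: date) -> str:
    """Build the S3 URL for one day's file.

    Assumption: month and day are zero-padded (e.g. 2019-12-05).
    """
    return f"{BUCKET_PREFIX}/{day:%Y}/{day:%m}/{day:%Y-%m-%d}.parquet"


def load_day(day: date):
    """Read one day's file into a pandas DataFrame.

    Assumption: anonymous read access works; pandas passes
    storage_options through to s3fs.
    """
    import pandas as pd  # imported lazily so the URL helper has no dependencies

    return pd.read_parquet(daily_parquet_url(day), storage_options={"anon": True})


url = daily_parquet_url(date(2019, 12, 5))
```

Calling load_day(date(2019, 12, 5)) would then fetch that single day, if the assumptions above hold.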

Data Catalog

To simplify using the dataset, we have also created an Intake catalog file. You can load it directly from the repository if you have the intake, intake-parquet, and python-snappy packages installed:

import intake

cat = intake.Catalog('https://raw.githubusercontent.com/ContinuumIO/anaconda-package-data/master/catalog/anaconda_package_data.yaml')
monthly = cat.anaconda_package_data_by_month(year=2019, month=12).to_dask()

Alternatively, you can install the data package directly with conda, which will also fetch the required dependencies:

conda install -c intake anaconda-package-data

And then the data source will appear in the global catalog of your conda environment:

import intake

monthly = intake.cat.anaconda_package_data_by_month(year=2019, month=12).to_dask()

To minimize bandwidth usage, these catalogs are configured so that Intake will cache data locally to your system on first use.
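Once a month of data has been loaded as in the snippets above, a typical first step is to rank packages by total downloads. The sketch below is one possible way to do that; the helper name top_packages is hypothetical, and it assumes the loaded table exposes the pkg_name and counts columns described under Data Format. It is demonstrated on a small made-up pandas frame so that it runs without network access.

```python
import pandas as pd


def top_packages(df, n=10):
    """Return the n packages with the highest total download counts.

    Works on a pandas DataFrame, and is intended to work on a Dask
    DataFrame as well, where the result is lazy until compute() is called.
    """
    totals = df.groupby("pkg_name")["counts"].sum().nlargest(n)
    # Dask objects expose compute(); pandas results are already concrete.
    return totals.compute() if hasattr(totals, "compute") else totals


# Made-up rows for illustration; real data would come from the catalog.
toy = pd.DataFrame(
    {
        "pkg_name": ["pandas", "numpy", "pandas", "scipy"],
        "counts": [1, 5, 3, 2],
    }
)
ranking = top_packages(toy, n=2)
```

With the real data, the same call would be top_packages(monthly), where monthly is the object returned by to_dask().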

Known Issues

There are some known gaps in the dataset, and Anaconda.org data doesn't appear in the dataset until April 2017. See KNOWN_ISSUES.md for more details.

Updates

This data will be updated approximately monthly. Note that we may revise historical data if processing issues are discovered, or to add additional data (like new Anaconda.org channels). We will update the change log when new or revised data is posted.

License

This dataset is licensed under a Creative Commons Attribution 4.0 International License. We are offering this data to help the community understand the usage of conda packages, but with no warranty. If you use this data, please acknowledge Anaconda as the source and link back to this GitHub repository.

Feedback

If you have questions or find problems in the data, please open an issue on this repository. Thanks!

Contributors

datapythonista, mariusvniekerk, seibert, sophiamyang
