Giter Club home page Giter Club logo

dff's Introduction

dff

The Dominick's Finer Foods data set: Promoting the use of a publicly available scanner data set in price index research and for capacity building.

The present material demonstrates how a publicly available scanner data set can be used for price index research and capacity building.

Data source

The Dominick's Finer Foods (a now-defunct Chicago-area grocery store chain) data set is a publicly available scanner dataset that is provided for academic research purposes only. It contains sales information at the store level on a weekly basis for each UPC (Universal Product Code) in a category. The data set covers more than 90 stores for almost 400 weeks from September 1989 to May 1997 and totals around 100 million observations (after cleansing) of about 18 000 UPCs (including re-launches) in 29 categories (from analgesics to toothpastes).

Overview

The documentation located in the docs/ folder introduces the data set and describes how the data can be acquired and pre-processed, followed by a presentation of the estimation of price index numbers showing the usefulness for both research and training purposes. The codes used are located in the SAS/ folder. The newly-made CSV files (see link below) should be used to run the code located in the R/ folder. Both sets of code allow generating analysis-ready data and basing calculations on the very same data, thus discounting the incomparability of different data sets.

In order to run the codes, it is necessary to download (and extract) all category-specific files, i.e. the UPC files and movement files (in SAS format for the SAS codes, in CSV format for the R code) from the website of the James M. Kilts Center at the University of Chicago Booth School of Business: https://www.chicagobooth.edu/research/kilts/datasets/dominicks.

Furthermore, we provide two files located in the CSV/ folder that prepare the information on the week variable and the stores included that was covered only in Dominick's Data Manual.

Description

  • CSV/: These files are needed to run the SAS code and R code, respectively. The weeks file codes the week for which a data point is recorded. The stores file lists the stores included in the Dominick's research project. The upcrfj file provides the UPC file information for refrigerated juices ('RFJ') in a SAS readable format (see documentation about acquiring the data in the docs/ folder). Note that, if using R, there is no movement file available in CSV format for refrigerated juices from the Dominick's website.
  • SAS/: The SAS codes replicate the data and results of the paper located in the docs/ folder. The upc part reads in all UPC files and adds a category identifier. The move part reads in all movement files, adds a category identifier, and calculates total dollar sales; suspect data are dropped. The weeks_stores part reads in the week and store files and merges them with the movement and UPC files. The wtpd example aggregates the data, calculates unit prices as well as expenditure shares per category, and derives price indices by means of the weighted time-product dummy (WTPD) method. The sas2csv code was used to convert SAS files to the CSV format newly available at the Dominick's website. The CSV files are provided to make them more useful to researchers.
  • R/: The R code generates analysis-ready data and derives price indices equivalent to the SAS codes (located in the SAS/ folder). Common to the two sets of codes is that for the sake of exposition the weekly store-level UPC data are aggregated to chain-wide item codes (attempt at tracking products across multiple UPCs) at monthly frequency โ€“ but this can be changed. The difference is that while the SAS codes calculate results for each category, the R code is restricted to one particular category, where the three-letter acronym for the category can be adapted. The folder includes two R codes that both create the same results. The first code can be run with the R base package whereas the second code requires the installation of the tidyverse package.
  • docs/: The documentation includes the paper demonstrating how the data set can be used for price index research and capacity building as well as the SAS output from the weighted time-product dummy method at monthly frequency across all 29 categories in CSV format. Note that, if using R, there is a small loss of information between conversion in the 'truncated' PRICE variable in the CSV files. The annex to the paper gives instructions on how to use the R code located in the R/ folder.

About

author Mehrhoff J.
status since 2018 โ€“ closed
version 2.0
license EUPL (cite the source code or the reference below!)

References

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.