CO Counties Marijuana Dataset

Author: David Martinez
contact: [email protected]
github: https://github.com/davelovesdata/CO-Marijuana-Dataset

Project Description

The Colorado Counties Marijuana Dataset was created as a tool for determining whether machine learning can be used to predict county-level sales. As of August 2018, about thirty (46%) of Colorado's sixty-four counties prohibited medical marijuana sales and twenty-seven (42%) prohibited recreational sales. These counties represent a substantial potential growth market for the marijuana industry. Should machine learning predictive analytics prove effective, the predictions could identify which counties to prioritize for business efforts.

Data Collection and Processing Methodology

Multiple public source datasets obtained from the Colorado Department of Revenue (https://www.colorado.gov/pacific/revenue) were manually wrangled into two Excel workbooks: CO_County_Sales_2014_2018.xlsx and CO_County_Taxes_2014_2018.xlsx. Each workbook contains monthly sales/taxes (compiled by year) and aggregate sales/taxes (compiled across multiple years).

Data collection was performed against monthly reports found at:
https://www.colorado.gov/pacific/revenue/colorado-marijuana-sales-reports (monthly sales reports for each county by month and year).
https://www.colorado.gov/pacific/revenue/colorado-marijuana-tax-data (monthly taxation reports for each county by month and year).

1. Sales file description (CO_County_Sales_2014_2018.xlsx)

The sales file contains county-level medical and recreational sales by year, along with population and location information (State, County, Latitude, Longitude, Region). Additionally, medical and recreational sales for each county were divided by county population to give average sales per county citizen for both medical and recreational marijuana.

Dataset fields:
State - Currently only "COLORADO"
County - Colorado County Name (e.g., "Adams" or "Yuma")
Latitude - Latitude of County center
Longitude - Longitude of County Center
Region - An arbitrary assignment I made to quarter the state into geographic quadrants.
Year - Collection Year
Population - Estimated population between census reporting periods
Med_Sales - County level sales of Medical Marijuana (see value explanation below)
Rec_Sales - County level sales of Recreational Marijuana (see value explanation below)
med_sales_per_citizen - a calculated value determined by dividing the "Med_Sales" value by the "Population" value.
rec_sales_pre_citizen - a calculated value determined by dividing the "Rec_Sales" value by the "Population" value.
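
The README does not say where the dividing lines for the Region field were drawn, so the following is only a minimal sketch of one way to quarter the state, assuming the split is made at the midpoint of Colorado's approximate bounding box. The function name `quadrant`, the "NE"/"NW"/"SE"/"SW" labels, and the sample coordinates are hypothetical and may not match the values stored in the workbook.

```python
# Approximate bounding box of Colorado in degrees (an assumption used only
# to find a midpoint; the dataset's actual region boundaries are not documented).
CO_LAT_MIN, CO_LAT_MAX = 37.0, 41.0
CO_LON_MIN, CO_LON_MAX = -109.05, -102.05


def quadrant(latitude: float, longitude: float) -> str:
    """Assign a county center to one of four hypothetical quadrant labels."""
    mid_lat = (CO_LAT_MIN + CO_LAT_MAX) / 2
    mid_lon = (CO_LON_MIN + CO_LON_MAX) / 2
    north_south = "N" if latitude >= mid_lat else "S"
    east_west = "E" if longitude >= mid_lon else "W"
    return north_south + east_west


# Illustrative coordinates, not taken from the dataset.
print(quadrant(40.0, -103.0))  # NE
print(quadrant(37.5, -108.0))  # SW
```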

Med_Sales, Rec_Sales, and the two calculated values each take one of three kinds of value:
0 = No sales of legal marijuana occurred in that county. The original source material did not include counties that had no sales. These rows were added to show a full statewide picture as well as county adoption over time.
NR = Not releasable due to confidentiality requirements. The sum of all NR counties ("Not Reported" in the 'County' column) is captured as the last line for each year.
x = A positive number representing sales at the dollar level.
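
The three kinds of value above need different handling when the columns are read into code. The following is a minimal sketch, assuming cells arrive either as numbers or as strings such as "NR". The helper names `parse_sales` and `per_citizen` are hypothetical and not part of the dataset, and the figures in the usage lines are made up for illustration.

```python
from typing import Optional, Union


def parse_sales(cell: Union[str, int, float]) -> Optional[float]:
    """Return sales in dollars, or None when the cell is "NR" (not releasable)."""
    if isinstance(cell, str):
        text = cell.strip()
        if text.upper() == "NR":
            return None
        # Tolerate thousands separators such as "1,250,000".
        return float(text.replace(",", ""))
    return float(cell)


def per_citizen(sales: Optional[float], population: int) -> Optional[float]:
    """Divide sales by population; return None for NR sales or a non-positive population."""
    if sales is None or population <= 0:
        return None
    return sales / population


print(per_citizen(parse_sales("1,250,000"), 50000))  # 25.0
print(per_citizen(parse_sales("NR"), 50000))         # None
print(per_citizen(parse_sales(0), 50000))            # 0.0
```

Keeping NR as None rather than 0 avoids confusing "no sales occurred" with "sales occurred but could not be released".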

2. Taxes file description (CO_County_Taxes_2014_2018.xlsx)

The taxes file contains taxes collected per county in three columns: Medical Sales Tax (2.9%), Retail Sales Tax (2.9%), Retail Marijuana Special Sales Tax.

Dataset fields:
County - Colorado County Name (e.g., "Adams" or "Yuma")
Year - Collection Year
Medical Sales Tax (2.9%) - Sales tax applied to medical marijuana only. This is the only state tax paid.
Retail Sales Tax (2.9%) - Sales tax applied to retail marijuana. Starting in 2018, this tax was no longer collected.
Retail Marijuana Special Sales Tax - An additional tax on retail marijuana sales.

Medical Sales Tax (2.9%), Retail Sales Tax (2.9%), and Retail Marijuana Special Sales Tax each take one of three kinds of value:
0 = No taxes on legal marijuana were collected in that county. The original source material did not include counties that had no tax information. These rows were added to show a full statewide picture as well as county adoption over time.
NR = Not releasable due to confidentiality requirements. The sum of all NR counties ("Not Reported" in the 'County' column) is captured as the last line for each year.
x = A number representing taxes at the dollar level. Negative values indicate that an overpayment of taxes from a previous month was returned.
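
Because tax values can be negative or NR, totals need some care. The following is a minimal sketch, assuming monthly cells for one county and one tax column are available as a list. The helper names `parse_tax`, `yearly_total`, and `implied_sales` are hypothetical, the sample figures are made up, and backing sales out of a 2.9% tax is only a rough estimate that ignores exemptions and timing differences.

```python
from typing import Iterable, Optional, Union

Cell = Union[str, int, float]


def parse_tax(cell: Cell) -> Optional[float]:
    """Return tax in dollars (possibly negative), or None for "NR"."""
    if isinstance(cell, str):
        text = cell.strip()
        if text.upper() == "NR":
            return None
        return float(text.replace(",", ""))
    return float(cell)


def yearly_total(monthly_cells: Iterable[Cell]) -> Optional[float]:
    """Sum monthly taxes; return None if any month is NR, since the true total is unknown."""
    values = [parse_tax(cell) for cell in monthly_cells]
    if any(value is None for value in values):
        return None
    # Negative months (refunded overpayments) simply offset the positive ones.
    return sum(values)


def implied_sales(tax: float, rate: float = 0.029) -> float:
    """Estimate the taxable sales that would produce `tax` at a flat rate."""
    return tax / rate


print(yearly_total([1200, -150, 300]))   # 1350.0
print(yearly_total([1200, "NR", 300]))   # None
```

Returning None for a year that contains any NR month is a conservative design choice; summing only the released months would understate the county's total.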
