- Explored the capabilities of Hadoop and Spark.
- Used various Big Data tools (e.g., Pig for ETL, Hive and/or Spark SQL for queries).
- Used Apache Zeppelin to perform data analysis and share results.
Chicago’s Crimes - 2001 to present
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. The dataset contains more than 65,000 records/rows of data.
For Each Year And District, The % Of Homicides That Led To An Arrest (Interpreted using Pig)
Display The Number Of Thefts By Month, All Years Combined (Interpreted using Pig)
See The Trend Of All Kinds Of Crime Through The Years (Interpreted using Hive)
See The Trend Of Specific Kinds Of Crime Through The Years (Interpreted using Hive)
Top 3 Days Of The Week On Which The Most Crimes Are Committed, For Each Year (Interpreted using Spark)
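
The aggregation logic behind three of the questions above can be sketched in plain Python. This is only an illustration of the grouping and counting involved, not the Pig, Hive, or Spark code used in the project. It assumes the CSV columns are named `Date`, `Primary Type`, `Arrest`, `District`, and `Year`, that `Date` starts with `MM/DD/YYYY`, and that `Arrest` holds `true`/`false`. The sample rows and the function names are made up for the example.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample rows in the assumed column layout (not real data).
SAMPLE_CSV = """Date,Primary Type,Arrest,District,Year
01/05/2015 11:30:00 PM,HOMICIDE,true,011,2015
02/14/2015 01:15:00 AM,HOMICIDE,false,011,2015
03/02/2015 09:00:00 AM,THEFT,false,011,2015
03/06/2015 06:45:00 PM,THEFT,true,007,2015
07/04/2016 10:00:00 PM,HOMICIDE,true,007,2016
07/08/2016 02:20:00 PM,THEFT,false,007,2016
07/09/2016 04:10:00 PM,BATTERY,false,011,2016
07/15/2016 08:00:00 AM,THEFT,false,011,2016
"""

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse_date(value):
    """Parse only the MM/DD/YYYY part of the assumed Date format."""
    return datetime.strptime(value.split()[0], "%m/%d/%Y")


def homicide_arrest_pct(rows):
    """Percentage of homicides with an arrest, keyed by (year, district)."""
    totals, arrests = Counter(), Counter()
    for row in rows:
        if row["Primary Type"] != "HOMICIDE":
            continue
        key = (int(row["Year"]), row["District"])
        totals[key] += 1
        if row["Arrest"].strip().lower() == "true":
            arrests[key] += 1
    return {key: 100.0 * arrests[key] / totals[key] for key in totals}


def thefts_by_month(rows):
    """Number of thefts per calendar month (1-12), all years combined."""
    counts = Counter()
    for row in rows:
        if row["Primary Type"] == "THEFT":
            counts[parse_date(row["Date"]).month] += 1
    return dict(counts)


def top_days_per_year(rows, n=3):
    """The n weekdays with the most crimes for each year.

    Ties are broken alphabetically by day name so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for row in rows:
        day = parse_date(row["Date"])
        counts[day.year][DAY_NAMES[day.weekday()]] += 1
    return {
        year: [name for name, _ in
               sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for year, c in counts.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
arrest_pct = homicide_arrest_pct(rows)
monthly_thefts = thefts_by_month(rows)
top_days = top_days_per_year(rows)
```

To try this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open handle to the downloaded `rows.csv`, after checking that the header names match the assumptions above. For the full dataset, the same group-and-count steps map naturally onto `GROUP BY` in Pig, Hive, or Spark SQL.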