JSON Files

Introduction

We've started to investigate APIs and briefly previewed the most common response format for data: JSON. While there are other formats, such as XML, JSON is the current standard and the format you are most likely to encounter. With that, let's take a look at how JSON files are structured.

JSON

JSON stands for JavaScript Object Notation. It came after XML and was meant to streamline many of the data transport issues of the time. It is now the common standard for data transfer on the web, and parsing packages exist for many languages (including Python)! Here's a brief preview of the same file above now in JSON:

The JSON Module

https://docs.python.org/3.6/library/json.html

import json

To load a JSON file, we first open the file using Python's built-in open function and then pass that file object to the json module's load function. As you can see, this loads the data as a dictionary.

# Open the file in a with block so it is closed automatically after loading
with open('nyc_2001_campaign_finance.json') as f:
    data = json.load(f)
print(type(data))

<class 'dict'>
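As a minimal sketch of the two closely related loaders: json.load reads from an open file object, while json.loads parses a string that is already in memory. The sample dictionary and the temporary file below are stand-ins invented for this example, not the lesson's campaign finance file.

```python
import json
import os
import tempfile

# Hypothetical stand-in data with the same top-level keys as the lesson's file.
sample = {"meta": {"view": {"name": "demo"}}, "data": [[1, "a"], [2, "b"]]}

# Write the stand-in to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
    path = tmp.name

try:
    with open(path) as f:
        from_file = json.load(f)          # parse from a file object
    with open(path) as f:
        from_text = json.loads(f.read())  # parse from a string
finally:
    os.remove(path)  # clean up the temporary file

print(type(from_file))
print(from_file == from_text)
```

Both calls produce the same Python objects; which one to use depends only on whether the JSON arrives as a file or as text (for example, the body of an API response).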

JSON files are often nested in a hierarchical structure and will have data structures analogous to Python dictionaries and lists. We can begin to investigate a particular file by using our traditional Python methods. When loaded with the json module, the built-in JSON data types map to Python counterparts as follows: object to dict, array to list, string to str, number to int or float, true and false to True and False, and null to None.
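The correspondence between JSON types and Python types can be checked directly. This is a small sketch; the JSON text and its key names are made up for the example.

```python
import json

# Invented JSON text containing one value of each built-in JSON type.
text = '{"obj": {"k": 1}, "arr": [1, 2], "s": "hi", "i": 3, "x": 2.5, "t": true, "n": null}'

parsed = json.loads(text)

# Record the name of the Python type each JSON value was converted to.
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key, name in type_names.items():
    print(f"{key}: {name}")
```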

Check the keys of the dictionary:

data.keys()

dict_keys(['meta', 'data'])

Investigate what data types are stored within the values associated with those keys:

for v in data.values():
    print(type(v))

<class 'dict'>
<class 'list'>
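Checking types one level at a time gets tedious for deeply nested files. One possible approach is a small recursive helper that summarizes the container types a few levels down. The summarize function and the sample dictionary here are hypothetical, written only for this sketch.

```python
def summarize(obj, depth=0, max_depth=3):
    """Return a list of lines describing the nested types inside obj."""
    pad = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{pad}dict with keys {list(obj)}"]
        if depth < max_depth:
            for value in obj.values():
                lines.extend(summarize(value, depth + 1, max_depth))
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}list of length {len(obj)}"]
        if obj and depth < max_depth:
            # Only the first item is inspected, assuming the rest look similar.
            lines.extend(summarize(obj[0], depth + 1, max_depth))
        return lines
    return [f"{pad}{type(obj).__name__}"]


# Invented stand-in that mimics the 'meta' / 'data' layout seen above.
sample = {
    "meta": {"view": {"id": "abc", "columns": [{"name": "sid"}]}},
    "data": [[1, "x", None]],
}

lines = summarize(sample)
print("\n".join(lines))
```

Looking only at the first element of each list is a shortcut: it keeps the output short but will miss lists whose items have mixed shapes.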

We can quickly preview the first dictionary as a DataFrame:

import pandas as pd

pd.DataFrame.from_dict(data['meta'])

	view
attribution	Campaign Finance Board (CFB)
averageRating	0
category	City Government
columns	[{'id': -1, 'name': 'sid', 'dataTypeName': 'me...
createdAt	1315950830
description	A listing of public funds payments for candida...
displayType	table
downloadCount	1470
flags	[default, restorable, restorePossibleForType]
grants	[{'inherited': False, 'type': 'viewer', 'flags...
hideFromCatalog	False
hideFromDataJson	False
id	8dhd-zvi6
indexUpdatedAt	1536596254
metadata	{'rdfSubject': '0', 'rdfClass': '', 'attachmen...
name	2001 Campaign Payments
newBackend	False
numberOfComments	0
oid	4140996
owner	{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...
provenance	official
publicationAppendEnabled	False
publicationDate	1371845179
publicationGroup	240370
publicationStage	published
query	{}
rights	[read]
rowClass
rowsUpdatedAt	1371845177
rowsUpdatedBy	5fuc-pqz2
tableAuthor	{'id': '5fuc-pqz2', 'displayName': 'NYC OpenDa...
tableId	932968
tags	[finance, campaign finance board, cfb, nyccfb,...
totalTimesRated	0
viewCount	233
viewLastModified	1536605717
viewType	tabular

Notice the columns entry, which holds the column names and will be very useful!
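A minimal sketch of pulling those names out, assuming each entry in the columns list is a dictionary with a 'name' key, as the truncated preview suggests. The meta dictionary below is a shortened stand-in with illustrative values, not the real metadata.

```python
# Shortened stand-in for data['meta']; the entries are illustrative only.
meta = {
    "view": {
        "columns": [
            {"id": -1, "name": "sid"},
            {"id": -1, "name": "id"},
            {"id": 1, "name": "CANDID"},
        ]
    }
}

# Collect the 'name' field of every column definition, in order.
column_names = [col["name"] for col in meta["view"]["columns"]]
print(column_names)
```

With the real file, the same list comprehension would be applied to data['meta']['view']['columns'].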

Investigate further information about the list stored under the 'data' key:

len(data['data'])

Previewing the first entry:

data['data'][0]

[1,
 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
 1,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
 None,
 'CANDID',
 'CANDNAME',
 None,
 'OFFICEBORO',
 None,
 'CANCLASS',
 None,
 None,
 None,
 None]
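A bare list like this is hard to read without knowing which position holds which field. One way to make a record easier to interpret is to pair it with the column names from the metadata. The sketch below uses a hypothetical helper, label_row, and a shortened stand-in record and name list rather than the full record shown above.

```python
def label_row(names, row):
    """Pair each value in row with its column name, checking the lengths agree."""
    if len(names) != len(row):
        raise ValueError(f"expected {len(names)} values, got {len(row)}")
    return dict(zip(names, row))


# Hypothetical subset of column names and a matching shortened record.
column_names = ["sid", "id", "CANDID", "CANDNAME"]
row = [1, "E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1", "CANDID", "CANDNAME"]

labelled = label_row(column_names, row)
for name, value in labelled.items():
    print(f"{name}: {value!r}")
```

The length check matters because zip silently stops at the shorter input, which would hide a mismatch between the metadata and the records.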

Summary

As you can see, there's still a lot going on here with the deeply nested structure of some of these data files. In the upcoming lab, you'll get a chance to practice loading files and previewing the data as we did here.

sami304 / ds-skills2-json-intro-london-ds-skills-011519
