lgervasoni / urbansprawl

Open framework for calculating spatial urban sprawl indices and performing disaggregated population estimates using open data

License: MIT License

Python 100.00%

openstreetmap overpass-api urban-planning urban-sprawl transportation urban-accessibility land-use-mix land-use urban-dispersion urban

urbansprawl's Introduction

Urbansprawl

The urbansprawl project provides an open source framework for assessing urban sprawl using open data. It uses OpenStreetMap (OSM) data to calculate its sprawling indices, which are grouped into three families: accessibility, land use mix, and dispersion.

Locations of residential and activity (e.g. shop, commerce, office, among others) units are used to measure mixed use development and built-up dispersion, whereas the street network is used to measure the accessibility between different land uses. The output consists of spatial indices, which can be easily integrated with GIS platforms.

Additionally, a method to perform disaggregated population estimates at the building level is provided. Our goal is to estimate the number of people living at the fine level of individual households by using open urban data (OpenStreetMap) and coarse-scale population data (census tracts).

Motivation:

Urban sprawl has been related to numerous negative environmental and socioeconomic impacts. Meanwhile, the number of people living in cities has increased considerably, from 746 million in 1950 to 3.9 billion in 2014. More than 66% of the world's population is projected to live in urban areas by 2050, against 30% in 1950 (United Nations, 2014). Because urban areas are growing at increasing rates, assessing urban sprawl is an urgent step towards sustainable development. However, sprawl is an elusive term, and different approaches to measuring it have led to heterogeneous results.

Moreover, most studies rely on private or commercial datasets, and their software is rarely made public, which impedes research reproducibility and comparability. Furthermore, many works report a single value for a whole region of analysis, discarding spatial information that is vital for urban planners and policy makers.

This situation raises new challenges in designing cities that can host such populations sustainably. The sustainability question must therefore address several aspects, including economic, social, and environmental matters. Urbansprawl provides an open framework to aid in the process of calculating sprawling indices.

Framework characteristics:

Open data: we rely solely on open data in order to ensure replicability.
Open source: users are free to use the framework for any purpose.
World-wide coverage: the analysis can be applied to any city in the world, as long as sufficient data exists.
Data homogeneity: a set of statistical tools is applied to homogeneous and well-defined map feature data.
Geo-localized data: the precise location of features helps cope with the Modifiable Areal Unit Problem (by avoiding gridded data, e.g. Land Use Land Cover data).
Crowd-sourced data: rapid updates given an ever-increasing community.
GIS output: easy to integrate with other GIS frameworks.
Potential missing data: data are still scarce for some regions of the world.

Disclaimer: This package is no longer maintained.

For more details, refer to:

Gervasoni Luciano, 2018. "Contributions to the formalization and implementation of spatial urban indices using open data : application to urban sprawl studies." Computers and Society [cs.CY]. Université Grenoble Alpes, 2018.
Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2016. "A framework for evaluating urban land use mix from crowd-sourcing data." 2nd International Workshop on Big Data for Sustainable Development (IEEE Big Data 2016).
Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "LUM_OSM: une plateforme pour l'évaluation de la mixité urbaine à partir de données participatives." GAST Workshop, Conférence Extraction et Gestion de Connaissances (EGC 2017).
Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "Calculating spatial urban sprawl indices using open data." 15th International Conference on Computers in Urban Planning and Urban Management (CUPUM 2017).
Gervasoni Luciano, Fenet Serge, and Sturm Peter. 2018. "Une méthode pour l’estimation désagrégée de données de population à l’aide de données ouvertes." Conférence Internationale sur l'Extraction et la Gestion des Connaissances (EGC 2018).
Gervasoni Luciano, Fenet Serge, Perrier Régis and Sturm Peter. 2018. "Convolutional neural networks for disaggregated population mapping using open data." IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018).

Installation

The urbansprawl framework works with both Python 2 and Python 3.

Python dependencies:

osmnx scikit-learn psutil tensorflow keras jupyter

Using pip

Install the spatialindex library. Using apt-get (Linux):

sudo apt-get install libspatialindex-dev

Install the dependencies using pip

pip install osmnx scikit-learn psutil tensorflow keras jupyter

Using Miniconda

Install Miniconda
[Optional] Create a conda virtual environment

conda create --name urbansprawl-env
source activate urbansprawl-env

Install the dependencies using the conda package manager and the conda-forge channel

conda install -c conda-forge libspatialindex osmnx scikit-learn psutil tensorflow keras jupyter

Using Anaconda

Install Anaconda
[Optional] Create a conda virtual environment

conda create --name urbansprawl-env
source activate urbansprawl-env

Install the dependencies using the conda package manager and the conda-forge channel

conda update -c conda-forge --all
conda install -c conda-forge osmnx scikit-learn psutil tensorflow keras jupyter

Usage

The framework is presented through different examples in the form of notebooks. The running time of each procedure is also shown for each example. The notebooks were run on an r5.large AWS EC2 instance (2 vCPU and 16 GiB memory).

Please note that the different procedures can be both memory and time consuming, depending on the size of the chosen region of interest. To run the notebooks, type in a terminal:

jupyter notebook

Example: Urban sprawl

OpenStreetMap data is retrieved using the Overpass API. The input region of interest can be specified by:

Place + result number: The name of the city/region, and the result number to retrieve (as seen in OpenStreetMap result order)
Polygon: A polygon with the coordinates delimiting the desired region of interest
Bounding box: Using northing, southing, easting, and westing coordinates
Point + distance (meters): Use the (latitude, longitude) central point plus an input distance around it
Address + distance (meters): Set the address as central point and an input distance around it
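
The input modes above can be pictured as keyword dictionaries. The following is a minimal sketch, assuming the key names match the parameters of `create_buildings_gdf_from_input` visible in the tracebacks further down this page (`place`, `which_result`, `polygon`, `north`, `south`, `east`, `west`, `point`, `address`, `distance`). The coordinate values are illustrative, and `region_mode` is a hypothetical helper, not part of the framework.

```python
# Illustrative region-of-interest dictionaries. Key names are assumed from the
# tracebacks on this page; the coordinate values are made up for the example.
by_place = {"place": "Lyon, France", "which_result": 1}
by_bbox = {"north": 45.80, "south": 45.70, "east": 4.90, "west": 4.78}
by_point = {"point": (45.76, 4.84), "distance": 2000}  # (lat, lon), meters
by_address = {"address": "Place Bellecour, Lyon, France", "distance": 1500}


def region_mode(region_args):
    """Return which input mode a dictionary describes (hypothetical helper)."""
    keys = set(region_args)
    if {"north", "south", "east", "west"} <= keys:
        return "bounding box"
    if "polygon" in keys:
        return "polygon"
    if "place" in keys:
        return "place"
    if "point" in keys and "distance" in keys:
        return "point + distance"
    if "address" in keys and "distance" in keys:
        return "address + distance"
    raise ValueError("Must provide at least one input")
```

In the notebooks, such a dictionary is passed as `region_args` to `get_processed_osm_data`.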

Additionally, the state of the database can be retrieved for a specific date. This allows for comparisons across time and keeping track of a city's evolution.

Results are depicted for the city of Lyon, France:

Locations of residential and activity land uses are retrieved

Buildings with defined land use:
- Blue: Residential use
- Red: Activity use
- Green: Mixed use

Points of interest (POIs) with defined land use:

Densities for each land use are estimated:
- Probability density function estimated using Kernel Density Estimation (KDE)
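
As an illustration of the KDE step, here is a minimal pure-Python sketch of a two-dimensional Gaussian kernel density estimate. It assumes point coordinates in a projected (metric) system and a hand-picked bandwidth; the function name and parameters are hypothetical, and the framework's own implementation may differ.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Estimate a 2D density at `query` from `points` with a Gaussian kernel."""
    if not points:
        return 0.0
    # Normalization of an isotropic 2D Gaussian, averaged over all points
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (query[0] - px) ** 2 + (query[1] - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total
```

Evaluating this function on a regular grid, once per land use, would give one density surface for residential uses and one for activity uses.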

Activity uses can be further classified using the OSM wiki:
- Leisure and amenity
- Shop
- Commercial and industrial

Street network:

Sprawling indices:

Land use mix indices: Degree of co-occurrence of differing land uses within 'walkable' distances.

Accessibility indices: Denotes the degree of accessibility to differing land uses (from residential to activity uses).
- Fixed activities: Represents the distance needed to travel in order to reach a certain number of activity land uses
- Fixed distance: Denotes the cumulative number of activity opportunities found within a certain travel distance
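
The two accessibility variants can be sketched as follows. This is a simplified illustration that uses straight-line distances as a stand-in for the street-network distances the framework relies on; both function names are hypothetical.

```python
import math


def fixed_distance_index(origin, activities, max_distance):
    """Count the activity locations within `max_distance` of `origin`."""
    return sum(1 for a in activities if math.dist(origin, a) <= max_distance)


def fixed_activities_index(origin, activities, n_activities):
    """Distance needed to reach `n_activities` activities (inf if too few)."""
    if n_activities <= 0:
        return 0.0
    distances = sorted(math.dist(origin, a) for a in activities)
    if len(distances) < n_activities:
        return math.inf
    return distances[n_activities - 1]
```

With a street network, the same logic would apply after replacing `math.dist` by a shortest-path length between graph nodes.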

Dispersion indices: Denotes the degree of scatteredness of the built-up area.
- "A landscape suffers from urban sprawl if it is permeated by urban development or solitary buildings [...]. The more area built over and the more dispersed the built-up area, [...] the higher the degree of urban sprawl" (Jaeger and Schwick 2014)

Example: Population densities

Gridded population data is used to downscale population densities:

A fine-scale description of residential land use (surface) per building is built from OpenStreetMap data.
Using coarse-scale gridded population data, the population is downscaled to each household according to the residential surface it contains.
The evaluation is carried out using fine-grained census block data (INSEE) for cities in France as ground truth.
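
A simple way to picture the downscaling step is proportional allocation: the population of a coarse cell is split across its buildings according to their residential surface. The sketch below assumes this proportional rule, which is only one possible reading of the procedure above; the function name and building identifiers are hypothetical.

```python
def downscale_population(cell_population, residential_m2):
    """Split a coarse population count across buildings, proportionally to
    each building's residential surface (in square meters)."""
    total = sum(residential_m2.values())
    if total <= 0:
        # No residential surface in the cell: nobody can be allocated
        return {building: 0.0 for building in residential_m2}
    return {
        building: cell_population * m2 / total
        for building, m2 in residential_m2.items()
    }
```

For example, a cell of 100 inhabitants with buildings of 300 and 100 square meters of residential surface yields 75 and 25 inhabitants respectively.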

Population count images are depicted for the city of Grenoble, France:

Population densities (INSEE census data):

Population densities (INSEE census data, Gridded Population World resolution):

urbansprawl's Issues

[Enhancement] unnecessary calculation of composed classification in `population/urban_features.py`

The composed classification calculation is unnecessary. During the pre-processing step, buildings are already classified according to their composed classification, which considers both the points of interest and the building parts they contain.

Code in population/urban_features.py

# Calculate the composed classification
df_osm_built.containing_poi = df_osm_built.containing_poi.apply(lambda x: x if isinstance(x,list) else [])
df_osm_built.activity_category = df_osm_built.activity_category.apply(lambda x: x if isinstance(x,list) else [])

df_osm_built['composed_classification'] = df_osm_built.apply(lambda x: get_composed_classification( x, df_osm_pois.loc[x.containing_poi] ).classification, axis=1 )
df_osm_built.loc[ df_osm_built.containing_poi.apply(lambda x: len(x)==0 ), "containing_poi" ] = np.nan
df_osm_built.loc[ df_osm_built.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan

Make a setup.py and rename the 'src' folder

Hi,

I'm working on a setup.py file. It may be useful to:

install the urbansprawl project/package
list all dependencies
distribute the package on PyPI

I also suggest renaming the src folder to urbansprawl. When you distribute a Python package, you have to choose a name for it, and it shouldn't be src. This implies updating all imports that reference src.

What do you think @lgervasoni ?

Thanks,
Damien

Example code not working

Hello, I am trying to run the sprawl-overview example code but I am getting an error when I attempt to retrieve the OSM data. Can you help? This is the error message:

JSONDecodeError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\osmnx\core.py in overpass_request(data, pause_duration, timeout, error_pause_duration)
337 try:
--> 338 response_json = response.json()
339 if 'remark' in response_json:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
891 pass
--> 892 return complexjson.loads(self.text, **kwargs)
893

~\AppData\Local\Continuum\anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:

~\AppData\Local\Continuum\anaconda3\lib\json\decoder.py in decode(self, s, _w)
338 """
--> 339 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
340 end = _w(s, end).end()

~\AppData\Local\Continuum\anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
356 except StopIteration as err:
--> 357 raise JSONDecodeError("Expecting value", s, err.value) from None
358 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in ()
1 # Retrieve OSM data
2 region_args = {"north":north, "south":south, "west":west, "east":east}
----> 3 df_osm_built, df_osm_building_parts, df_osm_pois = get_processed_osm_data(city_ref, region_args)

~\src\core.py in get_processed_osm_data(city_ref_file, region_args, kwargs)
179 ##########################
180 # Query and update bounding box / polygon
--> 181 df_osm_built, polygon, north, south, east, west = create_buildings_gdf_from_input(date=date_query, polygon=polygon, place=place, which_result=which_result, point=point, address=address, distance=distance, north=north, south=south, east=east, west=west)
182 df_osm_built["osm_id"] = df_osm_built.index
183 df_osm_built.reset_index(drop=True, inplace=True)

~\src\osm\osm_overpass.py in create_buildings_gdf_from_input(date, polygon, place, which_result, point, address, distance, north, south, east, west)
96 p4 = (east,south)
97 polygon = Polygon( [p1,p2,p3,p4] )
---> 98 df_osm_built = buildings_from_polygon(date, polygon)
99 else:
100 log("Error: Must provide at least one input")

~\src\osm\osm_overpass.py in buildings_from_polygon(date, polygon, retain_invalid)
337 """
338
--> 339 return create_buildings_gdf(date=date, polygon=polygon, retain_invalid=retain_invalid)
340
341

~\src\osm\osm_overpass.py in create_buildings_gdf(date, polygon, north, south, east, west, retain_invalid)
235 """
236
--> 237 responses = osm_bldg_download(date, polygon, north, south, east, west)
238
239 vertices = {}

~\src\osm\osm_overpass.py in osm_bldg_download(date, polygon, north, south, east, west, timeout, memory, max_query_area_size)
201 '(poly:"{polygon}")["building"];(._;>;));out;')
202 query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
--> 203 response_json = overpass_request(data={'data':query_str}, timeout=timeout)
204 response_jsons.append(response_json)
205 msg = ('Got all building footprints data within polygon from API in '

~\AppData\Local\Continuum\anaconda3\lib\site-packages\osmnx\core.py in overpass_request(data, pause_duration, timeout, error_pause_duration)
358 else:
359 log('Server at {} returned status code {} and no JSON data'.format(domain, response.status_code), level=lg.ERROR)
--> 360 raise Exception('Server returned no JSON data.\n{} {}\n{}'.format(response, response.reason, response.text))
361
362 return response_json

Exception: Server returned no JSON data.
<Response [400]> Bad Request

<title>OSM3S Response</title>

The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.

Error: line 1: parse error: ';' expected - ')' found.

Error: line 1: parse error: Unexpected end of input.

Error: line 1: static error: Element "print" cannot be subelement of element "union".

Error whenever height tags are missing in `osm_surface.py`

When buildings are retrieved for a region of interest, and there exists no height tags for any of them, the associate_level function in osm/osm_surface.py fails.
It expects a dict to process the available height tags, but a NaN is sent in some cases (due to missing columns).
The method should consider these cases, and set a default height to the buildings.
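
A defensive version of the height handling could look like the sketch below. It is a hypothetical helper, not the actual `associate_level` function: it assumes the tags arrive as a dict with an OSM `height` key, and it borrows the `default_height` and `meters_per_level` values of 3 from the notebook call shown elsewhere on this page.

```python
import math


def building_levels(height_tags, default_height=3, meters_per_level=3):
    """Estimate a building's number of levels (hypothetical helper)."""
    # A missing column arrives as NaN (a float), not as a dict of tags
    if not isinstance(height_tags, dict):
        height_tags = {}
    try:
        height = float(height_tags.get("height", default_height))
    except (TypeError, ValueError):
        # Unparsable values such as "tall" fall back to the default
        height = float(default_height)
    if math.isnan(height) or height <= 0:
        height = float(default_height)
    return max(1, math.ceil(height / meters_per_level))
```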
The problem is reproducible by choosing the following region of interest: "El Cairo, Egypt".

Update error

Hello,

Thank you for sharing the urbansprawl Python package.

I would like to use it, but it no longer works because of updates to dependencies such as scikit-learn.

I need your advice on how to solve this issue.

Sincerely,

Fail in `sprawl_overview` notebook (missing dependency?)

The compute_grid_accessibility method fails during execution. After reading the source code, it seems that the problem comes from line 96:

max_processes = int( ''.join(numbers) )

The value of numbers is probably not int-compatible (I suspect a null value), as the log highlights a missing dependency (psutil).

Entropy index calculation in case of null input values

In sprawl/landusemix.py the Entropy Index calculation results in NaN values if any of the inputs is zero, whereas the degree of mixed-use should be zero.

	if (x <= 0 or y <= 0): return np.nan
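
The requested behavior can be sketched as follows. This assumes a two-category Shannon entropy normalized to the range [0, 1], which may not be the exact formula used in sprawl/landusemix.py; the function name is hypothetical.

```python
import math


def entropy_land_use_mix(x, y):
    """Normalized two-category Shannon entropy of the land use values x and y."""
    if x <= 0 or y <= 0:
        # A single land use (or none at all) means there is no mix
        return 0.0
    p = x / (x + y)
    q = 1.0 - p
    return -(p * math.log(p) + q * math.log(q)) / math.log(2)
```

The index reaches 1 when both land uses are equally present and decreases towards 0 as one of them dominates.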

Add an executable main

A main executable can be added, in order to be able to do something like the following:
python urbansprawl [method] [arguments]
where the method can be either the 1) OSM processed data, 2) the spatial sprawl indices, or 3) the population downscaling.

As such, the following file can be included in /urbansprawl/__main__.py:

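import sys

# Usage: "python urbansprawl [method] [arguments]"

def main():
	print(sys.argv)

	# Expect at least the method name after the program name
	if len(sys.argv) < 2:
		sys.exit("Usage: python urbansprawl [method] [arguments]")

	method = sys.argv[1]

	if method == "osm_data":
		print("Retrieve OSM data")
		# TODO
	elif method == "indices":
		print("Retrieve sprawl indices")
		# TODO
	elif method == "population_downscaling":
		print("Perform population downscaling")
		# TODO
	else:
		# Reject method names that are not supported
		sys.exit("Unknown method: " + method)

if __name__ == "__main__":
	main()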

Grenoble 3 years ago example has failed in `osm_cities` notebook

The last example in osm_cities.ipynb failed as follows:

---------------------------------------------------------------
KeyError                      Traceback (most recent call last)
<ipython-input-4-2c1734dd0411> in <module>()
      8 kwargs={'retrieve_graph': True, 'default_height': 3, 'meters_per_level': 3, 'associate_landuses_m2': True, 'minimum_m2_building_area': 9, 'date': date}
      9 region_args = {"place":"Grenoble, France"}
---> 10 df_osm_built, df_osm_building_parts, df_osm_pois = get_processed_osm_data(city_ref_file = 'Grenoble_old', region_args = region_args, kwargs = kwargs)
     11 
     12 # Plot

~/src/urbansprawl/src/core.py in get_processed_osm_data(city_ref_file, region_args, kwargs)
    306                 default_height = kwargs["default_height"]
    307                 meters_per_level = kwargs["meters_per_level"]
--> 308                 mixed_building_first_floor_activity = kwargs["mixed_building_first_floor_activity"]
    309                 compute_landuses_m2(df_osm_built, df_osm_building_parts, df_osm_pois, default_height=default_height, meters_per_level=meters_per_level, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
    310 

KeyError: 'mixed_building_first_floor_activity'

It seems that we have to add 'mixed_building_first_floor_activity': True in the kwargs dict definition.
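
Alternatively, the options could be read with defaults so that a missing key does not raise. This is a minimal sketch with a hypothetical helper; the default values (3, 3, and True) are assumptions taken from the notebook call and from the suggestion above, not from the framework's documentation.

```python
def read_landuse_options(kwargs):
    """Read land-use options, tolerating missing keys (hypothetical helper)."""
    return {
        "default_height": kwargs.get("default_height", 3),
        "meters_per_level": kwargs.get("meters_per_level", 3),
        "mixed_building_first_floor_activity": kwargs.get(
            "mixed_building_first_floor_activity", True
        ),
    }
```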

`osm_cities` notebook fails for some cities

Hi,
There seems to be an issue with osm_cities notebook, as the process fails for some cities. As an example, for Paris, I get a

ReadTimeout: HTTPConnectionPool(host='overpass-api.de', port=80): Read timed out. (read timeout=180)

In addition, some other cities take a very long time to process. How long is the processing supposed to take?

Missing dependency in `population_downscaling_ml.ipynb`

Running the first cell of the notebook raises a missing dependency error, as keras is not indicated in the project dependencies.

[Code enhancement] Introduce a data pipeline to make the data gathering process more reliable

The OSM data gathering process is long and its reliability is hard to control. We could cut it into smaller steps, design a task dependency graph, and verify the completion of each step in sequence.
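
One way to picture such a task dependency graph is with the standard library's `graphlib` module (Python 3.9+). The step names below are hypothetical and only illustrate the idea; a dedicated pipeline library could be used instead.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each step maps to the steps it depends on.
PIPELINE = {
    "download_buildings": set(),
    "download_pois": set(),
    "classify_land_use": {"download_buildings", "download_pois"},
    "compute_indices": {"classify_land_use"},
}


def run_pipeline(steps, actions):
    """Run each step after its dependencies; stop at the first failure."""
    done = []
    for name in TopologicalSorter(steps).static_order():
        # Each action returns True when its step completed successfully
        if not actions[name]():
            raise RuntimeError(f"step failed: {name}")
        done.append(name)
    return done
```

Each action could check that its output file exists before returning, so that a failed download is detected before later steps run.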

[Enhancement] Improve the logging step across the framework

The logging currently in use can be improved:

GeoPandas data frames have no assigned names, yet they still appear in the log.
The time elapsed can be shown for each step.
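
Reporting the elapsed time per step could be done with a small context manager, as in the sketch below. The logger name `urbansprawl` and the `log_step` helper are assumptions for illustration, not existing parts of the framework.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("urbansprawl")


@contextmanager
def log_step(name):
    """Log the elapsed wall-clock time of a named processing step."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the step raises, so failed steps are timed too
        elapsed = time.perf_counter() - start
        logger.info("%s done in %.2f s", name, elapsed)
```

A step would then be wrapped as `with log_step("buildings download"): ...`, which also gives each log entry an explicit name.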

[Enhancement] Code (modules) reorganisation

In order to better understand the framework, the code needs to be reorganized.
Modules need to be correctly separated.

[Enhancement] Code (modules) reorganisation + Main executable

In order to better understand the framework, the code needs to be reorganized. Modules need to be correctly separated.
As well, a main executable should be added, something like the following:
python urbansprawl [method] [arguments]
where the method can be either the OSM processed data, the spatial sprawl indices, or the population downscaling.

Improve installation in Readme

One should extend the installation instructions for the cases of Miniconda and pip

FileNotFoundError in `population_downscaling_ml.ipynb`

In the second code cell, I got:

FileNotFoundError: [Errno 2] No such file or directory: 'data/training'

The get_Y_X_features_population_data function needs a data/training subdirectory; however, its creation does not seem to be handled anywhere in the source code.
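
A possible fix is to create the directory before it is used. The helper below is a hypothetical sketch; only the `data/training` path comes from the error message.

```python
import os


def ensure_training_dir(base_dir="data", sub_dir="training"):
    """Create the training directory if it does not exist and return its path."""
    path = os.path.join(base_dir, sub_dir)
    # exist_ok=True makes the call safe to repeat on later runs
    os.makedirs(path, exist_ok=True)
    return path
```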