Giter Club home page Giter Club logo

cartogram-cpp's People

Contributors

adisidev avatar edliau avatar fillingthemoon avatar mgastner avatar morrcriven avatar nihalzp avatar phongulus avatar pratyushmore avatar sevvalbbayram avatar vuminhhieunus2019 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Forkers

edliau wind1337

cartogram-cpp's Issues

Mislabelling of Inset GeoDivs (russia.geojson)

I came across a bug on the master branch.

This first screenshot below shows a GeoDiv of russia.geojson (input GeoJSON) with its NAME_1 as Sakha.

Screenshot 2021-06-22 at 1 01 48 PM

This second screenshot below shows the same GeoDiv of an inset of russia_B_cartogram_scaled.geojson with its NAME_1 as Bryansk, which is incorrect. This bug can be traced back to commit 551633ccf0a33c562a7d5b0bc9315c15e320d4df, which seems to be the first commit where this bug occurs, while commit 3996e831f259ddeeae60161ecaafc95ad19af686, 2 commits before, runs fine, which I believe means that this error is due to the inset code. (Commit da6ea99fd6dc1736955d66e8db99cbae123f379b, which is the commit between the 2 commits mentioned above had some compiling errors so I moved on to the commit after this one, which is 551633ccf0a33c562a7d5b0bc9315c15e320d4df)

Screenshot 2021-06-22 at 1 14 05 PM

To reproduce this bug, please follow these steps:

  1. git checkout master

  2. Place the attached russia.geojson (after unzipping) and inset_russia_population.csv into the sample_data/inset_data/ directory.
    inset_russia_modified_population.csv
    russia.geojson.zip

  3. Comment out lines 190-254 in cartogram_generator/main.cpp.

  4. Run the following in your terminal.

cd build/
cmake .
make
./cartogram -g ../sample_data/inset_data/russia.geojson -v ../sample_data/inset_data/inset_russia_population.csv
  1. Import the resulting russia_B_cartogram_scaled.geojson into Mapshaper to view the incorrectly labelled GeoDiv.

Callback functions for loop over each ring

We frequently use the pattern:

Apply function f() to the exterior ring.
For all internal rings,
  apply function f().

Mapshaper uses callback functions of the type: polygon_with_holes.forEachRing(f). I think we should adopt Mapshaper's strategy. It is more readable and more maintainable if we decide to switch our internal representation of polygons with holes (e.g. to 2D arrangements).

Should we have an `stl_typedef.h`?

We often have long type declarations, for example:

std::vector<std::vector<intersection> >

I am wondering whether we should use a typedef for this declaration and, in fact, all other STL containers, similar to the way in which we handle CGAL types. It would shorten our code without running the risk of using namespaces, which is risky because many different things in our code are called 'map'. Thus, my proposal is to add a header file stl_typedef.h that contains lines such as:

typedef std::vector<intersection> intersection_vector;
typedef std::vector<intersection_vector> intersection_vector_2;

Correct Topology Checks

Implementing a Topology Check with CGAL 2D Arrangements or our own functions may be helpful. Some ideas:

Using contiguity graph and segment intersections.

  • Create Contiguity Graph
  • Recreate map, segment by segment.
  • Check every new segment to segments in adjacent GeoDivs (speed process by checking bounding boxes of two segments comparison) for intersections. Code from densify.cpp can be reused here.

To speed up above code:

  • We can check if new points/segments already exist and skip them. Lookup in a map will take O(log n), total time might take around O( n log n).
  • Other checks to see if segment is on boundary.

Note:

  • Test with unsimplified Russia or Japan to check Speed.

Above function may also help with topology checks where two straight polylines fully overlap, but one has more points than the other.

Viewer for intersections in EPS

This would allow users to see intersections in their input maps, as well as developers to see any intersections incorrectly added by our program.

Use Contiguity Graph instead of Bounding Box for Comparison

When we want to make comparisons between two GeoDivs that touch or are nearby, we currently use bounding boxes to filter out far away GeoDivs; checking the contiguity graph and only testing against Adjacent GeoDivs may be faster.

Additionally, a map with the index of all GeoDivs may be helpful to speed up calculations and store GeoDivs in this manner. Perhaps, something like GeoDiv.index(). Thus, the adjacency list could then be stored as integers. This may also help us how with intersection calculation.

Remove unnecessary `include` directives in header (`.h`) files

We should study and implement the best practices as to what and what not to include in source files compared to header files. The conventions seems to be too include as few include directives as possible. I have noticed inconsistencies in our code; the convention followed seems to be ambiguous.

These may assist in further reading and rewriting:
https://stackoverflow.com/questions/2596449/including-includes-in-header-file-vs-source-file
https://softwareengineering.stackexchange.com/questions/167723/what-should-and-what-shouldnt-be-in-a-header-file

Zero blur width causes bug in world map

After the blur width hits zero, we observed that the maximum area error began increasing. The temporary fix, implemented by #93, is to keep the blur width above zero at all times. While it is a short-term fix, we must investigate the root cause of this bug.

British instead of US English

Should we consistently adopt British English in all repos related to go-cart.io?

Currently, cartogram_cpp uses US spelling, whereas the website uses British spelling. Our target audience is international, and our main sources of data are international organisations. The United Nations recommends adopting the spelling in the Concise Oxford English Dictionary (see https://www.un.org/dgacm/en/content/editorial-manual/spelling); thus, much of our data is likely to follow British spelling. It would feel more consistent if we adopt the same rule.

Native M1 and macOS support through LLVM Clang

I was able to finally get our program to compile natively on M1, as well as x86, using LLVM's clang. M1 has a considerable speed advantage when compiling and running our program. Finally, this would allow us to significantly streamline our repository (such that M1 users and other macOS users have the same build instructions). In the future, we'd also be able to create a simple autobuild script such that the installation process would boil down to running one file.

The one caveat is that we would need to remove our references to ranges as LLVM's clang does not support ranges yet (https://llvm.discourse.group/t/where-can-i-see-status-of-c-20-range-views-in-clang/4706). I argue that we should not use the ranges feature as it does not bring much benefit and is only partially supported, that too by very few compilers (GNU gcc-13, at the moment). Further, by using GNU gcc on a Mac, we make ourselves vulnerable to future bugs, as all the libraries we install from Homebrew are built by clang on Macs. Thus, using LLVM clang, and removing the ranges feature would ensure that the libraries we link, too, are built with the same compiler that we use for our program.

Naming conventions for URLs

Currently, URLs in our code that exceed 80 characters are split into two comments. However, this ends up making it tougher to copy paste and visit the URL, or open it directly using an editor.

Indeed, the Google style guide for C++ to mentions that URLs can be above 80 characters:
https://google.github.io/styleguide/cppguide.html#Line_Length

This stackexchange post has some more discussion about it.
https://meta.stackexchange.com/questions/12527/do-i-have-to-worry-about-copyright-issues-for-code-posted-on-stack-overflow

Shall we perhaps consider merging the split URLs into one comment per URL?

Clean up sample data

Right now, there are many files in sample_data/ that were at some point useful for debugging. Let us remove those files and only keep GeoJSONs for real countries with real data and without artificial insets. Also remove topologically invalid GeoJSONs.

Eliminate `break` statements

There are break statements in flatten_density() and print_properties_map(). There is never a good reason for break. Instead, the code should be refactored so that the loops with the break statements become functions that return when the condition for break is met.

Russia main.cpp holes_inside_polygons() bug

I came across a bug on the master branch.

To reproduce this bug, please follow these steps:

  1. git checkout master

  2. Place the attached russia.geojson (after unzipping) and inset_russia_population.csv into the sample_data/inset_data/ directory.
    inset_russia_modified_population.csv
    russia.geojson.zip

  3. Run the following in your terminal.

cd build/
cmake .
make
./cartogram -g ../sample_data/inset_data/russia.geojson -v ../sample_data/inset_data/inset_russia_population.csv
  1. Issue: It prints a bunch of coordinates to the output (screenshot below) and does not produce any GeoJSONs. When I comment out lines 190-201 in main.cpp, however, it runs fine. It seems then that this Russia example is getting stuck in holes_inside_polygons().

Screenshot 2021-06-29 at 12 54 32 PM

What does CGAL's line simplification do if line segments overlap but do not share the same end points?

Does CGAL's polyline simplification rely on the assumption that polylines that are shared by two neighboring polygons also share the same vertices? For example, the following two polygons touch along the line from [10, 1], to [10, 9]:

polygon 1: [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
polygon 2: [[10, 1], [19, 1], [19, 9], [10, 9], [10, 8], [10, 7], [10, 6], [10, 5], [10, 4], [10, 3], [10, 2], [10, 1]]

However, none of the line segments [[10, 1], [10, 2]], ..., [[10, 8], [10, 9]] are in polygon 1. Would this prevent CGAL's algorithm to remove the intermediate vertices?

`nested` exceptions

Currently, our code is not very friendly to debug, without using tools like lldb or gdb. It may be helpful if we "upgrade" our try-catch blocks to throw nested exceptions with std::nested_exception. This way, we will be able to create a custom backtrace, whenever our program fails.

Optimise speed by changing copies to references

We were originally under the assumption that the compiler would automatically optimise (with the -O3 flag, at the very least) us calling functions that return copies of an object's member rather than accessing the member directly. However, I was playing around with optimising some code earlier, and this does not seem to be the case.

For instance, in src/inset_state/contiguity_graph.cpp (currently on 105; however, since the pull request is not directly related to this issue, I have chosen to not use the # symbol), going from:

geo_divs_[gd_i].adjacent_to(geo_divs()[gd_j].
geo_divs_[gd_j].adjacent_to(geo_divs()[gd_i].id());
geo_divs_[gd_i].adjacent_to(geo_divs_[gd_j].id());
geo_divs_[gd_j].adjacent_to(geo_divs_[gd_i].id());

makes a significant performance difference (a few magnitude even, possibly).

However, geo_divs() simply returns geo_divs_ as such:

const std::vector<GeoDiv> InsetState::geo_divs() const
{
  return geo_divs_;
}

Thus, we should no longer hold this assumption and modify any parts of our code that we initially thought were already being optimised by the compiler. We may able to significantly reduce runtime for everything but the Integration itself.

Inconsistent approach to typecasting

We sometimes use C-style typecasting as in (int)2.5 and sometimes functional typecasting as in int(2.5). We should be consistent. Should both be replaced by static_cast<int>(2.5)?

Refactor Heatmap Code

The code currently has two issues:

  • Hardcoded too much
  • Minimum density corresponds to max index of color category which is unintuitive

-v flag appears twice in output of `./cartogram -h`

Here is the start of the output:

Usage: ./cartogram [options] geometry_file 

Positional arguments:
geometry_file              	File path: GeoJSON file

Optional arguments:
-h --help                  	shows help message and exits [default: false]
-v --version               	prints version information and exits [default: false]
-v --visual_variable_file  	File path: CSV file with ID, area, and (optionally) colour

Can the short form of -v --version be removed?

Error with world map on macOS

cartogram ../sample_data/sandbox/world/world_map_with_antarctica.geojson  -tw

Running this, seems to yield the following error for me on the master branch:

❯ cartogram ../sample_data/sandbox/world/world_map_with_antarctica.geojson ../sample_data/sandbox/world/world_gdp_with_antarctica.csv -tw
Using geometry from file ../sample_data/sandbox/world/world_map_with_antarctica.geojson
Using visual variables from file ../sample_data/sandbox/world/world_gdp_with_antarctica.csv
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
Coordinate reference system: urn:ogc:def:crs:OGC:1.3:CRS84
Replacing small target areas.
Turks and Caicos Is.: 9.1755e+08 to 1.8511e+09
Kiribati: 1.96e+08 to 1.8511e+09
Guinea-Bissau: 1.669e+09 to 1.8511e+09
British Virgin Is.: 9.71237e+08 to 1.8511e+09
St. Kitts and Nevis: 1.118e+09 to 1.8511e+09
Palau: 3.24e+08 to 1.8511e+09
Micronesia: 3.89e+08 to 1.8511e+09
Montserrat: 6.20514e+07 to 1.8511e+09
Dominica: 5.9e+08 to 1.8511e+09
Grenada: 1.33e+09 to 1.8511e+09
Nauru: 1.17e+08 to 1.8511e+09
Samoa: 9.6e+08 to 1.8511e+09
Comoros: 7.73e+08 to 1.8511e+09
Sao Tome and Principe: 5.27e+08 to 1.8511e+09
Anguilla: 3.3752e+08 to 1.8511e+09
Tonga: 5.12e+08 to 1.8511e+09
Solomon Is.: 1.607e+09 to 1.8511e+09
Seychelles: 1.739e+09 to 1.8511e+09
Cook Is.: 2.90194e+08 to 1.8511e+09
Marshall Is.: 2.28e+08 to 1.8511e+09
Vanuatu: 9.94e+08 to 1.8511e+09
Sint Maarten: 1.07207e+09 to 1.8511e+09
San Marino: 1.672e+09 to 1.8511e+09
St. Vin. and Gren.: 9.03e+08 to 1.8511e+09
Rescaling to 512-by-256 grid with bounding box
	(-2.50663, -1.25331, 2.50663, 1.25331)
Integration number 0
blur_width = 32
[1]    16477 bus error  cartogram ../sample_data/sandbox/world/world_map_with_antarctica.geojson  -tw

~/Desktop/github/cartogram_cpp/tests master                                                                                              ✘ BUS
❯ cartogram ../sample_data/sandbox/world/world_map_with_antarctica.geojson ../sample_data/sandbox/world/world_gdp_with_antarctica.csv -tw
Using geometry from file ../sample_data/sandbox/world/world_map_with_antarctica.geojson
Using visual variables from file ../sample_data/sandbox/world/world_gdp_with_antarctica.csv
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
area_field: NA
Coordinate reference system: urn:ogc:def:crs:OGC:1.3:CRS84
Replacing small target areas.
Turks and Caicos Is.: 9.1755e+08 to 1.8511e+09
Kiribati: 1.96e+08 to 1.8511e+09
Guinea-Bissau: 1.669e+09 to 1.8511e+09
British Virgin Is.: 9.71237e+08 to 1.8511e+09
St. Kitts and Nevis: 1.118e+09 to 1.8511e+09
Palau: 3.24e+08 to 1.8511e+09
Micronesia: 3.89e+08 to 1.8511e+09
Montserrat: 6.20514e+07 to 1.8511e+09
Dominica: 5.9e+08 to 1.8511e+09
Grenada: 1.33e+09 to 1.8511e+09
Nauru: 1.17e+08 to 1.8511e+09
Samoa: 9.6e+08 to 1.8511e+09
Comoros: 7.73e+08 to 1.8511e+09
Sao Tome and Principe: 5.27e+08 to 1.8511e+09
Anguilla: 3.3752e+08 to 1.8511e+09
Tonga: 5.12e+08 to 1.8511e+09
Solomon Is.: 1.607e+09 to 1.8511e+09
Seychelles: 1.739e+09 to 1.8511e+09
Cook Is.: 2.90194e+08 to 1.8511e+09
Marshall Is.: 2.28e+08 to 1.8511e+09
Vanuatu: 9.94e+08 to 1.8511e+09
Sint Maarten: 1.07207e+09 to 1.8511e+09
San Marino: 1.672e+09 to 1.8511e+09
St. Vin. and Gren.: 9.03e+08 to 1.8511e+09
Rescaling to 512-by-256 grid with bounding box
	(-2.50663, -1.25331, 2.50663, 1.25331)
Integration number 0
blur_width = 32
[1]    16498 bus error  cartogram ../sample_data/sandbox/world/world_map_with_antarctica.geojson  -tw

Progress bar sometimes goes backwards

I updated our testing script to also show the progress bar statements printed to stdout/stderr.

However, I noticed that our progress bar goes backwards sometimes. You may test this out with:

cartogram ../sample_data/usa/usa_with_puerto_rico.geojson ../sample_data/usa/usa_with_puerto_rico_na_test_inset.csv

Sort command-line flags

At present, it is difficult to determine the meaning of the command-line flags. Should we at least sort them alphabetically (e.g. 'A', 'C', 'D', ..., 'P', 'd', 'g', ..., 'w')?

Control Flow Suggestion, Naming Mistake

https://github.com/mgastner/cartogram_cpp/blob/af71f937c2ee70ed3730b8bf05c7c7d8e495e1db/cartogram_generator/main.cpp#L182-L217

Would it not be better to have this hidden in the control flow? Perhaps have cartogram_info call a member function that updates the non-positive target areas in the background? My reason for this is because we do not require min_positive_area or replacement_for_nonpositive_area anywhere else. Of course, progress would still remain in the same scope.

Further, the wording on line 203 "Replacing zero target area with [replacement_for_nonpositive_area] times the minimum non-positive area" may possibly be wrong; the wording suggests that the newly assigned area is also non-positive (because we multiply X times the non-positive area). Would "Replacing 0 target area with [replacement_for_nonpositive_area] (0.1 times the minimum positive area)." work better?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.