andybega / forecaster2

Coup forecasts

Home Page: https://www.predictiveheuristics.com/forecasts
License: MIT License
The index page (index.html) is quite big. It started out at almost 6 MB when loaded in a browser, and by playing around with st_simplify() for the two map objects I got it down to around 4.53 MB when loaded. This still doesn't work with the Twitter card validator (see #10), and I think reducing the number of points further will detract too much visually.
Another option might be to reduce the precision of the coordinates. Right now index.html stores the coordinates like "-0.456945916095746", i.e. with 15 digits. That's probably way more than needed. (These are lat/long coordinates, so they range from -180 to 180 and -90 to 90.)
So try this instead: reduce the number of digits in the coordinates (and the probabilities, for that matter), and maybe revert the point reduction (st_simplify()) back to the earlier values.
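One way to do the digit reduction, sketched here under the assumption that the map objects are sf geometries; round_sfg() is a made-up helper, not an sf function:

```r
library(sf)

# Recursively round every coordinate in an sfg geometry.
# sfg objects are numeric vectors (POINT), matrices (LINESTRING, polygon
# rings), or nested lists of those (POLYGON, MULTIPOLYGON), so recursion
# covers all cases while preserving class attributes.
round_sfg <- function(g, digits = 3) {
  if (is.list(g)) {
    g[] <- lapply(g, round_sfg, digits = digits)
  } else {
    g[] <- round(g, digits)
  }
  g
}

pt <- st_point(c(-0.456945916095746, 51.500729240487905))
round_sfg(pt)  # POINT (-0.457 51.501)

# Apply to a whole geometry column:
# st_geometry(map) <- st_sfc(lapply(st_geometry(map), round_sfg),
#                            crs = st_crs(map))
```

Three decimals is roughly 110 m at the equator, which should be well below what's visible at the map's zoom levels.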
In the leaflet R package, the code that converts sf objects to GeoJSON pulls straight from the sf coordinates; see https://github.com/rstudio/leaflet/blob/master/R/normalize-sf.R.
sf has a set of precision functions, but these only come into play when writing out data (see https://r-spatial.github.io/sf/reference/st_precision.html). leaflet's direct access to the sf coordinates thus circumvents them. Maybe either hook into that conversion (sf_coords()), or convert the sf object to GeoJSON myself, manipulate the coordinates somehow, and then add it as a GeoJSON layer in leaflet.

I think there are coup forecasts for at least 2 past years. In 2017 (?) we had forecasts that we wrote up in the WaPo Monkey Cage, and then I think I had some 2018 forecasts that I never did anything with.
Add those to the forecast repo and see what their accuracy was.
One caveat: the process used to generate those forecasts was slightly different; e.g. some of the data going into them was different.
How much sensitivity is there between model runs, due to different starting conditions?
To keep this reproducible, find a way to run the forecast models over some number of randomly picked RNG seeds, and check the resulting variation in performance.
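A sketch of what that seed sweep could look like; run_forecast() and eval_auc() are hypothetical stand-ins for whatever run-forecasts.R actually calls:

```r
# Sensitivity of forecast performance to RNG starting conditions.
seeds <- c(8001, 12345, 271828, 314159, 990099)  # arbitrary, but recorded

perf <- sapply(seeds, function(s) {
  set.seed(s)
  fc <- run_forecast(train_data)   # hypothetical model-fitting wrapper
  eval_auc(fc, test_data)          # hypothetical AUC-ROC scorer
})

data.frame(seed = seeds, auc = perf)
sd(perf)  # spread across seeds = run-to-run sensitivity
```

Keeping the seed vector hard-coded (rather than drawn on the fly) means the sensitivity check itself is reproducible.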
The current map does not include Bahrain, for example.
Make sure the website emits Open Graph and Twitter Cards metadata correctly.
The data-sources folder is missing the REIGN data cleaning code; add it. It is also missing the G&W state age cleaning code.
Add set.seed() in run-forecasts.R, re-run, and propagate the now-reproducible forecasts to the website.
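A minimal sketch of the fix; the seed value here is arbitrary:

```r
# Top of run-forecasts.R: pin the RNG state so the bootstrap and
# variable-sampling draws in the forest models are identical across re-runs.
set.seed(20200312)

# If models are fit directly with ranger, its seed argument also pins the
# package's internal RNG:
# fit <- ranger::ranger(y ~ ., data = d, seed = 20200312)
```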
Instead of a relatively small number of decision trees that themselves operate on a lot of data and are fairly deep, try out an alternative strategy using a large number of trees, but where each tree is relatively shallow and only operates on a relatively small data sample. A variation of this is to also consider stratified sampling with downsampling for negative cases.
mlr3 uses the following defaults for ranger():

```r
learner = mlr3::lrn("classif.ranger")
learner$param_set$default
```
The "sample.fraction" argument can be a vector giving the number of cases (relative to the total number of cases) to sample from each outcome factor class. See the bottom answer at https://stats.stackexchange.com/questions/171380/implementing-balanced-random-forest-brf-in-r-using-randomforests, and the linked ranger issues.
So, for example, sample.fraction = c(0.1, 0.9) should give a resampled dataset with 10% positive cases and the same number of rows as the original data.
Things to vary:
Chen, Liaw, and Breiman, in the balanced random forest paper, recommend drawing the same number of cases for both classes, i.e. a 1:1 proportion, e.g. sample.fraction = c(0.5, 0.5). Maybe that's a good starting point.
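A minimal sketch of the many-shallow-trees-on-balanced-samples strategy with ranger, on made-up imbalanced data; the positive rate and tuning values are illustrative only:

```r
library(ranger)

# Toy data with only a small share of positive cases, standing in for
# the coup data.
set.seed(1)
n <- 5000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 - d$x2 + rnorm(n) > 2.5, "pos", "neg"),
              levels = c("neg", "pos"))

# Many shallow trees, each grown on a small, class-balanced sample.
# sample.fraction here is per class, ordered by factor levels (neg
# first), and each entry is relative to the total number of rows:
# c(0.02, 0.02) draws 2% of all rows from each class, so every tree
# sees a roughly 1:1 sample.
fit <- ranger(y ~ x1 + x2, data = d,
              num.trees = 2000,                # many trees...
              max.depth = 4,                   # ...but shallow
              sample.fraction = c(0.02, 0.02), # small balanced samples
              replace = TRUE,
              probability = TRUE)
fit$prediction.error  # OOB error (Brier score for probability forests)
```

Varying num.trees, max.depth, and the per-class sample fractions is then the tuning surface for this strategy.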
So basically in total, three tuning strategies: