fzhu2e / cfr
A Python package for Climate Field Reconstruction
Home Page: https://fzhu2e.github.io/cfr
License: BSD 3-Clause "New" or "Revised" License
Methods like standardize() or center() reinvent the wheel; the database methods should reuse the corresponding record methods.
Right now, a "RuntimeWarning: divide by zero encountered in divide" is raised, and the results are corrupted; the method should instead remove records from the database if they have no overlap with the reference period.
Code to reproduce:
pdb.annualize()
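A minimal sketch of the requested behavior, not cfr's actual implementation: records are standardized against a reference period, and any record with fewer than two valid values (or zero spread) inside that period is dropped instead of being divided by zero. The function name `standardize_over_ref`, the dict-of-arrays layout, and the reference period are assumptions made for illustration only.

```python
import numpy as np

def standardize_over_ref(records, ref_period):
    """Standardize each record against ref_period = (start, end).

    records: dict mapping a record id to a (time, value) pair of 1-D arrays
             (hypothetical layout, not the cfr ProxyDatabase structure).
    Returns (standardized, dropped), where dropped lists the ids that had
    no usable overlap with the reference period.
    """
    start, end = ref_period
    standardized, dropped = {}, []
    for pid, (time, value) in records.items():
        time = np.asarray(time, dtype=float)
        value = np.asarray(value, dtype=float)
        # Values that fall inside the reference period and are not NaN
        mask = (time >= start) & (time <= end) & np.isfinite(value)
        ref = value[mask]
        # Without at least two values and a nonzero spread, the z-score
        # is undefined, so the record is removed rather than corrupted.
        if ref.size < 2 or np.std(ref) == 0:
            dropped.append(pid)
            continue
        standardized[pid] = (time, (value - ref.mean()) / ref.std())
    return standardized, dropped

records = {
    "inside": (np.array([1950, 1960, 1970, 1980]), np.array([1.0, 2.0, 3.0, 4.0])),
    "outside": (np.array([1000, 1100, 1200]), np.array([0.5, 0.7, 0.9])),
}
standardized, dropped = standardize_over_ref(records, ref_period=(1951, 1980))
print("kept:", list(standardized), "dropped:", dropped)
```

Returning the dropped ids alongside the result keeps the removal visible to the user instead of silently shrinking the database.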
Using the example from: https://fzhu2e.github.io/cfr/notebooks/proxy-ops.html
when I run the step:
mdl.calibrate()
I get the following error:
name 'PyVSL' is not defined
I tried to install PyVSL, but it does not seem to work.
The load_proxy method of the ReconJob class requires a pickle file (which I'm guessing has fixed keys). I would like an alternative, as was done for ProxyDatabase().from_df, that lets the user specify the columns. Example:
pdb = cfr.ProxyDatabase().from_df(df, pid_column='dsname', lat_column='lat', lon_column='lon', time_column='timeval', value_column='val', proxy_type_column='proxyobs', archive_type_column='archive', value_name_column='varname', value_unit_column='varunits')
Currently, cfr only uses pyleoclim for spectral analysis. However, the ProxyRecord class is basically a GeoSeries; only "seasonality" is missing.
Making ProxyRecord a child of GeoSeries would enable Pyleoclim functionalities, particularly:
to_pandas()/from_pandas()
The proxy composite scores really only make sense by proxy type (not even archive type). If a user tries to calculate a z-score across different proxy types, the code should throw a warning.
The problem is that proxies may have a positive or negative relationship with their common variable (e.g., coral d18O and coral Sr/Ca vs temperature), so they may need to be flipped prior to calculating the z-score. This is done automatically when calibrating against instrumental records.
Preferred solution: the LiPD files have an interpretation field indicating the direction. We could use it to automatically figure out when to flip the axis. But it would require changes to the API to create a ProxyDatabase and the pickle file would no longer be valid.
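To illustrate why the direction matters, here is a small sketch with synthetic numbers. It assumes each record carries a direction label ("positive" or "negative") of the kind the interpretation field could provide; the `composite` function, the record ids, and the label values are hypothetical and are not part of the cfr API.

```python
import numpy as np

def zscore(x):
    """Standardize a 1-D series to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)

def composite(series, directions=None):
    """Average the z-scores of several records on a common time axis.

    series: dict mapping a record id to a 1-D array of values.
    directions: optional dict mapping a record id to "positive" or
                "negative" (hypothetical labels). Records labeled
                "negative" are flipped before averaging.
    """
    rows = []
    for pid, values in series.items():
        flip = directions is not None and directions[pid] == "negative"
        rows.append((-1.0 if flip else 1.0) * zscore(values))
    return np.nanmean(np.vstack(rows), axis=0)

# Two synthetic records tracking the same rising signal with opposite signs
series = {
    "rec_pos": np.array([1.0, 2.0, 3.0, 4.0]),
    "rec_neg": np.array([8.0, 6.0, 4.0, 2.0]),
}
directions = {"rec_pos": "positive", "rec_neg": "negative"}

naive = composite(series)                 # signals cancel out
flipped = composite(series, directions)   # signals reinforce each other
print("naive:  ", naive)
print("flipped:", flipped)
```

In this toy case the naive composite is flat because the two records cancel exactly, while the flipped composite recovers the shared trend.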
Looping @CommonClimate in that discussion.
In several places, the code assumes that users only want to reconstruct temperature, and that this variable is called 'tas' (e.g. in prep_graphem). This is an unnecessarily restrictive assumption, and may lead to bugs if people try to reconstruct any other field, or use other variable names.
Since cfr stands for "climate field reconstruction", I suggest calling the target variable "field" in graphem-related functions. It may also be good to check that the LMR part of the code does not assume too much either.
The current version of the package only supports gridded observation data for PSM calibration.
In the future, we should support ungridded obs data (a collection of sites).
Hi @fzhu2e ,
I was going over the doc with Shreya (USC undergrad, who will be trying to apply the code to PAGES2k and maybe CoralHydro2k), and I noticed a couple of things:
Just a few ideas to operationalize this great package. Keep your audience in mind!
The graphem bug with the singular matrix illustrated the perils of having duplicates in the proxy matrix. I was originally thinking that it would only be an issue for graphem, and therefore should be dealt with within prep_graphem(), but I now believe it needs to be done earlier in the workflow.
Here is my proposal:
1. Add a method to the ProxyDatabase class called find_duplicates(), governed by a parameter called r_thresh (default = 0.9). Within that function, compute R = np.triu(np.corrcoef(proxy.T),k=1) (where proxy is the proxy matrix) and find the indices/labels of the records for which R > r_thresh.
2. Show each candidate pair to the user with ProxyRecord.plot(), plotting the two close series in the same Axes object (different colors and/or line styles, whatever works best to tell them apart).
3. Return the flagged records as a ProxyDatabase, so the user can subtract those proxies from the original database using the "-" syntax. (Don't do it for them, though; this must be an explicit part of the workflow so they can remember that they did it.)
I believe this will be cleanest and most transparent, as the users will have to make careful, explicit decisions. Now that I think of it, we had to do this a lot as part of PAGES 2k circa 2015-2016, because several groups had included the same proxy series, or several slightly different versions of the same proxy. I bet this will be helpful for CoralHydro2k as well. And it will come in handy when merging two databases that have potential duplicates. So overall a very useful feature that will serve both pseudoproxy and real-proxy reconstructions.
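The correlation step of this proposal can be sketched as a standalone function. This is only an illustration under stated assumptions: the proxy matrix is a complete (no missing values) array of shape (time, records), `labels` is a hypothetical list of record ids, and the function is a free function here, not the proposed ProxyDatabase method.

```python
import numpy as np

def find_duplicates(proxy, labels, r_thresh=0.9):
    """Flag pairs of records whose correlation exceeds r_thresh.

    proxy: 2-D array of shape (n_time, n_records), assumed free of NaNs.
    labels: list of record ids, one per column (hypothetical).
    Returns a list of (label_i, label_j, r) tuples.
    """
    # np.corrcoef treats rows as variables, hence the transpose.
    # The strict upper triangle (k=1) keeps each pair once and
    # discards the diagonal of self-correlations.
    R = np.triu(np.corrcoef(proxy.T), k=1)
    rows, cols = np.where(R > r_thresh)
    return [(labels[i], labels[j], float(R[i, j])) for i, j in zip(rows, cols)]

# Synthetic example: the third record is a slightly perturbed copy of the first
rng = np.random.default_rng(0)
n_time = 200
a = rng.standard_normal(n_time)
b = rng.standard_normal(n_time)
a_dup = a + 0.01 * rng.standard_normal(n_time)
proxy = np.column_stack([a, b, a_dup])

pairs = find_duplicates(proxy, ["rec_a", "rec_b", "rec_a_v2"])
for first, second, r in pairs:
    print(f"{first} and {second}: r = {r:.4f}")
```

With real proxy data, records rarely share a complete common time axis, so a practical version would need pairwise handling of missing values; with plain np.corrcoef, any NaN in a column makes its correlations NaN and those pairs are silently skipped.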
Now that we've confirmed that the cfr implementation of GraphEM can run on non-pathological cases, it needs to be upgraded to the next level:
The choice of regression model in GraphEM (the graph) is still very unsatisfactory: whether it is the cutoff radius for a neighborhood graph or the target sparsities of a graphical LASSO ("glasso") graph, the only way to choose it now is by trial and error, which is unscientific, error-prone, and, frankly, a little embarrassing. We can do a lot better than that with cross-validation.
Neighborhood graphs are a quick and dirty way to get a reconstruction, but they underuse the available information. If enough data are available for calibration, glasso can do much better at extracting structure and capturing spatial dependencies. However, glasso is in need of the following updates:
As in #2, this code was written with the assumption that temperature is the only field of interest. The math stays the same, so changing the nomenclature won't change any numerical behavior, but I will still try to: