
pyleotutorials's People

Contributors

alexkjames, aragath, commonclimate, jordanplanders, khider

pyleotutorials's Issues

tutorial on how to edit subplots

Hi all - would love to see a tutorial on how to edit subplots of a pyleoclim figure. Specifically, I want to override a PCA plot so I can plot my own loadings.

Right now, the way I'm handling this is by calling

fig, gs = pca.modeplot()
ax = fig.add_subplot(gs[3, 1])

and then plotting everything to ax. gs is a GridSpec and fig is a Figure.
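
For reference, here is a self-contained version of that pattern in plain Matplotlib (the 4x2 grid is an assumed stand-in for whatever GridSpec pca.modeplot() actually returns):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Stand-in for fig, gs = pca.modeplot(): a Figure plus a GridSpec.
# The 4x2 grid shape is assumed for illustration.
fig = plt.figure(figsize=(8, 6))
gs = GridSpec(4, 2, figure=fig)

# Attach a new Axes to cell (3, 1) of the grid and plot custom loadings on it
ax = fig.add_subplot(gs[3, 1])
ax.plot([0, 1, 2], [0.2, -0.1, 0.4], marker="o")
ax.set_title("custom loadings")
```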

Spectral Analysis

Show how the various methods work and the effect of their parameters (e.g., c for WWZ, n50 for Lomb-Scargle, BW for MTM, window choice for Welch), the frequency vector options, and the effect of detrending/binning.
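
The detrending part of that tutorial can be previewed with a plain NumPy periodogram (synthetic signal; everything here is illustrative, not the Pyleoclim API):

```python
import numpy as np

# Synthetic series: a linear trend plus a 50-sample-period oscillation
n = 500
t = np.arange(n)
x = 0.01 * t + np.sin(2 * np.pi * t / 50)

# Least-squares linear detrend
coeffs = np.polyfit(t, x, deg=1)
x_detrended = x - np.polyval(coeffs, t)

def periodogram(y):
    """Naive periodogram: squared FFT amplitudes at non-negative frequencies."""
    return np.abs(np.fft.rfft(y)) ** 2 / len(y)

raw = periodogram(x)
det = periodogram(x_detrended)

# The trend leaks power into the lowest frequencies; detrending removes it,
# while the spectral peak at 1/50 cycles per sample is preserved.
low_raw, low_det = raw[1:5].sum(), det[1:5].sum()
peak_bin = np.argmax(det[1:]) + 1  # index of the dominant frequency bin
```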

Highlighting intervals tutorial

Build a tutorial showing how to highlight intervals:

  • overlay
  • above
  • below

with labels:

  • added to the legend (e.g. if there are categories of labels)
  • per interval with pointers

Bonus:

  • labelling specific points on a figure
  • making an event timeline?

Fodder: geologic timescale (Cenozoic), MIS
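
The overlay + legend + pointer cases can be sketched with plain Matplotlib (interval bounds and labels made up for illustration; Pyleoclim figures expose their Axes, so the same calls apply there):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Toy series; interval bounds in "time" units are made up for illustration
t = np.linspace(0, 100, 500)
y = np.sin(t / 5)

fig, ax = plt.subplots()
ax.plot(t, y, color="gray")

# Overlay: shade intervals across the full height of the axes,
# with labels so they show up in the legend (one entry per category)
ax.axvspan(20, 35, color="C0", alpha=0.3, label="interval A")
ax.axvspan(60, 75, color="C1", alpha=0.3, label="interval B")
ax.legend()

# Per-interval label with a pointer (annotate with an arrow)
ax.annotate("event", xy=(67.5, 0), xytext=(85, 0.8),
            arrowprops=dict(arrowstyle="->"))
```

The above/below variants could use the ymin/ymax arguments of axvspan (fractions of the axes height) to confine the shading to a band near the top or bottom.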

Summary plots

Work through an example to "pretty up" a summary plot for publication, with the various options. Can reuse the example in the Pyleoclim manuscript using Nino3.4 data.

It would be interesting to first show the default plot and look at ways to change some attributes.

QuanSight work showcase

With Pyleoclim release 0.11.0, we need to showcase the many improvements brought about by the work of the QuanSight team.

  • all tutorials should start using load_dataset() whenever possible
  • the pandas work per se should go into a dedicated L0 notebook called "Pandas in Pyleoclim". This will repurpose the first half of the paleopandas playground.
  • the work on MultipleSeries overloaded methods (+, -, &) should go into “L0_basic_MSES_manipulation.ipynb”. This will repurpose the second half of the paleopandas playground

Enable CI for the notebooks

We should be able to re-use the notebooks to run the "tests". The only thing to be careful about is that some notebooks require user input (i.e., choosing a timeseries from a LiPD file), so those particular cells would have to be rewritten to use the number parameter.

I propose creating a test folder containing copies of the notebooks updated to require no user input. Alternatively, we could modify the notebooks themselves to not allow user input. This problem should disappear once we integrate pylipd.

pyLipd integration into tutorials

L1_working_with_LiPD.ipynb should be updated to reflect pyLipd updates, when the latter are ready for prime time.

Note: Euro2k example can go either there or in the MSES tutorial

v1.0.0 remaining bugs

  • change kernel from pyleo to Python 3 (ipykernel) in all notebooks
  • reduce number of sims in wavelet notebook to fit under the 1h limit
  • L1_working_with_age_ensembles.ipynb indexing issue
  • L1_surrogates missing imports

indexing issue in from_PaleoEnsembleArray

Currently, what is holding up the completion of L1_working_with_age_ensembles.ipynb is this line:

ensemble = pyleo.EnsembleGeoSeries.from_PaleoEnsembleArray(geo_series = ts, paleo_array = paleoValues, age_depth = ts.depth, paleo_depth = paleoDepth)

Traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[26], line 8
      1 ts = pyleo.GeoSeries(time = chronValues, value = paleo_row['paleoData_values'].iloc[0], 
      2                      time_name =  time_name, time_unit = time_unit,
      3                      value_name = value_name, value_unit = value_unit,
      4                      label = 'MD98-2181', archiveType = 'Marine sediment',
      5                      depth = chronDepth, depth_name = 'Depth', depth_unit = 'cm', lat = paleo_df['geo_meanLat'].iloc[0],
      6                     lon = paleo_df['geo_meanLon'].iloc[0])
----> 8 ensemble = pyleo.EnsembleGeoSeries.from_PaleoEnsembleArray(geo_series = ts, paleo_array = paleoValues, age_depth = ts.depth, paleo_depth = paleoDepth)

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/core/ensemblegeoseries.py:441, in EnsembleGeoSeries.from_PaleoEnsembleArray(self, geo_series, paleo_array, paleo_depth, age_depth, extrapolate, verbose)
    438         raise ValueError("Age depth and series time need to have the same length")
    440     #Interpolate the age array to the value depth
--> 441     mapped_paleo = lipdutils.mapAgeEnsembleToPaleoData(
    442         ensembleValues=paleo_array, 
    443         depthEnsemble=paleo_depth, 
    444         depthMapping=age_depth,
    445         extrapolate=extrapolate
    446     )
    448 series_list = []
    450 #check that mapped_age and the original time vector are similar, and that the object is not a geoseries object

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/utils/lipdutils.py:1187, in mapAgeEnsembleToPaleoData(ensembleValues, depthEnsemble, depthMapping, extrapolate)
   1184 depthMapping = np.array(depthMapping)
   1186 #Interpolate
-> 1187 ensembleValuesMapped = np.zeros((len(depthMapping),np.shape(ensembleValues)[1])) #placeholder
   1189 if extrapolate is True:
   1190     for i in np.arange(0,np.shape(ensembleValues)[1]):

IndexError: tuple index out of range

@alexkjames I think you were the one who wrote this Pyleoclim function - can you please take a look?

Age ensemble notebook

Create notebook with examples of moving from age ensembles + series to ensemble series:

  • From lipd file
  • From csv (maybe superfluous depending on NOAA format)
  • From pangaea
  • From noaa

Pretty faces for notebook authors

Our current model for author contributions comes from EarthCube, and let's just say it's not optimal:

Alexander James, Department of Earth Sciences, University of Southern California

Author = {"name": "Alexander James", "affiliation": "Department of Earth Sciences, University of Southern California", "email": "[email protected]", "orcid": "0000-0001-8561-3188"} 

Julien Emile-Geay, Department of Earth Sciences, University of Southern California 

Author = {"name": "Julien Emile-Geay", "affiliation": "Department of Earth Sciences, University of Southern California", "email": "[email protected]", "orcid": "0000-0001-5920-4751"}

On the other hand, it's possible to have our pretty avatars on there, as in Jordan's paleobook repo.

I suggest we do this for all the notebooks in this repo. @jordanplanders would it take you long?

Coherence/Wavelet

Repurpose the example from the documentation to create a tutorial on how to interpret coherence and wavelet plots.

main --> jupyterbook automation

Updates are not automated; see if we can set up a GitHub Action to automatically sync the following from the main branch to the jupyterbook branch:

  • notebooks
  • environment

Issues in L0_paleopandas.ipynb

Series resampler does not work with pandas 2.1.3.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 co2_5kavg = co2_5k.mean() # the aggregator here is simply the mean
      2 fig, ax = co2ts.plot(color='gray')
      3 co2_5kavg.plot(ax=ax,color='C1')         

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pyleoclim/core/series.py:4436, in SeriesResampler.__getattr__(self, attr)
   4435 def __getattr__(self, attr):
-> 4436     attr = getattr(self.series.resample(self.rule,  **self.kwargs), attr)
   4437     def func(*args, **kwargs):
   4438         series = attr(*args, **kwargs)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/generic.py:9771, in NDFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
   9768 else:
   9769     convention = "start"
-> 9771 return get_resampler(
   9772     cast("Series | DataFrame", self),
   9773     freq=rule,
   9774     label=label,
   9775     closed=closed,
   9776     axis=axis,
   9777     kind=kind,
   9778     convention=convention,
   9779     key=on,
   9780     level=level,
   9781     origin=origin,
   9782     offset=offset,
   9783     group_keys=group_keys,
   9784 )

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2050, in get_resampler(obj, kind, **kwds)
   2046 """
   2047 Create a TimeGrouper and return our resampler.
   2048 """
   2049 tg = TimeGrouper(obj, **kwds)  # type: ignore[arg-type]
-> 2050 return tg._get_resampler(obj, kind=kind)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2231, in TimeGrouper._get_resampler(self, obj, kind)
   2229 _, ax, _ = self._set_grouper(obj, gpr_index=None)
   2230 if isinstance(ax, DatetimeIndex):
-> 2231     return DatetimeIndexResampler(
   2232         obj,
   2233         timegrouper=self,
   2234         kind=kind,
   2235         axis=self.axis,
   2236         group_keys=self.group_keys,
   2237         gpr_index=ax,
   2238     )
   2239 elif isinstance(ax, PeriodIndex) or kind == "period":
   2240     if isinstance(ax, PeriodIndex):
   2241         # GH#53481

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:187, in Resampler.__init__(self, obj, timegrouper, axis, kind, gpr_index, group_keys, selection, include_groups)
    182 self.include_groups = include_groups
    184 self.obj, self.ax, self._indexer = self._timegrouper._set_grouper(
    185     self._convert_obj(obj), sort=True, gpr_index=gpr_index
    186 )
--> 187 self.binner, self._grouper = self._get_binner()
    188 self._selection = selection
    189 if self._timegrouper.key is not None:

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:252, in Resampler._get_binner(self)
    246 @final
    247 def _get_binner(self):
    248     """
    249     Create the BinGrouper, assume that self.set_grouper(obj)
    250     has already been called.
    251     """
--> 252     binner, bins, binlabels = self._get_binner_for_time()
    253     assert len(bins) == len(binlabels)
    254     bin_grouper = BinGrouper(bins, binlabels, indexer=self._indexer)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:1741, in DatetimeIndexResampler._get_binner_for_time(self)
   1739 if self.kind == "period":
   1740     return self._timegrouper._get_time_period_bins(self.ax)
-> 1741 return self._timegrouper._get_time_bins(self.ax)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2329, in TimeGrouper._get_time_bins(self, ax)
   2326 binner, bin_edges = self._adjust_bin_edges(binner, ax_values)
   2328 # general version, knowing nothing about relative frequencies
-> 2329 bins = lib.generate_bins_dt64(
   2330     ax_values, bin_edges, self.closed, hasnans=ax.hasnans
   2331 )
   2333 if self.closed == "right":
   2334     labels = binner

File lib.pyx:891, in pandas._libs.lib.generate_bins_dt64()

ValueError: Values falls before first bin

I tried @khider's suggestion to upgrade to the latest pandas from a wheel (labeled 3.0.0 + gibberish), but it broke my environment (spurious error messages popped up with every command, pandas-related or not).

Proposed solutions:

  • try to revert to a version of pandas that works (while preserving xarray compatibility), OR
  • put a disclaimer that this is broken for now

Speed up the significance test

Hi,

Thanks for developing this wonderful tool for climate research.

I am using the following code to perform a significance test on wavelet coherence:
scal_sig2 = scal.signif_test(method='ar1sim', number=1000)
However, the process is extremely slow. I am wondering if it is possible to add a feature, such as parallel processing, to speed up the computation?

Best regards,
Cheng
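
Pending a built-in option, one workaround is to split the surrogate ensemble into batches and generate them concurrently. A generic standard-library sketch of the batching idea follows; ar1_batch is a hypothetical stand-in for one chunk of the number=1000 ensemble, not a Pyleoclim function:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def ar1_batch(args):
    """Generate one batch of AR(1) surrogate series (a stand-in for one
    chunk of the number=1000 ensemble in signif_test)."""
    seed, batch_size, n, rho = args
    rng = np.random.default_rng(seed)
    out = np.empty((batch_size, n))
    for i in range(batch_size):
        noise = rng.standard_normal(n)
        series = np.empty(n)
        series[0] = noise[0]
        for k in range(1, n):
            series[k] = rho * series[k - 1] + noise[k]
        out[i] = series
    return out

# Split 1000 surrogates into 4 batches of 250 and run them concurrently.
# Threads only help insofar as the heavy lifting releases the GIL; a
# ProcessPoolExecutor is the usual choice for pure-Python loops like this one.
batches = [(42 + j, 250, 200, 0.8) for j in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(ar1_batch, batches))

surrogates = np.vstack(results)  # shape (1000, 200)
```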

Correlation notebook run time

The L2_correlations notebook is pretty heavy (minimum 37-minute run time, which really slows down CI); it could be worth discussing moving it to paleobooks and replacing it with a more lightweight example.

Issues in L0_basic_MSES_manipulation.ipynb

  • in L0_basic_MSES_manipulation.ipynb, ts_list_euro_coral is now an empty list, so all subsequent commands that use it fail. This appears to be a pylipd/ontology issue, with some keys no longer retrieving the information they used to.

  • in L1_working_with_age_ensembles.ipynb, the notebook fails on this cell:

paleoValues = ens_df['ensembleVariableValues'][0][:,1:] #Drop the column that contains depth
paleoDepth = ens_df['ensembleDepthValues'][0]

value_name = "SST"
value_unit = "deg C"

chronValues = paleo_row['time_values'].to_numpy()[0]
chronDepth = paleo_row['depth_values'].to_numpy()[0]

time_name = 'Time'
time_unit = 'Years BP'

ReadMe file

I started a readme file, but it needs more precise guidance on:

  1. installation using some environment.yml file (the one from Pyleoclim_util? Please confirm)
  2. how to sign up for the Hub

I also need feedback on the various levels of tutorials.

We should probably acknowledge the PaleoCube grant there too.

Intersectional issues

Notebooks need to be run and formatted so that the following are consistent throughout the JupyterBook:

  • presentation of authors
  • guidelines
  • watermarks
  • verbose=False

In addition, warnings need to be turned off.

Publication-ready figures

L0_a_quickstart.ipynb mentions "L1_publication_ready_figures.ipynb", but there is no such notebook. Do we want to create one, and if so, what figures would be most newsworthy?

Detrending

A helpful tutorial would be on detrending:

  • rehashing the Series.detrend() docstring, which is pretty complete at this point
  • illustrating how to use SSA for detrending.
  • effect of detrending on spectral/wavelet analysis
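
The basic idea can be illustrated without Pyleoclim at all; a minimal NumPy sketch of polynomial detrending (synthetic data; Series.detrend() wraps more sophisticated methods such as SSA):

```python
import numpy as np

# Synthetic record: quadratic trend + oscillation (values made up for illustration)
t = np.linspace(0, 10, 200)
signal = np.cos(2 * np.pi * t)
trend = 0.5 * t**2 - t
y = signal + trend

# Polynomial detrending: fit a low-order polynomial and subtract it
fit = np.polynomial.Polynomial.fit(t, y, deg=2)
y_detrended = y - fit(t)

# The residual should be close to the oscillatory component alone
rmse = np.sqrt(np.mean((y_detrended - signal) ** 2))
```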

Creating Multiple Panels

Create a tutorial showing how Matplotlib subplots can be used in combination with the fig/ax returned by Pyleoclim figures. An example is the correlation figure in the Pyleoclim manuscript.
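
The generic pattern can be sketched in plain Matplotlib: build the grid first, then hand each Axes to whatever fills it. Pyleoclim plotting methods generally accept an ax keyword, so a call like ts.plot(ax=axs[0, 0]) would slot in where the stand-in plot calls are; the data here are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)

# A 2x2 grid of panels; each Axes can be handed to a plotting function
fig, axs = plt.subplots(2, 2, figsize=(8, 6), constrained_layout=True)

axs[0, 0].plot(t, np.sin(t))
axs[0, 0].set_title("(a) series")
axs[0, 1].plot(t, np.cos(t))
axs[0, 1].set_title("(b) another series")
axs[1, 0].hist(np.sin(t), bins=20)
axs[1, 0].set_title("(c) distribution")
axs[1, 1].scatter(np.sin(t), np.cos(t), s=8)
axs[1, 1].set_title("(d) phase")
```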

Include new wavelet/coherence capabilities

Recent updates to Pyleoclim allow more flexible specification of the time and scale axes for Scalogram and Coherence objects. The Wavelet tutorial should illustrate how to use them.

Issues in L0_working_with_geoseries.ipynb

  • cell 13 produces an illegible stackplot with 51 axes

  • pages2k.map() and the next two maps have extra axes
  • cell 21: elevation mappable hard to parse (0, 1500, 'Other')

  • ts.map_neighbors(pages2k) --> no neighbors

  • Loading NOAA Files (to be done) --> when are we doing it?

Series - How to create from various sources

We should have several notebooks illustrating how to get a Series (or LiPD Series) object from:

  1. a csv through Pandas (show how to skip lines and enter headers; the LR04 dataset needs line skipping at the beginning)
  2. a LiPD file (maybe use either ODP846 or Crystal Cave).

And show a simple plot.
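
For item 1, the line-skipping part might be sketched like this (pandas only; the file contents and column names are invented stand-ins for the LR04 file):

```python
import io
import pandas as pd

# Mimic a file like LR04 with header lines to skip (contents invented)
raw = """# Benthic stack
# source: example placeholder
age_ka,d18O
0,3.23
5,3.41
10,3.78
"""

# skiprows drops the comment lines; names= could supply headers if absent
df = pd.read_csv(io.StringIO(raw), skiprows=2)

time = df["age_ka"].to_numpy()
value = df["d18O"].to_numpy()
# These arrays are what a Series constructor like
# pyleo.Series(time=..., value=...) would then take as input
```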

Update PCA notebook

The "data wrangling" section needs to showcase the newly developed MGS time/lat plot and the soon-to-be-developed Resolution class (and plot) for MultipleSeries.

Update SSA tutorial

  • Describe the SSA res object
  • For the missing values section, use the knee method

Working with LiPD files

For LiPD directory:

  • mapping

For LiPD files:

  • dashboard.

Also show how to manipulate LiPD objects and how to use the various LiPD-to-LiPDSeries transforms.
