
pyleotutorials's People

Contributors

alexkjames, aragath, commonclimate, jordanplanders, khider

pyleotutorials's Issues

tutorial on how to edit subplots

Hi all - would love to see a tutorial on how to edit subplots of a pyleoclim figure. Specifically, I want to override a PCA plot so I can plot my own loadings.

Right now, the way I'm handling this is by calling

fig, gs = pca.modeplot()
ax = fig.add_subplot(gs[3, 1])

and then plotting everything to ax. gs is a GridSpec and fig is a Figure.
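
For reference, here is a self-contained version of that pattern in plain Matplotlib (the 4x2 grid is an assumed stand-in for whatever GridSpec pca.modeplot() actually returns):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Stand-in for fig, gs = pca.modeplot(): a Figure plus a GridSpec.
# The 4x2 grid shape is assumed for illustration.
fig = plt.figure(figsize=(8, 6))
gs = GridSpec(4, 2, figure=fig)

# Attach a new Axes to cell (3, 1) of the grid and plot custom loadings on it
ax = fig.add_subplot(gs[3, 1])
ax.plot([0, 1, 2], [0.2, -0.1, 0.4], marker="o")
ax.set_title("custom loadings")
```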

Spectral Analysis

Show how the various methods work and the effect of their parameters (e.g., c for WWZ, n50 for Lomb-Scargle, BW for MTM, window choice for Welch), the frequency vector options, and the effect of detrending/binning.
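
The detrending part of that tutorial can be previewed with a plain NumPy periodogram (synthetic signal; everything here is illustrative, not the Pyleoclim API):

```python
import numpy as np

# Synthetic series: a linear trend plus a 50-sample-period oscillation
n = 500
t = np.arange(n)
x = 0.01 * t + np.sin(2 * np.pi * t / 50)

# Least-squares linear detrend
coeffs = np.polyfit(t, x, deg=1)
x_detrended = x - np.polyval(coeffs, t)

def periodogram(y):
    """Naive periodogram: squared FFT amplitudes at non-negative frequencies."""
    return np.abs(np.fft.rfft(y)) ** 2 / len(y)

raw = periodogram(x)
det = periodogram(x_detrended)

# The trend leaks power into the lowest frequencies; detrending removes it,
# while the spectral peak at 1/50 cycles per sample is preserved.
low_raw, low_det = raw[1:5].sum(), det[1:5].sum()
peak_bin = np.argmax(det[1:]) + 1  # index of the dominant frequency bin
```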

Highlighting intervals tutorial

Build a tutorial showing how to highlight intervals:

  • overlay
  • above
  • below

with labels:

  • added to the legend (e.g. if there are categories of labels)
  • per interval with pointers

Bonus:

  • labelling specific points on a figure
  • making an event timeline?

Fodder: geologic timescale (Cenozoic), MIS
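
The overlay + legend + pointer cases can be sketched with plain Matplotlib (interval bounds and labels made up for illustration; Pyleoclim figures expose their Axes, so the same calls apply there):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Toy series; interval bounds in "time" units are made up for illustration
t = np.linspace(0, 100, 500)
y = np.sin(t / 5)

fig, ax = plt.subplots()
ax.plot(t, y, color="gray")

# Overlay: shade intervals across the full height of the axes,
# with labels so they show up in the legend (one entry per category)
ax.axvspan(20, 35, color="C0", alpha=0.3, label="interval A")
ax.axvspan(60, 75, color="C1", alpha=0.3, label="interval B")
ax.legend()

# Per-interval label with a pointer (annotate with an arrow)
ax.annotate("event", xy=(67.5, 0), xytext=(85, 0.8),
            arrowprops=dict(arrowstyle="->"))
```

The above/below variants could use the ymin/ymax arguments of axvspan (fractions of the axes height) to confine the shading to a band near the top or bottom.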

Summary plots

Work through an example to "pretty up" a summary plot for publication, with the various options. Can reuse the example in the Pyleoclim manuscript using Nino3.4 data.

It would be interesting to first show the default plot and look at ways to change some attributes.

QuanSight work showcase

With Pyleoclim release 0.11.0, we need to showcase the many improvements brought about by the work of the QuanSight team.

  • all tutorials should start using load_dataset() whenever possible
  • the pandas work per se should go into a dedicated L0 notebook called "Pandas in Pyleoclim". This will repurpose the first half of the paleopandas playground.
  • the work on MultipleSeries overloaded methods (+, -, &) should go into “L0_basic_MSES_manipulation.ipynb”. This will repurpose the second half of the paleopandas playground

Enable CI for the notebooks

We should be able to re-use the notebooks to run the "tests". The only thing to be careful about is that some notebooks require user input (i.e., choosing a timeseries from a LiPD file), so those particular cells would have to be rewritten to use the number parameter.

I propose creating a test folder containing copies of the notebooks updated to require no user input. Alternatively, we could modify the notebooks themselves to not allow user input. This problem should disappear once we integrate pylipd.

pyLipd integration into tutorials

L1_working_with_LiPD.ipynb should be updated to reflect pyLipd updates, when the latter are ready for prime time.

Note: Euro2k example can go either there or in the MSES tutorial

v1.0.0 remaining bugs

  • change kernel from pyleo to Python 3 (ipykernel) in all notebooks
  • reduce number of sims in wavelet notebook to fit under the 1h limit
  • L1_working_with_age_ensembles.ipynb indexing issue
  • L1_surrogates missing imports

indexing issue in from_PaleoEnsembleArray

Currently, what is holding up the completion of L1_working_with_age_ensembles.ipynb is this line:

ensemble = pyleo.EnsembleGeoSeries.from_PaleoEnsembleArray(geo_series = ts, paleo_array = paleoValues, age_depth = ts.depth, paleo_depth = paleoDepth)

Traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[26], line 8
      1 ts = pyleo.GeoSeries(time = chronValues, value = paleo_row['paleoData_values'].iloc[0], 
      2                      time_name =  time_name, time_unit = time_unit,
      3                      value_name = value_name, value_unit = value_unit,
      4                      label = 'MD98-2181', archiveType = 'Marine sediment',
      5                      depth = chronDepth, depth_name = 'Depth', depth_unit = 'cm', lat = paleo_df['geo_meanLat'].iloc[0],
      6                     lon = paleo_df['geo_meanLon'].iloc[0])
----> 8 ensemble = pyleo.EnsembleGeoSeries.from_PaleoEnsembleArray(geo_series = ts, paleo_array = paleoValues, age_depth = ts.depth, paleo_depth = paleoDepth)

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/core/ensemblegeoseries.py:441, in EnsembleGeoSeries.from_PaleoEnsembleArray(self, geo_series, paleo_array, paleo_depth, age_depth, extrapolate, verbose)
    438         raise ValueError("Age depth and series time need to have the same length")
    440     #Interpolate the age array to the value depth
--> 441     mapped_paleo = lipdutils.mapAgeEnsembleToPaleoData(
    442         ensembleValues=paleo_array, 
    443         depthEnsemble=paleo_depth, 
    444         depthMapping=age_depth,
    445         extrapolate=extrapolate
    446     )
    448 series_list = []
    450 #check that mapped_age and the original time vector are similar, and that the object is not a geoseries object

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/utils/lipdutils.py:1187, in mapAgeEnsembleToPaleoData(ensembleValues, depthEnsemble, depthMapping, extrapolate)
   1184 depthMapping = np.array(depthMapping)
   1186 #Interpolate
-> 1187 ensembleValuesMapped = np.zeros((len(depthMapping),np.shape(ensembleValues)[1])) #placeholder
   1189 if extrapolate is True:
   1190     for i in np.arange(0,np.shape(ensembleValues)[1]):

IndexError: tuple index out of range

@alexkjames I think you were the one who wrote this Pyleoclim function - can you please take a look?

Age ensemble notebook

Create notebook with examples of moving from age ensembles + series to ensemble series:

  • From lipd file
  • From csv (maybe superfluous depending on NOAA format)
  • From pangaea
  • From noaa

Pretty faces for notebook authors

Our current model for author contributions comes from EarthCube, and let's just say it's not optimal:

Alexander James, Department of Earth Sciences, University of Southern California

Author = {"name": "Alexander James", "affiliation": "Department of Earth Sciences, University of Southern California", "email": "[email protected]", "orcid": "0000-0001-8561-3188"} 

Julien Emile-Geay, Department of Earth Sciences, University of Southern California 

Author = {"name": "Julien Emile-Geay", "affiliation": "Department of Earth Sciences, University of Southern California", "email": "[email protected]", "orcid": "0000-0001-5920-4751"}

On the other hand, it's possible to have our pretty avatars on there, as in Jordan's paleobook repo.

I suggest we do this for all the notebooks in this repo. @jordanplanders would it take you long?

Coherence/Wavelet

Repurpose the example from the documentation to create a tutorial on how to interpret coherence and wavelet plots.

main --> jupyterbook automation

Updates are not automated; see if we can set up a GitHub Action to automatically sync the following from the main branch to the jupyterbook branch:

  • notebooks
  • environment

Issues in L0_paleopandas.ipynb

Series resampler does not work with pandas 2.1.3.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 co2_5kavg = co2_5k.mean() # the aggregator here is simply the mean
      2 fig, ax = co2ts.plot(color='gray')
      3 co2_5kavg.plot(ax=ax,color='C1')         

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pyleoclim/core/series.py:4436, in SeriesResampler.__getattr__(self, attr)
   4435 def __getattr__(self, attr):
-> 4436     attr = getattr(self.series.resample(self.rule,  **self.kwargs), attr)
   4437     def func(*args, **kwargs):
   4438         series = attr(*args, **kwargs)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/generic.py:9771, in NDFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
   9768 else:
   9769     convention = "start"
-> 9771 return get_resampler(
   9772     cast("Series | DataFrame", self),
   9773     freq=rule,
   9774     label=label,
   9775     closed=closed,
   9776     axis=axis,
   9777     kind=kind,
   9778     convention=convention,
   9779     key=on,
   9780     level=level,
   9781     origin=origin,
   9782     offset=offset,
   9783     group_keys=group_keys,
   9784 )

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2050, in get_resampler(obj, kind, **kwds)
   2046 """
   2047 Create a TimeGrouper and return our resampler.
   2048 """
   2049 tg = TimeGrouper(obj, **kwds)  # type: ignore[arg-type]
-> 2050 return tg._get_resampler(obj, kind=kind)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2231, in TimeGrouper._get_resampler(self, obj, kind)
   2229 _, ax, _ = self._set_grouper(obj, gpr_index=None)
   2230 if isinstance(ax, DatetimeIndex):
-> 2231     return DatetimeIndexResampler(
   2232         obj,
   2233         timegrouper=self,
   2234         kind=kind,
   2235         axis=self.axis,
   2236         group_keys=self.group_keys,
   2237         gpr_index=ax,
   2238     )
   2239 elif isinstance(ax, PeriodIndex) or kind == "period":
   2240     if isinstance(ax, PeriodIndex):
   2241         # GH#53481

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:187, in Resampler.__init__(self, obj, timegrouper, axis, kind, gpr_index, group_keys, selection, include_groups)
    182 self.include_groups = include_groups
    184 self.obj, self.ax, self._indexer = self._timegrouper._set_grouper(
    185     self._convert_obj(obj), sort=True, gpr_index=gpr_index
    186 )
--> 187 self.binner, self._grouper = self._get_binner()
    188 self._selection = selection
    189 if self._timegrouper.key is not None:

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:252, in Resampler._get_binner(self)
    246 @final
    247 def _get_binner(self):
    248     """
    249     Create the BinGrouper, assume that self.set_grouper(obj)
    250     has already been called.
    251     """
--> 252     binner, bins, binlabels = self._get_binner_for_time()
    253     assert len(bins) == len(binlabels)
    254     bin_grouper = BinGrouper(bins, binlabels, indexer=self._indexer)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:1741, in DatetimeIndexResampler._get_binner_for_time(self)
   1739 if self.kind == "period":
   1740     return self._timegrouper._get_time_period_bins(self.ax)
-> 1741 return self._timegrouper._get_time_bins(self.ax)

File ~/opt/miniconda3/envs/pyleo/lib/python3.11/site-packages/pandas/core/resample.py:2329, in TimeGrouper._get_time_bins(self, ax)
   2326 binner, bin_edges = self._adjust_bin_edges(binner, ax_values)
   2328 # general version, knowing nothing about relative frequencies
-> 2329 bins = lib.generate_bins_dt64(
   2330     ax_values, bin_edges, self.closed, hasnans=ax.hasnans
   2331 )
   2333 if self.closed == "right":
   2334     labels = binner

File lib.pyx:891, in pandas._libs.lib.generate_bins_dt64()

ValueError: Values falls before first bin

I tried @khider's suggestion to upgrade to the latest pandas from a wheel (labeled 3.0.0 + gibberish), but it broke my environment (spurious error messages popped up with every command, pandas-related or not).

Proposed solutions:

  • try to revert to a version of pandas that works (while preserving xarray compatibility), OR
  • put a disclaimer that this is broken for now

Speed up the significance test

Hi,

Thanks for developing this wonderful tool for climate research.

I am using the following code to perform a significance test on wavelet coherence:
scal_sig2 = scal.signif_test(method='ar1sim', number=1000)
However, the process is extremely slow. I am wondering if it is possible to add a feature, such as parallel processing, to speed up the computation?

Best regards,
Cheng
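
Pending a built-in option, one workaround is to split the surrogate ensemble into batches and generate them concurrently. A generic standard-library sketch of the batching idea follows; ar1_batch is a hypothetical stand-in for one chunk of the number=1000 ensemble, not a Pyleoclim function:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def ar1_batch(args):
    """Generate one batch of AR(1) surrogate series (a stand-in for one
    chunk of the number=1000 ensemble in signif_test)."""
    seed, batch_size, n, rho = args
    rng = np.random.default_rng(seed)
    out = np.empty((batch_size, n))
    for i in range(batch_size):
        noise = rng.standard_normal(n)
        series = np.empty(n)
        series[0] = noise[0]
        for k in range(1, n):
            series[k] = rho * series[k - 1] + noise[k]
        out[i] = series
    return out

# Split 1000 surrogates into 4 batches of 250 and run them concurrently.
# Threads only help insofar as the heavy lifting releases the GIL; a
# ProcessPoolExecutor is the usual choice for pure-Python loops like this one.
batches = [(42 + j, 250, 200, 0.8) for j in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(ar1_batch, batches))

surrogates = np.vstack(results)  # shape (1000, 200)
```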

Correlation notebook run time

The L2_correlations notebook is pretty heavy (minimum 37-minute run time, which really slows down CI); it could be worth discussing moving it to paleobooks and replacing it with a more lightweight example.

Issues in L0_basic_MSES_manipulation.ipynb

  • in L0_basic_MSES_manipulation.ipynb, ts_list_euro_coral is now an empty list, so all subsequent commands that use it fail. This appears to be a pylipd/ontology issue, with some keys no longer retrieving the information they used to.

  • in L1_working_with_age_ensembles.ipynb, the notebook fails on this cell:

paleoValues = ens_df['ensembleVariableValues'][0][:,1:] #Drop the column that contains depth
paleoDepth = ens_df['ensembleDepthValues'][0]

value_name = "SST"
value_unit = "deg C"

chronValues = paleo_row['time_values'].to_numpy()[0]
chronDepth = paleo_row['depth_values'].to_numpy()[0]

time_name = 'Time'
time_unit = 'Years BP'

ReadMe file

I started a readme file, but it needs more precise guidance on:

  1. installation using some environment.yml file (the one from Pyleoclim_util? Please confirm)
  2. how to sign up for the Hub

I also need feedback on the various levels of tutorials.

We should probably acknowledge the PaleoCube grant there too.

Intersectional issues

Notebooks need to be run and formatted so that the following are consistent throughout the JupyterBook:

  • presentation of authors
  • guidelines
  • watermarks
  • verbose=False

In addition, warnings need to be turned off.

Publication-ready figures

L0_a_quickstart.ipynb mentions "L1_publication_ready_figures.ipynb", but there is no such notebook. Do we want to create one, and if so, what figures would be most newsworthy?

Detrending

A helpful tutorial would be on detrending:

  • rehashing the Series.detrend() docstring, which is pretty complete at this point
  • illustrating how to use SSA for detrending.
  • effect of detrending on spectral/wavelet analysis
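
The basic idea can be illustrated without Pyleoclim at all; a minimal NumPy sketch of polynomial detrending (synthetic data; Series.detrend() wraps more sophisticated methods such as SSA):

```python
import numpy as np

# Synthetic record: quadratic trend + oscillation (values made up for illustration)
t = np.linspace(0, 10, 200)
signal = np.cos(2 * np.pi * t)
trend = 0.5 * t**2 - t
y = signal + trend

# Polynomial detrending: fit a low-order polynomial and subtract it
fit = np.polynomial.Polynomial.fit(t, y, deg=2)
y_detrended = y - fit(t)

# The residual should be close to the oscillatory component alone
rmse = np.sqrt(np.mean((y_detrended - signal) ** 2))
```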

Creating Multiple Panels

Create a tutorial showing how Matplotlib subplots can be used in combination with the fig/ax returned by Pyleoclim figures. An example is the correlation figure in the Pyleoclim manuscript.
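
The generic pattern can be sketched in plain Matplotlib: build the grid first, then hand each Axes to whatever fills it. Pyleoclim plotting methods generally accept an ax keyword, so a call like ts.plot(ax=axs[0, 0]) would slot in where the stand-in plot calls are; the data here are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 10, 100)

# A 2x2 grid of panels; each Axes can be handed to a plotting function
fig, axs = plt.subplots(2, 2, figsize=(8, 6), constrained_layout=True)

axs[0, 0].plot(t, np.sin(t))
axs[0, 0].set_title("(a) series")
axs[0, 1].plot(t, np.cos(t))
axs[0, 1].set_title("(b) another series")
axs[1, 0].hist(np.sin(t), bins=20)
axs[1, 0].set_title("(c) distribution")
axs[1, 1].scatter(np.sin(t), np.cos(t), s=8)
axs[1, 1].set_title("(d) phase")
```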

Include new wavelet/coherence capabilities

Recent updates to Pyleoclim allow more flexible specification of the time and scale axes for Scalogram and Coherence objects. The Wavelet tutorial should illustrate how to use them.

Issues in L0_working_with_geoseries.ipynb

  • cell 13 produces an illegible stackplot with 51 axes

  • pages2k.map() and the next two maps have extra axes
  • cell 21: elevation mappable hard to parse (0, 1500, 'Other')

  • ts.map_neighbors(pages2k) --> no neighbors

  • Loading NOAA Files (to be done) --> when are we doing it?

Series - How to create from various sources

We should have several notebooks illustrating how to get a Series (or LiPD Series) object from:

  1. a csv through Pandas (show how to skip lines and enter headers; the LR04 dataset needs line skipping at the beginning)
  2. a LiPD file (maybe use either ODP846 or Crystal Cave).

And show a simple plot.
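
For item 1, the line-skipping part might be sketched like this (pandas only; the file contents and column names are invented stand-ins for the LR04 file):

```python
import io
import pandas as pd

# Mimic a file like LR04 with header lines to skip (contents invented)
raw = """# Benthic stack
# source: example placeholder
age_ka,d18O
0,3.23
5,3.41
10,3.78
"""

# skiprows drops the comment lines; names= could supply headers if absent
df = pd.read_csv(io.StringIO(raw), skiprows=2)

time = df["age_ka"].to_numpy()
value = df["d18O"].to_numpy()
# These arrays are what a Series constructor like
# pyleo.Series(time=..., value=...) would then take as input
```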

Update PCA notebook

The "data wrangling" section needs to showcase the newly developed MGS time/lat plot and the soon-to-be-developed Resolution class (and plot) for MultipleSeries.

Update SSA tutorial

  • Describe the SSA res object
  • For the missing values section, use the knee method

Working with LiPD files

For LiPD directory:

  • mapping

For LiPD files:

  • dashboard.

Also show how to manipulate LiPD objects and how to use the various LiPD-to-LiPDSeries transforms.
