
lipd-utilities's People

Contributors

andrewdolman, chrismheiser, gavinsimpson, khider, nickmckay, routson

lipd-utilities's Issues

lipd.readLipd.py needs improvement -- here's some simple things!

@ericsteig writes:

The default for lipd.readLipd should be to expect a filename, not to open a GUI.

The path should default to './'.

Details:

The default for lipd.readLipd is to open a GUI (which I would never want). Also, the GUI doesn't open on my machine. I don't care since I'll never use it!

I have a lipd file called GISP2.lpd.
If I'm already in the right directory, I should be able to read it with:

D = lipd.readLipd('GISP2.lpd')

but that doesn't work. I have to say:

D = lipd.readLipd('./GISP2.lpd')

which is just silly.
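
A minimal sketch of the requested behaviour, assuming the fix amounts to resolving a bare filename against the current working directory before reading. resolve_lipd_path is a hypothetical helper written for illustration; it is not part of the lipd package.

```python
import os


def resolve_lipd_path(path=None):
    """Hypothetical helper: turn user input into an absolute path.

    Assumption: with no argument, fall back to the current directory
    instead of opening a GUI.
    """
    if not path:
        return os.path.abspath(os.curdir)
    # Expand "~" and resolve bare filenames against the working directory,
    # so 'GISP2.lpd' and './GISP2.lpd' end up identical.
    return os.path.abspath(os.path.expanduser(path))


print(resolve_lipd_path("GISP2.lpd") == resolve_lipd_path("./GISP2.lpd"))
```

With this kind of normalisation in front of the reader, the './' prefix becomes optional rather than required.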

R: warn user about writing ageEnsemble data in paleoData

writeLipd() should warn people about writing ageEnsembles that have been mapped into paleoData. This is a common procedure in geoChronR, so it will come up. However, it can greatly increase the size of the LiPD file, and the mapping is quickly and easily replicated upon loading with geoChronR::mapAgeEnsembleToPaleoData().

Perhaps a warning, and then a yes/no about deleting the ageEnsemble from the paleoData?

Matlab: global qc sheet

We need a global QC sheet function in Matlab, and the create and update functions should use the same translator.

NOAA Updates

List of issues to address

1. Study name on the first line is missing. This name should be the same as Study_Name in the Title section (see #5).
Action: The Python script has a function called __generate_study_name() that finds or creates a study name. If a study name doesn't exist, it attempts to build one as <geo_siteName>.<pub0_pubYear>.<pub0_author>. If those keys don't exist, creating the study name fails. Need test files to see whether they meet the requirements for creating the study name and, if they do, why this function is failing.

2. Online_Resource has the last part of URL repeated twice.
Action: Find why it's adding the URL twice.

3. The Online_Resource for LiPD files should be https://www1.ncdc.noaa.gov/pub/data/paleo/reconstructions/climate12k/temperature/version1.0.0/Temp12k_v1.0.0.LiPD. LiPD files will be in their own directory, separate from the NOAA Templates.
Action: Correct the online resource link template.

4. Need Contribution_Date. This can be the same as the Modified_Date.
Action: Make the contribution date a timestamp of when the file is created.

5. Need Study_Name. If it is possible to programmatically generate a study name, it should generally follow: Where, When, What. We need to create this programmatically, maybe (geo_siteName + paleoData_minYear - paleoData_maxYear + pub1_title)? Might be weird sometimes, but something like that.
Action: This already exists; refer to issue #1. It might be a bug, or the files may be missing the necessary data.

6. Investigators are sometimes missing, and other times not consistently formatted (eg, missing first initial). Maybe just always pull this from pub1_authors?
Action: This already partially exists. When investigators is empty, it creates the investigators field using the FIRST publication available with author data. Generally this is pub0. When the author entry is a list of authors, it will create the investigator string as "LastName; LastName;...". However, if the author data is a single string of multiple author names, it gets trickier. I'm not positive this case is working. Since investigators is sometimes missing completely, there may be a bug in this function.

7. Investigators should be split with semicolons instead of commas.
Action: The function mentioned in issue #6 does this when generating investigators. However, this does not cover existing investigator data. I'll make a function to check existing data and format it as necessary.

8. Descriptions are random (eg, “Ian Walker (he could not send the data)” or “cannot validate elevation”). What do you think about a boilerplate description related to Temperature 12k here instead? WDS-Paleo could draft the description.
Action: Nick is handling this.

9. Some publications are missing. This should be fixed.
Action: Bug. Find out why.

10. Site_Names are missing.
Action: Check the mapping. Data may be getting lost.

11. Location is missing. The NASA GCMD location keywords (provided in Table S1) go in this field.
Action: Nick is handling this.

12. Many files are missing variable “what” terms. The shortname could be used for the “what.”
Action: Map the paleoData_variableName to "what"

13. Variables seasonality is missing.
Action: Possible mapping issue? Nick - "This should come from interpretation1_seasonality"

14. Variables C or N designation is mostly missing
Action: Autofill this based on a sample of the table column data.

15. Column headings in data table should be tab delimited (not space delimited).
Action: Fixed. Removed the fixed-width space padding between headings.

16. Shortnames listed in Variables section do not always match data column headings. This seems like it is usually caused by repeated shortnames (eg, d18O in "893A.Kennet.2007-1.txt")
Action: Need a LiPD file to recreate the issue. Will investigate.

17. Data tables should not have # at the start of their lines.
Action: This design requirement has changed back and forth: first # was requested, then no #, then # again. Can remove.

18. Many variables that are uncertainties are either missing units or have units designated as “unitless” when they are not unitless (eg, file “Wonderkrater.Scott.2016-2.txt”)
Action: Nick is handling this. Data problem.
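
Items 1, 5, 6 and 7 above describe two small transformations that can be sketched in isolation. This is a minimal sketch and not the converter's actual code: it assumes the metadata is available as a flat dict using the key names quoted above, and generate_study_name and format_investigators are hypothetical helper names. The "studyName" key is also an assumption.

```python
def generate_study_name(meta):
    """Return an existing study name, else build <site>.<year>.<author>.

    Returns None when a required key is missing, so the caller can log
    which file failed instead of writing an empty first line.
    """
    if meta.get("studyName"):
        return meta["studyName"]
    parts = [meta.get(k) for k in ("geo_siteName", "pub0_pubYear", "pub0_author")]
    if any(p is None or p == "" for p in parts):
        return None
    return ".".join(str(p) for p in parts)


def format_investigators(value):
    """Normalise investigators to a semicolon-separated string.

    Accepts a list of names or a single string. A string is split on ';'
    if present, otherwise on ','. Caveat: names written as
    'Last, F.' would be split incorrectly by the comma rule.
    """
    if isinstance(value, (list, tuple)):
        names = [str(v).strip() for v in value]
    else:
        text = str(value)
        sep = ";" if ";" in text else ","
        names = [n.strip() for n in text.split(sep)]
    return "; ".join(n for n in names if n)


# Placeholder values, for illustration only.
print(generate_study_name(
    {"geo_siteName": "Site", "pub0_pubYear": 2000, "pub0_author": "Author"}
))
print(format_investigators("AuthorA, AuthorB"))
```

Returning None rather than raising keeps the missing-key case visible, which is what the test files requested in item 1 would need to exercise.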

R: collapseTs: Build dataset structure from TS data

  • Remove get_table
  • Remove get_crumbs
  • Build new dataset structure based on the paleoNumber, modelNumber, tableNumber, etc, and do not rely on original raw data. This is in case the user changes the amount of tables, switches a table's type, or other things that would alter the collapsed structure from the original structure.

filterTs() limitation

The function doesn't appear to work with the OR operator (|).

Example: filterTs(TS, 'interpretation1_variable == M | interpretation1_variable == M')
returns list()
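
For reference, OR handling over "key == value" clauses can be sketched as below. This is an illustration in Python over a list of flat dicts, not the filterTs implementation; filter_records and parse_condition are hypothetical names, and only the == operator is handled.

```python
def parse_condition(text):
    """Split 'key == value' into a (key, value) pair of stripped strings."""
    key, sep, value = text.partition("==")
    if not sep:
        raise ValueError("expected 'key == value', got: %r" % text)
    return key.strip(), value.strip()


def filter_records(records, expression):
    """Keep records that match ANY of the '|'-separated conditions."""
    conditions = [parse_condition(part) for part in expression.split("|")]
    return [
        rec for rec in records
        if any(key in rec and str(rec[key]) == value for key, value in conditions)
    ]


# Illustrative records only.
ts = [
    {"interpretation1_variable": "M"},
    {"interpretation1_variable": "T"},
    {"interpretation1_variable": "P"},
]
matches = filter_records(
    ts, "interpretation1_variable == M | interpretation1_variable == T"
)
print(len(matches))
```

Splitting on the OR symbol first and evaluating each clause separately avoids treating the whole string as one comparison, which is one way an expression like the one above could end up matching nothing.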

Python: Library separate from application?

Is it possible for a general developer-oriented LiPD library to be packaged separately from end-user-oriented LiPD utilities and applications?

I may have use for LiPD within a much larger analysis system. It would be nice to have a library or framework for writing project-specific applications that read, write, and parse LiPD files as a standardized object.

collapseTs omitting some data

Issue submitted by Jessica via e-mail.

If I take a folder containing 2 lipd files, extract the time series (ts_list), and collapse the time series, then the new lipd files (two of them again) have lost some of the header information. This happens even when I don't add a new time series to ts_list. (Side note: when I do add a new time series to ts_list it does show up in the new lipd file after I use collapseTs.) Thus, it looks to me like collapseTs does not perfectly reverse the process of extractTs because header information is lost in one or both of these transformations. If this is intentional, then I'll need to add header information back into the new lipd files by grabbing it from the old lipd files. Let me know if this doesn't make sense and I'll use code to show you what I mean.

Tested it out and I was able to reproduce it as shown below. The following keys do not get collapsed properly: studyName, proxy, investigator, description. This may be true for other keys, but this is all that showed in this test.

[Screenshot: 2018-10-23, 2:53:03 pm]
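
One way to check which header keys are lost in an extract/collapse round trip is to diff the dataset before and after. A minimal sketch, assuming both versions are available as nested dicts; lost_keys is a hypothetical helper and the example values are placeholders.

```python
def lost_keys(before, after, prefix=""):
    """Recursively list keys present in `before` but absent from `after`."""
    missing = []
    for key, value in before.items():
        path = prefix + str(key)
        if key not in after:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(after[key], dict):
            # Descend into nested sections and report dotted paths.
            missing.extend(lost_keys(value, after[key], prefix=path + "."))
    return missing


# Placeholder data shaped like the report: two header keys vanish.
original = {"studyName": "x", "geo": {"siteName": "a", "elevation": 1}}
collapsed = {"geo": {"siteName": "a"}}
print(lost_keys(original, collapsed))
```

Running a check like this on each file would show whether the loss is limited to studyName, proxy, investigator and description or affects other keys as well.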

R: PAGES2k LiPD file loading issue

After pulling LiPD files off the LinkedEarth wiki and trying to load them using lipdR:

Do you want to load a single file (s) or directory (d)? s
[1] "reading: Arc-Agassiz.Vinther.2008.lpd"
[1] "Error: import_model: Error in idx_col_by_name(table): there should be a columns variable in here\n"

BadZipFile

Here is an issue submitted by my postdoc Michael Erb:

I'm having a problem opening a lipd file in python. I installed lipd in anaconda and downloaded this file: http://wiki.linked.earth/GeoB12610-2.Rippert.2015.

In python 3, I imported lipd and tried to use the lipd.readLipd(path) command, but I'm getting an error:

reading: GeoB12610-2.Rippert.2015.lpd
Traceback (most recent call last):
  File "", line 1, in
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/__init__.py", line 49, in readLipd
    __read_file(usr_path, ".lpd")
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/__init__.py", line 680, in __read_file
    __universal_load(usr_path, file_type)
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/__init__.py", line 640, in __universal_load
    lipd_lib.read_lipd(file_meta)
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/pkg_resources/lipds/LiPD_Library.py", line 231, in read_lipd
    lipd_obj.read()
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/pkg_resources/lipds/LiPD.py", line 56, in read
    unzipper(self.name_ext, self.dir_tmp)
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/site-packages/lipd/pkg_resources/helpers/zips.py", line 37, in unzipper
    with zipfile.ZipFile(name_ext) as f:
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "/home/geovault-02/erbm/programs/anaconda2/envs/py35/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Any ideas? I tried a different lipd file and got the same result.
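
The traceback shows the file being opened with zipfile.ZipFile, so a quick first check is whether the downloaded file is a zip archive at all. A minimal sketch using only the standard library; diagnose_lpd is a hypothetical helper, and the HTML check reflects one possible cause (a web page saved in place of the file) that this issue does not confirm.

```python
import zipfile


def diagnose_lpd(path):
    """Classify a file as 'zip', 'html', or 'unknown' by its content."""
    if zipfile.is_zipfile(path):
        return "zip"
    # Not a zip: peek at the first bytes to see if it looks like a web page.
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Calling diagnose_lpd('GeoB12610-2.Rippert.2015.lpd') before readLipd would distinguish a bad download from a problem inside the library.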

Python version: No module named 'download_lipd'

If I install the latest version (0.2.5.4) with pip install lipd, importing lipd fails. Error information below:

      6 import os
----> 7 import lipd as lpd
      8 import pandas as pd
      9 import numpy as np

~/.pyenv/versions/anaconda3-5.0.1/envs/py3.6/lib/python3.6/site-packages/LiPD-0.2.5.4-py3.6.egg/lipd/__init__.py in <module>()
     14 from lipd.regexes import re_url
     15 from lipd.fetch_doi import update_dois
---> 16 from download_lipd import download_from_url, get_download_path
     17 
     18 # Load stock modules

ModuleNotFoundError: No module named 'download_lipd'

I tried an older version (the commit at 2018-03-02 17:00), and it doesn't have this issue.

Can't have python files in the R/R folder

Convention (i.e. R CMD check) is that only R code should be in the R folder of a package. Currently there is a bagit.py file there that is required by the package. I suggest moving it to R-PKG-ROOT/exec/bagit.py and adjusting the R code that calls it so that it knows about the new location.

LiPD utilities in Python not compatible with newer version of numpy

[Screenshot: 2018-07-26, 6:10:15 pm]

import lipd as lpd
//anaconda/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Traceback (most recent call last):
  File "", line 1, in
  File "//anaconda/lib/python3.5/site-packages/lipd/__init__.py", line 13, in
    from lipd.json_viewer import viewLipd
  File "//anaconda/lib/python3.5/site-packages/lipd/json_viewer.py", line 16, in
    from PyQt5 import QtCore
ImportError: dlopen(//anaconda/lib/python3.5/site-packages/PyQt5/QtCore.so, 2): Symbol not found: _PySlice_AdjustIndices
  Referenced from: //anaconda/lib/python3.5/site-packages/PyQt5/QtCore.so
  Expected in: flat namespace
 in //anaconda/lib/python3.5/site-packages/PyQt5/QtCore.so

R: failure to install lipdR in R v 3.6.2

When attempting to install lipdR from Github in R v 3.6.2 (MacBook): devtools::install_github("nickmckay/LiPD-Utilities", subdir = "R")

I get the following error:
Error: Failed to install 'lipdR' from GitHub:
(converted from warning) package ‘Smisc’ is not available (for R version 3.6.2)

R: collapseTs, collapsing data added into the time series.

Previously there was an issue where data added into the time series was not being collapsed. That issue has been fixed, but the fix caused another issue: now only one column, "A", is collapsed, and none of the other columns exist.

R: extractTs/collapseTs bug. Losing data.

Nick -

Bug report for lipdR:
extractTs(L, whichtables = "meas", mode = "chron")

In chron mode this loses a whole bunch of data: the geo, and maybe the paleoData too, so the original can't be reconstructed.
Dataset: hjort.Schmidt.2011.lpd

Matlab: More documentation

Matlab is lacking in the documentation department. Update the documentation website (the one connected to the repository) and the documentation within the package.

R: Geo linestring not supported

The excel template supports entering 4 unique coordinate values. N lat, S lat, W lon, E lon. Generally, only 2 coordinates have been used in most datasets so far, but now 4 unique coordinates are starting to appear. Python supports the creation of LiPD files with a linestring type, but R does not support reading those LiPD files properly.

The Error:

One longitude value occupies L$geo$geometry$coordinates$longitude, but then the same longitude value mistakenly overwrites the L$geo$geometry$coordinates$latitude value. The other 3 values are dropped.

PupukePiatrunia.2016_2.xlsx
Pupuke.Piatrunia.2016.zip
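
For comparison, reading both geometry types can be sketched as follows. This is a minimal Python sketch, not the R or Python utilities' code. It assumes the geometry follows the GeoJSON convention, where each position is ordered [longitude, latitude, ...] and a LineString holds a list of such positions; split_coordinates is a hypothetical helper and the numbers are placeholders.

```python
def split_coordinates(geometry):
    """Return (longitudes, latitudes) from a GeoJSON-style geometry dict."""
    coords = geometry["coordinates"]
    gtype = geometry.get("type", "Point")
    if gtype == "Point":
        positions = [coords]          # a single [lon, lat, ...] position
    elif gtype == "LineString":
        positions = coords            # a list of [lon, lat, ...] positions
    else:
        raise ValueError("unsupported geometry type: %r" % gtype)
    longitudes = [pos[0] for pos in positions]
    latitudes = [pos[1] for pos in positions]
    return longitudes, latitudes


# Placeholder values, for illustration only.
point = {"type": "Point", "coordinates": [10.0, 20.0]}
line = {"type": "LineString", "coordinates": [[10.0, 20.0], [11.0, 21.0]]}
print(split_coordinates(point))
print(split_coordinates(line))
```

Keeping every position as a pair until the final split is what prevents a longitude from overwriting a latitude when more than one position is present.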

Python: possible to use nested dictionary keys in filterTS() criteria?

Hi,
I am using the LiPD utilities for Python and I would like to filter a data set by the temporal resolution of the records. But 'paleoData_hasResolution' is itself a dictionary, and using the filterTs() command like this:

highres=lipd.filterTs(alldata,"paleoData_hasResolution['hasMedianValue']<5")

returns "Invalid input expression". Is it possible to use nested dictionary keys (not sure if that's the right term) in the filterTs() command?
I can probably find a way around that problem, but it would be so handy to use the filterTs for this.

Thank you!
Marlene
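
Until filterTs supports this, one workaround is a plain list filter over the time series entries. A minimal sketch, assuming each entry is a dict and that 'paleoData_hasResolution' holds a nested dict with a numeric 'hasMedianValue'; get_nested and filter_nested are hypothetical helpers, not part of the lipd package.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def get_nested(record, path):
    """Follow a sequence of keys into nested dicts; None if any is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def filter_nested(records, path, op, threshold):
    """Keep records whose nested numeric value satisfies the comparison."""
    compare = OPS[op]
    kept = []
    for rec in records:
        value = get_nested(rec, path)
        # Skip entries where the value is missing or not numeric.
        if isinstance(value, (int, float)) and compare(value, threshold):
            kept.append(rec)
    return kept


# Placeholder entries, for illustration only.
alldata = [
    {"paleoData_hasResolution": {"hasMedianValue": 3}},
    {"paleoData_hasResolution": {"hasMedianValue": 10}},
    {"paleoData_variableName": "d18O"},
]
highres = filter_nested(
    alldata, ("paleoData_hasResolution", "hasMedianValue"), "<", 5
)
print(len(highres))
```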

Allow URLs to LiPD files for readLipd

If a user provides a link to a LiPD file hosted online via LinkedEarth Wiki or other, then the utilities should be able to download the file in the background and read it into memory.
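
A minimal sketch of the download step, using only the standard library. The helper names (is_url, fetch_to_temp, local_path_for) are hypothetical, and the sketch writes to a temporary file rather than holding the archive purely in memory, on the assumption that the existing reader expects a path on disk.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def is_url(source):
    """True if `source` looks like an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def fetch_to_temp(url, suffix=".lpd"):
    """Download `url` into a temporary file and return the local path."""
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
        out.write(response.read())
    return local_path


def local_path_for(source):
    """Return a readable local path for either a URL or a local path."""
    return fetch_to_temp(source) if is_url(source) else source
```

A reader could then call local_path_for(path) first and proceed exactly as it does for local files.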

Python API: CSV name doesn't match filename

This error comes back from lipd.validate(D) on the first try, directly after converting an Excel file to a LiPD file in Python. All subsequent readLipd-then-validate runs pass.

Why?

Because the Excel template uses 1-indexed naming for its data sheets, while the Utilities, and all other code, use 0-indexed naming.

Example:

Excel Sheets:
paleo1measurementTable1
chron1measurementTable1

These sheets generate the filenames:
NWG-SL.Lasher.2017.chron1measurement1.csv for the table chron0measurement0
NWG-SL.Lasher.2017.paleo1measurement1.csv for the table paleo0measurement0

Why is it only happening when trying to validate directly after converting an excel file?

Because filenames and table names are not permanent. They are rewritten (to adhere to standard naming) every time you use writeLipd. excel() has a few major steps:

  1. Convert the Excel file to LiPD and write the LiPD file
  2. readLipd the file created in step 1 into memory
  3. writeLipd the LiPD data in memory back to disk (to save all the inferred data and file standardization corrections)

The csv filenames in memory from step 2 are mismatched, and this is what goes to the validator. The filenames saved to file in step 3 are corrected for next time.
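
The index mismatch can be illustrated with a small renaming function. This is a sketch only, not the Utilities' code: to_zero_indexed is a hypothetical helper, and the regular expression assumes names of the form paleoNtableM or chronNtableM as in the example above.

```python
import re

# Matches e.g. "chron1measurement1": section, section index, table, table index.
_TABLE_RE = re.compile(r"(paleo|chron)(\d+)([A-Za-z]+)(\d+)")


def to_zero_indexed(name):
    """Shift 1-indexed section and table numbers in `name` down by one."""
    def shift(match):
        section, sec_idx, table, tab_idx = match.groups()
        if int(sec_idx) < 1 or int(tab_idx) < 1:
            # Already contains a 0 index; leave it unchanged.
            return match.group(0)
        return "%s%d%s%d" % (section, int(sec_idx) - 1, table, int(tab_idx) - 1)
    return _TABLE_RE.sub(shift, name)


print(to_zero_indexed("NWG-SL.Lasher.2017.chron1measurement1.csv"))
```

Applying the same renaming to the in-memory filenames after step 2 would let the first validation see names that agree with the table names.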

R : collapseTs and calibration data

Calibration data isn't processed by collapseTs because it isn't indexed like interpretation (i.e., "interpretation1_seasonality" vs. "calibration_uncertainty").

Should I make some rules to handle calibration as-is (unindexed) or should calibration data be indexed?

@nickmckay
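
If calibration stays unindexed, the key parser would need to treat the index as optional. A minimal sketch of that rule in Python (the R code is not shown in this issue, so this is an illustration only); split_ts_key is a hypothetical helper.

```python
import re

# section name, optional numeric index, underscore, field name
_KEY_RE = re.compile(r"^(?P<section>[A-Za-z]+?)(?P<index>\d+)?_(?P<field>.+)$")


def split_ts_key(key):
    """Split a time series key into (section, index or None, field)."""
    match = _KEY_RE.match(key)
    if match is None:
        return None
    index = match.group("index")
    return (
        match.group("section"),
        int(index) if index is not None else None,
        match.group("field"),
    )


print(split_ts_key("interpretation1_seasonality"))
print(split_ts_key("calibration_uncertainty"))
```

With an optional index, both forms parse with the same rule, and the collapse step can decide separately how to place an unindexed section.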

Bulk DOI updater from memory

The DOI updater currently reads from a directory and updates LiPD files directly on disk, including overwriting. Switch this to work on LiPD files in memory, and store the results in memory instead.

Python2.7 lite

For backward compatibility with code written in Python 2.7, a lite version of the utilities that only loads LiPD files into the workspace (loadLipds()) would be useful.

R: values not read in properly

Having an issue with the R utilities. This file appears mostly valid, but when it gets read into R, the chronTable (which is mostly, maybe entirely, NAs) doesn’t make it into the values in the list in R, which causes problems later.

It should just populate the same number of NAs into the values field.

  • Nick

R : writeLipd not working

"Error: writeLipd: Error in basename(entry): object 'entry' not found\n"
Looks like I’m able to write them one by one, just not a list of them. But I’m also getting this warning, which appears while using OSX:

“Warning: OS - Windows. Unable to use bagit module on LiPD data. Skipping...”

  • Nick

Pypi readme publish error

Upload failed (400): The description failed to render in the default format of reStructuredText.

Nothing has changed, but for some reason the pypi package publishing for LiPD has stopped working. I removed the readme file for now until I find more info.

R: No automatic pcaMethods install

The pcaMethods package cannot be installed automatically during the geoChronR installation. For unknown reasons, it has to be installed separately before installing geoChronR.

There needs to be documentation stating that this is a known bug so users know how to work around it.

Code:
source("https://bioconductor.org/biocLite.R")
biocLite("pcaMethods")

python: age ensemble missing one member

When I open up a LiPD file in python using "lipd.readLipd", the first member of the age ensemble is missing. This can be seen when comparing the size of the age ensemble with the length of the "number" field. For example, there may be 1000 values in the "number" field, but only 999 members of the age ensemble. When opening up the file in Matlab, however, all members of the age ensemble are loaded. Unzipping the file in Windows also shows all members.

R : collapseTs bug when multiple paleoData

Per Nick

The error only occurs if there is more than one paleo object.
Just run extractTs and then collapseTs to recreate the error. It seems unrelated to the model type.

Debugging with "NamTreeRing021318.RData" file.

It's possible there is some issue with the paleoData objects loop, though it should theoretically be able to handle multiple paleoData objects.
