npyc-toolbox's People

Contributors

adwolfer, carolinesands, duibuqi, ghaggart, gscorreia89, jaketmp, misch91, nsadawi

npyc-toolbox's Issues

Deprecation warning due to numpy.matlib on multivariateUtilities

Importing numpy.matlib under Python 3.9 with the latest version of numpy raises a deprecation warning. At the moment there is no error, but we should fix this as soon as possible.

Here is the warning, for reference.
"""
.../PycharmProjects/nPYc-Toolbox/nPYc/multivariate/multivariateUtilities.py:2: PendingDeprecationWarning:

Importing from numpy.matlib is deprecated since 1.19.0. The matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
"""
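
A minimal sketch of one way to drop the import, assuming the module only needs numpy.matlib.repmat (the issue does not say which function is used, so this is a guess). numpy.tile on a regular ndarray gives the same tiling; the helper name `repmat` is kept here only for illustration.

```python
import numpy as np


def repmat(a, m, n):
    """Tile `a` m times along rows and n times along columns.

    Illustrative stand-in for numpy.matlib.repmat that returns a plain ndarray.
    """
    a = np.atleast_2d(np.asarray(a))
    return np.tile(a, (m, n))


block = np.array([[1, 2], [3, 4]])
tiled = repmat(block, 2, 3)  # 2 copies down, 3 copies across
```

If other numpy.matlib functions are in use, each would need its own ndarray equivalent.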

Refactor Targeted MS StudyDesign jsons to be more user friendly

At the moment the JSON SOP files for MS targeted assays are awkward for most users to edit. It would be better to also support reading these as CSV files, so users could edit the compound names, equations, etc. in Excel, or even merge this information with the compound calibration report.

This will be something to do after refactoring TargetedDataset though.

Read acquisition parameters from mzML

  1. Implement a function to extract detector voltage, Acquisition Time and Data and other important acquisition parameters from .mzML.
  2. Test with mzML files from multiple MS vendors
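
A rough sketch of step 1 using only the standard library, assuming the parameters of interest are exposed as cvParam elements and as the startTimeStamp attribute of the run element. The function name `read_acquisition_parameters` and the inline sample document are made up for illustration; which cvParam names a given vendor actually writes would still need checking against real files.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Tiny hand-written document, only for demonstration.
SAMPLE = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1" startTimeStamp="2020-01-01T12:00:00Z">
    <spectrumList count="1">
      <spectrum id="scan=1" index="0" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def read_acquisition_parameters(source):
    """Collect the run start time and every named cvParam value (first occurrence wins)."""
    root = ET.parse(source).getroot()
    run = root.find(".//mz:run", MZML_NS)
    params = {"startTimeStamp": run.get("startTimeStamp") if run is not None else None}
    for cv in root.iterfind(".//mz:cvParam", MZML_NS):
        name, value = cv.get("name"), cv.get("value")
        if name and value:
            params.setdefault(name, value)
    return params


params = read_acquisition_parameters(io.BytesIO(SAMPLE))
```

Keeping only the first occurrence of each name is a simplification; per-spectrum values would need a different structure.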

Split NMR Targeted from LC-QqQ Targeted objects

The import and QC features for both targeted NMR (Bruker ivdr methods) and LC-QqQ MS assays use the same general TargetedDataset. However, NMR Targeted methods are conceptually very simple, while LC-QqQ methods require a set of specific extra attributes. Maintaining both in a single object makes the targeted QC features much harder to modify, debug and improve, so these should be split into separate, specific TargetedDataset objects (which might or might not inherit from an abstract Targeted Dataset object).

Add 'Unknown' to Enumerations

https://npyc-toolbox.readthedocs.io/en/latest/enumerations.html

Some enumerations have no default value for cases where the expected enum option is unknown.

For example, when importing sample metadata, if the 'Sample Type' is blank or not in the expected list (StudySample, StudyPool, ExternalReference, MethodReference, ProceduralBlank), it may be set incorrectly to StudySample, potentially making any downstream analysis imprecise or inaccurate.

Please add another Enum choice:

Unknown = 'Unknown' to all Enums with String Type values

Unknown = 0 to all Enums with Integer Type values

Please make sure in the toolbox code that whenever a blank or NA or null value is encountered, the correct Unknown is assigned.
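
A possible shape for this, as a sketch only: the class name `SampleTypeSketch`, its string values and the helper `parse_sample_type` are hypothetical and not the toolbox's actual enumeration. It relies on the standard `Enum._missing_` hook, which is called when a value matches no member.

```python
from enum import Enum


class SampleTypeSketch(Enum):
    """Illustrative enum; member values are placeholders, not the toolbox's."""
    StudySample = 'StudySample'
    StudyPool = 'StudyPool'
    ExternalReference = 'ExternalReference'
    MethodReference = 'MethodReference'
    ProceduralBlank = 'ProceduralBlank'
    Unknown = 'Unknown'

    @classmethod
    def _missing_(cls, value):
        # Any value that matches no member falls back to Unknown
        # instead of raising ValueError or defaulting to StudySample.
        return cls.Unknown


def parse_sample_type(raw):
    """Map a raw metadata cell (possibly blank, None or NaN) to an enum member."""
    if raw is None:
        return SampleTypeSketch.Unknown
    return SampleTypeSketch(str(raw).strip())
```

For example, `parse_sample_type('')` and `parse_sample_type('NA')` both resolve to `SampleTypeSketch.Unknown`, while `parse_sample_type('StudyPool')` resolves to the matching member.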

`test_plotScores_raises` errors occasionally

May be an issue in pyChemometrics cross-validation with small sample numbers?

======================================================================
ERROR: test_plotScores_raises (test_plotting.test_plotting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jtpearce/Dropbox (Personal)/Development/nPYc-toolbox/Tests/test_plotting.py", line 849, in test_plotScores_raises
    pcaModel = nPYc.multivariate.exploratoryAnalysisPCA(dataset)
  File "../nPYc/multivariate/exploratoryAnalysisPCA.py", line 86, in exploratoryAnalysisPCA
    raise exp
  File "../nPYc/multivariate/exploratoryAnalysisPCA.py", line 60, in exploratoryAnalysisPCA
    scree_cv = PCAmodel._screecv_optimize_ncomps(data, total_comps=maxComponents, stopping_condition=minQ2, **kwargs)
  File "/Users/jtpearce/Dropbox (Personal)/Development/pyChemometrics/pyChemometrics/ChemometricsPCA.py", line 572, in _screecv_optimize_ncomps
    currmodel.cross_validation(x, outputdist=False, cv_method=cv_method, press_impute=False)
  File "/Users/jtpearce/Dropbox (Personal)/Development/pyChemometrics/pyChemometrics/ChemometricsPCA.py", line 498, in cross_validation
    cv_loads.append(np.array([x[comp] for x in loadings]))
  File "/Users/jtpearce/Dropbox (Personal)/Development/pyChemometrics/pyChemometrics/ChemometricsPCA.py", line 498, in <listcomp>
    cv_loads.append(np.array([x[comp] for x in loadings]))
IndexError: index 8 is out of bounds for axis 0 with size 8

----------------------------------------------------------------------

Add the ability to concatenate datasets

Either horizontally:

  • Same samples, new features
  • Prob only makes sense for VariableType.Discrete datasets

Or vertically:

  • Same features, new samples

Can build on functionality already in TargetedDataset

Failure reading NMR data from different windows drives

The current release fails when reading NMR data from different Windows drives.

This seems to be caused by issues with relative paths in _getMetadataFromBruker:
localPath = os.path.normpath(os.path.join(os.path.relpath(path), inputFile))
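
To illustrate the likely cause: on Windows, os.path.relpath raises ValueError when the target and the starting directory are on different drives. The sketch below uses ntpath (the Windows flavour of os.path) so it behaves the same on any platform. The helper `safe_local_path` and its explicit `start` argument are hypothetical, and falling back to the original path is only one possible fix.

```python
import ntpath


def safe_local_path(path, input_file, start):
    """Build the file path relative to `start` when possible.

    Falls back to the unmodified `path` when it lives on another drive,
    where no relative path exists.
    """
    try:
        base = ntpath.relpath(path, start)
    except ValueError:
        # Different drive letters: keep the path as given.
        base = path
    return ntpath.normpath(ntpath.join(base, input_file))


same_drive = safe_local_path('C:\\work\\data', 'acqus', 'C:\\work')
other_drive = safe_local_path('D:\\nmr\\exp1', 'acqus', 'C:\\work')
```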

ISATAB export hardcodes optional columns

These columns are optional in sampleMetadata and should not be hard-coded in ISATAB export.

  • 'Study'
  • 'Status'
  • 'Age'
  • 'Gender'
  • 'Sampling Date'
  • 'Acquired Time'
  • 'Assay data name'
  • 'Instrument'
  • 'Sample Batch'
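
One way to avoid the hard-coding, sketched with plain lists: export only those optional columns that are actually present. The function name `columns_to_export` and the 'Sample ID' required column in the usage line are assumptions for illustration.

```python
OPTIONAL_COLUMNS = [
    'Study', 'Status', 'Age', 'Gender', 'Sampling Date',
    'Acquired Time', 'Assay data name', 'Instrument', 'Sample Batch',
]


def columns_to_export(available, required=()):
    """Return required columns plus whichever optional columns exist.

    Raises KeyError if a required column is absent; optional ones are skipped silently.
    """
    missing = [name for name in required if name not in available]
    if missing:
        raise KeyError('Missing required columns: ' + ', '.join(missing))
    return list(required) + [name for name in OPTIONAL_COLUMNS if name in available]


selected = columns_to_export(['Sample ID', 'Instrument', 'Age'], required=['Sample ID'])
```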

Separate 'Feature ID' from 'Chemical/Compound' IDs and names

Refactor current featureMetadata fields to add improved "formal" fields for compound annotations.

The goal is to allow more text description of ion and compound IDs at the same time. For example, the Feature Name/Unique ID could refer to a specific annotation: 'Histidine [M+H]+'. The Chemical/Compound ID and Name fields could then store unique identifiers for the annotated chemical compounds ('Compound ID': ChEBI/PubChem ID; 'Compound Name': Histidine).

This would require the following tasks:

  1. Refactor the feature unique primary ID 'Feature Name' to 'Feature ID'
  2. Keep 'Feature Name' as a text (or other) descriptor of a feature - ideally unique as well, but not required.
  3. Add a new 'Compound Name' and 'Compound ID' to store chemical names and their identifiers.

Idiosyncratic failure in `test_lineWidth_sf`

Failure occurs in approx 1% of runs:

======================================================================
ERROR: test_lineWidth_sf (test_utilities.test_utilities_linewidth)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jtpearce/Dropbox (Personal)/Development/nPYc-toolbox/Tests/test_utilities.py", line 1016, in test_lineWidth_sf
    calculatedLW = lineWidth(x, self.y, sf, [-5, 5], multiplicity='singlet')
  File "../nPYc/utilities/_lineWidth.py", line 29, in lineWidth
    fit = fitPeak(X, ppm, peakRange, multiplicity, parameters=parameters, maxLW=maxLW, estLW=estLW, shiftTollerance=shiftTollerance)
  File "../nPYc/utilities/_fitPeak.py", line 305, in fitPeak
    fit = peak.fit(spec, pars, x=localPPM)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/model.py", line 736, in fit
    output.fit(data=data, weights=weights)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/model.py", line 951, in fit
    _ret = self.minimize(method=self.method)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/minimizer.py", line 1649, in minimize
    return function(**kwargs)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/minimizer.py", line 1408, in leastsq
    eval_stderr(par, uvars, result.var_names, params)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/minimizer.py", line 108, in eval_stderr
    uval = wrap_ueval(*uvars, _obj=obj, _names=_names, _pars=_pars)
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/uncertainties/__init__.py", line 696, in f_with_affine_output
    if arg.derivatives
  File "/Users/jtpearce/anaconda/lib/python3.6/site-packages/lmfit/uncertainties/__init__.py", line 492, in partial_derivative_of_f
    return (shifted_f_plus - shifted_f_minus)/2/step
ZeroDivisionError: float division by zero

----------------------------------------------------------------------

Friendly "includeFeature/Sample" syntax

Implement an include (Sample/Feature) method for Dataset objects that erases the Excluded Details information and allows re-inclusion directly by name or other metadata (same syntax as excludeSample/Feature).

Duplicated 'Feature Name' corrupts excludeFeatures()

Hi, nPYC team!

I encounter this issue when running LC-MS pipeline:

dataset.excludeFeatures(dataset.featureMetadata[dataset.featureMetadata['Retention Time'] < 0.6]['Feature Name'], on='Feature Name', message='Outside RT limits')

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This arises because there are duplicated feature names, created when the XCMS mz and rt values are merged into a string. There are 40 pairs of duplicate names in my dataset.

I'll fix it now by creating a new column 'Feature Name'.

Thanks!
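
A small sketch of the workaround described above: duplicated names get a numeric suffix so that every 'Feature Name' is unique before calling excludeFeatures(). The helper `make_unique` and the example names are illustrative only.

```python
from collections import Counter


def make_unique(names, sep='_'):
    """Append a running index to every name that occurs more than once."""
    totals = Counter(names)
    seen = Counter()
    unique = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}{sep}{seen[name]}")
        else:
            unique.append(name)
    return unique


renamed = make_unique(['0.55_120.1m/z', '0.55_120.1m/z', '1.20_300.2m/z'])
```

This does not guard against a suffixed name colliding with a name that already exists, which a real fix would need to check.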

Ability to export log at end of analysis

Hi NPC,

It would be great to be able to export the SOP object at the end of an analysis so a record of the overwritten method SOP options could be made. This would aid reproducibility and general record keeping of analyses. Essentially a json dump of the updated SOP attributes.

Thanks!
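
Something along these lines might be enough, assuming the SOP options are available as a plain dict. The function name `attributes_to_json` and the keys in the example are made up. `default=str` is used so values that json cannot serialise natively (enums, dates) are written as text rather than raising an error.

```python
import json


def attributes_to_json(attributes):
    """Serialise a dict of SOP options to a readable, stable JSON string."""
    return json.dumps(attributes, indent=2, sort_keys=True, default=str)


# Hypothetical options, for illustration only.
text = attributes_to_json({'rsdThreshold': 30, 'methodName': 'Example SOP'})

# Writing the record next to the reports would then be:
# with open('sop_record.json', 'w', encoding='utf-8') as handle:
#     handle.write(text)
```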

Add unittests and test data for Bruker ivDr version 2

There are no unit tests covering parsing of Bruker ivDr BI-QUANT v2.0 files.
We should:

  1. Add BI-QUANT v2.0 files to the current NMR data in the unit test data
  2. Replicate the existing unit tests to cover the v2.0 files as well. For backward compatibility, the current v1.0 tests should be kept.

use of 'nonposy' argument deprecated

The 'nonposy' argument in matplotlib axes set_xscale and set_yscale has been deprecated in favour of 'nonpositive'. This causes issues with many of the current plotting functions, for example, _plotTIC or _plotRDS.
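
If older Matplotlib versions still need to be supported, a small shim could pick the keyword by version. This is a sketch: the helper `log_scale_kwargs` is hypothetical, the 'clip' value is an assumption about what the plotting functions pass, and 3.3 is the release in which 'nonposx'/'nonposy' were deprecated in favour of 'nonpositive'.

```python
def log_scale_kwargs(matplotlib_version, axis='y'):
    """Return the keyword argument for set_xscale/set_yscale('log', ...).

    Uses 'nonpositive' from Matplotlib 3.3 onwards, the old per-axis name before.
    """
    major, minor = (int(part) for part in matplotlib_version.split('.')[:2])
    if (major, minor) >= (3, 3):
        return {'nonpositive': 'clip'}
    return {'nonposx' if axis == 'x' else 'nonposy': 'clip'}


# Intended use inside a plotting function:
# import matplotlib
# ax.set_yscale('log', **log_scale_kwargs(matplotlib.__version__))
```

If only recent Matplotlib releases are supported, simply renaming the argument is the simpler change.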

Import of NMR data from compressed archives

The NMR data format produces a lot of small files which are very inconvenient for storage and data transfer. It would be good to add functionality to import a dataset directly from a .zip or other compressed archives.
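
A possible starting point for .zip archives, using only the standard library: extract to a temporary folder and hand that folder to the existing import code. The context manager `extracted_archive` is a hypothetical name, and the demo archive is built on the fly with a single placeholder file.

```python
import os
import tempfile
import zipfile
from contextlib import contextmanager


@contextmanager
def extracted_archive(archive_path):
    """Yield a temporary folder holding the archive contents; removed on exit."""
    with tempfile.TemporaryDirectory() as workdir:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(workdir)
        yield workdir


# Demonstration with a throwaway archive.
with tempfile.TemporaryDirectory() as scratch:
    archive_path = os.path.join(scratch, 'experiment.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        archive.writestr('10/acqus', '##TITLE= demo')
    with extracted_archive(archive_path) as folder:
        found = os.path.exists(os.path.join(folder, '10', 'acqus'))
```

The temporary copy means the data must be fully read before the context exits; other archive formats would need their own handling.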

Tutorial for LC-QqQ

There is a lot of LC-QqQ quality control functionality which is not represented in the tutorials. We also have no example dataset, so it would be good to:

  1. Acquire some LC-QqQ data for this purpose
  2. Add tutorials for LC-QqQ data import and QC.

Probably should be done more intensively after #34

Tutorial Improvements

Tutorial.rst is not as clear as it could be. Possibly split import from the NMR- and MS-specific parts?

Remove existing file deletion when multivariate report generated

Hi Team,

Let's discuss this, but it would be useful to save multiple versions of the MV analytical report (for example on different sample types) in the same folder. At the moment any previous files are deleted (line 121 in multivariateReport.py) when a new report is generated.

Cheers,
Caroline

Repetitive message when no dilution series are present.

Some of the reports output a text message (not a warning) when there are no linearity reference samples. This is particularly noticeable when running the feature selection report. A warning should be sent instead, which can then be suppressed when repeated.
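
A sketch of the suggested behaviour with the standard warnings module. The function name and message text are placeholders. With the 'once' filter, repeated identical warnings are shown a single time, and users can silence them entirely with the usual filters.

```python
import warnings


def report_missing_dilution_series():
    """Emit a warning instead of printing when no linearity references exist."""
    warnings.warn('No linearity reference samples present.', UserWarning)


# Record what would be shown when the report code runs three times.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('once')
    for _ in range(3):
        report_missing_dilution_series()
n_shown = len(caught)
```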

sklearn future warning

The sklearn.decomposition.base module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.decomposition. Anything that cannot be imported from sklearn.decomposition is now part of the private API.

Import of XCMS data made with Peaktable excludes samples

My data table was made using XCMS's "peakTable" function. Importing this data using nPYc.MSDataset excludes the first 5 samples in my case, because of the default noFeatureParams=14 at line 495: def _loadXCMSDataset(self, path, noFeatureParams=14):

This number varies depending on the number of classes in the data and if peakTable or diffreport is used. This might need to be specified in order to reduce import issues and inadvertent data exclusion.
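
One idea, sketched under the assumption that the sample names are known before import: count the leading non-sample columns from the header instead of relying on a fixed default. The helper name and the example header are illustrative, and the column layout shown is not guaranteed for every XCMS export.

```python
def count_feature_parameter_columns(header, sample_names):
    """Return how many columns precede the first sample column."""
    samples = set(sample_names)
    for index, column in enumerate(header):
        if column in samples:
            return index
    raise ValueError('No sample columns found in header')


# Made-up peakTable-style header: feature parameters, two class columns, then samples.
header = ['mz', 'mzmin', 'mzmax', 'rt', 'rtmin', 'rtmax', 'npeaks',
          'QC', 'Study', 'S1', 'S2']
n_params = count_feature_parameter_columns(header, ['S1', 'S2'])
```

The result could then be passed as noFeatureParams rather than using the default of 14.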

Summary Table of QC in NMR feature summary

When the 'feature summary' report is repeated after sample exclusions (made with updateMasks or manually) in an NMRDataset, the final summary table contains wrong information the second time.
