

IOOS Compliance Checker


The IOOS Compliance Checker is a Python-based tool for data providers to check local or remote netCDF files for completeness and compliance with community standards such as CF and ACDD. The Python module can be used as a command-line tool or as a library that can be integrated into other software.

A web-based version of the Compliance Checker was developed to reach a broader audience and make the checker more accessible. With the web version, providers can simply provide a link to, or upload, their datasets and get the full suite of capabilities that the Compliance Checker offers.

It currently supports the following sources and standards:

| Standard        | Source                | .nc/OPeNDAP/.cdl | SOS                             |
|-----------------|-----------------------|------------------|---------------------------------|
| ACDD (1.1, 1.3) | Built-in              | X                | -                               |
| CF (1.9)        | Built-in              | X                | -                               |
| CF (1.8)        | Built-in              | X                | -                               |
| CF (1.7)        | Built-in              | X                | -                               |
| CF (1.6)        | Built-in              | X                | -                               |
| IOOS SOS        | Built-in              | -                | GetCapabilities, DescribeSensor |
| IOOS (1.1)      | Built-in              | X                | -                               |
| IOOS (1.2)      | Built-in              | X                | -                               |
| Glider DAC      | ioos/cc-plugin-glider | X                | -                               |
| NCEI (1.1, 2.0) | ioos/cc-plugin-ncei   | X                | -                               |

Advice to data providers

While the command-line version of this tool can be run in a loop, it is not necessary to check every file if they are all created the same way. In short, this tool is not meant for identifying bugs in your data processing stream. It is, however, intended to help you verify that your processing procedure complies with the standards. If you change your processing procedure for any reason, it is worth running one file through the Compliance Checker to ensure the change does not affect your files' compliance.

If you feel you will need to run a batch of files through the Compliance Checker, please contact the IOOS Program Office Operations Division for assistance.

The web front-end companion to the IOOS Compliance Checker is available at:

https://compliance.ioos.us/index.html

Source Code is available on GitHub:

https://github.com/ioos/compliance-checker-web

Usage

Select the test you want to run from the dropdown menu. Then, either upload your dataset or provide a URL to a remote dataset (OPeNDAP) and click 'Submit'.

The output of the Compliance Checker gives you a comprehensive list of issues and the actions needed to correct them. You may download the report as a text file by clicking the 'Download Report' button.


API

In addition to a web-based front-end for the IOOS Compliance Checker project, an API is provided for users interested in batch processing files hosted via OPeNDAP. Details on how to use the API are available on the Compliance Checker Web wiki page.

Here are a couple examples:

HTML Output

https://compliance.ioos.us/api/run?report_format=html&test=acdd&url=http://sos.maracoos.org/stable/dodsC/hrecos/stationHRMARPH-agg.ncml

JSON Output

https://compliance.ioos.us/api/run?report_format=json&test=acdd&url=http://sos.maracoos.org/stable/dodsC/hrecos/stationHRMARPH-agg.ncml
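For batch use, a request URL like those above can be assembled programmatically. A minimal sketch; the helper name is hypothetical, and the endpoint and parameters are taken from the examples above:

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of the project): build a Compliance Checker
# Web API request URL with the parameters shown in the examples above.
def build_cc_url(dataset_url, test="acdd", report_format="json",
                 base="https://compliance.ioos.us/api/run"):
    params = {"report_format": report_format, "test": test, "url": dataset_url}
    return base + "?" + urlencode(params)

print(build_cc_url("http://sos.maracoos.org/stable/dodsC/hrecos/stationHRMARPH-agg.ncml"))
```

Note that `urlencode` percent-encodes the dataset URL, which is what the API expects for a query parameter.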

The Compliance Checker Command Line Tool

Concepts & Terminology

Each compliance standard is executed by a Check Suite, which functions similarly to a standard Python unit test. A Check Suite runs checks against a dataset based on a metadata standard, returning a list of Results which are then aggregated into a summary.

Each Result has a (# passed / # total) score, a weight (HIGH/MEDIUM/LOW), a computer-readable name, an optional list of human-readable messages, and optionally a list of child Results.

A single overall score is calculated by aggregating Results on their names, multiplying each aggregated score by its weight, and summing the products.

The computer-readable name field controls how Results are aggregated. To keep a Check Suite's overall score from varying with the number of variables, Results can be grouped via the name property; grouped results add up to only a single top-level entry.
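As a rough illustration of this aggregation (all names, weights, and numbers below are hypothetical, not the checker's actual data structures):

```python
from collections import defaultdict

# Results sharing a name collapse into one top-level entry whose
# (passed, total) tallies are summed; each entry then contributes
# passed/total * weight to the overall score.
results = [
    ("var_units", 3, (1, 1)),   # (name, weight, (passed, total))
    ("var_units", 3, (0, 1)),   # same name -> grouped with the entry above
    ("global_attrs", 1, (4, 5)),
]

grouped = defaultdict(lambda: [0, 0, 0])  # passed, total, weight
for name, weight, (passed, total) in results:
    grouped[name][0] += passed
    grouped[name][1] += total
    grouped[name][2] = max(grouped[name][2], weight)

score = sum(p / t * w for p, t, w in grouped.values())
print(len(grouped), round(score, 2))
```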

See the Development wiki page for more details on implementation.

Installation

Check out the Installation wiki for instructions on how to install.

Command Line Usage

The compliance-checker can work against local files (.nc files, .cdl metadata files, .xml files of SOS GetCapabilities/DescribeSensor requests) or against remote URLs (OPeNDAP data URLs, SOS GetCapabilities/DescribeSensor URLs).

If you are checking a netCDF metadata dump, also known as a CDL file, the file name must end with .cdl for the check suite to parse its contents correctly.
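A minimal sketch of preparing such a file, assuming the only requirement is the .cdl suffix (paths are illustrative):

```python
from pathlib import Path

# Give a netCDF text dump the .cdl suffix the check suite expects
# before passing it to the checker.
def ensure_cdl_suffix(path):
    p = Path(path)
    return p if p.suffix == ".cdl" else p.with_suffix(".cdl")

print(ensure_cdl_suffix("dump.txt"))   # -> dump.cdl
```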

WARNING The CF/ACDD checks will access data, so if using a remote OPeNDAP URL, please be sure the size is reasonable!

usage: cchecker.py [-h] [--test TEST] [--criteria [{lenient,normal,strict}]]
                   [--verbose] [--describe-checks] [--skip-checks SKIP_CHECKS]
                   [-f {text,html,json,json_new}] [-o OUTPUT] [-O OPTION] [-V]
                   [-l] [-d DOWNLOAD_STANDARD_NAMES]
                   [dataset_location [dataset_location ...]]

positional arguments:
  dataset_location      Defines the location of the dataset to be checked.
                        The location can be a local netCDF file, a remote
                        OPeNDAP endpoint, a remote netCDF file which returns
                        content-type header of 'application/x-netcdf', or an
                        ERDDAP TableDAP endpoint. Note that the ERDDAP TableDAP
                        endpoint will currently attempt to fetch the entire
                        TableDAP dataset.


optional arguments:
  -h, --help            show this help message and exit
  --test TEST, -t TEST, --test= TEST, -t= TEST
                        Select the Checks you want to perform. Defaults to
                        'acdd' if unspecified. Versions of standards can be
                        specified via `-t <test_standard>:<version>`. If
                        `<version>` is omitted, or is "latest", the latest
                        version of the test standard is used.
  --criteria [{lenient,normal,strict}], -c [{lenient,normal,strict}]
                        Define the criteria for the checks. Either Strict,
                        Normal, or Lenient. Defaults to Normal.
  --verbose, -v         Increase output. May be specified up to three times.
  --describe-checks, -D
                        Describes checks for checkers specified using `-t`. If
                        `-t` is not specified, lists checks from all available
                        checkers.
  --skip-checks SKIP_CHECKS, -s SKIP_CHECKS
                        Specifies tests to skip. Can take the form of either
                        `<check_name>` or `<check_name>:<skip_level>`. The
                        first form skips any checks matching the name. In the
                        second form <skip_level> may be specified as "A", "M",
                        or "L". "A" skips all checks and is equivalent to
                        calling the first form. "M" will only show high
                        priority output from the given check and will skip
                        medium and low. "L" will show both high and medium
                        priority issues, while skipping low priority issues.
  -f {text,html,json,json_new}, --format {text,html,json,json_new}
                        Output format(s). Options are 'text', 'html', 'json',
                        'json_new'. The difference between the 'json' and
                        'json_new' formats is that 'json' uses the check name
                        as the top-level key, whereas 'json_new' uses the
                        dataset name(s) as the main key, followed by the
                        checks as subkeys. Also, the 'json' format can only be
                        run against one input file, whereas 'json_new' can be
                        run against multiple files.
  -o OUTPUT, --output OUTPUT
                        Output filename(s). If '-' is supplied, output to
                        stdout. Can either be one or many files. If one file
                        is supplied, but the checker is run against many
                        files, all the output from the checks goes to that
                        file (does not presently work with 'json' format). If
                        more than one output file is supplied, the number of
                        input datasets supplied must match the number of
                        output files.
  -O OPTION, --option OPTION
                        Additional options to be passed to the checkers.
                        Multiple options can be specified via multiple
                        invocations of this switch. Options should be prefixed
                        with the checker name followed by the option, e.g.
                        '<checker>:<option_name>'. Available options:
                        'cf:enable_appendix_a_checks' - Allow check results
                        against CF Appendix A for attribute location and data
                        types.

  -V, --version         Display the IOOS Compliance Checker version
                        information.
  -l, --list-tests      List the available tests
  -d DOWNLOAD_STANDARD_NAMES, --download-standard-names DOWNLOAD_STANDARD_NAMES
                        Specify a version of the CF standard name table to
                        download and use in place of the packaged version.
                        Either specify a version number (e.g. "72") to fetch a
                        specific version or "latest" to get the latest CF
                        standard name table.
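To illustrate the difference between the 'json' and 'json_new' layouts described in the `--format` help above, here is a sketch with assumed, illustrative keys and fields (not the checker's exact output schema):

```python
# Illustrative only -- keys and fields are assumptions based on the help text.
json_report = {                      # 'json': check name is the top-level key
    "cf:1.6": {"scored_points": 100, "possible_points": 120},
}
json_new_report = {                  # 'json_new': dataset name is the main key
    "hycom_global.nc": {
        "cf:1.6": {"scored_points": 100, "possible_points": 120},
    },
}

print(list(json_report), list(json_new_report))
```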

Examples

Check a local file against CF 1.6

compliance-checker --test=cf:1.6 compliance_checker/tests/data/examples/hycom_global.nc
--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report
                                  cf:1.6 check
--------------------------------------------------------------------------------
                               Corrective Actions
hycom_global.nc has 9 potential issues


                                     Errors
--------------------------------------------------------------------------------
Name                                      Reasoning
§3.2 Either long_name or standard_name    Attribute long_name or/and standard_name
is highly recommended for variable time:  is highly recommended for variable time
§4.3.1 depth is a valid vertical          vertical coordinates not defining
coordinate:                               pressure must include a positive
                                          attribute that is either 'up' or 'down'


                                    Warnings
--------------------------------------------------------------------------------
Name                                   Reasoning
§2.6.1 Global Attribute Conventions    Conventions global attribute does not
includes CF-1.6:                       contain "CF-1.6". The CF Checker only
                                       supports CF-1.6 at this time.
§2.6.2 Recommended Attributes:         institution should be defined source
                                       should be defined references should be
                                       defined
§2.6.2 Recommended Global Attributes:  global attribute history should exist
                                       and be a non-empty string
§8.1 Packed Data defined by water_u    Attributes add_offset and scale_factor
contains valid packing:                are not of type float or double.
§8.1 Packed Data defined by water_v    Attributes add_offset and scale_factor
contains valid packing:                are not of type float or double.

Check a remote file against ACDD 1.3

The remote dataset URL is taken from the Data URL section of an OPeNDAP endpoint.

compliance-checker --test=acdd:1.3 "http://sos.maracoos.org/stable/dodsC/hrecos/stationHRMARPH-agg.ncml"

Checking against remote ERDDAP Datasets

ERDDAP datasets are becoming a popular way to access data. Supply an ERDDAP TableDAP or GridDAP URL to the checker:

compliance-checker --test ioos:1.2 "https://pae-paha.pacioos.hawaii.edu/erddap/griddap/pibhmc_bathy_60m_guam"

Be sure to supply the URL without a format extension at the end (no .nc, .ncCF, etc.).
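A hedged sketch of trimming such an extension before handing the URL to the checker; the helper and the extension list are illustrative, not part of the tool:

```python
# Strip a trailing ERDDAP response-format extension so the bare dataset URL
# is passed to the checker. This list is an illustrative subset only.
KNOWN_EXTENSIONS = (".ncCF", ".nc", ".csv", ".json")

def bare_erddap_url(url):
    for ext in KNOWN_EXTENSIONS:
        if url.endswith(ext):
            return url[: -len(ext)]
    return url

print(bare_erddap_url(
    "https://pae-paha.pacioos.hawaii.edu/erddap/griddap/pibhmc_bathy_60m_guam.nc"))
```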


Write results to text file

compliance-checker --test=acdd:1.3 --format=text --output=/tmp/report.txt compliance_checker/tests/data/examples/hycom_global.nc

Write results to JSON file

compliance-checker --test=acdd:1.3 --format=json --output=/tmp/report.json compliance_checker/tests/data/examples/hycom_global.nc

Write results to HTML file

compliance-checker --test=acdd:1.3 --format=html --output=/tmp/report.html compliance_checker/tests/data/examples/hycom_global.nc

Output text from multiple input files to one output file

compliance-checker --test=cf:1.6 --format text --output=/tmp/combined_output.txt compliance_checker/tests/data/examples/hycom_global.nc compliance_checker/tests/data/examples/ww3.nc

Output html and text files from multiple input files (part 1)

In this case you'll get 2 files /tmp/combined_output.txt and /tmp/combined_output.html that contain cf check results for both input files because you only specified 1 output filename.

compliance-checker --test=cf:1.6 --format text --format html --output=/tmp/combined_output.txt compliance_checker/tests/data/examples/hycom_global.nc compliance_checker/tests/data/examples/ww3.nc

Output html and text files from multiple input files (part 2)

In this case you'll get 4 files /tmp/hycom.txt, /tmp/hycom.html, /tmp/ww3.txt, and /tmp/ww3.html that contain cf check results because you specified as many output filenames as input filenames.

compliance-checker --test=cf:1.6 --format text --format html --output=/tmp/hycom.txt --output=/tmp/ww3.txt compliance_checker/tests/data/examples/hycom_global.nc compliance_checker/tests/data/examples/ww3.nc

Download a particular CF standard names table for use in the test

Note: during the CF test, if a file specifies a particular version of the CF standard name table in its global attributes (e.g. :standard_name_vocabulary = "CF Standard Name Table v30" ;) that doesn't match the packaged version, the checker will try to download the specified version. If the download fails, it falls back to the packaged version.

compliance-checker -d 35

Downloading cf-standard-names table version 35 from: http://cfconventions.org/Data/cf-standard-names/35/src/cf-standard-name-table.xml


Alternatively, you can specify an absolute path to a standard name table you may have locally in an environment variable named CF_STANDARD_NAME_TABLE and the compliance checker will use that version instead.
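A minimal sketch of setting that environment variable from Python before invoking the checker (the table path is illustrative, not a real file):

```python
import os

# Point the checker at a local CF standard name table instead of the
# packaged or downloaded one.
os.environ["CF_STANDARD_NAME_TABLE"] = "/path/to/cf-standard-name-table.xml"

# The checker would then be run as usual, e.g.:
# subprocess.run(["compliance-checker", "--test=cf:1.6", "dataset.nc"])
print(os.environ["CF_STANDARD_NAME_TABLE"])
```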


Python Usage

If you are interested in incorporating the IOOS Compliance Checker into your own Python projects, check out the following example:
```python
import json
import logging

from compliance_checker.runner import ComplianceChecker, CheckSuite

log = logging.getLogger(__name__)

# Load all available checker classes
check_suite = CheckSuite()
check_suite.load_all_available_checkers()

# Run CF and ACDD checks
path = '/path/or/url/to/your/dataset'
checker_names = ['cf', 'acdd']
verbose = 0
criteria = 'normal'
output_filename = '/output/report.json'
output_format = 'json'
"""
Inputs to ComplianceChecker.run_checker

path            Dataset location (url or file)
checker_names   List of string names to run, should match keys of checkers dict (empty list means run all)
verbose         Verbosity of the output (0, 1, 2)
criteria        Determines failure (lenient, normal, strict)
output_filename Path to the file for output
output_format   Format of the output

@returns        If the tests failed (based on the criteria)
"""
return_value, errors = ComplianceChecker.run_checker(path,
                                                     checker_names,
                                                     verbose,
                                                     criteria,
                                                     output_filename=output_filename,
                                                     output_format=output_format)

# Open the JSON output and get the compliance scores
with open(output_filename, 'r') as fp:
    cc_data = json.load(fp)
    scored = cc_data[checker_names[0]]['scored_points']
    possible = cc_data[checker_names[0]]['possible_points']
    log.debug('CC Scored {} out of {} possible points'.format(scored, possible))
```

Compliance Checker Plug-Ins

Separate plug-ins have been developed to complement the Compliance Checker tool with specifications for preparing data to be submitted to different data assembly centers. The version numbering of these plug-ins is not necessarily linked to the version of the Compliance Checker, but they are all designed to run with the Compliance Checker tool.

Current Plug-in Releases:

ioos/cc-plugin-glider - a checker for Glider DAC files

ioos/cc-plugin-ncei - a checker for NCEI netCDF Templates v1.1 and v2.0 files

These plug-ins must be installed separately but work on top of the base compliance checker software.

pip install cc-plugin-ncei

Check that it installed correctly by listing the available tests:

compliance-checker -l

You should see

 IOOS compliance checker available checker suites (code version):
 - ncei-grid (2.1.0)
 - ncei-grid:1.1 (2.1.0)
 - ncei-grid:2.0 (2.3.0)
 - ncei-grid:latest (2.1.0)
 - ncei-point (2.3.0)
 - ncei-point:1.1 (2.1.0)
 - ncei-point:2.0 (2.3.0)
 etc ....

Once the plug-in is installed, usage is similar to the built-in checkers.

Examples of how to use the Plug-Ins

  1. Run the NCEI Point check on a THREDDS endpoint

     compliance-checker -t ncei-point -v "https://data.nodc.noaa.gov/thredds/dodsC/testdata/mbiddle/GOLD_STANDARD_NETCDF/1.1/NODC_point_template_v1.1_2016-06-15_133710.844375.nc"

  2. Run the NCEI Trajectory Profile Orthogonal check on a local dataset

     compliance-checker -t ncei-trajectory-profile-orthogonal -v ~/data/sample-trajectory-profile.nc

  3. Output JSON from a gridded file check

     compliance-checker -t ncei-grid -f json -o ~/Documents/sample_grid_report.json ~/Documents/sample_grid_report.nc

Disclaimer

The objective of the IOOS Compliance Checker is to check your file against our interpretation of select dataset metadata standards to use as a guideline in generating compliant files. The compliance checker should not be considered the authoritative source on whether your file is 100% "compliant". Instead, we recommend that users use the results as a guide to work towards compliance.

Miscellaneous/Acknowledgements

Contributors


Portions of the CF checker are based on Michael Decker's work, http://repositories.iek.fz-juelich.de/hg/CFchecker/


compliance-checker's Issues

CF: Greedy exception catching masks check_missing_data failure

The test for valid_coordinate_attribute will always fail because an ambiguous NumPy array truth-value comparison raises a ValueError, which is then interpreted as a test failure in cf.py. Without the try-except block, we get the following traceback:

Running Compliance Checker on the dataset from: ../datasets/netcdf/sss_rc201401.v3.0cap.nc
Traceback (most recent call last):
  File "cchecker.py", line 24, in <module>
    main()
  File "cchecker.py", line 21, in main
    args.criteria)
  File "/Users/ochang/podaac/compliance-checker/compliance_checker/runner.py", line 39, in run_checker
    score_groups = cs.run(ds, *checker_names)
  File "/Users/ochang/podaac/compliance-checker/compliance_checker/suite.py", line 87, in run
    vals = list(itertools.chain.from_iterable(map(lambda c: self._run_check(c, dsp), checks)))
  File "/Users/ochang/podaac/compliance-checker/compliance_checker/suite.py", line 87, in <lambda>
    vals = list(itertools.chain.from_iterable(map(lambda c: self._run_check(c, dsp), checks)))
  File "/Users/ochang/podaac/compliance-checker/compliance_checker/suite.py", line 31, in _run_check
    val = check_method(ds)
  File "/Users/ochang/podaac/compliance-checker/compliance_checker/cf/cf.py", line 2955, in check_missing_data
    indices = [i for i, x in enumerate(var[:]) if (var._FillValue == x or '--' == x or 'nan' == x or 'NaN' == x)]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
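A sketch of why the comparison fails and a vectorized alternative, assuming NumPy: testing a whole array in a scalar boolean context (`var._FillValue == x or ...`) is ambiguous, while np.where evaluates elementwise:

```python
import numpy as np

# Illustrative data standing in for a netCDF variable with a fill value.
fill_value = -9999.0
var = np.array([1.0, -9999.0, np.nan, 3.0])

# Vectorized detection of missing values: no ambiguous scalar truth test.
missing = np.where((var == fill_value) | np.isnan(var))[0]
print(missing.tolist())
```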

Versions

Python 2.7.5, numpy 1.8.1, Mac OSX 10.9.3

Usage message not consistent with accepted command-line arguments

The command-line usage message (output of compliance-checker -h) seems to suggest several ways of specifying which tests to apply:

 optional arguments:
  --test {acdd,cf,ioos} [{acdd,cf,ioos} ...], -t {acdd,cf,ioos} [{acdd,cf,ioos} ...], --test= {acdd,cf,ioos} [{acdd,cf,ioos} ...], -t= {acdd,cf,ioos} [{acdd,cf,ioos} ...]

However, several of these variants:

$ compliance-checker -t acdd test.nc
$ compliance-checker --test acdd test.nc
$ compliance-checker --test= acdd test.nc

result in:

compliance-checker: error: argument --test/-t/--test=/-t=: invalid choice: 'test.nc' (choose from 'acdd', 'cf', 'ioos')

Maybe the usage should just say it has to be -t={acdd,cf,ioos} or --test={acdd,cf,ioos} to keep it simple.

Also, it's not quite clear how to specify multiple tests. Comma or space-separated lists don't work.

CF checker doesn't allow additional conventions in Conventions attribute

The test check_conventions_are_cf_16 in cf.py requires the Conventions global attribute to be exactly equal to the string "CF-1.6". This is indeed what the CF conventions document recommends. However, the NetCDF User's Guide allows this attribute to be a blank- or comma-separated list of convention names.

Could the check be modified to look for "CF-1.6" as a sub-string of the Conventions attribute?

E.g. IMOS data will usually have Conventions = "CF-1.6,IMOS-1.3". We would like these files to pass the CF checker.
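A sketch of the suggested relaxation (not the checker's actual implementation): split the attribute on commas or blanks and test for membership:

```python
import re

# Accept "CF-1.6" anywhere in a comma- or blank-separated Conventions
# list rather than requiring exact equality with the whole attribute.
def conventions_contain_cf16(conventions):
    tokens = re.split(r"[,\s]+", conventions.strip())
    return "CF-1.6" in tokens

print(conventions_contain_cf16("CF-1.6,IMOS-1.3"))
print(conventions_contain_cf16("ACDD-1.3"))
```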

Print failures without verbose setting

Print tree/table with only the failures and failure messages without needing -v.

-v should print entire test suite.
-v -v should possibly print debugging info (or leave same as -v for now)

Amount of data within file affects run time

This isn't an error but more of a question. What is actually causing the script to run so slowly when the amount of data in the file is increased? For example, I have one file that is 325MB, which has been reduced down to 41KB by clearing out some (a lot) of the data points. All of the variables and attributes have stayed exactly the same; only the dimensions and amount of data have changed. When I run the CF check on the smaller (41KB) file, it completes in a matter of seconds. However, when I run the CF check on the larger (325MB) file, it takes forever to run (I'm on hour two of runtime right now). Does the checker actually need to pull in each individual data point within the file? Isn't it just checking variable attributes and global attributes? FYI, I'm running with both files on my local system...

Alternative to udunitspy

The udunitspy library is a swig wrapper (hard to install) around udunits, and udunits does not work/install on Windows properly.

@dpsnowden suggests perhaps making a pure python port (one may already exist) - I briefly looked and it seemed reasonable.

@abirger suggests perhaps a udunits webservice that can be optionally hit from the compliance-checker. We could make udunitspy an optional dependency and react to import failure in code properly.

Compile Glider DAC v2.0 requirements so that we can estimate the work needed for Glider checks.

@kknee, this fell off the plate during the last round of development because we hadn't yet solidified the glider format. Once you and @kerfoot are certain that the ioos/ioosngdac file description is accurate, we need to include this in our release planning for the next phase of development. I expect that there will be other more granular issues all attached to this milestone. This issue is just there to get the ball rolling toward planning. @BeckyBaltes note that the milestone currently has no date. Up to the team to figure out where it fits in the overall release planning.

Overhaul of scoring concept

This issue meant for discussion, including but not limited to @dpsnowden, @rsignell-usgs, @kwilcox, @DanielJMaher, @lukecampbell, @oychang ...

The scoring system currently in the compliance-checker feels not up to par. This issue attempts to explain how it got to be the way it is, and where to go from here.

Initial Design

The initial design goals were this:

  • Score a dataset by multiplying the weight (low=1, medium=2, high=3) of each top-level Result by the value (0 for 0, 0.5 for any fractional, 1 for 1)
  • Provide a stable upper-bound per checker, not varying per dataset (aka same upper bound no matter how many variables)

The per variable checks rely on the concept of grouping, which results in the tree-like output you see today (and needs to be clarified in documentation). Therefore, instead of something like the following:

var_temperature_units_valid passed
var_salinity_units_valid failed

You get something like this:

var
    temperature
        units
            valid  Passed
    salinity
        units
            valid  Failed

... and everything under the "var" tree counts as ONE top level Result to be added towards the total score. In this case, "var" would be 1/2, or partial, multiplied by whatever is the maximum weight observed in this tree. THIS IS COMPLICATED.

Issues

Grouped check contribution

A major downside to this is that grouped items contribute very little to the overall score. Consider the ACDD checker: most of the checks are for global attributes, then about four checks per variable, which all get grouped together.

Since the ACDD checker was first completed, this heavily guided the scoring ideas. As soon as the CF checker was taking shape, problems emerged: we do some number of global checks, then do a TON of per-variable checks, which barely contribute to the score. Simple global attribute checks should not carry more weight than the per-variable checks!

Optional checks (DSG)

Section 9 of the CF document defines DSGs. Clearly, not every dataset going through the CF checker will or should have these properties or even perform these tests (we're fixing #36 to that effect).

Therefore, there is no way to give a consistent "CF checker" upper-bound between different datasets if one is DSG and the other is not. It would have to be something like a "bolt-on" style score - "Dataset X scored 120/140 of normal CF, and 20/30 of DSG", almost treating it as if it was a second checker.

Potential Solutions

Split out CF/DSG checker

As mentioned in the optional section above, "optional" style checks can be implemented as a second checker.

Pros:

  • Consistent reporting
  • Could solely report DSG compliance if wanted

Cons:

  • Would this mean users would have to --test=cf,cf-dsg every time?

Do away with upper bound/scoring altogether

Instead, consider a "demerit" style scoring approach. Instead of a "100/120, here's all your wrong things", just output the wrong things. For example:

CF: -6

HIGH: No cf_role detected (-3)
MEDIUM: No author attribute (-2)
LOW: Variable "temperature" is missing a comment about how important it is (-1)

Pros:

  • Could possibly do away with grouping mechanism
  • Still variable independent

Cons:

  • Extremely negative style approach (HERE'S HOW YOU'RE WRONG, DUMMY)
  • Provides no baseline other than Perfect (0)

I apologize for the length, but needed to get this all recorded. Please take a bit to digest and give some thoughts, or ask for clarification, as this is a not very well organized brain dump.

Cell Boundaries

The Cell Boundaries Check is not properly finding the length of the variable and the length of the cell boundary variable. This is causing inaccurate failures.

CF: check_two_dimensional doesn't use COARDS/CF-compatible method of determining lat/lon

Given the following dimensions and variables, we fail:

$ ncdump -h sss_rc201401.v3.0cap.nc
netcdf sss_rc201401.v3.0cap {
dimensions:
    idlon = 360 ;
    idlat = 180 ;
variables:
    float idlon(idlon) ;
        idlon:long_name = "longitude" ;
        idlon:standard_name = "longitude" ;
        idlon:units = "degrees_east" ;
        idlon:comment = "midpoint of interval on uniform grid from -180 to 180 in 1 degree longitude increments" ;
        idlon:point_spacing = "1deg" ;
        idlon:axis = "X" ;
    float idlat(idlat) ;
        idlat:long_name = "latitude" ;
        idlat:standard_name = "latitude" ;
        idlat:units = "degrees_north" ;
        idlat:comment = "midpoint of interval on uniform grid from -90 to 90 in 1 degree latitude increments" ;
        idlat:point_spacing = "1deg" ;
        idlat:axis = "Y" ;
    float sss_cap(idlat, idlon) ;
        sss_cap:long_name = "Sea Surface Salinity" ;
        sss_cap:standard_name = "sea_surface_salinity" ;
        sss_cap:units = "1e-3" ;
        sss_cap:valid_min = 0.f ;
        sss_cap:valid_max = 45.f ;
        sss_cap:scale_factor = 1.f ;
        sss_cap:add_offset = 0.f ;
        sss_cap:grid_mapping = "Equirectangular" ;
        sss_cap:comment = "level-3 analysed sea surface salinity values obtained from the Combined Active Passive -CAP- algorithm with Rain Correction. Cell values are means for the temporal interval & 1degree spatial grid" ;
        sss_cap:_FillValue = -9999.f ;
        sss_cap:coordinates = "idlat idlon" ;
    char Equirectangular ;
        Equirectangular:grid_mapping_name = "latitude_longitude" ;
        Equirectangular:Standard_Parallel = 0.f ;
        Equirectangular:Longitude_of_Central_Meridian = 0.f ;
        Equirectangular:false_northing = 0.f ;
        Equirectangular:false_easting = 0.f ;
        Equirectangular:comment = "projection also referred to as Equidistant Cylindrical" ;
...
var                                    :3:    13/14 :  
    sss_cap                            :3:     8/ 9 :  
        lat_lon_correct                :3:     0/ 1 :  

This seems to be because cf.py determines if a coordinate is lat/lon by checking variable name rather than the proper way of knowing based on units.
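A sketch of units-based identification; the unit spellings listed are a common subset of those CF allows, and this is not the checker's code:

```python
# Identify latitude/longitude coordinates by their units attribute,
# as CF/COARDS prescribe, instead of by variable name.
LAT_UNITS = {"degrees_north", "degree_north", "degrees_N", "degree_N"}
LON_UNITS = {"degrees_east", "degree_east", "degrees_E", "degree_E"}

def coordinate_kind(units):
    if units in LAT_UNITS:
        return "latitude"
    if units in LON_UNITS:
        return "longitude"
    return None

print(coordinate_kind("degrees_east"))   # e.g. the 'idlon' variable above
```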

Installation problems on Ubuntu

There is an issue installing compliance checker on Ubuntu (Version to be filled in).

Installing udunitspy, a dependency of compliance checker, is reporting the lack of a swig executable.

SWIG is a library or tool that is used to create bindings, or an interface between a python module and a dynamically linked library written in C/C++ (or another compiled language) that is usually native to the operating system.

The issue was originally reported by @mbiddle-nodc

Valid_coordinates not functioning properly

When a dimension other than lat, lon, time, or elevation is given as a coordinate, and it is not a coordinate variable, an error was being raised. That case should pass this check, since only the lat, lon, time, and elevation dimensions are examined here.
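A sketch of the intended behaviour: only coordinates identified as lat/lon/time/elevation are validated, and any other auxiliary coordinate passes. The standard-name set below is illustrative, not the checker's actual list:

```python
# Only these coordinate kinds should be subject to the valid_coordinates
# check; anything else (e.g. a station-name coordinate) should pass.
CHECKED_COORDS = {"latitude", "longitude", "time", "height", "depth", "altitude"}

def needs_coordinate_check(standard_name):
    return standard_name in CHECKED_COORDS
```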

Clarify when metadata is read from global attributes vice inferred from file.

There has recently been a long discussion on the ACDD list about the relative merits of including metadata as global attributes when that metadata could (or should) be computed from the actual data in the file. The classic example is including elements describing spatio-temporal bounds as global attributes in netCDF files served via THREDDS. Since THREDDS inherits the global attributes from only one file (first, last, or random) and not from the entire aggregation, attributes such as start_time and end_time often do not describe the actual time axis of the aggregated dataset.

For attributes like these, the checker should compare the global attribute with the computed quantity and note any inconsistencies. It's going to be up to us to determine which attributes fit into this category. The ESIP wiki describes this problem and is starting to characterize the behavior of some common applications. Basically, beware of inherited global attributes.
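A hedged sketch of the proposed comparison, checking a declared global attribute (here time_coverage_end as an ISO-8601 string) against the extent computed from the data itself. The function and attribute names are illustrative, not the checker's real API:

```python
from datetime import datetime

def check_declared_vs_computed(declared_end, time_values):
    """Return None if the declared attribute matches the computed extent,
    otherwise a message noting the inconsistency (e.g. an attribute
    inherited by THREDDS from a single member file)."""
    computed_end = max(time_values)
    if datetime.fromisoformat(declared_end) != computed_end:
        return ("time_coverage_end is %s but the time axis ends at %s"
                % (declared_end, computed_end.isoformat()))
    return None
```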

Not correctly processing CF1.6 TimeSeries (STATION) file

This file loads into NetCDF-Java and is recognized as a STATION.

Exceptions

$ compliance-checker 20150210T1900_TO_20150225T1000.nc
Running Compliance Checker on the dataset from:
20150210T1900_TO_20150225T1000.nc
The following exceptions occured during the acdd checker (possibly indicate compliance checker issues):
acdd.check_lat_extents: object __array__ method not producing an array
acdd.check_time_extents: can't subtract offset-naive and offset-aware datetimes
acdd.check_lon_extents: object __array__ method not producing an array
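The check_time_extents failure above is the classic mixed-timezone bug: one datetime carries tzinfo and the other does not. A minimal sketch of a fix, assuming naive values may be coerced to UTC (the file's times could really be in another zone, so this is an assumption):

```python
from datetime import datetime, timezone

def safe_delta(a, b):
    """Return a - b, coercing naive datetimes to UTC-aware first so the
    subtraction cannot raise 'can't subtract offset-naive and
    offset-aware datetimes'."""
    if a.tzinfo is None:
        a = a.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    if b.tzinfo is None:
        b = b.replace(tzinfo=timezone.utc)
    return a - b
```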

ACDD

I'm getting ACDD failures for every variable because they don't have a coverage_content_type attribute. Should this only be tested for "data" variables? It is failing on things like crs, time, and latitude.
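One way to restrict the check to "data" variables is to skip dimension variables, anything listed in a coordinates attribute, and grid-mapping variables such as crs. The duck-typed dataset below mirrors the netCDF4-python layout, but the helper itself is hypothetical:

```python
def data_variables(ds):
    """Return names of variables that are neither coordinate variables
    nor referenced via another variable's 'coordinates' or
    'grid_mapping' attribute."""
    skip = set(ds.dimensions)
    for var in ds.variables.values():
        skip.update(getattr(var, "coordinates", "").split())
        skip.update(getattr(var, "grid_mapping", "").split())
    return [name for name in ds.variables if name not in skip]
```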

CF

Not correctly identifying the featureType:

                                Medium Priority                                 
--------------------------------------------------------------------------------
    Name                            :Priority: Score
all_features_are_same_type              :2:     0/0

This failure shouldn't be occurring:

--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:                   


Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
axis                                   :3:     3/ 4 :  
    height                             :3:     3/ 4 :  
        is_coordinate_var              :3:     0/ 1 : height is not allowed to
                                                      have an axis attr as it is
                                                      not a coordinate var

Better text output

Methods for printing need to be cleaned up and merged to eliminate redundancy.

Installation problems on windows

I'm trying to test compliance-checker under windows with anaconda. pip install compliance-checker is freaking out with:

HDF5_DIR environment variable not set, checking some standard locations ..
checking C:\Users\christopher.rae ...
checking /usr/local ...
checking /sw ...
checking /opt ...
checking /opt/local ...
checking /usr ...
Traceback (most recent call last):
  File "<string>", line 17, in <module>
  File "c:\users\christopher.rae\appdata\local\temp\1\pip_build_Christopher.Rae\netCDF4\setup.py", line 208, in <module>
    raise ValueError('did not find HDF5 headers')
ValueError: did not find HDF5 headers

@dpsnowden suggests it's probably related to the way PyTables or netcdf4-python were installed in conda for windows, and that @rsignell-usgs may have insight having installed related packages on windows.

CF check rejects many valid units

Running the CF tests on some IMOS files, we get high-priority tests failing with messages like these:

units are metres, standard_name units should be m
units are Celsius, standard_name units should be K

The CF convention says "The value of the units attribute is a string that can be recognized by UNIDATA's Udunits package" (Section 3.1) and "Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units" (Section 3.3).

Looking at the code (cf.py), the function check_units seems to insist on the units being exactly the canonical units for any variable with a standard_name attribute, but makes some exceptions to allow some commonly used variants. These exceptions include e.g. "meter" and "meters" (US spelling), but not "metres" (British/Australian spelling).

Wouldn't it be more correct (and more flexible) to check the units using the tests of equivalence provided by UDUNITS? (e.g. udunitspy.udunits2.Unit has a method are_convertible)
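A sketch of equivalence-based checking. It assumes the cf_units package (a UDUNITS-2 wrapper) when available; the issue itself suggests udunitspy's are_convertible, which serves the same purpose. The fallback alias table is purely illustrative:

```python
def units_match(units, canonical):
    """True if `units` is physically convertible to the canonical units,
    per CF section 3.3 ('physically equivalent, not necessarily
    identical')."""
    try:
        from cf_units import Unit  # UDUNITS-2 handles spellings/plurals
        return bool(Unit(units).is_convertible(canonical))
    except ImportError:
        # Illustrative fallback only; a real check needs UDUNITS.
        aliases = {"metres": "m", "meters": "m", "meter": "m",
                   "Celsius": "K", "degC": "K"}
        return aliases.get(units, units) == canonical
```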

Support and check for other IOOS-approved vocabularies

Based on a scan of the code and the docs, it looks like the compliance checker only checks observed properties against CF standard_names. But that's not the only "IOOS approved" vocabulary for observed properties; see the IOOS vocabularies doc.

There are also a few other, smaller vocabularies developed by IOOS and hosted on MMI that are part of IOOS SOS compliance (e.g. sector, platform type); these are all or mostly found in the DescribeSensor response. It'd be good to eventually add these to the compliance checker, too. Direct checking against MMI (or an auto-cached version of MMI vocabularies) would be nice. I may be able to help code that later in the year, after October.

Return value always 0

To integrate compliance-checker cleanly with other command-line tools, it should return a non-zero exit status on error.

At the moment it returns 0 all the time; correct me if I'm wrong.

Here is a sample output:

$ ./cchecker.py -t tester -c strict /tmp/output.nc
Running Compliance Checker on the dataset from: /tmp/output.nc


--------------------------------------------------------------------------------
                      The dataset scored 0 out of 2 points                      
                              during the tester check                              
--------------------------------------------------------------------------------
                               Scoring Breakdown:                               


                                 High Priority                                  
--------------------------------------------------------------------------------
    Name                            :Priority: Score
fail_me                                 :3:     0/1
fail_me2                                :3:     0/1
                        No Medium priority tests present                        
--------------------------------------------------------------------------------

                         No Low priority tests present                          
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
                  Reasoning for the failed tests given below:                   


Name                             Priority:     Score:Reasoning
--------------------------------------------------------------------------------
fail_me                                :3:     0/ 1 : Attr fail_me not present
fail_me2                               :3:     0/ 1 : Attr fail_me2 not present
$ echo $?
0
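A minimal sketch of the requested behaviour for the entry point: exit with a non-zero status when any check fails. The function name and score arguments are illustrative, not the real cchecker.py API:

```python
import sys

def exit_with_status(scored, possible):
    """Exit 0 only when the dataset scored full points, so shell
    pipelines (e.g. `compliance-checker f.nc && publish f.nc`) can
    react to failures."""
    sys.exit(0 if scored >= possible else 1)
```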

compliance-checker fails to process CO-OPS GetCaps template

I tried compliance-checker with the CO-OPS templates recently distributed on ioostech_dev. It remained quiet for a while, printed just one phrase, "The following tests failed:", and exited to a command prompt. Probable cause: template file size.

Reenable Udunits installation

In a22b488 I disabled udunits in requirements, but without it the CF checker crashes. It needs to be re-enabled, and the comment addressed (won't install on centos/rhel?).

Incorrect Check name for horz_crs_grid_mappings_projections

The Result object is returning a check name of compressed_data instead of horz_crs_grid_mappings_projections.

Also, a typo: "valid_capping_count" should be "valid_mapping_count".

Also, 'if name_again in self.grid_mapping_dict[mapping][1]:' should be looking at the variable's standard_name.

TypeError when running acdd test

The checker works fine for CF and for IOOS, but runs into an issue with ACDD.

$ compliance-checker --test=acdd -v http://data.nodc.noaa.gov/thredds/dodsC/testdata/mbiddle/carocoops.sun2.buoy_2013_12_31_02.nc
Running Compliance Checker on the dataset from: http://data.nodc.noaa.gov/thredds/dodsC/testdata/mbiddle/carocoops.sun2.buoy_2013_12_31_02.nc
Traceback (most recent call last):
  File "/home/matt-ubuntu/anaconda/bin/compliance-checker", line 9, in <module>
    load_entry_point('compliance-checker==1.0.0', 'console_scripts', 'compliance-checker')()
  File "/home/matt-ubuntu/anaconda/bin/cchecker.py", line 21, in main
    args.criteria)
  File "/home/matt-ubuntu/anaconda/lib/python2.7/site-packages/compliance_checker/runner.py", line 41, in run_checker
    score_groups = cs.run(ds, *checker_names)
  File "/home/matt-ubuntu/anaconda/lib/python2.7/site-packages/compliance_checker/suite.py", line 98, in run
    groups = self.scores(vals)
  File "/home/matt-ubuntu/anaconda/lib/python2.7/site-packages/compliance_checker/suite.py", line 361, in scores
    grouped = self._group_raw(raw_scores)
  File "/home/matt-ubuntu/anaconda/lib/python2.7/site-packages/compliance_checker/suite.py", line 426, in _group_raw
    msgs = sum(map(lambda x: x.msgs, v), [])
TypeError: can only concatenate list (not "str") to list

CF check_data_types fails for certain char types

Consider the following excerpt:

$  ncdump -v Equirectangular sss_rc201401.v3.0cap.nc
...
    char Equirectangular ;
        Equirectangular:grid_mapping_name = "latitude_longitude" ;
        Equirectangular:Standard_Parallel = 0.f ;
        Equirectangular:Longitude_of_Central_Meridian = 0.f ;
        Equirectangular:false_northing = 0.f ;
        Equirectangular:false_easting = 0.f ;
        Equirectangular:comment = "projection also referred to as Equidistant Cylindrical" ;
...
Equirectangular = "" ;
data_types                             :3:     3/ 4 : [u'The variable
                                                      Equirectangular failed
                                                      because the datatype is
                                                      |S1']

|S1 should be a homogeneous array of single characters, i.e. equivalent to numpy.character. This equivalence is further suggested by an example in the numpy documentation.

A possible solution would be to add |S1 to the list of allowed types in cf.py.
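A sketch of that fix: treat the numpy dtype '|S1' (a single character) as an acceptable CF char type. The allowed list below mirrors the intent of cf.py's check, not its exact contents:

```python
import numpy as np

# '|S1' and 'S1' are the same dtype; comparing np.dtype objects rather
# than raw strings makes the membership test spelling-insensitive.
ALLOWED_DTYPES = [np.dtype(t) for t in ("S1", "i1", "i2", "i4", "f4", "f8")]

def data_type_ok(var_dtype):
    return np.dtype(var_dtype) in ALLOWED_DTYPES
```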

ACDD 'name' get raises TypeError

Traceback

(cc)LMC-026061 :: ~/podaac % compliance-checker --test=acdd 20140508-MODIS_A-JPL-L2P-A2014128024500.L2_LAC_GHRSST_N-v01.nc                                                                                   
Running Compliance Checker on the dataset from: 20140508-MODIS_A-JPL-L2P-A2014128024500.L2_LAC_GHRSST_N-v01.nc
Traceback (most recent call last):
  File "/Users/ochang/.virtualenvs/cc/bin/compliance-checker", line 9, in <module>
    load_entry_point('compliance-checker==0.2.0', 'console_scripts', 'compliance-checker')()
  File "/Users/ochang/.virtualenvs/cc/bin/cchecker.py", line 21, in main
    args.criteria)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/runner.py", line 39, in run_checker
    score_groups = cs.run(ds, *checker_names)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/suite.py", line 87, in run
    vals = list(itertools.chain.from_iterable(map(lambda c: self._run_check(c, dsp), checks)))
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/suite.py", line 87, in <lambda>
    vals = list(itertools.chain.from_iterable(map(lambda c: self._run_check(c, dsp), checks)))
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/suite.py", line 30, in _run_check
    val = check_method(ds)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/base.py", line 189, in _dec
    ret_val = func(s, ds)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/acdd.py", line 122, in check_var_coverage_content_type
    vars = self._get_vars(ds, 'coverage_content_type')
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/acdd.py", line 93, in _get_vars
    attrs = zip(attrs, names)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/compliance_checker/acdd.py", line 91, in <genexpr>
    names = (v.attrib.get('name', 'unknown') for v in vars)
  File "/Users/ochang/.virtualenvs/cc/lib/python2.7/site-packages/petulantbear/netcdf_etree.py", line 329, in attrib
    return NcVarAttrib(self)
TypeError: __init__() takes exactly 3 arguments (2 given)

Tested on fresh installs of compliance-checker==0.2.0, petulant-bear==0.1.2 installed through pip under OSX 10.9.3 and Ubuntu 14.04, python 2.7.5.

Solution

Replace v.attrib.get(...) with v.get(...) in acdd.py:91. Then everything works as expected! Seems to be related to the changes to NcVarAttrib in ioos/petulant-bear@b58277a

CF Conventions check allows pre 1.6 strings

Added in #68 is a list of new convention strings allowed from CF-1.0 to CF-1.6. As noted by @DanielJMaher in the comments of that PR, this checker is only designed for CF-1.6. Allowing other checks gives a false impression that other CF standards will be rigorously checked (they won't be).

I'm fine with the other changes in that PR, but this list needs to be reverted to a single entry, CF-1.6.

Plugin system for checkers

This is a future issue.

Checkers should be written in a manner that allows independent authoring. The typical Python approach to this is namespace packages, but those caused me so much grief during initial packaging that I don't really want to revisit them (nor are they a perfect solution anyway).

This looks like a great alternative: https://github.com/mitsuhiko/pluginbase

Trap errors in checks

Right now, checks bomb out completely when errors occur. This is fine for debugging/development, but should be captured/recorded and allowed to continue.

Errors that occur in each check suite should be output above the normal report - just the error message/what check|suite caused it in normal mode, but with verbose on (or maybe to level 2) should give the full traceback.
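A sketch of trapping per-check errors along those lines: record the failure and continue, keeping the full traceback for verbose output. Names are illustrative, not the suite's real API:

```python
import traceback

def run_check_safely(check, ds, errors, verbose=False):
    """Run one check; on failure, append a short message (and, if
    verbose, the full traceback) to `errors` instead of aborting."""
    try:
        return check(ds)
    except Exception as exc:
        errors.append("%s: %s" % (getattr(check, "__name__", check), exc))
        if verbose:
            errors.append(traceback.format_exc())
        return None
```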

Help text in compliance checker

The readme shows output from compliance-checker --help, but I cannot grep that help text anywhere in the compliance-checker code. Where is this help text generated?
