
noaa_coops's Introduction

noaa_coops


A Python wrapper for the NOAA CO-OPS Tides & Currents Data and Metadata APIs.

Installation

This package is distributed via PyPI and can be installed using pip, poetry, etc.

# Install with pip
❯ pip install noaa_coops

# Install with poetry
❯ poetry add noaa_coops

Getting Started

Stations

Data is accessed via Station class objects. Each station is uniquely identified by an id. To initialize a Station object, run:

>>> from noaa_coops import Station
>>> seattle = Station(id="9447130")  # Create Station object for Seattle (ID = 9447130)

Stations and their IDs can be found using the Tides & Currents mapping interface. Alternatively, you can search for stations in a bounding box using the get_stations_from_bbox function, which will return a list of stations found in the box (if any).

>>> from pprint import pprint
>>> from noaa_coops import Station, get_stations_from_bbox
>>> stations = get_stations_from_bbox(lat_coords=[40.389, 40.9397], lon_coords=[-74.4751, -73.7432])
>>> pprint(stations)
['8516945', '8518750', '8519483', '8531680']
>>> station_one = Station(id="8516945")
>>> pprint(station_one.name)
'Kings Point'

Metadata

Station metadata is stored in the .metadata attribute of a Station object. Additionally, the keys of the metadata attribute dictionary are also assigned as attributes of the station object itself.

>>> from pprint import pprint
>>> from noaa_coops import Station
>>> seattle = Station(id="9447130")
>>> pprint(list(seattle.metadata.items())[:3])                   # Print first 3 items in metadata
[('tidal', True), ('greatlakes', False), ('shefcode', 'EBSW1')]  # Metadata dictionary can be very long
>>> pprint(seattle.lat_lon['lat'])                               # Print latitude
47.601944
>>> pprint(seattle.lat_lon['lon'])                               # Print longitude
-122.339167

Data Inventory

A description of a Station's data products and available dates can be accessed via the .data_inventory attribute of a Station object.

>>> from noaa_coops import Station
>>> from pprint import pprint
>>> seattle = Station(id="9447130")
>>> pprint(seattle.data_inventory)
{'Air Temperature': {'end_date': '2019-01-02 18:36',
                     'start_date': '1991-11-09 01:00'},
 'Barometric Pressure': {'end_date': '2019-01-02 18:36',
                         'start_date': '1991-11-09 00:00'},
 'Preliminary 6-Minute Water Level': {'end_date': '2023-02-05 19:54',
                                      'start_date': '2001-01-01 00:00'},
 'Verified 6-Minute Water Level': {'end_date': '2022-12-31 23:54',
                                   'start_date': '1995-06-01 00:00'},
 'Verified High/Low Water Level': {'end_date': '2022-12-31 23:54',
                                   'start_date': '1977-10-18 02:18'},
 'Verified Hourly Height Water Level': {'end_date': '2022-12-31 23:00',
                                        'start_date': '1899-01-01 00:00'},
 'Verified Monthly Mean Water Level': {'end_date': '2022-12-31 23:54',
                                       'start_date': '1898-12-01 00:00'},
 'Water Temperature': {'end_date': '2019-01-02 18:36',
                       'start_date': '1991-11-09 00:00'},
 'Wind': {'end_date': '2019-01-02 18:36', 'start_date': '1991-11-09 00:00'}}

Data Retrieval

Available data products can be found in the NOAA CO-OPS Data API docs.

Station data can be fetched using the .get_data method on a Station object. Data is returned as a Pandas DataFrame for ease of use and analysis. DataFrame columns are named according to the NOAA CO-OPS API docs, with the t column (timestamp) set as the DataFrame index.

The example below fetches water level data from the Seattle station (id=9447130) for a one-month period. The corresponding web output is shown below the code for reference.

>>> from noaa_coops import Station
>>> seattle = Station(id="9447130")
>>> df_water_levels = seattle.get_data(
...     begin_date="20150101",
...     end_date="20150131",
...     product="water_level",
...     datum="MLLW",
...     units="metric",
...     time_zone="gmt")
>>> df_water_levels.head()
                         v      s        f  q
t
2015-01-01 00:00:00  1.799  0.023  0,0,0,0  v
2015-01-01 00:06:00  1.718  0.018  0,0,0,0  v
2015-01-01 00:12:00  1.639  0.013  0,0,0,0  v
2015-01-01 00:18:00  1.557  0.012  0,0,0,0  v
2015-01-01 00:24:00  1.473  0.014  0,0,0,0  v

[Image: water level output for the same query from the Tides & Currents web interface]

Development

Requirements

This package and its dependencies are managed using poetry. To install the development environment for noaa_coops, first install poetry, then run (inside the repo):

poetry install

TODO

See the repository's issue tracker for a list of existing issues and to submit a new one.

Contribution

Contributions are welcome; feel free to submit a pull request.


noaa_coops's Issues

noaa_coops.py doesn't handle HTML exceptions

Hello,
Novice coder here. I'm having an issue when I run the code for multiple different stations where the NOAA server side will sometimes hang up and cause the code to crash (see below)

File C:\ProgramData\Anaconda3\lib\site-packages\requests\models.py:910 in json
    return complexjson.loads(self.text, **kwargs)

File C:\ProgramData\Anaconda3\lib\json\__init__.py:346 in loads
    return _default_decoder.decode(s)

File C:\ProgramData\Anaconda3\lib\json\decoder.py:337 in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())

File C:\ProgramData\Anaconda3\lib\json\decoder.py:355 in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File ~\Documents\Python\NOAA_script\Noaa_Data_Pull.py:15 in <module>
    station_1 = nc.Station(x)

File ~\AppData\Roaming\Python\Python39\site-packages\noaa_coops\noaa_coops.py:23 in __init__
    self.get_metadata(self.stationid)

File ~\AppData\Roaming\Python\Python39\site-packages\noaa_coops\noaa_coops.py:83 in get_metadata
    json_dict = response.json()

File C:\ProgramData\Anaconda3\lib\site-packages\requests\models.py:917 in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: [Errno Expecting value]

<title>504 Gateway Time-out</title>
504 Gateway Time-out: 0

I addressed this with a simple workaround: wrapping the lines throwing the error in the noaa_coops.py script in a loop that waits 5 seconds and then tries the server again.
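A minimal sketch of that workaround (the function name, retry count, and wait time are illustrative, not part of noaa_coops):

import time

import requests

def get_json_with_retry(url, params=None, max_tries=5, wait_seconds=5):
    """Fetch a URL, retrying until the response body parses as JSON."""
    for _ in range(max_tries):
        response = requests.get(url, params=params)
        try:
            return response.json()
        except ValueError:
            # The server returned an HTML error page (e.g. 504 Gateway
            # Time-out) instead of JSON; wait and try the server again.
            time.sleep(wait_seconds)
    raise RuntimeError(f"No valid JSON from {url} after {max_tries} attempts")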

I'm sure there is a better way to resolve this (and other exceptions that could be thrown).

Thanks and let me know if you need more details!

Fails to download data before 1996

Hello!

I've noticed that this package appears unable to download data from before 1996, even if that data exists and is recognized in the data_inventory. I couldn't find any information about this in the API docs, so I'm assuming this is a bug. Here is a working example (tested in Python 3.11):

import noaa_coops as nc

station = nc.Station('8771510')

data = station.get_data(
    begin_date='19951201 00:00',
    end_date='19960201 00:00',
    product="water_level",
    datum='MSL',
    interval='h',
    units='english',
    time_zone="gmt")

print(data['water_level'])

Output is always NaNs prior to 1996-01-01, and actual data thereafter. It knows this data exists, because if I call station.data_inventory['Verified Hourly Height Water Level'], the output is: {'start_date': '1985-03-10 08:00', 'end_date': '2022-04-14 16:00'}. Also, the data isn't missing, because it shows up when you plot it on the web interface.

I've tested this on three gages with data before 1996 (IDs 8775870, 8770570, 8771510) and gotten the same cut-off date.
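For reproducibility, here is a hedged sketch of that three-gage check (station IDs from above; assumes the water_level column name used in the example):

import noaa_coops as nc

for gage_id in ["8775870", "8770570", "8771510"]:
    data = nc.Station(gage_id).get_data(
        begin_date="19951201 00:00",
        end_date="19960201 00:00",
        product="water_level",
        datum="MSL",
        interval="h",
        units="english",
        time_zone="gmt",
    )
    # Fraction of missing values on each side of the 1996-01-01 cut-off.
    print(gage_id,
          data.loc[:"19951231", "water_level"].isna().mean(),
          data.loc["19960101":, "water_level"].isna().mean())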

Records should be deduped based on datetime index, not column values

Hello again! Apologies for bringing more bad news, but I'm noticing a lot more issues with data gaps in v0.3.0 than in previous versions of the package (for benchmarking purposes I'm comparing to v0.2.0). I noticed some of this when accessing water_level, but the problem is either more severe or more obvious when accessing tide predictions, which I would assume to be gap-free since it's a synthetic product.

Here's a working test case (Python = 3.11, noaa-coops = 0.2.0 or 0.3.0, OS = Windows 10)

import noaa_coops as nc

station = nc.Station('8775241')

df = station.get_data(
    begin_date='20230320 00:00',
    end_date='20230421 00:00',
    product="predictions",
    datum='MSL',
    interval='h',
    units='english',
    time_zone="gmt")

print('Number of values returned: %s.' % len(df))

# Number of data values expected in the output at an hourly interval:
expected_hours = (df.index[-1] - df.index[0]).total_seconds()/3600
# Percentage actually returned:
data_coverage = len(df)/expected_hours
print('Fraction of range with valid data: %s.' % data_coverage)

Output for v0.2.0:

Number of values returned: 816.
Fraction of range with valid data: 1.031605562579014.

Output for v0.3.0:

Number of values returned: 615.
Fraction of range with valid data: 0.80078125.

Note that the reason I'm comparing the outputs this way is due to a second (presumably unrelated) problem, which is that sometimes the actual end_date that gets returned (i.e. df.index[-1]) does not match the end_date requested. This discrepancy is pretty irregular; it seems to depend on the choice of date range, station, and package version. For example, in the output for v0.2.0, df.index[-1] = '2023-04-21 23:00' (I assume the fraction >1 means the output contains some duplicates). Removing interval entirely in v0.3.0 returned data up to '2023-04-21 11:18'.

I've probed a few things to see where this issue might be coming from, and I haven't quite figured it out, but here are a few of my findings in case they are helpful information:

  • Other choices of interval do not appear to eliminate the data gaps problem (i.e. it is not just a problem with "hourly" outputs). In fact, some choices have a way higher fraction of data gaps; choosing interval='1' returned less than 5% of the expected data.
  • Removing the interval parameter entirely also does not eliminate the data gaps; it defaults to a 6-minute interval, and this example in v0.3.0 returned a valid fraction of about 39%.
  • During the last PR you noted that line 1002 of station.py was postprocessing the returned data with df = df.resample("H").first(). I tested to see whether this change caused the discrepancy, and it didn't. I ran the above code with the interval parameter removed and then coarsened the data to hourly with this line, and it filled some of the voids, but not many of them. Also, the output after doing so was still different from the output in v0.2.0, so something else is clearly the culprit.

I glanced through the recent changes to the code but I couldn't find anything that would obviously result in this issue. Perhaps this is an issue with the API itself, but if it is, it would be helpful to know what post-processing this package used to do that filtered out these errors, since I haven't been able to replicate it.
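For what it's worth, a minimal sketch of the deduplication the title suggests, on a toy frame (drop repeated timestamps in the index instead of comparing column values):

import pandas as pd

# Toy frame with a duplicated timestamp in the index.
idx = pd.to_datetime(["2023-03-20 00:00", "2023-03-20 00:00", "2023-03-20 01:00"])
df = pd.DataFrame({"v": [1.0, 1.0, 1.1]}, index=idx)

# Keep the first record per timestamp; rows are dropped based on the
# datetime index, regardless of whether their column values repeat.
df = df[~df.index.duplicated(keep="first")]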

Failed dependency for `pandas`

Home Assistant uses a quite old version of the library (noaa-coops==0.1.8).
The dependency for pandas is no longer pinned, and with Python 3.11.x, pandas 2.0.2 is installed.
Installing noaa-coops fails, as it requires pandas to be <2.0.0.
Is it possible to fix the pandas dependency for Python 3.11.x with pandas >2.0.x?

Furthermore, there might be an issue with numpy, as it is still pinned, but git logs indicate the dependency should be removed.
See: #53

IOOS SOS API access?

Hi! This package looks awesome. I see it gathers data from the Tides and Currents API. Do you know of any package set up to connect with https://opendap.co-ops.nos.noaa.gov/ioos-dif-sos/SOS?service=SOS? If not, do you think noaa_coops would be a reasonable place for that sort of API access to live, or a different project? Thanks.

Pulled data from the package does not match data downloaded from the NOAA website

  1. In the code, I simply selected a time period spanning all of 2015 and pulled water temperature data from station 8531680, then saved the result as a CSV file (filename: data3_waterTemperature_ErrorCheck).

  2. I downloaded 2015 water temperature data (metric, GMT) from the NOAA CO-OPS website (filename: CO-OPS_8531680_met_errorCheck) and tried to match it against the data pulled with the code.

  3. When comparing the two CSV files, the pulled and downloaded data should match row-for-row on time, but they do not. January looks fine, but in February there are mismatches. The reason, I think, is duplicated data starting at 2/1/15 0:00 (around cell 715): a whole day's worth of values is repeated, which throws off all subsequent rows (see the sketch below).
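A hedged sketch of that comparison (filenames from the description above; assuming the first column of each CSV holds the timestamp):

import pandas as pd

pulled = pd.read_csv("data3_waterTemperature_ErrorCheck.csv",
                     index_col=0, parse_dates=True)
downloaded = pd.read_csv("CO-OPS_8531680_met_errorCheck.csv",
                         index_col=0, parse_dates=True)

# Duplicated timestamps in the pulled data shift every later row,
# so flag them before comparing values.
dupes = pulled.index[pulled.index.duplicated()]
print(f"{len(dupes)} duplicated timestamps; first few: {list(dupes[:5])}")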

Pulling conductivity data has an issue

I'm trying to pull conductivity data from Ship John Shoal, NJ for the year of 2016, but let's start with October and I get this error:

from noaa_coops import Station

siteid = '8537121'
site = Station(stationid=siteid)
df_conductivity = site.get_data(begin_date="20161001", end_date="20161101", product="conductivity", time_zone="gmt")


KeyError Traceback (most recent call last)
File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/pandas/core/indexes/base.py:3652, in Index.get_loc(self, key)
3651 try:
-> 3652 return self._engine.get_loc(casted_key)
3653 except KeyError as err:

File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date_time'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Cell In[23], line 1
----> 1 df_conductivity = site.get_data(begin_date="20161001",end_date="20161101",product="conductivity",time_zone="gmt")
2 # df_conductivity.head()

File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/noaa_coops/noaa_coops.py:873, in Station.get_data(self, begin_date, end_date, product, datum, bin_num, interval, units, time_zone)
870 df["date_time"] = pd.to_datetime(df["date_time"])
872 # Set datetime to index (for use in resampling)
--> 873 df.index = df["date_time"]
874 df = df.drop(columns=["date_time"])
876 # Handle hourly requests for water_level and currents data

File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/pandas/core/frame.py:3761, in DataFrame.getitem(self, key)
3759 if self.columns.nlevels > 1:
3760 return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
3762 if is_integer(indexer):
3763 indexer = [indexer]

File ~/mambaforge/envs/pangeo/lib/python3.10/site-packages/pandas/core/indexes/base.py:3654, in Index.get_loc(self, key)
3652 return self._engine.get_loc(casted_key)
3653 except KeyError as err:
-> 3654 raise KeyError(key) from err
3655 except TypeError:
3656 # If we have a listlike key, _check_indexing_error will raise
3657 # InvalidIndexError. Otherwise we fall through and re-raise
3658 # the TypeError.
3659 self._check_indexing_error(key)

KeyError: 'date_time'

Do you know what I am doing wrong?

Selecting stations by bounding box

Hey, relying on the point-and-click map from the website to find station IDs seems a little unsystematic.

I was wondering whether there might be a use case for selecting station IDs by a bounding lat/lon box and/or period of operation, say by processing this list: https://www.tidesandcurrents.noaa.gov/stations.html?type=Historic+Water+Levels

or submitting a url request similar to this user:

https://gis.stackexchange.com/questions/89330/accessing-noaa-co-ops-water-levels-using-sos-with-bounding-box-and-time-extent

Apologies if this has been implemented elsewhere, my cursory email search didn't find anything.

I'd be happy to have a shot at implementing this today if it doesn't exist.
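This has since been implemented in the package as get_stations_from_bbox (see Getting Started above); a minimal usage sketch with the New York harbor coordinates from that example:

from noaa_coops import get_stations_from_bbox

stations = get_stations_from_bbox(
    lat_coords=[40.389, 40.9397],
    lon_coords=[-74.4751, -73.7432],
)
print(stations)  # ['8516945', '8518750', '8519483', '8531680']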

_parse_known_date_formats is broken

If the date doesn't match the first format in the list, it will fail without trying the other formats.

Looks like it was correct prior to the most recent change.
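A minimal sketch of the intended behavior, trying each format in turn before giving up (the format list here is illustrative, not the package's actual list):

from datetime import datetime

def parse_known_date_formats(date_string):
    """Try each known format in turn; raise only if all of them fail."""
    known_formats = ["%Y%m%d", "%Y%m%d %H:%M", "%m/%d/%Y", "%m/%d/%Y %H:%M"]
    for fmt in known_formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # try the next format instead of failing immediately
    raise ValueError(f"Could not parse {date_string!r} with any known format")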

KeyError saying "['flags', 'QC', 'date_time'] not found in axis" when downloading tide gage data before 1995

Hello, when I try to run the following code, I get a KeyError saying "['flags', 'QC', 'date_time'] not found in axis". This only occurs when the begin_date year is 1994 or earlier, indicating to me that the format of the data changed around that time for this station. Any suggestions? My goal is to download the complete record of this tide gage starting 7/1/1927.

import noaa_coops as nc

sta = nc.Station(8638610)

lat, lon = sta.lat_lon

data_wl = sta.get_data(
    begin_date="19940526",
    end_date="19950526",
    product="water_level",
    datum="NAVD",
    units="english",
    time_zone="lst",
)
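Not a fix for the KeyError itself, but for the stated goal (the full record since 1927) one hedged workaround is to request a year at a time and skip the chunks that trip the parser:

import pandas as pd
import noaa_coops as nc

sta = nc.Station(8638610)
chunks = []
for year in range(1927, 1996):
    try:
        chunks.append(sta.get_data(
            begin_date=f"{year}0701",
            end_date=f"{year + 1}0701",
            product="water_level",
            datum="NAVD",
            units="english",
            time_zone="lst",
        ))
    except (KeyError, ValueError):
        continue  # chunks in the older format are skipped here, not fixed
# Boundary timestamps overlap between chunks; dedupe on the index.
data_wl = pd.concat(chunks)
data_wl = data_wl[~data_wl.index.duplicated(keep="first")]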

deprecated append method frame.append

I just wanted to share that when I run this code:

from noaa_coops import Station

Battery = Station(id="8518750")  # The Battery, NY (setup assumed; not in the original report)
Battery_water_levels = Battery.get_data(
    product='hourly_height',
    begin_date='20040601',
    end_date='20220831',
    datum="MSL",
    units="metric",
    time_zone="gmt")

I get this warning:

/Users/scook/opt/miniconda3/envs/pangeo_3.8/lib/python3.8/site-packages/noaa_coops/noaa_coops.py:551: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
df = df.append(df_new)
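The replacement the warning points at is a one-line swap; a toy sketch:

import pandas as pd

df = pd.DataFrame({"v": [1.799]})
df_new = pd.DataFrame({"v": [1.718]})

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df = pd.concat([df, df_new])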

ValueError: No Predictions data was found. Please make sure the Datum input is valid.

Why am I getting this message for some stations (and not others)?

Example:

st = nc.Station(8533941)
st.get_data(
    begin_date="20200101",
    end_date="20200131",
    product="predictions",
    datum="MLLW",
    units="metric",
    time_zone="gmt")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
......
ValueError: No Predictions data was found. Please make sure the Datum input is valid.

This station id was obtained through a request to the metadata API with

https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations.json?type=tidepredictions&greatlakes=False

I tried with different datum types.

I appreciate this may be an issue with the CO-OPS API and not noaa_coops, but I can't find any explanation on their site and have not gotten a response from them yet.

Is there a way to predict what stations have (or don't have) predictions by looking at the st objects without having to call the get_data method?

Thank you
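Absent a documented flag, one hedged workaround is to probe a short window and treat the ValueError as the signal (this still calls get_data, just cheaply):

from noaa_coops import Station

def has_predictions(station_id):
    """Best-effort check: does a short predictions request succeed?"""
    try:
        Station(id=station_id).get_data(
            begin_date="20200101",
            end_date="20200102",
            product="predictions",
            datum="MLLW",
            units="metric",
            time_zone="gmt",
        )
        return True
    except ValueError:
        return False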

Column names should keep the product name

For example, when querying a station for air_temperature, the resulting DataFrame column is named air_temp. If I am iterating over a list of products to pull from a station, I can't use the name provided in the get_data query to access the DataFrame. A hedged workaround is sketched below.
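One option until the columns are renamed upstream: keep your own product-to-column mapping and rename on return (the mapping below covers only the air_temperature example and is otherwise an assumption):

from noaa_coops import Station

# Maps the product name passed to get_data to the column name returned.
column_for_product = {"air_temperature": "air_temp"}

station = Station(id="9447130")
for product in ["air_temperature"]:
    df = station.get_data(
        begin_date="20150101",
        end_date="20150107",
        product=product,
        units="metric",
        time_zone="gmt",
    )
    # Rename so downstream code can index by the product name it queried.
    df = df.rename(columns={column_for_product[product]: product})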

Issues downloading large Met data requests

Errors getting wind data from station 9418767

Trying to get all of the data, but it throws an error that some of the data is missing...

Looks like it may have to do with missing data like:

...Probably haven't specified how to handle gaps properly for met data (and likely other data types)

Station syntax

The syntax for station id is wrong in the notes:

from noaa_coops import Station
seattle = Station(id="9447130") # Create Station object for Seattle (ID = 9447130)

According to the function signature, it should be:

from noaa_coops import Station
seattle = Station(stationid="9447130") # Create Station object for Seattle (ID = 9447130)

No Station Inventory data available

From contact with NOAA:

At this point, the CO-OPS Metadata API does not include a data inventory.
It is possible that this capability might be added at a later enhancement.

However, the SOAP Web Services, under the Stations heading, does include a "Water Level/Met Data Inventory".

Need to find an elegant way to incorporate this inventory data from SOAP into station metadata, ideally as a new attribute of a station, .data_inventory, when it is initialized. Something like:

>>> import noaa_coops as nc
>>> seattle = nc.Station(9447130)
>>> seattle.data_inventory

DATA_INVENTORY_OUTPUT_DISPLAY HERE 

Issue with downloading 6 min data

I tried to download 6-minute data (interval=None), but it keeps throwing the following error:

KeyError                                  Traceback (most recent call last)
~/applications/anaconda/envs/claw/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date_time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-845fb52b11bc> in <module>
----> 1 pr = tc.get_tides('20180101', '20180120', -88.2, 30.4)

~/repositories/SI_2019_Coastal/src/tide_constituents/tide_constituents.py in get_tides(start, end, lon, lat, interval)
     25             datum="MSL",
     26             units="metric",
---> 27             time_zone="gmt")
     28     elif interval == 'h' or interval == 'hilo':
     29         noaa_predict = station.get_data(

~/applications/anaconda/envs/claw/lib/python3.7/site-packages/noaa_coops-0.1.1-py3.7.egg/noaa_coops/noaa_coops.py in get_data(self, begin_date, end_date, product, datum, bin_num, interval, units, time_zone)
    602 
    603             # Convert date & time strings to datetime objects
--> 604             df['date_time'] = pd.to_datetime(df['date_time'])
    605 
    606         elif product == 'currents':

~/applications/anaconda/envs/claw/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

~/applications/anaconda/envs/claw/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date_time'

The issue is due to a missing column rename when product is predictions and interval is None. I am adding a pull request to fix the issue.
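For reference, a toy sketch of the kind of rename that was missing (the predictions payload uses "t" and "v" keys; the target column names are assumptions):

import pandas as pd

# Shape of the predictions payload: "t" = timestamp, "v" = predicted value.
df = pd.DataFrame({"t": ["2018-01-01 00:00"], "v": ["1.234"]})

# Rename "t" so the shared date-handling code finds "date_time".
df = df.rename(columns={"t": "date_time", "v": "predicted_wl"})
df["date_time"] = pd.to_datetime(df["date_time"])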

numpy version requirement?

@GClunies I see the following requirement for numpy:

numpy = "^1.24.1"

Is it really necessary? It is leading to a conflict in my conda environment since:

numba 0.56.4 requires numpy<1.24,>=1.18
