ywangd / pybufrkit

Pure Python toolkit to work with WMO BUFR messages

Home Page: http://pybufrkit.readthedocs.io/

License: MIT License

Python 99.96% Dockerfile 0.04%

wmo bufr fm-94 pure python

pybufrkit's Introduction

Python Toolkit for WMO BUFR Messages

PyBufrKit is a pure Python package for working with WMO BUFR (FM-94) messages. It can be used either as a command line tool or as a library to decode and encode BUFR messages. Here is a brief list of its features:

Pure Python
Handles both compressed and uncompressed messages
Handles all practical operator descriptors, including data quality info, stats, bitmaps, etc.
Option to construct a hierarchical structure of a message, e.g. associating first-order statistics data with their owners.
Convenient subsetting support for BUFR messages
Comprehensive query support for BUFR messages
Script support enables flexible extensions, e.g. filtering through a large number of files.
Tested with the same set of BUFR files used by ecCodes and BUFRDC.

More documentation at http://pybufrkit.readthedocs.io/

An online BUFR decoder powered by PyBufrKit, Serverless and AWS Lambda.

Installation

PyBufrKit is compatible with Python 2.7, 3.5+, and PyPy. To install from PyPI:

pip install pybufrkit

Or from conda-forge:

conda install -c conda-forge pybufrkit

Or from source:

python setup.py install

Command Line Usage

The command line usage of the toolkit takes the following form:

pybufrkit [OPTIONS] command ...

where the command is one of the following actions that can be performed by the tool:

decode - Decode a BUFR file to various output formats, e.g. JSON
encode - Encode a BUFR file from a JSON input
info - Decode only the metadata sections (i.e. section 0, 1, 2, 3) of given BUFR files
split - Split given BUFR files into one message per file
subset - Subset the given BUFR file and save as new file
query - Query metadata or data of given BUFR files
script - Embed BUFR query expressions into a normal Python script
lookup - Look up information about the given list of comma separated BUFR descriptors
compile - Compile the given comma separated BUFR descriptors
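The info command works on the metadata sections. As a rough illustration of what sits at the very start of every message, here is a standalone sketch (not PyBufrKit's own code) that parses Section 0. It assumes BUFR edition 2 or later, where Section 0 is 8 octets: the ASCII marker "BUFR", a 3-octet total message length, and a 1-octet edition number. The function name parse_section0 is hypothetical.

```python
def parse_section0(data):
    """Parse Section 0 (the indicator section) at the start of a BUFR message.

    Assumes BUFR edition 2 or later, where Section 0 is 8 octets:
      octets 1-4: the ASCII characters "BUFR"
      octets 5-7: total message length in octets (unsigned, big-endian)
      octet  8  : BUFR edition number
    """
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for Section 0")
    if data[:4] != b"BUFR":
        raise ValueError("data does not start with the b'BUFR' marker")
    total_length = int.from_bytes(data[4:7], "big")
    edition = data[7]  # indexing bytes yields an int in Python 3
    return {"total_length": total_length, "edition": edition}


# Usage: a synthetic 8-byte header for a 300-octet, edition 4 message.
header = b"BUFR" + (300).to_bytes(3, "big") + bytes([4])
print(parse_section0(header))  # {'total_length': 300, 'edition': 4}
```

The remaining metadata sections (1, 2 and 3) have edition-dependent layouts, which is one reason to rely on the info command rather than hand-rolled parsing.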

Here are a few examples of using the tool from the command line. For more details, please refer to the help option, e.g. pybufrkit decode -h. Also check out the documentation.

# Decode a BUFR file and output in the default flat text format
pybufrkit decode BUFR_FILE

# Decode a file that is a concatenation of multiple BUFR messages,
# skipping any erroneous messages and continuing with the next one
pybufrkit decode -m --continue-on-error FILE

# Filter through a multi-message file and only decode messages
# that have data_category equal to 2. See below for details
# about usable filter expressions.
pybufrkit decode -m --filter '${%data_category} == 2' FILE

# Decode a BUFR file and display it in a hierarchical structure
# corresponding to the BUFR Descriptors. In addition, the attribute
# descriptors are associated with their corresponding (bitmapped) descriptors.
pybufrkit decode -a BUFR_FILE

# Decode a BUFR file and output in the flat JSON format
pybufrkit decode -j BUFR_FILE

# Encode from a flat JSON file to BUFR
pybufrkit encode -j JSON_FILE BUFR_FILE

# Decode a BUFR file, pipe it to the encoder to encode it back to BUFR
pybufrkit decode BUFR_FILE | pybufrkit encode -

# Decode only the metadata sections of a BUFR file
pybufrkit info BUFR_FILE

# Split a BUFR file into one message per file
pybufrkit split BUFR_FILE

# Subset from a given BUFR file
pybufrkit subset 0,3,6,9 BUFR_FILE

# Query values from the metadata sections (section 0, 1, 2, 3):
pybufrkit query %n_subsets BUFR_FILE

# Query all values for descriptor 001002 of the data section
pybufrkit query 001002 BUFR_FILE

# Query for 001002 at the root level of the BUFR Template
pybufrkit query /001002 BUFR_FILE

# Query for 001002 that is a direct child of 301001
pybufrkit query /301001/001002 BUFR_FILE

# Query for all 001002 of the first subset
pybufrkit query '@[0] > 001002' BUFR_FILE

# Query for associated field of 021062
pybufrkit query 021062.A21062 BUFR_FILE

# Filtering through a number of BUFR files with Script support
# (find files that have multiple subsets):
pybufrkit script 'if ${%n_subsets} > 1: print(PBK_FILENAME)' DIRECTORY/*.bufr

# Look up information for an Element Descriptor (along with its code table)
pybufrkit lookup -l 020003

# Compile a BUFR Template composed as a comma separated list of descriptors
pybufrkit compile 309052,205060
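Both decode -m and split deal with files that are plain concatenations of messages. The following is a minimal standalone sketch of how such a file can be cut apart. It assumes that each message declares its total length in octets 5-7 of Section 0 and ends with the "7777" end section. The function split_bufr_messages is hypothetical and does not claim to be how PyBufrKit implements split.

```python
def split_bufr_messages(data):
    """Yield each complete BUFR message found in a byte string.

    Scans for the b"BUFR" marker, reads the total length from octets 5-7 of
    Section 0 and keeps the slice only if it ends with the b"7777" end
    section; otherwise it resumes scanning just after the marker.
    """
    pos = 0
    while True:
        start = data.find(b"BUFR", pos)
        if start == -1 or start + 8 > len(data):
            return  # no further message header in the data
        total_length = int.from_bytes(data[start + 4:start + 7], "big")
        message = data[start:start + total_length]
        if total_length >= 12 and message.endswith(b"7777"):
            yield message
            pos = start + total_length
        else:
            pos = start + 4  # bogus length or truncated message: skip marker


# Usage: two synthetic "messages" (header + payload + end section) with junk in front.
msg1 = b"BUFR" + (15).to_bytes(3, "big") + b"\x04" + b"abc" + b"7777"
msg2 = b"BUFR" + (17).to_bytes(3, "big") + b"\x04" + b"defgh" + b"7777"
print(len(list(split_bufr_messages(b"junk" + msg1 + msg2))))  # 2
```

Checking for the end section before accepting a slice is a cheap way to skip stray "BUFR" byte sequences or truncated messages without aborting the whole file, which is similar in spirit to --continue-on-error.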

Library Usage

The following code shows an example of basic library usage:

# Decode a BUFR file
from pybufrkit.decoder import Decoder
decoder = Decoder()
with open(SOME_BUFR_FILE, 'rb') as ins:
    bufr_message = decoder.process(ins.read())

# Convert the BUFR message to JSON
from pybufrkit.renderer import FlatJsonRenderer
json_data = FlatJsonRenderer().render(bufr_message)

# Encode the JSON back to BUFR file
from pybufrkit.encoder import Encoder
encoder = Encoder()
bufr_message_new = encoder.process(json_data)
with open(BUFR_OUTPUT_FILE, 'wb') as outs:
    outs.write(bufr_message_new.serialized_bytes)

# Decode for multiple messages from a single file
from pybufrkit.decoder import generate_bufr_message
with open(SOME_FILE, 'rb') as ins:
    for bufr_message in generate_bufr_message(decoder, ins.read()):
        pass  # do something with the decoded message object

# Query the metadata
from pybufrkit.mdquery import MetadataExprParser, MetadataQuerent
n_subsets = MetadataQuerent(MetadataExprParser()).query(bufr_message, '%n_subsets')

# Query the data
from pybufrkit.dataquery import NodePathParser, DataQuerent
query_result = DataQuerent(NodePathParser()).query(bufr_message, '001002')

# Script
from pybufrkit.script import ScriptRunner
# NOTE: must use the function version of print (Python 3), NOT the statement version
code = """print('Multiple' if ${%n_subsets} > 1 else 'Single')"""
runner = ScriptRunner(code)
runner.run(bufr_message)

For more help, please check the documentation site at http://pybufrkit.readthedocs.io/

pybufrkit's Issues

Migrate CI to use GitHub actions

Travis CI no longer works.

Can't filter files

pybufrkit decode -m --filter '${%data_category} == 2' FILE
does not work because --filter is reported as an unrecognized argument.

Update to Tables V39

Hi ywangd,
could you please update to V39 of the BUFR tables?
V38 Tables are empty files.

BR,

Dominic

BUFR file with all date values off-by-one

Thanks for the work on this excellent library. I have been having great success processing surface observations transmitted in near real time in BUFR format. I have been doing this for a couple of months now and occasionally hit an oddity: a random BUFR file seemingly has a timestamp that is off by one from the current time. An example is this:

example.zip

pybufrkit appears to decode this as a timestamp of 2025-02-26 19:01+00, which is strangely +1 in each of year, month, day, hour, and minute relative to the current timestamp. I am unsure how to fully debug this outside of pybufrkit, and I could totally see this just being an encoding issue with the upstream generator, but figured I would drop an example here to see if, by chance, something else is going on!

Thank you.

Lookup descriptor by their description

For example, which descriptors have the word "wind" in their description?
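Until such a feature exists, a keyword search is straightforward once descriptor names are available as a mapping. Below is a standalone sketch over a small hypothetical excerpt of Table B. The entries are element names that also appear in the radiosonde output quoted in this document. Loading PyBufrKit's actual table files is left out, because their layout is not described here.

```python
# Hypothetical excerpt of Table B: descriptor code -> element name.
TABLE_B_EXCERPT = {
    "007004": "PRESSURE",
    "010009": "GEOPOTENTIAL HEIGHT",
    "011001": "WIND DIRECTION",
    "011002": "WIND SPEED",
    "012101": "TEMPERATURE/AIR TEMPERATURE",
    "012103": "DEWPOINT TEMPERATURE",
}


def find_descriptors(table, keyword):
    """Return sorted descriptor codes whose name contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(code for code, name in table.items() if needle in name.lower())


print(find_descriptors(TABLE_B_EXCERPT, "wind"))  # ['011001', '011002']
```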

Radiosonde data - apparent (and mysterious) disparity in data shape

Hello.

I have some Radiosonde profiles, in BUFR format. I have tried reading them using both pybufrkit and eccodes. Specifically for eccodes, I adapted this script from confluence.ecmwf.int

My aim is to read in the whole sequence of BUFR and concatenate/convert them to an xarray dataset.

Specifically, when running the following command,

pybufrkit decode -a bufr309052_all_20240125_1108_3.bfr > "../testout.json"

I get:

        # --- 1 of 3592 replications ---
        303054 (Temperature, dewpoint and wind data at a pressure level with radiosonde position)
            004086 LONG TIME PERIOD OR DISPLACEMENT None
            008042 EXTENDED VERTICAL SOUNDING SIGNIFICANCE 65536
            007004 PRESSURE 100000.0
            010009 GEOPOTENTIAL HEIGHT 126
            005015 LATITUDE DISPLACEMENT (HIGH ACCURACY) 0.0
            006015 LONGITUDE DISPLACEMENT (HIGH ACCURACY) 0.0
            012101 TEMPERATURE/AIR TEMPERATURE None
            012103 DEWPOINT TEMPERATURE None
            011001 WIND DIRECTION None
            011002 WIND SPEED None

However, in both cases (pybufrkit and eccodes), depending on which file I ingest in my Python function inside my loop, I observe a mismatch in the size (shape) of my data:
len(airT) = len(dewT) = len(time) -1

More specifically:

[+] -------------------------------------------
[+] Filename:  03/bufr309052_all_20240331_1104_0.bfr
[+] -------------------------------------------
      variable: dtime   --> lenght: 2720
      variable: dlat    --> lenght: 2720
      variable: dlon    --> lenght: 2720
      variable: airt    --> lenght: 2720
      variable: geopot  --> lenght: 2720

[+] -------------------------------------------
[+] Filename:  01/bufr309052_all_20240131_1105_0.bfr
[+] -------------------------------------------
      variable: dtime   --> lenght: 2854
      variable: dlat    --> lenght: 2854
      variable: dlon    --> lenght: 2854
      variable: airt    --> lenght: 2853    |<--------
      variable: geopot  --> lenght: 2853    |<--------

Now my questions:
Why is that? Shouldn't lat, lon, time and parameters always be the same size?
How can I proceed? Should I filter? Skip? Ignore?

Thanks for any suggestion or constructive comment you might be willing to share with me.

WMO : Update to V40 and V41

Context

I am considering using your library for decoding/encoding BUFR files from netCDF with the latest table version, in order to use the latest Table D sequences from version 41.

Issue

I am not able to decode my file using the library because the tables have not been updated yet.

Would it be possible to import v40 and v41 from WMO? The latest version is now 41: https://community.wmo.int/en/activity-areas/wis/latest-version
How can I import a local table if it is not present?

Thanks a million in advance.

iter(query_result) fails with AttributeError

The QueryResult class in the dataquery module defines a method __iter__. Calling this method unconditionally fails with AttributeError, due to calling a historical dict method viewitems that existed in Python 2.

To reproduce:

from pybufrkit.decoder import Decoder, generate_bufr_message
from pybufrkit.dataquery import NodePathParser, DataQuerent
fn = "/home/gholl/checkouts/pybufrkit/tests/data/jaso_214.bufr"
with open(fn, "rb") as fp:
    for bufr_message in generate_bufr_message(Decoder(), fp.read()):
        query_result = DataQuerent(NodePathParser()).query(bufr_message, '001015')
        iter(query_result)

Using Python 3.11, this fails with:

Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/mwe/pybufrkit-attributeerror.py", line 7, in <module>
    iter(query_result)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/pybufrkit/dataquery.py", line 313, in __iter__
    return iter(self.results.viewitems())
                ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'collections.OrderedDict' object has no attribute 'viewitems'

I'm using pybufrkit 0.2.19.
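The traceback shows that __iter__ calls viewitems() on self.results, an OrderedDict, while Python 3 dicts only offer items(). Here is a standalone sketch of the corrected method on a hypothetical stand-in class (QueryResultSketch is not PyBufrKit's class). With an unpatched release, iterating query_result.results.items() directly should avoid the failing __iter__, since results is the OrderedDict named in the traceback.

```python
from collections import OrderedDict


class QueryResultSketch(object):
    """Hypothetical stand-in for the part of QueryResult involved in the bug:
    an ordered mapping from subset index to the values found in that subset."""

    def __init__(self):
        self.results = OrderedDict()

    def __iter__(self):
        # items() exists on both Python 2 and 3 (a list on 2, a view on 3),
        # whereas viewitems() only exists on Python 2.
        return iter(self.results.items())


# Usage
result = QueryResultSketch()
result.results[0] = [1.5]
result.results[1] = [2.5]
for idx_subset, values in result:
    print(idx_subset, values)
```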

Example JSON File

Greetings, I'm from the Chilean weather service, and I am new to Python.
We are testing your library.
Do you have a test JSON file we could use to try the encoder?
We are using template 307080 SYNOP.
Regards

IndexError on NUMERIC_MISSING_VALUES

In working with BUFR data from SIO Lagrangian Drifter Lab, I found some data where nbits=38 which caused an IndexError in this section of bitops.py as the number of bits only goes up to 32:

    def read_uint_or_none(self, nbits):
        value = self.read_uint(nbits)
        if nbits > 1 and value == NUMERIC_MISSING_VALUES[nbits]:
            value = None
        return value

To fix this, I'd like to submit a pull request that will raise the valid nbits to 64 as a quick fix in constants.py. As a more robust option, is it worth replacing this with a function that can handle the higher options (and perhaps use the constant for quick computation of the value)? something like

def is_missing_value(nbits, value):
    if nbits < 33:
        return value == NUMERIC_MISSING_VALUES[nbits]
    else:
        # missing value: all nbits bits set to 1
        return value == 2 ** nbits - 1

If so, I can work on a patch for that too.
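For reference, BUFR signals a missing value by setting all bits of the field to 1. For a width of nbits, the missing pattern is therefore 2 ** nbits - 1, that is, (1 << nbits) - 1. Python integers can represent this for any width. Here is a standalone sketch of such a check without a lookup table. The function name is hypothetical, and it keeps the read_uint_or_none convention quoted above of never treating 1-bit fields as missing.

```python
def is_missing_value(nbits, value):
    """Return True if an nbits-wide unsigned value is the BUFR 'missing' pattern.

    The missing pattern has every bit of the field set to 1, i.e.
    (1 << nbits) - 1. As in read_uint_or_none above, 1-bit fields are
    never treated as missing.
    """
    if nbits <= 1:
        return False
    return value == (1 << nbits) - 1


print(is_missing_value(38, 2 ** 38 - 1))  # True
print(is_missing_value(38, 2 ** 37))      # False
```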

Master tables V31 support?

Is version 31 of the master tables going to be added? Or is there a way to take the tables as provided from WMO (http://www.wmo.int/pages/prog/www/WMOCodes/WMO306_vI2/LatestVERSION/LatestVERSION.html) and turn them into what is required? I couldn't find any instructions on how to do that.

Thanks for the great tool!

Thanks
Steph

Please update the new version in pip

Thanks.

Error: Cannot process descriptor UNDEFINED (020237) of type: UndefinedElementDescriptor

Test data are weather reports of the official german weather provider: https://opendata.dwd.de/weather/weather_reports/synoptic/germany/

Command: pybufrkit decode file.bin

WARNING: Cannot find sub-centre 0 nor valid default. Local table not in use.
Error: Cannot process descriptor UNDEFINED (020237) of type: UndefinedElementDescriptor

How should that be handled? I don't think there is an error in the files.

Thx!
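One plausible explanation, stated here as an assumption: in descriptor 020237, F=0, X=20 and Y=237. As far as I understand the WMO Manual on Codes convention, classes 48-63 and entries 192-255 within any class are reserved for local use. That would make 020237 a local (DWD) descriptor that can only be decoded with a matching local Table B, which is consistent with the "Local table not in use" warning. A standalone sketch of that range check follows. Both function names are hypothetical.

```python
def parse_fxy(descriptor):
    """Split a six-digit descriptor string such as "020237" into (F, X, Y)."""
    if len(descriptor) != 6 or not descriptor.isdigit():
        raise ValueError("expected six digits, got %r" % (descriptor,))
    return int(descriptor[0]), int(descriptor[1:3]), int(descriptor[3:])


def is_local_descriptor(descriptor):
    """Return True if an element (F=0) or sequence (F=3) descriptor lies in a
    range reserved for local use: class X 48-63, or entry Y 192-255."""
    f, x, y = parse_fxy(descriptor)
    if f not in (0, 3):
        return False  # replication (F=1) and operator (F=2) descriptors
    return 48 <= x <= 63 or 192 <= y <= 255


print(parse_fxy("020237"))            # (0, 20, 237)
print(is_local_descriptor("020237"))  # True
print(is_local_descriptor("012101"))  # False
```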

Master Table v36

WMO has released up to v36. I was wondering if I can help with updating the master version? I update our tables yearly in the summer. I pull the tables from https://community.wmo.int/activity-areas/wmo-codes/manual-codes/latest-version. Do you have a script to convert from the csv on that website to the json format you use?

Provide a utility to convert BUFR tables from other formats

PyBufrKit has its own format for BUFR tables, which is not used anywhere else. It would be useful to have a tool that converts tables from more popular formats, e.g. WMO and ecCodes, to PyBufrKit's own format.

Master version 34

Hello! I was wondering when master version 34 will be supported.

Thanks!

PyInstaller doesn't include directories

If you create a Decoder object in a script and build an executable from that script using PyInstaller, the executable will not run properly: it complains about the missing tables and definitions directories. The only solution I have found is modifying the PyInstaller .spec file to force PyInstaller to include those directories when building the executable.

Template Compiler Utilization

Decoding EUMETSAT AMV Bufr file

I used the command pybufrkit decode -j BUFR_FILE to decode an EUMETSAT AMV Bufr file.

The command works, but the JSON file that comes out does not have a proper header.

I have a sample file here:

https://drive.google.com/file/d/1xf47kBRex-52AIvjah4azFOvptdOeLwb/view?usp=sharing

Thank you for any advice you can offer.

read the subsets and the name of fields after reading FlatJsonRenderer().render(bufr_message)

Hi all, I'm trying to extract some data from a BUFR file, but I don't understand how to read the field names and the different subsets to extract data such as U-COMPONENT, V-COMPONENT, PRESSURE, etc. after decoding a bufr_message. I've attached an example file. Thank you very much in advance.
L-000-MSG4__-MPEF________-AMV______-000001___-202106171330-__.zip

Decoding NCEP BUFR files

Hello,

I'm interested in using this library to decode NCEP BUFR files that contain forecast atmospheric sounding profiles. An example file can be found here.

It seems to decode the metadata, but very little in the way of actual profile data. I tried converting the local NCEP tables to json, and the library is detecting them, but still no truly "successful" output.

Am I doing something wrong, or is this due to the NCEP formatting? When I pursued eccodes as an option, the developers said that NCEP BUFR/prepBUFR wasn't supported, so if that's the case here, I can close the issue and move on.

Here's the decoder output... I'll also include my Table B and Table D JSONs below the output, which were derived from these tables here

Due to text limits, output will be provided in the following comments...

BUFR message with year of 0

Like in #33, I have an example BUFR file for your consideration. This one yields a year value of zero.

example.zip

New tables encoding

Thanks for sharing.
It's the first pure python lib that can decode my BUFR files.
I couldn't find the tool in the package that encodes the tables into your own JSON table format.
I need access to the new tables 0.29.

Override table related information when decoding

Not all messages declare table-related information (table version, local table version, etc.) correctly. When this happens, the message in question cannot be decoded with its declared tables. It would be useful if this information could be overridden from outside the message when decoding.

Test data

I'm a member of the open-source Pillow imaging library for Python.

https://github.com/python-pillow/Pillow

I'm trying to increase test coverage of the 20-year-old-plus code base by adding unit tests for more file formats.

Pillow has very limited BUFR support: only a stub that can recognise the format. Read and write would require an extra handler. Nevertheless, I'd like to test what is there.

Would it be possible to use and redistribute a BUFR file from https://github.com/ywangd/pybufrkit/tree/master/tests/data/ (or benchmark_data/) in the Pillow codebase as part of the test suite?

Pillow uses an open source PIL Software License: https://github.com/python-pillow/Pillow/blob/master/LICENSE

Thank you!

Unable to process NCEP BUFR files - exits with StopIteration

Something is throwing an exception deep inside pybufrkit while processing NCEP bufr files and I'm not able to figure out where it's coming from.

Data file:
http://static.skysight.io/ncep.bufr.gz

Versions:
% pybufrkit --version
pybufrkit: 0.2.20

% python --version
Python 3.10.6

Output:
% pybufrkit decode --continue-on-error -m /tmp/ncep.bufr > out.txt
Traceback (most recent call last):
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/decoder.py", line 422, in generate_bufr_message
    bufr_message = decoder.process(
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/decoder.py", line 98, in process
    nbits_decoded += self.process_section(bufr_message, bit_reader, section)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/decoder.py", line 125, in process_section
    parameter.value = self.process_template_data(bufr_message, bit_reader)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/decoder.py", line 203, in process_template_data
    template_processing_func(state, bit_reader, template_to_process)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/templatecompiler.py", line 327, in process_compiled_template
    process_statements(coder, state, bit_operator, compiled_template.statements)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/templatecompiler.py", line 367, in process_statements
    process_statements(coder, state, bit_operator, statement.statements)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/templatecompiler.py", line 344, in process_statements
    getattr(state, statement.method_name)(*statement.args)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/coder.py", line 184, in add_bitmap_link
    idx_descriptor, _ = self.next_bitmapped_descriptor()
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/plantain/.pyenv/versions/3.10.6/bin/pybufrkit", line 8, in <module>
    sys.exit(main())
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/__init__.py", line 274, in main
    command_decode(ns)
  File "/Users/plantain/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pybufrkit/commands.py", line 62, in command_decode
    for bufr_message in generate_bufr_message(decoder, s,
RuntimeError: generator raised StopIteration
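The final RuntimeError is a consequence of PEP 479. Since Python 3.7, a StopIteration that escapes from inside a generator body is re-raised as RuntimeError("generator raised StopIteration"). In this traceback, the StopIteration comes from next_bitmapped_descriptor() in add_bitmap_link. That presumably means an iterator of bitmapped descriptors ran out, and the exception then surfaced through the generate_bufr_message generator. Below is a standalone illustration of the mechanism and of the usual guard (next() with a default). Both function names are hypothetical and unrelated to PyBufrKit's code.

```python
def first_items(iterators):
    """Yield the first item of each iterator.

    Deliberately fragile: if any iterator is empty, next() raises
    StopIteration inside this generator, which Python 3.7+ converts
    into RuntimeError (PEP 479).
    """
    for it in iterators:
        yield next(it)


def first_items_safe(iterators, default=None):
    """Same, but next()'s default argument keeps StopIteration from escaping."""
    for it in iterators:
        yield next(it, default)


try:
    list(first_items([iter([1]), iter([])]))
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: generator raised StopIteration

print(list(first_items_safe([iter([1]), iter([])])))  # [1, None]
```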