josephhardinee / pydsd
Python library for working with disdrometer data.
License: GNU Lesser General Public License v2.1
Hi @josephhardinee, I have a problem processing files when I try to use functions involving time. I am trying to run this code on the ATM4.txt files:
import pydsd
import matplotlib.pyplot as plt
import pydsd.plot
import pydsd.partition
filename = 'merged_file.txt'
dsd = pydsd.read_parsivel(filename)
dsd.calculate_dsd_parameterization()
pydsd.plot.plot_dsd(dsd)
plt.show()
Here is the error:
runfile('/home/jeff/.config/spyder-py3/DSD_sample.py', wdir='/home/jeff/.config/spyder-py3')
Traceback (most recent call last):
File "/home/jeff/anaconda3/envs/wradlib/lib/python3.10/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "/home/jeff/.config/spyder-py3/DSD_sample.py", line 21, in
pydsd.plot.plot_dsd(dsd)
File "/home/jeff/anaconda3/envs/wradlib/lib/python3.10/site-packages/PyDSD-1.0.6.2+5.g0a54715-py3.10.egg/pydsd/plot/plot.py", line 72, in plot_dsd
dsd.time["data"].filled(),
AttributeError: 'list' object has no attribute 'filled'
Do you have any idea how to fix this? My programming skills are not that advanced. Thank you.
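One possible workaround, continuing from the snippet above (untested, and an assumption about what plot_dsd expects): the plotting code calls .filled() on dsd.time['data'], so wrapping the plain list of times in a masked array before plotting might get past this error.

import numpy as np

# Hypothetical workaround: convert the plain list of times into a masked array
# so that dsd.time['data'].filled() works inside plot_dsd.
dsd.time['data'] = np.ma.masked_array(dsd.time['data'])
pydsd.plot.plot_dsd(dsd)
plt.show()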
After talking with @nguy the following represents what we see as the roadmap to getting PyDisdrometer to a V1 release.
Need to add more QC filters. First among them is a filter based on number of detected drops.
Also need to provide a notebook showing the new filtering steps.
I want to calculate and plot the slope coefficient of the DSD. I have tried the method of moments this way; is it useful?

self.fields['mslope'] = {'data': np.zeros(len(self.time))}
self.fields['mslope']['data'] = self._calculate_mslope()

def _calculate_mslope(self):
    # Moment-ratio estimate of the slope parameter from the 2nd, 3rd and 4th moments.
    G = self._calc_mth_moment(3) ** 2 / (
        self._calc_mth_moment(2) * self._calc_mth_moment(4))
    mslope = (self._calc_mth_moment(2) / self._calc_mth_moment(3)) * (G / (1.0 - G))
    return mslope
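For reference (assuming a gamma DSD, which is what this estimator usually targets): if N(D) = N0 * D**mu * exp(-Lambda * D), the nth moment is Mn = N0 * Gamma(mu + n + 1) / Lambda**(mu + n + 1). Then G = M3**2 / (M2 * M4) = (mu + 3) / (mu + 4), so G / (1 - G) = mu + 3 and (M2 / M3) * G / (1 - G) = Lambda. So the expression above should recover the slope parameter, provided the moments are computed over the same bins.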
Select the time you want from the file. So for instance, if your file was 10 UTC to 13 UTC, you could read in only the times between 11 UTC and 12 UTC. This would be useful for the 2DS and HVPS readers, since they have to be narrowed down to only liquid times before analysis.
Need to make sure the spectral handling works with 2DVD as well as making sure it can be made more general.
This is now the 2DS only issue.
It would be great if a reader could be added to read in and process 2DS and HVPS .txt files, as these have similar measurements to disdrometers.
2DS: This is split into two .txt files, one for horizontal and one for vertical.
- Time is recorded in seconds since 00 UTC; dividing by 3600 puts it into hours.
- The concentration column is the total concentration of drops at that particular time.
- All columns whose header begins with "C:" are the concentrations of drops in each bin, in #/L/um.
- The spread of the bins is variable.
I believe the data is also tab-separated. The example shown is one I roughly converted to an Excel spreadsheet to get a better view of the headers.
HVPS: Vertical only, .txt file
Similar columns and locations to the 2DS data, but the bin sizes are different, as this instrument measures larger particles.
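A rough reading sketch based on the description above; the column names, the time column, and the tab separator are assumptions about the file layout, not a confirmed format.

import pandas as pd

# Read one of the 2DS/HVPS .txt files (assumed tab-separated with a header row).
df = pd.read_csv('2DS_20150205a_H.txt', sep='\t')       # file name is illustrative

df['time_hours'] = df['Time'] / 3600.0                  # seconds since 00 UTC -> hours
conc_cols = [c for c in df.columns if c.startswith('C:')]
binned_conc = df[conc_cols]                             # per-bin concentrations, #/L/um
total_conc = df['Concentration']                        # total concentration column (assumed name)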
It would be nice if the temperature could be set manually, as well as the time period to evaluate. Choosing the start and end times of the evaluation would allow the user to select only the period when the drops were liquid.
While PyDisdrometer has some basic plotting routines, I plan on going through and adding more before the v1.0 release. Please comment here on any routines in particular you would like to see.
The DSD object needs the ability to filter out size columns from the drop spectra and from N(D). This helps with QC and cross comparisons.
I'm going to create issues for each of the remaining blocks to a v1.0 release.
First is file reader unit/functional tests. All file readers need at least a basic example file/functional test to make sure they don't get broken.
If anyone wants to put together a small example file with 5 time samples and PR it in with a note as to what reader it belongs to, that would make my life easier. I'll also accept functional PRs that test the whole reading process for each reader.
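A minimal functional-test sketch of the sort described above (pytest style; the sample file name and the five-sample length are assumptions about hypothetical test data, and the field access follows the dsd.time['data'] / dsd.fields['Nd']['data'] pattern used elsewhere on this page):

import os
import pydsd

SAMPLE_FILE = os.path.join("testdata", "parsivel_five_samples.txt")   # hypothetical file

def test_read_parsivel_functional():
    dsd = pydsd.read_parsivel(SAMPLE_FILE)
    # The reader should return a DropSizeDistribution with one entry per time sample.
    assert dsd is not None
    assert len(dsd.time["data"]) == 5
    assert dsd.fields["Nd"]["data"].shape[0] == 5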
As far as I can tell, the data model is not fully self-describing.
I think we need a self.time_start (as a datetime object) and a self.location = (lat, lon).
While these could go in notes, I think basic usability requires them to be exposed on the base class.
I thought I'd bring this offline conversation into issues to record for future changes and so we know why we chose a certain design.
Currently DropSizeDistribution objects store data directly on the class (e.g. dsd.time). I propose using dictionaries to store the data, specifically dsd.fields. Storing the data as dictionaries would also allow metadata to be kept alongside it (especially since there are no standard units in this field of study). In addition, the derived fields could be stored under dsd.derived_fields, which would clearly separate which fields were derived by PyDisdrometer.
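To make the proposal concrete, a field entry might look like the sketch below (assuming nd_array and d0_array are the corresponding numpy arrays; the metadata keys and units are illustrative, not a fixed schema):

dsd.fields['Nd'] = {
    'data': nd_array,                      # measured N(D), one row per time sample
    'units': 'm^-3 mm^-1',
    'long_name': 'Liquid water particle concentration',
}
dsd.derived_fields['D0'] = {
    'data': d0_array,                      # computed by PyDisdrometer
    'units': 'mm',
    'long_name': 'Median volume diameter',
}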
Secondly, I would suggest standardizing time units within the package. I convert all time units to Epoch time in this package for easier plotting routines.
To plot the time in the file along the x-axis, I changed the reader as follows:
import numpy as np
from DropSizeDistribution import DropSizeDistribution
import datetime
import pandas as pd

UTC_OFFSET_TIMEDELTA = datetime.timedelta(0, 19800)  # 5 * 3600 + 30 * 60; Indian Standard Time offset used to convert IST to UTC.

def read_jwd_utc(filename):
    '''
    Takes a filename pointing to a Joss-Waldvogel file and returns
    a drop size distribution object.

    Usage:
    dsd = read_jwd_utc(filename)

    Returns:
    DropSizeDistribution object
    '''
    reader = JWDReader_utc(filename)
    dsd = DropSizeDistribution(reader.time, reader.Nd, spread=reader.spread,
                               diameter=reader.diameter, rain_rate=reader.rain_rate,
                               bin_edges=reader.bin_edges)
    return dsd

class JWDReader_utc(object):
    '''
    JWDReader_utc takes a filename as its only argument (for now).
    This should be a Joss-Waldvogel data file.
    '''
    diameter = np.array([
        0.359, 0.455, 0.551, 0.656, 0.771, 0.913, 1.116, 1.331, 1.506, 1.665,
        1.912, 2.259, 2.584, 2.869, 3.198, 3.544, 3.916, 4.350, 4.859, 5.373])
    spread = np.array([
        0.092, 0.100, 0.091, 0.119, 0.112, 0.172, 0.233, 0.197, 0.153, 0.166,
        0.329, 0.364, 0.286, 0.284, 0.374, 0.319, 0.423, 0.446, 0.572, 0.455])

    def __init__(self, filename):
        self.filename = filename
        self.rain_rate = []
        self.Nd = []
        self.time = []

        self._read_file()
        self._prep_data()

        self.bin_edges = np.hstack(
            (0, self.diameter + np.array(self.spread) / 2))

    def conv_md_to_nd(self, Nd):
        F = 0.005   # sampling area (m^2)
        t = 30.0    # integration time (s)
        v = 9.65 - 10.3 * np.exp(-0.6 * self.diameter)   # terminal fall speed per bin
        return np.divide(Nd, (F * t * np.multiply(v, self.spread)))

    def _read_file(self):
        with open(self.filename) as f:
            next(f)  # skip header line
            for i, line in enumerate(f):
                # Date and time are in the first two tab-separated columns (IST); shift to UTC.
                self.time.append(
                    datetime.datetime.strptime(
                        line.split('\t')[0] + line.split('\t')[1],
                        '%Y-%m-%d%H:%M:%S') - UTC_OFFSET_TIMEDELTA)

    def _prep_data(self):
        self.Nd = np.array(self.Nd)
        self.time = pd.Series(np.array(self.time))
        # self.time = self.time - self.time[0]
        self.rain_rate = np.array(self.rain_rate)
Hi! PyDSD seems to be a good tool for DSD processing and worked well when I tried it. However, what I have are files containing one minute of disdrometer data each. Is it possible to process multiple Parsivel disdrometer files in the ATM4.txt format using PyDSD? Thank you very much!
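One possible approach, as a sketch only (untested): since read_parsivel takes a single file (as in the merged_file.txt example at the top of this page), you could concatenate the per-minute ATM4-style text files first and then read the merged result. The glob pattern below is an assumption about how the files are named.

import glob
import pydsd

files = sorted(glob.glob('ATM4_*.txt'))          # hypothetical per-minute file names
with open('merged_file.txt', 'w') as out:
    for fn in files:
        with open(fn) as f:
            out.write(f.read())

dsd = pydsd.read_parsivel('merged_file.txt')
dsd.calculate_dsd_parameterization()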
If possible, it would be useful to pick the temperature used, as well as the times used in analysis.
When installing PyDSD from source, the fit module is missing from the installation. It should be added in setup.py.
@nguy, I am thinking of announcing an end of life for Python 2.7 support at the end of the year (or earlier?). It mirrors the move by a lot of projects to cut support, and it would make some maintenance tasks a fair bit easier. Thoughts?
I am using radar data in vertical pointing mode, so how can I perform scattering simulations for the case of vertical incidence?
Thank you
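Not a PyDSD call, just a plain PyTMatrix sketch of what a vertically pointing geometry might look like; the wavelength, refractive index, drop radius, and axis ratio below are placeholders.

from pytmatrix.tmatrix import Scatterer
from pytmatrix import tmatrix_aux, refractive, radar

scatterer = Scatterer(radius=1.0, wavelength=tmatrix_aux.wl_X,
                      m=refractive.m_w_10C[tmatrix_aux.wl_X], axis_ratio=1.0)
scatterer.set_geometry(tmatrix_aux.geom_vert_back)   # vertical incidence, backscatter
print(radar.refl(scatterer))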
Port over the UA, TS calculations, etc.
I have been trying to scatter particles at W band and it fails (kernel dies, or python restarts). I have tried narrowing it down to a particular drop size but have had no luck. I have attached a "short" file from the RACORO field campaign of a DSD of just one time step that failed (this must be read in using read_racoro, which I just submitted a pull request for). I also tested the W-band scattering on an APU file from IFloodS, and it failed as well. Do you know what is causing this failure?
IFloodS_APU11_2013_0529_dsd.txt
racoro2ds_short.txt
TypeError Traceback (most recent call last)
in ()
----> 1 dsd.calculate_R_Zh_relationship()
/Users/matt493/PyDSD/pydsd/DropSizeDistribution.pyc in calculate_R_Zh_relationship(self)
439
440 popt, pcov = expfit(
--> 441 np.power(10, 0.1 * self.fields["Zh"]["data"][self.rain_rate["data"] > 0]),
442 self.fields["rain_rate"]["data"][self.fields["rain_rate"]["data"] > 0],
443 )
TypeError: 'NoneType' object has no attribute '__getitem__'
It seems this is occurring because self.rain_rate doesn't exist. It is now in "fields", so it should be accessed as self.fields['rain_rate']['data'] instead.
We need support for saving the scattering matrices to a file, and reloading these. Under the hood this is supported by PyTMatrix so we just need to tap into that. When processing a large number of files this can significantly reduce the computational overhead.
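For reference, a rough sketch of the PyTMatrix hooks this would presumably wrap (the wavelength, refractive index, and D_max below are illustrative):

from pytmatrix.tmatrix import Scatterer
from pytmatrix.psd import PSDIntegrator
from pytmatrix import tmatrix_aux, refractive

scatterer = Scatterer(wavelength=tmatrix_aux.wl_X,
                      m=refractive.m_w_10C[tmatrix_aux.wl_X])
scatterer.psd_integrator = PSDIntegrator()
scatterer.psd_integrator.D_max = 10.0
scatterer.psd_integrator.init_scatter_table(scatterer)

scatterer.psd_integrator.save_scatter_table('xband_10C.scatter')   # compute once, save...
scatterer.psd_integrator.load_scatter_table('xband_10C.scatter')   # ...and reload later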
Hello josephhardinee,
Thank you for this excellent project. I have some Parsivel data and calculate_dsd_parameterization returns an empty (all-NaN) field for D0 every time. Is this something known to anyone else? I can send example data if need be. All the other parameters in ['D0', 'Dmax', 'Dm', 'Nt', 'Nw', 'N0', 'W', 'mu'] seem fine.
Could it possibly be related to the use of numpy functions like cumsum instead of nancumsum on the masked Nd data?
I get a"Warning message" that
UserWarning: Warning: converting a masked element to nan.
return array(a, dtype, copy=False, order=order)
/home/dagnachew/anaconda3/lib/python3.6/site-packages/pydsd/DropSizeDistribution.py:349: RuntimeWarning: invalid value encountered in less
cross_pt = list(cum_W < (cum_W[-1] * 0.5)).index(False) - 1
(Pdb) c
/home/dagnachew/anaconda3/lib/python3.6/site-packages/pytmatrix/psd.py:183: RuntimeWarning: invalid value encountered in greater
psd[(D > self.D_max) | (D==0.0)] = 0.0
On Ubuntu 16. Python 3.6 (Anaconda)
Thanks
How can we read more than one file at a time and combine the data to plot a combined total drop size distribution?
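A rough sketch of one way to do this, assuming all files share the same diameter bins and that the diameter is stored as a dict like the other fields (both are assumptions): read each file separately, stack the N(D) arrays, and average over time.

import glob
import numpy as np
import matplotlib.pyplot as plt
import pydsd

nd_all = []
for fn in sorted(glob.glob('*.txt')):                  # hypothetical file pattern
    dsd = pydsd.read_parsivel(fn)
    nd_all.append(np.asarray(dsd.fields['Nd']['data']))

combined_nd = np.nanmean(np.vstack(nd_all), axis=0)    # mean N(D) per diameter bin

plt.semilogy(dsd.diameter['data'], combined_nd)
plt.xlabel('Diameter (mm)')
plt.ylabel('N(D) (m^-3 mm^-1)')
plt.show()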
PyDisdrometer needs the ability to save out files into a standard format. While I have part of this started, if someone beats me to it I will accept it. The plan is for it to be a mostly CF-compliant netCDF file that is similar to the internal structure we currently use. If anyone has comments on this, please feel free to leave them below.
The APU data works well with PyDisdrometer -- how about adding support for 2DVD data as well? Thanks!
It would be nice to have the double-moment rain estimators from the disdrometer data as well, R(Kdp,Zdr) and R(Zh, Zdr). Thanks!
I would like to incorporate XArray into PyDSD to make many of the operations much easier (In particular, netCDF writing).
Here is my proposal, I would like some feedback.
All entries in fields become DataArrays. All coordinates will be stored in a DropSizeDistribution.coords. We will add properties for time, diameter, velocity bin to point into the coords dictionary. Most things will be able to stay the same as far as how they work. The one bigger change is the values would now be accessed as something like DropSizeDistribution['Nd'].values instead of by ['data'].
Benefits: We can autoroll datasets and make writing much easier. Xarray objects are easier to work with, and this opens the possibility of adding some dask integration as well.
@nguy and anyone else that has a vested interest, I would love to get your feedback. The actual work to write the changes will be pretty quick I believe.
One big thing I have not figured out how to address is making sure all of the lengths stay in sync. This is a problem with our current implementation as well though so we don't lose anything here.
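To illustrate the proposed layout (the names, dims, and attributes below are illustrative, not the current API):

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2015-02-05', periods=3, freq='1min')    # placeholder coords
diameter = np.array([0.359, 0.455, 0.551])                     # mm, example bin centres

nd = xr.DataArray(
    np.zeros((len(time), len(diameter))),
    dims=('time', 'diameter'),
    coords={'time': time, 'diameter': diameter},
    attrs={'units': 'm^-3 mm^-1', 'long_name': 'Liquid water particle concentration'},
)
# Under the proposal the values would be reached via .values rather than ['data']:
print(nd.values.shape)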
Data is similar to the 2DS data in setup (similar columns and units). However, the bin headers are broken and must be read in from a separate file.
HVPS data:
https://www.dropbox.com/s/qdb9y0ts4nf8y05/HVPS_20150205a_V_M1ice.txt.gz?dl=0
HVPS bins:
https://www.dropbox.com/s/tmui3kpk6lrywuz/HVPS_Bins.csv?dl=0
@nguy I've been tossing around the idea for a while of pulling a lot of the operations out of the DropSizeDistribution object and moving them into library functions. This would simplify the DropSizeDistribution object quite a bit and make testing easier. It would also make it easier to do things like have an operation on a dsd object (such as a select) return a new dsd object.
The functionality I had in mind was moving scattering to its own submodule, and similarly for the rain fits that have been in the object as part of its legacy.
File "/home/yogesh/anaconda/lib/python2.7/site-packages/pydisdrometer/init.py", line 12, in
from . import partition
ImportError: cannot import name partition
Port over the Ulbrich and Atlas 1998 and Tokay and Short 1996 gamma distribution parameters, as well as exponential distribution parameter calculations.
Can I use PyDSD to remove drops from the DSD if their fall velocity differs by more than ±50% from the empirical value given by the Beard (1976) theoretical velocity equation?
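I don't know of a built-in call for this, but a plain numpy sketch of such a filter might look like the following; the array shapes and the terminal-velocity function are assumptions (the Atlas-type formula from the JWD reader above is used only as a stand-in for Beard 1976).

import numpy as np

def v_term(D):
    # Stand-in terminal velocity (m/s); replace with a Beard (1976) implementation.
    return 9.65 - 10.3 * np.exp(-0.6 * D)

def filter_spectrum_by_velocity(spectrum, velocity_bins, diameter_bins, tolerance=0.5):
    """Zero out spectrum counts whose velocity bin is more than +/-50% from v_term(D).

    spectrum: (n_time, n_velocity, n_diameter) drop counts.
    """
    v_expected = v_term(diameter_bins)                              # (n_diameter,)
    vv, ve = np.meshgrid(velocity_bins, v_expected, indexing='ij')  # (n_velocity, n_diameter)
    keep = np.abs(vv - ve) <= tolerance * ve
    return spectrum * keep[np.newaxis, :, :]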
I appreciate all the built-in file readers, but for instances where I apply QC to raw Parsivel data and then try to read it in, I need to convert to something akin to ARM's formatting:
import os
import numpy as np
import pydsd
from netCDF4 import Dataset, date2num, num2date

def read_precip(nc):
    # Remove any temp file left behind by a previous call.
    try:
        nc_temp.close()
        os.remove("temp.nc")
    except:
        pass
    nc_temp = Dataset("temp.nc", "w")
    time = nc_temp.createDimension("time", 1440)
    particle_size = nc_temp.createDimension("particle_size", 32)
    raw_fall_velocity = nc_temp.createDimension("raw_fall_velocity", 32)
    time_ = nc_temp.createVariable("time", "f4", ("time",))
    time_.units = "seconds since 1970-01-01 00:00:00 0:00"
    time_[:] = date2num(num2date(nc.variables["time"][:], nc.variables["time"].units),
                        units=time_.units)
    Nd_ = nc_temp.createVariable("number_density_drops", "f4", ("time", "particle_size"))
    Nd_[:] = np.nansum(nc.variables["num_conc_rain"][:], axis=1)
    vel_ = nc_temp.createVariable("fall_velocity_calculated", "f4", ("particle_size",))
    vel_[:] = v_gk(nc.variables["particle_size"][:]) / 100   # v_gk: my Gunn-Kinzer velocity helper (not shown)
    rr_ = nc_temp.createVariable("precip_rate", "f4", ("time",))
    rr_[:] = nc.variables["rain_rate_qc"][:]
    spec_ = nc_temp.createVariable("raw_spectrum", "f4",
                                   ("time", "particle_size", "raw_fall_velocity"))
    spec_[:] = np.moveaxis(nc.variables["qc_spectrum"][:], 1, 2)
    raw_velo_ = nc_temp.createVariable("raw_fall_velocity", "f4", ("raw_fall_velocity",))
    raw_velo_[:] = nc.variables["raw_fall_velocity"][:]
    dia_ = nc_temp.createVariable("particle_size", "f4", ("particle_size",))
    dia_[:] = nc.variables["particle_size"][:]
    width_ = nc_temp.createVariable("class_size_width", "f4", ("particle_size",))
    width_[:] = nc.variables["particle_size_bin_width"][:]
    try:
        return pydsd.read_parsivel_arm_netcdf("temp.nc")
    finally:
        nc_temp.close()
        os.remove("temp.nc")
This isn't a huge problem, but being able to just build the dsd object with fields from non-supported file types would be great.
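Related to that, a sketch of building the object directly, following the older-style constructor signature used in the JWDReader example above (newer versions may take a reader object instead, so treat the import path and signature as assumptions). All arrays are placeholders for your own QC'd data.

import numpy as np
from pydsd.DropSizeDistribution import DropSizeDistribution   # import path is an assumption

diameter = np.array([0.359, 0.455, 0.551])        # mm, example bin centres
spread = np.array([0.092, 0.100, 0.091])          # mm, example bin widths
bin_edges = np.hstack((0, diameter + spread / 2))
time = np.array([0.0, 60.0, 120.0])               # seconds
Nd = np.zeros((len(time), len(diameter)))         # your QC'd N(D) goes here
rain_rate = np.zeros(len(time))

dsd = DropSizeDistribution(time, Nd, spread=spread, diameter=diameter,
                           rain_rate=rain_rate, bin_edges=bin_edges)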
I notice that the 'nd' data is raised to the power of 10 in the 'read_file' function of the ParsivelReader class. What is the reason for that? According to the Parsivel disdrometer manual, the data following code '90' is exactly the 'nd' data, i.e. the number of drops per 1 m**3 of volume and per 1 mm of drop diameter spread. Am I misunderstanding anything?
I'd like to request support for reading in rain rates from NASA GV data (JW and 2DVD).
I have Parsivel and Thies LPM data and want to use PyDSD for analysis. But my data files are slightly different from the ones presented here as test data. I tried to load them using the package but got an error. If there is a way to feed in this data it would be very helpful. For reference I have uploaded sample data here: https://drive.google.com/drive/folders/1wNPYCEzhoq2gFVm98JZQpKTLP5V9KbSm?usp=sharing
The 2DVD instruments, as well as others, have diameter bins that get fairly large. This can cause issues for the underlying scattering code. We need to implement a feature that allows a maximum effective scattering parameter to be introduced to the object, something like dsd.set_scattering_limits(0, 8). We do need to discuss whether this should affect the dsd parameterization code as well or be limited to just the scattering. @nguy, your thoughts?
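A rough sketch of what such a limit might amount to internally (hypothetical, not an existing method): mask the diameter bins outside the requested range before handing N(D) to the scattering code.

import numpy as np

def apply_scattering_limits(Nd, diameter, d_min=0.0, d_max=8.0):
    """Zero out N(D) outside [d_min, d_max] (diameters in mm)."""
    outside = (diameter < d_min) | (diameter > d_max)
    Nd_clipped = np.array(Nd, copy=True)
    Nd_clipped[..., outside] = 0.0
    return Nd_clipped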
Convert all times to a standardized Epoch time - '1970-1-1T00:00:00+0:00'
I have calculated a new axis ratio relation for our site and want to use it to calculate radar parameters, to test the performance of the new equation. I replaced the equation in the def bc(D_eq) function of the DSR.py module, but when I run dsd.calculate_radar_parameters() the Python kernel dies.
def bc(D_eq):
    '''Beard and Chuang Drop Shape relationship model.

    Implementation of the Beard and Chuang Drop Shape model given in [1].
    This gives the ratio of the major to minor axis.

    Parameters
    ----------
    D_eq: float
        Volume Equivalent Drop Diameter

    Returns
    -------
    axis_ratio: float
        The ratio of the major to minor axis.

    See Also
    --------
    tb : Thurai and Bringi DSR
    pb : Pruppacher and Beard DSR

    References
    ----------
    ..[1] Beard, Kenneth V., Catherine Chuang, 1987: A New Model for the
        Equilibrium Shape of Raindrops. J. Atmos. Sci., 44, 1509-1524.
    '''
    # Remnant of the original polynomial, left commented out:
    # 2.628e-02 * np.power(D_eq, 2) + 3.682e-03 * np.power(D_eq, 3) -
    # 1.677e-04 * np.power(D_eq, 4)
    return 0.9946 + 0.02297 * D_eq - 0.02051 * D_eq ** 2 + 0.002795 * D_eq ** 3 - \
        0.0001849 * D_eq ** 4
I wanted to write this down since we've gone back and forth a bit (or at least I have).
I'm going to modify all the readers to have the following time field. Time will be stored as a series of integers representing seconds since 1970-1-1 (epoch time). So
time['data'] = array([ 1.32332381e+09, 1.32332382e+09, 1.32332383e+09]) for instance. It will have a units string set as well.
'units': 'seconds since 1970-1-1 00:00:00+0:00'
Currently we have a mix of epoch time, time offset from beginning of file, and datetime instances.
Does this work for you @nguy? I will try to route much of our time handling through some common functions, so that if we change our minds in the future it's an easier update than modifying each reader individually. I think this reflects what we discussed at AMS.
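One of those common helpers might look like the sketch below (illustrative only; the function name is not part of the package):

import datetime

EPOCH = datetime.datetime(1970, 1, 1)
EPOCH_UNITS = 'seconds since 1970-1-1 00:00:00+0:00'

def datetimes_to_epoch(times):
    """Convert a sequence of naive UTC datetimes into epoch seconds plus a units string."""
    data = [(t - EPOCH).total_seconds() for t in times]
    return {'data': data, 'units': EPOCH_UNITS}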
The DropSizeDistribution object needs support for adjusting the terminal fall speed based on air density, or a user supplied fall speed. We can start by implementing the Gunn/Kinzer adjustment. This should only affect the terminal fall speed when calculated by formula, not the measured fall speed. In those cases it would also only propagate along to rain rate I believe.
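A minimal sketch of the density adjustment, assuming a Foote and du Toit (1969) style correction with a 0.4 exponent (that exponent and the reference density are assumptions to confirm); per the note above, it would only be applied to formula-derived terminal speeds, not measured ones.

import numpy as np

def density_adjusted_velocity(v0, air_density, rho0=1.225):
    """Scale sea-level terminal fall speeds v0 (m/s) by (rho0 / rho) ** 0.4."""
    return np.asarray(v0) * (rho0 / np.asarray(air_density)) ** 0.4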
We've had a request internally to add the ability to calculate the reflectivity weighted doppler velocity from the DSD. Two things are needed:
Not sure I'll implement the second just yet, but the first should be straightforward.
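For the reflectivity-weighted fall velocity, a sketch of the usual Rayleigh (D^6) weighting over the measured bins; the argument names are placeholders:

import numpy as np

def reflectivity_weighted_velocity(Nd, diameter, spread, velocity):
    """Vz = sum(v(D) D^6 N(D) dD) / sum(D^6 N(D) dD), per time step."""
    weighted = velocity * diameter ** 6 * Nd * spread
    reflectivity = diameter ** 6 * Nd * spread
    return np.sum(weighted, axis=-1) / np.sum(reflectivity, axis=-1)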
The front page readme.md needs updating. It should include some of the new functionality, as well as just being better organized.
Good beginner issue.
Hello Sir,
In the calculation of the DSD I don't understand the term "spread"; is it the difference between two diameter intervals?

def conv_md_to_nd(self, Nd):
    F = 0.005
    t = 30.0
    v = 9.65 - 10.3 * np.exp(-0.6 * self.diameter)
    return np.divide(Nd, (F * t * np.multiply(v, self.spread)))

Can you explain?
We have rudimentary support for the droplet spectrum (rather than just N(D)). I'll be expanding this out to help with some support for various ARM VAPs. There will be three main components to this.