mit-lcp / downcast
Tools for unpacking and converting data from the DWC system
License: GNU General Public License v3.0
Downcast
--------

This repository contains tools for processing and converting data from the DWC system into WFDB and other open formats.

Requirements
------------

Python 3.4 or later is required.

A Unix-like platform is required. Debian and CentOS have been tested; Mac OS might work as well. This package will not work on Windows.

For processing data in BCP format, the ply package is required.

For processing data directly from SQL Server, the pymssql package is required. (This package is now mostly abandoned and should probably be replaced with a different backend.)

Quick start
-----------

If you have access to the demo DWC database, download and unpack these files (about 30 GB uncompressed).

You will then need to create a "server.conf" file, which should look like this:

    [demo]
    type = bcp
    bcp-path = /home/user/dwc-demo

(where /home/user/dwc-demo is the directory containing "Alert.dat", "Alert.fmt", etc.) See server.conf.example for other examples.

The demo database spans the time period from 1:00 AM EDT on October 31, 2004, to midnight EST on November 1.

To parse and convert a slice of the data (say, from 10:00 to 10:05 AM), first we initialize an output directory and set the starting time:

    $ ./downcast.py --init --server demo \
        --output-dir /home/user/dwc-test-output \
        --start "2004-10-31 10:00:00.000 -05:00"

Then run a batch conversion while specifying the end time:

    $ ./downcast.py --batch --server demo \
        --output-dir /home/user/dwc-test-output \
        --end "2004-10-31 10:05:00.000 -05:00"

If we wanted to keep going, we could run the same --batch command again, increasing the end timestamp each time. We don't need to specify the starting timestamp for --batch, since the "current" timestamp is saved automatically.

To "finalize" the output (and forcibly truncate all patient records at the specified end time), we use the --terminate option. This wouldn't be done for a real database conversion, but it's useful for a simple test:

    $ ./downcast.py --batch --server demo \
        --output-dir /home/user/dwc-test-output \
        --end "2004-10-31 10:05:00.000 -05:00" \
        --terminate

This should result in a bunch of patient records in WFDB format, stored in /home/user/dwc-test-output.
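The repeated --batch runs described above can be scripted. The following is a minimal sketch, not part of downcast: the helpers `dwc_time` and `batch_commands` are hypothetical, and they only build the command lines (using the flags shown in the quick start) rather than running them.

```python
import shlex
from datetime import datetime, timedelta, timezone


def dwc_time(t):
    """Format an aware datetime as e.g. '2004-10-31 10:05:00.000 -05:00'."""
    offset_min = int(t.utcoffset().total_seconds()) // 60
    sign = '-' if offset_min < 0 else '+'
    hh, mm = divmod(abs(offset_min), 60)
    return '%s.%03d %s%02d:%02d' % (t.strftime('%Y-%m-%d %H:%M:%S'),
                                    t.microsecond // 1000, sign, hh, mm)


def batch_commands(server, output_dir, start, end, step):
    """Yield one './downcast.py --batch' argv list per time step.

    Each run's --end is `step` later than the previous one, capped at `end`.
    No --start is passed, since the "current" timestamp is saved by --init
    and by each earlier --batch run.
    """
    t = start
    while t < end:
        t = min(t + step, end)
        yield ['./downcast.py', '--batch', '--server', server,
               '--output-dir', output_dir, '--end', dwc_time(t)]


if __name__ == '__main__':
    est = timezone(timedelta(hours=-5))
    for argv in batch_commands('demo', '/home/user/dwc-test-output',
                               datetime(2004, 10, 31, 10, 0, tzinfo=est),
                               datetime(2004, 10, 31, 10, 15, tzinfo=est),
                               timedelta(minutes=5)):
        # Quote each argument so the printed line is safe to paste into a shell.
        print(' '.join(shlex.quote(a) for a in argv))
```

Each argv list could instead be passed to `subprocess.run(argv, check=True)` so that the loop stops at the first failed run.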
In some cases, the stated sample range for a signal (scale_lower/scale_upper) is flat-out wrong. The ranges stated for ECG signals, in particular, are often wrong.
WaveSampleHandler should calculate and report an accurate sample range (adcres/adczero) for each segment. This is required in order to correctly convert the record to other formats (e.g., using wfdb2mat).
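One way to derive adcres/adczero from the data itself, rather than from scale_lower/scale_upper, is to scan each segment's samples. A minimal sketch, assuming the samples are available as a list of integers; `sample_range` is a hypothetical helper, and choosing adczero as the midpoint of the observed range is an assumption rather than anything downcast currently does.

```python
def sample_range(samples):
    """Return (adcres, adczero) for the integer samples of one segment.

    adczero is the midpoint of the observed range (rounded up), and adcres
    is the smallest number of bits b such that the span
    [adczero - 2**(b-1), adczero + 2**(b-1) - 1] contains every sample.
    `samples` must be a non-empty sequence; an empty one raises ValueError.
    """
    lo, hi = min(samples), max(samples)
    adczero = (lo + hi + 1) // 2
    adcres = 1
    while not (adczero - 2 ** (adcres - 1) <= lo
               and hi <= adczero + 2 ** (adcres - 1) - 1):
        adcres += 1
    return adcres, adczero


print(sample_range([0, 1023]))   # (10, 512): a full 10-bit range
```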
If --output-dir is specified as a relative path, e.g.:
downcast.py --init --server demo --output-dir example-output \
    --start '2004-10-31 10:00:00.000 -05:00'
downcast.py --batch --server demo --output-dir example-output \
    --end '2004-10-31 10:05:00.000 -05:00' --terminate
then something in the finalization process crashes:
  File "/home/benjamin/downcast/downcast/subprocess.py", line 283, in _main1
    self.handler.flush()
  File "/home/benjamin/downcast/downcast/dispatcher.py", line 151, in flush
    self._handler_flush(h)
  File "/home/benjamin/downcast/downcast/dispatcher.py", line 313, in _handler_flush
    handler.flush()
  File "/home/benjamin/downcast/downcast/output/waveforms.py", line 156, in flush
    self.archive.flush()
  File "/home/benjamin/downcast/downcast/output/archive.py", line 165, in flush
    rec.flush(self.deterministic_output)
  File "/home/benjamin/downcast/downcast/output/archive.py", line 271, in flush
    deterministic = deterministic)
  File "/home/benjamin/downcast/downcast/output/archive.py", line 297, in _write_state_file
    os.rename(tmpfname, fname)
FileNotFoundError: [Errno 2] No such file or directory: 'example-output/3d/demo_3d97e525-d794-4aa8-82e8-8821b8da12b4_20041031-1500/__phi_properties.tmp' -> 'example-output/3d/demo_3d97e525-d794-4aa8-82e8-8821b8da12b4_20041031-1500/_phi_properties'
The crash does not occur if the output directory is an absolute path.
This is bizarre. Nothing in the entire package calls chdir, so why should it matter whether the path is absolute or relative?
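The traceback shows _write_state_file writing a temporary file and then renaming it over the final name. Below is a minimal sketch of that write-then-rename pattern with the directory resolved to an absolute path once, up front. It is only a hypothetical workaround (the helper name and signature are not downcast's), and it does not explain why the relative path fails in the first place.

```python
import os


def write_state_file(dirname, fname, data):
    """Atomically replace dirname/fname with the bytes in `data`.

    The directory is made absolute before anything else, so the temporary
    file and the rename target are fixed paths that no longer depend on
    the process's current working directory.
    """
    dirname = os.path.abspath(dirname)
    final = os.path.join(dirname, fname)
    tmp = os.path.join(dirname, '_' + fname + '.tmp')  # e.g. __phi_properties.tmp
    with open(tmp, 'wb') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # make sure the data is on disk before renaming
    os.replace(tmp, final)     # atomic on POSIX when both are on one filesystem
    return final
```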
In some strange cases, two simultaneous waveforms may have the same label. WFDB requires each signal in a multi-segment record to have a unique name.
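A possible way to satisfy this is to rename later duplicates with a numeric suffix. The sketch below is hypothetical: both the helper name and the 'LABEL.N' naming scheme are assumptions.

```python
def unique_labels(labels):
    """Return `labels` with duplicates disambiguated by a numeric suffix.

    The first occurrence keeps its name; later ones become 'NAME.2',
    'NAME.3', ..., skipping any name that is already in use.
    """
    used = set()
    counts = {}
    result = []
    for label in labels:
        name = label
        n = counts.get(label, 1)
        while name in used:
            n += 1
            name = '%s.%d' % (label, n)
        counts[label] = n
        used.add(name)
        result.append(name)
    return result


print(unique_labels(['II', 'V', 'II']))   # ['II', 'V', 'II.2']
```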
Currently, WaveSampleHandler sets all signal checksums to zero. The checksums should instead be set to the sum of all samples in the segment, stored as a 16-bit signed value as WFDB expects.
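Because the checksum field in a WFDB header is a 16-bit signed value, the raw sum has to be wrapped. A minimal sketch (the function name is hypothetical):

```python
def wfdb_checksum(samples):
    """Sum of all samples, wrapped to a 16-bit signed integer."""
    total = sum(samples) & 0xFFFF            # keep the low 16 bits
    return total - 0x10000 if total >= 0x8000 else total


print(wfdb_checksum([32767, 1]))   # -32768 (wraps around)
```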
At the fall daylight-saving transition, when the clock switches from wrongly-labelled winter time to correctly-labelled winter time, there is often a small negative clock adjustment (after fixing the broken timestamps).
For example, the raw data might look like this:
    TimeStamp                        SequenceNumber
    2020-11-01 01:59:59.123 -05:00   657489599123
    2020-11-01 01:00:04.218 -05:00   657489604243
Clearly there's no discontinuity here; the first message is mislabelled as -05:00 when it should be -04:00. But even after correcting that, the delta in TimeStamp is only 5095 milliseconds, vs. a delta in SequenceNumber of 5120.
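The arithmetic in this example can be checked directly. In this sketch (helper names hypothetical; parsing an offset written with a colon via `%z` needs Python 3.7 or later), the first timestamp is moved back one hour, which is the same instant as relabelling it -04:00, and the TimeStamp delta is compared with the SequenceNumber delta, treating SequenceNumber as milliseconds.

```python
from datetime import datetime, timedelta


def parse_dwc(ts):
    """Parse a timestamp like '2020-11-01 01:59:59.123 -05:00'."""
    return datetime.strptime(ts, '%Y-%m-%d %H:%M:%S.%f %z')


def clock_adjustment_ms(ts1, seq1, ts2, seq2, fix1=timedelta(0)):
    """Return (TimeStamp delta - SequenceNumber delta) in milliseconds.

    fix1 is a correction added to the first timestamp before comparing.
    """
    dt = parse_dwc(ts2) - (parse_dwc(ts1) + fix1)
    return dt / timedelta(milliseconds=1) - (seq2 - seq1)


adj = clock_adjustment_ms('2020-11-01 01:59:59.123 -05:00', 657489599123,
                          '2020-11-01 01:00:04.218 -05:00', 657489604243,
                          fix1=timedelta(hours=-1))
print(adj)   # -25.0: TimeStamp advanced 5095 ms, SequenceNumber 5120
```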
There even seems to be a clock adjustment in those rare cases where DWC labels the summer timestamps correctly.
Normally a negative clock adjustment creates ambiguity, but in this case it might be possible to disambiguate based on the time zone offset. This needs further investigation.
(There is usually no adjustment when the clock switches from correctly-labelled summer time to wrongly-labelled winter time. So the one-hour correction is right.)
When a patient is discharged, we need to mark the record as finalized.
Currently, we finalize a record automatically when there is a gap - i.e., some period of time when no new messages are seen, then a new message appears - but when the patient is discharged, there are no new messages, so this never happens.
(For testing, we can force all records to be finalized by using the --terminate argument, but that's no good for "real" conversion.)
The tricky thing is that since we are processing messages in parallel, it's hard to say which worker process is responsible for finalizing the record.
One way to deal with this would be to periodically find the earliest unprocessed message in any queue, and call its timestamp T_next. Then any unfinalized records whose last processed message is earlier than (T_next - split_interval) should be finalized.
I don't think there's a good way to do this without stopping and then restarting all of the worker processes, but we don't need to do so frequently - doing it once for every 3 hours of data should be quite adequate.
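Once the queue heads and the per-record timestamps have been gathered (the hard part, as noted above), the T_next check itself is simple. A minimal sketch with hypothetical names and data layout:

```python
def records_to_finalize(queue_heads, last_seen, split_interval):
    """Return the IDs of records that can safely be finalized.

    queue_heads: timestamp of the earliest unprocessed message in each
        queue, or None if that queue is empty.
    last_seen: dict mapping each unfinalized record ID to the timestamp of
        its last processed message.
    split_interval: timedelta; a record with no messages for longer than
        this is treated as complete.
    """
    pending = [t for t in queue_heads if t is not None]
    if not pending:
        # Nothing queued: we cannot yet tell whether more data will arrive.
        return []
    t_next = min(pending)
    cutoff = t_next - split_interval
    return sorted(rec for rec, t in last_seen.items() if t < cutoff)
```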
Some numeric values (in particular, NBP) have multiple time values, and we need to understand what they mean and how to use them.
TimeStamp seems to have one-second resolution.
SequenceNumber seems to have 5120-ms resolution.
Often the two values are wildly different (TimeStamp could be hours earlier).
Often the same measurement appears multiple times with the same TimeStamp and differing SequenceNumber.
I am guessing, actually, that the TimeStamp is pretty meaningless - that it refers to the time when the measurement was first "requested" rather than when it was actually performed. I'm guessing that the SequenceNumber tells us when the measurement was reported, which might be a few seconds after it was measured.
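If that guess is right, and the repeats are re-reports of a single measurement, one option is to keep only the earliest-reported copy. This is a hypothetical sketch: the tuple layout and the rule that rows sharing TimeStamp, label and value are duplicates are assumptions, not established DWC semantics.

```python
def first_reports(rows):
    """Keep the earliest-reported copy of each repeated measurement.

    rows: iterable of (timestamp, sequence_number, label, value) tuples.
    Rows sharing timestamp, label and value are treated as one measurement,
    and only the copy with the smallest SequenceNumber is kept.
    """
    best = {}
    for ts, seq, label, value in rows:
        key = (ts, label, value)
        if key not in best or seq < best[key][1]:
            best[key] = (ts, seq, label, value)
    # Return the surviving rows in reporting order.
    return sorted(best.values(), key=lambda r: r[1])
```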
It might be helpful to hear from somebody who is familiar with using these machines:
is NBP measured automatically (on a schedule) or does the nurse press a button to initiate the measurement? Or both?
how long does it usually take (from inflating the pressure cuff, to deflating it, to when the NBP measurements appear on screen)?
how long do the values stay on screen afterwards?
When a record is finalized, we need to generate a multi-segment header file for it.
Up until now I've been doing this by hand using a hacked version of 'wfdbjoin', but this should ideally be done by the WaveSampleHandler itself.
This requires re-reading the segment header files, then creating a layout header with the composite signal information and a master header containing the names and lengths of the segments (and gaps, if any).
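In WFDB's multi-segment format, the master header's first line gives the record name with the segment count, then the signal count, sampling frequency and total length; each following line gives one segment's name and length. A gap is written as a null segment named '~', and a variable-layout record begins with a zero-length layout segment. A minimal sketch of generating that text (helper name and arguments hypothetical; it does not re-read the segment headers or build the layout header itself):

```python
def master_header(record, nsig, fs, segments, layout=None):
    """Return the text of a WFDB multi-segment (master) header.

    segments: list of (name, nsamp) pairs in time order; use '~' as the
        name for a gap (null segment).
    layout: optional name of the layout header, emitted as a zero-length
        first segment for a variable-layout record.
    """
    segs = list(segments)
    if layout is not None:
        segs.insert(0, (layout, 0))
    total = sum(n for _, n in segs)
    lines = ['%s/%d %d %g %d' % (record, len(segs), nsig, fs, total)]
    lines += ['%s %d' % (name, n) for name, n in segs]
    return '\n'.join(lines) + '\n'
```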