Comments (5)
What's actually the problem you're describing:
- Not using certain segments of the data?
- npTDMS not fast enough?
In the first case, your suggestion is valid.
In the second case, lazy loading will actually slow down npTDMS. We've got issue #29 for that though, which I should finish some time in the future.
from nptdms.
Let me first add that I tend to deal with large (>100 MB, <2 GB) TDMS files. Typically only one or two groups (consisting of maybe 5 channels each) account for most of the size. In data processing, often only two channels are used at a time.
This already gives the first use case: selectively load less than 2/5 of the data. [If I understand the TDMS structure correctly, once you've read the metadata from one segment you know the location of a channel's raw data in that segment, and can therefore selectively load only that data. At other times, not all segments are needed, but you usually only know that after you've loaded the whole thing.] So I'd imagined a speedup in loading (or not?) and lower memory consumption.
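For what it's worth, the idea in the brackets can be sketched with a toy layout. This is not the real TDMS format; the offsets, the helper name, and the fixed float64 type are all made up for illustration:

```python
import io
import struct

def read_channel_from_segment(f, data_start, channel_offset, n_values):
    # Hypothetical helper: if the segment metadata tells you where a
    # channel's slice begins inside the segment's raw data block, a seek
    # plus a short read replaces reading the whole segment.
    # Assumes contiguous little-endian float64 values.
    f.seek(data_start + channel_offset)
    raw = f.read(8 * n_values)
    return list(struct.unpack("<%dd" % n_values, raw))

# Toy "segment": two channels of 3 float64 values each, stored contiguously.
buf = io.BytesIO(struct.pack("<6d", 1.0, 2.0, 3.0, 10.0, 20.0, 30.0))
# The second channel starts 24 bytes (3 values * 8 bytes) into the block.
values = read_channel_from_segment(buf, data_start=0, channel_offset=24, n_values=3)
print(values)  # [10.0, 20.0, 30.0]
```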
I'd guess that loading only the metadata is very quick. If you imagine a user interface where you can select between many TDMS files (which may not have the most descriptive filenames), you could access the TDMS comment and the available channels without loading the data. (Update: strictly, this does not require selective/lazy loading.)
Update: Don't get the question wrong; I posted this here because I'm not aware of any other place to discuss these things for npTDMS. I want to see whether there's a need for this, or whether I have any major misconceptions about the file format, which (judging by your answer) might be the case.
You're saying you're using two channels "at a time", but that doesn't mean you're not using all of them ;-)
I can see that lazy loading could be really useful. I'm not planning to implement it (I've more or less quit the engineering world. Not enough free software, too much Windows bloat), but I'm willing to finish my C++ implementation.
If this issue really is about wanting things faster, I think it's easier to finish the C++ implementation than to implement lazy loading. My tests on ~1 GB files gave performance increases of a factor of 5, without even optimizing.
You might, for example, compare the same two channels across different measurement files and not bother with the other ones. (Or you open a file and remember "oh yeah, this measurement did not go so well" after looking at just one channel :p)
In any case, I'm looking forward to your C++ implementation.
And I'll probably have a look at loading metadata separately (but not at loading channels selectively). It looks quite easy to implement, and the performance cost is probably low.
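A hedged sketch of what "loading metadata separately" could look like, using a toy segmented format (the header layout here is invented and far simpler than a real TDMS segment lead-in): one pass indexes the segments without touching the payload, and the data is only read on demand.

```python
import io
import struct

SEGMENT_HEADER = struct.Struct("<II")  # invented header: (n_values, reserved)

def index_segments(f):
    # Pass 1: walk the file reading only each segment's small header,
    # seeking past the raw data; remember where each data block starts.
    index = []
    while True:
        header = f.read(SEGMENT_HEADER.size)
        if not header:
            break
        n_values, _ = SEGMENT_HEADER.unpack(header)
        index.append((f.tell(), n_values))
        f.seek(8 * n_values, io.SEEK_CUR)  # skip the float64 payload
    return index

def load_segment(f, entry):
    # Pass 2: load one segment's data only when it is actually needed.
    start, n_values = entry
    f.seek(start)
    return struct.unpack("<%dd" % n_values, f.read(8 * n_values))

# Toy file with two segments of float64 data.
buf = io.BytesIO(
    SEGMENT_HEADER.pack(2, 0) + struct.pack("<2d", 1.0, 2.0)
    + SEGMENT_HEADER.pack(1, 0) + struct.pack("<1d", 5.0)
)
index = index_segments(buf)
print(load_segment(buf, index[1]))  # (5.0,)
```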
I looked into implementing this a while back but never got around to it, and also found it wasn't that straightforward, due to the way the metadata is stored in segments and the need to support interleaved data (although it might be best to force loading all data when it is interleaved, as performance would take a big hit in that case). It should definitely be doable though, and would be useful.
npTDMS already loads the metadata before loading the actual data, so it can allocate a complete array for a channel once and then read directly into that array. For certain TDMS files with many very small segments this turns out not to work that well, due to the large amount of memory taken up by the metadata of each segment (e.g. see #19), but for most files it seems to work fairly well. If you run tdmsinfo with the --debug option, it will show the time taken to read the metadata and then the time taken to read the actual data. It would probably be useful to show the memory usage of each too.
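The "allocate once, then read directly into the array" pattern described above can be sketched in plain Python (the chunk sizes are made up, and a bytearray stands in for the NumPy array npTDMS actually uses):

```python
import io

# Per-segment chunk sizes (in bytes) for one channel, known from the
# metadata pass; their sum lets us allocate the buffer exactly once.
chunk_sizes = [4, 4, 2]
channel_buf = bytearray(sum(chunk_sizes))
view = memoryview(channel_buf)

f = io.BytesIO(bytes(range(10)))  # stands in for the raw data stream
pos = 0
for size in chunk_sizes:
    # Read each segment's chunk straight into its slice of the final
    # buffer: no per-segment allocations and no concatenation at the end.
    f.readinto(view[pos:pos + size])
    pos += size

print(bytes(channel_buf))
```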