Comments (11)
Does it need to do a conversion before saving? The problem is that it gets too large for the MongoDB record size, which Romanesco is using for its task queue. If there are no conversions it should work on larger file sizes.
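For reference, the limit in question is MongoDB's 16 MB cap on a single BSON document: a queue record that inlines the whole dataset fails past that point. A minimal sketch of the failure mode, using JSON size as a rough stand-in for BSON size (the constant and helper are illustrative, not Romanesco code):

```python
import json

# MongoDB rejects any single BSON document larger than 16 MB, so a
# task-queue record that embeds the full dataset hits this wall.
MONGO_MAX_DOC_BYTES = 16 * 1024 * 1024

def fits_in_mongo_record(payload, overhead=4096):
    """Rough check that a task payload would fit in one MongoDB document.
    JSON length is used as an approximation of the BSON length."""
    size = len(json.dumps(payload).encode("utf-8"))
    return size + overhead < MONGO_MAX_DOC_BYTES

small_task = {"inputs": {"table": {"format": "rows", "data": {"rows": []}}}}
print(fits_in_mongo_record(small_task))  # True
```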
from flow.
Is this happening inside romanesco or inside girder? Where do you see the error message?
The output matrix does sit safely in local storage after an analysis finishes; it is a table:rows object.
So the size limitation only applies during conversion/processing jobs. Oh, I was able to save it as rows.json just now.
To Zach's question, I saw the error when I tried to save a large table:rows object as a CSV using the Flow UI. Jeff got it right that it was during Romanesco conversion. Won't I bump into this problem any time I need to run an analysis on the big dataset, though?
Will converting the assetstore to files have any beneficial effect on this, or is this a size limit on the objects Romanesco can work on?
The real fix for this is that we shouldn't be passing data objects through the message queue, but rather downloading them via input specs, perhaps using the girder_io utility in romanesco.
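A sketch of that idea: the spec that travels through the queue carries only a reference to the data in Girder, and the worker downloads the bytes itself before running. The field names and the fetch callable below are illustrative, not the actual girder_io schema:

```python
def resolve_input(spec, fetch):
    """Return an input's data: inline 'data' if present, otherwise
    download it with the supplied fetcher (e.g. a Girder client)."""
    if "data" in spec:
        return spec["data"]
    return fetch(spec["api_url"], spec["file_id"])

# Only this small dict would traverse the MongoDB-backed queue;
# the dataset itself never enters a queue record.
reference_spec = {
    "type": "table",
    "format": "csv",
    "api_url": "https://example.org/girder/api/v1",
    "file_id": "abc123",
}
```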
It's a Romanesco limitation, not a Girder one. Perhaps if we used a different message queue instead of MongoDB this problem would go away. We could also enforce that data is stored to Girder directly on upload (and store things directly to Girder after running an analysis) in original format before running analyses on them (Minerva has this sort of approach). However, this would not allow the "no login needed" method of running analyses that we wanted for Arbor.
Odd that it is small enough to be sent back to the browser as table:rows. The same data in table:rows.json format must expand enough to go beyond the size limitation (16MB I believe?).
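The expansion is easy to demonstrate: in a rows-style encoding every row repeats the column names, so the same table serialized as JSON is several times larger than as CSV. A small illustration (the field names are made up):

```python
import csv, io, json

# Build a small table and compare CSV size to rows.json size.
fields = ["species", "branch_length", "depth"]
rows = [{"species": "sp%d" % i, "branch_length": i * 0.1, "depth": i}
        for i in range(1000)]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
csv_bytes = len(buf.getvalue().encode("utf-8"))

# Every row dict repeats "species", "branch_length", "depth".
json_bytes = len(json.dumps({"fields": fields, "rows": rows}).encode("utf-8"))
print(csv_bytes, json_bytes)  # the JSON encoding is markedly larger
```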
16MB seems like a decent place to start enforcing login - perhaps we change it so data goes into a working area if the user is logged in, and if logged out they get a nicer message in this instance instructing them to log in to use larger file sizes.
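The proposed policy might look something like this sketch (names and the exact threshold behavior are assumptions, not an implementation):

```python
MAX_ANON_BYTES = 16 * 1024 * 1024  # MongoDB's per-document limit

def choose_destination(size_bytes, logged_in):
    """Route data: logged-in users get a Girder working area with no
    queue-size constraint; anonymous users stay under the limit or are
    asked to log in."""
    if logged_in:
        return "working_area"
    if size_bytes < MAX_ANON_BYTES:
        return "task_queue"
    raise ValueError("Please log in to work with files larger than 16 MB.")
```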
This does sound like a good strategy. So the near-term workaround at the "analysis level" is to pass proxies between analyses (a table that includes the filename where the full data is stored). This seems like the only quick fix, since the assetstore architecture has no effect on the problem; it is a message-queue thing.
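The proxy idea in miniature: the full table lives on disk (or in Girder), and only a small record naming its location passes between analysis steps. A sketch, with made-up helper names:

```python
import csv, json, os, tempfile

def store_table(fields, rows, directory):
    """Write the full table to disk and return a tiny proxy record
    that can safely travel through the message queue."""
    path = os.path.join(directory, "full_table.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    # The proxy stays small no matter how large the real table is.
    return {"proxy": True, "path": path, "fields": fields}

def load_table(proxy):
    """Resolve a proxy back into the full table."""
    with open(proxy["path"], newline="") as f:
        return list(csv.DictReader(f))

with tempfile.TemporaryDirectory() as d:
    proxy = store_table(["a", "b"], [{"a": "1", "b": "2"}], d)
    print(load_table(proxy))  # [{'a': '1', 'b': '2'}]
```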
Correct, changing the Girder assetstore should not affect this issue.
I could also make a quick conversion utility script to support a workaround of "download, convert, upload, save without conversion" if that would be helpful. Local-machine conversions can of course go up to any size. Also, note that larger data can pass between steps of a workflow just fine - it's the final output that has size restrictions.
If this isn't too much trouble, I'd appreciate the example of how this might be done. I'll think about the proxy solution at the same time.
I realized my suggestion has a limitation: a downloadable file needs to be in a serialized format, so table:rows cannot be downloaded (relates to that other issue); it is "stuck" in the browser :(
Here is the conversion script anyway if it helps:
https://gist.github.com/jeffbaumes/2a8df131b158b0b6c0a1
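As a rough idea of what such a local conversion looks like, here is a sketch of a CSV-to-rows.json converter. The output layout ({"fields": [...], "rows": [...]}) is my guess at the rows format; the gist linked above is the authoritative script:

```python
import csv, json, os, tempfile

def csv_to_rows_json(csv_path, json_path):
    """Convert a CSV file to a rows-style JSON document locally,
    so it can be uploaded and saved without server-side conversion."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        table = {"fields": reader.fieldnames, "rows": list(reader)}
    with open(json_path, "w") as f:
        json.dump(table, f)

# Quick demonstration on a throwaway file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.csv")
dst = os.path.join(workdir, "out.json")
with open(src, "w") as f:
    f.write("name,value\nfoo,1\n")
csv_to_rows_json(src, dst)
with open(dst) as f:
    print(json.load(f))  # {'fields': ['name', 'value'], 'rows': [{'name': 'foo', 'value': '1'}]}
```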