Comments (4)
Hi, that could be a nice enhancement indeed. But just curious: if there is an error, how would you investigate the cause if you can't see the log files?
from dsbulk.
Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed. The content of operations.log should go to stdout for the same reason (maybe it already does).
I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded. It's not a huge problem since they are empty, but it seems like the program should clean them up?
Thanks
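To make the request above concrete, here is a minimal sketch of what a consuming program could do if the tool wrote its operational output to stdout and its error details to stderr. It is an illustration only: `run_and_relog` is a hypothetical helper, not part of DSBulk, and whether DSBulk actually emits this information on those streams is an assumption that the thread does not confirm.

```python
import logging
import subprocess
import threading


def run_and_relog(command, logger):
    """Run `command` and forward its output to `logger`.

    stdout lines are logged at INFO and stderr lines at ERROR.
    Returns the exit code of the child process.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )

    def pump(stream, level):
        # Forward each line as soon as it arrives, without the trailing newline.
        for line in stream:
            logger.log(level, line.rstrip("\n"))
        stream.close()

    # One thread per stream so that neither pipe fills up and blocks the child.
    threads = [
        threading.Thread(target=pump, args=(process.stdout, logging.INFO)),
        threading.Thread(target=pump, args=(process.stderr, logging.ERROR)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return process.wait()
```

A caller would pass the full command line as a list, for example `run_and_relog(["dsbulk", "load", ...], logging.getLogger("dsbulk"))`, where the arguments after `load` are placeholders for whatever options the job needs. Reading the two streams separately keeps each log line intact, although the relative order of stdout and stderr lines is not guaranteed.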
> Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed.
That would be an option, but DSBulk creates many similar files for different kinds of errors. It would be a bit challenging to redirect everything to stderr (garbled contents).
> I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded.
Now that's a first. Could you please give me a simple reproduction case? This is definitely not normal; DSBulk should not write to the home directory at all.
I haven't been able to reproduce the empty files issue, so it may have been related to some intermittent problems I was having with the process getting killed by out-of-memory errors. Now that I have that resolved, I'm not seeing any files left behind.
One unrelated question: the project description mentions "2-4x faster" than other bulk tools. Is there any way to know what that should translate into in real numbers? I'm seeing between 1,000 and 2,000 rows/sec and I don't know whether that's slow or fast. I imagine it's related to my Cassandra cluster performance.
Thanks