Comments (4)

adutra commented on August 11, 2024

Hi, that could be a nice enhancement indeed. But just curious: if there is an error, how would you investigate the cause if you can't see the log files?

from dsbulk.

bobh66 commented on August 11, 2024

Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed. The content of operations.log should go to stdout for the same reason (maybe it already does).
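To illustrate what the consuming side could look like, here is a minimal sketch, assuming the loader is launched as a child process and writes its summary to stdout and its errors to stderr. The `run_and_log` helper and the logger name are hypothetical, and the demo command is a stand-in child process rather than a real DSBulk invocation.

```python
import logging
import subprocess
import sys


def run_and_log(cmd, logger):
    """Run cmd and forward its stdout to INFO and its stderr to ERROR records.

    Hypothetical helper: cmd is any argument list, not a specific DSBulk command.
    """
    # Capture both streams as text once the child process has exited.
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        logger.info(line)
    for line in result.stderr.splitlines():
        logger.error(line)
    return result.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in child process; replace with the real loader command line.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('operation summary'); print('bad record', file=sys.stderr)",
    ]
    run_and_log(demo, logging.getLogger("loader"))
```

This version buffers all output until the child exits, which keeps the sketch short; a long-running load would more likely read the two pipes incrementally.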

I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded. It's not a huge problem since they are empty, but it seems like the program should clean them up?

Thanks

adutra commented on August 11, 2024

Ideally the error information from mapping-errors.log would print to stderr as logged ERROR messages so that the consuming program can redirect it to its own logging as needed.

That would be an option, but DSBulk creates many similar files for different kinds of errors. It would be a bit challenging to redirect everything to stderr (garbled contents).

I'm also seeing empty files left behind in the "home" directory with the names of the tables that are being loaded.

Now that's a first. Could you please give me a simple reproduction case? This is definitely not normal; DSBulk should not write to the home directory at all.

bobh66 commented on August 11, 2024

I haven't been able to reproduce the empty files issue, so it may have been related to some intermittent problems I was having with the process getting killed by out-of-memory errors. Now that I have that resolved, I'm not seeing any files left behind.

One unrelated question: the project description mentions "2-4x faster" than other bulk tools. Is there any way to know what that should translate into in real numbers? I'm seeing between 1000 and 2000 rows/sec and I don't know if that's slow or fast. I imagine it's related to my Cassandra cluster performance.

Thanks