
Comments (4)

msmygit commented on August 11, 2024

Here is the testing that I did for this ticket.

Table Definition

CREATE TABLE test.dsbulk (
    pk int PRIMARY KEY,
    c1 text
);

Input CSV file contents. Notice that the first value is wrapped in single quotes and the second one isn't:

$ cat dsbulk.csv
pk,c1
1,'First line.\nSecond line.'
2,First line.\nSecond line.
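The single quotes in row 1 are ordinary characters as far as CSV parsing goes, which is why they show up in the stored value. Below is a minimal sketch of that rule. It assumes the CSV quote character is the double quote (which I believe is DSBulk's default for `connector.csv.quote`). `parseLine` is a hypothetical helper, not DSBulk code, and it ignores records that span multiple physical lines:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvQuoteSketch {
    // Hypothetical helper: splits one CSV record on commas, treating only
    // the given quote char as a quote. Multi-line records are not handled.
    static List<String> parseLine(String line, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c); // includes single quotes and backslashes
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Row 1 of dsbulk.csv as a Java literal: "\\n" is a backslash followed by 'n'
        String row1 = "1,'First line.\\nSecond line.'";
        System.out.println(parseLine(row1, '"').get(1));
    }
}
```

With `"` as the quote character, the second field keeps both single quotes and the two-character `\n` sequence, matching what CQLSH shows for pk 1.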

DSBulk version

$ ./dsbulk --version
DataStax Bulk Loader v1.7.0

DSBulk execution command and output

$ ./dsbulk load -k test -t dsbulk -header true -url ./dsbulk.csv 
Operation directory: /Users/madhavan.sridharan/Data/Tools/DSBulk/dsbulk-1.7.0/bin/logs/LOAD_20201014-143653-997467
total | failed | rows/s | p50ms | p99ms | p999ms | batches
    2 |      0 |      7 |  5.89 |  8.59 |   8.59 |    1.00
Operation LOAD_20201014-143653-997467 completed successfully in less than one second.
Last processed positions can be found in positions.txt

Result from CQLSH

cqlsh:test> select * from test.dsbulk ;

 pk | c1
----+-----------------------------
  1 | 'First line.\nSecond line.'
  2 |   First line.\nSecond line.

(2 rows)

DSBulk unload operation activity

$ ./dsbulk unload -k test -t dsbulk -header true -url ./dsbulk_unload
Operation directory: /Users/madhavan.sridharan/Data/Tools/DSBulk/dsbulk-1.7.0/bin/logs/UNLOAD_20201014-144238-583471
total | failed | rows/s | p50ms | p99ms | p999ms
    2 |      0 |      4 |  7.62 | 12.98 |  12.98
Operation UNLOAD_20201014-144238-583471 completed successfully in less than one second.

Output of the unloaded CSV files

$ ls dsbulk_unload/output-00000*
output-000001.csv  output-000002.csv

$ cat dsbulk_unload/output-00000*.csv
pk,c1
2,First line.\nSecond line.
pk,c1
1,'First line.\nSecond line.'

Java Program using Java Driver 4.9.0

I also tried inserting a record with DataStax Java Driver 4.9.0 using the code below and still couldn't reproduce this behavior:

import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;

public class DAT617 {
    public static void main(String... args) {
        // Change the contact point and local datacenter to match your environment
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("localhost", 9042))
                .withLocalDatacenter("dc1")
                .build()) {
            // "\\n" in Java source is a backslash followed by 'n' (two characters),
            // and CQL string literals do not interpret backslash escapes,
            // so no real line break is stored.
            session.execute("insert into test.dsbulk(pk,c1) values (3,'First line.\\nSecond line.')");
        }
    }
}

CQLSH output after inserting the third record into the test table

cqlsh:test> select * from test.dsbulk ;

 pk | c1
----+-----------------------------
  1 | 'First line.\nSecond line.'
  2 |   First line.\nSecond line.
  3 |   First line.\nSecond line.

(3 rows)

DSBulk unload command and output

$ ./dsbulk unload -k test -t dsbulk -header true -url ./dsbulk_unload
Operation directory: /Users/madhavan.sridharan/Data/Tools/DSBulk/dsbulk-1.7.0/bin/logs/UNLOAD_20201014-150212-234300
total | failed | rows/s | p50ms | p99ms | p999ms
    3 |      0 |      9 |  1.41 |  1.49 |   1.49
Operation UNLOAD_20201014-150212-234300 completed successfully in less than one second.

Contents of the unloaded CSV files:

$ cat dsbulk_unload/output-00000*.csv
pk,c1
1,'First line.\nSecond line.'
pk,c1
3,First line.\nSecond line.
pk,c1
2,First line.\nSecond line.

FWIW, I tested this against DSE 6.7.10. I don't think DSBulk is doing anything wrong here. Do we have a full, minimal reproducible example of this behavior?
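A reproducer for this would need values that contain real line breaks, not the two-character `\n` sequence used above. Here is a sketch that writes such an input file. The file name `dsbulk_real_newlines.csv` and the pk values 10 and 11 are hypothetical. Each value sits inside a double-quoted field so the CSV record stays valid. I haven't verified whether DSBulk's loader accepts multi-line quoted fields with its default settings:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteNewlineCsv {
    // Hypothetical rows: pk 10 holds a real LF and pk 11 a real CR LF,
    // each inside a double-quoted field.
    static String content() {
        return "pk,c1\n"
                + "10,\"First line.\nSecond line.\"\n"
                + "11,\"First line.\r\nSecond line.\"\n";
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("dsbulk_real_newlines.csv"); // hypothetical file name
        Files.write(target, content().getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Loading this file and then unloading the table, as above, would show whether real line breaks survive the round trip.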


adutra commented on August 11, 2024

This has been discussed internally and it appears there was just some misunderstanding about how newline characters are processed by CQLSH and DSBulk. All is good now.


ibeatmybrothers commented on August 11, 2024

Hi @adutra, I still see this issue in versions 1.7 and 1.8 running against a 5.1.14 cluster. The cqlsh COPY TO command copies the exact contents, but the dsbulk unload command interprets or omits escaped characters, such as \0xf and \r\n, when they are present in a text column.
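When comparing exports, it helps to separate two cases: a value containing the backslash sequence `\r\n` and a value containing real CR and LF characters. Below is a minimal sketch of generic RFC 4180-style field encoding, which is not DSBulk's actual writer. Under this rule the backslash sequence is written unchanged. A real CR LF is kept as-is but the field gets wrapped in double quotes. `encodeField` is a hypothetical helper:

```java
public class CsvWriteSketch {
    // Hypothetical helper: RFC 4180-style encoding. Quote a field only if it
    // contains the delimiter, a double quote, CR or LF; double embedded quotes.
    static String encodeField(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String escaped = "First line.\\r\\nSecond line."; // backslash sequences, no real line break
        String real = "First line.\r\nSecond line.";      // actual CR and LF characters
        System.out.println(encodeField(escaped)); // written unchanged, unquoted
        System.out.println(encodeField(real));    // wrapped in double quotes, CR LF kept
    }
}
```

A viewer that renders the second case may show two lines, which is easy to mistake for the escape sequence having been "interpreted".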


adutra commented on August 11, 2024

Hi @ibeatmybrothers, do you have a reproducer? We discussed this internally and I wasn't able to find any defect in how DSBulk exports such characters. Thanks!

(Note: the example reported above by @msmygit did not contain actual line breaks, only escaped \n sequences. DSBulk does not convert these in any way.)
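The distinction in the note can be checked directly on a value. Here is a small sketch with hypothetical helper names. It shows that the text from dsbulk.csv contains the two-character sequence (backslash, `n`) but no line break character. A value with an actual line break is the opposite:

```java
public class NewlineCheck {
    // Hypothetical helper: true if the value contains a real LF or CR character.
    static boolean hasRealLineBreak(String s) {
        return s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
    }

    // Hypothetical helper: true if the value contains a backslash followed by 'n'.
    static boolean hasEscapedNewline(String s) {
        return s.contains("\\n");
    }

    public static void main(String[] args) {
        String fromCsv = "First line.\\nSecond line.";   // what dsbulk.csv actually contains
        String withBreak = "First line.\nSecond line."; // a value with a real line break
        System.out.println(hasRealLineBreak(fromCsv) + " " + hasEscapedNewline(fromCsv));     // false true
        System.out.println(hasRealLineBreak(withBreak) + " " + hasEscapedNewline(withBreak)); // true false
    }
}
```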
