
Comments (3)

rfdavid commented on September 25, 2024

Also, the same issue occurs on the shortest-path-tests dataset:

(lldb) pro la -t
There is a running process, kill it and restart?: [Y/n] Y
Process 70589 exited with status = 9 (0x00000009)
Process 70804 launched: '/Users/rfdavid/Devel/waterloo/kuzu/build/debug/tools/shell/kuzu_shell' (arm64)
Process 70804 stopped
* thread #7, stop reason = EXC_BAD_ACCESS (code=1, address=0x3)
    frame #0: 0x0000000100831658 kuzu_shell`kuzu::storage::TableCopyExecutor::getListElementPos(l="[10,5][12,8][4,5][1,9][2][3,4,5,6,7][1][10,11,12,3,4,5,6,7][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5"..., from=1, to=4, copyDescription=0x0000600003504a70) at table_copy_executor.cpp:195:54
   180 	    const std::string& filePath) {
   181 	    std::shared_ptr<arrow::io::ReadableFile> infile;
   182 	    throwCopyExceptionIfNotOK(arrow::io::ReadableFile::Open(filePath).Value(&infile));
   183 	    std::unique_ptr<parquet::arrow::FileReader> reader;
   184 	    throwCopyExceptionIfNotOK(
   185 	        parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
   186 	    return reader;
   187 	}
   188
   189 	std::vector<std::pair<int64_t, int64_t>> TableCopyExecutor::getListElementPos(
   190 	    const std::string& l, int64_t from, int64_t to, const CopyDescription& copyDescription) {
   191 	    std::vector<std::pair<int64_t, int64_t>> split;
   192 	    int bracket = 0;
   193 	    int64_t last = from;
   194 	    for (int64_t i = from; i <= to; i++) {
-> 195 	        if (l[i] == copyDescription.csvReaderConfig->listBeginChar) {
   196 	            bracket += 1;
   197 	        } else if (l[i] == copyDescription.csvReaderConfig->listEndChar) {
   198 	            bracket -= 1;
   199 	        } else if (bracket == 0 && l[i] == copyDescription.csvReaderConfig->delimiter) {
   200 	            split.emplace_back(last, i - last);
   201 	            last = i + 1;
   202 	        }
   203 	    }
   204 	    split.emplace_back(last, to - last + 1);
   205 	    return split;
   206 	}
   207
   208 	std::unique_ptr<Value> TableCopyExecutor::getArrowVarList(const std::string& l, int64_t from,
   209 	    int64_t to, const LogicalType& dataType, const CopyDescription& copyDescription) {
   210 	    assert(dataType.getLogicalTypeID() == common::LogicalTypeID::VAR_LIST);
Target 0: (
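In the frame above, getListElementPos reads l[i] for every i in [from, to] without checking the string length, and it dereferences copyDescription.csvReaderConfig without a null check. The faulting address 0x3 would be consistent with reading a small-offset field through a null pointer, but that is only a guess from the trace, not a confirmed diagnosis. The following is a minimal standalone sketch of the same bracket-aware split with both accesses guarded. ListParseConfig is a hypothetical stand-in for the CSV reader config, and this is not kuzu's actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the list-related fields of the CSV reader config.
struct ListParseConfig {
    char delimiter = ',';
    char listBeginChar = '[';
    char listEndChar = ']';
};

// Splits l[from..to] (inclusive) into top-level list elements and returns
// (start offset, length) pairs. Delimiters nested inside brackets are ignored.
// Unlike the traced version, a null config or an empty/invalid range yields an
// empty result, and `to` is clamped to the last valid index of `l`.
std::vector<std::pair<int64_t, int64_t>> getListElementPos(
    const std::string& l, int64_t from, int64_t to, const ListParseConfig* config) {
    std::vector<std::pair<int64_t, int64_t>> split;
    const auto lastIndex = static_cast<int64_t>(l.size()) - 1;
    if (to > lastIndex) {
        to = lastIndex;
    }
    if (config == nullptr || from < 0 || from > to) {
        return split;
    }
    int bracket = 0;
    int64_t last = from;
    for (int64_t i = from; i <= to; i++) {
        const char c = l[static_cast<std::size_t>(i)];
        if (c == config->listBeginChar) {
            bracket += 1;
        } else if (c == config->listEndChar) {
            bracket -= 1;
        } else if (bracket == 0 && c == config->delimiter) {
            // Top-level delimiter: close the current element.
            split.emplace_back(last, i - last);
            last = i + 1;
        }
    }
    // The final element runs up to and including `to`.
    split.emplace_back(last, to - last + 1);
    return split;
}
```

With the arguments from the trace (from=1, to=4 on a string starting with "[10,5]"), this returns the elements "10" and "5".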

from kuzu.

mewim commented on September 25, 2024

It looks like the csv_to_parquet loader blindly converts every column to a string column, and it doesn't skip the header.
We should consider rewriting the CSV-to-Parquet conversion as a Python script.
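To illustrate why both problems matter together: if the header row is not skipped, even a converter that tried to infer types would see a non-numeric value at the top of every numeric column. A minimal sketch of per-column type inference, written in C++ to match the converter code, shows a single header value forcing the whole column to STRING. InferredType, isInt64, isDouble and inferColumnType are hypothetical names; this is not how the kuzu converter or the Python script works.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical column types; a real converter would map these to Arrow types.
enum class InferredType { INT64, DOUBLE, STRING };

// True if the whole string parses as a base-10 integer that fits in long long.
bool isInt64(const std::string& s) {
    if (s.empty()) {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    static_cast<void>(std::strtoll(s.c_str(), &end, 10));
    return errno == 0 && *end == '\0';
}

// True if the whole string parses as a floating-point number
// (strtod also accepts forms such as "inf" and "nan").
bool isDouble(const std::string& s) {
    if (s.empty()) {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    static_cast<void>(std::strtod(s.c_str(), &end));
    return errno == 0 && *end == '\0';
}

// Picks the narrowest type that every value in the column parses as.
InferredType inferColumnType(const std::vector<std::string>& values) {
    if (values.empty()) {
        return InferredType::STRING;
    }
    InferredType result = InferredType::INT64;
    for (const auto& v : values) {
        if (isInt64(v)) {
            continue;
        }
        if (isDouble(v)) {
            result = InferredType::DOUBLE;
            continue;
        }
        return InferredType::STRING;  // any non-numeric value wins
    }
    return result;
}
```

For example, a column read as {"id", "1", "2"} (header not skipped) infers STRING, while {"1", "2"} infers INT64.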

rfdavid commented on September 25, 2024

You have to specify manually whether the CSV file has a header. I changed the script in my PR to make this clearer:
https://github.com/kuzudb/kuzu/blob/1286c7b6bd10f8f41b23734ef108d295ea9e3593/scripts/parquet/csv_to_parquet.py
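Because the header flag is supplied manually rather than detected, the conversion only has to drop the first row when the flag is set. A minimal sketch of that behavior in C++ follows; readDataRows is a hypothetical name, it does no CSV quoting or field splitting, and it is not the code in the linked script.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads newline-separated rows and drops the first one when hasHeader is true.
// hasHeader is a caller-supplied flag; nothing here tries to detect a header.
std::vector<std::string> readDataRows(std::istream& in, bool hasHeader) {
    std::vector<std::string> rows;
    std::string line;
    bool skipNext = hasHeader;
    while (std::getline(in, line)) {
        if (skipNext) {
            skipNext = false;  // discard the header row only
            continue;
        }
        rows.push_back(line);
    }
    return rows;
}
```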

Also, I'm having the same issue when converting using this code:
https://github.com/kuzudb/kuzu/blob/1286c7b6bd10f8f41b23734ef108d295ea9e3593/test/test_runner/csv_to_parquet_converter.cpp
Line 20: arrow::Status CSVToParquetConverter::RunCSVToParquetConversion