Comments (3)
Also, same issue on the shortest-path-tests dataset:
(lldb) pro la -t
There is a running process, kill it and restart?: [Y/n] Y
Process 70589 exited with status = 9 (0x00000009)
Process 70804 launched: '/Users/rfdavid/Devel/waterloo/kuzu/build/debug/tools/shell/kuzu_shell' (arm64)
Process 70804 stopped
* thread #7, stop reason = EXC_BAD_ACCESS (code=1, address=0x3)
frame #0: 0x0000000100831658 kuzu_shell`kuzu::storage::TableCopyExecutor::getListElementPos(l="[10,5][12,8][4,5][1,9][2][3,4,5,6,7][1][10,11,12,3,4,5,6,7][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5][10,5"..., from=1, to=4, copyDescription=0x0000600003504a70) at table_copy_executor.cpp:195:54
180 const std::string& filePath) {
181 std::shared_ptr<arrow::io::ReadableFile> infile;
182 throwCopyExceptionIfNotOK(arrow::io::ReadableFile::Open(filePath).Value(&infile));
183 std::unique_ptr<parquet::arrow::FileReader> reader;
184 throwCopyExceptionIfNotOK(
185 parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
186 return reader;
187 }
188
189 std::vector<std::pair<int64_t, int64_t>> TableCopyExecutor::getListElementPos(
190 const std::string& l, int64_t from, int64_t to, const CopyDescription& copyDescription) {
191 std::vector<std::pair<int64_t, int64_t>> split;
192 int bracket = 0;
193 int64_t last = from;
194 for (int64_t i = from; i <= to; i++) {
-> 195 if (l[i] == copyDescription.csvReaderConfig->listBeginChar) {
196 bracket += 1;
197 } else if (l[i] == copyDescription.csvReaderConfig->listEndChar) {
198 bracket -= 1;
199 } else if (bracket == 0 && l[i] == copyDescription.csvReaderConfig->delimiter) {
200 split.emplace_back(last, i - last);
201 last = i + 1;
202 }
203 }
204 split.emplace_back(last, to - last + 1);
205 return split;
206 }
207
208 std::unique_ptr<Value> TableCopyExecutor::getArrowVarList(const std::string& l, int64_t from,
209 int64_t to, const LogicalType& dataType, const CopyDescription& copyDescription) {
210 assert(dataType.getLogicalTypeID() == common::LogicalTypeID::VAR_LIST);
Target 0: (
from kuzu.
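For anyone trying to reproduce this outside the shell, the splitting loop from the listing above can be sketched as a standalone function. This is only a minimal sketch: `ListSplitConfig` and `splitListElements` are hypothetical names, not kuzu's types, and the null check on the config pointer is an assumption about one way a dereference like the one at line 195 could fault. The thread does not confirm the root cause.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the CSV reader config; it holds only the
// three characters the loop in the listing reads.
struct ListSplitConfig {
    char delimiter = ',';
    char listBeginChar = '[';
    char listEndChar = ']';
};

// Splits l[from..to] (inclusive) on top-level delimiters and returns
// (start, length) pairs, mirroring the loop in the listing.
// Unlike the listing, it rejects a null config and out-of-range bounds
// before touching the string.
std::vector<std::pair<int64_t, int64_t>> splitListElements(
    const std::string& l, int64_t from, int64_t to, const ListSplitConfig* config) {
    if (config == nullptr) {
        throw std::invalid_argument("list split config is null");
    }
    const auto size = static_cast<int64_t>(l.size());
    // from == to + 1 is allowed: it describes an empty range such as "[]".
    if (from < 0 || to >= size || from > to + 1) {
        throw std::out_of_range("list bounds outside the input string");
    }
    std::vector<std::pair<int64_t, int64_t>> split;
    int bracket = 0;  // nesting depth relative to the range being split
    int64_t last = from;
    for (int64_t i = from; i <= to; i++) {
        const char c = l[static_cast<std::size_t>(i)];
        if (c == config->listBeginChar) {
            bracket += 1;
        } else if (c == config->listEndChar) {
            bracket -= 1;
        } else if (bracket == 0 && c == config->delimiter) {
            split.emplace_back(last, i - last);
            last = i + 1;
        }
    }
    split.emplace_back(last, to - last + 1);
    return split;
}
```

Calling it on "[1,[2,3],4]" with from=1 and to=9 yields three elements, with the nested list kept as a single element.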
Looks like the csv_to_parquet loader blindly converts every column to a string column, and it doesn't skip the header.
We should consider rewriting a Python script to do the CSV-to-Parquet conversion.
from kuzu.
You have to manually specify whether the CSV file has a header. I changed the script in my PR to make this clearer:
https://github.com/kuzudb/kuzu/blob/1286c7b6bd10f8f41b23734ef108d295ea9e3593/scripts/parquet/csv_to_parquet.py
Also, I'm having the same issue when converting using this code (line 20, arrow::Status CSVToParquetConverter::RunCSVToParquetConversion):
https://github.com/kuzudb/kuzu/blob/1286c7b6bd10f8f41b23734ef108d295ea9e3593/test/test_runner/csv_to_parquet_converter.cpp
from kuzu.