Comments (5)
@sharpe5 Hi sharpe5, can you provide me with sample Parquet files that use LZ4 or another compression codec? I need them for testing.
from bigdata-file-viewer.
Good point, marked your comment as an enhancement. Thanks for your contribution.
Here you go:
type=blockStream,rowCount=1000,compression=LZ4.zip
GitHub only accepts certain attachment types such as .zip, so unzip the archive to get the .parquet file. It should contain 6 columns of random doubles and 1,000 rows (the rowCount in the file name).
Anything else, let me know!
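If it helps with testing, here is a minimal sketch for sanity-checking that a downloaded file really has the Parquet container layout before opening it in the viewer. It assumes the layout from the Parquet format spec: the file starts with the 4-byte magic `PAR1` and ends with the footer metadata, a 4-byte little-endian footer length, and `PAR1` again. The function names `parquetFooterLength` and `readAllBytes` are hypothetical. This does not check the codec: LZ4 is recorded per column chunk inside the Thrift-encoded footer, which needs a real Parquet reader to decode.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the footer-metadata length if `bytes` starts and
// ends with the "PAR1" magic and the declared footer fits inside the file;
// returns std::nullopt otherwise.
std::optional<uint32_t> parquetFooterLength(const std::vector<unsigned char>& bytes)
{
    constexpr char kMagic[4] = {'P', 'A', 'R', '1'};
    // Smallest possible layout: leading magic + footer length + trailing magic.
    if (bytes.size() < 12) return std::nullopt;
    if (std::memcmp(bytes.data(), kMagic, 4) != 0) return std::nullopt;
    if (std::memcmp(bytes.data() + bytes.size() - 4, kMagic, 4) != 0) return std::nullopt;

    // The 4 bytes just before the trailing magic hold the footer length (little-endian).
    const unsigned char* p = bytes.data() + bytes.size() - 8;
    const uint32_t len = static_cast<uint32_t>(p[0]) |
                         (static_cast<uint32_t>(p[1]) << 8) |
                         (static_cast<uint32_t>(p[2]) << 16) |
                         (static_cast<uint32_t>(p[3]) << 24);
    // The footer must fit between the leading magic and the length field.
    if (len > bytes.size() - 12) return std::nullopt;
    return len;
}

// Hypothetical helper: reads a whole file into memory (fine for small sample files).
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readAllBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Usage: `parquetFooterLength(readAllBytes("type=blockStream,rowCount=1000,compression=LZ4.parquet"))` should return a footer length for the unzipped sample, and `std::nullopt` for a truncated or non-Parquet file.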
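C++ code used to create the file (demo only: the helpers drand(), Stopwatch, Path::Combine and Directory::GetCurrentDirectoryAlt() are not included). The Arrow/Parquet libraries were installed using vcpkg. The code compiles with MSVC and gcc.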
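Since those helpers are missing, here is a rough sketch of stand-ins so the demo can be built without them. Everything in this block is hypothetical: the names and signatures are guessed from how `demo3()` calls them (.NET-style `System::Diagnostics::Stopwatch`, `System::IO::Path::Combine`), and the originals may behave differently. It needs C++17 for `std::filesystem` and nested namespace definitions.

```cpp
#include <chrono>
#include <filesystem>
#include <random>
#include <string>

// Hypothetical stand-in: uniform random double in [0, 1).
double drand()
{
    static std::mt19937_64 engine{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(engine);
}

namespace System::Diagnostics {
// Hypothetical minimal .NET-style elapsed-time value.
struct TimeSpan {
    double ms;
    double TotalMilliseconds() const { return ms; }
};

// Hypothetical minimal .NET-style stopwatch built on std::chrono::steady_clock.
class Stopwatch {
public:
    static Stopwatch StartNew()
    {
        Stopwatch sw;
        sw.start_ = Clock::now();
        sw.running_ = true;
        return sw;
    }
    void Stop()
    {
        if (running_) { stop_ = Clock::now(); running_ = false; }
    }
    // Time elapsed so far, or up to Stop() if it has been called.
    TimeSpan Elapsed() const
    {
        const auto end = running_ ? Clock::now() : stop_;
        return TimeSpan{std::chrono::duration<double, std::milli>(end - start_).count()};
    }
private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_{};
    Clock::time_point stop_{};
    bool running_ = false;
};
} // namespace System::Diagnostics

namespace System::IO {
// Hypothetical stand-in for the author's current-directory helper.
struct Directory {
    static std::string GetCurrentDirectoryAlt() { return std::filesystem::current_path().string(); }
};
// Hypothetical stand-in: joins a directory and a file name with the platform separator.
struct Path {
    static std::string Combine(const std::string& dir, const std::string& file)
    {
        return (std::filesystem::path(dir) / file).string();
    }
};
} // namespace System::IO
```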
```cpp
#include <arrow/api.h>
#include <arrow/io/file.h>
#include <arrow/stl.h>
#include <fmt/core.h>
#include <parquet/arrow/writer.h>
#include <parquet/exception.h>
#include <memory>
#include <string>
#include <tuple>
#include <vector>

// Not shown (author's own helpers): drand(), System::Diagnostics::Stopwatch,
// System::IO::Path::Combine(), System::IO::Directory::GetCurrentDirectoryAlt().

void demo3()
{
    using namespace std;
    using namespace fmt;
    using namespace System::Diagnostics;
    print("Demo 3: Open a file, flush blocks of rows to it until done:\n");
    {
        print(" - Test:\n");
        double r1 { drand() };
        print(" - r1={}\n", r1);
    }
    //const int maxRows = 1'000'000;
    const int maxRows = 500;
    vector<tuple<double, double, double, double, double, double>> rows;
    {
        rows.reserve(maxRows);
        print(" - Creating raw data:\n");
        Stopwatch sw = Stopwatch::StartNew();
        for (int i = 0; i < maxRows; i++)
        {
            rows.push_back({drand(), drand(), drand(), drand(), drand(), drand()});
        }
        sw.Stop();
        print(" - rows.size(): {}\n", rows.size());
        print(" - Done: {} milliseconds\n", sw.Elapsed().TotalMilliseconds());
    }
    shared_ptr<arrow::Table> arrowTable;
    {
        const vector<string> names = {"col1", "col2", "col3", "col4", "col5", "col6"};
        print(" - Creating Parquet table:\n");
        Stopwatch sw = Stopwatch::StartNew();
        if (!arrow::stl::TableFromTupleRange(arrow::default_memory_pool(), rows, names, &arrowTable).ok())
        {
            // Error handling code should go here.
            print(" - Error when creating table.\n");
            return;
        }
        sw.Stop();
        print(" - Done: {} milliseconds\n", sw.Elapsed().TotalMilliseconds());
    }
    string filepath;
    {
        // rowCount = maxRows * 2 because the table is written as two chunks (see below).
        // fmt::format is qualified to avoid ambiguity with std::format in C++20.
        const string filename = fmt::format("type=blockStream,rowCount={},compression=LZ4.parquet", maxRows * 2);
        print(" - Write Parquet table:\n");
        Stopwatch sw = Stopwatch::StartNew();
        // Open the output file once and pass this same stream to the writer
        // (the original opened the file a second time and left this stream unused).
        shared_ptr<arrow::io::FileOutputStream> outfile;
        PARQUET_ASSIGN_OR_THROW(outfile, arrow::io::FileOutputStream::Open(filename));
        parquet::WriterProperties::Builder propertiesBuilder;
        propertiesBuilder.compression(parquet::Compression::LZ4);
        const auto properties = propertiesBuilder.build();
        // https://stackoverflow.com/questions/45572962/how-can-i-write-streaming-row-oriented-data-using-parquet-cpp-without-buffering
        unique_ptr<parquet::arrow::FileWriter> writer;
        // Status-returning overload from the Arrow version used here; newer releases
        // also provide an overload that returns arrow::Result instead.
        PARQUET_THROW_NOT_OK(parquet::arrow::FileWriter::Open(*(arrowTable->schema()), arrow::default_memory_pool(), outfile, properties, parquet::default_arrow_writer_properties(), &writer));
        const int chunkSize = static_cast<int>(rows.size());
        // Each WriteTable() call writes the table as row groups of at most chunkSize rows;
        // calling it twice demonstrates writing data in blocks.
        PARQUET_THROW_NOT_OK(writer->WriteTable(*arrowTable, chunkSize));
        PARQUET_THROW_NOT_OK(writer->WriteTable(*arrowTable, chunkSize));
        PARQUET_THROW_NOT_OK(writer->Close());
        PARQUET_THROW_NOT_OK(outfile->Close());
        sw.Stop();
        print(" - Compression: LZ4\n");
        print(" - Block size: {}\n", chunkSize);
        print(" - Done: {} milliseconds\n", sw.Elapsed().TotalMilliseconds());
        const string dir = System::IO::Directory::GetCurrentDirectoryAlt();
        filepath = System::IO::Path::Combine(dir, filename);
    }
    {
        print(" - Output file: {}\n", filepath);
    }
}
```
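To make the row count in the file name easier to follow, here is a small sketch of the arithmetic. It assumes that each `WriteTable(table, chunk_size)` call emits row groups of at most `chunk_size` rows, and it ignores any cap the writer properties may impose. `rowGroupSizes` and `sampleFileName` are hypothetical names. With the demo's values (a 500-row table, chunk size 500, written twice) it gives two row groups of 500 rows, i.e. the 1,000 rows in `rowCount=1000`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: sizes of the row groups produced by calling WriteTable
// `writes` times on a table of `tableRows` rows with the given chunk size.
std::vector<int64_t> rowGroupSizes(int64_t tableRows, int64_t chunkSize, int writes)
{
    assert(chunkSize > 0);
    std::vector<int64_t> sizes;
    for (int w = 0; w < writes; ++w) {
        // Each write slices the table into consecutive chunks; the last may be short.
        for (int64_t offset = 0; offset < tableRows; offset += chunkSize) {
            sizes.push_back(std::min(chunkSize, tableRows - offset));
        }
    }
    return sizes;
}

// Hypothetical helper mirroring the demo's naming scheme for the output file.
std::string sampleFileName(int64_t totalRows)
{
    return "type=blockStream,rowCount=" + std::to_string(totalRows) + ",compression=LZ4.parquet";
}
```

For example, `rowGroupSizes(1000, 300, 1)` would describe a single write of a 1,000-row table split into groups of 300, 300, 300 and 100 rows.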
Closing this issue since it has been open for years; I will reopen it once the feature is on the roadmap.