Comments (5)
As you noticed, the ADLCopy tool does a binary copy of the files, and since Blob Storage does not align rows to extent boundaries, the copied files will not work with the current extractors.
The upcoming refresh, which should become available next week, should address this issue by making our extractors handle non-aligned boundaries.
Until then you have the following option:
Register your Blob Storage with ADLA (you can do that through the portal by adding a new data source, or via a PowerShell command).
Then write your extract statement directly against the blob store:
@data = EXTRACT jsondoc string
FROM "wasb://container@account/folder/jsondocuments.txt"
USING Extractors.Text(delimiter:'\r'); // or use your own extractor
Then you can do your processing directly on it, or use an OUTPUT statement to copy the data into your ADLS account. Note that you currently have to do this one file at a time.
from usql.
Thanks! I am looking forward to the upcoming release and I will use the wasb workaround in the meantime.
from usql.
Is it really a good solution to fix the extractors instead of the actual problem? Wouldn't it be better to implement an upload option for row-structured text files in ADLCopy and have extent boundaries properly aligned with rows for the files stored in ADLS?
How will the new extractors handle non-aligned boundaries? Will they fetch the adjacent block, move it over to the node and complete the fragmented row? That sounds very expensive.
from usql.
This is how the new extractor framework will handle it. It will not fetch all of the adjacent data, only 4 MB.
Unfortunately, aligned extent boundaries cannot be guaranteed for all data uploads (e.g., when using a WebHDFS call), so the extractor framework has to handle it this way for now, until the file system gives us metadata telling us whether a file is indeed aligned (currently HDFS does not provide such metadata).
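The idea of completing a fragmented row with a bounded read into the adjacent data can be illustrated with a toy model. This is only a sketch and not the actual ADLA extractor framework: the function names, the in-memory byte string, the newline row delimiter, and the ownership rule (a row belongs to the extent in which it starts) are all assumptions made for illustration. Only the 4 MB bound comes from this thread.

```python
from typing import List, Tuple

# 4 MB bound taken from the thread; how it is applied here is an assumption.
LOOKAHEAD = 4 * 1024 * 1024


def split_into_extents(size: int, extent_size: int) -> List[Tuple[int, int]]:
    """Cut a file of `size` bytes into fixed-size [start, end) extents,
    ignoring row boundaries, as a binary copy would."""
    return [(s, min(s + extent_size, size)) for s in range(0, size, extent_size)]


def read_extent(data: bytes, start: int, end: int,
                lookahead: int = LOOKAHEAD) -> List[bytes]:
    """Return the rows owned by the extent [start, end).

    Hypothetical rule: every reader except the first skips everything up to
    and including the first newline at or after `start` (the previous reader
    owns that row), then reads rows while the row start is <= `end`, looking
    at most `lookahead` bytes past `end` to finish its last row.
    """
    limit = min(len(data), end + lookahead)
    pos = start
    if start > 0:
        nl = data.find(b"\n", start, limit)
        if nl == -1:
            return []  # no row starts inside the reachable range
        pos = nl + 1
    rows = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos, limit)
        if nl == -1:
            if limit < len(data):
                raise ValueError("row does not end within the lookahead window")
            rows.append(data[pos:])  # last row of the file, no trailing newline
            break
        rows.append(data[pos:nl])
        pos = nl + 1
    return rows


def read_all(data: bytes, extent_size: int,
             lookahead: int = LOOKAHEAD) -> List[bytes]:
    """Read every extent independently and concatenate the results."""
    out: List[bytes] = []
    for start, end in split_into_extents(len(data), extent_size):
        out.extend(read_extent(data, start, end, lookahead))
    return out
```

In this model each reader touches at most `lookahead` bytes beyond its own extent, so the extra cost per extent is bounded regardless of file size. A row that does not end within the window is reported as an error instead of being read without limit.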
from usql.
Thank you Mike, I understand.
from usql.