Comments (5)

MikeRys commented on July 30, 2024

As you noticed, the ADLCopy tool does a binary copy of the files, and since Blob Storage does not align rows to extent boundaries, extraction from the copied files will not work.

The upcoming refresh, which should become available next week, should address this issue by making our extractors handle non-aligned boundaries.

Until then, you have the following option:

Register your Blob Storage with ADLA (you can do that through the portal by adding a new data source or via a PowerShell command).
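To illustrate the misalignment, here is a minimal Python sketch. It is not how ADLCopy or the storage layer is implemented; the function name `split_into_extents` and the tiny extent size are made up for the example. It only shows that cutting a file at fixed byte offsets, without looking at row delimiters, leaves rows split across pieces.

```python
def split_into_extents(data: bytes, extent_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size pieces, ignoring row delimiters.

    Hypothetical stand-in for a binary copy; real extents are far larger.
    """
    return [data[i:i + extent_size] for i in range(0, len(data), extent_size)]


# Three newline-terminated JSON rows, 10 bytes each including the newline.
rows = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
data = b"\n".join(rows) + b"\n"

# An extent size that is not a multiple of the row length cuts rows in half.
extents = split_into_extents(data, extent_size=12)

for number, extent in enumerate(extents):
    print(number, extent)
```

No bytes are lost, but the first piece ends in the middle of the second row, so a reader that treats each piece independently would see broken rows.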

Then write your extract statement directly against the blob store:

@data = EXTRACT jsondoc string 
               FROM "wasb://container@account/folder/jsondocuments.txt"
               USING Extractors.Text(delimiter:'\r'); // or use your own extractor

Then you can do your processing directly on it, or use an OUTPUT statement to copy the data into your ADLS account. Note that you currently have to do this one file at a time.

from usql.

 avatar commented on July 30, 2024

Thanks! I am looking forward to the upcoming release and I will use the wasb workaround in the meantime.

 avatar commented on July 30, 2024

Is it really a good solution to fix the extractors instead of the actual problem? Wouldn't it be better to implement an upload option for row-structured text files in ADLCopy and have extent boundaries properly aligned with rows for the files stored in ADLS?
How will the new extractors handle non-aligned boundaries? Will they fetch the adjacent block, move it over to the node and complete the fragmented row? That sounds very expensive.

MikeRys commented on July 30, 2024

This is how the new extractor framework will handle it. It will not fetch all the adjacent data (only 4 MB).

Unfortunately, having extent boundaries aligned cannot be guaranteed for all data uploads (e.g., when using a WebHDFS call), and thus the extractor framework has to handle it this way for now, until the file system gives us metadata to tell us whether a file is indeed aligned (currently HDFS does not provide such metadata).
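As a rough illustration of this kind of boundary handling, here is a minimal Python sketch. It is not the ADLA extractor framework's code. The function `rows_for_extent`, the convention that a row belongs to the extent containing its first byte, and the single-byte newline delimiter are all assumptions made for the example. Only the 4 MB look-ahead limit comes from the comment above.

```python
DELIMITER = b"\n"  # assumed single-byte row delimiter


def rows_for_extent(data: bytes, start: int, end: int,
                    max_lookahead: int = 4 * 1024 * 1024) -> list[bytes]:
    """Return the rows whose first byte lies in data[start:end].

    A row cut off at the start of the extent is skipped, because the reader
    of the previous extent completes it. A row cut off at the end is
    completed by reading at most max_lookahead bytes past the boundary.
    """
    pos = start
    if start > 0 and data[start - 1:start] != DELIMITER:
        # The extent begins mid-row: skip to the first row that starts here.
        cut = data.find(DELIMITER, start, end)
        if cut == -1:
            return []  # no row starts inside this extent
        pos = cut + 1

    limit = min(len(data), end + max_lookahead)
    rows = []
    while pos < end:
        cut = data.find(DELIMITER, pos, limit)
        if cut == -1:
            if limit < len(data):
                raise ValueError("row extends past the look-ahead window")
            rows.append(data[pos:])  # last row without a trailing delimiter
            break
        rows.append(data[pos:cut])
        pos = cut + 1
    return rows


# Three 10-byte rows read as 12-byte extents, so every boundary is misaligned.
data = b'{"id": 1}\n{"id": 2}\n{"id": 3}\n'
boundaries = [(0, 12), (12, 24), (24, 30)]
recovered = [row for start, end in boundaries
             for row in rows_for_extent(data, start, end)]
print(recovered)
```

In this sketch each reader pulls in only the tail of the one row that straddles its end boundary rather than the whole neighbouring extent, which is why the extra read stays bounded.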

 avatar commented on July 30, 2024

Thank you Mike, I understand.
