Comments (8)

bxparks commented on August 22, 2024

Hi, can you give examples of the formats that you would like to support? And does bq load support them?

from bigquery-schema-generator.

rajib76 commented on August 22, 2024

For example, dates in the format MM/DD/YYYY, MM-DD-YYYY, or MM-DD-YY. Currently bq load only supports the ISO format. But I thought that if we could handle that in this code and then send the ISO-formatted value to bq, it would be a very useful feature. I have a file with dates in this format:
1/27/2015 4:00
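
For illustration, the conversion being asked about can be sketched in plain Python, outside of bigquery-schema-generator. This is only a minimal sketch: the `CANDIDATE_FORMATS` list and the `to_iso` helper are hypothetical names, and the list assumes US-style month-first dates. A date-only input comes back with a time of 00:00:00, which is itself an assumption.

```python
from datetime import datetime

# Hypothetical list of input formats (month first); extend it to match your data.
CANDIDATE_FORMATS = [
    "%m/%d/%Y %H:%M",
    "%m/%d/%Y",
    "%m-%d-%Y",
    "%m-%d-%y",
]


def to_iso(value):
    """Return an ISO 8601 style string for `value`, or None if no format matches."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.isoformat(sep=" ")
    return None


print(to_iso("1/27/2015 4:00"))  # 2015-01-27 04:00:00
```

Values that match none of the formats return `None`, so the caller can decide whether to leave them as strings or reject the row.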

rajib76 commented on August 22, 2024

When I say sending the ISO-formatted value to bq, I mean not only the converted date but also the derived data type that we would be sending to bq.

bxparks commented on August 22, 2024

Identifying "MM/DD/YYYY" and other variations as a TIMESTAMP is probably straightforward (I'd have to think about any potential side-effects to be sure). But if bq load is given "4/13/2020" and the schema says that it's a TIMESTAMP, will bq load accept that as a properly formatted TIMESTAMP? If not, then it's not clear to me that making this change to bigquery-schema-generator would help much.

But, if you already have a script that converts your "MM/DD/YYYY" data to ISO 8601, then can't you feed that converted data into bigquery-schema-generator?

Just trying to understand the scope of the problem that you are trying to solve.

rajib76 commented on August 22, 2024

If a data scientist has a CSV file with a date format other than ISO, I wanted the application to deduce the data type and also convert the date to ISO before loading it into BQ, since BQ handles non-ISO dates as strings. I did not want the user to take the pain of converting the dates. I thought we could use bigquery-schema-generator to identify any date format and convert it to an ISO date. I am not sure if I have explained the scenario well.

bxparks commented on August 22, 2024

Even if bigquery-schema-generator is able to understand that the MM/DD/YYYY format is a TIMESTAMP, if you feed that CSV file into bq load, the documentation says that it will reject it. (See https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type, although I have not actually verified that the documentation is correct, so I recommend that you actually try it with one of your sample CSV files to be sure.) If bq load does not accept that format, then you are going to have to create a script to convert that format into ISO format anyway.

If you are requesting that bigquery-schema-generator should perform that data conversion, I would decline because I don't think this script should be in the business of doing data transformations. There are simply too many variations of how data can be represented in real life, and how it should be cleaned up before being imported into BigQuery. To me, doing that work is part of the job of the data scientist. For example, if the CSV data actually does contain just "MM/DD/YYYY", then the data scientist needs to figure out the timezone of that date value, and what the time component should be (e.g. "00:00" or "12:00"). bigquery-schema-generator cannot do that.
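
To make the time zone point concrete, here is a small Python sketch, not part of bigquery-schema-generator, in which the time of day and the UTC offset must be supplied explicitly. The helper name `date_to_timestamp` and the -07:00 offset are assumptions chosen only for illustration.

```python
from datetime import datetime, time, timedelta, timezone


def date_to_timestamp(text, assumed_time, assumed_tz):
    """Combine an MM/DD/YYYY date with an explicitly chosen time and zone."""
    day = datetime.strptime(text, "%m/%d/%Y").date()
    return datetime.combine(day, assumed_time, tzinfo=assumed_tz)


# Two equally plausible readings of the same date-only value.
utc_midnight = date_to_timestamp("4/13/2020", time(0, 0), timezone.utc)
offset_noon = date_to_timestamp(
    "4/13/2020", time(12, 0), timezone(timedelta(hours=-7))
)

print(utc_midnight.isoformat())    # 2020-04-13T00:00:00+00:00
print(offset_noon.isoformat())     # 2020-04-13T12:00:00-07:00
print(offset_noon - utc_midnight)  # 19:00:00
```

The two readings of "4/13/2020" are 19 hours apart, and nothing in the CSV value itself says which one is right.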

rajib76 commented on August 22, 2024

Thanks Brian, I agree with your point. I will take care of this in my program before calling bigquery-schema-generator. I will close this.
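
A preprocessing step of this kind might look roughly like the following sketch, which rewrites one CSV column into ISO form before the file is handed to bigquery-schema-generator and bq load. The function name `convert_column`, the column name `created`, and the single input format are all assumptions for illustration; the in-memory `io.StringIO` objects stand in for real files.

```python
import csv
import io
from datetime import datetime


def convert_column(src, dst, column, fmt="%m/%d/%Y %H:%M"):
    """Copy CSV rows from src to dst, rewriting `column` into ISO 8601 form."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # strptime raises ValueError on a value that does not match `fmt`,
        # so bad rows fail loudly instead of being silently passed through.
        parsed = datetime.strptime(row[column], fmt)
        row[column] = parsed.isoformat(sep=" ")
        writer.writerow(row)


source = io.StringIO("id,created\n1,1/27/2015 4:00\n2,12/3/2016 18:30\n")
target = io.StringIO()
convert_column(source, target, "created")
print(target.getvalue())
```

The converted output contains `2015-01-27 04:00:00` and `2016-12-03 18:30:00` in the `created` column, with no time zone attached.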

bxparks commented on August 22, 2024

Ok, good luck with your work!
