Giter Club home page Giter Club logo

xlsximport's Introduction

xlsximport

xlsximport is an SQLite extension that allows direct importing of XLSX Excel 2007 files (a.k.a. Open Office XML)

FEATURES:

The XLSX format is just a glorified ZIP (that I open thanks to zipfile), containing a set of XML files (that I parse thanks to expat). The SQLite shell already imports/exports a variety of text formats. The direct importing of XLSX files removes the need of intermediate XLSX to CSV converters (like cxlsx_to_csv).

DATES BEHAVIOUR:

Notice that Excel stores dates as the number of days that have elapsed since the Excel Epoch (1-January-1900 for Windows Excel, 2-January-1904 for Mac Excel), and this extension imports dates simply as the floating point value they are stored. Be aware the dates in SQLite are stored as Julian days since another Epoch (noon Jan 1st, 4713 BC).

After importing, you can perform by yourself the conversion you need by using simple SQL UPDATE commands:

  UPDATE some_table SET some_date = some_date + 2415018.5; -- Convert from WinExcel to Julian Days
  UPDATE some_table SET some_date = some_date + 2416480.5; -- Convert from MacExcel to Julian Days

USAGE:

.load xlsximport

select xlsximport(FILE, TABLE, SHEET);

--  FILE  input spreadsheet in XLSX Excel 2007 format
--  TABLE name of the table where data is imported
--  SHEET number of the sheet within the workbook (default is first one)

xlsximport():

The xlsximport() function takes two (optionally three) arguments which are: the name of the disk file from which XLSX data is to be read, the name of the SQLite table into which the XLSX data is to be inserted, and (optionally) the number of the sheet being imported.

Example:

.load xlsximport
select xlsximport('C:/work/somedata.xlsx', 'tab1');
select xlsximport('C:/work/somedata.xlsx', 'tab2', 2);

There are two cases to consider: (1) Table tab1 does not previously exist and (2) table tab1 does already exist.

In the first case, when the table does not previously exist, the table is automatically created and the content of the first row of the input XLSX file is used to determine the name of all the columns in the table. In other words, if the table does not previously exist, the first row of the XLSX file is interpreted to be column names and the actual data starts on the second row of the XLSX file.

For the second case, when the table already exists, every row of the XLSX file, including the first row, is assumed to be actual content. If the XLSX file contains an initial row of column labels, that row will be read as data and inserted into the table. To avoid this, make sure that table does not previously exist.

COMPILATION:

cc -shared -lexpat xlsximport.c

MEMORY USAGE:

Be aware the XML data inside your .XLSX file is fully decompressed in memory. The command unzip -l will give you a good estimate of the memory needed. So for example this large 19MB spreadsheet will use temporally about 135MB of memory:

$ ls -ks owid-covid-data.xlsx
19228 owid-covid-data.xlsx

$ unzip -l owid-covid-data.xlsx
Archive:  owid-covid-data.xlsx
  Length      Date    Time    Name
---------  ---------- -----   ----
     1167  01-31-1980 00:00   [Content_Types].xml
      587  01-31-1980 00:00   _rels/.rels
      697  01-31-1980 00:00   xl/_rels/workbook.xml.rels
135734695  01-31-1980 00:00   xl/worksheets/sheet1.xml
      550  01-31-1980 00:00   xl/workbook.xml
    28949  01-31-1980 00:00   xl/sharedStrings.xml
     1341  01-31-1980 00:00   xl/styles.xml
     6994  01-31-1980 00:00   xl/theme/theme1.xml
      592  01-31-1980 00:00   docProps/core.xml
      784  01-31-1980 00:00   docProps/app.xml
---------                     -------
135776356                     10 files

LICENSE:

  • My xlsximport code is Public Domain. Same as SQLite. Notice though that Expat license is MIT.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.