Giter Club home page Giter Club logo

airwho-loader's Introduction

Overview

In 2007 I was trying to research the history of some aircraft and I realized that, while current registration information was available from the FAA, history was not. I briefly considered ways of time-travelling into the past to get this information, until I realized it would be much easier to time-travel into the future.

I set up a very small cron job to download weekly the list of registered US aircraft from the FAA. After a few years, FAA changed to daily updates and I eventually rescheduled the cron job.

It has been running steadily since then and has downloaded over 1300 copies of the database over almost twelve years. Recently I revisited this project. Perhaps now is a good time to do something with this data.

Loading the downloaded data into tables

There are seven different files in each of the downloaded .zip files. Initially I would just like to do something with the "MASTER" file, which contains one row for each US registered aircraft. I would like to construct two structures from this set of files

  • a table of the current value for each active registration: owner, type, manufacturer, model, etc.;
  • a table containing details of all of the changes to each record - new records added, changes to exisiting records (new owner, etc.), and deleted registrations.

Aircraft registration on FAA Website

load

I wrote a simple Perl program to unzip the current file and set up a MySQL "load data infile" statement. This works fine for the current file.

load_history

To load the changes occuring between each of the 1300 files, we need something a little robust. Over the last twelve years, there have been many anomolies involving these files - e.g., some of the files that we have downloaded have just been error messages; at several points the FAA has altered the structure of the file by adding fields, etc.

I wrote a quick version of this last week and ran it on the 1300 files. It loaded about 90 million change records into the MySQL table over several days, but as I looked at the data I realized that there were some problems - mostly duplicate records - caused mostly by dirty data.

I decided to rework load_history and make it more robust and also something that could be scheduled on a cron job - if it's called daily it will refresh the master_changes table with new information, but if it skips a few days or more, it will automatically do the right thing and ensure master_changes is updated and correct.

download

Gets the file from the FAA.

[email protected]
# m h  dom mon dow   command
15   7   *   *   1-5   /var/aircraft/bin/download

airwho-loader's People

Contributors

ratsbane avatar

Watchers

 avatar  avatar

airwho-loader's Issues

Make smtpd connections trusted

Note that this is part of a broader effort to stop outbound emails from being caught by spam filters.

Postfix log shows, for outbound emails: "Untrusted TLS connection established"

Mar 12 16:02:13 airwho postfix/smtp[20284]: Untrusted TLS connection established to hilton-com.mail.protection.outlook.com[104.47.37.36]:25: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)

"load" after "download" is failing when triggered by cron

copy_and_unzip_file_into_temp: Loading /var/aircraft/2019/2019-03-13_07_15_19 into temp as 2019-03-13.zip
unzip /var/aircraft/temp/2019-03-13.zip -d /var/aircraft/temp

End-of-central-directory signature not found. Either this file is not .
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /var/aircraft/temp/2019-03-13.zip or
/var/aircraft/temp/2019-03-13.zip.zip, and cannot find /var/aircraft/temp/2019-03-13.zip.ZIP, period.

load ACFTREF file stops before end of file

running "load {date}" does correctly load all of the rows in the ENGINES and MASTER files, but ACFTREF stops at 23,631 records. There are 64,744 records in the 2007-05-23 copy of this file and 85,562 in the most recent 2019-03-06 file.

I've tried:

  • running the LOAD DATA INFILE statement directly in the MySQL client. That only produces two warnings and neither is significant.
  • Running this with the oldest and newest files that we have and both do the same thing.
  • Putting a character set clause in the load data infile statement, with various character sets, e.g. utf8, ascii, with no change
  • Removing the trailing commas from the lines in the file - no change
  • Searching the file for characters above ASCII 127 - none found
  • Adding "optionally" to the "enclosed by '"'" clause - no difference

The last row in the file, beginning with code 99999XX, is successfully loaded into the table.
On spot-checking the input file in various places, there is no regular pattern of missing rows (e.g., every other row or every third row.)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.