A data set of UFO sighting reports was collected from the National UFO Reporting Center Online Database; you can see an example report here. The data, downloadable as a zipped JSON file here, contains, for each report in the database, the URL, the raw HTML for that URL, and the time it was scraped, along with the remnants of a database ID.
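As a rough illustration of how the download might be read, here is a minimal Python sketch. It assumes the archive holds a single JSON file containing either one JSON array of reports or one JSON object per line; the actual layout is not documented here, so check it first. The function name `load_reports` and the field names `url`, `html`, and `time` in the demo are hypothetical, chosen only to mirror the fields described above.

```python
import io
import json
import zipfile


def load_reports(zip_source):
    """Return a list of report dicts from a zipped JSON file.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the archive contains one JSON file; read the first member.
        member = archive.namelist()[0]
        text = archive.read(member).decode("utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: fall back to JSON Lines (one report object per line).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]


def make_demo_archive():
    """Build a tiny in-memory archive with made-up records for demonstration."""
    records = [
        {"url": "http://example.com/report1.html", "html": "<html></html>", "time": "t1"},
        {"url": "http://example.com/report2.html", "html": "<html></html>", "time": "t2"},
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("reports.json", json.dumps(records))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    reports = load_reports(make_demo_archive())
    print(len(reports), "reports loaded")
```

With the real download, you would pass the path of the zip file to `load_reports` and then inspect the keys of the first record to find the actual field names before parsing the raw HTML.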
We wanted to apply unsupervised machine learning methods to a large, messy UFO data set.
If the majority of these reports are genuine observations of uncommon things in the sky, how could this data be used productively? Could it identify secret missile tests or meteor showers? How is our data clustered? Are sightings made along established flight paths, or only in specific locations? How closely is this data tied to the total population of the areas where sightings cluster? Does a higher population partially explain more sightings? Are there spikes in sightings around holidays or summer vacations, when more people are in less populated places?
See our full presentation here: https://docs.google.com/presentation/d/16703jWy2XTHHeAM33iFSTdUNl_418pNCwYmfKfajo7w/edit#slide=id.p