
haversine's Introduction

I recently enrolled in Introduction to Data Science. One of the very first assignments was Twitter sentiment analysis in Python. Leaving a whole lot aside, what captured my attention was the requirement to resolve tweets' geocoded locations WITHOUT relying on third-party services.

The assignment paper suggested using a Python Dictionary of State Abbreviations, which proved helpful indeed. I decided to combine this resource with Average Latitude and Longitude for US States and ended up with a single dictionary containing all the essential information, i.e. state codes, names and coordinates:

{
  'AK': {'name':'Alaska','coords':[61.3850,-152.2683]},
  'AL': {'name':'Alabama','coords':[32.7990,-86.8073]},
  'AR': {'name':'Arkansas','coords':[34.9513,-92.3809]},
  'AS': {'name':'American Samoa','coords':[-14.2417,-170.7197]},
  'AZ': {'name':'Arizona','coords':[33.7712,-111.3877]},
  'CA': {'name':'California','coords':[36.1700,-119.7462]},
  'CO': {'name':'Colorado','coords':[39.0646,-105.3272]},
  'CT': {'name':'Connecticut','coords':[41.5834,-72.7622]},
  'DC': {'name':'District of Columbia','coords':[38.8964,-77.0262]},
  'DE': {'name':'Delaware','coords':[39.3498,-75.5148]},
  'FL': {'name':'Florida','coords':[27.8333,-81.7170]},
  'GA': {'name':'Georgia','coords':[32.9866,-83.6487]},
  'HI': {'name':'Hawaii','coords':[21.1098,-157.5311]},
  'IA': {'name':'Iowa','coords':[42.0046,-93.2140]},
  ..

The complete dictionary can be found in us_states.py.
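As a small illustration of how a dictionary of this shape can be queried, here is a minimal sketch. It uses only a three-entry excerpt of the data above, and the names `US_STATES`, `state_name` and `CODE_BY_NAME` are hypothetical; they are not taken from us_states.py.

```python
# Excerpt of the dictionary shown above; the full version lives in us_states.py.
US_STATES = {
    'AK': {'name': 'Alaska', 'coords': [61.3850, -152.2683]},
    'AL': {'name': 'Alabama', 'coords': [32.7990, -86.8073]},
    'CA': {'name': 'California', 'coords': [36.1700, -119.7462]},
}


def state_name(code):
    """Return the state name for a two-letter code, or None if the code is unknown."""
    entry = US_STATES.get(code.upper())
    return entry['name'] if entry else None


# Hypothetical reverse index: state name -> two-letter code.
CODE_BY_NAME = {entry['name']: code for code, entry in US_STATES.items()}

print(state_name('ca'))        # California
print(CODE_BY_NAME['Alaska'])  # AK
```

Keeping the code as the key makes the most common lookup a single dictionary access, while the reverse index is cheap to derive when needed.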

Having all the relevant information in place, I was looking for a feasible way of associating the tweets with the list of US states. It turns out that the Haversine formula is one of the most popular methods for calculating the great-circle distance between two points given their coordinates.
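For reference, the formula in its usual textbook form is given below. The symbols are my own notation rather than anything from us_states.py: $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes (all in radians), $R$ is the radius of the sphere and $d$ the resulting distance.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
     + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```

Here $c$ is the central angle between the two points, so $d$ comes out in whatever unit $R$ is expressed in.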

My implementation of the Haversine formula merely mirrors a Python example at platoscave.net. Here is the result (see us_states.py for full details):

def haversine(self, origin, destination):
  # requires `import math` at module level
  # two pairs of latitude and longitude in decimal degrees, i.e. origin vs destination
  lat1, lon1 = origin
  lat2, lon2 = destination

  # deltas between origin and destination coordinates, in radians
  dlat = math.radians(lat2-lat1)
  dlon = math.radians(lon2-lon1)

  # the haversine of the central angle, i.e. the square of half the chord length between the two points
  a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
      * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)

  # the central angle (angular distance in radians) between the two points on the sphere (Earth)
  c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

  # a spherical distance between the two points, i.e. hills etc are not considered;
  # self.R is the Earth's radius and determines the unit of the result
  return self.R * c
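To try the calculation outside the class, the method above can be rewritten as a standalone function. This is only a sketch: the name `haversine_km` and the constant `EARTH_RADIUS_KM = 6371.0` (a commonly used mean Earth radius) are my assumptions, since the value of `self.R` is not shown here.

```python
import math

# Assumed mean Earth radius in kilometres; not taken from us_states.py.
EARTH_RADIUS_KM = 6371.0


def haversine_km(origin, destination, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) pairs given in decimal degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination

    # convert latitudes and the coordinate deltas to radians
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # haversine of the central angle
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2

    # central angle in radians, then scale by the radius
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


# Distance between the average coordinates of California and Alaska
print(round(haversine_km((36.1700, -119.7462), (61.3850, -152.2683))))
```

A quick sanity check is that one degree of latitude along a meridian should come out at roughly 111 km with this radius.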

The algorithm above is the core of my custom search method, which simply picks the state whose average coordinates are closest to the provided ones (i.e. the minimum distance). To exclude locations outside the US, I have set a hard limit of 500 km as the maximum distance between the provided coordinates and the average coordinates of the nearest state. This leaves me with a nice and handy feature:

def main():
  us_states = USStates()

  # Sacramento, California - prints CA
  print(us_states.by_coords(38.3454, -121.2935))

  # Austin, Texas - prints TX
  print(us_states.by_coords(30.25, -97.75))

  # New Delhi, India - yields no results
  # as the minimum calculated distance is far beyond the 500 km limit
  print(us_states.by_coords(28.6139, 77.2089))
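The search itself (nearest state by Haversine distance, rejected when it lies beyond the 500 km limit) could look roughly like the sketch below. It is not the code from us_states.py: the function names, the 6371 km radius, the three-state excerpt and the choice of returning `None` when nothing is close enough are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
MAX_DISTANCE_KM = 500.0   # the hard limit mentioned in the text

# Small excerpt of the state dictionary shown earlier.
US_STATES = {
    'AK': {'name': 'Alaska', 'coords': [61.3850, -152.2683]},
    'CA': {'name': 'California', 'coords': [36.1700, -119.7462]},
    'FL': {'name': 'Florida', 'coords': [27.8333, -81.7170]},
}


def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) pairs in decimal degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def nearest_state(lat, lon, states=US_STATES, max_km=MAX_DISTANCE_KM):
    """Return the code of the closest state, or None if it is farther than max_km."""
    # distance from the query point to every state's average coordinates
    distances = {code: haversine_km((lat, lon), entry['coords'])
                 for code, entry in states.items()}
    code = min(distances, key=distances.get)
    return code if distances[code] <= max_km else None


print(nearest_state(38.3454, -121.2935))  # CA
print(nearest_state(28.6139, 77.2089))    # None
```

Because each state is represented by a single average point, large states can fail the 500 km test near their borders; the limit is a trade-off between rejecting foreign locations and accepting remote corners of big states.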

One last note: coordinates are given as latitude and longitude in signed decimal degrees, without a compass direction. Negative numbers represent south or west. Examples:

#   latitudes:
#   30° 27´ 18´´N -> 30.4550
#   28° 36´ 50´´S -> -28.6139
#
#   longitudes:
#   77° 12´ 32´´E -> 77.2089
#   30° 27´ 18´´W -> -30.4550
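Converting from degrees, minutes and seconds to signed decimal degrees is plain arithmetic: degrees + minutes/60 + seconds/3600, negated for S or W. The helper below is a hypothetical sketch of that conversion (the name `dms_to_decimal` is mine and is not part of us_states.py).

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a compass letter to signed decimal degrees."""
    if direction not in ('N', 'S', 'E', 'W'):
        raise ValueError('direction must be one of N, S, E, W')
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError('minutes and seconds must be in the range [0, 60)')

    value = degrees + minutes / 60.0 + seconds / 3600.0
    # south and west are represented by negative numbers
    return -value if direction in ('S', 'W') else value


print(round(dms_to_decimal(28, 36, 50, 'S'), 4))  # -28.6139
print(round(dms_to_decimal(77, 12, 32, 'E'), 4))  # 77.2089
```

Note that minutes and seconds never exceed 59, so the digits after the decimal point cannot simply be read off as minutes and seconds.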

us_states.py contains the full implementation, whereas us_states_test.py contains unit tests covering the main scenarios as well as some edge cases.
