tweetloc

Introduction

This code predicts the geographical locations of tweets from their text. The method uses a different kernel function setting for each term in a tweet, chosen according to how location-indicative the term is. It also identifies bigrams with significant spatial patterns in the training set and extends the feature space with these bigrams. The code is an implementation of LocKDE-SCoP, published in:

O. Ozdikis, H. Ramampiaro and K. Nørvåg (2019). Locality-Adapted Kernel Densities of Term Co-occurrences for Location Prediction of Tweets. Information Processing & Management (IPM). (Accepted for publication)
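
To make the idea of term-specific kernel settings concrete, here is a minimal sketch in plain Python. It is not the LocKDE-SCoP implementation: the function names, the toy terms, the hand-picked bandwidths, the planar distance in degrees, and the choice to sum term densities per cell are all assumptions made for illustration. It only shows how a narrow kernel for a location-indicative term can outweigh a wide kernel for a widespread term.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Value of a 2D Gaussian kernel at the given distance from its centre."""
    norm = 2.0 * math.pi * bandwidth ** 2
    return math.exp(-0.5 * (distance / bandwidth) ** 2) / norm

def term_density(point, occurrences, bandwidth):
    """Average kernel value at `point` over the training locations of one term."""
    total = 0.0
    for lat, lon in occurrences:
        # Planar distance in degrees: a simplification, not a geodesic distance.
        d = math.hypot(point[0] - lat, point[1] - lon)
        total += gaussian_kernel(d, bandwidth)
    return total / len(occurrences)

def predict_cell(tokens, cell_centers, term_occurrences, term_bandwidths):
    """Return the id of the cell whose centre has the highest summed term density."""
    scores = {}
    for cell_id, center in cell_centers.items():
        scores[cell_id] = sum(
            term_density(center, term_occurrences[t], term_bandwidths[t])
            for t in tokens
            if t in term_occurrences
        )
    return max(scores, key=scores.get)

# Toy data (hypothetical): centres of a 2x2 grid and two terms.
cell_centers = {0: (12.5, 12.5), 1: (17.5, 12.5), 2: (12.5, 17.5), 3: (17.5, 17.5)}
term_occurrences = {
    "fjord": [(16.0, 11.0), (17.0, 12.0)],                               # concentrated
    "coffee": [(12.0, 12.0), (17.0, 17.0), (12.0, 17.0), (16.0, 11.0)],  # widespread
}
# Hand-picked bandwidths: small for the concentrated term, large for the widespread one.
term_bandwidths = {"fjord": 1.0, "coffee": 5.0}

best_cell = predict_cell(["fjord", "coffee"], cell_centers, term_occurrences, term_bandwidths)
```

With these toy values the concentrated term dominates the score, so `best_cell` is the cell that contains its occurrences. How the published method derives the kernel settings from location indicativeness is described in the paper and is not reproduced here.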

Dependencies

The code was tested with Python 2.7 and the following libraries:

  • geopy (v1.11.0)
  • numpy (v1.10.2)
  • scipy (v0.19.1)
  • shapely (v1.5.13)

Input files

The paths to the input files should be defined in data.__init__.py.

  • Grid definition: The grid that divides the region of interest into smaller cells should be defined in the file at data.grid_file. Each line of this file represents one grid cell, giving its id followed by the coordinates of its South-West and North-East corners. Example: for a 2x2 grid spanning the region bounded by (10.0, 10.0) in the South-West and (20.0, 20.0) in the North-East, grid_file should contain the following four lines:
0	10.0	10.0	15.0	15.0
1	15.0	10.0	20.0	15.0
2	10.0	15.0	15.0	20.0
3	15.0	15.0	20.0	20.0

see visual

  • Training data: Training data should be provided in the file at data.training_file. Each line of this file represents one tweet, with its location and its text. The first column is the grid cell that contains the tweet location, the second column is the latitude, and the third column is the longitude. The last column contains the tokens of the tweet text, separated by spaces. Example:
1	16.0	11.0	lorem ipsum dolor sit amet
2	12.0	17.0	consectetur adipiscing elit
0	12.0	12.0	sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
  • Test data: The test file has the same structure as the training file. Its lines are used to test the classifier trained on the training file.
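
The file layouts above can be checked with a short sketch like the one below. It is not part of the repository: the helper names are hypothetical, the columns are assumed to be tab-separated, the grid corners are assumed to be given as (latitude, longitude) pairs (which is consistent with the example lines), and the half-open treatment of cell boundaries is a choice made here. The sample lines are the ones shown in the examples.

```python
def parse_grid(lines):
    """Map cell id -> (sw_lat, sw_lon, ne_lat, ne_lon) from tab-separated lines."""
    grid = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cell_id, sw_lat, sw_lon, ne_lat, ne_lon = line.split("\t")
        grid[int(cell_id)] = (float(sw_lat), float(sw_lon), float(ne_lat), float(ne_lon))
    return grid

def parse_tweets(lines):
    """Return a list of (cell_id, latitude, longitude, tokens) tuples."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first three tabs only; the rest of the line is the token column.
        cell_id, lat, lon, text = line.split("\t", 3)
        tweets.append((int(cell_id), float(lat), float(lon), text.split()))
    return tweets

def containing_cell(grid, lat, lon):
    """Return the id of the cell containing (lat, lon), or None if outside the grid."""
    for cell_id, (sw_lat, sw_lon, ne_lat, ne_lon) in sorted(grid.items()):
        # Half-open intervals so that a point on a shared edge matches one cell only.
        if sw_lat <= lat < ne_lat and sw_lon <= lon < ne_lon:
            return cell_id
    return None

GRID_LINES = [
    "0\t10.0\t10.0\t15.0\t15.0",
    "1\t15.0\t10.0\t20.0\t15.0",
    "2\t10.0\t15.0\t15.0\t20.0",
    "3\t15.0\t15.0\t20.0\t20.0",
]
TRAINING_LINES = [
    "1\t16.0\t11.0\tlorem ipsum dolor sit amet",
    "2\t12.0\t17.0\tconsectetur adipiscing elit",
    "0\t12.0\t12.0\tsed do eiusmod tempor incididunt ut labore et dolore magna aliqua",
]

grid = parse_grid(GRID_LINES)
tweets = parse_tweets(TRAINING_LINES)
# Tweets whose first column disagrees with the cell computed from their coordinates.
mismatches = [t for t in tweets if containing_cell(grid, t[1], t[2]) != t[0]]
```

For the example lines, `mismatches` is empty. To check real files, the same functions can be given the lines read from the paths configured for grid_file and training_file.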

Running the program

  • Step 1: Set the paths of the input files in data.__init__.py.

  • Step 2: Run cooc.main_cooc in Python. It finds the bigrams in the training data with attraction and repulsion patterns, and writes them to data.kscore_analysis_file.

  • Step 3: Run prediction.main_prediction in Python. It predicts locations for the tweet texts in test_file, and prints the median of the error distances between the estimated coordinates and the expected coordinates given by the ground truth in the test file.
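
The evaluation measure printed in Step 3, the median error distance, can be sketched as follows. This is an illustration under assumptions rather than the repository's code: the function names are hypothetical, and the haversine great-circle formula with a mean Earth radius of 6371 km stands in for whichever distance function the program actually uses (geopy is listed as a dependency, but the call it makes is not shown here).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2.0) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def median(values):
    """Median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of an empty sequence")
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

def median_error_km(predicted, expected):
    """Median distance between predicted and expected coordinates, pair by pair."""
    return median([haversine_km(p, e) for p, e in zip(predicted, expected)])

# Hypothetical predictions against the coordinates of the three example tweets.
expected = [(16.0, 11.0), (12.0, 17.0), (12.0, 12.0)]
predicted = [(16.0, 11.0), (12.0, 18.0), (15.0, 12.0)]
error_km = median_error_km(predicted, expected)
```

The median is less sensitive than the mean to a few tweets that are predicted very far from their true location.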
