geolocation

Geolocation prediction for a given tweet or other short text. The system trains a neural network, as described in:

Philippe Thomas and Leonhard Hennig (2017), "Twitter Geolocation Prediction using Neural Networks." In Proceedings of GSCL.

Performance

This section briefly summarizes the performance of our method. We removed the original model and report results for the new model only.

Model          Acc   Median     Mean     Acc   Median     Mean
Location     0.366    203.9   4514.1   0.448     41.7   3821.0
Text         0.201   1834.8   4320.1   0.330    213.9   2441.7
Description  0.096   3335.7   5837.4   0.121   2800.3   5491.0
User-name    0.060   3852.3   5909.3   0.058   4154.9   6131.7
Timezone     0.057   5280.1   5554.8   0.061   5489.9   5481.4
User-lang    0.061   6465.1   7306.9   0.047   8903.7   8523.1
Links        0.033   7606.3   6984.0   0.045   6734.4   6571.7
UTC          0.045   5387.4   5570.0   0.048   5365.8   5412.7
Source       0.045   8026.4   7539.6   0.045   6903.5   6901.1
Tweet-time   0.028   8442.5   7694.4   0.024  11720.6  10275.5
Full-fixed   0.442     43.6   1151.0   0.530     14.1    771.2
Baseline     0.028  11723.0  10264.3   0.024  11771.5  10584.4
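
The table does not spell out how its columns are computed. The following is a minimal sketch, assuming that Acc is the fraction of exactly matching city labels and that Median and Mean are error distances in kilometres between the predicted and the gold location, as is common for this task. The function names, the haversine distance, and the two sample records are illustrative assumptions and not taken from the repository's evaluation scripts.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(gold, predicted):
    """Return (accuracy, median error, mean error) for lists of (city, lat, lon) tuples."""
    errors = [haversine_km(g[1], g[2], p[1], p[2]) for g, p in zip(gold, predicted)]
    accuracy = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / len(gold)
    return accuracy, median(errors), mean(errors)


# Hypothetical records: one correct prediction and one wrong prediction.
gold = [
    ("paris-a875-fr", 48.857779, 2.353912),
    ("paris-a875-fr", 48.857779, 2.353912),
]
predicted = [
    ("paris-a875-fr", 48.857779, 2.353912),
    ("city of london-enggla-gb", 51.500901, -0.091623),
]

acc, med, avg = evaluate(gold, predicted)
print("Acc=%.3f Median=%.1f km Mean=%.1f km" % (acc, med, avg))
```

Under this reading, a low median next to a high mean (as in the Full-fixed row) would indicate that most predictions are close while a few are very far off.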

Usage

Download

Releases of the source code in this repository are published here (https://github.com/Erechtheus/geolocation/releases).

  • Version 2.3 uses the Keras functional API (instead of the Keras sequential API). This code runs with Keras version 2, whereas the original release worked with Keras version 1. It also includes minor preprocessing improvements and a REST API.

Local installation (python)

This section briefly explains how to download the source code, install the Python dependencies in Anaconda, unpack the models and preprocessors, and classify one example text. The models and preprocessors are provided in the archive relevantData.tar.lzma, which must be downloaded separately.

git clone https://github.com/Erechtheus/geolocation.git
cd geolocation/
conda create --name geoloc  --file requirements.txt
conda activate geoloc

tar xfva relevantData.tar.lzma

python predictText.py
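
If predictText.py fails right away, it can help to check that the archive was unpacked where the scripts expect it. The following is a minimal sketch; the two paths are the ones used in the text-only usage example in this README, and whether the archive unpacks to exactly these locations is an assumption. The helper name missing_files is hypothetical.

```python
from pathlib import Path

# Paths as used in the text-only usage example of this README.
# That the archive unpacks to exactly these locations is an assumption.
EXPECTED_FILES = [
    "data/models/textBranchNorm.h5",
    "data/binaries/processors.obj",
]


def missing_files(root=".", expected=EXPECTED_FILES):
    """Return the expected files that do not exist below the given root directory."""
    root = Path(root)
    return [path for path in expected if not (root / path).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```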

Local installation (docker image)

We provide a Docker image of our code that uses the Keras functional API and exposes a REST service.

unlzma geolocationV2.3.tar.lzma
docker load --input geolocationV2.3.tar
docker run -d -p   5000:5000 --name geolocation --network host  geoloc:latest

Alternatively, you can pull the image from Docker Hub.

docker pull erechtheus79/geolocation
docker run -d -p   5000:8080 --name geolocation --network host  erechtheus79/geolocation

Access the simple text model through the URL of the REST service. It returns a JSON response such as:

{
    "query": "Montmartre is truly beautiful",
    "results": [
        {
            "city": "paris-a875-fr",
            "lat": 48.857779087136095,
            "lon": 2.3539118329464914,
            "score": 0.18563927710056305
        },
        {
            "city": "city of london-enggla-gb",
            "lat": 51.50090096289424,
            "lon": -0.09162320754762229,
            "score": 0.04953022673726082
        },
        {
            "city": "boulogne billancourt-a892-fr",
            "lat": 48.82956285864007,
            "lon": 2.2603947479966044,
            "score": 0.04159574210643768
        },
        {
            "city": "saint denis-a893-fr",
            "lat": 48.947253923722585,
            "lon": 2.4314893304822607,
            "score": 0.02842172235250473
        },
        {
            "city": "argenteuil-a895-fr",
            "lat": 48.97509961545753,
            "lon": 2.1906891017164387,
            "score": 0.021229125559329987
        }
    ]
}
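
A response of this shape can be processed with the standard library alone. The following is a minimal sketch that parses such a response and picks the highest-scoring city. The helper name best_city is hypothetical, the sample is shortened to the first two results shown above, and how the response is fetched from the service is left out.

```python
import json

# Shortened copy of the response shown above (first two results only).
RESPONSE = """
{
    "query": "Montmartre is truly beautiful",
    "results": [
        {"city": "paris-a875-fr", "lat": 48.857779087136095,
         "lon": 2.3539118329464914, "score": 0.18563927710056305},
        {"city": "city of london-enggla-gb", "lat": 51.50090096289424,
         "lon": -0.09162320754762229, "score": 0.04953022673726082}
    ]
}
"""


def best_city(response_text):
    """Return (city, lat, lon, score) of the highest-scoring result."""
    payload = json.loads(response_text)
    top = max(payload["results"], key=lambda result: result["score"])
    return top["city"], top["lat"], top["lon"], top["score"]


city, lat, lon, score = best_city(RESPONSE)
print("%s (%.4f, %.4f) with score=%.3f" % (city, lat, lon, score))
```

Taking the maximum explicitly avoids relying on the results being sorted by score, which the example response suggests but does not guarantee.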

Example usage to predict location of a text snippet:

The code below shows how to use our neural network trained on text only. For other examples (e.g., using Twitter text and metadata), see the two evaluation scripts.

import pickle

from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences

# Load the text-only model
textBranch = load_model('data/models/textBranchNorm.h5')

# Load the tokenizers, encoders, and class mapping; the file is closed afterwards
with open("data/binaries/processors.obj", 'rb') as file:
    descriptionTokenizer, domainEncoder, tldEncoder, locationTokenizer, sourceEncoder, textTokenizer, nameTokenizer, timeZoneTokenizer, utcEncoder, langEncoder, placeMedian, colnames, classEncoder = pickle.load(file)

# Text to predict (e.g., 'Montmartre is truly beautiful')
testTexts = ["Montmartre is truly beautiful"]

# Convert the text into padded integer sequences for the model
textSequences = textTokenizer.texts_to_sequences(testTexts)
textSequences = pad_sequences(textSequences)

predict = textBranch.predict(textSequences)

# Print the five highest-scoring classes, best first
for index in reversed(predict.argsort()[0][-5:]):
    print("%s with score=%.3f" % (colnames[index], float(predict[0][index])))

The output is (scores might vary between different model versions):

paris-a875-fr with score=0.413
boulogne billancourt-a892-fr with score=0.070
saint denis-a893-fr with score=0.058
creteil-a894-fr with score=0.029
argenteuil-a895-fr with score=0.026
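
The top-5 selection in the last loop does not depend on Keras. The following is a minimal sketch of the same ranking step written with the standard library instead of argsort; the helper name top_k and the toy scores are hypothetical and only illustrate the idea.

```python
import heapq


def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(labels[i], scores[i]) for i in best]


# Toy class scores, not real model output.
labels = ["paris-a875-fr", "saint denis-a893-fr", "argenteuil-a895-fr"]
scores = [0.7, 0.2, 0.1]

for label, score in top_k(scores, labels, k=2):
    print("%s with score=%.3f" % (label, score))
```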

Train and apply models

To train models, the training data (tweets and gold labels) must be retrieved first. As tweets cannot be shared directly, we refer to the WNUT'16 workshop page for further information.

Once the training files have been retrieved, the preprocess script converts the tweets into the representation required to train a neural network. Models can then be trained from scratch using the trainindividual script. Pretrained models and preprocessors (e.g., the tokenizers used) are available here. The evaluation of models is implemented here.

Licence

Erechtheus/geolocation is licensed under the "GNU General Public License v3.0". See also the licence file here.

Possible improvements

  • Transformer models
  • LSTM representation via Keras generators to save memory
  • REST API with twitter JSON object as input
  • How's the performance for the full network when we only feed partial info? (E.g. only text, timezone, ...)
  • Incorporate user-graph for prediction (e.g. using neural structure learning)
  • Character CNN (memory consumption pretty high in my implementation, needs generators)
  • Use image data
  • Train a worldwide country-model? Clustering?

Tested improvements

  • FastText as embedding method -> Performance of the text model is below that of our current method. However, we did not use a FastText model trained specifically on social-media data
  • LSTM using recurrent dropout -> no improvement can be observed (TrainIndividualModelsCNN.py)

geolocation's People

Contributors

dependabot[bot], erechtheus, ianroberts

geolocation's Issues

Error when loading the model

Hi,

when I try to run the predictText.py script, I get this exception:

Traceback (most recent call last):
  File "/home/virostatiq/PycharmProjects/keras-geolocation/predictText.py", line 38, in <module>
    textBranch = load_model('models/textBranchNorm.h5')
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/saving.py", line 225, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/saving.py", line 458, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/layers/__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/sequential.py", line 301, in from_config
    model.add(layer)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/sequential.py", line 165, in add
    layer(x)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/base_layer.py", line 431, in __call__
    self.build(unpack_singleton(input_shapes))
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/layers/embeddings.py", line 109, in build
    dtype=self.dtype)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/engine/base_layer.py", line 252, in add_weight
    constraint=constraint)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 402, in variable
    v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 213, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 176, in _variable_v1_call
    aggregation=aggregation)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 155, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 2495, in default_variable_creator
    expected_shape=expected_shape, import_scope=import_scope)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 217, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 1395, in __init__
    constraint=constraint)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 1515, in _init_from_args
    initial_value, name="initial_value", dtype=dtype)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/virostatiq/PycharmProjects/keras-geolocation/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 977, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("embedding_3/random_uniform:0", shape=(100000, 100), dtype=float32)'

The installed dependencies match the requirements.txt exactly, but I also tried with TF 0.12.0 and TF 2.0, and with different Keras versions.

Can you help?
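
Reports like this are easier to diagnose when they state the package versions that are actually active in the environment, since these may differ from requirements.txt. The following is a minimal sketch for collecting them; it does not fix the error above. The helper name installed_versions and the default package list are assumptions, and importlib.metadata requires Python 3.8 or newer, which is later than the Python 3.6 shown in the traceback.

```python
from importlib import metadata


def installed_versions(packages=("keras", "tensorflow", "numpy")):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("%s: %s" % (name, version or "not installed"))
```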
