Giter Club home page Giter Club logo

media-search-engine's Introduction

OSINT Geolocation Databases Search

Check whether a (social) media post already exists and has been geolocated in several databases (e.g. Belllingcat's, Cen4InfoRes, GeoConfirmed).


Use it live: Geolocation Databases Search


Now with visualization on a map via leaflet.js

Has both a simple web UI as well as an API and a command-line client.

Installation

Requires Python, pip and the flask package (the CLI can be used without flask). For data extraction, the osint-geo-extractor library is used.

virtualenv .venv
.venv/bin/activate
pip install -e .

Usage

Command line

# Before first run: Download necessary files
media_search -o
# Dump data into pre-formatted database so the web API can use it.
media_search -d

# Print URLs as JSON
media_search -j
# Seach for a specific URL
media_search <url>

Obtaining data

The Belllingcat database is from their Civilian Harm in Ukraine project. There's an "Export to JSON" button.

The Cen4InfoRes database is from the Center for Information Resilience's Eyes on Russia map.

The GeoConfirmed database is from the GeoConfirmed map.

The Reukraine database is from reukraine.shtab.net. Note: Beta status, currently not publicly available.

The Texty database is from Texty.org.ua. with kind permission from the authors.

The media_search -o command will download the data on your behalf and put it into a data/ folder.

API & Web UI

There's a simple flask application with a plain HTML+JS frontend.

The API part requires you to do a dump of the pre-formatted database beforehand. Run media_search -d before you start the API server.

Run the server:

$ flask --app 'media_search.web' run --port 8000

By default, the server will run on http://localhost:8000/. It is not recommended to run a flask development server in production.

Use e.g. gunicorn:

$ gunicorn -w 2 media_search.web:app'

Available routes

  • /: Web UI

  • /api/v1.1/export - GET - Export all URLs as JSON

  • /api/v1.1/query - GET/POST - Check if URL is in a database
    params:

    • urls: List of urls to search (as POST JSON body array of strings or comma-separated GET argument)
    • format: text, json or csv - Show results in web UI (text) or force downloading as results.json or results.csv file

    response:

    {
      "dataset": {
        "t.me/truexanewsua/57093": [
          {
            "desc": "2022-08-15 - A naval mine exploded...",
            "id": "UW11823",
            "location": {
              "latitude": "46.0552",
              "longitude": "30.443433",
              "place_desc": "Ukraine - Odessa Oblast - Zatoka"
            },
            "source": "CENINFORES",
            "unsanitized_url": "https://t.me/truexanewsua/57093"
          }
        ],
        "twitter.com/foo/bar": ["..."],
        "baz/bar": ["..."],
      },
      "message": "Success! Url found in database",
      "success": true
    }

    If the entry as not found, success is false.

  • /api/v1.1/query/csv - POST - Check all links in uploaded .csv file
    params: File as POST form data named file

curl example:

# POST
curl 'http://localhost:8000/api/v1.1/query' \
    -X POST \
    -H 'Content-Type: application/json' \
    --data-raw '{"urls":["https://twitter.com/RALee85/status/1497853526881546241"]}'
# GET
curl 'http://localhost:8000/api/v1.1/query?\
    urls=https://twitter.com/RALee85/status/1497853526881546241,\
         https://twitter.com/GeoConfirmed/status/1508518239567065090'

Web UI screenshot:

webui

Example

$ media_search 'https://twitter.com/RedIntelPanda/status/1488569554028707847'

Found URL https://twitter.com/RedIntelPanda/status/1488569554028707847 in 'CEN4INFORES' dataset
Id: UW0067
Description:

DATE:  01/01/2022
LINK: https://twitter.com/GirkinGirkin/status/1488544876908187650
GEOLOCATION: https://twitter.com/RedIntelPanda/status/1488569554028707847
BRIEF DESCRIPTION: Russian military equipment moving in Belarus
COUNTRY: Belarus
PROVINCE: 
DISTRICT: 
TOWN/CITY: Petrishki
COORDINATES: 54.069385, 27.211645
ARMS/MUNITION: 
VIOLENCE LEVEL: 1
ENTRY: UW0067

Prior Art

See also atlos.org, which tries to do something quite similar, but it seems it's mostly used for closed investigations.

The Centre for Information Resilience has also recently intensified efforts to integrate other sources into their map, so the utility of this tool, which was always intended to be transitional, is shrinking further.

LICENSE

Copyright 2022-2023 conflict-investigations Team. Licensed under MIT License

autosize library by Jack Moore, MIT.

leaflet for maps, licensed 2-clause BSD:

Copyright (c) 2010-2023, Volodymyr Agafonkin
Copyright (c) 2010-2011, CloudMade

Marker favicon by Leaflet project, Copyright (c) 2010-2022, Vladimir Agafonkin

media-search-engine's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.