Description
RTS (Realtime Scrapper) is a tool developed to scrape pastie sites, Github, Reddit, Twitter, etc. in real time and identify occurrences of configured search terms. On a match, an email alert is triggered, allowing a company to react to leaked code, hacks being tweeted, and similar events, and to harden itself against an attack before it goes viral.
The same tool in a malicious user's hands can be used offensively to get updates on the latest hacks, code leaks, etc.
The sites that will be monitored are:
- Non-pastie sites
  - Github
  - Reddit
  - Twitter
- Pastie sites
  - Pastebin.com
  - Codepad.org
  - Dumpz.org
  - Snipplr.com
  - Paste.org.ru
  - Gist.github.com
  - Pastebin.ca
  - Kpaste.net
  - Slexy.org
  - Ideone.com
  - Pastebin.fr
Configuration
Before using this tool it is necessary to understand the properties files present in the scrapper_config directory.
- consumer.properties: Holds all the configuration data needed for the Kafka consumer (refer to the Apache Kafka guide for more information). The values present here are defaults and do not require any changes.
- producer.properties: Holds all the configuration data needed for the Kafka producer (refer to the Apache Kafka guide for more information). The values present here are defaults and do not require any changes.
- email.properties: Holds all the configuration data needed to send email alerts.
- scanner-configuration.properties: This is the core configuration file. Update the configuration to enable searching on Twitter/Github (to obtain the tokens and keys, refer to the respective sites). For pastie sites and Reddit no changes are needed. Note: in all cases make sure to change "searchterms" to the values of your choice. If there are multiple search terms, separate them with commas, like the example data provided in the config file.
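For illustration, comma-separated "searchterms" entries in scanner-configuration.properties might look like the fragment below. The keys are the ones defined by this file; the terms themselves are placeholder examples, not values shipped with the tool:

```properties
# Placeholder terms -- replace with terms relevant to your organization.
scrapper.github.searchterms=companyname,companyname-internal
scrapper.reddit.searchterms=companyname leak,companyname breach
scrapper.twitter.searchterms=companyname,companyname hack
```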
Understanding more about the scanner-configuration.properties file.

For any pastie site the configuration is as below:
- scrapper.(pastie name).profile=(Pastie profile name)
- scrapper.(pastie name).homeurl=(URL from which pastie ids are extracted)
- scrapper.(pastie name).regex=(Regex to fetch pastie ids)
- scrapper.(pastie name).downloadurl=(URL to get the content of each pastie)
- scrapper.(pastie name).searchterms=(Terms to be searched, separated by commas)
- scrapper.(pastie name).timetosleep=(Time for which the pastie thread will sleep before fetching pastie ids again)

Note: leave the pastie site configuration as is and just change the search terms as required by the organization. This will suffice.

For Github search the configuration is as below:
- scrapper.github.profile=Github
- scrapper.github.baseurl=https://api.github.com/search/code?q={searchTerm}&sort=indexed&order=asc
- scrapper.github.access_token=(Get your own Github access token)
- scrapper.github.searchterms=(Terms to be searched, separated by commas)
- scrapper.github.timetosleep=(Time for which the Github thread should sleep before searching again)

For Reddit search the configuration is as below:
- scrapper.reddit.profile=Reddit
- scrapper.reddit.baseurl=https://www.reddit.com/search.json?q={searchterm}
- scrapper.reddit.searchterms=(Terms to be searched, separated by commas)
- scrapper.reddit.timetosleep=(Time for which the Reddit thread should sleep before searching again)

For Twitter search the configuration is as below:
- scrapper.twitter.profile=Twitter
- scrapper.twitter.apikey=test
- scrapper.twitter.searchterms=(Terms to be searched, separated by commas)
- scrapper.twitter.consumerKey=(Get your own consumer key)
- scrapper.twitter.consumerSecret=(Get your own consumer secret)
- scrapper.twitter.accessToken=(Get your own access token)
- scrapper.twitter.accessTokenSecret=(Get your own access token secret)
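The pastie flow these keys describe can be sketched as follows. This is a minimal illustration in Java, not the tool's actual implementation: the class and method names are invented, and the sample HTML and regex stand in for a real pastie site's archive page and its configured regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the pastie loop: pull the archive page (homeurl), extract paste
// ids with the configured regex, build each downloadurl, and scan the fetched
// content for the configured search terms.
public class PastieScannerSketch {

    // Apply the configured regex to the archive page and collect paste ids
    // from the first capture group.
    static List<String> extractIds(String archiveHtml, String regex) {
        List<String> ids = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(archiveHtml);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Return the first configured term found in the paste content, or null.
    static String firstMatch(String content, List<String> searchTerms) {
        String lower = content.toLowerCase();
        for (String term : searchTerms) {
            if (lower.contains(term.toLowerCase())) {
                return term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the page fetched from scrapper.<site>.homeurl.
        String archiveHtml = "<a href=\"/abc123\">paste</a> <a href=\"/Xy9z88\">paste</a>";
        // Stand-in for scrapper.<site>.regex.
        List<String> ids = extractIds(archiveHtml, "href=\"/(\\w{6})\"");

        for (String id : ids) {
            // In the real tool the content would be fetched from
            // scrapper.<site>.downloadurl with the id substituted in, and a
            // match would be published to Kafka and emailed. Here the fetch
            // is faked with fixed strings.
            String content = id.equals("abc123") ? "leaked companyname api key" : "nothing here";
            String hit = firstMatch(content, List.of("companyname", "password"));
            if (hit != null) {
                System.out.println("match in paste " + id + ": " + hit);
            }
        }
        // The thread would then sleep for scrapper.<site>.timetosleep and repeat.
    }
}
```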
How to use the tool
- Install the JDK
- Install Maven (mvn) and add it to the path
- Start the ZooKeeper and Kafka servers (refer to https://kafka.apache.org/documentation/#quickstart for more information)
Commands needed to start Kafka on Windows (run from Kafka's bin\windows directory):
- zookeeper-server-start.bat ../../config/zookeeper.properties
- kafka-server-start.bat ../../config/server.properties
- kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic "Kafka Topic name"
Commands needed to start Kafka on Linux (run from Kafka's bin directory):
- zookeeper-server-start.sh ../config/zookeeper.properties
- kafka-server-start.sh ../config/server.properties
- kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic "Kafka Topic name"
- Use the Kafka topic created in the previous step
- Navigate to the "rts" folder and run the command "mvn clean install -DskipTests". This will build the code.
- Navigate to scraptool/target
- Run the command "java -jar scraptool-1.0-SNAPSHOT-standalone.jar -t "Kafka Topic name" -c "complete path of config directory""
Authors:
- Naveen Rudrappa