ail-project / ail-framework
AIL framework - Analysis Information Leak framework
License: GNU Affero General Public License v3.0
Websites like visitorfi5kl7q7i.onion can create a lot of correlations. It would be nice to be able to disable correlation per type for specific domains to avoid this issue.
I don't know if you experience this too, but I get a lot of Docker cgroup out-of-memory issues resulting in the killing of child processes of the Splash crawler. I don't know whether there is an actual negative impact.
I changed --memory=2G to --memory=3G in
screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -d -p '$port_number':8050 --restart=always --cpus=1 --memory=3G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash --maxrss '$u'; read x'
in bin/torcrawler/launch_splash_crawler.sh, and that seems to have resolved it.
I am running 6 crawlers concurrently.
I don't know whether this should become the default setting in the source.
ERROR: peepdf 0.4.2 has requirement colorama==0.3.7, but you'll have colorama 0.4.3 which is incompatible.
ERROR: peepdf 0.4.2 has requirement Pillow==3.2.0, but you'll have pillow 7.2.0 which is incompatible.
ERROR: sflock 0.3.10 has requirement click==6.6, but you'll have click 7.1.2 which is incompatible.
ERROR: sflock 0.3.10 has requirement python-magic==0.4.12, but you'll have python-magic 0.4.18 which is incompatible.
/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
[nltk_data] Downloading package vader_lexicon to
[nltk_data] /home/ail/nltk_data...
As much as I love the YARA rules for Leak Hunter, it would be really useful if it were possible to edit the rules of already created trackers in the web UI. Especially when many YARA trackers are in use, it is hard to identify the correct rule via its UUID on the filesystem.
I get mv: cannot stat 'temp/jquery.canvasjs.min.js': No such file or directory
(The message appeared in German on my system; the above is the standard English equivalent.)
When accessing a huge paste via the web interface at https://<aildomain.tld>:7000/showsavedpaste/? it is not displayed/loaded properly.
A possible solution could be to initially load only the first 100 lines and dynamically load more lines when scrolling down.
In addition, it would be great to have a button in the web interface to download the raw content as a .txt file.
(I know this is possible by right-clicking [raw content] and selecting "save target as"; nevertheless, a dedicated button would improve usability.)
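The lazy-loading idea could be sketched as a simple pagination helper that serves one window of lines per request (a sketch only; the endpoint and parameter names are made up, not AIL's actual API):

```python
CHUNK = 100  # lines per request, as suggested above

def paste_chunk(content: str, offset: int, size: int = CHUNK) -> dict:
    """Return one window of a large paste, plus whether more lines remain.

    A /showsavedpaste endpoint could call this with an 'offset' query
    parameter, and the frontend could fetch the next chunk on scroll.
    """
    lines = content.splitlines()
    window = lines[offset:offset + size]
    return {'lines': window, 'has_more': offset + size < len(lines)}

huge_paste = '\n'.join(f'line {i}' for i in range(250))
first = paste_chunk(huge_paste, 0)
print(len(first['lines']), first['has_more'])  # 100 True
last = paste_chunk(huge_paste, 200)
print(len(last['lines']), last['has_more'])    # 50 False
```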
I receive notifications from my regex trackers without a problem, but even when there is a hit on a YARA tracker, visible in the sparkline column at /trackers,
and an email notification address is configured, I don't receive anything.
I don't know how to debug this. Can you confirm that email notification for YARA trackers works properly?
Some documentation on the differences and use cases of the user roles that can be chosen at https://domain.tld:7000/settings/create_user
would be great.
When configuring ...
##### Notifications ######
[Notifications]
ail_domain = https://<sub>.<domain>.<tld>:7000
...
to something other than localhost,
the body of the sent-out email shows nothing in place of the ail_domain, e.g.:
item id: submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
url: /showsavedpaste/?paste=submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
It would sometimes be convenient to be able to restrict a tracker to certain sources. For example, I'd like a regex tracker that only matches if the source is pastebin_pro or crawler, possibly also a specific URL, or both.
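A per-tracker source filter could be sketched like this (the field names 'sources' and 'regex' are hypothetical, not AIL's actual tracker schema):

```python
import re

def tracker_matches(tracker, item_source, item_content):
    """Run a regex tracker only when the item's source is allowed.

    'sources' is a hypothetical whitelist; an empty list means the
    tracker applies to every source, preserving current behaviour.
    """
    allowed = tracker.get('sources', [])
    if allowed and item_source not in allowed:
        return False
    return re.search(tracker['regex'], item_content) is not None

tracker = {'regex': r'password\s*=', 'sources': ['pastebin_pro', 'crawler']}

print(tracker_matches(tracker, 'pastebin_pro', 'password = hunter2'))  # True
print(tracker_matches(tracker, 'twitter', 'password = hunter2'))       # False
```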
After the last pull we get this error when trying to run AIL.
Traceback (most recent call last):
File "./Flask_server.py", line 24, in <module>
from pytaxonomies import Taxonomies
ModuleNotFoundError: No module named 'pytaxonomies'
I cannot find the module that Flask is looking for. I tried to run -t thirdparty update, but without luck. Can anyone help? Thank you.
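Assuming the virtualenv is active, installing the missing dependency directly usually resolves this kind of ModuleNotFoundError (a sketch; whether AIL pins this package in a requirements file is an assumption):

```shell
# inside the AIL virtualenv (AILENV); the package name matches the failing import
pip install pytaxonomies
# or, if the repo provides a requirements file, reinstall everything from it:
pip install -U -r requirements.txt
```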
test ! -d ardb/ && git clone https://github.com/yinqiwen/ardb.git
pushd ardb/
make
popd
I have the following gcc version; when I try to compile ardb I get this error:
error: implicitly-declared
‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’
is deprecated [-Werror=deprecated-copy]
gcc (Debian 9.3.0-10) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Someone has already opened an issue at yinqiwen/ardb#484.
This issue seems to be related to a gcc-9 change:
https://gcc.gnu.org/gcc-9/changes.html
grpc/grpc#19570
GPUOpen-Drivers/AMDVLK#131
Are there any workarounds? Thanks
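One common class of workarounds (a sketch, not verified against ardb's build system) is to stop gcc-9 from promoting this particular warning to an error:

```shell
# Option 1: silence only the offending warning, assuming the build
# honours CXXFLAGS from the environment
CXXFLAGS="-Wno-deprecated-copy" make

# Option 2: RocksDB's own Makefile accepts this switch; whether ardb
# forwards it to the bundled RocksDB build is an assumption
make DISABLE_WARNING_AS_ERROR=1
```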
Twitter has an URL where you can get a tweet only by its ID:
https://mobile.twitter.com/user/status/{ID}
Adding this reference to a Twitter item would allow users to find the original tweet.
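Building that reference is trivial, since Twitter redirects /user/status/{ID} to the real author regardless of the placeholder username (a minimal sketch):

```python
def tweet_url(tweet_id: str) -> str:
    """Build the mobile Twitter permalink for a tweet ID.

    Twitter resolves /user/status/{id} to the actual author, so the
    literal path segment 'user' works for any tweet.
    """
    return f"https://mobile.twitter.com/user/status/{tweet_id}"

print(tweet_url("1279507758634176513"))
```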
Confirming a tag removes the previous automatic tag
It would be really great to get some more insight into all the settings and their dependencies in core.cfg.
While we work on switching to kvrocks we should patch the ARDB install script.
See #15
At https://aildomain.tld:7000/search
I sometimes find things and sometimes I don't. How exactly does the search work?
Is it also possible to search for the name of a paste? E.g., for https://pastebin.com/a68jmkq9,
would searching for 'a68jmkq9' yield the paste ingested by AIL? This would be a great feature if it isn't already possible.
Is the search dependent on the loaded index? If yes, it would also be great to have the possibility to search across all indices.
A bit more detail, or an overhauled search page, would improve usability.
Regarding the crawling of onion domains found in pastes: is it correct that those domains are only crawled once a month, and not when they are automatically fed to AIL by e.g. pystemon? If this is correct, this information would be a good addition to the README.md.
Extracting Google Analytics ID for correlation
<!-- Global site tag (gtag.js) - Google Analytics -->
<!--OLD CODE<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>-->
<!--OLD CODE<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-58643-34');
</script>-->
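As a sketch of the extraction itself (the regex covers classic UA- tracking IDs as in the snippet above; newer GA4 G- IDs would need an extra pattern):

```python
import re

# Classic Google Analytics tracking IDs look like UA-XXXXXX-N.
GA_ID_RE = re.compile(r'\bUA-\d{4,10}-\d{1,4}\b')

def extract_ga_ids(html: str) -> set:
    """Return all distinct Google Analytics IDs found in an HTML document."""
    return set(GA_ID_RE.findall(html))

html = '''<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>
<script>gtag('config', 'UA-58643-34');</script>'''
print(extract_ga_ids(html))  # {'UA-58643-34'}
```

Two sites sharing the same UA- ID are very likely run by the same operator, which is what makes this a useful correlation key.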
(AILENV) ail@ail:~/AIL-framework$ cd pystemon
(AILENV) ail@ail:~/AIL-framework/pystemon$ ./pystemon.py
File "./pystemon.py", line 69
exit('You need python version 2.7 or newer.')
^
TabError: inconsistent use of tabs and spaces in indentation
(AILENV) ail@ail:~/AIL-framework/pystemon$
My goal is to feed data to AIL, but somehow I am unable to; I've been trying for a while now.
Is it possible to define more than one cc in core.cfg?
[Url]
cc_critical = DE
[DomClassifier]
cc = DE
cc_tld = r'\.de$'
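If the option were extended to accept a comma-separated list (an assumption; the stock config appears to read a single value), parsing could look like this:

```python
import configparser

cfg = configparser.ConfigParser()
# hypothetical extension: several country codes, comma-separated
cfg.read_string("""
[DomClassifier]
cc = DE,AT,CH
""")

# split and normalise the configured country codes
cc_list = [c.strip().upper() for c in cfg.get('DomClassifier', 'cc').split(',')]
print(cc_list)  # ['DE', 'AT', 'CH']
```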
I'm deploying a fresh AIL, but when I try to run the server, the Flask server won't start.
This is the error:
Misp not connected
The HIVE not connected
VT submission is disabled
Traceback (most recent call last):
File "/usr/lib/python3.6/configparser.py", line 1138, in _unify_values
sectiondict = self._sections[section]
KeyError: 'Splash_Manager'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./Flask_server.py", line 41, in <module>
from blueprints.crawler_splash import crawler_splash
File "/opt/ail-framework/var/www/blueprints/crawler_splash.py", line 28, in <module>
import crawlers
File "/opt/ail-framework//bin/lib/crawlers.py", line 41, in <module>
splash_manager_url = config_loader.get_config_str('Splash_Manager', 'splash_url')
File "/opt/ail-framework//bin/lib/ConfigLoader.py", line 45, in get_config_str
return self.cfg.get(section, key_name)
File "/usr/lib/python3.6/configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "/usr/lib/python3.6/configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'Splash_Manager'
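The crash happens because the config loader assumes the [Splash_Manager] section exists. A defensive pattern (a sketch, not AIL's actual ConfigLoader code) is to fall back to a default when the section or key is missing:

```python
import configparser

def get_config_str(cfg, section, key, default=''):
    """Read a config value, returning a default instead of raising
    NoSectionError/NoOptionError when the entry is absent."""
    try:
        return cfg.get(section, key)
    except (configparser.NoSectionError, configparser.NoOptionError):
        return default

cfg = configparser.ConfigParser()
cfg.read_string("[Flask]\nhost = 127.0.0.1\n")

print(get_config_str(cfg, 'Flask', 'host'))                 # 127.0.0.1
print(get_config_str(cfg, 'Splash_Manager', 'splash_url'))  # '' (section missing)
```

Until something like this lands, copying the missing [Splash_Manager] section from the sample config into core.cfg avoids the crash.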
It would be great if it were possible to configure a proxy for pystemon at system level, e.g. via http_proxy=http://proxy.domain.tld:8080
and https_proxy=http://proxy.domain.tld:8080.
Even when using the most current pystemon repo, where you can define a list of proxies in the pystemon config, it doesn't seem to work very well, and configuring a proxy system-wide makes the Splash crawler fail.
So perhaps a variable could be read from core.cfg
and put in front of the feeder launcher script in LAUNCH.sh.
A proxy can be necessary when using a Pastebin Pro account tied to a specific external IP.
This would be a great and useful addition, also for companies where paste sites are blocked and only reachable through a certain proxy.
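The launcher-side idea could be sketched as reading the proxy from core.cfg and injecting it only into the feeder's environment, so the Splash crawler is untouched (the [Proxy] section is hypothetical, not part of the stock config):

```python
import configparser
import os
import subprocess

# hypothetical [Proxy] section in core.cfg
cfg = configparser.ConfigParser()
cfg.read_string("""
[Proxy]
http_proxy = http://proxy.domain.tld:8080
https_proxy = http://proxy.domain.tld:8080
""")

# build an environment for the feeder only; processes started without
# this env (e.g. the Splash crawler) keep their direct connection
feeder_env = dict(os.environ)
feeder_env['http_proxy'] = cfg.get('Proxy', 'http_proxy')
feeder_env['https_proxy'] = cfg.get('Proxy', 'https_proxy')

# subprocess.Popen(['./pystemon.py'], env=feeder_env)  # launch the feeder
print(feeder_env['http_proxy'])
```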
I am having problems getting email notifications up and running, so I am wondering whether there is a place where detailed logs are stored, or some kind of debug mode or test scripts.
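In the absence of a built-in test script, a minimal standalone sender can help isolate SMTP problems from AIL itself (a sketch; host, port, and addresses below are placeholders to replace with your own values):

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'AIL test notification'
msg['From'] = 'ail@domain.tld'        # placeholder sender
msg['To'] = 'analyst@domain.tld'      # placeholder recipient
msg.set_content('If you can read this, SMTP delivery works.')

# Uncomment to actually send; set_debuglevel(1) prints the full SMTP
# dialogue to stderr, which usually reveals the misconfiguration.
# with smtplib.SMTP('smtp.domain.tld', 25) as s:
#     s.set_debuglevel(1)
#     s.send_message(msg)

print(msg['Subject'])
```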
I am looking at the leakhunter page and I think that it would be super useful to have more informative links to items, for instance:
Hi,
when I start crawling onion sites, everything appears down on the dashboard.
Viewing the screen running Crawler_AIL, I see this error:
Traceback (most recent call last):
File "./Crawler.py", line 413, in <module>
crawler_config = load_crawler_config(to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
File "./Crawler.py", line 189, in load_crawler_config
crawler_config['crawler_options'] = get_crawler_config(redis_crawler, 'auto', service_type, domain, url=url)
File "./Crawler.py", line 173, in get_crawler_config
crawler_options['time'] = int(config['time'])
KeyError: 'time'
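The KeyError suggests an older auto-crawler config stored without a 'time' entry. A defensive sketch (not the actual fix) would fall back to a default interval instead of crashing:

```python
# crawler config dict loaded from redis; older entries may lack the
# 'time' key, which makes get_crawler_config() raise KeyError
config = {'depth_limit': 1}          # no 'time' key, as in the crash

DEFAULT_CRAWL_TIME = 3600            # hypothetical default, in seconds

crawler_options = {}
crawler_options['time'] = int(config.get('time', DEFAULT_CRAWL_TIME))
print(crawler_options['time'])  # 3600
```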
html2text is a nifty Python library that converts any HTML document into readable text. The idea would be the following:
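As a stdlib-only illustration of the idea (html2text itself does much more, e.g. Markdown output), crawled HTML could be reduced to indexable text like this:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""
    SKIP = {'script', 'style'}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return ' '.join(parser.parts)

print(html_to_text('<html><script>var x=1;</script>'
                   '<body><h1>Leak</h1><p>password dump</p></body></html>'))
# Leak password dump
```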
It would be very helpful for a lot of CERTs to have an RT/RTIR connector in order, for example, to send tracker notifications via the ticket system (incident reports, investigations, tickets) directly to constituents, with a custom description in a definable body and subject. The already available Python API at https://github.com/CZ-NIC/python-rt could be a starting point.
Hi all,
would it be possible to consider adding an ingest for the site xup.in?
Thanks.
When I start AIL, it stops after about an hour. The command executed is ./LAUNCH.sh within the venv.
The interface works, but the feeders and pystemon do not report anything to the console. Is there any way to make them persistent?
Thanks and greetings
It would be great to be able to see assigned MISP galaxies in the automatic crawler overview at /crawlers/auto_crawler
in order to get a better overview of the purpose of tracked pages. Alternatively, the ability to add a comment to automatic crawlers would be great; this information could also be displayed in the overview.
feeder parsing qr-code to extract url
It would be great to add the tag (or description) of a tracker to the tracker overview at /trackers
as a column. It would then be possible to order by tag (or description), making it easier to get an overview of each tracker's purpose. Alternatively, add the possibility to assign trackers to definable groups and allow viewing those groups.
2020-06-14 10:11:19 (694 KB/s) - ‘/home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2.tar.gz’ saved [4685894]
<<<<< Done dowloading RocksDB
Unpacking ROCKSDB
<<<<< Done unpacking ROCKSDB
Building ROCKSDB
make[2]: Entering directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
GEN util/build_version.cc
GEN util/build_version.cc
CC cache/clock_cache.o
CC cache/lru_cache.o
CC cache/sharded_cache.o
CC db/builder.o
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In constructor ‘rocksdb::FdWithKeyRange::FdWithKeyRange(rocksdb::FileDescriptor, rocksdb::Slice, rocksdb::Slice, rocksdb::FileMetaData*)’:
./db/version_edit.h:157:33: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
157 | largest_key(_largest_key) {}
| ^
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, _U2&&) [with _U1 = int&; _U2 = rocksdb::FileMetaData; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_MoveConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyMoveConvertiblePair<_U1, _U2>()), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:227:48: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:342:64: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(rocksdb::FileMetaData&&)’ first required here
342 | : first(std::forward<_U1>(__x)), second(std::forward<_U2>(__y)) { }
| ^
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, const _T2&) [with _U1 = int&; typename std::enable_if<std::_PCC<true, _T1, _T2>::_MoveCopyPair<true, _U1, _T2>(), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:232:37: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:312:51: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(const rocksdb::FileMetaData&)’ first required here
312 | : first(std::forward<_U1>(__x)), second(__y) { }
| ^
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1879: db/builder.o] Error 1
make[2]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
make[1]: *** [Makefile:401: /home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2/librocksdb.a] Error 2
make[1]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/src'
make: *** [Makefile:4: all] Error 2
It'd be nice to have an integration with the VT hunting API as source feed.
The integration would download the matched binaries/files and then ingest them as input like anything else and apply all the other magical AIL features such as pattern matching and so on.
It would be really great if there were a bit more documentation on how to use the module manager. Or is it deprecated?
I am asking because I observe completely different behaviour of the queues when running ail-framework with or without the module manager.
In some situations it would be helpful to trigger a (group of) automatic crawler(s) manually to get "instant" tracker results.
Something like a "crawl now" button that activates every automatic crawler selected via a checkbox.
Add an optional correlation (disabled by default) between CVE numbers and items, domain.
(use simple correlation helper)
Add an option to import / export trackers (can be used for backup)
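Export/import could be sketched as simple JSON round-tripping of the tracker definitions (field names below are hypothetical, not AIL's actual tracker schema):

```python
import json

# hypothetical tracker definitions as they might be exported from AIL
trackers = [
    {'uuid': '54cc3ebb-81ce-4609-8cd6-9d9022876022',
     'type': 'regex', 'tracked': r'\bsecret\b', 'tags': ['infoleak:test']},
    {'uuid': '9d902287-0000-4609-8cd6-54cc3ebb81ce',
     'type': 'yara', 'tracked': 'rule_name', 'tags': []},
]

def export_trackers(trackers):
    """Serialise tracker definitions to a JSON string (backup)."""
    return json.dumps(trackers, indent=2)

def import_trackers(blob):
    """Restore tracker definitions from a JSON backup."""
    return json.loads(blob)

backup = export_trackers(trackers)
assert import_trackers(backup) == trackers  # lossless round trip
print(len(import_trackers(backup)))  # 2
```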