ail-project / ail-framework
AIL framework - Analysis Information Leak framework
License: GNU Affero General Public License v3.0
Websites like visitorfi5kl7q7i.onion can create a lot of correlations. It would be nice to be able to disable correlation per type for specific domains to avoid this issue.
I don't know if you experience this too, but I get a lot of Docker cgroup out-of-memory issues resulting in the killing of child processes of the Splash crawler. I don't know whether there is an actual negative impact.
I changed --memory=2G to --memory=3G in
screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -d -p '$port_number':8050 --restart=always --cpus=1 --memory=3G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash --maxrss '$u'; read x'
in bin/torcrawler/launch_splash_crawler.sh, and that seems to have resolved it.
I am running 6 crawlers concurrently.
I don't know whether this should become the default setting in the source.
ERROR: peepdf 0.4.2 has requirement colorama==0.3.7, but you'll have colorama 0.4.3 which is incompatible.
ERROR: peepdf 0.4.2 has requirement Pillow==3.2.0, but you'll have pillow 7.2.0 which is incompatible.
ERROR: sflock 0.3.10 has requirement click==6.6, but you'll have click 7.1.2 which is incompatible.
ERROR: sflock 0.3.10 has requirement python-magic==0.4.12, but you'll have python-magic 0.4.18 which is incompatible.
/usr/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
[nltk_data] Downloading package vader_lexicon to
[nltk_data] /home/ail/nltk_data...
As much as I love the YARA rules for Leak Hunter, it would be really useful if it were possible to edit the rules of already created trackers in the web UI. Especially when many YARA trackers are in use, it is hard to identify the correct rule via its UUID on the filesystem.
I get mv: cannot stat 'temp/jquery.canvasjs.min.js': No such file or directory
(The message appeared in German on my system; the above is the standard English equivalent.)
When accessing a huge paste via the web interface at https://<aildomain.tld>:7000/showsavedpaste/? it is not displayed/loaded properly.
A possible solution could be to initially load only the first 100 lines and dynamically load more lines when scrolling down.
In addition, it would be great to have a button in the web interface to download the raw content as a .txt file.
(I know this is possible by right-clicking [raw content] and selecting "save target as"; nevertheless, a dedicated button would improve usability.)
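The lazy-loading idea could be sketched as a simple pagination helper that serves one window of lines per request (a sketch only; the endpoint and parameter names are made up, not AIL's actual API):

```python
CHUNK = 100  # lines per request, as suggested above

def paste_chunk(content: str, offset: int, size: int = CHUNK) -> dict:
    """Return one window of a large paste, plus whether more lines remain.

    A /showsavedpaste endpoint could call this with an 'offset' query
    parameter, and the frontend could fetch the next chunk on scroll.
    """
    lines = content.splitlines()
    window = lines[offset:offset + size]
    return {'lines': window, 'has_more': offset + size < len(lines)}

huge_paste = '\n'.join(f'line {i}' for i in range(250))
first = paste_chunk(huge_paste, 0)
print(len(first['lines']), first['has_more'])  # 100 True
last = paste_chunk(huge_paste, 200)
print(len(last['lines']), last['has_more'])    # 50 False
```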
I receive notifications from my regex trackers without a problem, but even when there is a hit on a YARA tracker, visible in the sparkline column at /trackers,
and an email notification address is configured, I don't receive anything.
I don't know how to debug this. Can you confirm that email notification for YARA trackers works properly?
Some documentation on the differences and use cases of the user roles that can be chosen at https://domain.tld:7000/settings/create_user
would be great.
When configuring ...
##### Notifications ######
[Notifications]
ail_domain = https://<sub>.<domain>.<tld>:7000
...
to something other than localhost,
the body of the sent-out email shows nothing in place of the ail_domain, e.g.:
item id: submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
url: /showsavedpaste/?paste=submitted/2020/09/04/54cc3ebb-81ce-4609-8cd6-9d9022876022.gz
It would sometimes be convenient to be able to restrict a tracker to certain sources. For example, I'd like a regex tracker that only matches if the source is pastebin_pro or crawler, possibly also a specific URL, or both.
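A per-tracker source filter could be sketched like this (the field names 'sources' and 'regex' are hypothetical, not AIL's actual tracker schema):

```python
import re

def tracker_matches(tracker, item_source, item_content):
    """Run a regex tracker only when the item's source is allowed.

    'sources' is a hypothetical whitelist; an empty list means the
    tracker applies to every source, preserving current behaviour.
    """
    allowed = tracker.get('sources', [])
    if allowed and item_source not in allowed:
        return False
    return re.search(tracker['regex'], item_content) is not None

tracker = {'regex': r'password\s*=', 'sources': ['pastebin_pro', 'crawler']}

print(tracker_matches(tracker, 'pastebin_pro', 'password = hunter2'))  # True
print(tracker_matches(tracker, 'twitter', 'password = hunter2'))       # False
```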
After the last pull we get this error when trying to run AIL.
Traceback (most recent call last):
File "./Flask_server.py", line 24, in <module>
from pytaxonomies import Taxonomies
ModuleNotFoundError: No module named 'pytaxonomies'
I cannot find the module that Flask is looking for. I tried to run -t thirdparty update, but without luck. Can anyone help? Thank you.
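Assuming the virtualenv is active, installing the missing dependency directly usually resolves this kind of ModuleNotFoundError (a sketch; whether AIL pins this package in a requirements file is an assumption):

```shell
# inside the AIL virtualenv (AILENV); the package name matches the failing import
pip install pytaxonomies
# or, if the repo provides a requirements file, reinstall everything from it:
pip install -U -r requirements.txt
```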
test ! -d ardb/ && git clone https://github.com/yinqiwen/ardb.git
pushd ardb/
make
popd
I have the following gcc version; when I try to compile ardb I get this error:
error: implicitly-declared
‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’
is deprecated [-Werror=deprecated-copy]
gcc (Debian 9.3.0-10) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Someone has already opened an issue at yinqiwen/ardb#484.
This issue seems to be related to a gcc-9 change:
https://gcc.gnu.org/gcc-9/changes.html
grpc/grpc#19570
GPUOpen-Drivers/AMDVLK#131
Are there any workarounds? Thanks
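One common class of workarounds (a sketch, not verified against ardb's build system) is to stop gcc-9 from promoting this particular warning to an error:

```shell
# Option 1: silence only the offending warning, assuming the build
# honours CXXFLAGS from the environment
CXXFLAGS="-Wno-deprecated-copy" make

# Option 2: RocksDB's own Makefile accepts this switch; whether ardb
# forwards it to the bundled RocksDB build is an assumption
make DISABLE_WARNING_AS_ERROR=1
```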
Twitter has an URL where you can get a tweet only by its ID:
https://mobile.twitter.com/user/status/{ID}
Adding this reference to a Twitter item would allow users to find the original tweet.
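Building that reference is trivial, since Twitter redirects /user/status/{ID} to the real author regardless of the placeholder username (a minimal sketch):

```python
def tweet_url(tweet_id: str) -> str:
    """Build the mobile Twitter permalink for a tweet ID.

    Twitter resolves /user/status/{id} to the actual author, so the
    literal path segment 'user' works for any tweet.
    """
    return f"https://mobile.twitter.com/user/status/{tweet_id}"

print(tweet_url("1279507758634176513"))
```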
Confirming a tag removes the previous automatic tag
It would be really great to get some more insight into all the settings and their dependencies in core.cfg.
While we work on switching to kvrocks we should patch the ARDB install script.
See #15
At https://aildomain.tld:7000/search
I sometimes find things and sometimes I don't. How exactly does the search work?
Is it also possible to search for the name of a paste? E.g., for https://pastebin.com/a68jmkq9,
would searching for 'a68jmkq9' yield the paste ingested by AIL? This would be a great feature if it isn't already possible.
Is the search dependent on the loaded index? If yes, it would also be great to have the possibility to search across all indices.
A bit more detail, or an overhauled search page, would improve usability.
Regarding the crawling of onion domains found in pastes: is it correct that those domains are only crawled once a month, and not when they are automatically fed to AIL by e.g. pystemon? If this is correct, this information would be a good addition to the README.md.
Extracting Google Analytics ID for correlation
<!-- Global site tag (gtag.js) - Google Analytics -->
<!--OLD CODE<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>-->
<!--OLD CODE<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-58643-34');
</script>-->
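As a sketch of the extraction itself (the regex covers classic UA- tracking IDs as in the snippet above; newer GA4 G- IDs would need an extra pattern):

```python
import re

# Classic Google Analytics tracking IDs look like UA-XXXXXX-N.
GA_ID_RE = re.compile(r'\bUA-\d{4,10}-\d{1,4}\b')

def extract_ga_ids(html: str) -> set:
    """Return all distinct Google Analytics IDs found in an HTML document."""
    return set(GA_ID_RE.findall(html))

html = '''<script async src="https://www.googletagmanager.com/gtag/js?id=UA-58643-34"></script>
<script>gtag('config', 'UA-58643-34');</script>'''
print(extract_ga_ids(html))  # {'UA-58643-34'}
```

Two sites sharing the same UA- ID are very likely run by the same operator, which is what makes this a useful correlation key.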
(AILENV) ail@ail:~/AIL-framework$ cd pystemon
(AILENV) ail@ail:~/AIL-framework/pystemon$ ./pystemon.py
File "./pystemon.py", line 69
exit('You need python version 2.7 or newer.')
^
TabError: inconsistent use of tabs and spaces in indentation
(AILENV) ail@ail:~/AIL-framework/pystemon$
My goal is to feed data to AIL, but somehow I am unable to; I've been trying for a while now.
Is it possible to define more than one cc in core.cfg?
[Url]
cc_critical = DE
[DomClassifier]
cc = DE
cc_tld = r'\.de$'
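If the option were extended to accept a comma-separated list (an assumption; the stock config appears to read a single value), parsing could look like this:

```python
import configparser

cfg = configparser.ConfigParser()
# hypothetical extension: several country codes, comma-separated
cfg.read_string("""
[DomClassifier]
cc = DE,AT,CH
""")

# split and normalise the configured country codes
cc_list = [c.strip().upper() for c in cfg.get('DomClassifier', 'cc').split(',')]
print(cc_list)  # ['DE', 'AT', 'CH']
```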
I'm deploying a fresh AIL, but when I try to run the server, the Flask server won't start.
This is the error:
Misp not connected
The HIVE not connected
VT submission is disabled
Traceback (most recent call last):
File "/usr/lib/python3.6/configparser.py", line 1138, in _unify_values
sectiondict = self._sections[section]
KeyError: 'Splash_Manager'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./Flask_server.py", line 41, in <module>
from blueprints.crawler_splash import crawler_splash
File "/opt/ail-framework/var/www/blueprints/crawler_splash.py", line 28, in <module>
import crawlers
File "/opt/ail-framework//bin/lib/crawlers.py", line 41, in <module>
splash_manager_url = config_loader.get_config_str('Splash_Manager', 'splash_url')
File "/opt/ail-framework//bin/lib/ConfigLoader.py", line 45, in get_config_str
return self.cfg.get(section, key_name)
File "/usr/lib/python3.6/configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "/usr/lib/python3.6/configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'Splash_Manager'
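The crash happens because the config loader assumes the [Splash_Manager] section exists. A defensive pattern (a sketch, not AIL's actual ConfigLoader code) is to fall back to a default when the section or key is missing:

```python
import configparser

def get_config_str(cfg, section, key, default=''):
    """Read a config value, returning a default instead of raising
    NoSectionError/NoOptionError when the entry is absent."""
    try:
        return cfg.get(section, key)
    except (configparser.NoSectionError, configparser.NoOptionError):
        return default

cfg = configparser.ConfigParser()
cfg.read_string("[Flask]\nhost = 127.0.0.1\n")

print(get_config_str(cfg, 'Flask', 'host'))                 # 127.0.0.1
print(get_config_str(cfg, 'Splash_Manager', 'splash_url'))  # '' (section missing)
```

Until something like this lands, copying the missing [Splash_Manager] section from the sample config into core.cfg avoids the crash.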
It would be great if it were possible to configure a proxy for pystemon at system level, e.g. via http_proxy=http://proxy.domain.tld:8080
and https_proxy=http://proxy.domain.tld:8080.
Even when using the most current pystemon repo, where you can define a list of proxies in the pystemon config, it doesn't seem to work very well, and configuring a proxy system-wide makes the Splash crawler fail.
So perhaps a variable could be read from core.cfg
and put in front of the feeder launcher script in LAUNCH.sh.
A proxy can be necessary when using a Pastebin Pro account tied to a specific external IP.
This would be a great and useful addition, also for companies where paste sites are blocked and only reachable through a certain proxy.
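The launcher-side idea could be sketched as reading the proxy from core.cfg and injecting it only into the feeder's environment, so the Splash crawler is untouched (the [Proxy] section is hypothetical, not part of the stock config):

```python
import configparser
import os
import subprocess

# hypothetical [Proxy] section in core.cfg
cfg = configparser.ConfigParser()
cfg.read_string("""
[Proxy]
http_proxy = http://proxy.domain.tld:8080
https_proxy = http://proxy.domain.tld:8080
""")

# build an environment for the feeder only; processes started without
# this env (e.g. the Splash crawler) keep their direct connection
feeder_env = dict(os.environ)
feeder_env['http_proxy'] = cfg.get('Proxy', 'http_proxy')
feeder_env['https_proxy'] = cfg.get('Proxy', 'https_proxy')

# subprocess.Popen(['./pystemon.py'], env=feeder_env)  # launch the feeder
print(feeder_env['http_proxy'])
```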
I am having problems getting email notifications up and running, so I am wondering whether there is a place where detailed logs are stored, or some kind of debug mode or test scripts.
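In the absence of a built-in test script, a minimal standalone sender can help isolate SMTP problems from AIL itself (a sketch; host, port, and addresses below are placeholders to replace with your own values):

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'AIL test notification'
msg['From'] = 'ail@domain.tld'        # placeholder sender
msg['To'] = 'analyst@domain.tld'      # placeholder recipient
msg.set_content('If you can read this, SMTP delivery works.')

# Uncomment to actually send; set_debuglevel(1) prints the full SMTP
# dialogue to stderr, which usually reveals the misconfiguration.
# with smtplib.SMTP('smtp.domain.tld', 25) as s:
#     s.set_debuglevel(1)
#     s.send_message(msg)

print(msg['Subject'])
```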
I am looking at the leakhunter page and I think that it would be super useful to have more informative links to items, for instance:
Hi,
when I start crawling onion sites, everything appears down on the dashboard.
Viewing the screen running Crawler_AIL, I see this error:
Traceback (most recent call last):
File "./Crawler.py", line 413, in <module>
crawler_config = load_crawler_config(to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
File "./Crawler.py", line 189, in load_crawler_config
crawler_config['crawler_options'] = get_crawler_config(redis_crawler, 'auto', service_type, domain, url=url)
File "./Crawler.py", line 173, in get_crawler_config
crawler_options['time'] = int(config['time'])
KeyError: 'time'
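The KeyError suggests an older auto-crawler config stored without a 'time' entry. A defensive sketch (not the actual fix) would fall back to a default interval instead of crashing:

```python
# crawler config dict loaded from redis; older entries may lack the
# 'time' key, which makes get_crawler_config() raise KeyError
config = {'depth_limit': 1}          # no 'time' key, as in the crash

DEFAULT_CRAWL_TIME = 3600            # hypothetical default, in seconds

crawler_options = {}
crawler_options['time'] = int(config.get('time', DEFAULT_CRAWL_TIME))
print(crawler_options['time'])  # 3600
```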
html2text is a nifty Python library that converts any HTML document into readable text. The idea would be the following:
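As a stdlib-only illustration of the idea (html2text itself does much more, e.g. Markdown output), crawled HTML could be reduced to indexable text like this:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""
    SKIP = {'script', 'style'}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return ' '.join(parser.parts)

print(html_to_text('<html><script>var x=1;</script>'
                   '<body><h1>Leak</h1><p>password dump</p></body></html>'))
# Leak password dump
```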
It would be very helpful for a lot of CERTs to have an RT/RTIR connector in order, for example, to send tracker notifications via the ticket system (incident reports, investigations, tickets) directly to constituents, with a custom description in a definable body and subject. The already available Python API at https://github.com/CZ-NIC/python-rt could be a starting point.
Hi all,
would it be possible to consider adding an ingest for the site xup.in?
Thanks.
When I start AIL, it stops after about an hour. The command executed is ./LAUNCH.sh within the venv.
The interface works, but the feeders and pystemon do not report anything to the console. Is there any way to make them persistent?
Thanks and greetings
It would be great to be able to see assigned MISP galaxies in the automatic crawler overview at /crawlers/auto_crawler
in order to get a better overview of the purpose of tracked pages. Alternatively, the ability to add a comment to automatic crawlers would be great; this information could also be displayed in the overview.
feeder parsing qr-code to extract url
It would be great to add the tag (or description) of a tracker to the tracker overview at /trackers
as a column. It would then be possible to order by tag (or description), making it easier to get an overview of each tracker's purpose. Alternatively, add the possibility to assign trackers to definable groups and allow viewing those groups.
2020-06-14 10:11:19 (694 KB/s) - ‘/home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2.tar.gz’ saved [4685894]
<<<<< Done dowloading RocksDB
Unpacking ROCKSDB
<<<<< Done unpacking ROCKSDB
Building ROCKSDB
make[2]: Entering directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
GEN util/build_version.cc
GEN util/build_version.cc
CC cache/clock_cache.o
CC cache/lru_cache.o
CC cache/sharded_cache.o
CC db/builder.o
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In constructor ‘rocksdb::FdWithKeyRange::FdWithKeyRange(rocksdb::FileDescriptor, rocksdb::Slice, rocksdb::Slice, rocksdb::FileMetaData*)’:
./db/version_edit.h:157:33: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
157 | largest_key(_largest_key) {}
| ^
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, _U2&&) [with _U1 = int&; _U2 = rocksdb::FileMetaData; typename std::enable_if<(std::_PCC<true, _T1, _T2>::_MoveConstructiblePair<_U1, _U2>() && std::_PCC<true, _T1, _T2>::_ImplicitlyMoveConvertiblePair<_U1, _U2>()), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, rocksdb::FileMetaData}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:227:48: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:342:64: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(rocksdb::FileMetaData&&)’ first required here
342 | : first(std::forward<_U1>(__x)), second(std::forward<_U2>(__y)) { }
| ^
In file included from ./db/range_del_aggregator.h:16,
from ./db/memtable.h:19,
from ./db/memtable_list.h:17,
from ./db/column_family.h:17,
from ./db/version_set.h:31,
from ./db/compaction.h:11,
from ./db/compaction_iterator.h:12,
from db/builder.cc:16:
./db/version_edit.h: In instantiation of ‘constexpr std::pair<_T1, _T2>::pair(_U1&&, const _T2&) [with _U1 = int&; typename std::enable_if<std::_PCC<true, _T1, _T2>::_MoveCopyPair<true, _U1, _T2>(), bool>::type = true; _T1 = int; _T2 = rocksdb::FileMetaData]’:
/usr/include/c++/9/ext/new_allocator.h:147:4: required from ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>]’
/usr/include/c++/9/bits/alloc_traits.h:484:4: required from ‘static void std::allocator_traits<std::allocator<_CharT> >::construct(std::allocator_traits<std::allocator<_CharT> >::allocator_type&, _Up*, _Args&& ...) [with _Up = std::pair<int, rocksdb::FileMetaData>; _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; std::allocator_traits<std::allocator<_CharT> >::allocator_type = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
/usr/include/c++/9/bits/vector.tcc:115:30: required from ‘void std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {int&, const rocksdb::FileMetaData&}; _Tp = std::pair<int, rocksdb::FileMetaData>; _Alloc = std::allocator<std::pair<int, rocksdb::FileMetaData> >]’
./db/version_edit.h:232:37: required from here
./db/version_edit.h:76:8: error: implicitly-declared ‘constexpr rocksdb::FileDescriptor::FileDescriptor(const rocksdb::FileDescriptor&)’ is deprecated [-Werror=deprecated-copy]
76 | struct FileMetaData {
| ^~~~~~~~~~~~
./db/version_edit.h:47:19: note: because ‘rocksdb::FileDescriptor’ has user-provided ‘rocksdb::FileDescriptor& rocksdb::FileDescriptor::operator=(const rocksdb::FileDescriptor&)’
47 | FileDescriptor& operator=(const FileDescriptor& fd) {
| ^~~~~~~~
In file included from /usr/include/c++/9/bits/stl_algobase.h:64,
from /usr/include/c++/9/bits/char_traits.h:39,
from /usr/include/c++/9/string:40,
from ./db/builder.h:9,
from db/builder.cc:10:
/usr/include/c++/9/bits/stl_pair.h:312:51: note: synthesized method ‘rocksdb::FileMetaData::FileMetaData(const rocksdb::FileMetaData&)’ first required here
312 | : first(std::forward<_U1>(__x)), second(__y) { }
| ^
cc1plus: all warnings being treated as errors
make[2]: *** [Makefile:1879: db/builder.o] Error 1
make[2]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/deps/rocksdb-5.14.2'
make[1]: *** [Makefile:401: /home/polatalemdar/circl/ail-framework/ardb/src/../deps/rocksdb-5.14.2/librocksdb.a] Error 2
make[1]: Leaving directory '/home/polatalemdar/circl/ail-framework/ardb/src'
make: *** [Makefile:4: all] Error 2
It'd be nice to have an integration with the VT hunting API as source feed.
The integration would download the matched binaries/files and then ingest them as input like anything else and apply all the other magical AIL features such as pattern matching and so on.
It would be really great if there were a bit more documentation on how to use the module manager. Or is it deprecated?
I am asking because I observe completely different behaviour of the queues when running ail-framework with or without the module manager.
In some situations it would be helpful to trigger a (group of) automatic crawler(s) manually to get "instant" tracker results.
Something like a "crawl now" button that activates every automatic crawler selected via a checkbox.
Add an optional correlation (disabled by default) between CVE numbers and items, domain.
(use simple correlation helper)
Add an option to import / export trackers (can be used for backup)
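Export/import could be sketched as simple JSON round-tripping of the tracker definitions (field names below are hypothetical, not AIL's actual tracker schema):

```python
import json

# hypothetical tracker definitions as they might be exported from AIL
trackers = [
    {'uuid': '54cc3ebb-81ce-4609-8cd6-9d9022876022',
     'type': 'regex', 'tracked': r'\bsecret\b', 'tags': ['infoleak:test']},
    {'uuid': '9d902287-0000-4609-8cd6-54cc3ebb81ce',
     'type': 'yara', 'tracked': 'rule_name', 'tags': []},
]

def export_trackers(trackers):
    """Serialise tracker definitions to a JSON string (backup)."""
    return json.dumps(trackers, indent=2)

def import_trackers(blob):
    """Restore tracker definitions from a JSON backup."""
    return json.loads(blob)

backup = export_trackers(trackers)
assert import_trackers(backup) == trackers  # lossless round trip
print(len(import_trackers(backup)))  # 2
```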