funilrys / pyfunceble
The tool to check the availability or syntax of domain, IP or URL.
Home Page: https://pyfunceble.github.io
License: Apache License 2.0
Indeed, we should not share logs by default.
I have been told that this is a "magic" tool, and I congratulate you for that. However, I have read the instructions several times:
https://pyfunceble.readthedocs.io/en/latest/what-can-we-do.html
And I still have no idea how to verify a list of URLs, nor what format that list should have.
Could you review the manual and make it friendlier, with examples? Thank you.
Is your feature request related to a problem? Please describe.
Yes, indeed a problem: commits fail when a file exceeds GitHub's 100 MB file size limit. This happened all of yesterday on https://github.com/mitchellkrogza/Phishing.Database, with no error message from the PyFunceble commits to help us discover what was going wrong.
I traced the error this morning by reintroducing my own commit script, which revealed the error message returned when we push our commit.
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: File input-source/ALL-feeds-URL.list is 127.30 MB; this exceeds GitHub's file size limit of 100.00 MB
See > https://travis-ci.org/mitchellkrogza/Phishing.Database/builds/534100063
Describe the solution you'd like
I fixed this and got commit to work by simply adding this into my commit script
git lfs install
git lfs track "*.list"
This LFS support should be included in PyFunceble.
Additionally, we should improve the way PyFunceble gets errors back from Git: it should capture any message starting with remote: error:
and report it back to us, so that we know we have hit a problem instead of going in circles trying to diagnose it.
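The error capture requested above could be sketched roughly as follows. This is only an illustration and not PyFunceble code: extract_remote_errors is a hypothetical helper, and the sample text reuses the two remote: error: lines from the failed push plus one illustrative filler line.

```python
import re
from typing import List

# Hypothetical helper, for illustration only (not part of PyFunceble).
REMOTE_ERROR = re.compile(r"^remote:\s*error:\s*(?P<message>.+)$")


def extract_remote_errors(push_output: str) -> List[str]:
    """Return the message part of every 'remote: error:' line."""
    errors = []
    for line in push_output.splitlines():
        match = REMOTE_ERROR.match(line.strip())
        if match:
            errors.append(match.group("message").strip())
    return errors


# The two error lines come from the report above; the last line is filler.
sample = """\
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: File input-source/ALL-feeds-URL.list is 127.30 MB; this exceeds GitHub's file size limit of 100.00 MB
Counting objects: 3, done.
"""

for message in extract_remote_errors(sample):
    print("push failed:", message)
```

In a real commit script, the text would presumably come from the captured output of the git push command, which is an assumption about how the caller runs Git.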
Describe alternatives you've considered
Alternatives would be:
a) split all our list files across the various projects into chunks of 20 MB and adapt our scripts to loop through them, thereby keeping all list files of domains and URLs well under the 20 MB limit
b) zip them before committing and unzip them when a new build starts; this could produce very large objects, making our repositories grow quickly and requiring regular use of BFG Repo Cleaner
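Alternative a) could be sketched as below, assuming the lists are plain text with one entry per line. chunk_lines is a hypothetical name, and the tiny max_bytes value is chosen only to keep the example short.

```python
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group lines into chunks whose UTF-8 size stays at or under max_bytes.

    A single line larger than max_bytes is emitted alone in its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Close the current chunk before it would grow past the limit.
        if chunk and size + line_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield chunk


domains = ["example.com\n", "example.org\n", "example.net\n"]
for index, part in enumerate(chunk_lines(domains, max_bytes=24)):
    print(index, [entry.strip() for entry in part])
```

For a 20 MB target, max_bytes would be 20 * 1024 * 1024, and each chunk would be written to its own file.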
Additional context
Working commit using LFS > https://travis-ci.org/mitchellkrogza/Phishing.Database/builds/534100948
Installing the -dev version for obtaining your latest commit from ``
.PyFunceble.yaml
None, it's empty....
Steps to reproduce the behavior:
sudo -H python3 -m pip install --upgrade git+https://gitlab.com/funilrys/PyFunceble@dev
pyfunceble -m -p 4 -db --database-type mariadb -f
Try to merge upstream configuration file into /home/$user/.config/PyFunceble/.PyFunceble.yaml ? [y/n] y
Traceback (most recent call last):
File "/usr/local/bin/pyfunceble", line 8, in <module>
sys.exit(tool())
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/cli/__init__.py", line 1041, in tool
raise exception
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/cli/__init__.py", line 1024, in tool
PyFunceble.cconfig.Merge(PyFunceble.CONFIG_DIRECTORY)
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/merge.py", line 108, in __init__
self._load()
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/merge.py", line 188, in _load
self._save()
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/merge.py", line 143, in _save
PyFunceble.helpers.Dict(self.new_config).to_yaml_file(self.path_to_config)
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/helpers/dict.py", line 345, in to_yaml_file
sort_keys=sort_keys,
File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 200, in dump
return dump_all([data], stream, Dumper=Dumper, **kwds)
TypeError: dump_all() got an unexpected keyword argument 'sort_keys'
ll /home/$USER/.config/PyFunceble/.PyFunceble.yaml
-rw-rw-r-- 1 $USER $USER 0 Nov 17 23:03 /home/$USER/.config/PyFunceble/.PyFunceble.yaml
Leaving this to your imagination
We now support IPv6!
A configuration key is missing.
Try to merge upstream configuration file into /home/$USER/.config/PyFunceble/.PyFunceble.yaml ? [y/n] n
Traceback (most recent call last):
File "/usr/local/bin/pyfunceble", line 8, in <module>
sys.exit(tool())
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/cli/__init__.py", line 1041, in tool
raise exception
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/cli/__init__.py", line 1024, in tool
PyFunceble.cconfig.Merge(PyFunceble.CONFIG_DIRECTORY)
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/merge.py", line 108, in __init__
self._load()
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/merge.py", line 202, in _load
raise PyFunceble.exceptions.ConfigurationFileNotFound()
PyFunceble.exceptions.ConfigurationFileNotFound
OS: for example Arch Linux (5.0.5-arch1-1-ARCH)
Python Version: Python 2.7.16, 3.7.3, 3.7.4
PyFunceble Version: for example 1.2.0
pyfunceble -v
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/cli/__init__.py", line 95, in tool
PyFunceble.load_config(generate_directory_structure=False)
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/__init__.py", line 682, in load_config
cconfig.Load(CONFIG_DIRECTORY, custom)
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/load.py", line 90, in __init__
self.__load_it()
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/load.py", line 101, in __load_it
self._load_config_file()
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/load.py", line 300, in _load_config_file
self._install_iana_config()
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/config/load.py", line 409, in _install_iana_config
iana_link = self.data["links"]["iana"]
TypeError: 'NoneType' object is not subscriptable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/pyfunceble", line 8, in <module>
sys.exit(tool())
File "/usr/local/lib/python3.7/dist-packages/PyFunceble/cli/__init__.py", line 1039, in tool
PyFunceble.LOGGER.exception()
AttributeError: 'NoneType' object has no attribute 'exception'
I was wondering if PyFunceble could be packaged as a snap package or flatpak. I'm not a python update expert and it looks like the documentation shows 3 ways to install and 3 ways to update depending on the python, pip or github way you do it. I just got a message that I needed to update my PyFunceble to a new version and I don't remember how I installed it in the 1st place :/
I hate to propose yet another way to implement it but if it was an isolated installation it could be packaged with its dependencies, its environment would be self-contained- regardless of Linux flavor, and updating would be as simple as "sudo snap refresh" which finds updates for any snaps that need updating.
Just a thought :)
Would be great to see a feature added to achieve the following
test a full URL (or an input file of URLs), not just the domains
test only for http status codes 200 OK, 404 NOT FOUND, 410 GONE and 403 FORBIDDEN (no whois checks so will be much faster)
produce simple output files of URLs ACTIVE, URLS INACTIVE
Describe the bug
When testing for domain syntax a valid case is rejected with "INVALID".
Modifications under .PyFunceble.yaml
None
To Reproduce
Steps to reproduce the behavior:
$ pyfunceble -d google.de. -s
google.de. INVALID
$ curl google.de.
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.de/">here</A>.
Expected behavior
Domains ending with a dot are verified valid and tested if they are available
Versions (please complete the following information):
Additional context
I guess this is nitpicking over what is and is not valid domain syntax. Also, the syntax in question might be quite uncommon.
But if syntax validation is offered, the check should return reliable results.
Additional Information http://www.dns-sd.org./TrailingDotsInDomainNames.html
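To make the expected behavior concrete, here is a minimal syntax-only sketch that accepts one trailing dot. It is an assumption-laden illustration, not PyFunceble's checker: is_valid_domain_syntax is a hypothetical name, and the rules are deliberately simplified (ASCII labels only, at least two labels).

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_domain_syntax(name: str) -> bool:
    """Check domain syntax only (no DNS lookup), accepting one trailing dot."""
    if name.endswith("."):
        name = name[:-1]  # "google.de." is the fully qualified form of "google.de"
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2:
        return False  # this sketch requires at least "label.tld"
    return all(LABEL.match(label) for label in labels)


for candidate in ("google.de", "google.de.", "google.de..", "-bad.example"):
    print(candidate, is_valid_domain_syntax(candidate))
```

Only a single trailing dot is stripped, so a name ending in two dots still fails because it leaves an empty label.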
Tested on - pyfunceble 2.6.6.dev (Green Galago: Skitterbug) with Python 3.7.4 (Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)] on win32)
The -dns switch for using a custom DNS server does not work on Windows 7; PyFunceble still uses the OS settings for DNS.
I also tried setting it in .PyFunceble.yaml, but it still goes to the OS-defined DNS server.
Is this a PyFunceble or Python issue?
Is your feature request related to a problem? Please describe.
URL testing marks links that have a port number after the domain name (such as :80, :81 or :8080) as INVALID and never tests them.
https://github.com/mitchellkrogza/Phishing.Database/tree/master/phishing-links/output/domains/INVALID/list
Describe the solution you'd like
Test all links that include a random port number
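As a rough sketch of how a port could be separated from the host before testing, the Python standard library already parses it. split_host_and_port is a hypothetical helper and does not describe how PyFunceble handles URLs internally.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_host_and_port(url: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (hostname, port) for a URL.

    The port is None when the URL does not specify one. A URL whose port
    is not a number in the range 0-65535 yields (None, None).
    """
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError for an invalid port
    except ValueError:
        return None, None
    return parts.hostname, port


for link in ("http://example.com:8080/login", "http://example.com/login"):
    print(link, split_host_and_port(link))
```

With the host and port separated, the host can go through the usual domain checks while the port is kept for the HTTP request.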
Describe the bug
AppKit for macOS is not working, and may not even be needed.
As far as I can see, the AppKit Python module is used to figure out what the config directory on macOS should be. I do not see why this needs to be any different from what Linux does, so I changed line 108 to
if system().lower() == "linux" or system().lower() == "darwin":
Works flawlessly
Modifications under .PyFunceble.yaml
nothing
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Working script telling me google.de is valid domain
Screenshots
srsly? Screenshots of a terminal?
Traceback (most recent call last):
File "./pyfunceble", line 11, in <module>
load_entry_point('PyFunceble==1.0.0', 'console_scripts', 'pyfunceble')()
File "/Users/sebastian/.pyenv/versions/funceble/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Users/sebastian/.pyenv/versions/funceble/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
return ep.load()
File "/Users/sebastian/.pyenv/versions/funceble/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
return self.resolve()
File "/Users/sebastian/.pyenv/versions/funceble/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/sebastian/.pyenv/versions/funceble/lib/python3.6/site-packages/PyFunceble-1.0.0-py3.6.egg/PyFunceble/__init__.py", line 141, in <module>
from AppKit import ( # pylint: disable=import-error
ModuleNotFoundError: No module named 'AppKit'
I've seen this done in a pfSense add-on called pfBlockerNG. It will consume a bunch of DNSBL host files and also convert any International Domain Names into the appropriate punycode.
https://en.wikipedia.org/wiki/Punycode
https://thehackernews.com/2017/04/unicode-Punycode-phishing-attack.html
So instead of outputting https://www.аррӏе.com
you'd get https://www.xn--80ak6aa92e.com/
since it's actually using Cyrillic to obfuscate www.apple.com.
You can copy & paste that first apple url into your address bar to see that it takes you to a proof-of-concept site.
Sample code to convert Unicode to ASCII is linked on the Wikipedia page and in the pfBlockerNG add-on.
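The conversion described above can be sketched with Python's built-in punycode codec. This is a simplified illustration: to_ascii_hostname is a hypothetical helper, and it skips the normalization steps that a full IDNA implementation performs.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert each non-ASCII label of a hostname to its xn-- form.

    Simplification: labels are only lowercased, not fully normalized
    as a complete IDNA implementation would do.
    """
    labels = []
    for label in hostname.lower().split("."):
        try:
            label.encode("ascii")
            labels.append(label)  # already ASCII, keep as-is
        except UnicodeEncodeError:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


# The Cyrillic look-alike of www.apple.com, written with escapes for clarity.
lookalike = "www.\u0430\u0440\u0440\u04cf\u0435.com"
print(to_ascii_hostname(lookalike))  # www.xn--80ak6aa92e.com
```

Printing the ASCII form makes the look-alike obvious in an output list, which is the point of the request.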
I'm trying to understand the 'public-suffix.json'.
From what I can guess from https://pyfunceble.readthedocs.io/en/latest/code/publicsuffix.html, it is a kind of whitelist. Is that correct?
if [ a == yes ]
then
How do I omit it?
else
if [ a == no ]
why are domains like blogspot.tld listed there (as those are some I really don't like to be whitelisted in my project...)
fi
fi
Is your feature request related to a problem? Please describe.
It would be useful to have the version number printed in the percentage.txt file; this would help diagnose big time differences between tests and between dev versions.
I'm running this app (and nothing else) on a virtual host with 4 GB of RAM, but after a while it has eaten all the memory and finally fails.
$ free -m
total used free shared buff/cache available
Mem: 3850 3014 326 29 509 596
Swap: 0 0 0
5 minutes later...
free -m
total used free shared buff/cache available
Mem: 3850 3097 242 29 510 513
Swap: 0 0 0
.PyFunceble.yaml
Ask @mitchellkrogza running on his Badd-Boyz-Hosts
Steps to reproduce the behavior:
Open source at https://github.com/spirillen/Dead-Domains
freeing memory as each query is processed and written to disk
OS: Debian 10, (Buster)
Python Version: Python 3.7.3
PyFunceble Version: pyfunceble 2.2.0. (Green Galago: Skitterbug)
None
Is your feature request related to a problem? Please describe.
It is not possible to use DoT.
Describe the solution you'd like
I would like to use DoT to protect my DNS requests.
Using the DNS of the host system with DoT often leads to problems because some applications simply don't work anymore.
This is why DoT should be implemented in PyFunceble.
I recommend providing a way to configure the DoT settings in the configuration file, so that you don't have to enter everything manually every time you need to use it.
https://developers.cloudflare.com/1.1.1.1/dns-over-tls/
Additional context
It should be possible to specify the URL and server for DoT.
Example:
#Configfile
DoTurl = Your_ID.dns.nextdns.io
Server 1= ip.ip.ip.ip
Server 2= ip.ip.ip.ip
Server 3= ipv6:ipv6:ipv6:ipv6
Server 4= ipv6:ipv6:ipv6:ipv6
user@computer:~/repos/PyFunceble$ . venv/bin/activate && pip3 install -e .
Obtaining file:///home/user/repos/PyFunceble
Collecting colorama>=0.3.9 (from PyFunceble-dev==0.133.1)
Downloading https://files.pythonhosted.org/packages/4f/a6/728666f39bfff1719fc94c481890b2106837da9318031f71a8424b662e12/colorama-0.4.1-py2.py3-none-any.whl
Collecting domain2idna>=1.6.1 (from PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/4e/27/b7336824583e26d3e33f7b6917c00e51b7c8a94bc1d4b78d6aa1eb9c7e8b/domain2idna-1.6.1-py3-none-any.whl
Collecting PyYAML>=3.13 (from PyFunceble-dev==0.133.1)
Collecting requests>=2.19.1 (from PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/ff/17/5cbb026005115301a8fb2f9b0e3e8d32313142fe8b617070e7baad20554f/requests-2.20.1-py2.py3-none-any.whl
Collecting setuptools>=40.4.3 (from PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/e7/16/da8cb8046149d50940c6110310983abb359bbb8cbc3539e6bef95c29428a/setuptools-40.6.2-py2.py3-none-any.whl
Collecting urllib3>=1.23 (from PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests>=2.19.1->PyFunceble-dev==0.133.1)
Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)
100% |████████████████████████████████| 163kB 2.9MB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.19.1->PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.8,>=2.5 (from requests>=2.19.1->PyFunceble-dev==0.133.1)
Using cached https://files.pythonhosted.org/packages/4b/2a/0276479a4b3caeb8a8c1af2f8e4355746a97fab05a372e4a2c6a6b876165/idna-2.7-py2.py3-none-any.whl
Installing collected packages: colorama, setuptools, domain2idna, PyYAML, certifi, chardet, idna, urllib3, requests, PyFunceble-dev
Found existing installation: PyFunceble-dev 0.127.5
Exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 360, in run
prefix=options.prefix_path,
File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 778, in install
requirement.uninstall(auto_confirm=True)
File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 734, in uninstall
FakeFile(dist.get_metadata_lines('entry_points.txt'))
File "/usr/lib/python3.6/configparser.py", line 763, in readfp
self.read_file(fp, source=filename)
File "/usr/lib/python3.6/configparser.py", line 718, in read_file
self._read(f, source)
File "/usr/lib/python3.6/configparser.py", line 1092, in _read
fpname, lineno)
configparser.DuplicateOptionError: While reading from '<???>' [line 3]: option 'pyfunceble' in section 'console_scripts' already exists
I'd also like to inform you that your script wrongly detects some domains. For example,
pogotowie-komputerowe-warszawa.com.pl
is still registered. I assume that's probably because it doesn't correctly extract the expiration date for such domains.
PyFunceble 2.2.0 does not seem to me to parse:
# in hiding rules, e.g. example.org##.example
||example.org$document
$all (as seen in https://github.com/uBlockOrigin/uAssets/blob/master/filters/badware.txt#L334)
.PyFunceble.yaml
None that I'm aware of.
Steps to reproduce the behavior:
PyFunceble -ad -f https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt
See that ~180 domains get processed by PyFunceble.
OS: Windows 10 May 2019 Update
Python Version: 3.6.8, according to Cygwin
PyFunceble Version: 2.2.0
||example.org$document
is a uBlock Origin-created syntax that remains unsupported in ABP-likes or in AdGuard, and it's considered a very important syntax to use in uBO-specific lists, especially anti-malware ones.
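To illustrate the kind of parsing being asked for, here is a minimal sketch that pulls the domain out of the two rule shapes mentioned above. domain_from_rule and both regular expressions are assumptions made for this example; they cover only simple single-domain rules and are not PyFunceble's adblock decoder.

```python
import re
from typing import Optional

# Network rule such as "||example.org^" or "||example.org$document".
NETWORK_RULE = re.compile(r"^\|\|(?P<domain>[a-z0-9.-]+)(?:[\^/$].*)?$", re.IGNORECASE)
# Cosmetic (hiding) rule such as "example.org##.example".
COSMETIC_RULE = re.compile(r"^(?P<domain>[a-z0-9.-]+)#@?#.+$", re.IGNORECASE)


def domain_from_rule(rule: str) -> Optional[str]:
    """Return the domain a filter rule applies to, or None if not recognized."""
    rule = rule.strip()
    if not rule or rule.startswith("!"):
        return None  # blank line or comment
    for pattern in (NETWORK_RULE, COSMETIC_RULE):
        match = pattern.match(rule)
        if match:
            return match.group("domain").lower()
    return None


for rule in ("||example.org$document", "||example.org^$all", "example.org##.example"):
    print(rule, "->", domain_from_rule(rule))
```

Generic cosmetic rules without a domain, and rules listing several comma-separated domains, return None in this sketch.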
Describe the bug
Ultimate-Hosts-Blacklist/dev-center#16
*.github.io
should be added to the list of SPECIAL rules.
Indeed, if it returns 404, we can be sure that it does not exist anymore. Some extra checks of the HTML like what we do with .blogspot. may not be necessary.
Indeed, as "sharing is caring", we should share the Internal PyFunceble.Lookup class.
When I try to use --database-type mysql
on Travis, I get the following error
.PyFunceble.yaml
cat dev-tools/.pyfunceble-env
PYFUNCEBLE_DB_CHARSET=utf8mb4
PYFUNCEBLE_DB_HOST=localhost
PYFUNCEBLE_DB_NAME=PyFunceble
PYFUNCEBLE_DB_PASSWORD=
PYFUNCEBLE_DB_PORT=3306
PYFUNCEBLE_DB_USERNAME=root
These should be the correct settings to use according to the Travis docs
Test string:
PyFunceble --travis -h -db -m -p 4 -ex --dns 127.0.0.1 --cmd-before-end \
"bash ${TRAVIS_BUILD_DIR}/dev-tools/FinalCommit.sh" --plain --autosave-minutes 20 \
--commit-autosave-message "V1.${yeartag}.${monthtag}.${TRAVIS_BUILD_NUMBER} [Auto Saved]" \
--commit-results-message "V1.${yeartag}.${monthtag}.${TRAVIS_BUILD_NUMBER}" \
-f ${testfile}
Tested on the dev version by install command
install:
- pip3 install --upgrade pip
- pip3 install PyFunceble-dev
PyFunceble Version: Dev
If you use PyFunceble with -uf 'url',
all records are marked as invalid, but if you download the same file and then test it with -f,
things run as expected
.PyFunceble.yaml
nothing
Steps to reproduce the behavior:
pyfunceble -m -p 8 -db --database-type mariadb -uf 'https://gitlab.com/my-privacy-dns/external-sources/antipopads/raw/master/hosts'
Subject Status HTTP Code
---------------------------------------------------------------------------------------------------- ----------- ----------
jyahmckzsbh.com INVALID ***
nuowoczmvits.com INVALID ***
dsdiztki.bid INVALID ***
lwtsrwwlfd.com INVALID ***
wftduglf.com INVALID ***
wget 'https://gitlab.com/my-privacy-dns/external-sources/antipopads/raw/master/hosts'
pyfunceble -m -p 8 -db --database-type mariadb -f hosts
Subject Status HTTP Code
---------------------------------------------------------------------------------------------------- ----------- ----------
xrkfqpbubaq.com ACTIVE ***
htabtzmi.bid INACTIVE ***
Tests running equally
OS: Ubuntu Bionic
Python Version: 3.7
PyFunceble Version: pyfunceble 2.2.0. (Green Galago: Skitterbug)
sudo -H python3 -m pip install --upgrade PyFunceble
prior to the test...
with sudo -H python3 -m pip install --upgrade PyFunceble-dev
the result is:
pyfunceble -m -p 8 -db --database-type mariadb -uf 'https://gitlab.com/my-privacy-dns/external-sources/antipopads/raw/master/hosts'
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/cli/__init__.py", line 95, in tool
PyFunceble.load_config(generate_directory_structure=False)
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/__init__.py", line 682, in load_config
cconfig.Load(CONFIG_DIRECTORY, custom)
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/load.py", line 90, in __init__
self.__load_it()
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/load.py", line 101, in __load_it
self._load_config_file()
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/load.py", line 300, in _load_config_file
self._install_iana_config()
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/config/load.py", line 409, in _install_iana_config
iana_link = self.data["links"]["iana"]
TypeError: 'NoneType' object is not subscriptable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/pyfunceble", line 8, in <module>
sys.exit(tool())
File "/usr/local/lib/python3.6/dist-packages/PyFunceble/cli/__init__.py", line 1039, in tool
PyFunceble.LOGGER.exception()
AttributeError: 'NoneType' object has no attribute 'exception'
As it is now possible (PyFunceble-dev) to do the following
$ PyFunceble -d github.com twitter.com
It should be also possible to use
$ PyFunceble -d github.com twitter.com --complements
Describe the solution you'd like
Generate an additional list with WWW subdomains.
Some lists have subdomains such as WWW.EXAMPLE.COM.
Many others simply lack these WWW subdomains, so I suggest generating, for each repository, an additional WWW list alongside the ACTIVE, INVALID, etc. lists.
This is very helpful for many people.
Additional context
The folder structure should then look like this.
ACTIVE
INACTIVE
INVALID
VALID
WWW
.keep
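A rough sketch of how such a WWW list could be derived from an existing list follows. Both function names are hypothetical, and treating a complement as "add or remove the www. prefix" is an assumption about what --complements is meant to do.

```python
from typing import Iterable, List


def complement(domain: str) -> str:
    """Return the 'www' counterpart of a domain, in either direction."""
    domain = domain.strip().lower()
    if domain.startswith("www."):
        return domain[len("www."):]
    return "www." + domain


def missing_complements(domains: Iterable[str]) -> List[str]:
    """List the complements that are not already present in the input."""
    known = {entry.strip().lower() for entry in domains}
    return sorted(complement(entry) for entry in known if complement(entry) not in known)


print(missing_complements(["github.com", "www.github.com", "twitter.com"]))
```

The result could then be written to the proposed WWW folder next to ACTIVE, INACTIVE, INVALID and VALID.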
Thanks to @dnmTX (Ultimate-Hosts-Blacklist/dev-center#18 and all @Ultimate-Hosts-Blacklist issues he reported),
I was able to find out that in some rare cases the autocontinue subsystem may not work properly.
A patch to fix it is highly required.
Is your feature request related to a problem? Please describe.
It takes far too long to test all URLs of a list (Slow Internet).
It would therefore be very good if you could check a list of URLs only for incorrect URLs.
That is, no DNS query or similar: only a check that all URLs are well formed and contain no errors, e.g. a dot at the end of a URL.
Describe the solution you'd like
When I check a list of domains I only want to check if all URLs are correct.
And if there are invalid URLs these should be output in a file.
Describe alternatives you've considered
The alternative is to check everything manually, but it takes too much time.
Additional context
Example domains for testing:
0.0.0.0 194.58.122.146\032stratum.aikapool.com
0.0.0.0 POOL.moneropool.com
0.0.0.0 aikapool.com.
0.0.0.0 VPWCHCDEVWEB001.cryptopia.co.nz
0.0.0.0 aikapool.comwww.aikapool.com
All these URLs should be output to an extra file with a switch such as --invalidcheck.
Such a feature is very useful because it allows you to create accurate updates and save a lot of time.
The program freezes every 2 or 3 hours.
Data Sheet:
Hardware / Software: Intel Xeon 2 core, 16 GB RAM, HDD 2 TB, Ubuntu 18.04.2 LTS x64
python -V: 2.7.15+
pip3 --version: pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
pip --version: pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
run: PyFunceble -m -p 150 -f list
Installation method: pip3 install -r requirements.txt && pip3 install PyFunceble
File to be processed: Unix/Linux List format (UTF-8 .txt) with 5 M of lines of domains
Measures taken: CTRL + C and run it again
The -p parameter has been changed (-p 200, -p 150, -p 100, -p 50 and -p has been removed), and the result is the same. I have tested on other computers with superior hardware and the result is the same
PS: What is the recommended (error-proof) installation method?
Problem with uninstall with this method:
#38 (comment)
As reported by @dnmTX at Ultimate-Hosts-Blacklist/dev-center#9:
everything with
##[href^=
....
is ignored.
version 1.7.0
Among these, the main a4.tl domain is shown as inactive, but it should work:
a4.tl INACTIVE ***
apptrk.a4.tl ACTIVE 302
els.a4.tl ACTIVE 403
jrs.a4.tl ACTIVE ***
ldap.a4.tl ACTIVE ***
preroll.a4.tl ACTIVE 403
sdk.a4.tl ACTIVE 403
https://www.instra.com/en/whois/whois-result/a4_tl
Another 2:
adform.net
adformdsp.net
https://reports.internic.net/cgi/whois?whois_nic=adform.net&type=domain
https://reports.internic.net/cgi/whois?whois_nic=adformdsp.net&type=domain
more
adkmob.com INACTIVE ***
bp.adkmob.com ACTIVE 403
ssdk.adkmob.com ACTIVE 403
adleads.com INACTIVE ***
https://reports.internic.net/cgi/whois?whois_nic=adleads.com&type=domain
0.0.0.0 a4.tl
0.0.0.0 adformdsp.net
0.0.0.0 adform.net
0.0.0.0 adkmob.com
0.0.0.0 adleads.com
0.0.0.0 admoda.com
0.0.0.0 adsmogo.mobi
0.0.0.0 adsmogo.net
0.0.0.0 adywind.com
0.0.0.0 adzerk.net
0.0.0.0 alexajstrack.com
0.0.0.0 applifier.info
0.0.0.0 appnexus.net
0.0.0.0 apxadtracking.net
0.0.0.0 atti.com
0.0.0.0 avazunativeads.com
0.0.0.0 cpro.baidu.cn
0.0.0.0 bayctrk.com
0.0.0.0 billymobile.com
0.0.0.0 cb-cdn.com
0.0.0.0 cedexis-radar.net
0.0.0.0 chartboosts.com
0.0.0.0 clickkydsp.com
0.0.0.0 cnbc7.com
Is your feature request related to a problem? Please describe.
Would be a nice feature to have the execution time recorded into the percentage.txt file.
Describe the solution you'd like
Just print the final execution time from the -ex
parameter into the percentage.txt file, as shown on the console in the screengrab below.
Describe alternatives you've considered
Broken builds suddenly - https://travis-ci.org/mitchellkrogza/Phishing-URL-Testing-Database-of-Link-Statuses/jobs/532790699#L365-L366
Full traceback:
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.7.1/bin/PyFunceble", line 10, in <module>
sys.exit(_command_line())
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/__init__.py", line 1460, in _command_line
link_to_test=ARGS.link,
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/dispatcher.py", line 122, in __init__
FileCore(url_file_path, "url").read_and_test_file_content()
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/file_core.py", line 613, in read_and_test_file_content
self._test_line(line)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/file_core.py", line 517, in _test_line
status = self.__process_test(subject)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/file_core.py", line 367, in __process_test
return self.url(subject)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/file_core.py", line 256, in url
subject, subject_type="file_url", filename=self.file
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/status.py", line 700, in __init__
"http_status_code": HTTPCode(self.subject, "url").get(),
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/http_code.py", line 188, in get
http_code = self._access()
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/PyFunceble/http_code.py", line 146, in _access
verify=PyFunceble.CONFIGURATION["verify_ssl_certificate"],
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/api.py", line 101, in head
return request('head', url, **kwargs)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 519, in request
prep = self.prepare_request(req)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/models.py", line 313, in prepare
self.prepare_url(url, params)
File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/requests/models.py", line 387, in prepare_url
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '=======': No schema supplied. Perhaps you meant http://=======?
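The traceback above ends because a line such as ======= reached requests without a scheme. A minimal pre-check, sketched here with the standard library, could reject such lines before any HTTP request is made. looks_like_http_url is a hypothetical helper, not PyFunceble's URL validation.

```python
from urllib.parse import urlsplit


def looks_like_http_url(line: str) -> bool:
    """Return True only for lines with an http(s) scheme and a host part."""
    try:
        parts = urlsplit(line.strip())
    except ValueError:
        return False  # e.g. an unbalanced "[" in the host part
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for line in ("https://example.com/login", "=======", "example.com"):
    print(repr(line), looks_like_http_url(line))
```

Lines that fail the check could be reported as INVALID instead of raising an exception and breaking the build.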
When checking the URLs in a list, .onion addresses are always recognized as invalid.
I expect that .onion addresses will simply be skipped.
Example of a .onion address from my list.
# File generated by PyFunceble (v2.17.9.dev) / https://github.com/funilrys/PyFunceble
# Date of generation: Sun 03 Nov 23:11:59 CET 2019
xrqwig7erykgll4z.onion
Describe the bug
I am unable to use PyFunceble, because trying to install or update PyFunceble, through the "pure Python" methods that are described in the installation and update guides, isn't going all that well. This is because it only installs 1.0.0 instead of 1.8.0, and then refuses to run due to a message that tells me to update PyFunceble, which I can't.
Modifications under .PyFunceble.yaml
No changes that I know about.
To Reproduce
Steps to reproduce the behavior:
git checkout master && git fetch origin && git merge origin/master
pyfunceble -v
pyfunceble 1.0.0. (Blue Bontebok)
Additional context
I am unable to test with either of the pip3 methods, because of something of some sort:
I have absolutely no idea how to get the new GitHub compressed-archive download method to work either, with this incomprehensible error being shown:
This is my recommended way of running PyFunceble on just about any distro.
@funilrys guided me on this some time ago and I would actually never run PyFunceble now in any other way so kudos on this goes to him.
In fact, I don't run anything to do with Python now without it running inside a Conda virtual environment. Distributions like Ubuntu are especially troublesome with Python issues, which are easily solved by just running Python in Conda environments.
@funilrys feel free to add to or improve this in any way.
# -------------------------------
# Setup Conda Python Environments
# -------------------------------
# 1. Add Conda Path to .bashrc (add line below to bottom of bashrc)
export PATH="${HOME}/miniconda/bin:${PATH}"
# 2. Reload your bashrc
source ~/.bashrc
# 3. Download Conda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# 4. Install Conda
bash miniconda.sh -b -p ${HOME}/miniconda
# 5. Setup Conda
hash -r
conda config --set always_yes yes --set changeps1 no
# 6. Update Conda
conda update -q conda
# 7. Create an Environment (EXAMPLE: creating an environment called pyfuncebletesting with Python version 3.7.3)
conda create -q -n pyfuncebletesting python="3.7.3"
# 8. Activate this environment you just created
source activate pyfuncebletesting
# 9. Query Python and Pip versions inside this environment
python -VV
pip --version
# 10. Install PyFunceble in this environment (pyfuncebletesting)
pip install PyFunceble
# 11. Create the directory where you are going to run PyFunceble and output results
mkdir /home/myuser/pyfuncebletesting
# 12. Export the Working Path to PyFunceble before running PyFunceble
export PYFUNCEBLE_CONFIG_DIR=/home/myuser/pyfuncebletesting/
# 13. Run PyFunceble Testing
PyFunceble -m -p 50 -ex --plain --idna -f mylist.txt
# 14. When finished - Deactivate the environment
source deactivate pyfuncebletesting
# Your results? Exactly where they should be: in the folder you created in step 11, inside the /output folder
# ----------------------------
# Run tests again another day?
# ----------------------------
# 1. First Update Conda
conda update -q conda
# 2. Activate your environment
source activate pyfuncebletesting
# 3. Upgrade your environment
pip install --upgrade pip
pip install PyFunceble --upgrade
# 4. Export the Path to PyFunceble before running PyFunceble
export PYFUNCEBLE_CONFIG_DIR=/home/myuser/pyfuncebletesting/
# 5. Run PyFunceble
PyFunceble -m -p 50 -ex --plain --idna -f mylist.txt
# 6. When finished - Deactivate the environment
source deactivate pyfuncebletesting
Is your feature request related to a problem? Please describe.
--quiet of course gives no output (as expected), but it causes Travis-CI to fail because no output is received.
Describe the solution you'd like
A one line feedback message to Travis-CI every 30-60 seconds "PyFunceble - Testing in Progress"
Describe alternatives you've considered
Additional context
With multiprocess some builds can create very big logs so we eventually want PyFunceble to be quiet and not create build logs that are too big for Travis-CI
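The request above could be sketched roughly as follows. This is only an illustration, not how PyFunceble implements anything: `run_with_heartbeat`, `work` and `emit` are hypothetical names, and the 30 second default is simply taken from the suggested 30-60 second range.

```python
import threading
import time


def run_with_heartbeat(work, interval=30.0, emit=print,
                       message="PyFunceble - Testing in Progress"):
    """Run work() while emitting a one-line message every `interval` seconds.

    All names here are hypothetical; this is not PyFunceble's API.
    """
    stop = threading.Event()

    def beat():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            emit(message)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        # Always stop the heartbeat, even if work() raises.
        stop.set()
        thread.join()
```

With something like this, the test itself can stay quiet while the CI still receives a line of output at a regular interval.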
I am always thinking about how we can finally get this script accepted as THE go-to domain list curation script. Some of my ideas may seem completely impractical, and I realize that, but I just thought I'd dump my current thoughts as they are. Maybe they're useful or maybe not.
I have talked about using proxies and VPNs and such in the past to get around network vector and general connectivity issues causing erroneous results. However, independent of the inevitable technical failures and other mysteries that befall network-connected devices, inclusion of WHOIS data could also be factored in as a more stable resource to check if a domain is still a baddie or not. Obviously saving WHOIS data for every domain would take a long time, so maybe just a column with a hash of the WHOIS data for each domain. And when the hash changes, the domain should be marked to be manually rechecked. Some events that cause WHOIS data to change, such as transferring ownership, could signal a domain changing its ways and no longer following the Dark Side, so it should be manually checked for reconsideration of being on the list because its category could possibly change. This would also be independent of network connectivity issues and possible random outages or whatnot.
Other events, such as a simple domain renewal and the updating of years and contact information for an owner that is retaining ownership, shouldn't happen frequently enough to make this extra check too annoying, maybe once a year to once every few years per domain. And a quick manual check to verify the current state of the domain shouldn't take too much of the curator's time. To be more thorough, the WHOIS data could possibly be stored either to a file structure of some type or to a SQL/SQLite database to speed things up and only be updated when the hash changes to signify a change. Obviously then the first time the script is run it would take a long time to dump all the WHOIS data, but subsequent runs would be much faster as only WHOIS data that has changed would be updated. If the WHOIS data itself is saved, the curator could then just quickly compare the data and see if it's actually an ownership transfer or just a number here or there changing.
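A minimal sketch of the hash idea, assuming SQLite as the store. The table name `whois_hash` and both function names are made up for illustration, and fetching the WHOIS record itself is left out. One caveat: some WHOIS servers append a "last update" timestamp to every answer, so a real version would probably need to strip such lines before hashing.

```python
import hashlib
import sqlite3


def whois_fingerprint(whois_text):
    """SHA-256 of the WHOIS record, ignoring blank lines and outer whitespace."""
    lines = [line.strip() for line in whois_text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def record_and_check(conn, domain, whois_text):
    """Store the current fingerprint; return True if it differs from the stored one.

    A domain seen for the first time is not flagged. The schema is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS whois_hash "
        "(domain TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    digest = whois_fingerprint(whois_text)
    row = conn.execute(
        "SELECT digest FROM whois_hash WHERE domain = ?", (domain,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO whois_hash (domain, digest) VALUES (?, ?)",
        (domain, digest),
    )
    conn.commit()
    return row is not None and row[0] != digest
```

A domain for which `record_and_check` returns True would then be put on the "recheck manually" pile.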
I have also been thinking about the inclusion of possibly scraping key values off of websites to check if a website has changed enough to warrant manual intervention, such as a header or footer changing brand/company names or actually looking for particular malware scripts included in the page. These would be stored as a simple search string of what to look for and what the value should be, so a basic key-value pair database. If the value scraped doesn't match the value stored in the database, then the domain could then be marked for manually checking. Scraping, however, would still be prone to connectivity and network vector issues.
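The key-value check could look something like this. It is only a sketch: `changed_markers` is a hypothetical name, the page text is assumed to be already downloaded, and each stored entry is assumed to be a regular expression with one capture group plus the value it should yield.

```python
import re


def changed_markers(page_text, expected):
    """Return the names of markers whose scraped value differs from the stored one.

    `expected` maps a marker name to (regex with one capture group, expected value).
    A marker that is not found at all also counts as changed.
    """
    changed = []
    for name, (pattern, value) in expected.items():
        match = re.search(pattern, page_text)
        scraped = match.group(1).strip() if match else None
        if scraped != value:
            changed.append(name)
    return changed
```

Any domain for which the returned list is not empty would be marked for manual checking.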
Is your feature request related to a problem? Please describe.
Since my use of PyFunceble (at the time of writing) relies on me manually editing out domains from my lists that PyFunceble has declared to be inactive, I wonder if there's a way to make the terminal window only display the red/inactive domains that it has parsed?
That way I won't have to scroll quite as much through walls of green/active domains as I've had to do so far, and that I won't have to stretch the terminal window to 1400~1500px height in order to spot all the inactive domains.
Describe the solution you'd like
A way to only have the invalid domains that PyFunceble has parsed show up in the terminal window, or some quick advice on how to turn on (or use a modifier to achieve) such a function if it already exists.
Describe alternatives you've considered
If the above is not possible to implement, then I suppose it could be possible to add an adblock subfolder to the output folder tree or something like that, or perhaps I'd look into PyFunceble\output\splited\invalid much more often, but it would feel like a detour in my eyes since it wouldn't be in PyFunceble's de facto UI itself.
Additional context
None that I'm aware of.
Found a few misses in the INVALID/list after testing https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/domains%20to%20check.txt
cat output/domains/INVALID/list
# File generated by PyFunceble (v2.45.0.dev) / https://github.com/funilrys/PyFunceble
# Date of generation: 2019-11-18T01:14:07.846084
a
added
and
checked
collectionofbestporn
command,
curl
did
double
file
hosts
list
needed.
not
of
properly
respond
the
which
will
www.adultcashtraffic.com
www.blog.gfrevenge.com
www.hqporner.comstudiowow-girls
www.largehdtube.comen
www.media.the-adult-company.com
www.pornblade.comcategoryanal-porn
www.porndoepremium.comcategories
www.sexdating
www.spankbang
yallainternethotnig
The errors I see are:
www.adultcashtraffic.com
www.blog.gfrevenge.com
www.media.the-adult-company.com
gfrevenge.com
Domain Name: GFREVENGE.COM
Registry Domain ID: 1312074137_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.eurodns.com
Registrar URL: http://www.EuroDNS.com
Updated Date: 2019-10-27T04:31:37Z
Creation Date: 2007-11-02T00:31:15Z
Registry Expiry Date: 2020-11-02T00:31:15Z
Registrar: EuroDNS S.A.
Registrar IANA ID: 1052
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +352.27220150
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Name Server: DNS1.P03.NSONE.NET
Name Server: DNS2.P03.NSONE.NET
Name Server: DNS3.P03.NSONE.NET
Name Server: DNS4.P03.NSONE.NET
Name Server: SDNS3.ULTRADNS.BIZ
Name Server: SDNS3.ULTRADNS.COM
Name Server: SDNS3.ULTRADNS.NET
Name Server: SDNS3.ULTRADNS.ORG
DNSSEC: unsigned
whois adultcashtraffic.com
Expired
whois the-adult-company.com
Domain Name: THE-ADULT-COMPANY.COM
Registry Domain ID: 1370101655_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.safenames.net
Registrar URL: http://www.safenames.net
Updated Date: 2019-08-04T05:05:18Z
Creation Date: 2008-01-03T13:29:55Z
Registry Expiry Date: 2021-01-03T13:29:55Z
Registrar: SafeNames Ltd.
Registrar IANA ID: 447
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +44.1908200022
Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Name Server: NS1.XMODELS-LIVE.CH
Name Server: NS2.XMODELS-LIVE.CH
Name Server: NS3.XMODELS-LIVE.CH
DNSSEC: unsigned
This means two out of three should have been added to the ACTIVE/list.
pyfunceble --plain -h -m -p 4 -db --database-type mariadb -f 'https://raw.githubusercontent.com/Clefspeare13/pornhosts/master/domains%20to%20check.txt'
Only invalid entries, or entries with invalid TLDs, are added to INVALID/list.
OS: Disco
Python Version: 3.7.3
PyFunceble Version: pyfunceble -v pyfunceble 2.45.0.dev (Green Galago: Skitterbug)
Have you seen issue at gitlab?
Hi Sir
If you are interested, I will donate a logo for your project. However, before I start, I need to ask your permission first ;). If I get permission, I need details of the logo, like what you want.
The app currently sorts the domains in alphabetical order but I was wondering if it would be more readable in a hierarchical order.
Example: a hosts file sorted in hierarchical domain order.
So if I have a full domain and its subdomains as AAAA.BBBB.CCCC.DDDD.TLD, instead of sorting it purely alphabetically I'd sort by DDDD, then CCCC, BBBB, AAAA, and finally the TLD. This groups all of the subdomains together in the list.
Instead of all of Google's entries being spread all over a hosts file, they'd get clumped together based on the google.com domain.
0.0.0.0 adservice.google.com
0.0.0.0 googleadapis.l.google.com
0.0.0.0 s0-2mdn-net.l.google.com
0.0.0.0 ssl-google-analytics.l.google.com
0.0.0.0 www-google-analytics.l.google.com
0.0.0.0 pagead2.googleadservices.com
0.0.0.0 partner.googleadservices.com
0.0.0.0 www.googleadservices.com
0.0.0.0 googleadservices.com
0.0.0.0 ssl.google-analytics.com
0.0.0.0 www.google-analytics.com
0.0.0.0 google-analytics.com
0.0.0.0 chart.googleapis.com
0.0.0.0 ad-creatives-public.commondatastorage.googleapis.com
0.0.0.0 imasdk.googleapis.com
0.0.0.0 ade.googlesyndication.com
0.0.0.0 pagead2.googlesyndication.com
0.0.0.0 tpc.googlesyndication.com
0.0.0.0 www.googletagmanager.com
0.0.0.0 www.googletagservices.com
0.0.0.0 redirector.googlevideo.com
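The ordering described above can be sketched as a sort key. This is my reading of the proposal, not the app's current behaviour: `hierarchical_key` and `sort_hosts` are hypothetical names, every line is assumed to end with the domain, and the TLD is naively taken to be the last label (so something like co.uk is not handled). Unlike the sample above, a bare domain sorts before its own subdomains here.

```python
def hierarchical_key(domain):
    """Sort key: labels right to left (DDDD, CCCC, BBBB, AAAA), then the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    # Reverse everything except the TLD so the registered name comes first.
    return (tuple(reversed(labels[:-1])), tld)


def sort_hosts(lines):
    """Sort hosts-file lines ("0.0.0.0 name") by the domain in the last field."""
    return sorted(lines, key=lambda line: hierarchical_key(line.split()[-1]))


if __name__ == "__main__":
    sample = [
        "0.0.0.0 www.google-analytics.com",
        "0.0.0.0 adservice.google.com",
        "0.0.0.0 google-analytics.com",
        "0.0.0.0 pagead2.googleadservices.com",
        "0.0.0.0 googleadapis.l.google.com",
    ]
    print("\n".join(sort_hosts(sample)))
```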
Indeed, currently (PyFunceble-dev) the WHOIS server for, say, example.com is resolved by the system-wide resolver and not the custom one.
@mitchellkrogza said:
I think we need some logic in PyFunceble that if a domain was inactive once and became active again we mark it as suspicious and keep it on active?
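A rough sketch of what that logic might look like, assuming the previous statuses of a domain are available as a list. The function name and the idea of returning a separate flag are my assumptions, not something PyFunceble provides.

```python
def classify_retest(previous_statuses, current_status):
    """Return (status, suspicious).

    The domain keeps its current status, but is flagged as suspicious when it
    is ACTIVE now and was INACTIVE at least once before.
    """
    was_inactive = "INACTIVE" in previous_statuses
    suspicious = current_status == "ACTIVE" and was_inactive
    return current_status, suspicious
```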
For reference:
Problems:
Possible bugs:
Hardware Test:
I have performed different tests in physical environments with Ubuntu 18.04.3 x64 and large lists (3M+ entries). These are the results:
PC1: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, RAM 32028 MiB
a. PyFunceble -m -p 200 -f file = system collapses
b. PyFunceble -m -p 150 -f file = freezes after a while running
c. PyFunceble -m -p 100 -f file = freezes after a while running
d. PyFunceble -m -p 50 -f file = test abort. Read 'CPU Usage'
e. PyFunceble -f file = Stable but slower than a bash
PC2: Intel(R) Xeon(TM) CPU ES-2603 v4 @ 1.70 GHz, RAM 15903 MiB
a. PyFunceble -m -p 200 -f file = system collapses
b. PyFunceble -m -p 150 -f file = freezes after a while running
c. PyFunceble -m -p 100 -f file = freezes after a while running
d. PyFunceble -m -p 50 -f file = freezes after a while running
e. PyFunceble -f file = Stable but slower than a bash
CPU usage:
In all tests the program reaches 100% CPU usage with large lists (3M+ entries).
Speed test: PyFunceble vs bash
Bash:
#!/bin/bash
while IFS= read -r LINE; do
curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE"
echo " $LINE"
done < source.txt
PyFunceble:
PyFunceble -f source.txt
Results after +1 hour:
PyFunceble: 1364 processed lines (in hosts/ACTIVE hosts/INACTIVE hosts/INVALID)
Bash: 2930 processed lines
Conclusion:
This application is only faster than a simple bash script when the "-m -p" flags are used, but then it becomes unstable and freezes or collapses the system. I suggest that it be improved in this regard so that it is usable. Regards
Installed latest stable and get this msg:
$ PyFunceble --version
PyFunceble 1.3.0. (Blue Bontebok: Dragonfly)
user@hp2570p:~/repos/PyFunceble-lists/blackjack$ PyFunceble -uf https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Hosts.txt
Traceback (most recent call last):
File "/home/user/.local/bin/PyFunceble", line 11, in <module>
sys.exit(_command_line())
File "/home/user/.local/lib/python3.6/site-packages/PyFunceble/__init__.py", line 1120, in _command_line
link_to_test=ARGS.link,
File "/home/user/.local/lib/python3.6/site-packages/PyFunceble/core.py", line 129, in __init__
self._entry_management()
File "/home/user/.local/lib/python3.6/site-packages/PyFunceble/core.py", line 301, in _entry_management
self.file_url()
File "/home/user/.local/lib/python3.6/site-packages/PyFunceble/core.py", line 1064, in file_url
PyFunceble.repeat(list_to_test[-1]),
IndexError: list index out of range
The best and shortest way is simply to have a look at these Travis builds, which completely fail after a few tests:
https://travis-ci.com/Import-External-Sources/google.tld/builds/136245802
https://travis-ci.com/Import-External-Sources/google.tld/builds/136310675
https://travis-ci.com/Import-External-Sources/google.tld/builds/136436604
https://travis-ci.com/Import-External-Sources/google.tld/builds/136549183
Running as usual
Without us having to do this in our repo scripts as below, it would be great if PyFunceble could always prep itself for committing back to the repo by just pulling the environment variables and running this sequence before it runs --cmd or --cmd-before-end:
git remote rm origin
git remote add origin https://${GH_TOKEN}@github.com/${TRAVIS_REPO_SLUG}.git
git config --global user.email "${GIT_EMAIL}"
git config --global user.name "${GIT_NAME}"
git config --global push.default simple
git checkout "${GIT_BRANCH}"
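For illustration, the same sequence driven from Python could look like the sketch below. The environment variable names are the ones used in the script above; the function names are hypothetical and this is not PyFunceble code.

```python
import os
import subprocess


def git_prepare_commands(env):
    """Build the git commands from the environment, in the order shown above."""
    origin = "https://{0}@github.com/{1}.git".format(
        env["GH_TOKEN"], env["TRAVIS_REPO_SLUG"]
    )
    return [
        ["git", "remote", "rm", "origin"],
        ["git", "remote", "add", "origin", origin],
        ["git", "config", "--global", "user.email", env["GIT_EMAIL"]],
        ["git", "config", "--global", "user.name", env["GIT_NAME"]],
        ["git", "config", "--global", "push.default", "simple"],
        ["git", "checkout", env["GIT_BRANCH"]],
    ]


def run_git_prepare(env=None):
    """Run every command and stop at the first failure."""
    for command in git_prepare_commands(os.environ if env is None else env):
        subprocess.run(command, check=True)
```

Keeping the command list separate from the part that runs it makes it possible to check the sequence without touching a real repository.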
We should not write or produce output if an element which is in the database is still ACTIVE or INVALID on retest.
@dnmTX said (anudeepND/blacklist#27 (comment)):
@funilrys I got your point, but it makes me wonder what good they are doing in a folder that is designed to collect invalid domains that came from the original lists during filtering. In our case here they're no longer present there (in the original lists). Maybe a subfolder for collecting "old, no longer present invalid domains"? So they can pile up there and keep the main folder tight, with only the fresh ones.
As per https://twitter.com/zero_dot1/status/1193291314319765506 (from @ZeroDot1), we can see that when testing, the coloration (at the end) is wrong and does not reflect the test result correctly.
.PyFunceble.yaml
Nothing relevant.
Steps to reproduce the behavior:
Run a test with the --syntax argument.
It should be green (like this) if VALID or ACTIVE > 50%, and red (like in the link) if INACTIVE or INVALID > 50%.
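The expected rule could be sketched like this. The function name and the shape of `counts` are assumptions, and the source does not say what should happen at exactly 50%, so the sketch returns None in that case.

```python
def summary_color(counts):
    """Pick the summary color from per-status counts.

    Green when VALID + ACTIVE is more than 50% of the tested subjects,
    red when INACTIVE + INVALID is more than 50%, otherwise None.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    up = counts.get("ACTIVE", 0) + counts.get("VALID", 0)
    down = counts.get("INACTIVE", 0) + counts.get("INVALID", 0)
    # Compare with integers to avoid floating point edge cases at 50%.
    if up * 2 > total:
        return "green"
    if down * 2 > total:
        return "red"
    return None
```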