Comments (58)

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024 1

Please bear in mind I'm a user just like you. I'm not the author, but I have been using this extensively since Nissar started building it from some of my crazy ideas.

I am clear that your role is a contribution. It's in README. And thank you very much for your help. But I think it's time for the creator to intervene in this thread, because the HowTo document is quite confusing.

My pleasure, and don't stress, we will get you up and running for sure. @funilrys works Monday to Friday and his time is limited, so I help where I can. He will respond once he's online, which he has not been all day, so I know he's hammering away at some code somewhere.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024 1

Data is definitely real and there will be no duplicates. Go and look yourself at the contents of output/domains/ACTIVE/list

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Lists should be plain text, one domain per line.
Just pass -f mydomainslist.txt to PyFunceble.

Simple example: PyFunceble -ex --dns 1.1.1.1 1.0.0.1 --plain -f ${input}

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Here's just one of my many repos using it, this script uses all the TravisCI functionality which you won't need in your local environment https://github.com/mitchellkrogza/Badd-Boyz-Hosts/blob/master/dev-tools/DataTesting.sh

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

What format should the list have, and which parameter writes the results to an output file?
[Screenshot: 2019-07-11 11-37-36]

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

domain1.com
domains2.com
www.whatever.com

A simple list with just one entry per line.

The output folder is created in whatever folder you run PyFunceble from.

See the output folder that PyFunceble creates here:

https://github.com/mitchellkrogza/Phishing.Database/tree/master/phishing-domains

You can specify the location of the output by using export

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

Look at the image. That's how they are. However, it says that google.com is INVALID (PyFunceble -uf test).
[Screenshot: 2019-07-11 11-48-51]

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

It must be just the -f parameter, not -uf.
Try this simple test first: PyFunceble -d google.com

Then try: PyFunceble -f domains.txt

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Not sure if you are seeing the correct version of the docs, but it does state the format.
[Screenshot: Screenshot_20190711_185339]

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

The ones I use in all my projects

ACTIVE = my list of active domains
INACTIVE = my list of dead domains
INVALID = the domain syntax is somehow invalid

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

So INACTIVE is what you want for your dead domains lists

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Also, if you want to force re-testing of everything every time you run PyFunceble, add the -db flag to stop it from relying on its own database. But once you learn how smart the database is, you should just let it do its own thing.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

See my usage of ACTIVE INACTIVE and INVALID here https://github.com/mitchellkrogza/Phishing.Database

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

I use the INVALID lists every now and again to clean up formatting errors in my input lists. You will see on Phishing Database that the INVALID numbers hardly feature anymore for me, but they were crucial in the beginning for getting all the cleaning functions of my input sources correct.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

Not sure if you are seeing the correct version of the docs, but it does state the format.
[Screenshot: Screenshot_20190711_185339]

The confusion came from the description in the help output:

-f FILE, --file FILE Read the given file and test all domains inside it. If
a URL is given we download and test the content of the
given URL.

-uf URL_FILE, --url-file URL_FILE
Read and test the list of URL of the given file. If a
URL is given we download and test the list (of URL) of
the given URL content.

But it is clear now: it is "-f" only.

from pyfunceble.

funilrys avatar funilrys commented on May 25, 2024

Hi @maravento, and thanks for your feedback.

Sorry that it wasn't clear. I'll do my best to improve the documentation for the future.

To recap:


Test of file with URLs

If you want to test a list of URLs, that is, a file in this format:

https://example.org/
http://example.org
https://example.org/hello_world

You can pass the file path with -uf.


Test of a file in plain text or host file format

If you want to test a list of domains or IPs in plain text or hosts format, that is, a file in this format:

127.0.0.1 example.org
beispiel.org
0.0.0.0 example.com

You can pass the file path with -f.
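To make the two accepted layouts concrete, here is a minimal Python sketch of how a plain or hosts-format line could be reduced to the subject to test. The helper name `extract_subject` is hypothetical and this is not PyFunceble's actual parser; it only assumes that a hosts-style line is an IP followed by a hostname and that `#` starts a comment.

```python
import ipaddress


def extract_subject(line):
    """Return the domain/IP to test from one plain or hosts-format line.

    Hypothetical helper for illustration; not PyFunceble's real parser.
    """
    # Drop trailing comments and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None

    parts = line.split()
    if len(parts) >= 2:
        try:
            # Hosts format: the first field is an IP, the second the host.
            ipaddress.ip_address(parts[0])
            return parts[1]
        except ValueError:
            pass
    # Plain format: the first token is the subject itself.
    return parts[0]


lines = ["127.0.0.1 example.org", "beispiel.org", "0.0.0.0 example.com"]
subjects = [extract_subject(line) for line in lines]
print(subjects)
```

With the three sample lines above, both layouts reduce to the bare names example.org, beispiel.org and example.com.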


Confusions (to fix in docs)

Sorry for the confusion I created.

Indeed both -uf and -f can take a raw URL.

For example, let's say I want to test this file, I can give it to -f and PyFunceble will download and test its content. That's what I tried to explain in the doc.


What is the difference between ACTIVE vs VALID and INACTIVE vs INVALID?

Documentation: https://pyfunceble.readthedocs.io/en/latest/columns/status.html#status

Because there are many possibilities, I created the structure of this project into one file called dir_structure_production.json which is later downloaded and found in your local filesystem as dir_structure.json.

What I'm doing is I generate the output directory before even starting the test.
So to explain,

  • ACTIVE refers to the output of an availability test.
  • VALID refers to the output of a syntax test.
  • INVALID refers to the output of an availability test or a syntax test.
  • INACTIVE refers to the output of an availability test.

Difference between availability and syntax test

Availability test

The availability test consists of finding the availability of a domain, IP or URL.

Domain and IP

The availability of domain and IP are found based on the result of WHOIS records, NSLOOKUP and HTTP status code.

URL

The availability of a URL is found based on the HTTP status code. cf: documentation

Syntax test

The syntax test is just a syntax test.

As you understand Python, you can review our syntax test/check logic here.


Auto continue

Your question:

Suppose I start processing a list of 5 million URLs. And there is a power cut or internet drops. Which parameter allows start / continue from the last point (where did the cut occur)?

Documentation: https://pyfunceble.readthedocs.io/en/latest/components/auto-continue.html

As the auto continue system is activated by default (unless you disable it in your personal .PyFunceble.yaml), you have nothing to do. The system will continue by itself.

How does it work?

Documentation: https://pyfunceble.readthedocs.io/en/latest/components/auto-continue.html#how-does-it-work

In other words, everything happens in output/continue.json or, if you use the MariaDB/MySQL database type, in the continue table.

The idea is to log everything that has been tested and, on the next run (after the power cut in your example), remove the tested elements from the original list to test.

In Python, we do the equivalent of the following on a bigger scale.

to_test = [1,2,3,4,5]
already_tested = [2,3,4]

to_test = list(set(to_test) - set(already_tested))
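Note that a plain set difference does not keep the input order. As a small sketch of the same idea that preserves order (the function name and the sample domains are hypothetical, and this is not how PyFunceble itself stores or reads its state):

```python
def remaining_to_test(to_test, already_tested):
    """Return the subjects still to test, keeping the input order."""
    tested = set(already_tested)  # set membership checks are O(1) on average
    return [subject for subject in to_test if subject not in tested]


to_test = ["a.example", "b.example", "c.example", "d.example"]
already_tested = ["b.example", "c.example"]
print(remaining_to_test(to_test, already_tested))
```

Here only a.example and d.example remain, in their original order.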

Thanks again for your feedback. I hope that I clarified things here. If not, please let me know.

Cheers,
Nissar

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

perfect. well explained. Thanks a lot.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento awesome, now let me make that even better for you, as I currently process 60000 domains in 4 hours.

Welcome to the absolutely brilliant multiprocessing of PyFunceble.

Add the flags -m -p 100 to your PyFunceble command line and see the magic. The maximum number of processes we have discovered is about 200-250, so experiment with what works for you.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

I'm running the command like this:
PyFunceble -qf list
To increase the processing rate, is your proposal to run it like this?
PyFunceble -q -f -m -p 100 list

-p PROCESSES, --processes PROCESSES
Set the number of simultaneous processes to use while using multiple processes. configured value: 25
-m, --multiprocess
Switch the value of the usage of multiple process. Configured value: False

Why is there no value for the flag "-m"?

What is the maximum level of processing allowed and what is the consumption of resources per process?

P.S.: I am using a ProLiant M110 G9 HP test server 24/7, with 8 GB RAM free and 10 Mb bandwidth.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

No, use PyFunceble -q -m -p 100 -f list. This will run 100 processes at the same time.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

Then, according to my resources described above, how should I run the command for maximum performance and speed?

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Try 100 processes. If it's too much, drop it to 50; if it's too little, raise it up to a maximum of 250. -m is just the switch that turns multiprocessing on; you then specify how many processes with -p xx. With that CPU you should comfortably get away with running 150 processes. Just try the exact command line I gave and let us know.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

I think that -p 100 is fine (so as not to overload the CPU, I divided the list to run it on 2 servers). Once finished, I will publish the results. Thanks for all the help.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Awesome and welcome. You soon won't be able to do without PyFunceble like me.
Tag me when you release your results. -p 100 is safe for sure and you probably won't notice it's even running. Have fun.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento if you ever kill PyFunceble with ctrl+c, be sure to run PyFunceble --clean first before you once again run your normal full command line. This will clean the /output/ folder for you.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

But that command eliminates everything and I lose what I have done until the moment of the ctrl + c.

--clean Clean all files under the output directory

PyFunceble freezes a lot and I have to stop it (ctrl + c) and restart. It does not matter if I use -p xxx or not (and it's not the hardware).

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Weird. I run it hourly on an Ubuntu server and daily on Arch with very big lists and 200 processes, and I don't have any freezing. What distro are you using? What Python version?
I do run mine inside Conda environments, which you may want to try.

Not sure where you said you were getting duplicates in your results, you deleted that message but I got it via email. All I was saying is if you interrupted PyFunceble your contents of /output "might" need cleaning once you restart your tests.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

Ubuntu Mate 18.04.2 x64. HP ProLiant ML110, 16 GB RAM, 4 TB HD. I have tried the following options, with the same result of freezing every so often (I have not been able to establish how long it runs before freezing):
PyFunceble -m -p 200 -f list
PyFunceble -m -p 150 -f list
PyFunceble -m -p 100 -f list
PyFunceble -f list

P.S.: Only the program freezes, not the rest of the server.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Misread your post. I run it on Ubuntu 16.04.2 and 18.04.2 but I run it inside Conda Environments as Python on Ubuntu is unreliable and full of troubles.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

You should also be aware that when it reaches the end of processing your list (in multiprocessing mode), it spends some time merging the data from all the processes to produce the output. That merging can take some time and may make it look as if it's frozen.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

In my case, it freezes at any point, every 1 or 2 hours. I have tried on another computer with more hardware resources, with the same result. I could not determine the cause.
Please tell me which installation method you use, its dependencies and the recommended operating system, so I can create a VirtualBox VM with those specifications. Thank you.

P.S.: If there is a bash script to automate the installation with its dependencies, even better.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento see #39
I run various instances of PyFunceble all automated through bash and cron using Conda environments. Good luck.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

So if this program is limited to Conda environments, why is this limitation not described in the README requirements? (I thought it could be run on any linux system)

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

The HowTo does not show how to uninstall PyFunceble:

Install: pip3 install -r requirements.txt && pip3 install PyFunceble

~$ pip3 uninstall PyFunceble

Exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/usr/lib/python3/dist-packages/pip/commands/uninstall.py", line 76, in run
requirement_set.uninstall(auto_confirm=options.yes)
File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 346, in uninstall
req.uninstall(auto_confirm=auto_confirm)
File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 734, in uninstall
FakeFile(dist.get_metadata_lines('entry_points.txt'))
File "/usr/lib/python3.6/configparser.py", line 763, in readfp
self.read_file(fp, source=filename)
File "/usr/lib/python3.6/configparser.py", line 718, in read_file
self._read(f, source)
File "/usr/lib/python3.6/configparser.py", line 1092, in _read
fpname, lineno)
configparser.DuplicateOptionError: While reading from '<???>' [line 3]: option 'pyfunceble' in section 'console_scripts' already exists

~$ pip uninstall PyFunceble
Cannot uninstall requirement PyFunceble, not installed

~$ sudo pip3 uninstall PyFunceble
The directory '/home/user/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/usr/lib/python3/dist-packages/pip/commands/uninstall.py", line 76, in run
requirement_set.uninstall(auto_confirm=options.yes)
File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 346, in uninstall
req.uninstall(auto_confirm=auto_confirm)
File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 734, in uninstall
FakeFile(dist.get_metadata_lines('entry_points.txt'))
File "/usr/lib/python3.6/configparser.py", line 763, in readfp
self.read_file(fp, source=filename)
File "/usr/lib/python3.6/configparser.py", line 718, in read_file
self._read(f, source)
File "/usr/lib/python3.6/configparser.py", line 1092, in _read
fpname, lineno)
configparser.DuplicateOptionError: While reading from '<???>' [line 3]: option 'pyfunceble' in section 'console_scripts' already exists

From root:
pip3 uninstall PyFunceble
Cannot uninstall requirement PyFunceble, not installed
pip uninstall PyFunceble
Cannot uninstall requirement PyFunceble, not installed

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

It's not limited to Conda environments at all ... My recommended solution to running Python (not PyFunceble) is to use Conda environments.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

You don't need to uninstall PyFunceble; you just run it from inside the environment. Follow my guide, it does work, and I use it on several production machines. I must make it clear that my guide is what I use because Ubuntu sucks with anything to do with Python, but I have no issues whatsoever using Conda.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

OK, but I still want to know how to uninstall via pip3, so I can switch to installing from GitHub.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

pip3 uninstall PyFunceble or pip uninstall PyFunceble should do it, but it seems that gave you an error. @funilrys can maybe assist you there.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento try pip3 uninstall --user PyFunceble let's see if that helps

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

@maravento try pip3 uninstall --user PyFunceble let's see if that helps

Usage:
pip uninstall [options] < package > ...
pip uninstall [options] -r < requirements file > ...
no such option: --user

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Did you install it with pip or pip3 🤔

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

My bad sorry uninstall has no --user option indeed. Helping you off my phone as best as I can. Should be just pip uninstall package or pip3 uninstall package 🤔 @funilrys will have to assist further. For now why not just leave it as is and fire up Conda and run it there ? Won't matter if you have it installed on your system as you will be running a new instance from inside the Conda environment

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Just going back a few posts from earlier: are you doing all this in a VM on VirtualBox, or did you want a guide to creating a foolproof VM environment for running PyFunceble?

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

Just going back a few posts from earlier, are you doing all this in a VM on Virtual box or did you want a guide to creating a fool proof VM environment for running PyFunceble ?

On a dedicated physical server (description is HERE)

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

OK, got that. I was just referring to your request about doing it in a VM. I could build one tomorrow which will work and may benefit others too. Still, I cannot explain why you are experiencing freezing on your hardware. We run PyFunceble in Docker containers with multiprocessing and don't get freezes or anything. @funilrys will have to assist you in tracing that.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Please bear in mind I'm a user just like you. I'm not the author, but I have been using this extensively since Nissar started building it from some of my crazy ideas.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

Please bear in mind I'm a user just like you. I'm not the author, but I have been using this extensively since Nissar started building it from some of my crazy ideas.

I am clear that your role is a contribution. It's in README. And thank you very much for your help. But I think it's time for the creator to intervene in this thread, because the HowTo document is quite confusing.

I think you need to add the -nl parameter to your existing command line

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

No logs (nl) defaults to false, but adding -nl toggles it to true.

from pyfunceble.

funilrys avatar funilrys commented on May 25, 2024

Hello there,

Sorry for being so silent. I'm juggling work, the next version of this tool, a huge private project and family :)

So let's go!

Multiprocessing

Why there is no value for the flag "-m"?

The -m flag is the one that activates the Multiprocessing subsystem. It's just a switch.

What is the maximum level of processing allowed and what is the consumption of resources per process?

I can't really answer that as there are too many variables. But generally in modern x64 machines, 100-150 is sufficient if you have other business running.

These are some of the obvious variables that come to mind:

  • Internet speed/bandwidth
  • Memory usage/impact
  • Drive sanity as we do a lot of I/O
  • DNS Server
  • Whois Server
  • ...

It really depends on the machine most of the time.

Reduce memory impact (and freezes ?)

For your large amount of data (I didn't think you would test 5 million entries), I recommend setting up a MySQL/MariaDB database to handle the large amount of data that has to be reread/reconstructed on each loop.

It's actually way better as we don't have to keep the following dataset/subsystem in memory:

  • Auto continue
  • InactiveDB
  • Mining
  • WhoisDB

The (short) documentation about the database can be found here: https://pyfunceble.readthedocs.io/en/latest/components/databases.html

I should mention that more deeply in the documentation. Thanks for mentioning.

Please read more about it in the documentation:

Freeze

PyFunceble freezes a lot and I have to stop (ctrl + c) and restart. It does not matter if I use -p xxx or not (and it's not the hardware)

I'm not aware of any freeze. But I hope that using the MariaDB/MySQL database type can solve that.

Uninstallation

The HowTo does not show how to uninstall PyFunceble:

Well, it depends on how you install it but I never thought it was necessary. Will be added to the documentation.

Arch Linux

Arch users can simply do:

$ yourFavoriteAurHelper -Rns pyfunceble

PyPi

A package installed from PyPI can be uninstalled as follows:

$ pip3 uninstall pyfunceble

I don't understand why you get the following.

configparser.DuplicateOptionError: While reading from '<???>' [line 3]: option 'pyfunceble' in section 'console_scripts' already exists

It might be because of your version of pip/pip3. Here is mine under my virtualenv, but it's actually the same outside the env under Arch:

$ pip --version
pip 19.1.1 from /home/funilrys/repositories/GitHub/source/PyFunceble/venv/lib/python3.7/site-packages/pip (python 3.7)

Can you try pip3 install pip --upgrade and then try to uninstall it again? It might be a pip issue and not a PyFunceble issue at all, as it's working on my side ...

Otherwise, you can delete the paths printed by the following commands.

$ pip show pyfunceble | grep Location
$ which pyfunceble
$ which PyFunceble

Virtualenv/Conda

You can start from the beginning by setting up a virtualenv.

Advantages

You don't need to rely on the system version of pip or even Python.

(Mini)Conda

@mitchellkrogza already explained it there and I have nothing to add except Mitch @mitchellkrogza please make a PR from it !! 😸

Advantages of conda

Conda lets you install and use a Python version of your choice and work from there, while virtualenv will only use the one installed by the system.

Virtualenv

Here is my routine when I'm at work using Debian 9 (from memory, as I'm out of the office).

$ apt-get install python3-virtualenv
# Create the virtualenv and install it into the venv directory
$ virtualenv -p python3 venv
# Activate the environment (installed)
$ . venv/bin/activate
$ pip3 --version
# Update pip
$ pip3 install pip --upgrade # Will be installed inside the venv directory.
# Install and play with what we need
$ pip3 install pyfunceble # Will be installed inside the venv directory.
# Play with PyFunceble and others
$ pip3 --version
$ PyFunceble --version
$ PyFunceble -d microsoft_google.com
# When done and you want to go back to your system,
# deactivate the virtual env.
$ deactivate
# Now you are back in your system
# proof PyFunceble is installed systemwide.
$ pip show pyfunceble | grep Location

Logs

Does this program generate logs to verify a possible cause?

Not currently, but I have a private branch with work on it. It was never my priority, but it will be for 2.5+.

The only logs generated are the ones we produce after each test, so you can keep track of the result for each domain, for example.

Warnings

--clean

if you ever kill PyFunceble with ctrl+c, be sure to run PyFunceble --clean first before you once again run your normal full command line.

@mitchellkrogza can do that because he mostly uses the MySQL/MariaDB database type.

MariaDB/MySQL over 2 servers

As you previously stated:

so as not to overload the CPU, I divided the list to run it on 2 servers

If you use the MariaDB/MySQL database type, be sure to use 2 different filenames. That way PyFunceble can handle the data from both.

Side note for me (todolist)

  • Reduce confusion around -uf, -f, -m and others.
  • Add more warning about the multiprocessing usage and big inputs.
  • Add uninstallation method.
  • Add installation method with conda and virtualenv.
  • Create a docker image?

Thanks again for your feedback. I hope that I clarified things here. If not, please let me know.

Cheers,
Nissar

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento I highly recommend the MariaDB solution. If you're not OK with it right now, you could just split your large file into parts of maybe 500000 lines each with split -l 500000 filename and test each one separately. That is not ideal, so SQL is the way to go. The MariaDB or MySQL setup is rather simple to get up and running.
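For anyone without the split utility at hand, the chunking step can be sketched in Python. This is only an illustration of the idea behind split -l; the function name and the `.partNNN` naming scheme are assumptions made up for this example (split itself names its outputs differently).

```python
from itertools import islice
from pathlib import Path


def split_file(path, lines_per_part=500_000):
    """Write consecutive chunks of `path` to `<name>.partNNN` files.

    Hypothetical helper; the output naming is not what `split` produces.
    """
    source = Path(path)
    parts = []
    with source.open(encoding="utf-8") as handle:
        index = 0
        while True:
            # Read at most `lines_per_part` lines without loading the whole file.
            chunk = list(islice(handle, lines_per_part))
            if not chunk:
                break
            part = source.with_name(f"{source.name}.part{index:03d}")
            part.write_text("".join(chunk), encoding="utf-8")
            parts.append(part)
            index += 1
    return parts
```

Each returned path can then be given to a separate PyFunceble -f run.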

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

@mitchellkrogza Hi. A question: for example, my file has 5 M lines, host-active has 1.5 M and host-inactive has 1.3 M (host-invalid has few, so it doesn't matter for the example).
Does this mean that the program has processed 2.8 M lines, or is this figure not real because the output has duplicates? (The input file was cleaned of duplicates before running the program.) Thanks.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

@maravento it's hard to say why you got such results. I am currently testing your entire list in 5 parts of 1M each, all at the same time, using (Mini)Conda environments running in parallel. Each environment / instance of PyFunceble uses multiprocessing with 50 processes, and all of them use the MariaDB database system.

I estimate it will be finished by tomorrow morning and then I can push my results to my fork of your repo.

The only way I can tell is to compare my results with yours.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

You can look at any of the files while they are being created or just tail them and you will see

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

@mitchellkrogza Hi. The same problem. At this time the program has processed the following data:
Original List: 5.9 M
ACTIVE/hosts = 3.6 M
INACTIVE/host = 3 M
INVALID/host = 42,000
Total = 6.6 M+
... And it's not over (still running)
I have detected duplicate lines in ACTIVE/hosts. The original file has no duplicate lines. I think the "auto continue system" is not working as it should (it may not work when the program is interrupted with ctrl + c and restarted).
Then I did what @funilrys recommended above:

Warnings
--clean
if you ever kill PyFunceble with ctrl+c, be sure to run PyFunceble --clean first before you once again run your normal full command line.
@mitchellkrogza can do that because he mostly uses the MySQL/MariaDB database type.

And I lost all the work, and it started again from the beginning.

from pyfunceble.

mitchellkrogza avatar mitchellkrogza commented on May 25, 2024

--clean will clean your output folders. Be careful using it; I should have been clearer about that. I can't explain the duplicates. I've never seen any dupes before, but I will have to check some of my big lists to see if ACTIVE has any.

For now you can just run a final sort on the active and inactive files when the test is finished to remove any dupes, until @funilrys can look into what might cause that.

Just run sort -u list.txt -o list.txt on each of the output files and then do a recount to see your totals.
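The same cleanup and recount can be sketched in Python, which also reports how many duplicate lines were dropped. The function name is hypothetical, the file is read fully into memory, and the sort order may differ from what sort -u produces under your locale.

```python
from pathlib import Path


def dedupe_file(path):
    """Rewrite `path` with unique, sorted lines; return (before, after) counts."""
    target = Path(path)
    lines = target.read_text(encoding="utf-8").splitlines()
    unique = sorted(set(lines))
    target.write_text("".join(f"{line}\n" for line in unique), encoding="utf-8")
    return len(lines), len(unique)
```

Calling it on each output file and summing the second value of each result gives totals that can be compared against the size of the input list.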

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

That's why I reopened the ticket. I just lost 3 weeks of work by following the instructions of @funilrys.
In short, the program freezes and I have to stop it with ctrl + c. But @funilrys says: "if you ever kill PyFunceble with ctrl + c be sure to run PyFunceble --clean first before you once again run your normal full command line". This then causes the work to be lost.
Conclusion: this program is very unstable and the instructions in the HowTo and from @funilrys are imprecise. So, unfortunately, I have to temporarily remove it from my blackweb project until these bugs are fixed.

I have summarized the proposals for improvements and bug fixes in issue 41

from pyfunceble.

funilrys avatar funilrys commented on May 25, 2024

@maravento If you have a problem with the output and multiprocessing then use the API and manage your file and your multiprocessing yourself.

I do it for @Ultimate-Hosts-Blacklist. You can do it and it is as simple as the following. Again, it's documented.

from PyFunceble import test as PyFunceble

print(PyFunceble("google.com", complete=True))

I currently have no time to dig into reproducing what you do (@mitchellkrogza might help with that), but my plan includes full database (MariaDB/MySQL) processing, so that files are generated only when the run is really done.
But please be patient. I have a life, family, work, study and other things that have to come before this whole issue in my workflow.

Which database type do you use? If it's JSON, then this behavior is expected; that's one of the reasons I introduced the database types. It's not in the documentation yet, but I talked about it in the Reduce memory impact (and freezes ?) section ...

The auto continue is guaranteed - if you use the multiprocessing option - only if you use the MySQL/MariaDB database types. That's what @mitchellkrogza implicitly said and that's what I confirmed:

@mitchellkrogza can do that because he mostly uses the MySQL/MariaDB database type.


I agree about the state of the documentation, and fixing it is on my to-do list. But for the rest, you're using PyFunceble in a way we never used it before. Indeed, I tested it with 1.2 million records, but never with so many. That's why we need to go further with the database types implementation, because JSON is not good for multiprocessing and memory.

Cheers,
Nissar

P.S.: Please keep this open, it does not make sense to close it if the documentation and things you mentioned here are not fixed/handled.

from pyfunceble.

maravento avatar maravento commented on May 25, 2024

It is not necessary to keep it open. I think everything is clear, and I summarized my experiences and proposals for improvement in issue 41.
You have family and other priorities, and so do I, so consider the proposal; the corrections will be welcome whenever you can make them. In general, the program is good, you just have to fix some things.
Regards

from pyfunceble.
