
extract-emails's Introduction

Extract Emails


Extract emails and LinkedIn profiles from a given website

Support the project with BTC: bc1q0cxl5j3se0ufhr96h8x0zs8nz4t7h6krrxkd6l

Documentation

Requirements

  • Python >= 3.9

Installation

pip install extract_emails[all]
# or
pip install extract_emails[requests]
# or
pip install extract_emails[selenium]
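
Note: in some shells (zsh in particular) square brackets are glob characters, so the extras form may need quoting, for example:

pip install 'extract_emails[all]'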

Simple Usage

As library

from pathlib import Path

from extract_emails import DefaultFilterAndEmailFactory as Factory
from extract_emails import DefaultWorker
from extract_emails.browsers.requests_browser import RequestsBrowser as Browser
from extract_emails.data_savers import CsvSaver


websites = [
    "website1.com",
    "website2.com",
]

browser = Browser()
data_saver = CsvSaver(save_mode="a", output_path=Path("output.csv"))

for website in websites:
    factory = Factory(
        website_url=website, browser=browser, depth=5, max_links_from_page=1
    )
    worker = DefaultWorker(factory)
    data = worker.get_data()
    data_saver.save(data)
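
To sanity-check a run, the resulting CSV can be read back with the standard library. A minimal sketch, assuming the output.csv written above and the email/page/website columns shown in the CLI example below:

import csv
from pathlib import Path

# Read back the rows written by CsvSaver (columns: email, page, website).
with Path("output.csv").open(newline="") as f:
    for row in csv.DictReader(f):
        print(row["email"], "found on", row["page"])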

As CLI tool

$ extract-emails --help

$ extract-emails --url https://en.wikipedia.org/wiki/Email -of output.csv -d 1
$ cat output.csv
email,page,website
[email protected],https://en.wikipedia.org/wiki/Email,https://en.wikipedia.org/wiki/Email

extract-emails's People

Contributors

chiaminchuang, dmitriiweb, vikramdurai


extract-emails's Issues

Advanced Usage

Need to add descriptions and examples of how to create and use custom components (filters, browsers, factories, etc.)

Long time without response

Hi @dmitriiweb, please can you help identify what's wrong? I tried running the example from the docs, and I pip-installed the latest extract_emails (v5.0.2), but I'm not getting any response or output, so it seems there's an issue somewhere.
Did you run the example on your end as well?

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Hi,

Again, thanks for your work. I am experimenting with it before scaling up. It is great, but I run into an issue, apparently when it doesn't find any emails.

I ran this simple script for testing:

from extract_emails import ExtractEmails


em = ExtractEmails("http://www.formationgrowthhacking.com/", depth=None, print_log=False, ssl_verify=False, user_agent=None, request_delay=0.0)
emails = em.emails

print(emails)

I get these errors:

Traceback (most recent call last):
  File "C:/Users/Nino/PycharmProjects/EmailVerif/github_extract-email/extract_emails/myextrator.py", line 4, in <module>
    em = ExtractEmails("http://www.formationgrowthhacking.com/", depth=None, print_log=False, ssl_verify=False, user_agent=None, request_delay=0.0)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 30, in __init__
    self.extract_emails(url)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
    self.extract_emails(new_url)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
    self.extract_emails(new_url)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 43, in extract_emails
    self.extract_emails(new_url)
  [Previous line repeated 30 more times]
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 36, in extract_emails
    self.get_all_links(r.text)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\github_extract-email\extract_emails\extract_emails.py", line 59, in get_all_links
    tree = html.fromstring(page)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\lxml\html\__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\lxml\html\__init__.py", line 761, in document_fromstring
    value = etree.fromstring(html, parser, **kw)
  File "src\lxml\etree.pyx", line 3234, in lxml.etree.fromstring
  File "src\lxml\parser.pxi", line 1871, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Could you please help me to fix this issue?
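
For context: the traceback shows the older ExtractEmails code passing r.text (a str) into lxml's html.fromstring, and lxml raises exactly this ValueError when a str still contains an XML encoding declaration. Feeding it the raw bytes instead avoids the problem. A minimal standalone illustration of the lxml behaviour (not a patch to the package):

import requests
from lxml import html

r = requests.get("http://www.formationgrowthhacking.com/")

# Passing r.text (a str) fails if the page starts with an encoding declaration
# such as <?xml version="1.0" encoding="utf-8"?>.
# Passing r.content (bytes) lets lxml resolve the declared encoding itself.
tree = html.fromstring(r.content)
print(tree.findtext(".//title"))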

Setup.py install fail

Hey, I hope you see this, as I'm relying on your tool for an important project of mine.

I've been trying many ways to run "pip install extract_emails", and I even tried running setup.py, but they all give me the same error:

[screenshots of the installation error]

So far I tried with Python 3.7, and then with 3.6 once I realized that's what the requirements specified, but got the same results. I've tried many different solutions but none have worked so far; do you think you could help me out with this?

FileNotFoundError: [Errno 2] No such file or directory:

Hi,

Thanks for your work.

I installed the package and followed your instructions, but I get these errors:

Traceback (most recent call last):
  File "C:/Users/Nino/PycharmProjects/EmailVerif/emailverif.py", line 8, in <module>
    em = ExtractEmails(url, depth=None, print_log=False, ssl_verify=True, user_agent=None, request_delay=0.0)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 31, in __init__
    self.extract_emails(url)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 38, in extract_emails
    self.get_emails(r.text)
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 53, in get_emails
    domains = self.get_domains()
  File "C:\Users\Nino\PycharmProjects\EmailVerif\venv\lib\site-packages\extract_emails\extract_emails.py", line 61, in get_domains
    with open(DOMAINS_FAIL, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Nino\\PycharmProjects\\EmailVerif\\venv\\lib\\site-packages\\extract_emails\\top_level_domains.pkl'

Process finished with exit code 1

Could you provide the missing file please?

Kind regards

Extract social media accounts

Hi!

Thanks for this awesome project. I've already used it for the extraction of emails, but can this also be used for the extraction of social media (LinkedIn) accounts?

Thanks!

Error when running code

Hello,
after running pip install extract_emails and trying your sample code I got the error below. What could be the problem? Thanks in advance. Also, pip install extract_emails[all] is not working; why?

Traceback (most recent call last):
  File "/Users/user/My Drive/emailextractfromURL/olu.py", line 3, in <module>
    from extract_emails import DefaultFilterAndEmailFactory as Factory
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/__init__.py", line 2, in <module>
    from .factories import (
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/factories/__init__.py", line 1, in <module>
    from .base_factory import BaseFactory
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/factories/base_factory.py", line 6, in <module>
    from extract_emails.link_filters import LinkFilterBase
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/__init__.py", line 1, in <module>
    from .contact_link_filter import ContactInfoLinkFilter
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/contact_link_filter.py", line 7, in <module>
    class ContactInfoLinkFilter(LinkFilterBase):
  File "/Users/user/My Drive/emailextractfromURL/venv/lib/python3.9/site-packages/extract_emails/link_filters/contact_link_filter.py", line 53, in ContactInfoLinkFilter
    contruct_candidates: list[str] | None = None,
TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
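
The venv path in this traceback points at Python 3.9, and the failing line uses the PEP 604 union syntax list[str] | None, which is evaluated at definition time and is only supported at runtime from Python 3.10 onwards. A minimal, self-contained reproduction of the failure mode (not the package's code):

# On Python 3.9 this raises:
#   TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
# On Python 3.10+ it works. Putting `from __future__ import annotations` at the
# top of the module defers annotation evaluation and also avoids the error on 3.9.
def collect(candidates: list[str] | None = None) -> list[str]:
    return candidates or []

print(collect(["contact", "about"]))

So running the example on a Python 3.10+ interpreter would likely avoid this error.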

Depth scan

How does the depth scan parameter work? I assumed that a value of 1 would mean searching www.example.com and a value of 2 would also cover www.example.com/contactus, but it doesn't seem to work; the log even says URLs = 1. Can you please help? Many thanks.
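
For what it is worth, a common reading of the parameter (not confirmed by the maintainer here) is that depth limits how many link levels the crawler follows from the start page, while max_links_from_page caps how many links are taken from each visited page. A sketch reusing only the factory call shown in the README example above, with the assumed meanings in comments:

from extract_emails import DefaultFilterAndEmailFactory as Factory
from extract_emails import DefaultWorker
from extract_emails.browsers.requests_browser import RequestsBrowser as Browser

browser = Browser()
factory = Factory(
    website_url="https://www.example.com",
    browser=browser,
    depth=2,                 # assumed: follow links up to two levels below the start page
    max_links_from_page=10,  # assumed: take at most 10 links from each visited page
)
data = DefaultWorker(factory).get_data()
print(data)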

Quick Start

Need to add more usage examples to the Quick Start part.
