Gatherer is a free alternative to Hunter.io. Users can input a domain in our website and we will search our database for
all the emails that we've found across the web that match that domain. For example, if a user is looking for the professional emails
of GitHub employees, they can input the domain github.com
into our service and emails that end in that domain like [email protected]
or [email protected]
will appear.
Our service can be broken down into a few main components: web scraping and data cleaning, data storage, and data serving.
We've built a custom scraper built in Python using the Scrapy framework. The scraper starts off with a list of URLs and then continues through every link it finds. It uses a custom algorithm to find all emails in each webpage and uses an external email verifying library to ensure the emails are valid before inputting them into the database.
We store all the emails in MongoDB. This database was used because it's easy to quickly iterate on the schema.
We used AWS Lambda functions written in Node.js to serve content from the database to our web client.