digininja / cewl

CeWL is a Custom Word List Generator

cewl's Introduction

Hi there 👋

If I ever get time I'll write something meaningful here. For now, I'm either hacking, coding or off doing something outdoors to get away from all the technology.

If you like my work, you can Buy me a smoothie*.

* I don't drink coffee

cewl's Issues

Progress Indication

Pointing this tool at bigger sites/wikis can take quite a long time, and it can be difficult to be sure it's actually doing something until it finishes or I force-kill it.

It would be very useful to have some kind of progress indication, whether that would be something simple like displaying the current page being processed, a count of words found, etc. This would make it much easier to be sure it's working and to gauge how far along it is.

Perhaps this issue is more about it being slow with the way I'm running it, but pointing it at something like a Wikipedia article with the default depth took over a day and I ended up killing it. (I later realized a depth of 2 was likely too high, but the point remains.)
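As a rough illustration of the kind of progress output requested above, here is a minimal sketch. The ProgressReporter class and its method names are hypothetical and do not come from CeWL; it only assumes the crawler can call a hook once per fetched page.

```ruby
require "stringio"

# Hypothetical progress reporter; none of these names come from CeWL.
class ProgressReporter
  attr_reader :pages, :words

  def initialize(io = $stderr)
    @io = io
    @pages = 0
    @words = 0
  end

  # Call once per fetched page with the words extracted from it.
  def page_done(url, new_words)
    @pages += 1
    @words += new_words.length
    @io.puts "[#{@pages} pages, #{@words} words] #{url}"
  end
end

# Usage with an in-memory stream instead of the terminal.
log = StringIO.new
reporter = ProgressReporter.new(log)
reporter.page_done("http://example.com/", %w[alpha beta])
reporter.page_done("http://example.com/a", %w[gamma])
puts log.string
```

Writing to standard error by default keeps the progress lines out of a wordlist that is printed to standard output.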

Unable to connect to the site

Hello Robin Wood :)
Unfortunately I can't CeWL my site: :(
Do you have any ideas?

cewl http://a-wareness.fr --debug -v
CeWL 5.4.8 (Inclusion) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://a-wareness.fr
Pushing {nil=>"http://a-wareness.fr"}
Checking page http://a-wareness.fr
Comparing http://a-wareness.fr with http://a-wareness.fr

Unable to connect to the site (http://a-wareness.fr:80/)

The following error may help:
execution expired
/usr/lib/ruby/2.7.0/net/http.rb:960:in `initialize'
/usr/lib/ruby/2.7.0/net/http.rb:960:in `open'
/usr/lib/ruby/2.7.0/net/http.rb:960:in `block in connect'
/usr/lib/ruby/2.7.0/timeout.rb:105:in `timeout'
/usr/lib/ruby/2.7.0/net/http.rb:958:in `connect'
/usr/lib/ruby/2.7.0/net/http.rb:943:in `do_start'
/usr/lib/ruby/2.7.0/net/http.rb:932:in `start'
/usr/lib/ruby/2.7.0/net/http.rb:1483:in `request'
/usr/bin/cewl:246:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:769:in `block in <main>'
/usr/bin/cewl:759:in `catch'
/usr/bin/cewl:759:in `<main>'

Caller
/usr/bin/cewl:198:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:769:in `block in <main>'
/usr/bin/cewl:759:in `catch'
/usr/bin/cewl:759:in `<main>'

End of main loop
Words found
End of wordlist loop
End of email loop
End of meta loop

Broken dependency: spider

It seems that on the latest version of Kali, CeWL is broken because of what looks like a deprecated constant in the spider library:

./cewl.rb -d 2 wikipedia.org
CeWL 5.4.2 (Break Out) Robin Wood ([email protected]) (https://digi.ninja/)
/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated

Dependencies were installed today with gem install.

gem list --local | grep spider
spider (0.32, 0.5.1)

Anything I'm missing?

Not working for javascript rendered HTML pages

The target app uses React and the HTML pages are generated dynamically on the front end. CeWL fails to collect words in this case. Any solutions?

Doesn't find phone numbers

Hi, it seems that cewl will not pull the phone number (that also happens to be the password) off of the site that I am attempting to crawl. I am sure that I have it set at a large enough depth. It does pull a wordlist that it normally would though.

any suggestions?

(update to clarify issue)

undefined method `chr' for nil:NilClass

Visiting: https://tvtropes.org:443/pmwiki/dmca.php referred from https://tvtropes.org/pmwiki/pmwiki.php/ComicStrip/ComicStrip, got response code 200
Attribute text found:
   TV Tropes   Quantcast Display Crucial Browsing Main/ActionAdventureTropes Main/ComedyTropes Main/CommercialsTropes Main/CrimeAndPunishmentTropes Main/DramaTropes Main/HorrorTropes Main/LoveTropes Main/NewsTropes Main/ProfessionalWrestling Main/SpeculativeFictionTropes Main/SportsStoryTropes Main/WarTropes Main/Media Main/AnimationTropes Main/Anime Main/ComicBookTropes FanFic/FanFics Main/Film Main/GameTropes Main/Literature Main/MusicAndSoundEffects Main/NewMediaTropes Main/PrintMediaTropes Main/Radio Main/SequentialArt Main/TabletopGames Main/Television Main/Theater Main/VideogameTropes Main/Webcomics Main/UniversalTropes Main/AppliedPhlebotinum Main/CharacterizationTropes Main/Characters Main/CharactersAsDevice Main/Dialogue Main/Motifs Main/NarrativeDevices Main/Paratext Main/Plots Main/Settings Main/Spectacle Main/BritishTellyTropes Main/TheContributors Main/CreatorSpeak Main/Creators Main/DerivativeWorks Main/LanguageTropes Main/LawsAndFormulas Main/ShowBusiness Main/SplitPersonalityTropes Main/StockRoom Main/TropeTropes Main/Tropes Main/TruthAndLies Main/TruthInTelevision Main/BetrayalTropes Main/CensorshipTropes Main/CombatTropes Main/DeathTropes Main/FamilyTropes Main/FateAndProphecyTropes Main/FoodTropes Main/HolidayTropes Main/MemoryTropes Main/MoneyTropes Main/MoralityTropes Main/PoliticsTropes Main/ReligionTropes Main/SchoolTropes Community Showcase TV Tropes

Unable to process URL
Message is undefined method `chr' for nil:NilClass
/usr/bin/cewl:296:in `construct_complete_url'
/usr/bin/cewl:251:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:731:in `block in <main>'
/usr/bin/cewl:721:in `catch'
/usr/bin/cewl:721:in `<main>'
Visiting: https://tvtropes.org:443/pmwiki/privacypolicy.php referred from https://tvtropes.org/pmwiki/pmwiki.php/ComicStrip/ComicStrip, got response code 200
Attribute text found:
   TV Tropes   Quantcast Display Crucial Browsing Main/ActionAdventureTropes Main/ComedyTropes Main/CommercialsTropes Main/CrimeAndPunishmentTropes Main/DramaTropes Main/HorrorTropes Main/LoveTropes Main/NewsTropes Main/ProfessionalWrestling Main/SpeculativeFictionTropes Main/SportsStoryTropes Main/WarTropes Main/Media Main/AnimationTropes Main/Anime Main/ComicBookTropes FanFic/FanFics Main/Film Main/GameTropes Main/Literature Main/MusicAndSoundEffects Main/NewMediaTropes Main/PrintMediaTropes Main/Radio Main/SequentialArt Main/TabletopGames Main/Television Main/Theater Main/VideogameTropes Main/Webcomics Main/UniversalTropes Main/AppliedPhlebotinum Main/CharacterizationTropes Main/Characters Main/CharactersAsDevice Main/Dialogue Main/Motifs Main/NarrativeDevices Main/Paratext Main/Plots Main/Settings Main/Spectacle Main/BritishTellyTropes Main/TheContributors Main/CreatorSpeak Main/Creators Main/DerivativeWorks Main/LanguageTropes Main/LawsAndFormulas Main/ShowBusiness Main/SplitPersonalityTropes Main/StockRoom Main/TropeTropes Main/Tropes Main/TruthAndLies Main/TruthInTelevision Main/BetrayalTropes Main/CensorshipTropes Main/CombatTropes Main/DeathTropes Main/FamilyTropes Main/FateAndProphecyTropes Main/FoodTropes Main/HolidayTropes Main/MemoryTropes Main/MoneyTropes Main/MoralityTropes Main/PoliticsTropes Main/ReligionTropes Main/SchoolTropes Community Showcase TV Tropes

Writing words to file

please switch from zip to rubyzip

the ruby gem "zip" was last updated in 2010 while "rubyzip" was updated in 2020.

My real motive, zip isn't packaged in gentoo and I don't want to add an unmaintained gem.

Words with accented characters are ignored

Hello,
thanks for the tool!
I've made a small patch to correctly match accented characters:

diff --git a/cewl.rb b/cewl.rb
index 967b5ed..22ef574 100755
--- a/cewl.rb
+++ b/cewl.rb
@@ -939,9 +939,9 @@ catch :ctrl_c do
                                                if wordlist
                                                        # Remove any symbols
                                                        if words_with_numbers then
-                                                               words.gsub!(/[^a-z0-9]/i, " ")
+                                                               words.gsub!(/[^[[:alnum:]]]/i, " ")
                                                        else
-                                                               words.gsub!(/[^a-z]/i, " ")
+                                                               words.gsub!(/[^[[:alpha:]]]/i, " ")
                                                        end
 
                                                        # Add to the array

Which is needed for languages with non-ASCII chars :)
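To illustrate the difference the patch makes, here is a small standalone sketch; the sample string is made up and the snippet is not taken from CeWL. In Ruby, a POSIX bracket expression such as [[:alpha:]] matches non-ASCII letters in UTF-8 strings, whereas a-z does not, even with the /i flag.

```ruby
# Sample text is illustrative only.
text = "Erdäpfel paieška café"

# Original behaviour: anything outside ASCII a-z becomes a space,
# so accented words are split at the accented character.
ascii_words = text.gsub(/[^a-z]/i, " ").split

# Patched behaviour: the POSIX bracket expression keeps Unicode letters.
unicode_words = text.gsub(/[^[:alpha:]]/, " ").split

p ascii_words
p unicode_words
```

The first list contains fragments such as "Erd" and "pfel", while the second keeps each word whole.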

Issue with CeWL "grinding"?

Hi there,

I'm running the latest CeWL on a fresh Kali VM, and asked it to do a pretty light enumeration of a site (depth of 1 and minimum character count of 7). CeWL will just grind away for hours and never finish (and never write anything to the output .txt file). It also seems to drag down performance of the VM and make it unresponsive (even though top doesn't seem to indicate the ruby is crushing the VM). CeWL is the only thing running on the VM.

Curious if you've ever seen this before or have any troubleshooting ideas to try?

Thanks,
Brian

Fixnum is deprecated

Hi,
I have a problem when I use CeWL. When I run it, it shows me this: "/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated"
Please can you tell me what I should do?

Pull content from pages other than 200s

As requested (and then request removed) by @squid22.

See the new release, version 5.4.8.

-o fails nearly 100% of the time

Every time -o is used, eventually it hits a server which doesn't exist and cewl dies, producing no wordlist after hours or days of running. Here is the error from verbose mode:

Unable to connect to the site, run in verbose mode for more information

The following error may help:
getaddrinfo: Temporary failure in name resolution
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `initialize'
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `open'
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `block in connect'
/usr/lib64/ruby/2.1.0/timeout.rb:75:in `timeout'
/usr/lib64/ruby/2.1.0/net/http.rb:878:in `connect'
/usr/lib64/ruby/2.1.0/net/http.rb:863:in `do_start'
/usr/lib64/ruby/2.1.0/net/http.rb:852:in `start'
/usr/lib64/ruby/2.1.0/net/http.rb:1375:in `request'
/usr/bin/cewl:243:in `get_page'
/usr/bin/cewl:179:in `block (2 levels) in start!'
/usr/bin/cewl:177:in `each'
/usr/bin/cewl:177:in `block in start!'
/usr/bin/cewl:168:in `each'
/usr/bin/cewl:168:in `start!'
/usr/bin/cewl:139:in `start_at'
/usr/bin/cewl:701:in `<main>'
Caller
/usr/bin/cewl:199:in `get_page'
/usr/bin/cewl:179:in `block (2 levels) in start!'
/usr/bin/cewl:177:in `each'
/usr/bin/cewl:177:in `block in start!'
/usr/bin/cewl:168:in `each'
/usr/bin/cewl:168:in `start!'
/usr/bin/cewl:139:in `start_at'
/usr/bin/cewl:701:in `<main>'

Not all websites return a set?

I was a bit confused by this. It seems that some websites are immune to using cewl.

Running it against Google works: cewl www.google.com -m 6 -w outfile.txt, but when I tried some other sites it gave me no results. Example: cewl www.spiderlabs.com -m 6 -w outfile.txt. I also noticed the same effect for sega.com and bentley.com.

Did they develop rules to counteract the spidering, or am I doing something wrong?

Invalid word - Get tag option

While using the tool on a website, I looked at the wordlist produced by CeWL and saw a lot of invalid words, for example: function, class, x3d0, isStatic, getElementsByTagName, etc.

The website is a French website, so there shouldn't be any English words; these words come from the HTML code. I don't code in Ruby, so I can't really help much.

Feature request: append file

I didn't see it in the --help but I'd like to append to a file.
(i.e. spider 4 sites into a single text file)

Facebook Profiles

Does CeWL work to scrape facebook profiles? What switches need to be set for that?

[Feature Request] add include/exclude spaces to -g

Could you add an option to include/exclude spaces to the -g option (or even both)?

Maybe it could work like:

-s , --spaces : must be used with -g. Option: 1- include spaces, 2- exclude spaces, 3- include both. Default 1

so the results would look like:
1-
so the
the results
results would
would look
look like

2-
sothe
theresults
resultswould
wouldlook
looklike

3-
so the
the results
results would
would look
look like
sothe
theresults
resultswould
wouldlook
looklike
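The three proposed modes could be sketched roughly as follows. The word_groups helper and its mode parameter are hypothetical names used only for illustration; this is not how CeWL implements -g.

```ruby
# Hypothetical helper, not part of CeWL: builds groups of `size` consecutive
# words. `mode` follows the numbering proposed above:
#   1 = joined with spaces, 2 = joined without spaces, 3 = both.
def word_groups(words, size, mode = 1)
  groups = words.each_cons(size).to_a
  spaced = groups.map { |group| group.join(" ") }
  unspaced = groups.map(&:join)
  case mode
  when 1 then spaced
  when 2 then unspaced
  when 3 then spaced + unspaced
  else raise ArgumentError, "mode must be 1, 2 or 3"
  end
end

words = %w[so the results would look like]
puts word_groups(words, 2, 3)
```

With mode 3 this prints the five spaced pairs followed by the five unspaced pairs, matching the third listing above.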

Program doesn't collect anything

root@kali:# cewl -w customwordlist.txt -v -d 1 -m 5 www.amigos.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.amigos.com
Offsite link, not following: https://www.amigos.com/
Writing words to file
root@kali:# cewl -w customwordlist.txt -v -d 1 -m 5 www.gothub.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.gothub.com
Offsite link, not following: http://eepurl.com/ddJb_z
Writing words to file
root@kali:# cewl -w customwordlist.txt -v -d 1 -m 5 www.github.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.github.com
Offsite link, not following: https://www.github.com/
Writing words to file
root@kali:# cewl -w customwordlist.txt -d 5 -m 7 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:# cewl -w customwordlist.txt -v -d 1 -m 5 www.github.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.github.com
Offsite link, not following: https://www.github.com/
Writing words to file
root@kali:# cewl -w customwordlist.txt -d 5 -m 7 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:# ^C
root@kali:# cewl -w customwordlist.txt -d 2 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:# cewl -w customwordlist.txt -d 2 www.hak5.com -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.hak5.com
Offsite link, not following: http://ww7.hak5.com
Writing words to file
root@kali:# cewl -w customwordlist.txt -d 2 www.cnn.com -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.cnn.com
Offsite link, not following: https://www.cnn.com/
Writing words to file
root@kali:# cewl -d 2 http://www.cnn.com/
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:# cewl -d 2 http://www.cnn.com/ -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.cnn.com/
Offsite link, not following: https://www.cnn.com/
Words found

Gemfile and README.md conflict

README.md says "mime-types" and "rubyzip" are required, but the Gemfile actually calls for "zip" and "mime" which are different gems.

Afaik, the ones specified in README.md are the "preferred" gems to use. Can you please correct Gemfile, Gemfile.lock, and confirm the code actually uses the right gems? At least the zip gem seems to be called directly, I can't tell for mime because I know less about it.

Feature request: Follow Subdomains

I noticed that CeWL doesn't follow subdomains.

cewl http://www.domain.com

does not traverse into http://sub.domain.com

cewl http://domain.com

does not work. Neither does

cewl http://*.domain.com

Would be nice to have that as additional feature.

Thanks!
Christian
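One way the requested behaviour could be expressed is a host check that accepts the base domain and anything beneath it. This is only a sketch: same_site? is a hypothetical helper and does not reflect how CeWL decides whether a link is offsite.

```ruby
require "uri"

# Hypothetical check, not CeWL's actual logic: treat a link as on-site when
# its host equals the base domain or is a subdomain of it.
def same_site?(link, base_domain)
  host = URI.parse(link).host
  return false if host.nil?

  host = host.downcase
  base = base_domain.downcase
  host == base || host.end_with?(".#{base}")
end

p same_site?("http://sub.domain.com/page", "domain.com")
p same_site?("http://notdomain.com/", "domain.com")
```

Requiring the leading dot in the suffix comparison avoids treating notdomain.com as part of domain.com.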

Offsite link

Hi, I want to use CeWL but it doesn't scan anything.
CeWL 5.4.8 (Inclusion) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at domain
Offsite link, not following: domain
Words found
I have tried many pages but get nothing.

I'm receiving a "NoMemoryError" when attempting to get words from a site.

OS:
Kali 2.0 Rolling Release

Command executed:
cewl -v http://website.com

Error:
/usr/bin/cewl:813:in `block (2 levels) in <main>': failed to allocate memory (NoMemoryError)
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:280:in `block in do_callbacks'
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:279:in `each'
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:279:in `do_callbacks'
from /usr/bin/cewl:179:in `block (3 levels) in start!'
from /usr/bin/cewl:256:in `get_page'
from /usr/bin/cewl:178:in `block (2 levels) in start!'
from /usr/bin/cewl:176:in `each'
from /usr/bin/cewl:176:in `block in start!'
from /usr/bin/cewl:167:in `each'
from /usr/bin/cewl:167:in `start!'
from /usr/bin/cewl:138:in `start_at'
from /usr/bin/cewl:694:in `<main>'

grabbing possible passwords with numbers

Hi,
I tried to grab some words from a site, but the words that contain numbers (p4ssw0rds, g0tmi1k, ...) don't end up in my wordlist.
I am using CeWL 5.3.
command i am using: cewl http://192.168.1.10/personal -d 2 -m 6 -w /Desktop/wordlist.txt -v

is it possible to get these words also in my grabbed word list?

thx
rope

[Feature Request] Add Domain/Subdomain/Path to wordlist

Using a new command line flag, could we include the URL structure into the wordlist.
Make sure to include:

Domain (static)
Paths (dynamic)
Subdomains

nothing saved if Ctrl+C is pressed while running

It would be really nice if we could interrupt a session and still get the partial results saved.

thank you

add the possibility to include numbers and kind of a sentence mode

I had the following ideas in mind and integrated them into my fork of your project.

the ability to integrate numbers so that I can create a list of words, numbers (e.g., for dates), or both.
the possibility of a 'sentence_mode'.
Often passwords are formed with the help of phrases. So words of a sentence combined, or lined up the initial letters. I thought it would be great to get a wordlist as a result. For example 'This is a simple test' becomes 'Tiast'.

Maybe you want to look at my fork and integrate this. I could also generate a pull request, but I think it's not well coded because some parts seem redundant and could maybe be replaced by a function. For me coding is just a hobby.
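The initial-letter idea described above can be sketched in a couple of lines. The initials helper is a hypothetical name chosen for illustration; it is not the code from the fork.

```ruby
# Hypothetical helper illustrating the "sentence mode" idea above.
# It takes the first character of each whitespace-separated word.
def initials(sentence)
  sentence.split.map { |word| word[0] }.join
end

puts initials("This is a simple test") # prints "Tiast"
```

A fuller version would probably need to decide how to treat punctuation and sentence boundaries, which this sketch ignores.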

problem with HTML entities

I create themed crossword puzzles. For that I need wordlists. So I use CeWL to gather words from websites. I use the German language with special characters äöüß.

I noticed that when retrieving words from a website with German words (which were written as HTML entities) CeWL split the words at the HTML entities, removing the HTML entities.

Example:
The plural of potatoes in (Austrian) German is

Erdäpfel or 
Erd&auml;pfel with HTML entity notation

CeWL retrieved the word and split it:

Erd
pfel

Please add an option to convert HTML entities, so that words in languages other than English can also be retrieved correctly.
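A minimal sketch of the conversion being requested is shown below. The decode_entities helper and the ENTITIES table are hypothetical and deliberately tiny, covering only a few German entities; a real fix would more likely rely on an HTML parser or a complete entity list.

```ruby
# Deliberately small table for illustration only.
ENTITIES = {
  "auml" => "ä", "ouml" => "ö", "uuml" => "ü",
  "Auml" => "Ä", "Ouml" => "Ö", "Uuml" => "Ü",
  "szlig" => "ß"
}.freeze

# Hypothetical helper: replaces known named entities and decimal numeric
# entities (e.g. &#228;), leaving anything unrecognised untouched.
def decode_entities(text)
  text.gsub(/&(#\d+|\w+);/) do |match|
    name = Regexp.last_match(1)
    if name.start_with?("#")
      [name[1..-1].to_i].pack("U")
    else
      ENTITIES.fetch(name, match)
    end
  end
end

puts decode_entities("Erd&auml;pfel")
```

Decoding before the word-splitting step would keep the word in one piece, provided the splitter also accepts non-ASCII letters.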

Multithreading

I can't tell for sure as I don't know Ruby but it seems to me that this is single-threaded and it would be a lot faster if it was multithreaded.

New tagged release?

Hello,

The latest tagged release is version 5.3 (15 Nov 2016).
On Debian / Kali we track the tagged releases and use them to package new versions.
It would be appreciated if you could make a new tagged release for the latest version, 5.4.3 (the version mentioned in the README).

Thank you

Feature Request: Add --with-numbers default

Feature Request: Make '--with-numbers' the default, and change the parameter to --exclude-numbers so that numbers can be excluded when they are not needed.

Install CeWL on Ubuntu 14.04 Help

Hello, I'm trying to install CeWL 'https://github.com/digininja/CeWL/releases/tag/5.2' on Ubuntu 14.04 but it is not working.
I have followed all the instructions from here: https://digi.ninja/projects/cewl.php
Some dependencies required Ruby version 2.0 and others version 2.5; however, I have installed a few versions and all dependencies have installed.
'bundle install' succeeded.
I also made the directory executable, but when I navigate to the directory and execute:
cewl --help
I get:
No command 'cewl' found, did you mean:
 Command 'mewl' from package 'mew-beta-bin' (universe)
 Command 'mewl' from package 'mew-bin' (universe)
cewl: command not found

Can you please tell me what I am doing wrong?
Thank you very much!

'rexml/document' not found

After installation CeWL generates an error:

Error: rexml/document gem not installed
	 use: "gem install rexml/document" to install the required gem

I fixed it by adding gem 'rexml' to the Gemfile.
My ruby version is: 3.0.1

Fixnum issue

Hi, I am trying to use Cewl on Kali Linux and I run into following error when I try to make a list.

CeWL 5.3 (Heading Upwards) Robin Wood ([email protected]) (https://digi.ninja/)
/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated

Any clues please?

[Feature Request] add range to groups of words

Would it be possible to add a range to groups of words?

example:

cewl.rb -g 2-5
or
cewl.rb -g 2 -g 3 -g 4 -g 5
or
cewl.rb -g 2,3,4,5
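The range and comma forms suggested above could be parsed along these lines. parse_group_sizes is a hypothetical helper, not an existing CeWL function, and the accepted syntax is an assumption based on the examples in this request.

```ruby
# Hypothetical parser, not part of CeWL: accepts "3", "2-5" or "2,3,4,5"
# (and mixes such as "2,4-6") and returns a sorted list of unique sizes.
def parse_group_sizes(spec)
  sizes = spec.split(",").flat_map do |part|
    if part =~ /\A(\d+)-(\d+)\z/
      (Regexp.last_match(1).to_i..Regexp.last_match(2).to_i).to_a
    elsif part =~ /\A\d+\z/
      [part.to_i]
    else
      raise ArgumentError, "invalid group size: #{part.inspect}"
    end
  end
  sizes.uniq.sort
end

p parse_group_sizes("2-5")
p parse_group_sizes("2,3,4,5")
```

Both calls yield the same list of sizes, so the repeated-flag form (-g 2 -g 3 ...) could be supported by merging the results of each occurrence.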

Missing URL argument

~/CeWL $ ruby -W0 ./cewl.rb
CeWL 5.5.2 (Grouping) Robin Wood ([email protected]) (https://digi.ninja/)

Missing URL argument (try --help)

Lithuanian characters disappear in output

When scanning a page the output comes with no lithuanian characters such as ĄČĘĖĮŠŲŪŽ, they just disappear. For example when scanning www.google.lt:
paie
Paie
lapiai
Nar
ymo
rankiai
lygos

Which I THINK should be:
paieŠka
PaieŠka
Šlapiai
Nar #this one seems to be separated into two words with the one below because of an unrecognized character
ymo
Įrankiai
sĄlygos

Tested with newest 5.4.3 version, tried one from Kali repo and from github.

Inventory notification

Your tool/software has been inventoried on Rawsec's CyberSecurity Inventory.

What is Rawsec's CyberSecurity Inventory?

An inventory of tools and resources about CyberSecurity. This inventory aims to help people to find everything related to CyberSecurity.

Open source: All information is available and up to date. If any information is missing or outdated, you are invited to help us.
Practical: Content is categorized and table formatted, allowing to search, browse, sort and filter.
Fast: Using static and client side technologies resulting in fast browsing.
Rich tables: search, sort, browse, filter, clear
Fancy informational popups
Badges / Shields
Static API
Twitter bot

More details about features here.

Note: the inventory is a FLOSS (Free, Libre and Open-Source Software) project.

Why?

Specialized websites: Some websites reference tools, but additional information is not available or browsable. Making additional searches takes time.
Curated lists: Curated lists are not very exhaustive, up to date or browsable, and are very topic-specific.
Search engines: Search engines sometimes find nothing; some tools or resources are too little known or not referenced. This is where crowdsourcing is better than robots.

Why should you care about being inventoried?

Mainly because this gives visibility to your tool. More and more people are using Rawsec's CyberSecurity Inventory, and this helps them find what they need.

Badges

The badge shows your community that you are inventoried. It also shows that you care about your project and want it to grow, and that your tool is not abandonware.

Feel free to claim your badge here: http://inventory.rawsec.ml/features.html#badges. Several styles are available.

So what?

That's all, this message is just to notify you if you care.

Cewl does nothing.

Hi,
I am trying to use the newest version of CeWL, but whenever I type something in I always get this message:

CeWL 5.4.8 (Inclusion) Robin Wood ([email protected]) (https://digi.ninja/)

and nothing more, and then I get my terminal back.

How can I get CeWL to work?

(For example, I tried the following command: cewl -m 6 -w testlist.txt -c bbc.com)

Runtime error

Hi,

I'm getting this while trying to use the tool. Kali2017.2

CLI: ./cewl.rb -d 4 -m 5 -v -w custom_dict.txt

The following error may help:
incorrect header check
/usr/lib/ruby/2.3.0/net/http/response.rb:380:in `inflate'
/usr/lib/ruby/2.3.0/net/http/response.rb:380:in `block in inflate_adapter'
/usr/lib/ruby/2.3.0/net/protocol.rb:411:in `call_block'
/usr/lib/ruby/2.3.0/net/protocol.rb:402:in `<<'
/usr/lib/ruby/2.3.0/net/protocol.rb:104:in `read'
/usr/lib/ruby/2.3.0/net/http/response.rb:402:in `read'
/usr/lib/ruby/2.3.0/net/http/response.rb:291:in `block in read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:262:in `inflater'
/usr/lib/ruby/2.3.0/net/http/response.rb:281:in `read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:202:in `read_body'
/usr/lib/ruby/2.3.0/net/http/response.rb:227:in `body'
/usr/lib/ruby/2.3.0/net/http/response.rb:164:in `reading_body'
/usr/lib/ruby/2.3.0/net/http.rb:1445:in `transport_request'
/usr/lib/ruby/2.3.0/net/http.rb:1407:in `request'
/usr/lib/ruby/2.3.0/net/http.rb:1400:in `block in request'
/usr/lib/ruby/2.3.0/net/http.rb:853:in `start'
/usr/lib/ruby/2.3.0/net/http.rb:1398:in `request'
./cewl.rb:279:in `get_page'
./cewl.rb:209:in `block (2 levels) in start!'
./cewl.rb:207:in `each'
./cewl.rb:207:in `block in start!'
./cewl.rb:195:in `each'
./cewl.rb:195:in `start!'
./cewl.rb:161:in `start_at'
./cewl.rb:757:in `block in <main>'
./cewl.rb:747:in `catch'
./cewl.rb:747:in `<main>'

Caller
./cewl.rb:231:in `get_page'
./cewl.rb:209:in `block (2 levels) in start!'
./cewl.rb:207:in `each'
./cewl.rb:207:in `block in start!'
./cewl.rb:195:in `each'
./cewl.rb:195:in `start!'
./cewl.rb:161:in `start_at'
./cewl.rb:757:in `block in <main>'
./cewl.rb:747:in `catch'
./cewl.rb:747:in `<main>'

Any idea how I should fix it up?

Thank you.

Add option to exclude multiple words/strings

Hey guys, there's a bug when processing offsite on a custom port

..it falls back to port 80.

thx.

"Display the usage" msg clean up

cewl.rb
between line 524 and 563

It looks a lot better and a lot less busy in the "options" area.
You always want to put the short version of an option before the long version. It should look uniform and be easy to scan.
Also fixed a few other things.

def usage
  puts "Usage: cewl [OPTIONS] ... <url>

    OPTIONS:
      -h, --help: Show help.
      -k, --keep: Keep the downloaded file.
      -d <x>, --depth <x>: Depth to spider to, default 2.
      -m, --min_word_length: Minimum word length, default 3.
      -o, --offsite: Let the spider visit other sites.
      -w, --write: Write the output to the file.
      -u, --ua <agent>: User agent to send.
      -n, --no-words: Don't output the wordlist.
      -a, --meta: Include meta data.
      --meta_file <file>: Output file for meta data.
      -e, --email: Include email addresses.
      --email_file <file>: Output file for email addresses.
      --meta-temp-dir <dir>: The temporary directory used by exiftool when parsing files, default /tmp.
      -c, --count: Show the count for each word found.
      -v, --verbose: Verbose.
      --debug: Extra debug information.
      
      AUTHENTICATION:
      --auth_type: Digest or basic.
      --auth_user: Authentication username.
      --auth_pass: Authentication password.
      
      PROXY SUPPORT:
      --proxy_host: Proxy host.
      --proxy_port: Proxy port, default 8080.
      --proxy_username: Username for proxy, if required.
      --proxy_password: Password for proxy, if required.
      
      HEADERS:
      -H, --header: In format name:value - can pass multiple.
      
      <url>: The site to spider.
"
  exit 0
end

Feature Request: Quitting saves partial wordlist (or allows option to restore previous session)

It would be nice if, when CeWL exits due to an interrupt, it would stop spidering and finish writing the word list before exiting. For instance, if I began spidering a very large site with the depth set pretty high, after an hour or two I might decide that it has probably collected enough words and hit Ctrl+C. Unfortunately, when I do that, CeWL doesn't write anything to the output file and I have to start over with a lower depth.

Alternatively, maybe implementing some way to restore a previous run would be useful.

As an alternative approach I tried using wget to mirror a site and then using various html2text utilities, awk, grep, etc to parse out words into a list offline, which works ok but is really dependent on the parsing, which CeWL already seems to do pretty well.
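The first suggestion above, writing out whatever has been collected when the run is interrupted, could look roughly like this. It is a sketch under assumptions: collect_words is a hypothetical stand-in for the crawl loop, and it relies on Ruby raising Interrupt on Ctrl+C rather than on whatever mechanism CeWL uses internally.

```ruby
require "stringio"

# Hypothetical crawl loop, not CeWL's code. `pages` is any enumerable of
# page texts; `out` is an IO-like object that receives the wordlist.
def collect_words(pages, out)
  words = Hash.new(0)
  begin
    pages.each do |text|
      text.scan(/[[:alpha:]]+/) { |word| words[word] += 1 }
    end
  rescue Interrupt
    warn "Interrupted, writing partial wordlist"
  ensure
    # Runs on normal completion and after an interrupt alike.
    words.each_key { |word| out.puts word }
  end
  words.keys
end

# Simulate Ctrl+C arriving after two pages have been processed.
pages = Enumerator.new do |yielder|
  yielder << "alpha beta"
  yielder << "gamma"
  raise Interrupt
end

out = StringIO.new
partial = collect_words(pages, out)
puts out.string
```

Putting the write in an ensure clause means the same code path handles both a completed crawl and an interrupted one.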

using -d 0, I get no entries

When using the preinstalled cewl (version 5.3) on Kali, I can use -d 0 to get results only from the webpage I want. After cloning version 5.4.2 from GitHub, I get no entries with -d 0, only with -d 1, but then I don't have the results of the wanted page, only the "subpages".

[Feature Request] Command line option to set cookie

Please add a command line option to specify cookies. This way a user can crawl websites that require being logged in (e.g. Facebook would be a good place to crawl).
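At the HTTP level, sending cookies amounts to setting a Cookie request header. The sketch below shows this with Ruby's standard Net::HTTP; request_with_cookies is a hypothetical helper and the cookie names and values are placeholders. The usage text quoted in another issue on this page lists a --header option in name:value format, so passing a Cookie header that way may already cover this request, though that is not verified here.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a GET request carrying the
# given cookies. The cookie names and values are placeholders.
def request_with_cookies(url, cookies)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["Cookie"] = cookies.map { |name, value| "#{name}=#{value}" }.join("; ")
  request
end

req = request_with_cookies("http://example.com/private", { "session" => "abc123", "lang" => "en" })
puts req["Cookie"]
```

The request object is only constructed here; nothing is sent over the network.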

Can't install rexml/document

Hello! I can't get cewl to run because I can't install rexml/document. I installed rexml. Any advice??

╭─f00d4w0rm5@Garuda in ~ took 1s
╰─λ cewl --help
CeWL 5.5.1 (Grouping) Robin Wood ([email protected]) (https://digi.ninja/)

Error: rexml/document gem not installed
use: "gem install rexml/document" to install the required gem


╭─f00d4w0rm5@Garuda in ~ took 685ms
╰─λ gem install rexml/document
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
bad response Not Found 404 (https://index.rubygems.org/info/rexml/document)

Removing common words?

Could there be a way to remove a list of common words from the generated list?
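As a post-processing step, this kind of filtering could be sketched as follows. remove_common is a hypothetical helper and both word lists are made-up samples; nothing here is an existing CeWL option.

```ruby
require "set"

# Hypothetical post-processing step: drops any word found in a list of
# common words, compared case-insensitively.
def remove_common(words, common_words)
  common = Set.new(common_words.map(&:downcase))
  words.reject { |word| common.include?(word.downcase) }
end

p remove_common(%w[The CeWL wordlist and spider], %w[the and of])
```

Using a Set keeps the lookup cheap even when the list of common words is long.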

[Feature Request] Add frequency sort option

So there is:

-c, --count: Show the count for each word found.

Could this be expanded at all, so another option, would just sort the list by frequency?
The words/phrases that appear the most, at the top, least/unique values at the end? (and without the count?)

I know this can easily be done afterwards; it would just be 'nice' to have it built in.
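For reference, the requested ordering can be sketched in a few lines. sort_by_frequency is a hypothetical helper and the alphabetical tie-break is an assumption added only to make the output deterministic.

```ruby
# Hypothetical helper: returns unique words ordered by descending frequency,
# with ties broken alphabetically, and without the counts.
def sort_by_frequency(words)
  counts = Hash.new(0)
  words.each { |word| counts[word] += 1 }
  counts.sort_by { |word, count| [-count, word] }.map(&:first)
end

p sort_by_frequency(%w[beta alpha beta gamma alpha beta])
```

Here beta appears three times, alpha twice and gamma once, so they come out in that order.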

How do I create a wordlist in which each word has only 2 digits, the rest lowercase letters, and a length of 8?
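One reading of this question is filtering an existing wordlist down to words of that shape. Under that assumption, a sketch could look like this; two_digit_lowercase? is a hypothetical helper and the candidate words are made up.

```ruby
# Hypothetical filter: keeps words that are exactly 8 characters long,
# contain only lowercase ASCII letters and digits, and have exactly 2 digits.
def two_digit_lowercase?(word)
  word.match?(/\A[a-z0-9]{8}\z/) && word.count("0-9") == 2
end

candidates = %w[passw0rd pa55word Pa55word pass1234 secret12 short1]
p candidates.select { |word| two_digit_lowercase?(word) }
```

Of the samples, only pa55word and secret12 pass: the others have the wrong number of digits, an uppercase letter, or the wrong length.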

Redirect to relative path not working without appending '/'

Snippet from lines 652-655. When I run the script against some sites that send a "Location: /folder/page.html" header, the script tries http:/folder/page.html. After appending '/' to the URL when it is missing, this is resolved.

# Must have protocol
url = "http://#{url}" if url !~ /^http(s)?:\/\//
url = url+"/" if url !~ /\/$/

The culprit, line 262:

base_url = uri.to_s[0, uri.to_s.rindex('/')]

P.S. My apologies if this is already fixed in some version I have missed, or if it breaks some other dependency. I've only tested with a few sites and parameters.
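The effect of the quoted line 262 on a URL without a trailing slash can be reproduced in isolation, next to a standard-library alternative for resolving the redirect. This is a sketch: example.com is a placeholder, and URI.join is offered as one possible approach rather than as the fix adopted in CeWL.

```ruby
require "uri"

base = "http://example.com"    # placeholder start URL without a trailing slash
location = "/folder/page.html" # value of the Location header

# What the quoted expression computes for such a URL: everything before the
# last "/", which here is one of the slashes in "http://".
truncated = base[0, base.rindex("/")]

# Standard-library alternative: resolve the redirect against the base URL.
resolved = URI.join(base, location).to_s

puts truncated
puts resolved
```

The first line printed shows the host being lost entirely, which is consistent with the broken URL described above, while URI.join keeps the scheme and host and swaps in the new path.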