cewl's Introduction

CeWL - Custom Word List generator

Copyright(c) 2024, Robin Wood [email protected]

Based on a discussion on PaulDotCom (episode 129) about creating custom word lists by spidering a target's website and collecting unique words, I decided to write CeWL, the Custom Word List generator. CeWL is a Ruby app which spiders a given URL to a specified depth, optionally following external links, and returns a list of words which can then be used with password crackers such as John the Ripper.

By default, CeWL sticks to just the site you have specified and will go to a depth of 2 links; this behaviour can be changed by passing arguments. Be careful when setting a large depth and allowing it to go offsite, as you could end up drifting on to a lot of other domains. All words of three characters and over are output to stdout. This minimum length can be increased, and the words can be written to a file rather than the screen so the app can be automated.

CeWL also has an associated command line app, FAB (Files Already Bagged), which uses the same meta data extraction techniques to create author/creator lists from already downloaded files.

For anyone running CeWL with Ruby 2.7, you might get some warnings in the style:

.../ruby-2.7.0/gems/mime-types-3.2.2/lib/mime/types/logger.rb:30: warning: `_1' is reserved for numbered parameter; consider another name

This is due to a new feature introduced in 2.7 which conflicts with one line of code in the logger script from the mime-types gem. There is an update for it in the gem's repo so hopefully that will be released soon. Till then, as far as I can tell, the warning does not affect CeWL in any way. If, for aesthetics, you want to hide the warning, you can run the script as follows:

ruby -W0 ./cewl.rb

Homepage: https://digi.ninja/projects/cewl.php

GitHub: https://github.com/digininja/CeWL

Pronunciation

Seeing as I was asked, CeWL is pronounced "cool".

Installation

CeWL needs the following gems to be installed:

  • mime
  • mime-types
  • mini_exiftool
  • nokogiri
  • rubyzip
  • spider

The easiest way to install these gems is with Bundler:

gem install bundler
bundle install

Alternatively, you can install them manually with:

gem install xxx

The mini_exiftool gem also requires the exiftool application to be installed.

Assuming you cloned the GitHub repo, the script should be executable by default, but if not, you can make it executable with:

chmod u+x ./cewl.rb

The project page on my site gives some tips on solving common problems people have encountered while running CeWL - https://digi.ninja/projects/cewl.php

Usage

./cewl.rb

CeWL 5.5.2 (Grouping) Robin Wood ([email protected]) (https://digi.ninja/)
Usage: cewl [OPTIONS] ... <url>

    OPTIONS:
	-h, --help: Show help.
	-k, --keep: Keep the downloaded file.
	-d <x>,--depth <x>: Depth to spider to, default 2.
	-m, --min_word_length: Minimum word length, default 3.
	-o, --offsite: Let the spider visit other sites.
	-w, --write: Write the output to the file.
	-u, --ua <agent>: User agent to send.
	-n, --no-words: Don't output the wordlist.
	-a, --meta: include meta data.
	--meta_file file: Output file for meta data.
	-e, --email: Include email addresses.
	--email_file <file>: Output file for email addresses.
	--meta-temp-dir <dir>: The temporary directory used by exiftool when parsing files, default /tmp.
	-c, --count: Show the count for each word found.
	-v, --verbose: Verbose.
	--debug: Extra debug information.

	Authentication
	--auth_type: Digest or basic.
	--auth_user: Authentication username.
	--auth_pass: Authentication password.

	Proxy Support
	--proxy_host: Proxy host.
	--proxy_port: Proxy port, default 8080.
	--proxy_username: Username for proxy, if required.
	--proxy_password: Password for proxy, if required.

	Headers
	--header, -H: In format name:value - can pass multiple.

    <url>: The site to spider.

Running CeWL in a Docker container

To quickly use CeWL with Docker, you can use the official ghcr.io/digininja/cewl image:

docker run -it --rm -v "${PWD}:/host" ghcr.io/digininja/cewl [OPTIONS] ... <url>

You can also build it locally:

docker build -t cewl .
docker run -it --rm -v "${PWD}:/host" cewl [OPTIONS] ... <url>

I am going to stress here, I am not going to be offering any support for this. The work was done by @loris-intergalactique who has offered to field any questions on it and give support. I don't use or know Docker, so please, don't ask me for help.

Licence

This project is released under the Creative Commons Attribution-Share Alike 2.0 UK: England & Wales licence.

http://creativecommons.org/licenses/by-sa/2.0/uk/

Alternatively, you can use GPL-3+ instead of the original licence.

http://opensource.org/licenses/GPL-3.0

cewl's People

Contributors

0daysecured, 5p1n, alibkaba, cbrunnkvist, dependabot[bot], digininja, firefart, g0tmi1k, jeffmcjunkin, loris-intergalactique, r3motecontrol, trou, vdbaan, zeecka

cewl's Issues

Invalid word - Get tag option

While using the tool on a website, I looked at the wordlist produced by CeWL and could see a lot of invalid words, for example: function, class, x3d0, isStatic, getElementsByTagName, etc.

The website is a French website, so there are not supposed to be any English words; these words come from the HTML code. I don't code in Ruby, so I can't really help much.
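As a rough illustration of the kind of filtering being asked for, the sketch below drops script and style blocks before words are collected. It is not CeWL's code: `visible_text` is a hypothetical helper, and it uses plain regular expressions from the standard library only to stay self-contained. A real HTML parser would be more robust.

```ruby
# Hypothetical helper (not part of CeWL): remove <script> and <style>
# blocks first, then strip the remaining tags so only page text is left.
def visible_text(html)
  without_code = html.gsub(%r{<(script|style)\b[^>]*>.*?</\1>}mi, ' ')
  without_code.gsub(/<[^>]+>/, ' ')
end

html = '<p>Bonjour le monde</p><script>function isStatic() { return 1; }</script>'
words = visible_text(html).scan(/[[:alpha:]]+/)
p words # => ["Bonjour", "le", "monde"]
```

Without the first `gsub`, identifiers such as `function` and `isStatic` would end up in the word list, which matches the symptom described above.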

add the possibility to include numbers and kind of a sentence mode

I had the following ideas in mind and integrated them into my fork of your project.

  1. the ability to integrate numbers so that I can create a list of words, numbers (e.g., for dates), or both.

  2. the possibility of a 'sentence_mode'.
    Often passwords are formed with the help of phrases. So words of a sentence combined, or lined up the initial letters. I thought it would be great to get a wordlist as a result. For example 'This is a simple test' becomes 'Tiast'.
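The 'sentence_mode' idea can be sketched in a few lines. This is only an illustration of the transformation described above, not the code from the fork; `sentence_initials` is a hypothetical name.

```ruby
# Hypothetical helper: build an acronym-style candidate from a sentence
# by joining the first character of every whitespace-separated word.
def sentence_initials(sentence)
  sentence.split.map { |word| word[0] }.join
end

puts sentence_initials('This is a simple test') # prints "Tiast"
```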

Maybe you want to look at my fork and integrate this. I could also generate a pull request, but I think it's not well coded because some parts seem redundant and could maybe replaced by a function. For me coding is just a hobby.

Doesnt find phone numbers

Hi, it seems that cewl will not pull the phone number (that also happens to be the password) off of the site that I am attempting to crawl. I am sure that I have it set at a large enough depth. It does pull a wordlist that it normally would though.

any suggestions?

(update to clarify issue)

Gemfile and README.md conflict

README.md says "mime-types" and "rubyzip" are required, but the Gemfile actually calls for "zip" and "mime" which are different gems.

Afaik, the ones specified in README.md are the "preferred" gems to use. Can you please correct Gemfile, Gemfile.lock, and confirm the code actually uses the right gems? At least the zip gem seems to be called directly, I can't tell for mime because I know less about it.

grabbing possible passwords with numbers

hi,
i tried to grab some words from a site, but the words that contain numbers (p4ssw0rds,g0tmi1k ...)don't end up in my word list .
i am using cewl 5.3
command i am using: cewl http://192.168.1.10/personal -d 2 -m 6 -w /Desktop/wordlist.txt -v

is it possible to get these words also in my grabbed word list?

thx
rope
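A minimal sketch of why words such as p4ssw0rds get lost, assuming the cleanup step replaces every non-letter character with a space before splitting (the patch quoted in the accented-characters issue on this page shows a step of that shape). `extract_words` and its `keep_digits` parameter are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: split text into words, optionally treating
# digits as part of a word instead of as a separator.
def extract_words(text, keep_digits: false)
  pattern = keep_digits ? /[^a-z0-9]/i : /[^a-z]/i
  text.gsub(pattern, ' ').split
end

sample = 'admin p4ssw0rds g0tmi1k'
p extract_words(sample)                    # => ["admin", "p", "ssw", "rds", "g", "tmi", "k"]
p extract_words(sample, keep_digits: true) # => ["admin", "p4ssw0rds", "g0tmi1k"]
```

With a minimum word length of 6, as in the command above, none of the fragments from the first call would survive.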

Fixnum issue

Hi, I am trying to use Cewl on Kali Linux and I run into following error when I try to make a list.

CeWL 5.3 (Heading Upwards) Robin Wood ([email protected]) (https://digi.ninja/)
/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated

Any clues please?

Unable to connect to the site

Hello Robin Wood :)
Unfortunately I can't CeWL my site: :(
Do you have any ideas?


cewl http://a-wareness.fr --debug -v
CeWL 5.4.8 (Inclusion) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://a-wareness.fr
Pushing {nil=>"http://a-wareness.fr"}
Checking page http://a-wareness.fr
Comparing http://a-wareness.fr with http://a-wareness.fr

Unable to connect to the site (http://a-wareness.fr:80/)

The following error may help:
execution expired
/usr/lib/ruby/2.7.0/net/http.rb:960:in `initialize'
/usr/lib/ruby/2.7.0/net/http.rb:960:in `open'
/usr/lib/ruby/2.7.0/net/http.rb:960:in `block in connect'
/usr/lib/ruby/2.7.0/timeout.rb:105:in `timeout'
/usr/lib/ruby/2.7.0/net/http.rb:958:in `connect'
/usr/lib/ruby/2.7.0/net/http.rb:943:in `do_start'
/usr/lib/ruby/2.7.0/net/http.rb:932:in `start'
/usr/lib/ruby/2.7.0/net/http.rb:1483:in `request'
/usr/bin/cewl:246:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:769:in `block in <main>'
/usr/bin/cewl:759:in `catch'
/usr/bin/cewl:759:in `<main>'

Caller
/usr/bin/cewl:198:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:769:in `block in <main>'
/usr/bin/cewl:759:in `catch'
/usr/bin/cewl:759:in `<main>'

End of main loop
Words found
End of wordlist loop
End of email loop
End of meta loop


Feature request: Follow Subdomains

I noticed that CeWL doesn't follow subdomains.

cewl http://www.domain.com

does not traverse into http://sub.domain.com

cewl http://domain.com

does not work. Neither does

cewl http://*.domain.com

Would be nice to have that as additional feature.

Thanks!
Christian
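One way the requested behaviour could be decided is a host suffix check. The sketch below is an assumption about how such a feature might look, not something CeWL provides; `same_site_or_subdomain?` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical check: is the candidate URL on base_domain itself
# or on any subdomain of it?
def same_site_or_subdomain?(candidate, base_domain)
  host = URI.parse(candidate).host.to_s.downcase
  base = base_domain.downcase
  host == base || host.end_with?(".#{base}")
end

puts same_site_or_subdomain?('http://sub.domain.com/page', 'domain.com') # prints "true"
puts same_site_or_subdomain?('http://notdomain.com/', 'domain.com')      # prints "false"
```

The leading dot in the suffix test is what keeps notdomain.com from being treated as part of domain.com.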

Program doesn't collect anything

root@kali:~# cewl -w customwordlist.txt -v -d 1 -m 5 www.amigos.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.amigos.com
Offsite link, not following: https://www.amigos.com/
Writing words to file
root@kali:~# cewl -w customwordlist.txt -v -d 1 -m 5 www.gothub.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.gothub.com
Offsite link, not following: http://eepurl.com/ddJb_z
Writing words to file
root@kali:~# cewl -w customwordlist.txt -v -d 1 -m 5 www.github.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.github.com
Offsite link, not following: https://www.github.com/
Writing words to file
root@kali:~# cewl -w customwordlist.txt -d 5 -m 7 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:~# cewl -w customwordlist.txt -v -d 1 -m 5 www.github.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.github.com
Offsite link, not following: https://www.github.com/
Writing words to file
root@kali:~# cewl -w customwordlist.txt -d 5 -m 7 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:~# ^C
root@kali:~# cewl -w customwordlist.txt -d 2 www.hak5.com
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:~# cewl -w customwordlist.txt -d 2 www.hak5.com -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.hak5.com
Offsite link, not following: http://ww7.hak5.com
Writing words to file
root@kali:~# cewl -w customwordlist.txt -d 2 www.cnn.com -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.cnn.com
Offsite link, not following: https://www.cnn.com/
Writing words to file
root@kali:~# cewl -d 2 http://www.cnn.com/
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
root@kali:~# cewl -d 2 http://www.cnn.com/ -v
CeWL 5.4.4.1 (Arkanoid) Robin Wood ([email protected]) (https://digi.ninja/)
Starting at http://www.cnn.com/
Offsite link, not following: https://www.cnn.com/
Words found

Multithreading

I can't tell for sure as I don't know Ruby but it seems to me that this is single-threaded and it would be a lot faster if it was multithreaded.

[Feature Request] Add frequency sort option

So there is:

-c, --count: Show the count for each word found.

Could this be expanded at all, so another option, would just sort the list by frequency?
The words/phrases that appear the most, at the top, least/unique values at the end? (and without the count?)

I know this can be done easy afterwards, just would be 'nice' to have it in-built
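For reference, the requested ordering is short to express in Ruby. This is a sketch of the idea rather than an existing CeWL option; `sort_by_frequency` is a hypothetical name, and `Enumerable#tally` needs Ruby 2.7 or later.

```ruby
# Hypothetical helper: order unique words by how often they occur,
# most frequent first, breaking ties alphabetically, and drop the counts.
def sort_by_frequency(words)
  words.tally.sort_by { |word, count| [-count, word] }.map(&:first)
end

p sort_by_frequency(%w[password admin password login admin password])
# => ["password", "admin", "login"]
```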

Redirect to relative path not working without appending '/'

Snippet from lines 652-655. When I run the script against some sites that send a "Location: /folder/page.html" header, the script tries http:/folder/page.html. After adding '/' to the URL, if it is missing, this is resolved.

# Must have protocol
url = "http://#{url}" if url !~ /^http(s)?:\/\//
url = url+"/" if url !~ /\/$/

The culprit, line 262:

base_url = uri.to_s[0, uri.to_s.rindex('/')] 

p.s. my apologies if this is fixed in some version i have missed or break some other dependency. I've only tested with a few sites and parameters.
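The truncation can be reproduced in isolation. The sketch below first repeats the `rindex('/')` step from the line quoted above on a URL with no trailing slash, then shows `URI.join` from the standard library as one alternative way to resolve a relative Location header. The alternative is a suggestion for illustration, not the fix applied in the report, and `resolve_redirect` is a hypothetical helper.

```ruby
require 'uri'

# The step from the report: with no trailing slash, the last '/' in the
# string is the second slash of "http://", so the host is cut off.
url = 'http://example.com'
puts url[0, url.rindex('/')] # prints "http:/"

# Hypothetical helper: let URI.join resolve the Location value instead.
def resolve_redirect(page_url, location)
  URI.join(page_url, location).to_s
end

puts resolve_redirect('http://example.com', '/folder/page.html')
# prints "http://example.com/folder/page.html"
```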

Can't install rexml/document

Hello! I can't get cewl to run because I can't install rexml/document. I installed rexml. Any advice??

╭─f00d4w0rm5@Garuda in ~ took 1s
╰─λ cewl --help
CeWL 5.5.1 (Grouping) Robin Wood ([email protected]) (https://digi.ninja/)

Error: rexml/document gem not installed
use: "gem install rexml/document" to install the required gem


╭─f00d4w0rm5@Garuda in ~ took 685ms
╰─λ gem install rexml/document
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
bad response Not Found 404 (https://index.rubygems.org/info/rexml/document)

I'm receiving a "NoMemoryError" when attempting to get words from a site.

OS:
Kali 2.0 Rolling Release

Command executed:
cewl -v http://website.com

Error:
/usr/bin/cewl:813:in `block (2 levels) in <main>': failed to allocate memory (NoMemoryError)
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:280:in `block in do_callbacks'
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:279:in `each'
from /usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:279:in `do_callbacks'
from /usr/bin/cewl:179:in `block (3 levels) in start!'
from /usr/bin/cewl:256:in `get_page'
from /usr/bin/cewl:178:in `block (2 levels) in start!'
from /usr/bin/cewl:176:in `each'
from /usr/bin/cewl:176:in `block in start!'
from /usr/bin/cewl:167:in `each'
from /usr/bin/cewl:167:in `start!'
from /usr/bin/cewl:138:in `start_at'
from /usr/bin/cewl:694:in `<main>'

please switch from zip to rubyzip

the ruby gem "zip" was last updated in 2010 while "rubyzip" was updated in 2020.

My real motive, zip isn't packaged in gentoo and I don't want to add an unmaintained gem.

[Feature Request] add include/exclude spaces to -g

Could you add an option to include/exclude spaces to the -g option (or even both)?

Maybe it could work like:

-s , --spaces : must be used with -g. Option: 1- include spaces, 2- exclude spaces, 3- include both. Default 1

so the results would look like:
1-
so the
the results
results would
would look
look like

2-
sothe
theresults
resultswould
wouldlook
looklike

3-
so the
the results
results would
would look
look like
sothe
theresults
resultswould
wouldlook
looklike
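The three proposed modes can be sketched as follows. This illustrates the request only; `word_groups` and its `spaces:` parameter are hypothetical and do not correspond to an existing CeWL flag.

```ruby
# Hypothetical helper: build groups of `size` consecutive words and
# return them joined with spaces, without spaces, or both.
def word_groups(text, size: 2, spaces: :include)
  groups = text.split.each_cons(size).to_a
  spaced = groups.map { |group| group.join(' ') }
  joined = groups.map(&:join)
  case spaces
  when :include then spaced
  when :exclude then joined
  when :both    then spaced + joined
  else raise ArgumentError, "unknown spaces mode: #{spaces}"
  end
end

p word_groups('so the results', spaces: :both)
# => ["so the", "the results", "sothe", "theresults"]
```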

Fixnum is deprecated

Hi,
I have a problem when I use CeWL. When I run it, it shows me this: "/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated"
Please can you tell me what I should do?

Progress Indication

Pointing this tool at bigger sites/wikis can take quite a long time, and it can be difficult to be sure it's actually doing something until it finishes or I force-kill it.

It would be very useful to have some kind of progress indication, whether that would be something simple like displaying the current page being processed, a count of words found, etc. This would make it much easier to be sure it's working and to gauge how far along it is.

Perhaps this issue is moreso about it being slow with the way I'm running it, but pointing it at something like a wikipedia article with default depth took over a day and I ended up killing it. (I later realized a depth of 2 was likely too high, but the point remains)

-o fails nearly 100% of the time

Every time -o is used, eventually it hits a server which doesn't exist and cewl dies, producing no wordlist after hours or days of running. Here is the error from verbose mode:

Unable to connect to the site, run in verbose mode for more information

The following error may help:
getaddrinfo: Temporary failure in name resolution
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `initialize'
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `open'
/usr/lib64/ruby/2.1.0/net/http.rb:879:in `block in connect'
/usr/lib64/ruby/2.1.0/timeout.rb:75:in `timeout'
/usr/lib64/ruby/2.1.0/net/http.rb:878:in `connect'
/usr/lib64/ruby/2.1.0/net/http.rb:863:in `do_start'
/usr/lib64/ruby/2.1.0/net/http.rb:852:in `start'
/usr/lib64/ruby/2.1.0/net/http.rb:1375:in `request'
/usr/bin/cewl:243:in `get_page'
/usr/bin/cewl:179:in `block (2 levels) in start!'
/usr/bin/cewl:177:in `each'
/usr/bin/cewl:177:in `block in start!'
/usr/bin/cewl:168:in `each'
/usr/bin/cewl:168:in `start!'
/usr/bin/cewl:139:in `start_at'
/usr/bin/cewl:701:in `<main>'

Caller
/usr/bin/cewl:199:in `get_page'
/usr/bin/cewl:179:in `block (2 levels) in start!'
/usr/bin/cewl:177:in `each'
/usr/bin/cewl:177:in `block in start!'
/usr/bin/cewl:168:in `each'
/usr/bin/cewl:168:in `start!'
/usr/bin/cewl:139:in `start_at'
/usr/bin/cewl:701:in `<main>'
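The behaviour the report is hoping for, where one unreachable host does not end the whole run, could look roughly like the sketch below. It is not how CeWL's spider is written: `crawl_tolerantly` is a hypothetical helper, and `fetch` stands in for whatever callable actually downloads a page.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper: fetch every URL, skipping hosts that cannot be
# resolved or reached instead of aborting. Returns [pages, failed].
def crawl_tolerantly(urls, fetch)
  pages = {}
  failed = []
  urls.each do |url|
    begin
      pages[url] = fetch.call(url)
    rescue SocketError, SystemCallError, Timeout::Error => e
      failed << url
      warn "Skipping #{url}: #{e.message}"
    end
  end
  [pages, failed]
end
```

A failed name lookup surfaces in Ruby as a `SocketError` (the "getaddrinfo" message in the trace above), which is why it is included in the rescue list.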

using -d 0, I get no entries

When using the preinstalled cewl (version 5.3) on Kali, I can use -d 0 to get only results from the webpage I want. Cloning and using version 5.4.2 from GitHub I didn't get entries with -d 0, only with -d 1 but then I haven't the results of the wanted page only the "subpages".

Facebook Profiles

Does CeWL work to scrape facebook profiles? What switches need to be set for that?

New tagged release?

Hello,

The latest tagged release is version 5.3 (15 Nov 2016).
On Debian / Kali we track the tagged releases and we use the release to package new version.
It would be appreciated that you make a new tagged release for the latest version 5.4.3 (version mentioned in the README).

Thank you

Words with accented characters are ignored

Hello,
thanks for the tool !
I've made a small patch to correctly match accented chars:

diff --git a/cewl.rb b/cewl.rb
index 967b5ed..22ef574 100755
--- a/cewl.rb
+++ b/cewl.rb
@@ -939,9 +939,9 @@ catch :ctrl_c do
                                                if wordlist
                                                        # Remove any symbols
                                                        if words_with_numbers then
-                                                               words.gsub!(/[^a-z0-9]/i, " ")
+                                                               words.gsub!(/[^[[:alnum:]]]/i, " ")
                                                        else
-                                                               words.gsub!(/[^a-z]/i, " ")
+                                                               words.gsub!(/[^[[:alpha:]]]/i, " ")
                                                        end
 
                                                        # Add to the array

Which is needed for languages with non-ASCII chars :)
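The effect of the patch can be seen on a small sample. The sketch below compares an ASCII-only class with the POSIX bracket class `[[:alpha:]]`, which in Ruby matches letters outside ASCII on UTF-8 strings. `split_words` is a hypothetical helper written for this comparison, not a function in cewl.rb.

```ruby
# Hypothetical helper: replace every non-letter with a space and split,
# using either an ASCII-only or a Unicode-aware definition of "letter".
def split_words(text, unicode: false)
  pattern = unicode ? /[^[:alpha:]]/ : /[^a-z]/i
  text.gsub(pattern, ' ').split
end

p split_words('paieška Erdäpfel')                # => ["paie", "ka", "Erd", "pfel"]
p split_words('paieška Erdäpfel', unicode: true) # => ["paieška", "Erdäpfel"]
```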

Issue with CeWL "grinding"?

Hi there,

I'm running the latest CeWL on a fresh Kali VM, and asked it to do a pretty light enumeration of a site (depth of 1 and minimum character count of 7). CeWL will just grind away for hours and never finish (and never write anything to the output .txt file). It also seems to drag down performance of the VM and make it unresponsive (even though top doesn't seem to indicate the ruby is crushing the VM). CeWL is the only thing running on the VM.

Curious if you've ever seen this before or have any troubleshooting ideas to try?

Thanks,
Brian

undefined method `chr' for nil:NilClass

Visiting: https://tvtropes.org:443/pmwiki/dmca.php referred from https://tvtropes.org/pmwiki/pmwiki.php/ComicStrip/ComicStrip, got response code 200
Attribute text found:
   TV Tropes   Quantcast Display Crucial Browsing Main/ActionAdventureTropes Main/ComedyTropes Main/CommercialsTropes Main/CrimeAndPunishmentTropes Main/DramaTropes Main/HorrorTropes Main/LoveTropes Main/NewsTropes Main/ProfessionalWrestling Main/SpeculativeFictionTropes Main/SportsStoryTropes Main/WarTropes Main/Media Main/AnimationTropes Main/Anime Main/ComicBookTropes FanFic/FanFics Main/Film Main/GameTropes Main/Literature Main/MusicAndSoundEffects Main/NewMediaTropes Main/PrintMediaTropes Main/Radio Main/SequentialArt Main/TabletopGames Main/Television Main/Theater Main/VideogameTropes Main/Webcomics Main/UniversalTropes Main/AppliedPhlebotinum Main/CharacterizationTropes Main/Characters Main/CharactersAsDevice Main/Dialogue Main/Motifs Main/NarrativeDevices Main/Paratext Main/Plots Main/Settings Main/Spectacle Main/BritishTellyTropes Main/TheContributors Main/CreatorSpeak Main/Creators Main/DerivativeWorks Main/LanguageTropes Main/LawsAndFormulas Main/ShowBusiness Main/SplitPersonalityTropes Main/StockRoom Main/TropeTropes Main/Tropes Main/TruthAndLies Main/TruthInTelevision Main/BetrayalTropes Main/CensorshipTropes Main/CombatTropes Main/DeathTropes Main/FamilyTropes Main/FateAndProphecyTropes Main/FoodTropes Main/HolidayTropes Main/MemoryTropes Main/MoneyTropes Main/MoralityTropes Main/PoliticsTropes Main/ReligionTropes Main/SchoolTropes Community Showcase TV Tropes

Unable to process URL
Message is undefined method `chr' for nil:NilClass
/usr/bin/cewl:296:in `construct_complete_url'
/usr/bin/cewl:251:in `get_page'
/usr/bin/cewl:176:in `block (2 levels) in start!'
/usr/bin/cewl:174:in `each'
/usr/bin/cewl:174:in `block in start!'
/usr/bin/cewl:162:in `each'
/usr/bin/cewl:162:in `start!'
/usr/bin/cewl:114:in `start_at'
/usr/bin/cewl:731:in `block in <main>'
/usr/bin/cewl:721:in `catch'
/usr/bin/cewl:721:in `<main>'
Visiting: https://tvtropes.org:443/pmwiki/privacypolicy.php referred from https://tvtropes.org/pmwiki/pmwiki.php/ComicStrip/ComicStrip, got response code 200
Attribute text found:
   TV Tropes   Quantcast Display Crucial Browsing Main/ActionAdventureTropes Main/ComedyTropes Main/CommercialsTropes Main/CrimeAndPunishmentTropes Main/DramaTropes Main/HorrorTropes Main/LoveTropes Main/NewsTropes Main/ProfessionalWrestling Main/SpeculativeFictionTropes Main/SportsStoryTropes Main/WarTropes Main/Media Main/AnimationTropes Main/Anime Main/ComicBookTropes FanFic/FanFics Main/Film Main/GameTropes Main/Literature Main/MusicAndSoundEffects Main/NewMediaTropes Main/PrintMediaTropes Main/Radio Main/SequentialArt Main/TabletopGames Main/Television Main/Theater Main/VideogameTropes Main/Webcomics Main/UniversalTropes Main/AppliedPhlebotinum Main/CharacterizationTropes Main/Characters Main/CharactersAsDevice Main/Dialogue Main/Motifs Main/NarrativeDevices Main/Paratext Main/Plots Main/Settings Main/Spectacle Main/BritishTellyTropes Main/TheContributors Main/CreatorSpeak Main/Creators Main/DerivativeWorks Main/LanguageTropes Main/LawsAndFormulas Main/ShowBusiness Main/SplitPersonalityTropes Main/StockRoom Main/TropeTropes Main/Tropes Main/TruthAndLies Main/TruthInTelevision Main/BetrayalTropes Main/CensorshipTropes Main/CombatTropes Main/DeathTropes Main/FamilyTropes Main/FateAndProphecyTropes Main/FoodTropes Main/HolidayTropes Main/MemoryTropes Main/MoneyTropes Main/MoralityTropes Main/PoliticsTropes Main/ReligionTropes Main/SchoolTropes Community Showcase TV Tropes

Writing words to file

'rexml/document' not found

After installation CeWL generates an error:

Error: rexml/document gem not installed
	 use: "gem install rexml/document" to install the required gem

I fixed it by adding gem 'rexml' to the Gemfile.
My ruby version is: 3.0.1
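The confusion here is between the library path and the gem name: the file that gets required is rexml/document, but the gem that provides it is called rexml, which from Ruby 3.0 is a bundled gem that may need to be installed or listed in the Gemfile. A small sketch of a check that reports the right name follows; `rexml_available?` is a hypothetical helper, not part of CeWL.

```ruby
# Hypothetical helper: report whether the rexml gem can be loaded.
def rexml_available?
  require 'rexml/document'
  true
rescue LoadError
  false
end

unless rexml_available?
  # The install name is the gem name, not the require path.
  warn "rexml is missing: run `gem install rexml` or add gem 'rexml' to the Gemfile"
end
```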

Install cewl at ubuntu 14.04 Help

Hello, I am trying to install cewl 'https://github.com/digininja/CeWL/releases/tag/5.2' on Ubuntu 14.04 but it is not working.
I have followed all the instructions from here: https://digi.ninja/projects/cewl.php
Some dependencies required Ruby version 2.0 and others version 2.5; however, I installed a few versions and all the dependencies were installed.
'bundle install' succeeded.
I also made the directory executable, but when I navigate to the directory and execute:
cewl --help
I get:
No command 'cewl' found, did you mean: Command 'mewl' from package 'mew-beta-bin' (universe) Command 'mewl' from package 'mew-bin' (universe) cewl: command not found

Can you please tell me what I am doing wrong?
Thank you very much!

Cewl does nothing.

hi,
I am trying to use the newest version of cewl, but when I type something in I always get this message:

CeWL 5.4.8 (Inclusion) Robin Wood ([email protected]) (https://digi.ninja/)

and nothing more. And i have my terminal back.

How can i get cewl to work?

(I tried, for example, the following command: cewl -m 6 -w testlist.txt -c bbc.com)

Lithuanian characters disappear in output

When scanning a page the output comes with no lithuanian characters such as ĄČĘĖĮŠŲŪŽ, they just disappear. For example when scanning www.google.lt:
paie
Paie
lapiai
Nar
ymo
rankiai
lygos

Which I THINK should be:
paieŠka
PaieŠka
Šlapiai
Nar #this one seems to be separated into two words, together with the one below, because of the unrecognized character
ymo
Įrankiai
sĄlygos

Tested with newest 5.4.3 version, tried one from Kali repo and from github.

Not all websites return a set?

I was a bit confused by this. It seems that some websites are immune to using cewl.

Running it against Google works: cewl www.google.com -m 6 -w outfile.txt, but when I tried some other sites, it gave me no results. Example: cewl www.spiderlabs.com -m 6 -w outfile.txt. I also noticed it has the same effect for sega.com and bentley.com.

Did they develop rules to counteract the spidering, or am I doing something wrong?

problem with HTML entities

I create themed crossword puzzles. For that I need wordlists. So I use CeWL to gather words from websites. I use the German language with special characters äöüß.

I noticed that when retrieving words from a website with German words (which were written as HTML entities) CeWL split the words at the HTML entities, removing the HTML entities.

Example:
The plural of potatoes in (Austrian) German is

Erdäpfel or 
Erd&auml;pfel with HTML entity notation

CeWL retrieved the word and split it:

Erd
pfel

Please add an option to convert HTML entities, so that words in languages other than English can also be retrieved correctly.
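A minimal sketch of the requested conversion, assuming only a handful of German named entities need handling. The `NAMED_ENTITIES` table and `decode_entities` are hypothetical and deliberately tiny. A complete solution would need the full entity list, which HTML parsing libraries generally carry.

```ruby
# Hypothetical, deliberately small entity table for illustration only.
NAMED_ENTITIES = {
  'auml' => 'ä', 'ouml' => 'ö', 'uuml' => 'ü',
  'Auml' => 'Ä', 'Ouml' => 'Ö', 'Uuml' => 'Ü',
  'szlig' => 'ß'
}.freeze

# Replace known named entities with their characters; leave unknown
# entities untouched so nothing is silently dropped.
def decode_entities(text)
  text.gsub(/&(\w+);/) do
    NAMED_ENTITIES.fetch(Regexp.last_match(1), Regexp.last_match(0))
  end
end

puts decode_entities('Erd&auml;pfel') # prints "Erdäpfel"
```

Decoding before the word-splitting step is what keeps Erdäpfel from being broken into Erd and pfel.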

Broken dependency: spider

It seems that on the latest version of Kali, CeWL is broken because of a deprecated function? in the spider library:

./cewl.rb -d 2 wikipedia.org
CeWL 5.4.2 (Break Out) Robin Wood ([email protected]) (https://digi.ninja/)
/usr/lib/ruby/vendor_ruby/spider/spider_instance.rb:125: warning: constant ::Fixnum is deprecated

Dependencies were installed today with gem install.

gem list --local | grep spider
spider (0.32, 0.5.1)

Anything I'm missing?

Runtime error

Hi,

I'm getting this while trying to use the tool. Kali2017.2

CLI: ./cewl.rb -d 4 -m 5 -v -w custom_dict.txt

The following error may help:
incorrect header check
/usr/lib/ruby/2.3.0/net/http/response.rb:380:in `inflate'
/usr/lib/ruby/2.3.0/net/http/response.rb:380:in `block in inflate_adapter'
/usr/lib/ruby/2.3.0/net/protocol.rb:411:in `call_block'
/usr/lib/ruby/2.3.0/net/protocol.rb:402:in `<<'
/usr/lib/ruby/2.3.0/net/protocol.rb:104:in `read'
/usr/lib/ruby/2.3.0/net/http/response.rb:402:in `read'
/usr/lib/ruby/2.3.0/net/http/response.rb:291:in `block in read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:262:in `inflater'
/usr/lib/ruby/2.3.0/net/http/response.rb:281:in `read_body_0'
/usr/lib/ruby/2.3.0/net/http/response.rb:202:in `read_body'
/usr/lib/ruby/2.3.0/net/http/response.rb:227:in `body'
/usr/lib/ruby/2.3.0/net/http/response.rb:164:in `reading_body'
/usr/lib/ruby/2.3.0/net/http.rb:1445:in `transport_request'
/usr/lib/ruby/2.3.0/net/http.rb:1407:in `request'
/usr/lib/ruby/2.3.0/net/http.rb:1400:in `block in request'
/usr/lib/ruby/2.3.0/net/http.rb:853:in `start'
/usr/lib/ruby/2.3.0/net/http.rb:1398:in `request'
./cewl.rb:279:in `get_page'
./cewl.rb:209:in `block (2 levels) in start!'
./cewl.rb:207:in `each'
./cewl.rb:207:in `block in start!'
./cewl.rb:195:in `each'
./cewl.rb:195:in `start!'
./cewl.rb:161:in `start_at'
./cewl.rb:757:in `block in <main>'
./cewl.rb:747:in `catch'
./cewl.rb:747:in `<main>'

Caller
./cewl.rb:231:in `get_page'
./cewl.rb:209:in `block (2 levels) in start!'
./cewl.rb:207:in `each'
./cewl.rb:207:in `block in start!'
./cewl.rb:195:in `each'
./cewl.rb:195:in `start!'
./cewl.rb:161:in `start_at'
./cewl.rb:757:in `block in <main>'
./cewl.rb:747:in `catch'
./cewl.rb:747:in `<main>'

Any idea how I should fix it up?

Thank you.

Feature Request: Quitting saves partial wordlist (or allows option to restore previous session)

It would be nice if, when CeWL exits due to an interrupt, it would stop spidering and finish writing the word list before it exited. For instance, if I began spidering a very large site and I had the depth set pretty high, after an hour or two I might decide that it's probably collected enough words and hit Ctrl+C. Unfortunately when I do that, CeWL doesn't write anything to the output file and I have to start over with a lower depth.

Alternatively, maybe implementing some way to restore a previous run would be useful.

As an alternative approach I tried using wget to mirror a site and then using various html2text utilities, awk, grep, etc to parse out words into a list offline, which works ok but is really dependent on the parsing, which CeWL already seems to do pretty well.
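The first suggestion, writing out whatever has been collected when the run is interrupted, can be sketched with Ruby's `Interrupt` exception, which is what Ctrl+C raises by default. This is a simplified illustration and not CeWL's own interrupt handling; `crawl_with_partial_save` is a hypothetical helper, and the block passed to it stands in for the real page-processing step.

```ruby
# Hypothetical helper: collect words page by page and always write what
# has been gathered so far, even if the run is cut short with Ctrl+C.
# The block receives a page and returns an array of words.
def crawl_with_partial_save(pages, out)
  words = []
  begin
    pages.each { |page| words.concat(yield(page)) }
  rescue Interrupt
    warn 'Interrupted, writing the words collected so far'
  ensure
    out.puts(words.uniq)
  end
  words.uniq
end
```

Usage would be along the lines of `File.open('wordlist.txt', 'w') { |f| crawl_with_partial_save(pages, f) { |page| words_from(page) } }`, where `pages` and `words_from` are placeholders for the real crawl.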

"Display the usage" msg clean up

cewl.rb
between line 524 and 563

It looks a lot better and a lot less busy in the "options" area.
You always want to put the short version of an option before the long version. It should look uniform and be easy to look through.
Also fixed a few other things.

def usage
  puts "Usage: cewl [OPTIONS] ... <url>

    OPTIONS:
      -h, --help: Show help.
      -k, --keep: Keep the downloaded file.
      -d <x>,--depth <x>: Depth to spider to, default 2.
      -m, --min_word_length: Minimum word length, default 3.
      -o, --offsite: Let the spider visit other sites.
      -w, --write: Write the output to the file.
      -u, --ua <agent>: User agent to send.
      -n, --no-words: Don't output the wordlist.
      -a, --meta: include meta data.
      --meta_file file: Output file for meta data.
      -e, --email: Include email addresses.
      --email_file <file>: Output file for email addresses.
      --meta-temp-dir <dir>: The temporary directory used by exiftool when parsing files, default /tmp.
      -c, --count: Show the count for each word found.
      -v, --verbose: Verbose.
      --debug: Extra debug information.
      
      AUTHENTICATION:
      --auth_type: Digest or basic.
      --auth_user: Authentication username.
      --auth_pass: Authentication password.
      
      PROXY SUPPORT:
      --proxy_host: Proxy host.
      --proxy_port: Proxy port, default 8080.
      --proxy_username: Username for proxy, if required.
      --proxy_password: Password for proxy, if required.
      
      HEADERS:
      --header, -H: In format name:value - can pass multiple.
      
      <url>: The site to spider.
"
  exit 0
end

Inventory notification

Your tool/software has been inventoried on Rawsec's CyberSecurity Inventory.

What is Rawsec's CyberSecurity Inventory?

An inventory of tools and resources about CyberSecurity. This inventory aims to help people to find everything related to CyberSecurity.

  • Open source: Every information is available and up to date. If an information is missing or deprecated, you are invited to (help us).
  • Practical: Content is categorized and table formatted, allowing to search, browse, sort and filter.
  • Fast: Using static and client side technologies resulting in fast browsing.
  • Rich tables: search, sort, browse, filter, clear
  • Fancy informational popups
  • Badges / Shields
  • Static API
  • Twitter bot

More details about features here.

Note: the inventory is a FLOSS (Free, Libre and Open-Source Software) project.

Why?

  • Specialized websites: Some websites are referencing tools but additional information is not available or browsable. Make additional searches take time.
  • Curated lists: Curated lists are not very exhaustive, up to date or browsable and are very topic related.
  • Search engines: Search engines sometimes find nothing, and some tools or resources are too unknown or non-referenced. This is where crowdsourcing is better than robots.

Why should you care about being inventoried?

Mainly because this is giving visibility to your tool, more and more people are using the Rawsec's CyberSecurity Inventory, this helps them find what they need.

Badges

The badge shows your community that you are inventoried. This also shows you care about your project and want it to grow, and that your tool is not abandonware.

Feel free to claim your badge here: http://inventory.rawsec.ml/features.html#badges, it looks like that Rawsec's CyberSecurity Inventory, but there are several styles available.

So what?

That's all, this message is just to notify you if you care.
