
linkchecker's Introduction

LinkChecker


Check for broken links in web sites.

Features

  • recursive and multithreaded checking and site crawling
  • output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
  • HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
  • restrict link checking with regular expression filters for URLs
  • proxy support
  • username/password authorization for HTTP, FTP and Telnet
  • honors robots.txt exclusion protocol
  • Cookie support
  • HTML5 support
  • a command line and web interface
  • various check plugins available, e.g. HTML syntax and antivirus checks

Installation

See doc/install.txt in the source code archive. Python 2.7.2 or later is needed.

Usage

Execute linkchecker http://www.example.com. For other options see linkchecker --help.

linkchecker's People

Contributors

alperkokmen, arlol, bluebird75, cicku, florianjacob, mark-hetherington, mgedmin, mstock, peterdavehello, screenack, sitsofe, wummel, xvadim12


linkchecker's Issues

Internal error

Converted from SourceForge issue 635596, submitted by int2000

******** LinkChecker internal error, bailing out ********
    self.urlConnection = ftplib.FTP(self.urlTuple[1], _user, _password)
  File "/usr/lib/python2.2/ftplib.py", line 108, in __init__
    self.connect(host)
  File "/usr/lib/python2.2/ftplib.py", line 133, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.2/ftplib.py", line 216, in getresp
    if c not in '123':
TypeError: 'in <string>' requires character as left operand
System info:
LinkChecker 1.6.6
Python 2.2.2 (#4, Oct 15 2002, 04:21:28)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2

******** LinkChecker internal error, bailing out ********
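For context, the TypeError comes from a pre-2.3 Python restriction: `x in somestring` required `x` to be exactly one character, so an empty FTP response line blew up inside ftplib's reply check. A hedged sketch of a defensive guard (the function name is illustrative, not LinkChecker's actual fix):

```python
# Sketch: before Python 2.3, `x in s` on strings required x to be a
# single character, so an empty response line made `c not in '123'`
# raise TypeError inside ftplib.getresp().
def is_positive_reply(resp):
    # an empty response is treated as a failure and never reaches `in`
    return bool(resp) and resp[0] in "123"

assert is_positive_reply("220 welcome")
assert not is_positive_reply("")
```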

Error: subkey DhcpDomain not found

Converted from SourceForge issue 863227, submitted by jimwright

I get this error:
subkey DhcpDomain not found

When running python linkchecker on Win 2000

Here's the complete error message

C:\Python23\Scripts>..\python linkchecker
Traceback (most recent call last):
  File "linkchecker", line 34, in ?
    import getopt, re, os, pprint, socket, linkcheck
  File "c:\Python23\Lib\site-packages\linkcheck\__init__.py", line 46, in ?
    import UrlData
  File "c:\Python23\Lib\site-packages\linkcheck\UrlData.py", line 27, in ?
    DNS.DiscoverNameServers()
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\Base.py", line 35, in DiscoverNameServers
    init_dns_resolver_nt()
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\Base.py", line 87, in init_dns_resolver_nt
    for item in winreg.stringdisplay(key["DhcpDomain"]):
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\winreg.py", line 39, in __getitem__
    raise IndexError, "subkey %s not found" % key
IndexError: subkey DhcpDomain not found

global name 'StringUtil' is not defined

Converted from SourceForge issue 573605, submitted by nobody

exceptions.NameError global name 'StringUtil' is not defined
Traceback (most recent call last):
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/UrlData.py",
    self._check()
  File "/home/intranet/tools/lib/python2.2/site-packages/linkc
    self.logMe()
  File "/home/intranet/tools/lib/python2.2/site-packages/linkch
    self.config.log_newUrl(self)
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/Config.py", l
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/log/ColoredLo
    self.fd.write("| "+linkcheck._("Info")+Spaces["info"]+
NameError: global name 'StringUtil' is not defined
System information:
LinkChecker 1.5.4
Python 2.2.1 (#1, Jun 4 2002, 09:57:34)
[GCC 2.95.2 19991024 (release)] on linux2

******** LinkChecker internal error, bailing out ********

The checked page is a PHP-generated page on the company-internal network that is not reachable from outside.

Fails to install on Windows (compilation error)

Converted from SourceForge issue 676017, submitted by nobody

"python setup.py install" fails with the following compilation error:

"C:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe" /c /nologo /Ox /MD /W3 /GX -DYY_NO_UNISTD_H -Ilinkcheck/parser -IC:\Python22\include /Tclinkcheck/parser/htmllex.c /Fobuild\temp.win32-2.2\Release\htmllex.obj

I manually compiled the file adding

-DYY_NO_UNISTD_H

to the list of compiler flags, and all seemed well (apart from a couple of warnings).

This is on WinXP with Python 2.2.1

Robert
[email protected]

socket.error (10056, 'Socket is already connected')

Converted from SourceForge issue 481565, submitted by dvarner

I am getting the following errors running Linkchecker
on an NT box...

URL s/2690
Name Personal Email
Parent URL http://www.yahoo.com, line 28
Base http://www.yahoo.com/
Real URL http://www.yahoo.com/s/2690
Check Time 0.060 seconds
Result Error: (10056, 'Socket is already connected')
socket.error (10056, 'Socket is already connected')

URL r/m5
Name Yahoo! Mail
Parent URL http://www.yahoo.com, line 28
Base http://www.yahoo.com/
Real URL http://www.yahoo.com/r/m5
Check Time 0.050 seconds
Result Error: (10056, 'Socket is already connected')

...

Have you seen these errors running Linkchecker on a
Windoze platform? I always get a Socket error. Is this
just me? I am on an NT box.

Thanks,
Drew

internal error

Converted from SourceForge issue 654249, submitted by nobody

~$ linkchecker -r 2 -i ailab -qF html http://www.ifi.unizh.ch/ailab/

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report at http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913 or send mail to and include the following information:

  1. The URL or file you are testing
  2. Your commandline arguments and/or configuration.
  3. The system information below.

If you disclose some information because it's too private to you, that's ok. I will try to help you nonetheless (but you have to give me something I can work with ;).

exceptions.TypeError putrequest() got an unexpected keyword argument 'skip_host'
Traceback (most recent call last):
  File "/usr/lib/python2.1/site-packages/linkcheck/UrlData.py", line 260, in check
    self._check()
  File "/usr/lib/python2.1/site-packages/linkcheck/UrlData.py", line 317, in _check
    self.checkConnection()
  File "/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py", line 95, in checkConnection
    response = self._getHttpResponse()
  File "/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py", line 220, in _getHttpResponse
    self.urlConnection.putrequest(method, path, skip_host=1)
TypeError: putrequest() got an unexpected keyword argument 'skip_host'
System info:
LinkChecker 1.6.6
Python 2.1.3 (#1, Jul 29 2002, 22:34:51)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2

******** LinkChecker internal error, bailing out ********
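The crash is a compatibility gap: the skip_host keyword was added to httplib's putrequest() in Python 2.2, so on the reporter's Python 2.1 it is an unexpected argument. A sketch of a feature-tested call (the wrapper name and fake connection classes are illustrative, not LinkChecker code):

```python
# skip_host appeared in Python 2.2's httplib; on older interpreters the
# keyword raises TypeError, so fall back to the plain call.
def putrequest_compat(conn, method, path):
    try:
        conn.putrequest(method, path, skip_host=1)
    except TypeError:
        # pre-2.2 signature: no skip_host keyword
        conn.putrequest(method, path)

# Illustrative stand-ins for old and new connection objects:
class OldConn:
    def putrequest(self, method, path):
        self.called = (method, path)

class NewConn:
    def putrequest(self, method, path, skip_host=0):
        self.called = (method, path, skip_host)

old, new = OldConn(), NewConn()
putrequest_compat(old, "GET", "/")
putrequest_compat(new, "GET", "/")
assert old.called == ("GET", "/")
assert new.called == ("GET", "/", 1)
```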



Does not install

Converted from SourceForge issue 612030, submitted by nobody

Hi

Maybe this is not a bug, but anyway: I downloaded LinkChecker 1.6.2 and followed the instructions on how to install on a Linux system. The command "python setup.py install" bailed out with a note about a missing module "distutils.core". I then inspected the Python module and found that it needs several modules named distutils.*. I searched the system for any module whose name starts with "distutils", and nothing was found.

Is a module missing from the distribution?

Best regards

Bent Vangli

quoting error

Converted from SourceForge issue 776416, submitted by htrd

I am seeing errors in version 1.8.19 when URLs contain a space character. The referring page has this correctly quoted as %20, but linkchecker is sending a single space character in its HTTP request.

line 241 of HttpUrlData.py contains a comment:

quote parts before submit

but it uses map in a way that leaves self.urlparts unchanged. (map returns a new list; it does not mutate its argument.)

patch attached
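The diagnosis is easy to reproduce: map() builds a new sequence and leaves its argument untouched, so the quoted parts must be assigned back. A minimal sketch with the modern urllib.parse equivalent (the URL parts and safe-character set are illustrative):

```python
import urllib.parse

urlparts = ["http", "example.com", "/a path/page.html", "", ""]

# map() returns a new sequence; the original list is not mutated:
quoted = list(map(lambda part: urllib.parse.quote(part, safe="/:=&?#"), urlparts))
assert urlparts[2] == "/a path/page.html"   # still unquoted!

# the fix is to assign the result back before submitting the request:
urlparts = quoted
assert urlparts[2] == "/a%20path/page.html"
```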

TAB-Alignment Error in HttpUrlData.py

Converted from SourceForge issue 833419, submitted by volkerjaenisch

Checking a site that makes heavy use of the Apache mod_rewrite module, I found the following:
$ linkchecker -v http://unterhalt.selbmann-bergert.de
LinkChecker 1.9.4 Copyright © 2000-2003 Bastian Kleineidam
[..]
URL http://unterhalt.selbmann-bergert.de
Real URL http://www.selbmann-bergert.de/dynamic/gebiete/Familienrecht_u.Erbrecht/03_gebiete_details_KK.html
Check Time 0.908 seconds
Warning URL path is empty, using '/'
Effective URL http://www.selbmann-bergert.de/dynamic/gebiete/Familienrecht_u.Erbrecht/03_gebiete_details_KK.html
Result Error: 404 Not Found

That result is definitely wrong, since the effective URL exists. Looking at the responsible code in HttpUrlData.py, I think I found a tab-alignment error:

The block (starting at line 243)

# check url warnings
effectiveurl = urlparse.urlunsplit(self.urlparts)
if self.url != effectiveurl:
    self.setWarning(i18n._("Effective URL %s") % effectiveurl)
    self.url = effectiveurl
# check response
self.checkResponse(response)

is outside the while loop, causing the line
self.url = effectiveurl
to have no effect.

Just indenting the block one tab to the right solved the problem.

Thank you for your nice program, I like it very much. I hope my hint is valuable for you; let me know.

Volker


ssl.so build incorrectly

Converted from SourceForge issue 204146, submitted by stinnux

The setup.py builds a wrong ssl.so on my system
(SuSE Linux 6.4, Python 1.5.2, OpenSSL 0.9.4)

It seems that the libssl and libcrypto are not included correctly.

I fixed it by linking the ssl.so manually with the following command:

gcc -shared -L/usr/local/ssl/lib -lssl -lcrypto ssl.o -o build/platlib/ssl.so

In /usr/local/ssl/lib are libssl.a and libcrypto.a, no shared object files.

Trying to link them against the .so files results in undefined symbols from python.

I'm not a Python guru and don't have much knowledge about distutils, so I cannot really help more on this.

robots.txt wild goose-chase results in 'nonnumeric port' err

Converted from SourceForge issue 836864, submitted by mike_j_brown

LinkChecker 1.9.5
Python 2.3.2 (#1, Oct 30 2003, 04:49:57)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4

One of the links in an HTML doc I was checking is
http://users.compaqnet.be/avalon/

Their server is misconfigured, such that trying to get /robots.txt results in a series of redirects that eventually become recursive:

A request for /robots.txt results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/robots.txt

A request for /index.html?l1=search&a=error404&error=http://users.compaqnet.be/robots.txt results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html

A request for /index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html
(same as before).

Somehow, this leads to httplib thinking that it is dealing with this string as a host and port:
'www.belgacom.netindex.html?l1=search&a=error404&error=http:'

I had to put a print into HTTPConnection._set_hostport() to see this.

The traceback that LinkChecker produces reveals the trouble that httplib has in trying to parse that string.

Traceback (most recent call last):
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 285, in check
    self._check()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 346, in _check
    self.checkConnection()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py", line 110, in checkConnection
    if not self.robotsTxtAllowsUrl():
  File "/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py", line 396, in robotsTxtAllowsUrl
    rp.read()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 70, in read
    f = _opener.open(req)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 843, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 353, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 479, in http_error_302
    return self.parent.open(new)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 843, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 353, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 479, in http_error_302
    return self.parent.open(new)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 808, in do_open
    h = http_class(host) # will parse host:port
  File "/usr/local/lib/python2.3/httplib.py", line 986, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/local/lib/python2.3/httplib.py", line 507, in __init__
    self._set_hostport(host, port)
  File "/usr/local/lib/python2.3/httplib.py", line 519, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
InvalidURL: nonnumeric port: ''
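Independent of the httplib parsing bug, the underlying hazard is the unbounded redirect chase. A sketch of loop detection with a redirect cap (the limit and the dict-based stand-in for 302 responses are illustrative, not LinkChecker's implementation):

```python
MAX_REDIRECTS = 5  # illustrative cap

def follow(url, redirects):
    """Follow Location headers; `redirects` maps URL -> next URL and
    stands in for real 302 responses."""
    seen = set()
    for _ in range(MAX_REDIRECTS):
        if url in seen:
            raise ValueError("redirect loop at %s" % url)
        seen.add(url)
        nxt = redirects.get(url)
        if nxt is None:
            return url          # final, non-redirecting URL
        url = nxt
    raise ValueError("too many redirects")

# the misconfigured server above effectively does this:
loop = {"/robots.txt": "/index.html?error=robots",
        "/index.html?error=robots": "/index.html?error=index",
        "/index.html?error=index": "/index.html?error=index"}
try:
    follow("/robots.txt", loop)
except ValueError as err:
    caught = str(err)
assert "redirect loop" in caught
```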

exceptions.AttributeError addinfourl instance has no attribute 'readlines'

Converted from SourceForge issue 674391, submitted by mkalastro

Hi Bastian,

Thanks for the great product! I've used previous
versions and this is the first error I've seen. You
can reproduce the error by using lconline from:
<http://www2.soe.ucsc.edu/linkchecker/>, which uses the
default configuration.

exceptions.AttributeError addinfourl instance has no attribute 'readlines'
Traceback (most recent call last):
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    self._check()
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpsUrlData.py", line 36, in _check
    HttpUrlData._check(self)
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/UrlData.py", line 258, in _check
    self.checkConnection()
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 91, in checkConnection
    if self.config["robotstxt"] and not self.robotsTxtAllowsUrl():
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 289, in robotsTxtAllowsUrl
    rp.read()
  File "/var/tmp/python/lib/python2.2/robotparser.py", line 44, in read
    lines = f.readlines()
AttributeError: addinfourl instance has no attribute 'readlines'
System info:
LinkChecker 1.8.5
Python 2.2.2 (#1, Jan 23 2003, 14:28:02)
[GCC 2.95.2 19991024 (release)] on sunos5
LC_MESSAGES = 'C'

exceptions.MemoryError

Converted from SourceForge issue 863220, submitted by nobody

linkchecker -r999 -s http://www.agrussell.com/
LinkChecker 1.9.3

The system is a 2-way SMP Dell with 1 GB of memory, running FreeBSD 4.8 with security patches. It ran for over an hour; the site has about 4800 links.

Thank you for your time.

A.G.

Start checking at 2003-12-19 15:32:35-005

********** Oops, I did it again. *************

exceptions.MemoryError
Traceback (most recent call last):
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 281, in check
Traceback (most recent call last):
  File "/usr/local/bin/linkchecker", line 440, in ?
    linkcheck.checkUrls(config)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/__init__.py", line 62, in checkUrls
    config.checkUrl(config.getUrl())
  File "/usr/local/lib/python2.3/site-packages/linkcheck/Config.py", line 339, in checkUrl_Threads
    self.threader.startThread(url.check, ())
  File "/usr/local/lib/python2.3/site-packages/linkcheck/Threader.py", line 56, in startThread
    t.start()
  File "/usr/local/lib/python2.3/threading.py", line 410, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-1343, stopped)>>
Traceback (most recent call last):
  File "/usr/local/lib/python2.3/threading.py", line 444, in __bootstrap
    _print_exc(file=s)
  File "/usr/local/lib/python2.3/traceback.py", line 210, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/local/lib/python2.3/traceback.py", line 123, in print_exception
    print_tb(tb, limit, file)
  File "/usr/local/lib/python2.3/traceback.py", line 68, in print_tb
    line = linecache.getline(filename, lineno)
  File "/usr/local/lib/python2.3/linecache.py", line 14, in getline
    lines = getlines(filename)
  File "/usr/local/lib/python2.3/linecache.py", line 40, in getlines
    return updatecache(filename)
  File "/usr/local/lib/python2.3/linecache.py", line 93, in updatecache
    lines = fp.readlines()
MemoryError
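The "can't start new thread" failure suggests an unbounded number of checker threads on a large site. A sketch of bounding concurrency with a semaphore (the limit, worker body, and URL list are illustrative, not LinkChecker's Threader):

```python
import threading

MAX_THREADS = 10       # illustrative cap on live checker threads
slots = threading.Semaphore(MAX_THREADS)
results = []

def check(url):
    try:
        results.append(url)      # stand-in for the real URL check
    finally:
        slots.release()          # free the slot for the next URL

threads = []
for url in ["u%d" % i for i in range(100)]:
    slots.acquire()              # blocks once MAX_THREADS are running
    t = threading.Thread(target=check, args=(url,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
assert len(results) == 100
```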

too much fetching

Converted from SourceForge issue 776851, submitted by htrd

I am seeing a problem when checking a site that contains:

  1. An html page that contains a link to a second url
  2. The second url responds with a 302 redirect back to the
    first page.

linkchecker fetches both urls continuously until its recursion
limit is reached.

Debug output attached. I will keep these pages available for a while, but please don't hammer that server harder than necessary.

Ignores BASE HREF in matching intern/extern

Converted from SourceForge issue 768661, submitted by nobody

The 'intern' / 'extern' matching appears to operate on the 'URL' rather than the 'Real URL' (after the base has been applied). This makes it impossible to properly check a site that makes use of BASE HREF without also spidering the rest of the internet.
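The behavior the report asks for can be sketched with the standard library: resolve each link against the page's <base href> first, then classify the result (the URLs and the prefix-based classification are illustrative):

```python
from urllib.parse import urljoin

# Assumed <base href> of the page being checked (illustrative):
base_href = "http://cdn.example.org/site/"
link = "page.html"

# Resolve against BASE HREF first; this yields the "Real URL"...
real_url = urljoin(base_href, link)
assert real_url == "http://cdn.example.org/site/page.html"

# ...and the intern/extern filters should then match real_url,
# not the raw href attribute:
intern = real_url.startswith("http://cdn.example.org/")
assert intern
```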

no or incomplete caching of URLs where checking failed

Converted from SourceForge issue 205659, submitted by mschmitz

When I checked my intranet homepage today I saw the following lines
in linkcheckers output.

URL http://madeira/~mschmitz/usage/index.html
Parent URL http://madeira/HomeDirs/mschmitz/index.html, line 35
Real URL http://madeira/HomeDirs/mschmitz/usage/index.html
Check Time 0.413 seconds
Warning Effective URL
http://madeira/HomeDirs/mschmitz/usage/index.html
Result Error: 404 Not Found

URL http://madeira/~mschmitz/usage/index.html
Parent URL http://madeira/HomeDirs/mschmitz/, line 35
Real URL http://madeira/HomeDirs/mschmitz/usage/index.html
Check Time 0.153 seconds
Warning Effective URL
http://madeira/HomeDirs/mschmitz/usage/index.html
Result Error: 404 Not Found

This indicates that linkchecker did not cache the URL, since "(cached)" is missing from the output of the second visit to http://madeira/~mschmitz/usage/index.html.

Problem when killing linkchecker

Converted from SourceForge issue 204205, submitted by stinnux

When I try to kill a running linkchecker (by using Ctrl-C, for example) it doesn't exit correctly.

It says "Stopped checking..." but hangs there for a couple of seconds. It eventually really stops.

This seems to be related to the number of threads that are running (the more threads, the longer you wait).

1.8.19 fails to compile on FreeBSD 4.8.

Converted from SourceForge issue 769328, submitted by markjugg

It fails like this:

building 'linkcheck.parser.htmlsax' extension
cc -DNDEBUG -O -pipe -D_THREAD_SAFE -fPIC -Ilinkcheck/parser -I/usr/local/include/python2.2 -c linkcheck/parser/htmllex.c -o build/temp.freebsd-4.8-RELEASE-i386-2.2/htmllex.o
htmllex.c:36: stdint.h: No such file or directory
error: command 'cc' failed with exit status 1

DNS problems with NT

Converted from SourceForge issue 479829, submitted by dvarner

I downloaded linkchecker then ran:

python setup.py config
python setup.py build
python setup.py install

I used:

python linkchecker

on my NT box ...

Traceback (most recent call last):
  File "linkchecker", line 24, in ?
    from linkcheck import timeoutsocket
  File "linkcheck\__init__.py", line 48, in ?
    import Config, UrlData, sys, lc_cgi
  File "linkcheck\UrlData.py", line 472, in ?
    from MailtoUrlData import MailtoUrlData
  File "linkcheck\MailtoUrlData.py", line 32, in ?
    DNS.init_dns_resolver()
  File "DNS\Base.py", line 31, in init_dns_resolver
    init_dns_resolver_nt()
  File "DNS\Base.py", line 76, in init_dns_resolver_nt
    defaults['nameserver'].append(server)
KeyError: nameserver

I changed line 76 in DNS\Base.py to ...
(4 tabs)defaults['server'].append(server)

I think this should be patched.

Re-ran it and I got:
C:\linkchecker-1.3.6>linkchecker
Traceback (most recent call last):
  File "linkchecker", line 24, in ?
    from linkcheck import timeoutsocket
  File "linkcheck\__init__.py", line 48, in ?
    import Config, UrlData, sys, lc_cgi
  File "linkcheck\UrlData.py", line 472, in ?
    from MailtoUrlData import MailtoUrlData
  File "linkcheck\MailtoUrlData.py", line 32, in ?
    DNS.init_dns_resolver()
  File "DNS\Base.py", line 31, in init_dns_resolver
    init_dns_resolver_nt()
  File "DNS\Base.py", line 84, in init_dns_resolver_nt
    key = winreg.handle_key(winreg.HKEY_LOCAL_MACHINE,
AttributeError: 'DNS.winreg' module has no attribute 'handle_key'

In DNS\winreg.py, it looks like this should be
key_handle? Is this right?

I replaced all occurrences of handle_key with
key_handle in DNS\winreg.py.

handle_key is found on lines 65, 69, 84 and 96 of
DNS\winreg.py

The program will execute once these are changed. Can
we patch this? DNS/Base.py distributed with
linkchecker appears different than the latest version
in CVS for PyDNS at SourceForge.

Thanks,
Drew

Bug report about "Name or service not known"?

Converted from SourceForge issue 729007, submitted by doggy8088

  1. The URL or file you are testing

http://www2.nestle-baby.com.tw

  2. Your commandline arguments and/or configuration.

linkchecker -C --no-anchor-caching -r5 -ohtml
http://www2.nestle-baby.com.tw

  3. The system information below.

URL /member/member5.asp
Parent URL http://www2.nestle-baby.com.tw/, line 113, col 11
Real URL http://www2.nestle-baby.com.tw/member/member5.asp
Check Time 7.652 seconds
Result Error: (-2, 'Name or service not known')

The URL "http://www2.nestle-baby.com.tw/member/member5.asp" is a form's action URL. It's a valid link, but what does "Name or service not known" mean? I don't understand. Is it a bug?

linkchecker-1.2.2 does not print correct line numbers

Converted from SourceForge issue 205658, submitted by mschmitz

Some of the HTML files I checked with linkchecker contain comments like the following:

<!--
foo
bar
-->

Linkchecker does not handle this type of comment correctly, since it seems to assume that all comments are single-line comments. In the example above, all messages concerning lines following the block comment contain line numbers that differ by 3 from the correct line number.
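A sketch of the correct counting with Python's html.parser, whose position tracking includes the newlines consumed inside multi-line comments (the sample HTML and class name are illustrative):

```python
from html.parser import HTMLParser

class LinkLines(HTMLParser):
    """Collect (line, href) pairs; getpos() already accounts for
    newlines inside <!-- ... --> block comments."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append((self.getpos()[0], href))

html = "<a href='one.html'>1</a>\n<!--\nfoo\nbar\n-->\n<a href='two.html'>2</a>"
parser = LinkLines()
parser.feed(html)
# the second link is on line 6, after the 4-line block comment:
assert parser.links == [(1, "one.html"), (6, "two.html")]
```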

internal error message

Converted from SourceForge issue 765016, submitted by saadiq

linkchecker -e '^?' -i zipsell.com -o html -r 4 -W 500 http://www.zipsell.com/ > public_html/index.html

********** Oops, I did it again. *************

exceptions.AttributeError SMTP instance has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 203, in check
    self._check()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 271, in _check
    try: self.checkContent(warningregex)
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 398, in checkContent
    match = warningregex.search(self.getContent())
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 389, in getContent
    self.data = self.urlConnection.read()
AttributeError: SMTP instance has no attribute 'read'
System info:
LinkChecker 1.8.18
Python 2.2.3 (#1, Jun 4 2003, 02:54:59)
[GCC 3.3 (Debian)] on linux2

******** LinkChecker internal error, bailing out ********

crash

Converted from SourceForge issue 663804, submitted by nobody

[email protected]

  1. The URL or file you are testing
    http://pcsiwa12.rett.polimi.it/~phdweb/eng/index.htm

  2. Your commandline arguments and/or configuration.

linkchecker -a -r 3 http://pcsiwa12.rett.polimi.it/~phdweb/eng/index.htm

  3. The system information below.

httplib.BadStatusLine Exception in thread Thread-421:
Traceback (most recent call last):
  File "/usr/lib/python2.2/threading.py", line 408, in __bootstrap
    self.run()
  File "/usr/lib/python2.2/threading.py", line 396, in run
    apply(self.__target, self.__args, self.__kwargs)
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    internal_error()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 49, in internal_error
    print >> sys.stderr, type, value
AttributeError: BadStatusLine instance has no attribute 'args'

Fails to check links in Zope

Converted from SourceForge issue 202482, submitted by rasputen

If you run linkchecker on a Zope site, it returns without having checked anything. Here's the output for a run on zope.org:

./linkchecker -v http://www.zope.org/
LinkChecker
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://linkchecker.sourceforge.net/
Write comments and bugs to [email protected]

Start checking at 02.03.2000 15:10:18

URL http://www.zope.org/
Real URL http://www.zope.org/
Result Valid: 200 OK

Thats it. 0 warnings, 0 errors found.
Stopped checking at 02.03.2000 15:10:19

Bug report of LinkChecker

Converted from SourceForge issue 728315, submitted by doggy8088

  1. The URL or file you are testing

http://www.nestle-baby.com.tw/member/register.htm

  2. Your commandline arguments and/or configuration.

linkchecker -osql -r20 --no-anchor-caching --intern='!^mailto:' -C http://www.nestle-baby.com.tw/member/register.htm | mysql -u root test

  3. The system information below.

exceptions.TypeError can only concatenate tuple (not "list") to tuple
Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    self._check()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 291, in _check
    self.putInCache()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 304, in putInCache
    cacheKey = self.getCacheKey()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 315, in getCacheKey
    return urlparse.urlunsplit(self.urlparts[:4]+[''])
TypeError: can only concatenate tuple (not "list") to tuple
System info:
LinkChecker 1.8.11
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-4)] on linux2
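The fix is a one-character change in spirit: concatenate a tuple, not a list. A sketch with the modern urllib.parse equivalents (the sample URL is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit

urlparts = urlsplit("http://www.example.com/member/register.htm#top")

# urlsplit() returns a tuple subclass, and tuple + list raises exactly
# the TypeError in the traceback above:
try:
    urlunsplit(urlparts[:4] + [""])
except TypeError:
    failed = True
assert failed

# concatenating a tuple works, dropping the fragment for the cache key:
cache_key = urlunsplit(urlparts[:4] + ("",))
assert cache_key == "http://www.example.com/member/register.htm"
```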

output shows "Error: len() of unsized object" w/anchor

Converted from SourceForge issue 204259, submitted by mherbene

Appears to dislike A tags that go to anchors elsewhere in the page.

Sample html file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
<title>LinkChecker len error test</title>
</head>

<body>

<a name="my_target"></a>
Here is the target anchor.<BR>

Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>

<a href="#my_target">This</a> link goes back to the target.

</body>
</html>

Results file (key fields, extracted from the escaped HTML report):

URL         #my_target
Parent URL  http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html, line 37
Real URL    http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html#my_target
D/L Time    0.197 seconds
Check Time  0.424 seconds
Result      Error: len() of unsized object

Thats it. 0 warnings, 1 error found.
Start checking at 12.04.2000 19:03:49, stopped at 12.04.2000 19:03:50 (1.077 seconds). LinkChecker 1.2.1.

Bug report about Internal Anchor

Converted from SourceForge issue 728709, submitted by doggy8088

Initial Comment:

  1. The URL or file you are testing

http://www2.nestle-baby.com.tw

  2. Your commandline arguments and/or configuration.

linkchecker -C --no-anchor-caching -r5 -ohtml
http://www2.nestle-baby.com.tw

  3. The system information below.

URL http://www.nestle-baby.com.tw/about/about.htm#0
Parent URL http://www2.nestle-baby.com.tw/mapabout/aboutus.asp, line 238, col 37
Real URL http://www.nestle-baby.com.tw/about/about.htm#0
Check Time 0.420 seconds
Result Error: 404 Not Found

I think an "internal page anchor" (such as #top or some other name) should not be
reported as an error even when the anchor does not exist, because by default the
browser jumps to the top of the page when a named anchor is missing.

LinkCheckerException in UrlData.py

Converted from SourceForge issue 203509, submitted by wavy

I did:
./linkchecker -o colored -r 99 -i ".whitbread.co" www.corporate.whitbread.co.uk

and linkchecker crawled through the site, but after a while I got loads of:

Exception in thread Thread-89076:
Traceback (innermost last):
File "/var/tmp/python-root/usr/lib/python1.5/threading.py", line 376, in __bootstrap
self.run()
File "/var/tmp/python-root/usr/lib/python1.5/threading.py", line 364, in run
apply(self.__target, self.__args, self.__kwargs)
File "./linkcheck/UrlData.py", line 107, in check
except LinkCheckerException, msg:
NameError: LinkCheckerException

for different threads, finally finishing with the message:

Fatal Python error: PyThreadState_Delete: invalid tstate
Aborted

I then tried to capture the entire output to a file:

./linkchecker -o colored -r 99 -i ".whitbread.co" www.corporate.whitbread.co.uk > lc.log 2>&1

and linkchecker crawled through the site, but after a while I got:

Segmentation fault

I guess this is not your doing though :) - there were no errors in the log.

i have:
Python 1.5.2 (#1, Sep 17 1999, 20:15:36) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux-i386

Updated to latest 1.10.1, new error...

Converted from SourceForge issue 864516, submitted by arussell

bifrost:/home/arussell$ cat linkchecker
linkchecker -r999 -s http://www.agrussell.com/
LinkChecker 1.10.1 Copyright © 2000-2003
Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to
redistribute it
under certain conditions. Look at the file `LICENSE'
within this
distribution.
Get the newest version at
http://linkchecker.sourceforge.net/
Write comments and bugs to [email protected]

Start checking at 2003-12-19 17:01:53-005

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report at
http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913
or send mail to [email protected] and include the following information:

  1. The URL or file you are testing
  2. Your commandline arguments and/or configuration.
  3. The system information below.

If you disclose some information because its too private to you thats ok.
I will try to help you nontheless (but you have to give me something
I can work with ;).

exceptions.MemoryError
Traceback (most recent call last):
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 287, in check
  self._check()
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 372, in _check
  try: self.parseUrl()
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 524, in parseUrl
  self.parse_html();
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 543, in parse_html
  h = LinkParser(self.getContent())
File "/usr/local/lib/python2.3/site-packages/linkcheck/linkparse.py", line 72, in __init__
  self.feed(self.content)
File "/usr/local/lib/python2.3/site-packages/linkcheck/linkparse.py", line 89, in startElement
  name = linkname.href_name(self.content[self.pos():])
MemoryError
System info:
LinkChecker 1.10.1
Python 2.3.2 (#1, Oct 24 2003, 14:36:59)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4

******** LinkChecker internal error, bailing out ********
Traceback (most recent call last):
File "/usr/local/bin/linkchecker", line 449, in ?
  linkcheck.checkUrls(config)
File "/usr/local/lib/python2.3/site-packages/linkcheck/__init__.py", line 63, in checkUrls
  config.checkUrl(config.getUrl())
File "/usr/local/lib/python2.3/site-packages/linkcheck/Config.py", line 339, in checkUrl_Threads
  self.threader.startThread(url.check, ())
File "/usr/local/lib/python2.3/site-packages/linkcheck/Threader.py", line 56, in startThread
  t.start()
File "/usr/local/lib/python2.3/threading.py", line 410, in start
  _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

[A second variant of the MemoryError traceback ends in UrlData.py instead of linkparse.py:]

File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 555, in parse_html
  line=line, column=column, name=name))
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 144, in GetUrlDataFrom
  line=line, column=column, name=name)
File "/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py", line 47, in __init__
  column=column, name=name)
File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 199, in __init__
  self.extern = (1, 0)
MemoryError

[The internal-error banner and these two MemoryError tracebacks repeat many more times, each followed by "LinkChecker internal error, bailing out", until the run ends.]

bifrost:/home/arussell$

Internal Error on MacOS X 10.2.2

Converted from SourceForge issue 652560, submitted by chris01

I installed the new version of linkchecker on MacOS X 10.2.2 with the fink packaging system. Doing

linkchecker -v -ohtml -r2 -s http://www.eeh.ee.ethz.ch > linkcheck_eeh.html

I got the following errors:


exceptions.AttributeError addinfourl instance has no attribute 'readlines'
Traceback (most recent call last):
File "/sw/lib/python2.2/site-packages/linkcheck/UrlData.py", line 190, in check
self._check()
File "/sw/lib/python2.2/site-packages/linkcheck/HttpsUrlData.py", line 36, in _check
HttpUrlData._check(self)
File "/sw/lib/python2.2/site-packages/linkcheck/UrlData.py", line 246, in _check
self.checkConnection()
File "/sw/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 91, in checkConnection
if self.config["robotstxt"] and not self.robotsTxtAllowsUrl():
File "/sw/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 290, in robotsTxtAllowsUrl
rp.read()
File "/sw/src/root-python22-2.2.2-1/sw/lib/python2.2/robotparser.py", line 44, in read
lines = f.readlines()
AttributeError: addinfourl instance has no attribute 'readlines'
System info:
LinkChecker 1.8.1
Python 2.2.2 (#1, 11/12/02, 13:19:40)
[GCC Apple cpp-precomp 6.14] on darwin


Linkchecker keeps on running but sends out this error message again and again. Could anybody help me here?

Thanks, Chris.
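The traceback above comes from the stdlib robots.txt parser expecting a file-like object with a `readlines()` method. For reference, a minimal sketch of the robots.txt check LinkChecker performs, using Python 3's `urllib.robotparser` and illustrative rules (not any real site's policy), looks like this:

```python
from urllib import robotparser

# Parse robots.txt rules directly (offline) rather than fetching them;
# the rules below are illustrative only, not from any real site.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch() answers whether a given user agent may crawl a URL.
print(rp.can_fetch("LinkChecker", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("LinkChecker", "http://example.com/public/page.html"))   # True
```

In normal use `rp.set_url(...)` plus `rp.read()` downloads and parses the file in one step, which avoids hand-rolling the `readlines()` call that failed here.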

Fails to check links in Zope

Converted from SourceForge issue 202482, submitted by rasputen

If you run linkchecker on a Zope site, it returns without having checked anything. Here's the output for a run on zope.org:

./linkchecker -v http://www.zope.org/
LinkChecker
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' whithin this
distribution.
Get the newest version at http://linkchecker.sourceforge.net/
Write comments and bugs to [email protected]

Start checking at 02.03.2000 15:10:18

URL http://www.zope.org/
Real URL http://www.zope.org/
Result Valid: 200 OK

Thats it. 0 warnings, 0 errors found.
Stopped checking at 02.03.2000 15:10:19

Windows not actually supported

Converted from SourceForge issue 857748, submitted by brianiac

Windows is listed as a supported OS for this project,
but there are no binaries available.
Windows does not include any compilers, and anyone
using ActivePython is given the error "Python was built
with version 6 of Visual Studio, and extensions need to
be built with the same version of the compiler, but it
isn't installed.".
This seems to indicate that the only way to install
LinkChecker (aside from shelling out >$1K) is to build
Python with MinGW, or find a binary built that way
(suboptimal).

Syntax error when running lc.cgi

Converted from SourceForge issue 783662, submitted by hallcp

I get this error when trying to run lc.cgi:

./lc.cgi
Traceback (innermost last):
File "./lc.cgi", line 28, in ?
import linkcheck.lc_cgi
File "./linkcheck/__init__.py", line 45, in ?
import UrlData
File "./linkcheck/UrlData.py", line 35
print >>sys.stderr, i18n._("""\n********** Oops, I
did it again. *************
^
SyntaxError: invalid syntax

I don't know Python, so this may be something obvious,
but I don't see it. I'm using Python 2.2.1.

linkchecker itself seems to run fine from the command line.

Thanks for any help,

Charles Hall
Raleigh, NC
USA

Not ignoring comments

Converted from SourceForge issue 440276, submitted by ajmitch

Great app, very useful, but I've found that it
doesn't ignore links in commented out sections of
html. Unfortunately I've taken over the maintenance
of a reasonable-size site, where some things were
commented out, and linkchecker reports false
positives on broken links :)
Thanks
Andrew
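One way a checker can avoid such false positives is to let the HTML parser route comments away from link extraction. A minimal sketch (not LinkChecker's actual implementation) using the stdlib `html.parser`, which never delivers markup inside `<!-- ... -->` to the start-tag handler:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values; HTMLParser sends comments to handle_comment,
    so tags inside <!-- ... --> never reach handle_starttag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = '<a href="live.html">ok</a> <!-- <a href="dead.html">old</a> -->'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # only the link outside the comment
```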

[BUG] id attribute not equivalent to named anchor

Converted from SourceForge issue 741131, submitted by nobody

hi,

take a deep breath, and bear with me...

if i have the following (valid and weblinted) html:

---8<---
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>

<head>
<title>named anchor id test</title>
</head>

<body>

<p><a href="#indexa">A</a></p>

<ol>
<li id="indexa">one</li>
<li>two</li>
<li>three</li>
</ol>

</body>

</html>
--->8---

linkchecker 1.8.15 complains: warning anchor #indexa
not found

(it can't find the anchor named "indexa".)

the html4 spec has this[1] to say about anchor names
(<a name="">):

---8<---
This attribute names the current anchor so that it may
be the
destination of another link. The value of this
attribute must be a
unique anchor name. The scope of this name is the
current document.

*** Note that this attribute shares the same name space
as the id
attribute. ***
--->8---

(emphasis mine.)

so i think the above should not cause a warning.
hope that helps,

  • p
    (patricka at mkdoc dot com)

  1. http://www.w3.org/TR/html4/struct/links.html#adef-name-A
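A checker that honors this part of the spec has to collect both `name=` values on `<a>` and `id=` values on any element when building the set of valid fragment targets. A minimal sketch (assumed, not LinkChecker's code):

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Record every potential fragment target: <a name="..."> and any
    element's id attribute share one namespace per the HTML4 spec."""
    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)

doc = '<p><a href="#indexa">A</a></p><ol><li id="indexa">one</li></ol>'
collector = AnchorCollector()
collector.feed(doc)
print("indexa" in collector.anchors)  # the id attribute counts as an anchor
```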

TypeError: in init_dns_resolver_nt unpack non-sequence

Converted from SourceForge issue 864383, submitted by astern

hi,

C:\Python23\Scripts>c:\Python23\python.exe -O
c:\Python23\Scripts\linkchecker
Traceback (most recent call last):
File "c:\Python23\Scripts\linkchecker", line 34, in ?
import getopt, re, os, pprint, socket, linkcheck
File
"c:\Python23\lib\site-packages\linkcheck\__init__.py",
line 46, in ?
import UrlData
File
"c:\Python23\lib\site-packages\linkcheck\UrlData.py",
line 27, in ?
DNS.DiscoverNameServers()
File
"c:\Python23\lib\site-packages\linkcheck\DNS\Base.py",
line 35, in DiscoverNameServers
init_dns_resolver_nt()
File
"c:\Python23\lib\site-packages\linkcheck\DNS\Base.py",
line 95, in init_dns_resolver_nt
count, counttype = subkey['DNSServerAddressCount']
TypeError: unpack non-sequence

C:\Python23\Scripts>

regards A.

Applet codebase incorrectly handled

Converted from SourceForge issue 636802, submitted by majid

I have the following code on one of my pages
(http://www.majid.info/radio/):
<applet code=panoapplet script language="JavaScript"
codebase="http://www.majid.info/images/"
height=266
name=FPViewer1 width=400
archive="panoapplet.jar">
<param name=file
value="http://www.majid.info/images/louvre.ivr">
<param name="autoSpin" value="-50">
</applet>

And linkchecker reports an error because it tries to
load http://www.majid.info/radio/panoapplet.jar instead
of codebase + archive =
http://www.majid.info/images/panoapplet.jar

There seems to be some minimal support for codebase in
UrlData.py, but not as far as prepending it
intelligently to the URL the way <A BASE> is handled.

URL panoapplet.jar
Parent URL http://www.majid.info/radio/, line 46
Real URL http://www.majid.info/radio/panoapplet.jar
Check Time 0.051 seconds
Result Error: 404 Not Found
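The fix the reporter describes amounts to resolving `archive` against `codebase` rather than against the page URL. A sketch with `urllib.parse.urljoin`, using the URLs from the report:

```python
from urllib.parse import urljoin

# Values from the report: the archive should be resolved against the
# applet's codebase, not against the page URL.
page_url = "http://www.majid.info/radio/"
codebase = "http://www.majid.info/images/"
archive = "panoapplet.jar"

# Wrong: resolving against the page gives the 404ing URL from the report.
print(urljoin(page_url, archive))

# Right: resolve codebase against the page first (in case codebase is
# itself relative), then resolve the archive against that base.
base = urljoin(page_url, codebase)
print(urljoin(base, archive))
```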

problem following redirect to https url

Converted from SourceForge issue 784372, submitted by sbrauer

There are some urls on a site I'm checking that
redirect to some https urls. When linkchecker tries to
follow these redirections, it tries to connect to port
80 instead of 443.
The error I get is:

Error: Attempted connect to ('152.2.46.28', 80)
timed out.

Notice how the port in the error message is 80.

To duplicate this problem, you could create a simple
cgi script like this:

#!/bin/sh

echo "Location: https://www-s3.ais.unc.edu/campus_dir/"
echo

and link to it from a static html page. Then run
linkchecker on the html page.

The strange thing is that some other links that
redirect to https urls (on other hosts) don't exhibit
this problem.
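The port-80 error above suggests the redirect target's scheme was not consulted when choosing the port. A minimal sketch of scheme-aware port selection with `urllib.parse` (an assumption about the cause, not LinkChecker's code):

```python
from urllib.parse import urlsplit

def connect_port(url):
    """Return the port to connect to: an explicit port if the URL has one,
    otherwise the scheme's default (443 for https, 80 for http)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80

print(connect_port("https://www-s3.ais.unc.edu/campus_dir/"))  # 443
print(connect_port("http://example.com/"))                     # 80
print(connect_port("http://example.com:8080/"))                # 8080
```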

Problem with HEAD requests

Converted from SourceForge issue 680426, submitted by majid

When checking a site with Amazon.com links, linkchecker reports
timeout errors. Apparently Amazon just hangs when a
HEAD request to a URL with an affiliate code is
submitted, such as
http://www.amazon.com/exec/obidos/ASIN/0393320928/fazalmajidswe-20
I don't know if this is normal Amazon.com behavior or if
this is just a temporary situation.
A GET request works perfectly.

There is apparently some support for falling back from
HEAD to GET if an error code such as a 405 is
encountered, but not if a timeout is encountered.

There should be an option to use GET always instead of
HEAD (perhaps with the use of the If-Modified-Since
header, although that also makes Amazon hang)

I first encountered this issue when testing the new
1.8.7 release, but it may have been lurking for longer
than that.
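The requested behavior, falling back from HEAD to GET on a timeout as well as on an error code like 405, can be sketched with a request function injected for illustration (the `fake_server` below is hypothetical, standing in for a server that hangs on HEAD as the report describes):

```python
import socket

def check_url(do_request):
    """Try a HEAD request first; on timeout or a 405 response, fall back
    to GET. do_request(method) returns a status code or raises
    socket.timeout."""
    try:
        status = do_request("HEAD")
        if status != 405:
            return "HEAD", status
    except socket.timeout:
        pass  # some servers hang on HEAD, as the report describes
    return "GET", do_request("GET")

# Simulated server that hangs on HEAD but answers GET.
def fake_server(method):
    if method == "HEAD":
        raise socket.timeout("read timed out")
    return 200

print(check_url(fake_server))  # ('GET', 200)
```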

Internal error

Converted from SourceForge issue 634679, submitted by nobody

linkchecker was called as:

linkchecker -r 3 http://www.math.lsu.edu

and the following error showed up after about an hour
of checking. I can be reached at [email protected] if
there are further questions.

exceptions.AttributeError 'None' object has no
attribute 'read'
Traceback (most recent call last):
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 260, in check
self._check()
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 339, in _check
try: self.parseUrl()
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 476, in parseUrl
bases = self.searchInForTag(BasePattern)
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 504, in searchInForTag
match =
pattern['pattern'].search(self.getContent(), index)
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 259, in getContent
self.data = self.urlConnection.read()
AttributeError: 'None' object has no attribute 'read'
System info:
LinkChecker 1.6.3
Python 2.1.3 (#1, Sep 7 2002, 15:29:56)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
LC_ALL = 'en_US'
http_proxy = 'http://erdos.math.lsu.edu:3128'
ftp_proxy = 'http://erdos.math.lsu.edu:3128'

********** Oops, I did it again. *******

Converted from SourceForge issue 601707, submitted by nobody

linkchecker -v -ohtml -r2 -s http://www.servery.cz >
dd.html

exceptions.AttributeError addinfourl instance has no
attribute 'readlines'
Traceback (most recent call last):
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 253, in check
self._check()
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpsUrlData.py",
line 35, in _check
HttpUrlData._check(self)
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 310, in _check
self.checkConnection()
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 88, in checkConnection
if self.config["robotstxt"] and not
self.robotsTxtAllowsUrl():
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 280, in robotsTxtAllowsUrl
rp.read()
File "/usr/lib/python2.1/robotparser.py", line 44, in
read
lines = f.readlines()
AttributeError: addinfourl instance has no attribute
'readlines'
System info:
LinkChecker 1.6.0
Python 2.1.3 (#1, Aug 25 2002, 10:07:39)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
http_proxy = ''
