xroche / httrack
HTTrack Website Copier, copy websites to your computer (Official repository)
Home Page: http://www.httrack.com/
License: Other
What steps will reproduce the problem?
1. Download a site redirecting to the same page, or to an already crawled page,
with different cookie settings
What is the expected output? What do you see instead?
The page should be downloaded again with the new cookie settings, but it is not
because of the page dedup, which is based on URI/GET parameters only.
Please use labels and text to provide additional information.
See http://forum.httrack.com/readmsg/31101
Original issue reported on code.google.com by xroche
on 6 Jun 2013 at 8:18
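The dedup problem described above can be illustrated with a minimal sketch. It assumes a crawler that keeps a set of already-seen keys; `dedup_key` and `should_fetch` are hypothetical names, not HTTrack functions. The only point is that adding the cookie set to the key makes the same URL count as a new page when the cookies differ.

```python
from urllib.parse import urlsplit


def dedup_key(url, cookies=None):
    """Build a hypothetical dedup key from the URL plus the cookies sent with it."""
    parts = urlsplit(url)
    # Sort the cookies so that the same set always produces the same key.
    cookie_part = tuple(sorted((cookies or {}).items()))
    return (parts.scheme, parts.netloc.lower(), parts.path, parts.query, cookie_part)


def should_fetch(seen, url, cookies=None):
    """Return True the first time a (URL, cookies) pair is seen."""
    key = dedup_key(url, cookies)
    if key in seen:
        return False
    seen.add(key)
    return True


seen = set()
print(should_fetch(seen, "http://example.com/page", {}))                  # first visit
print(should_fetch(seen, "http://example.com/page", {}))                  # same URL, same cookies
print(should_fetch(seen, "http://example.com/page", {"session": "abc"}))  # same URL, new cookies
```

With a key built from the URI and GET parameters only, the third call would be treated as a duplicate, which is the behaviour reported here.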
What steps will reproduce the problem?
1. Image with onXXX properties such as
<img onMouseOver="src='i/1_home.gif'" onMouseOut="src='i/2_home.gif'"
alt='Home' src='i/a_home.gif'>
2. Crawl the page
What is the expected output? What do you see instead?
The two OnXXX related images are not captured
Original issue reported on code.google.com by xroche
on 28 Feb 2013 at 8:25
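As a rough illustration of what detecting these images could look like, the sketch below pulls `src='...'` assignments out of `on*` handler attributes with regular expressions. It is not HTTrack's parser: the function name is hypothetical, and it only handles double-quoted handlers containing single-quoted URLs, as in the example tag from this report.

```python
import re

# Matches on* attributes with a double-quoted value, e.g. onMouseOver="src='x.gif'".
HANDLER_RE = re.compile(r'\bon\w+\s*=\s*"([^"]*)"', re.IGNORECASE)
# Matches a src='...' assignment inside the handler body.
SRC_ASSIGN_RE = re.compile(r"\bsrc\s*=\s*'([^']+)'")


def handler_image_urls(html):
    """Collect URLs assigned to src inside on* event handlers."""
    urls = []
    for body in HANDLER_RE.findall(html):
        urls.extend(SRC_ASSIGN_RE.findall(body))
    return urls


tag = ("<img onMouseOver=\"src='i/1_home.gif'\" onMouseOut=\"src='i/2_home.gif'\" "
       "alt='Home' src='i/a_home.gif'>")
print(handler_image_urls(tag))
```

The plain `src` attribute is deliberately left out, since a normal link scan already covers it.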
What steps will reproduce the problem?
1. Try to mirror a website that dynamically serves .torrent files
What is the expected behavior ? What do you get instead?
HTTrack should treat the content as files, but it doesn't because it doesn't
recognize this particular MIME type.
What version of httrack are you using? On what operating system?
HTTrack version 3.47-21+libhtsjava.so.2 on Debian Wheezy
Trivial patch attached.
Original issue reported on code.google.com by [email protected]
on 14 Jul 2013 at 6:44
Attachments:
What steps will reproduce the problem?
1. Mirror a website with depth -r3 and with --debug-headers
What is the expected behavior ? What do you get instead?
All Referer headers in hts-ioinfo.txt have the same referer which is the
starting URL, even though not all pages are reachable in one hop from the
starting URL. Expected is for Referer to be set to the URL of the page from
which the link was followed.
What version of httrack are you using? On what operating system?
HTTrack version 3.47-21+libhtsjava.so.2 on Debian Wheezy.
For a moment there I thought the server in question was checking Referer
headers, so I made a patch for this, but it turned out that the Referer headers
did not matter in that case. Here's the one-line patch anyway.
Original issue reported on code.google.com by [email protected]
on 14 Jul 2013 at 6:40
Attachments:
What steps will reproduce the problem?
1. Open any saved .txt file.
2. Middle-click the Notepad button on the taskbar.
3. The HTTrack readme appears instead of a blank Notepad window.
What is the expected behavior ? What do you get instead?
I expect a blank Notepad window. I get the HTTrack readme instead. I don't
know how to restore the original behavior. Luckily, uninstalling HTTrack fixes it.
What version of httrack are you using? On what operating system?
3.47.27 from the Product version of the installer exe file.
I'm using Windows 8.1 64-bit Professional.
Please provide any additional information below.
Please don't change this shortcut. This is why I don't like granting privilege
elevation to any app.
Original issue reported on code.google.com by [email protected]
on 17 Mar 2014 at 3:55
What steps will reproduce the problem?
1. Download a page containing links with filenames embedding non-ascii
characters
What is the expected output? What do you see instead?
Bad request sent to the server because of buggy encoding. According to RFC
3986, UTF-8 should be used with URL-encoding.
What version of the product are you using? On what operating system?
3.47.12
Please provide any additional information below.
Reported by Steven Hsiao (http://forum.httrack.com/readmsg/31050/index.html)
Original issue reported on code.google.com by xroche
on 18 May 2013 at 4:48
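To make the expected encoding concrete, here is a small sketch using Python's standard `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of a string by default. The path is an invented example and `encode_path` is a hypothetical helper; this shows the encoding the report asks for, not HTTrack's code.

```python
from urllib.parse import quote


def encode_path(path):
    """Percent-encode a URL path using UTF-8, keeping "/" as the separator."""
    return quote(path, safe="/")


# UTF-8: "é" becomes the two bytes C3 A9.
print(encode_path("/files/é.html"))
# For contrast, a Latin-1 encoding of the same character gives a single byte.
print(quote("/files/é.html", safe="/", encoding="latin-1"))
```

A server expecting UTF-8 would typically not recognise the second form, which is one way a request for a non-ASCII filename can fail.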
What steps will reproduce the problem?
'brew install httrack'
What is the expected behavior ? What do you get instead?
Expected download and installation of HTTrack software, following error was
received instead:
┌─[peter@foo] - [~] - [Thu Aug 01, 12:44]
└─[$] <> brew install httrack
==> Downloading http://download.httrack.com/httrack-3.46.1.tar.gz
######################################################################## 100.0%
Error: SHA1 mismatch
Expected: be6328d2ff3cbabd21426b7acc54edcf1ebb76e0
Actual: 2ba3da7784bcd67ff98ff09c419cfb700c97ba5b
Archive: /Library/Caches/Homebrew/httrack-3.46.1.tar.gz
(To retry an incomplete download, remove the file above.)
What version of httrack are you using? On what operating system?
httrack-3.46.1 on OS X (v10.8.4),
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 1 Aug 2013 at 7:49
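For anyone wanting to check a downloaded archive by hand, a minimal sketch of the comparison Homebrew reports above follows. It uses Python's standard `hashlib`; the helper name is hypothetical, and the in-memory stream only stands in for the real tarball.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-1 digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# The two digests quoted in the error message above.
expected = "be6328d2ff3cbabd21426b7acc54edcf1ebb76e0"
actual = "2ba3da7784bcd67ff98ff09c419cfb700c97ba5b"
print("match" if expected == actual else "mismatch")

# Stand-in data; for a real file use open(path, "rb") instead of BytesIO.
print(sha1_of_stream(io.BytesIO(b"abc")))
```

A mismatch like this means the downloaded bytes differ from what the formula was written against, for example because the archive was replaced upstream or the download was corrupted.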
What steps will reproduce the problem?
1. This rev introduced a change to try and prevent the WinHTTrack window from
stealing focus: https://code.google.com/p/httrack/source/detail?r=1290
2. There are two use cases to test this change. First, WinHTTrack is minimised
to the taskbar when it completes a mirror. Second, WinHTTrack is not minimised
to the taskbar - but is not the active window - when it completes a mirror.
3. The fix only works for use case one; the problem still exists with use
case two.
What is the expected behavior ? What do you get instead?
The WinHTTrack window will not force itself to the top when a mirror completes.
Instead, it still does so.
What version of httrack are you using? On what operating system?
v3.48.19, Win 7 64bit.
Please provide any additional information below.
Forum thread is here: http://forum.httrack.com/readmsg/33162/index.html
As stated, one use case has been fixed - so progress has been made, but the
other use case is still a problem.
Original issue reported on code.google.com by [email protected]
on 5 Aug 2014 at 11:58
What steps will reproduce the problem?
1. Download some web addresses
2. WinHTTrack always crashes after some time
What is the expected behavior ? What do you get instead?
I expect no crash
What version of httrack are you using? On what operating system?
WinHttrack 3.48.3, german
Windows XP
Please provide any additional information below.
Message in crashes.txt:
HTTrack 3.48.3 closed at '..\httrack\htsinthash.c', line 788
Reason:
assert failed: ! "hashtable internal error: cuckoo/stash collision"
Original issue reported on code.google.com by [email protected]
on 4 May 2014 at 1:37
You probably already know that httrack is unable to emulate browser behaviour
for some kinds of JavaScript, but I'm reporting this in case you want to look
into it:
URL: http://hemerotecadigital.bn.br/acervo-digital/norte-goyaz/120685
What is expected? The HTML page and all related PDF files are downloaded.
What occurs? Only the HTML page is downloaded, with the <base href/> tag removed
and no further retrievals.
I'm running WinHTTrack Website Copier 3.47-27 on Win7 SP1. I can't share full
settings for this project, but it is set to "Get non-HTML files related to a
link" and with a bunch of +http://memoria.bn.br/* filters
Original issue reported on code.google.com by [email protected]
on 7 Apr 2014 at 10:41
What steps will reproduce the problem?
1. Try to download http://gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_0.shtml
What is the expected behavior ? What do you get instead?
It should download correctly. Instead, it stopped and reported an error.
What version of httrack are you using? On what operating system?
HTTrack3.48-13+htsswf+htsjava
Please provide any additional information below.
HTTrack3.48-13+htsswf+htsjava launched on Tue, 17 Jun 2014 21:55:17 at
http://gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_0.shtml -* +*.png
+*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
+http://gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_*.shtml
(winhttrack -qwC2%Ps0u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5
(compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack
Website Copier/3.x [XR&CO'2013], %s -->" -%l "en, *"
http://gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_0.shtml -O1 "C:\My Web
Sites\娘子军舞蹈用腿开枪网络爆红" -* +*.png +*.gif +*.jpg +*.css
+*.js -ad.doubleclick.net/* -mime:application/foobar
+http://gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_*.shtml )
Information, Warnings and Errors reported for this mirror:
21:55:19 Error: "Error when decompressing" (-1) at link
gz.ifeng.com/zaobanche/detail_2014_06/14/2429049_0.shtml (from primary/primary)
21:55:19 Warning: No data seems to have been transferred during this session!
: restoring previous one!
Similar to issue 38: https://code.google.com/p/httrack/issues/detail?id=38
This is not limited to this website; many others produce the same error.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2014 at 7:57
Enclosed is a sample project "test-2" in which some non-HTML files are
renamed locally with an ".html" extension.
This prevents many browsers from displaying them correctly.
Many files with the ".html" extension are actually PDFs, for example in the folder
"test-2\eco.uninsubria.it\webdocenti\amira\inferenza\":
"12set01_inf.html", "26-mar-02A.html", "2lug01_inf.html", ...
Note: I have replaced all the real PDF files with zero-byte files,
except "12set01_inf.html".
A second, less important problem:
In the same project, some folders are empty or contain only
other folders without files.
Some HTML files (real HTML this time) link to local files that do not
exist because they were not downloaded.
When a file does not exist locally, even because of a download error, the
corresponding links in the HTML file should be absolute (http://...).
For example, in project "test-2":
The folder "test-2\aim.unipv.it\" contains no files, only subfolders.
The file "test-2\eco.uninsubria.it\webdocenti\amira\inferenza\prog.html" has the link
href="../../../../aim.unipv.it/_anto/prog-andati.ps".
"prog.html" should instead have the link
href="http://aim.unipv.it/~anto/prog-andati.ps", because the file
"prog-andati.ps" does not exist locally.
Version of httrack: 3.47-27
My operating system is Windows 7.
Original issue reported on code.google.com by [email protected]
on 16 Oct 2013 at 2:46
Attachments:
Hi,
Does httrack support aarch64 now?
Thanks.
Original issue reported on code.google.com by Cickumqt
on 13 Sep 2013 at 5:53
I'm trying to mirror websites powered by WordPress. But because some pages have
very long URLs (such as
www.ambiente.sp.gov.br/cea/guia-bibliografico/bases-para-conservacao-e-uso-sustentavel-do-cerrado-paulistasecretaria-de-estado-do-meio-ambiente-smaprograma-estadual-para-a-conservacao-da-biodiversidade-probio/
) and Windows limits the length of folder and file names, it generates lots of
"serialize error"s.
The build options available at http://www.httrack.com/html/fcguide.html don't
satisfy my needs, so I'm asking for a new feature:
Place HTML pages in site_name/web/randomnames and all other files under the
default full URL name.
Original issue reported on code.google.com by [email protected]
on 4 May 2013 at 1:31
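One way the requested naming scheme could work is sketched below. Instead of random names it uses a truncated SHA-1 of the URL, so the same page always maps to the same short file name. The `web/` folder follows the request above; the function name, the 16-character digest length, and the example URL are assumptions made for illustration, not an existing HTTrack option.

```python
import hashlib
from urllib.parse import urlsplit


def short_local_name(url, extension=".html", digest_chars=16):
    """Map a (possibly very long) URL to a short, stable local file name."""
    host = urlsplit(url).netloc
    # Hash the full URL so that distinct pages get distinct names.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:digest_chars]
    return f"{host}/web/{digest}{extension}"


long_url = "http://example.com/" + "very-long-segment-" * 20
print(short_local_name(long_url))
```

The trade-off is that the local name no longer reflects the original path, so a separate index from name to URL would be needed to keep the mirror browsable by hand.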
What steps will reproduce the problem?
1.help us
2.
3.
What is the expected behavior ? What do you get instead?
What version of httrack are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Sep 2013 at 12:24
Some websites are available on more than one domain without being mirrors:
a single website reachable under several domains (example: somewebsite.com and
websitewebsite.com display exactly the same content, hosted on exactly the same
server).
Please add a feature that lets users declare such cases to httrack, so that it
acts accordingly and does not fetch the same page/file from both URLs.
Original issue reported on code.google.com by [email protected]
on 24 Jun 2013 at 1:46
What steps will reproduce the problem?
1. httrack https://twitter.com/TSBible -%N0
What is the expected behavior ? What do you get instead?
20:43:46 Warning: file not stored in cache due to bogus state (broken size,
expected 217392 got 888): https://twitter.com/TSBible?lang=gl
20:43:46 Warning: file not stored in cache due to bogus state (broken size,
expected 217525 got 908): https://twitter.com/TSBible?lang=it
What version of httrack are you using? On what operating system?
3.48.3
Please provide any additional information below.
Reported by http://forum.httrack.com/readmsg/32672/index.html
Original issue reported on code.google.com by xroche
on 14 Apr 2014 at 6:45
What steps will reproduce the problem?
1. mirror https://tw.money.yahoo.com/international-news
2. original url ex:"/美股指數期貨最新報價-13-37-060749261.html"
3. httrack get link
ex:"https://tw.money.yahoo.com/ގ股指數期貨最新報價-15-28-075053946.html"
What is the expected behavior ? What do you get instead?
expected behavior:
https://tw.money.yahoo.com/美股指數期貨最新報價-15-28-075053946.html
=>correct url for download
error: get 404 error for wrong url
What version of httrack are you using? On what operating system?
3.48.13
Original issue reported on code.google.com by [email protected]
on 8 Jul 2014 at 8:29
1- Info
Choose any link from
http://sistemasinter.cetesb.sp.gov.br/produtos/produto_consulta_completa.asp
whose name contains special characters (é, ó, ô, ê...), for example
"ACETALDEÍDO" (the first entry starting with A).
In some browsers you will get the expected page, but in others you will get a
page with a 500 error.
I got the correct page with Chrome 27.0.1453.94 m on Win7 Home Premium SP1 64-bit,
but the error page with WinHTTrack 3.47-14 on the same OS.
Firefox 21.0 on a WinXP machine produced the same error as httrack, but exactly
the same Firefox version on the current machine (described in the previous line)
works well.
2- Full error message from page
ADODB.Field error '80020009'
Either BOF or EOF is True, or the current record has been deleted. Requested
operation requires a current record.
/produtos/ficha_completa1.asp, line 0
Original issue reported on code.google.com by [email protected]
on 29 May 2013 at 10:16
Attachments:
On every run of a given project I get too many "bogus state (incomplete
type)" errors, and on every run they affect exactly the same files.
It seems clear to me that this is an httrack fault rather than a server one.
My suggestion is to add a feature that tries to resume the download of binary
files that hit "bogus state (incomplete type)" errors.
Running WinHTTrack Website Copier 3.46 x64 on Win7 SP1.
Original issue reported on code.google.com by [email protected]
on 30 Mar 2013 at 4:07
Attachments:
What steps will reproduce the problem?
1. Install httrack-3.48.9.exe on Windows 2000
2. Start WinHTTrack.exe
What is the expected behavior ? What do you get instead?
Instead of the program starting, an error message dialog appears:
"The procedure entry point SetDllDirectoryA could not be located in the dynamic
link library KERNEL32.dll"
What version of httrack are you using? On what operating system?
httrack-3.48.9.exe on Windows 2000
Please provide any additional information below.
Unsatisfied reference to SetDllDirectoryA is found in libhttrack.dll
Original issue reported on code.google.com by [email protected]
on 4 Jun 2014 at 3:50
What steps will reproduce the problem?
1. install httrack via MacPorts
2. try to crawl https:// sites
Please provide any additional information below.
A link from /usr/lib/libssl.so to /usr/lib/libssl.dylib is a possible workaround
Original issue reported on code.google.com by xroche
on 5 Apr 2013 at 4:05
What steps will reproduce the problem?
1. Point HTTrack to download.crystalbuntu.com
What is the expected output? What do you see instead?
Expecting gptsync file to still be called gptsync, however it gets renamed to
gptsynchtml.html
What version of the product are you using? On what operating system?
3.46
Please provide any additional information below.
attempting to use HTTrack to mirror this website, download.crystalbuntu.com and
it works great except for that one issue.
note that I'm pointing IIS directly at the downloaded files area.
Original issue reported on code.google.com by [email protected]
on 25 Feb 2013 at 5:04
Attachments:
What steps will reproduce the problem?
1. Run HTTrack with a URL list file for the mirror (HTTrack.exe -%L startURL.4016)
2. hts-log.txt shows Error: Could not include URL list: startURL.4016
What is the expected behavior ? What do you get instead?
Expected behavior: HTTrack adds the links from the URL list file
Instead: Error: Could not include URL list: startURL.4016
What version of httrack are you using? On what operating system?
httrack 3.48-14 for windows xp
Original issue reported on code.google.com by [email protected]
on 9 Jul 2014 at 6:06
What steps will reproduce the problem?
1. Log in to a site with authentication that uses a cookie where the domain is of
the form:
.domain.org TRUE /path FALSE 0000000000 key value
Note the leading dot, which, afaiu, means it should match domain.org and all
its subdomains.
2. Export the cookies.txt into the httrack mirror directory
3. Try to mirror the website with --debug-headers to see the cookies
What is the expected behavior ? What do you get instead?
HTTrack is expected to find the cookie in cookies.txt when requesting URLs
like http://domain.org/path and append that cookie to the request. The header
log shows that the cookie is not appended.
What version of httrack are you using? On what operating system?
CLI version on Debian Wheezy: HTTrack version 3.47-21+libhtsjava.so.2
I think this is because this condition does not hold in this case:
htsbauth.c:cookie_find: (int) strlen(chk_dom) <= (int) strlen(domain)
Tentative patch against the Debian source package is attached -- tested with my
particular website. I didn't read the RFC so perhaps it's not what we want, but
it fixed my issue.
Original issue reported on code.google.com by [email protected]
on 14 Jul 2013 at 6:18
Attachments:
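For reference, the matching behaviour the reporter expects can be sketched as follows. It roughly follows the domain-matching rule described in RFC 6265, where a leading dot is ignored and the cookie applies to the domain itself and to its subdomains. This is not the attached patch and not HTTrack's `cookie_find`; the function name is hypothetical.

```python
def cookie_domain_matches(cookie_domain, request_host):
    """Return True if a cookie set for cookie_domain applies to request_host."""
    # Ignore case and a leading dot: ".domain.org" is treated like "domain.org".
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    if not cookie_domain:
        return False
    # Exact match, or request_host is a subdomain of cookie_domain.
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


print(cookie_domain_matches(".domain.org", "domain.org"))
print(cookie_domain_matches(".domain.org", "www.domain.org"))
print(cookie_domain_matches(".domain.org", "evildomain.org"))
```

Note that ".domain.org" is one character longer than "domain.org", so a check that compares string lengths before stripping the dot would reject this case, which is consistent with the reporter's diagnosis.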
Check name restrictions wrt. "con", "aux", and friends.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
Original issue reported on code.google.com by xroche
on 7 Oct 2013 at 6:38
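A minimal sketch of such a check is shown below. The reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) come from the Microsoft naming documentation linked above, which also advises against using them with an extension. The function names and the underscore prefix are illustrative choices, not HTTrack behaviour.

```python
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_windows_name(filename):
    """Return True if the part before the first dot is a reserved device name."""
    stem = filename.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_NAMES


def safe_name(filename):
    """Prefix reserved names with an underscore (an arbitrary, illustrative choice)."""
    return "_" + filename if is_reserved_windows_name(filename) else filename


for name in ("con", "aux.html", "console.html", "nul.txt"):
    print(name, "->", safe_name(name))
```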
And this isn't a server issue.
If I go directly to the URL in a browser I get the same error, but if I follow
the link from a page on the same server it downloads successfully. Maybe an
issue with the Referer on the httrack side, or an encoding issue?
20:49:01 381/381 ---M-- 404 error
('Not%20Found') text/html date:Mon,%2012%20Aug%202013%2023:54:36%20GMT http://li
cenciamento.ibama.gov.br/Hidreletricas/Belo%20Monte/Outros%20Documentos/Acompanh
amento%20da%20LI_condicionantes/CE%20NE%20469_2011-DS_condicionante%202.4/03-Pro
jeto%20Basico%20Linhas%20de%20Transmissao/LT%2034.5%20KV%20SE%20-%20BM%20-%20PI/
LT-3#L~8.DWG N:/[Ambiente]/0-IBAMA/Hidr/web/delayed/lt-3#l_8.d.delayed (from
http://licenciamento.ibama.gov.br/Hidreletricas/Belo%20Monte/Outros%20Documentos
/Acompanhamento%20da%20LI_condicionantes/CE%20NE%20469_2011-DS_condicionante%202
.4/03-Projeto%20Basico%20Linhas%20de%20Transmissao/LT%2034.5%20KV%20SE%20-%20BM%
20-%20PI/)
Original issue reported on code.google.com by [email protected]
on 16 Aug 2013 at 7:20
What steps will reproduce the problem?
1. Open http://www.httrack.com/history.txt
2. Last update is for 3.46-1
3. 3.47.2 is missing
What is the expected output? What do you see instead?
Expected to see the full change log, but history.txt doesn't seem to have been
updated.
Please update accordingly.
Original issue reported on code.google.com by [email protected]
on 24 Apr 2013 at 12:56
ISSUE
When I use httrack to fetch a Python script, it warns that it couldn't create a
temporary reference file.
Whatever a reference file is, the download works without one.
Can you stop the warning message from being displayed?
REPRO
In an empty directory, do
$ httrack -g http://sebsauvage.net/python/html2csv.py
In the output you will see a message like
Warning: Could not create temporary reference file for
sebsauvage.net/python/html2csv.py
When the download is complete, check that the file is really there:
$ ls -l
-rw-r--r-- 1 sandport sandport 6021 Apr 4 2006 html2csv.py
So the file downloaded just fine.
I'm using HTTrack3.47-21+libhtsjava.so.2 on Xubuntu 13.04 64-bit.
I originally reported this on the forum:
http://forum.httrack.com/readmsg/32477/index.html
Google Code looks like a more appropriate place for bug reports.
EXAMPLE LOG
"""
$ httrack -g <http://sebsauvage.net/python/html2csv.py>
HTTrack3.47-21+libhtsjava.so.2 launched on Mon, 10 Feb 2014 20:47:02 at
<http://sebsauvage.net/python/html2csv.py>
(httrack -g <http://sebsauvage.net/python/html2csv.py> )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
such as username/password authentication for websites mirrored in this
project
do not share these files/folders if you want these information to remain
private
Mirror launched on Mon, 10 Feb 2014 20:47:02 by HTTrack Website
Copier/3.47-21+libhtsjava.so.2 [XR&CO'2013]
mirroring <http://sebsauvage.net/python/html2csv.py> with the wizard help..
20:47:32 Warning: Could not create temporary reference file for
sebsauvage.net/python/html2csv.py
1/2: sebsauvage.net/python/html2csv.py (6021 bytes) - OK
HTTrack Website Copier/3.47-21 mirror complete in 31 seconds : 1 links
scanned, 1 files written (6021 bytes overall) [6328 bytes received at 204
bytes/sec]
(No errors, 1 warnings, 0 messages)
Done.
Thanks for using HTTrack!
$ ls
html2csv.py
"""
Original issue reported on code.google.com by [email protected]
on 12 Feb 2014 at 6:05
What steps will reproduce the problem?
1. Download
file:///C:/temp/websites/test%20accent/www.bbc.co.uk/dna/mbarchers/NF26939435a56
.html
2. View the page
What is the expected output? What do you see instead?
Accents are buggy and the correct charset is superseded by the original (buggy)
one in the html source code
Please use labels and text to provide additional information.
Reported by Justme at http://forum.httrack.com/readmsg/30487/index.html
Original issue reported on code.google.com by xroche
on 28 Feb 2013 at 2:48
Using the attached configuration file, httrack doesn't download .rar files;
the links point to external.html instead
(such as
external.html?link=http://comitespcj.org.br/images/Download/SC_Dados-Ptos-Interesse_22-07-13.rar
external.html?link=http://comitespcj.org.br/images/Download/SC_Vazoes-1930-2012.rar
)
Issues with .ar domain names?
Original issue reported on code.google.com by [email protected]
on 8 Nov 2013 at 2:44
Attachments:
Even with "No error pages" and "No external pages" selected, and with "Do
not purge old files" unselected, WinHTTrack 3.47-19 keeps those files without
any purging attempt.
[project root]/CETESB/sistemasinter.cetesb.sp.gov.br/emergencia/graf_regiao2.html
is 0 bytes and
http://sistemasinter.cetesb.sp.gov.br/emergencia/graf_regiao2.html
is a 404 error page
[project root]/licenciamento.cetesb.sp.gov.br/legislacao/estadual/decretos/decreto_33499.html
is a 404 error page
[project root]\www.cetesb.sp.gov.br\userfiles\image\mudancasclimaticas\proclima\image\fotos_eventos\seminario_impactos\images\IMG_9589_jpg_jpg.html.readme
was generated for
[project root]/www.cetesb.sp.gov.br/userfiles/image/mudancasclimaticas/proclima/image/fotos_eventos/seminario_impactos/images/IMG_9589_jpg_jpg.html
, a 404 error page. According to the system timestamp, it was created during my HTTP /1.0
run (changed to "force old 1.0" because lots of binary files were renamed as
.html), but the previous examples were from my current, HTTP 1.0, run
The size of old.XX and new.XXX differs due to changes in my filter settings.
Original issue reported on code.google.com by [email protected]
on 25 Jun 2013 at 7:32
What steps will reproduce the problem?
1. Crawl a page with a long query string that includes non-ASCII characters
What is the expected output? What do you see instead?
The mirror ends abruptly.
Please use labels and text to provide additional information.
http://forum.httrack.com/readmsg/32749/index.html
Original issue reported on code.google.com by xroche
on 2 May 2014 at 6:31
What steps will reproduce the problem?
n/a
What is the expected behavior ? What do you get instead?
assertion failure at htscore.c:244 (len + liensbuf->string_buffer_size <
liensbuf->string_buffer_capa)
What version of httrack are you using? On what operating system?
3.48.10
Please provide any additional information below.
http://forum.httrack.com/readmsg/32922/index.html
Original issue reported on code.google.com by xroche
on 6 Jun 2014 at 3:47
What steps will reproduce the problem?
1. Add url: "www.tsw-builder.com"
2. Start grab
3. Images are not downloaded
What is the expected behavior ? What do you get instead?
Images should download. Instead, they do not.
What version of httrack are you using? On what operating system?
3.47-27
Windows 7 64-bit
Please provide any additional information below.
Open "www.tsw-builder.com" and click any of the weapon icons in the top left
corner. A list will appear below. Expand any section by clicking on it. The
ability icons that appear DO NOT download.
At first I thought it was because the lists do not appear until you select a
weapon and maybe the images are hidden. So I selected a weapon and copied the
new URL and input it into httrack. For example: http://www.tsw-builder.com/#15vp
When you do this, an error will appear in the error log:
23:48:37 Error: "Unable to get server's address: The requested name is valid,
but no data of the " (-5) after 2 retries at link primary/vp (from
primary/primary)
Maybe I'm reading it wrong, but it seems like httrack isn't properly handling
the pound sign (#) in the URL.
Original issue reported on code.google.com by [email protected]
on 27 Jul 2013 at 3:50
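On the pound sign: the part of a URL after `#` is a fragment, which a browser keeps to itself and does not send to the server in the HTTP request. A crawler would normally strip it before fetching, as in this sketch based on Python's standard `urllib.parse.urldefrag`. The helper name is hypothetical, and this illustrates the URL handling only, not how HTTrack parses links.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return the URL without its #fragment part."""
    return urldefrag(url).url


url = "http://www.tsw-builder.com/#15vp"
print(strip_fragment(url))       # what would actually be requested
print(urldefrag(url).fragment)   # what stays on the client side
```

If the ability icons only appear after a script reads the fragment, which is a guess based on the description above, then fetching the URL without executing that script would return the same page as the site root.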
Reported at http://forum.httrack.com/readmsg/30327/30320/index.html
What steps will reproduce the problem?
1. Download glka.co.il/templates/new_default/new_default.css
What is the expected output? What do you see instead?
images/2222.jpg should be detected, and downloaded (but is not)
What version of the product are you using? On what operating system?
3.46
Original issue reported on code.google.com by xroche
on 25 Feb 2013 at 7:41
What steps will reproduce the problem?
1. mirror http://ut.httrack.com/unicode-links/idna_bogus.html
What is the expected behavior ? What do you get instead?
Expected not to crash.
What version of httrack are you using? On what operating system?
3.48.8
Please provide any additional information below.
http://forum.httrack.com/readmsg/32822/index.html
Original issue reported on code.google.com by xroche
on 19 May 2014 at 7:11
What steps will reproduce the problem?
1. Go to Set options, Scan rules
2. Look at the preset rules (with check boxes): only *.jpg is present in the
first one
What version of httrack are you using? On what operating system?
httrack 3.47-27, Windows 7
What I would like:
Is it possible to add *.jpeg to this list? Different software uses the two
different file extensions for JPEG images, and I don't think your software
treats jpg and jpeg as the same.
Thanks,
Jon
Original issue reported on code.google.com by [email protected]
on 23 Jan 2014 at 3:26
What steps will reproduce the problem?
1. run "webhttrack" on the command line
2. fill out the forms, press "start" on the page where the radio button "Please
adjust connection parameters if necessary, then press FINISH to launch the
mirroring operation." is
3. server crashes
What is the expected output? What do you see instead?
On the command line, I see the following output:
$ webhttrack
/opt/homebrew/bin/webhttrack(58993): launching /usr/bin/open -W
/opt/homebrew/bin/webhttrack(58993): spawning regular browser..
/opt/homebrew/bin/webhttrack: line 166: 59007 Bus error: 10
${BINPATH}/htsserver "${DISTPATH}/" path "${HOME}/websites" lang "${LANGN}" $@
What version of the product are you using? On what operating system?
HTTrack version 3.47 (.11)
Mac OS X
Darwin vienna.local 12.3.0 Darwin Kernel Version 12.3.0: Sun Jan 6 22:37:10
PST 2013; root:xnu-2050.22.13~1/RELEASE_X86_64 x86_64
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 5 May 2013 at 8:18
This is certainly a server-side issue, but is there any way to get those files
despite it? Maybe an advanced option to force saving those files?
Running WinHTTrack Website Copier 3.47-27 on Win7 x64
20:05:51 Warning: file not stored in cache due to bogus state (broken size,
expected 40960 got 40962):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1
20:05:52 Warning: file not stored in cache due to bogus state (broken size,
expected 25088 got 25090):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=2
20:05:52 Warning: file not stored in cache due to bogus state (broken size,
expected 27136 got 27138):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=3
20:05:52 Warning: file not stored in cache due to bogus state (broken size,
expected 34816 got 34818):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=4
20:05:53 Warning: file not stored in cache due to bogus state (broken size,
expected 32768 got 32770):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=5
20:07:45 Warning: file not stored in cache due to bogus state (broken size,
expected 333102 got 333104):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1102
20:07:47 Warning: file not stored in cache due to bogus state (broken size,
expected 60129 got 60131):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1092
20:08:13 Warning: file not stored in cache due to bogus state (broken size,
expected 58789 got 58791):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=892
20:08:18 Warning: file not stored in cache due to bogus state (broken size,
expected 72731 got 72733):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=693
20:08:19 Warning: file not stored in cache due to bogus state (broken size,
expected 66461 got 66463):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=695
20:14:00 Warning: file not stored in cache due to bogus state (broken size,
expected 33930 got 33932):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1237
20:14:00 Warning: file not stored in cache due to bogus state (broken size,
expected 27734 got 27736):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1238
20:14:01 Warning: file not stored in cache due to bogus state (broken size,
expected 135521 got 135523):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1098
20:14:01 Warning: file not stored in cache due to bogus state (broken size,
expected 104016 got 104018):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1100
20:14:02 Warning: file not stored in cache due to bogus state (broken size,
expected 113775 got 113777):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1101
20:14:03 Warning: file not stored in cache due to bogus state (broken size,
expected 115238 got 115240):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=1099
20:15:32 Warning: file not stored in cache due to bogus state (broken size,
expected 646567 got 646569):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=883
20:16:05 Warning: file not stored in cache due to bogus state (broken size,
expected 105845 got 105847):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=698
20:16:12 Warning: file not stored in cache due to bogus state (broken size,
expected 132168 got 132170):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=600
20:16:12 Warning: file not stored in cache due to bogus state (broken size,
expected 849282 got 849284):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=601
20:16:15 Warning: file not stored in cache due to bogus state (broken size,
expected 294110 got 294112):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=598
20:16:15 Warning: file not stored in cache due to bogus state (broken size,
expected 1221725 got 1221727):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=599
20:16:21 Warning: file not stored in cache due to bogus state (broken size,
expected 266148 got 266150):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=557
20:16:22 Warning: file not stored in cache due to bogus state (broken size,
expected 42635 got 42637):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=558
20:16:22 Warning: file not stored in cache due to bogus state (broken size,
expected 94748 got 94750):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=556
20:16:23 Warning: file not stored in cache due to bogus state (broken size,
expected 73377 got 73379):
www.comiteps.sp.gov.br/erapido/plugins/erapido.link/download.php?id=40
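In every warning shown above, the received size is exactly two bytes larger than the expected size. A quick way to check that across a whole log is sketched below. This is only an illustrative helper: `size_deltas` and the sample text are made up for this note and are not part of HTTrack.

```python
import re

# Matches the "expected N got M" part of the warning, even when the
# log line is wrapped between "(broken size," and "expected".
SIZE_RE = re.compile(r"expected\s+(\d+)\s+got\s+(\d+)")


def size_deltas(log_text):
    """Return (got - expected) for every 'broken size' warning in the log."""
    return [int(got) - int(expected)
            for expected, got in SIZE_RE.findall(log_text)]


# Two warnings copied from the log above.
sample = """20:08:19 Warning: file not stored in cache due to bogus state (broken size,
expected 66461 got 66463):
20:14:00 Warning: file not stored in cache due to bogus state (broken size,
expected 33930 got 33932):
"""

print(size_deltas(sample))
```

A constant difference suggests a systematic mismatch between the announced and delivered length rather than random truncation, but the log alone does not say where the two extra bytes come from.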
Original issue reported on code.google.com by [email protected]
on 11 Oct 2013 at 11:24
Whenever I try to download http://www.computerhope.com/,
a dialog box appears with the following message:
**MIRROR ERROR!**
HTTrack has detected that the current mirror is empty. If it was an
update, the previous mirror has been restored.
Reason: the first page(s) either could not be found, or a connection
problem occurred.
=> Ensure that the website still exists, and/or check your proxy settings
<=
I am using HTTrack version 3.47-27 with the Chrome browser on Windows 8.
Additional information: the log file says the following:
HTTrack3.47-27+htsswf+htsjava launched on Wed, 05 Feb 2014 11:58:17 at
http://www.computerhope.com/ +*.png +*.gif +*.jpg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar
(winhttrack -qwC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5
(compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack
Website Copier/3.x [XR&CO'2013], %s -->" -%l "en, *"
http://www.computerhope.com/ -O1 "C:\My Web Sites\aboutcomputer" +*.png +*.gif
+*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
11:58:18 Error: "Forbidden" (403) at link www.computerhope.com/ (from
primary/primary)
11:58:18 Warning: No data seems to have been transferred during this session!
: restoring previous one!
Original issue reported on code.google.com by [email protected]
on 5 Feb 2014 at 6:45
What steps will reproduce the problem?
1. Download a site with non-ascii filenames (such as BIG5 at
http://fms.cto.doh.gov.tw/DOH/Office/procquerypopularize)
What is the expected output? What do you see instead?
Filenames are expected to be correctly encoded
What version of the product are you using? On what operating system?
3.47-12
Please provide any additional information below.
Reported by Steven Hsiao (http://forum.httrack.com/readmsg/31050/index.html)
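As a rough illustration of what "correctly encoded" means here, the sketch below decodes a percent-escaped filename using the page's own charset instead of assuming UTF-8. It assumes the charset (BIG5 in this report) is already known, for example from the HTTP headers or a meta tag; `decode_url_filename` is a hypothetical helper, not an HTTrack function.

```python
from urllib.parse import unquote_to_bytes


def decode_url_filename(escaped, charset):
    """Turn a percent-escaped URL filename into text using the given charset."""
    raw = unquote_to_bytes(escaped)   # undo %XX escapes, keep raw bytes
    return raw.decode(charset)        # interpret the bytes in the page charset


# "%A4%A4%A4%E5" is the BIG5 byte sequence for the two characters below.
escaped = "%A4%A4%A4%E5.html"
print(decode_url_filename(escaped, "big5"))

# Decoding the same bytes as UTF-8 cannot succeed, which is the kind of
# mismatch that leads to garbled filenames.
print(unquote_to_bytes(escaped).decode("utf-8", errors="replace"))
```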
Original issue reported on code.google.com by xroche
on 18 May 2013 at 3:00
What steps will reproduce the problem?
1. Download a site providing a "Content-Range: bytes 0-NNN/NNN" header with a
200 code
What is the expected output? What do you see instead?
The file cannot be downloaded, and fails with the error:
"bogus state (broken size, expected NNN got 0)"
What version of the product are you using? On what operating system?
3.47-1
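The sketch below shows one possible way to derive the expected body size in this situation. It rests on an assumption about the desired behavior: with a 200 status the body is treated as the complete file, so the total after the slash is used, while with a 206 status only the announced byte span is expected. `expected_body_size` is a hypothetical helper and does not reflect HTTrack's internal code.

```python
import re

# "bytes <first>-<last>/<total>", where total may be "*" when unknown.
CONTENT_RANGE_RE = re.compile(r"^bytes\s+(\d+)-(\d+)/(\d+|\*)$")


def expected_body_size(status, content_range):
    """Return the body size implied by Content-Range, or None if unknown."""
    match = CONTENT_RANGE_RE.match(content_range.strip())
    if match is None:
        return None
    first, last, total = match.groups()
    if status == 200:
        # Assumption: a 200 response carries the whole file, so only the
        # total is meaningful; the range offsets are ignored.
        return None if total == "*" else int(total)
    # Partial content: the body holds bytes first..last inclusive.
    return int(last) - int(first) + 1


print(expected_body_size(200, "bytes 0-99/100"))
print(expected_body_size(206, "bytes 10-19/100"))
```

Under this reading, a server that sends the header with a 200 code would yield a non-zero expected size instead of being treated as a resumed transfer.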
Original issue reported on code.google.com by xroche
on 14 Apr 2013 at 5:50
What steps will reproduce the problem?
1. Download ".htm" URL
What is the expected output? What do you see instead?
.htm files are expected, but httrack renames the files to ".html"
See http://forum.httrack.com/readmsg/31839/index.html
Original issue reported on code.google.com by xroche
on 15 Sep 2013 at 11:05
SUMMARY
-------
Some sites fail to download when you set the structure type to "%h/%p/%N".
I tested the behavior with ubuntuforums.org and red-gate.com/messageboards.
REPRO FOR UBUNTUFORUMS.ORG
--------------------------
1. Download ubuntuforums.org starting from an arbitrary page. Set site
structure to "%h/%p/%N" and enable debug logging.
"""
$ httrack 'http://ubuntuforums.org/showthread.php?t=1903782' -N "%h/%p/%N" -Z
"""
httrack exits quickly.
2. Inspect hts-log.txt to find some error messages.
"""
19:54:07 Info: engine: transfer-status: link error (-1, 'Error when
decompressing'): ubuntuforums.org/showthread.php?t=1903782
19:54:07 Debug: File checked by cache: ubuntuforums.org
19:54:07 Info: engine: warning: serialize error for
ubuntuforums.org/showthread.php?t=1903782 to /showthreadhtml.tmp: open error
(directory exists, file does not exist): Permission denied
19:54:07 Info: engine: warning: serialize error for
ubuntuforums.org/showthread.php?t=1903782 to /showthreadhtml.tmp: open error
(directory exists, file does not exist): Permission denied
19:54:07 Info: engine: warning: serialize error for
ubuntuforums.org/showthread.php?t=1903782 to /showthreadhtml.tmp: open error
(directory exists, file does not exist): Permission denied
19:54:07 Debug: File confirmed (size test): ubuntuforums.org/robots.txt (0)
19:54:07 Info: engine: warning: serialize error for
ubuntuforums.org/showthread.php?t=1903782 to /showthreadhtml.tmp: open error
(directory exists, file does not exist): Permission denied
"""
There is one more error at the end.
"""
19:54:07 Error: "Error when decompressing" (-1) at link
ubuntuforums.org/showthread.php?t=1903782 (from primary/primary)
19:54:07 Info: No data seems to have been transferred during this session! :
restoring previous one!
19:54:07 Info: engine: end
19:54:07 Debug: engine: free
"""
The complete log is attached as hts-log_with_error.txt
EXPECTED BEHAVIOR
-----------------
I expect httrack to download the files and save them in the specified structure:
- a folder called ubuntuforums.org
- a series of folders for the path
- files named showthread.php-1, showthread.php-2, showthread.php-3, etc
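For illustration, the expected layout can be sketched as a simple pattern expansion. This assumes that %h stands for the host name, %p for the path without the trailing slash, and %N for the file name including its type, as I understand the HTTrack documentation. `expand_structure` is a hypothetical helper, the `index.html` default is an assumption, and query strings (and the numbered suffixes mentioned above) are deliberately ignored.

```python
import posixpath
from urllib.parse import urlsplit


def expand_structure(url, pattern="%h/%p/%N"):
    """Expand a %h/%p/%N style pattern into a relative local path."""
    parts = urlsplit(url)
    directory, name = posixpath.split(parts.path)
    values = {
        "%h": parts.netloc,              # host name
        "%p": directory.strip("/"),      # path, may be empty
        "%N": name or "index.html",      # assumed default for directory URLs
    }
    expanded = pattern
    for token, value in values.items():
        expanded = expanded.replace(token, value)
    # Drop empty segments so an empty %p never produces "//" or a
    # leading "/" that would point at the filesystem root.
    return "/".join(segment for segment in expanded.split("/") if segment)


print(expand_structure("http://ubuntuforums.org/showthread.php?t=1903782"))
```

The target `/showthreadhtml.tmp` in the log starts at the filesystem root, so skipping empty segments, as in the last line of the function, is one plausible guard against that kind of path.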
SYSTEM DETAILS
--------------
I am using httrack 3.47-21+libhtsjava.so.2 on Xubuntu 13.04 64-bit.
Original issue reported on code.google.com by [email protected]
on 13 Feb 2014 at 8:44
Sorry for filing so many requests today, but I think this is the only place
where I can get help.
We now have 2 bugs reported in RH bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=923880
https://bugzilla.redhat.com/show_bug.cgi?id=995206
Please help take a look if possible.
Thanks!
Original issue reported on code.google.com by Cickumqt
on 13 Sep 2013 at 4:55
I just checked httrack thoroughly and found some issues:
1. httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsglobal.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsbasenet.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsmodules.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsdefines.h
httrack-devel.i686: E: incorrect-fsf-address
/usr/include/httrack/httrack-library.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsbauth.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsconfig.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htswrap.h
httrack-devel.i686: E: incorrect-fsf-address /usr/include/httrack/htsopt.h
httrack.i686: E: incorrect-fsf-address /usr/share/doc/httrack/license.txt
I hope you can update the license text; the FSF changed its address a long time ago.
2. httrack.i686: E: missing-call-to-setgroups /usr/lib/libhttrack.so.2.0.47
httrack.i686: E: missing-call-to-chdir-with-chroot /usr/lib/libhttrack.so.2.0.47
This pattern is not recommended by the CERT secure coding guidelines. rpmlint explains:
This executable is calling setuid and setgid without setgroups or initgroups.
There is a high probability this means it didn't relinquish all groups, and this
would be a potential security issue to be fixed. Seek POS36-C on the web for
details about the problem.
Ref POS36-C:
https://www.securecoding.cert.org/confluence/display/seccode/POS36-C.+Observe+correct+revocation+order+while+relinquishing+privileges
3.
httrack.i686: W: spurious-executable-perm /usr/share/doc/httrack/AUTHORS
Seems it should be 644 only.
4. We've found obsolete m4 macros in your package, see:
https://fedorahosted.org/FedoraReview/wiki/AutoTools
5. I found 2 folders:
src/minizip and src/mmsrip. They seem to be bundled libraries. I just took this
package over in Fedora, so I don't know when you added them, but due to policy
mmsrip is not accepted in Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=219112
minizip is OK, but I'm not sure whether I can unbundle it.
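Regarding item 2, the revocation order that POS36-C asks for can be sketched as follows. This is a minimal illustration in Python rather than the library's C code; `drop_privileges` and `enter_chroot` are hypothetical names, and the `osmod` parameter exists only so the call order can be checked without root.

```python
import os


def drop_privileges(uid, gid, osmod=os):
    """Give up root in the order recommended by POS36-C."""
    # 1. Clear supplementary groups while still privileged; this is the
    #    step rpmlint reports as missing (setgroups/initgroups).
    osmod.setgroups([])
    # 2. Change the group ID before the user ID: once the UID is no
    #    longer root, changing the GID would not be permitted.
    osmod.setgid(gid)
    # 3. Finally give up the user ID.
    osmod.setuid(uid)


def enter_chroot(path, osmod=os):
    """chroot and immediately chdir, so the cwd cannot stay outside the jail."""
    osmod.chroot(path)
    osmod.chdir("/")
```

Both functions need root to run against the real `os` module, so they are only defined here and not called.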
Original issue reported on code.google.com by Cickumqt
on 13 Sep 2013 at 1:19
What steps will reproduce the problem?
1. start webhttrack on a local machine where the hostname is unresolvable
What is the expected behavior ? What do you get instead?
webhttrack should fall back to localhost
What version of httrack are you using? On what operating system?
3.47
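The expected fallback could look roughly like the sketch below. It is not webhttrack's actual code: `server_host` is a hypothetical helper, and the `resolve` parameter is only there so the behavior can be exercised without touching the real resolver.

```python
import socket


def server_host(hostname=None, resolve=socket.gethostbyname):
    """Return the local hostname if it resolves, otherwise 'localhost'."""
    name = hostname or socket.gethostname()
    try:
        resolve(name)
    except OSError:
        # socket.gaierror is a subclass of OSError; any lookup failure
        # means the name is unusable for building the local URL.
        return "localhost"
    return name


print(server_host())
```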
Original issue reported on code.google.com by xroche
on 26 May 2013 at 8:28
Files that contain "+" in the name give a "Not Found" error.
The error occurs with httrack version 3.47.20; it does not appear in version
3.46.1.
Please see the attachment, which contains two simplified and identical projects
made with the two versions of httrack.
ReadMe.txt in the attachment gives some more explanation.
My operating system is Windows 7.
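The report does not state the cause of the regression. As background only, the sketch below shows the general pitfall with "+" in URLs: in a path it is a literal character, and only form-encoded query strings treat it as a space. `filename_from_path` is a hypothetical helper, not HTTrack code.

```python
from urllib.parse import unquote, unquote_plus


def filename_from_path(path):
    """Decode a URL path segment; '+' stays a literal plus sign."""
    return unquote(path)


name = "a+b.html"
print(filename_from_path(name))   # path decoding keeps the '+'
print(unquote_plus(name))         # form decoding turns '+' into a space
```

If a crawler applies form-style decoding to a path, it requests or stores a different name than the server uses, which would match a "Not Found" result.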
Original issue reported on code.google.com by [email protected]
on 4 Jul 2013 at 9:05
What steps will reproduce the problem?
1. Build: debuild -uc -us
2. Clean: debuild clean
3. Build again: debuild -uc -us
What is the expected behavior ? What do you get instead?
Expected is the second build to succeed, but it fails with:
dpkg-source: info: local changes detected, the modified files are:
httrack-3.47.21/Makefile.in
httrack-3.47.21/aclocal.m4
httrack-3.47.21/configure
httrack-3.47.21/html/Makefile.in
httrack-3.47.21/lang/Makefile.in
httrack-3.47.21/libtest/Makefile.in
httrack-3.47.21/m4/Makefile.in
httrack-3.47.21/man/Makefile.in
httrack-3.47.21/src/Makefile.in
httrack-3.47.21/templates/Makefile.in
httrack-3.47.21/tests/Makefile.in
httrack-3.47.21/tests/check-network_sh.cache
dpkg-source: error: aborting due to unexpected upstream changes, see
/tmp/httrack_3.47.21-1ac1.diff.BOw0fx
dpkg-source: info: you can integrate the local changes with dpkg-source --commit
dpkg-buildpackage: error: dpkg-source -b httrack-3.47.21 gave error exit status
2
Manually removing all those autogenerated files is a workaround. Probably there
is some automake foo that will delete them, other than perhaps the one
byproduct of tests.
What version of httrack are you using? On what operating system?
Debian source package httrack 3.47.21-1
Original issue reported on code.google.com by [email protected]
on 14 Jul 2013 at 6:49