poeml / mirrorbrain
Home Page: http://mirrorbrain.org/
License: Other
[ ]
Title cosmetical improvement of per-file mirror lists, when used without
mod_asn
Priority wish Status chatting
Superseder Nosy List poeml
Assigned To poeml Keywords
When MirrorBrain is used without mod_asn, the per-file mirror lists contain places
with missing information which looks a little silly:
List of best mirrors for IP address 83.133.126.38, located in country DE, -- (AS--).
To avoid this, the mod_mirrorbrain Apache module, which generates the listing,
should check whether the data is available first, and if not, leave out the empty
placeholder.
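The check proposed above could look roughly like the following. This is only a sketch in Python, not the module's C code; the function name and the treatment of '--' as "unknown" are assumptions made for illustration.

```python
def mirrorlist_heading(ip, country=None, prefix=None, asn=None):
    """Build the heading line, leaving out parts whose data is missing."""
    def known(value):
        # Assumption: missing values arrive as None, '' or the '--' placeholder.
        return value not in (None, "", "--")

    text = "List of best mirrors for IP address %s" % ip
    if known(country):
        text += ", located in country %s" % country
    # Network prefix and AS number come from mod_asn; print them only together.
    if known(prefix) and known(asn):
        text += ", %s (AS%s)" % (prefix, asn)
    return text + "."
```

Without mod_asn data, the heading then simply ends after the country.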
History
Date User Action Args
nosy: - funnycafeteria6
2014-01-03 00:02:26 poeml set title: Monitor -> cosmetical
improvement of per-file mirror
lists, when used without mod_asn
2014-01-03 00:02:01 poeml set files: - sa13.html
2014-01-03 00:01:59 poeml set files: - lmr13.html
2013-10-08 14:50:37 funnycafeteria6 set files: + lmr13.html
title: NewInterface -> Monitor
files: + sa13.html
nosy: + funnycafeteria6
2013-08-26 14:20:09 funnycafeteria6 set messages: + msg429
title: cosmetical improvement of
per-file mirror lists, when used
without mod_asn -> NewInterface
2010-02-21 16:56:32 poeml create
no content (issue created during migration from old issue tracker, as placeholder)
[ ]
Title "mb file ls" doesn't insert slash in URLs where it might be missing
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
When a URL doesn't end in a slash, "mb file ls" prints broken URLs
that lack the slash.
Fixed in trunk (r7943).
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7943
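For illustration, joining a base URL and a file path so that exactly one slash separates them can be sketched like this. The helper name is hypothetical, and this is not necessarily how r7943 fixes it.

```python
def join_url(base_url, path):
    """Join a mirror base URL and a relative path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

The result is the same whether or not the base URL was stored with a trailing slash.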
History
Date User Action Args
2010-01-20 18:46:05 poeml set status: testing -> resolved
messages: + msg118
2010-01-20 18:40:25 poeml create
Title testing the bug tracking
Priority wish Status resolved
Superseder Nosy List poeml, poemltest
Assigned To Keywords
by poeml.
msg62 (view) Author: poemltest Date: 2009-11-04.23:12:21
fresh comment
msg63 (view) Author: poeml Date: 2009-11-04.23:28:13
some words
msg78 (view) Author: poeml Date: 2009-12-01.21:48:06
see issue 11
or also http://mirrorbrain.org/issues/issue11
History
Date User Action Args
2009-12-01 22:11:36 poeml set status: chatting -> resolved
2009-12-01 21:48:14 poeml set status: resolved -> chatting
messages: + msg78
2009-12-01 20:57:08 poeml set status: chatting -> resolved
2009-11-04 23:28:13 poeml set status: resolved -> chatting
messages: + msg63
2009-11-04 23:23:34 poeml set status: done-cbb -> resolved
2009-11-04 23:23:23 poeml set status: deferred -> done-cbb
2009-11-04 23:23:06 poeml set status: chatting -> deferred
2009-11-04 23:12:22 poemltest set messages: + msg62
2009-11-04 23:11:41 poeml set messages: - msg57
2009-11-04 23:11:40 poeml set messages: - msg59
2009-11-04 23:11:38 poeml set messages: - msg61
2009-11-04 23:11:21 poeml set messages: - msg58
2009-11-04 23:11:19 poeml set messages: - msg60
2009-11-04 23:10:54 poeml set messages: + msg61
2009-11-04 23:10:20 poemltest set messages: + msg60
2009-11-04 23:09:11 poemltest set messages: + msg59
2009-11-04 23:08:36 poemltest set nosy: + poemltest
messages: + msg58
2009-11-04 23:04:03 poeml set status: unread -> chatting
nosy: + poeml
messages: + msg57
2009-11-04 23:03:24 poemltest set priority: bug -> wish
2009-11-04 23:02:27 poemltest create
Title testing bug tracking
Priority wish Status resolved
Superseder Nosy List myorinah, poeml
Assigned To anonymous Keywords
by poeml.
msg50 (view) Author: myorinah Date: 2009-11-04.21:23:26
testtest
msg51 (view) Author: poeml Date: 2009-11-04.21:25:23
a comment
msg52 (view) Author: poeml Date: 2009-11-04.21:30:14
another comment
msg53 (view) Author: myorinah Date: 2009-11-04.21:34:11
just another comment
msg54 (view) Author: myorinah Date: 2009-11-04.21:36:06
just another comment Part 2
msg55 (view) Author: poeml Date: 2009-11-04.21:38:15
a comment from me
msg56 (view) Author: poeml Date: 2009-11-04.22:41:28
another comment
msg74 (view) Author: poeml Date: 2009-12-01.20:57:25
closing...
History
Date User Action Args
2009-12-01 20:57:26 poeml set status: chatting -> resolved
messages: + msg74
2009-11-04 22:41:28 poeml set messages: + msg56
2009-11-04 22:30:56 poeml set nosy: - anonymous
2009-11-04 21:38:15 poeml set messages: + msg55
2009-11-04 21:36:06 myorinah set messages: + msg54
2009-11-04 21:35:22 poeml set nosy: + poeml
2009-11-04 21:34:11 myorinah set messages: + msg53
2009-11-04 21:30:14 poeml set messages: + msg52
2009-11-04 21:25:23 poeml set status: unread -> chatting
messages: + msg51
2009-11-04 21:23:26 myorinah create
[ ]
Title mod_mirrorbrain fails when Apache configuration is for a symlink,
not a directory
Priority bug Status resolved
Superseder Nosy List dfarning, poeml
Assigned To poeml Keywords
When the Apache config for mod_mirrorbrain is done on a block that really is a
symlink, the module fails to redirect correctly. The symptom is (thanks David Farning for the
report, for digging into it and finding the culprit):
Judging from the log files it looks like either mod_mirrorbrain of the
is chopping several characters from the beginning of the string it
is searching in the database.
...
Once the Apache config was working pointing at a normal directory
it worked correctly.
I wonder how we can make this more foolproof. I can think of two ways:
1) add documentation that explains this -- but how to make sure that it is not missed?
2) add a check to mod_mirrorbrain that is performed at Apache start time, which resolves the
directory to its canonical path and checks whether they match. If the check fails, either
prevent starting or log an error message.
The config typically looks like this:
<Directory /srv/mirrors/openoffice>
MirrorBrainEngine On
...
Looking at the code, I actually see a comment that confirms the above assumptions:
/* XXX we should forbid symlinks in mirror_base */
filename = apr_pstrdup(r->pool, ptr + strlen(cfg->mirror_base));
I think 3) is the way to go to prevent us from running into this again!
Checking for a symlink could be done by calling lstat() on the path (stat() would follow the
link) and looking at st_mode with the POSIX macro S_ISLNK. But it might be a waste of resources
to do this check too often (the MirrorBrainEngine directive needs to be merged for each request).
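The check described here can be sketched in Python, whose os and stat modules wrap the same POSIX calls. The function name is hypothetical, and the real check would live in the module's C code.

```python
import os
import stat

def check_mirror_base(path):
    """Return (is_symlink, canonical_path) for a configured base directory."""
    # lstat() inspects the link itself; stat() would follow it to the target.
    is_symlink = stat.S_ISLNK(os.lstat(path).st_mode)
    # realpath() resolves symlinks, giving the path that files really live under.
    canonical = os.path.realpath(path)
    return is_symlink, canonical
```

A caller could refuse to start, or log a warning, when the first element is true or the canonical path differs from the configured one.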
I can reproduce this bug by creating a symlink "/srv/ooo" pointing to "/srv/ooo.off"
which I moved away -- if I set the option FollowSymlinks on the directory "/srv":
[mod_mirrorbrain] MirrorBrainEngine On, mirror_base '/srv/ooo/'
[mod_mirrorbrain] URI: '/extended/iso/de/foo.txt.mirrorlist'
[mod_mirrorbrain] filename: '/srv/ooo/extended/iso/de/foo.txt.mirrorlist'
[mod_mirrorbrain] File does not exist according to r->finfo
[mod_mirrorbrain] Representation chosen by .mirrorlist extension
[mod_mirrorbrain] r->uri: '/extended/iso/de/foo.txt.mirrorlist'
[mod_mirrorbrain] r->uri: '/extended/iso/de/foo.txt'
[mod_mirrorbrain] could not resolve country
[mod_mirrorbrain] could not resolve continent
[mod_mirrorbrain] Country '--', Continent '--'
[mod_mirrorbrain] AS '--', Prefix '--'
[mod_mirrorbrain] Canonicalized file on disk: /srv/ooo.off/extended/iso/de/foo.txt
[mod_mirrorbrain] SQL file to look up: off/extended/iso/de/foo.txt
[mod_mirrorbrain] Successfully acquired database connection.
[mod_mirrorbrain] classifying 0 mirrors: 0 prefix, 0 AS, 0 country, 0 region, 0 elsewhere
(-1)Unknown error 4294967295: [mod_mirrorbrain] Could not retrieve row from database for
off/extended/iso/de/foo.txt (size: 18, mtime 1283190051): Likely cause: there are no
hashes in the database (yet).
[mod_mirrorbrain] no hashes found in database
[mod_mirrorbrain] Sending mirrorlist
Note the wrong path being looked up.
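The wrong path follows directly from skipping strlen(mirror_base) characters of the canonicalized file name. A small Python sketch, using the paths from the log above, reproduces it; the function names are hypothetical.

```python
def sql_path_buggy(canonical_file, mirror_base):
    """Mimic `ptr + strlen(cfg->mirror_base)`: blindly skip len(mirror_base) characters."""
    return canonical_file[len(mirror_base):]

def sql_path_checked(canonical_file, canonical_base):
    """Strip the base only if the file really lies below it."""
    if not canonical_file.startswith(canonical_base):
        raise ValueError("%r is not below %r" % (canonical_file, canonical_base))
    return canonical_file[len(canonical_base):]
```

With mirror_base '/srv/ooo/' (9 characters) and the canonical file '/srv/ooo.off/extended/iso/de/foo.txt', the buggy variant cuts off '/srv/ooo.' and leaves 'off/extended/iso/de/foo.txt', which is the path seen in the log.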
I can not reproduce the bug if the FollowSymlinks option is not set, because then
something different happens:
[error] [client 127.0.0.1] Symbolic link not allowed or link target not accessible:
/srv/ooo
[error] [client 127.0.0.1] no acceptable variant:
/usr/share/apache2/error/HTTP_FORBIDDEN.html.var
The latter will probably lead the admin to change the configuration to achieve the former
situation.
I'm tossing/testing a fix...
It might happen that users change the file system layout also during runtime. A directory might be moved away
and replaced with a symlink. Or a filesystem might be accidentally unmounted at Apache start, or mounted at
the wrong place, and the user might try a symlink. In this case, it isn't sufficient if Apache checks only at
start.
One might argue that canonicalizing the base directory shouldn't be done for each request.
One might also argue that the non-obviousness of the symptom, as well as the potentially saved hair is a good
reason to do the check each time.
The following patch deals with symlinks that exist already when Apache starts, but doesn't deal with later
changes, because the check runs only during merging the directory configuration:
@@ -471,8 +471,25 @@
{
mb_dir_conf *cfg = (mb_dir_conf *) config;
cfg->engine_on = flag;
ap_log_error(APLOG_MARK, APLOG_WARNING, 0, NULL,
"[mod_mirrorbrain] Document root \'%s\' does not seem to "
"exist. Filesystem not mounted?",
cmd->path);
/* if a symlink is used, it must exist at Apache's start already */
cfg->mirror_base = apr_pstrdup(cmd->pool, cmd->path);
return NULL;
cfg->mirror_base = apr_pstrcat(cmd->pool, cn, "/", NULL);
return NULL; /* works in both cases */
/* when could this occur? */
return apr_psprintf(cmd->pool, "symlink? path: %s vs. %s", cmd->path, cn);
static const char *mb_cmd_debug(cmd_parms *cmd, void *config, int flag)
The following patch would be the version that checks with each request:
@@ -1475,18 +1475,29 @@
/* prepare the filename to look up */
setenv_give(r, "file");
return HTTP_INTERNAL_SERVER_ERROR;
/* this should never happen, because the MirrorBrainEngine directive would never
 * be applied to a non-existing directory */
ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
"[mod_mirrorbrain] Document root \'%s\' does not seem to "
"exist. Filesystem not mounted?", cfg->mirror_base);
return HTTP_INTERNAL_SERVER_ERROR;
Committed in r8114.
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8114
Will be included in 2.13.0.
David, thanks again for your help (and patience)!
History
Date User Action Args
2010-09-06 14:24:49 poeml set status: testing -> resolved
nosy: + dfarning
messages: + msg230
2010-09-06 14:18:36 poeml set status: in-progress -> testing
messages: + msg229
2010-09-06 13:26:10 poeml set messages: + msg228
2010-09-06 01:38:39 poeml set status: chatting -> in-progress
messages: + msg226
2009-10-26 20:04:47 poeml create
[ ]
Title Could the metalink-hasher create Torrent files in the same step?
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
Vittorio suggested that the metalink-hasher might create torrent files along the way.
The metalink-hasher is a Python program which walks the file tree and checks what needs
to be done; it calls "metalink", which is a standalone binary.
Darix suggests to look at make_torrent example from libtorrent-rasterbar and either
call that or even link libtorrent-rasterbar into metalink. That way you only need to
read each block once for hashing.
http://www.rasterbar.com/products/libtorrent/
http://www.rasterbar.com/products/libtorrent/client_test.html
Maybe metalink could be extended easily to do the same. Or, maybe the output of
"metalink" could be used - after all, it outputs a series of hashes. However, maybe the
blocks would not fit?
Darix suggests to talk to hydri in #libtorrent.
Darix suggests as first approach to hack make_torrent calls into the python
thingie (metalink-hasher).
See http://www.rasterbar.com/products/libtorrent/examples.html
Looking further, this is all straightforward - metalink-hasher could provide the
hashes in a form digestible by Apache, and they would contain all that's needed.
Apache could easily generate the torrent live.
However the question arises whether it might be better to have real .torrent files
on disk (and maybe even next to the original files), simply for two reasons:
For practical reasons, the latter might be more important than the former.
Links about generation of torrents:
http://en.wikipedia.org/wiki/BitTorrent_(protocol)#Creating_and_publishing_torrents
http://www.bittorrent.org/beps/bep_0003.html
http://wiki.theory.org/BitTorrentSpecification
http://en.wikipedia.org/wiki/Bencode
http://pypi.python.org/pypi/BitTorrent-bencode/5.0.8
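As a rough illustration of what the linked specifications describe, a single-file torrent is a bencoded dictionary whose "pieces" value is the concatenation of the SHA-1 digests of fixed-size blocks. The sketch below is a minimal stand-alone version; the function names and the tracker URL in the usage note are made up, and it is not the code that ended up in MirrorBrain.

```python
import hashlib

def bencode(value):
    """Encode ints, str/bytes, lists and dicts in bencode format."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda item: item[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def piece_hashes(data, piece_length):
    """Concatenate the 20-byte SHA-1 digests of consecutive blocks."""
    return b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                    for i in range(0, len(data), piece_length))

def make_torrent(name, data, piece_length, announce):
    """Build the metainfo for a single file held in memory."""
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": piece_hashes(data, piece_length),
    }
    return bencode({"announce": announce, "info": info})
```

For example, make_torrent("f.bin", b"a" * 10, 4, "http://tracker.example.org/announce") yields a metainfo blob with three piece hashes. Since the piece hashes are plain SHA-1 digests of blocks, they could be taken from a hash cache instead of re-reading the file, provided the block size matches.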
This requires a hash cache redesign -- tracked in issue 40.
Hash cache redesign largely finished, and in testing now.
Torrent support working already in trunk. Now figuring out things like updating
the tracker servers, and exact bittorrent format that is suitable for as many
clients as possible.
another interesting link, regarding hashes to be included:
http://wiki.depthstrike.com/index.php/P2P:Protocol:Specifications:Optional_Hashes
This issue can be closed as resolved. Way cool!
History
Date User Action Args
2010-09-05 23:45:03 poeml set assignedto: poeml
2010-09-05 23:44:56 poeml set status: testing -> resolved
messages: + msg219
2010-03-31 20:04:21 poeml set messages: + msg173
2010-03-31 19:29:05 poeml set status: deferred -> testing
messages: + msg169
2010-03-08 21:20:32 poeml set status: chatting -> deferred
2010-03-08 20:48:00 poeml set messages: + msg138
2010-02-25 00:02:38 poeml set messages: + msg134
2010-02-24 23:55:12 poeml set messages: + msg133
2010-02-04 11:43:28 poeml set messages: + msg121
2010-02-03 22:48:44 poeml set messages: + msg120
2010-02-03 22:45:30 poeml create
[ ]
Title scanner doesn't switch to next protocol in some cases
Priority bug Status deferred
Superseder Nosy List poeml
Assigned To Keywords scanner
Normally, the scanner falls back to another protocol if a URL doesn't work. For some reason, this
doesn't seem to happen when the (broken) opensuse-linuxmigratio.at mirror is being scanned. There,
rsync is unreachable and HTTP is reachable (although it doesn't serve any files anymore).
The scanner should fall back to HTTP and clean up the file list from there, but it doesn't happen:
% mb scan migratio
Thu Oct 8 10:24:47 2009 opensuse-linuxmigratio.at: starting
Thu Oct 8 10:24:49 2009 opensuse-linuxmigratio.at: total files before scan: 98701
DIE: (=> /usr/bin/scanner 312 main::rsync_readdir => /usr/bin/scanner 971 main::rsync_get_filelist)
opensuse-linuxmigratio.at: connect: Connection refused
Completed in 1 seconds
As a result, the file list is not cleaned up and the mirror continues to be used.
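The expected fallback behaviour can be sketched as follows. The real scanner is a Perl program; this Python sketch only illustrates the control flow, and the function and parameter names are hypothetical.

```python
def scan_with_fallback(urls, scanners):
    """Try each (protocol, url) pair in order and return the first result.

    `scanners` maps a protocol name to a callable that returns a file list
    or raises an exception when the mirror cannot be scanned that way.
    """
    errors = []
    for protocol, url in urls:
        try:
            return protocol, scanners[protocol](url)
        except Exception as exc:
            # A failure of one protocol must not abort the whole scan.
            errors.append((protocol, exc))
    raise RuntimeError("all protocols failed: %r" % errors)
```

In the case above, the refused rsync connection would be recorded as an error and the HTTP scan would run next; an empty file list from HTTP would then allow the stale entries to be cleaned up.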
issue 28 seems to be another case where the fallback to the next protocol doesn't
happen as expected.
Slightly different reason, same symptom:
mirror.pacific.net.au: @error: max connections (15) reached -- try again later
Fri Dec 11 20:53:44 2009 mirror.pacific.net.au: starting
Fri Dec 11 20:53:45 2009 mirror.pacific.net.au: total files before scan: 2516
DIE: (=> /usr/bin/scanner 310 main::rsync_readdir => /usr/bin/scanner 972
main::rsync_get_filelist)
mirror.pacific.net.au: [#2, id=102 pid=9324 exit: 65280]
History
Date User Action Args
2014-01-22 20:05:20 poeml set files: - neli15.html
2013-10-17 17:28:58 funnycafeteria6 set files: + neli15.html
2009-12-11 22:17:23 poeml set messages: + msg110
2009-12-11 21:20:19 poeml set messages: + msg107
2009-12-07 03:15:54 poeml set status: unread -> deferred
2009-12-01 20:55:59 poeml set keyword: + scanner
2009-10-08 08:38:23 poeml create
[ ]
Title scanner should handle mirrors, which cant be scanned better
Priority wish Status chatting
Superseder Make an option to ignore timeouts during scans. (View: 107) Nosy List darix, poeml
Assigned To Keywords scanner
The scanner should better handle mirrors that can't be scanned.
options:
Thank you for the report!
I can reproduce it with the mirror that you mentioned in IRC:
Tue Dec 1 18:52:12 2009 www.muug.mb.ca: starting
Tue Dec 1 18:52:17 2009 www.muug.mb.ca: total files before scan: 0
www.muug.mb.ca: Moved to ftp location, assuming success if followedwww.muug.mb.ca: Moved to
ftp location, assuming success if followedwww.muug.mb.ca: Moved to ftp location, assuming
success if followedwww.muug.mb.ca: Moved to ftp location, assuming success if
followedwww.muug.mb.ca: Moved to ftp location, assuming success if followed__DIE__:
(/usr/bin/scanner 1026 main::sread => /usr/bin/scanner 989 (eval) => /usr/bin/scanner 989
main::ANON)
DIE: (/usr/bin/scanner 968 main::rsync_get_filelist => /usr/bin/scanner 1134
main::muxread => /usr/bin/scanner 1026 main::sread)
rsync timeout...
Completed in 10.8 minutes
This is related to issue 11.
I agree with the suggestion, that it might be useful if the mirror is disabled when scanning
problems occur, especially if they persist. A good notification about the problem could be a
good replacement for automatic action, fully agreed.
On the other hand, it is also important to find out whether the scanner crashes for some
reason, because it would also be good if it continues its work. In the above case, the scan
via rsync ran into a timeout. There's not much to do about that, though, and it wouldn't be
useful if a scan could take days either.
History
Date User Action Args
2012-04-16 04:51:59 poeml set superseder: + Make an option to ignore
timeouts during scans.
status: unread -> chatting
2009-12-01 21:53:42 poeml set nosy: + poeml
messages: + msg79
2009-12-01 20:54:43 poeml set keyword: + scanner
2009-12-01 18:13:28 darix create
[ ]
Title Apache's directory indexes could include hashes - or a link to them
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
(blocked by issue 40)
The directory indexes that Apache generates could include links to hashes, like
md5 and sha1. In fact, they could even display the hashes directly, if that would
make sense.
From the old TODO file: 'add "md5" and "sha1" or "checksums" link to
mod_autoindex_mb. Make it configurable so it can be switched off for the very
large directories that are already slow to show anyway.'
I think it doesn't make sense to display the hashes directly. That would work for
one hash maybe, but not for several, long hashes. All in all, there is a lot of
additional information about files that MirrorBrain now stores.
The current solution is to display a "Details" link, nothing else. And that links
to the page which shows the complete known metadata.
As such, this issue can be closed as resolved.
History
Date User Action Args
2010-09-06 00:14:32 poeml set status: deferred -> resolved
messages: + msg225
2010-03-08 21:23:01 poeml set messages: + msg143
2010-03-08 21:20:14 poeml create
[ ]
Title new "mb db shell" command can't be resumed after suspension to
background
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
(#:~)- mb db shell
psql (8.4.1)
Type "help" for help.
mb=>
[1] + 6290 suspended mb db shell
(root@mbwork)(235/pts/0)(03:46am:12/07/09)-
(#:~)- fg
fg
[1] + 6290 continued mb db shell
It hangs then. Probably psql is spawned in the wrong way, gets the SIGTSTP but not
the SIGCONT.
Indeed, exactly that happens:
(suspended)
root 6330 2.1 1.8 14096 9268 pts/0 T 03:49 0:00 | _ /usr/bin/python /usr/bin/mb db shell
root 6332 0.1 0.4 7560 2396 pts/0 T 03:49 0:00 | _ psql
(resumed)
root 6330 0.7 1.8 14096 9268 pts/0 S+ 03:49 0:00 | _ /usr/bin/python /usr/bin/mb db shell
root 6332 0.0 0.4 7560 2396 pts/0 T+ 03:49 0:00 | _ psql
fixed in r7925
http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/mirrordoctor/mb/dbmaint.py?r1=7925&r2=7924&pathrev=7925
closing, since it's fixed and works.
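One general way to avoid this class of problem is to replace the wrapper process with psql instead of spawning it as a child, so that the shell's job control acts on psql directly. The sketch below shows that approach; it is an assumption for illustration and not necessarily what r7925 does. The function names are hypothetical.

```python
import os

def psql_argv(dbname, user=None, host=None):
    """Build the psql command line (-h, -U and -d are standard psql options)."""
    argv = ["psql"]
    if host:
        argv += ["-h", host]
    if user:
        argv += ["-U", user]
    argv += ["-d", dbname]
    return argv

def exec_db_shell(dbname, user=None, host=None):
    """Replace the current process with psql; does not return on success."""
    argv = psql_argv(dbname, user, host)
    # After exec there is no intermediate Python process left that could
    # be resumed while psql itself stays stopped.
    os.execvp(argv[0], argv)
```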
History
Date User Action Args
2009-12-11 21:17:56 poeml set status: testing -> resolved
messages: + msg106
2009-12-07 03:14:46 poeml set status: chatting -> testing
messages: + msg99
2009-12-07 02:50:16 poeml set messages: + msg98
2009-12-07 02:48:23 poeml set status: unread -> chatting
messages: + msg97
2009-12-07 02:47:40 poeml create
[ ]
Title add support for mirrors and checksums in Link headers (RFC 6249)
Priority wish Status resolved
Superseder Nosy List ant, poeml
Assigned To poeml Keywords
There is a proposal for transmitting information about mirrors and checksums to clients using Link
headers which looks like:
Link: <http://www2.example.com/example.ext>; rel="duplicate"
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
Link: <http://example.com/example.ext.torrent>; rel="describedby";
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby";
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel="describedby";
type="application/pgp-signature"
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
See IETF draft http://tools.ietf.org/html/draft-bryan-metalinkhttp
Most of this can be easily added, there is one thing that I need to change first though:
At the moment, Metalink hashes are cached to disk in a form that is suitable for direct inclusion
into Metalinks:
<verification>
<hash type="md5">e8ad5924dcef6c25a3455230c46a4caa</hash>
<hash type="sha1">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
<hash type="sha256">2104ed8aa2f4af920c1669585eeaabb0c94ace6cb92e67cbd3ab04b2bb7356b5</hash>
<pieces length="262144" type="sha1">
<hash piece="0">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
</pieces>
</verification>
Or, an example with PGP signature:
<verification>
<signature type="pgp" file="openSUSE-11.2-NET-i586.iso.asc">
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
iD8DBQBK+ATuqE7a6JyACsoRArqBAJ0ViDK4IUQPKYz1qbXivJielVCkDACf
VCZ4fiIU8640lArqhzu9QuTRL0s=
=2F9I
-----END PGP SIGNATURE-----
</signature>
<hash type="md5">bfb98c4b2e079f9d147b53d3fc9495c5</hash>
<hash type="sha1">8e5854c6e00b7a0f124c3060da4184e6d5f8d6b2</hash>
<hash type="sha256">9de4a0b44f7c474929ece46481a783500078fb3b2f05b885069a74aff198fc7f</hash>
<pieces length="524288" type="sha1">
<hash piece="0">8fc60d0c4918bf53ad7858196633b05a4ca4b060</hash>
<hash piece="1">a4860b0e708063900253be974e7ebcd9a50c660b</hash>
[...]
<hash piece="214">11ddc7c352d9f017f6c00822c3d4be540dc9ad38</hash>
<hash piece="215">e5ec3ff4693175b7da90f3bc1fdf1ee5d7f3f20a</hash>
<hash piece="216">897256b6709e1a4da9daba92b6bde39ccfccd8c1</hash>
</pieces>
</verification>
That was fine so far because Apache just has to open the file and can directly write it to the
network, while sending the metalink.
Now, we'll need access to the individual data in that snippet. Thus, the format of storing the data
needs to be changed (or the XML parsed by Apache, but that sounds like an ugly option). I'm thinking of a
text-based format. It should be optimized for parsing with low overhead.
Maybe a simple series of null-terminated strings, and no newlines (because then we can store the
multi-line PGP signature string without modifications):
hash \0
hash \0
hashpieces \0
hashpiece 0 \0
hashpiece 1 \0
hashpiece 2 \0
pgp \0
EOF
Maybe, when looking around a bit in Apache, something else catches the eye which is suitable for
reading in the data quickly.
Once the data can be read in once & quickly during the request processing phase, it's available to
do the most wonderful things. In particular we can also easily implement a "checksum server" that
returns a checksum for any file when .md5 or .sha1 or .sha256 is appended to a URL. And of course
we can send instance digests, as requested here.
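Instance digests carry the base64 encoding of the binary hash, not the hex form used in Metalinks. A small Python sketch of the conversion follows; the function name is hypothetical, and in practice the hashes would come from the cache rather than being computed per request.

```python
import base64
import hashlib

def digest_headers(data):
    """Return Digest header lines with base64-encoded binary hashes."""
    algorithms = (("MD5", hashlib.md5),
                  ("SHA", hashlib.sha1),
                  ("SHA-256", hashlib.sha256))
    headers = []
    for label, func in algorithms:
        value = base64.b64encode(func(data).digest()).decode("ascii")
        headers.append("Digest: %s=%s" % (label, value))
    return headers
```

If only the hex form is stored, bytes.fromhex() gives the binary digest to feed into base64.b64encode().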
Additional thought about the hash store file format: An identifier and version
number should be added to the beginning.
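To make the idea concrete, here is a sketch of such a store in Python: a leading identifier with version, followed by NUL-terminated "key value" records. The magic string, the record keys and the function names are all assumptions for illustration; the format that was eventually implemented (see issue 40) may differ.

```python
MAGIC = "MBHASHv1"  # hypothetical identifier plus version number

def dump_records(records):
    """Serialize (key, value) pairs as NUL-terminated strings after the magic."""
    fields = [MAGIC] + ["%s %s" % (key, value) for key, value in records]
    return b"".join(field.encode("ascii") + b"\0" for field in fields)

def load_records(blob):
    """Parse a blob written by dump_records() back into (key, value) pairs."""
    fields = blob.split(b"\0")
    if fields[-1] != b"":
        raise ValueError("missing final terminator")
    fields = [field.decode("ascii") for field in fields[:-1]]
    if not fields or fields[0] != MAGIC:
        raise ValueError("unknown file format")
    # Split on the first space only, so values may contain spaces and newlines.
    return [tuple(field.split(" ", 1)) for field in fields[1:]]
```

Because the terminator is NUL rather than a newline, a multi-line PGP signature can be stored unmodified as a single record.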
Cf. issue #40, where the hash cache redesign is tracked now.
Issue 40 which was blocking this issue is mostly done.
A lot of groundwork for this has been done:
Now it is just a matter of using the data and writing it to HTTP headers.
This should be made configurable maybe - because it causes Apache to need a little bit
more resources, which may or may not be desired. (Of course, it would be cool if
it just "happens", and the default should probably be that the headers are
included.)
just to update this bug,
RFC 6249 ( http://tools.ietf.org/html/rfc6249 ) describes this, Metalink/HTTP.
poeml, what should a student know about this bug? where to start within the source?
what configurable options do you need?
On (default) /Off
Amount of Mirrors to emit over HTTP?
which full file hashes to include? all known full file hashes, or just some?
status of this feature:
low-hanging fruit; all the hard work should be done already.
The following needs to be done:
regarding the number of mirrors, it is important to limit it, or the HTTP
response could easily become huge. The top ten mirrors should be more than
enough. Luckily, there is a convenient get_n_best_mirrors() function in
mod_mirrorbrain to get the desired list of best mirrors.
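Limiting the list and numbering the entries by priority could look like this in outline. It is a Python sketch with a hypothetical function name; the input list stands in for whatever get_n_best_mirrors() returns and is assumed to be sorted best first.

```python
def link_headers(mirrors, limit=10):
    """Build rel=duplicate Link headers for at most `limit` mirrors.

    `mirrors` is a list of (url, country_code) pairs, best mirror first.
    """
    headers = []
    for pri, (url, country) in enumerate(mirrors[:limit], start=1):
        headers.append("Link: <%s>; rel=duplicate; pri=%d; geo=%s"
                       % (url, pri, country))
    return headers
```

The cap keeps the response size bounded no matter how many mirrors carry the file.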
So, HTTP replies can get long now -- here's what it looks like in my testing now:
HTTP/1.1 302 Found
Date: Wed, 11 Apr 2012 19:40:39 GMT
Server: Apache/2.2.17 (Linux/SUSE)
X-Prefix: 87.78.0.0/15
X-AS: 8422
X-MirrorBrain-Mirror: ftp.fernuni-hagen.de
X-MirrorBrain-Realm: country
Link: <http://10.0.0.17/du.list.meta4>; rel=describedby; type="application/metalink4+xml"
Link: <http://10.0.0.17/du.list.asc>; rel=describedby; type="application/pgp-signature"
Link: <http://10.0.0.17/du.list.torrent>; rel=describedby; type="application/x-bittorrent"
Link: <http://ftp.fernuni-hagen.de/ftp-dir/pub/mirrors/www.openoffice.org/du.list>; rel=duplicate; pri=1; geo=de
Link: <http://sunsite.informatik.rwth-aachen.de/ftp/pub/mirror/OpenOffice/du.list>; rel=duplicate; pri=2; geo=de
Link: <ftp://ftp.uni-muenster.de/pub/software/OpenOffice/du.list>; rel=duplicate; pri=3; geo=de
Link: <http://ftp5.gwdg.de/pub/openoffice/du.list>; rel=duplicate; pri=4; geo=de
Link: <http://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.openoffice.org/du.list>; rel=duplicate; pri=5; geo=de
Digest: MD5=mertNzkLoFcfjShYKf9j/A==
Digest: SHA=SXw8fhX2ZMHasmbFbSWjpeUn/bQ=
Digest: SHA-256=WVwzYHQVWTdFBKJacO4Bz2Fz60XHjtpLf0IG9KRuOjM=
Location: http://ftp.fernuni-hagen.de/ftp-dir/pub/mirrors/www.openoffice.org/du.list?time=1334173239&stamp=a4cdbfe80df5c0f28621b68e0a3ade69
Content-Type: text/html; charset=iso-8859-1
I think this bug can be closed. Code will be included in the next release.
History
Date User Action Args
2012-04-14 21:46:18 poeml set status: testing -> resolved
messages: + msg379
2012-04-11 19:41:53 poeml set status: chatting -> testing
messages: + msg376
2012-03-30 12:12:28 poeml set messages: + msg367
messages: + msg366
2012-03-28 16:05:56 ant set title: add support for mirrors and checksums
in Link headers -> add support for mirrors and
checksums in Link headers (RFC 6249)
2010-09-05 23:53:14 poeml set assignedto: poeml
2010-09-05 23:52:30 poeml set messages: + msg220
2010-03-12 02:51:00 poeml set messages: + msg159
2010-03-08 20:46:21 poeml set messages: + msg136
2009-12-11 21:43:08 poeml set messages: + msg109
2009-12-11 21:41:21 poeml set messages: + msg108
2009-11-04 16:32:24 ant set nosy: + ant
2009-10-09 00:49:17 poeml create
[ ]
Title error looking up file in database
Priority urgent Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
Not quite sure what happened here...
Everything was fine yesterday, now none of my links are getting redirected. I
upgraded to 2.11.0, I suspect that's the cause. More specifically, I've got my
money on r7887.
Each click logs one of these in error.log:
(-1)Unknown error 18446744073709551615: [mod_asn] Error retrieving row from
database for 67.191.204.232, referer: http://mirrors.xbmc.org/addons/
(-1)Unknown error 18446744073709551615: [mod_asn] Error retrieving row from
database for 67.191.204.232, referer: http://mirrors.xbmc.org/addons/skin/
[mod_mirrorbrain] Error looking up addons/skin/Basics-Vision.tar.gz in database,
referer: http://mirrors.xbmc.org/addons/skin/
after a 'mb scan -a -j 4', note that:
'mb probefile addons/skin/Basics-Vision.tar.gz' turns up a bunch of servers.
as does
'mb file ls addons/skin/Basics-Vision.tar.gz'
All links are exhibiting this behavior, not just this one.
Other info:
Debian Lenny
MirrorBrain 2.11.0 (from repo)
Linux mirrors.xbmc.org 2.6.26-2-amd64 #1 SMP Thu Nov 5 02:23:12 UTC 2009 x86_64
GNU/Linux
I was about to ask you to run with "MirrorBrainDebug On" in Apache's directory
context, but I just learned in IRC that you already figured out what's broken:
due to a missing include, the new compile fix for APR < 1.3 is not effective.
New release to be done asap...
Fixed in trunk
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7898
Fixed packages are available.
On another note, some of the error messages in the log are from mod_asn, which
should not be directly influenced by the recent regression. If it is, it might be
a side effect of the database adapter being misused and becoming confused. Do
those messages go away?
mod_asn might need a similar APR 1.2 fix as mod_mirrorbrain, even though it might
not be immediately surfacing as a bug because mod_asn always retrieves only a
single database row. I noticed that ASN lookups were working correctly (telling
from the HTTP headers of your server) even with the broken 2.11.0.
The mod_asn messages are also gone. Great!
So mod_asn seems to work either way. Probably because it retrieves only one row,
and after that it clears the cursor by accessing the next row. Also the latter
seems to work in either case, because if clearing the cursor does not return -1,
an error message would be logged.
I just reproduced locally that the lookup is correct in both cases. However, I'm
thinking that a fix might be sensible, for whatever side effect this could have in
the database adapter.
So, I committed this now:
http://svn.mirrorbrain.org/viewvc/mod_asn/trunk/mod_asn.c?r1=73&r2=72&pathrev=73
(For the record, issue 7 is where the bug was dealt with first.)
History
Date User Action Args
2009-12-03 23:04:33 poeml set status: chatting -> resolved
messages: + msg87
2009-12-03 12:39:16 poeml set status: resolved -> chatting
messages: + msg86
2009-12-03 12:14:40 poeml set status: in-progress -> resolved
messages: + msg85
2009-12-03 11:24:16 poeml set messages: + msg84
2009-12-03 10:03:04 poeml set priority: bug -> urgent
status: unread -> in-progress
2009-12-03 07:08:50 theuni create
[ ]
Title mirrorprobe sometimes times out with an exception
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
There is a certain exception that the mirrorprobe might run into, which is not caught by the code,
even though it should. Symptom:
Exception in thread probeThread-273:
Traceback (most recent call last):
File "/usr/lib/python2.5/threading.py", line 486, in __bootstrap_inner
self.run()
File "/usr/lib/python2.5/threading.py", line 446, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/bin/mirrorprobe", line 97, in probe_http
mirror.response = e.read()
File "/usr/lib/python2.5/socket.py", line 291, in read
data = self._sock.recv(recv_size)
File "/usr/lib/python2.5/httplib.py", line 509, in read
return self._read_chunked(amt)
File "/usr/lib/python2.5/httplib.py", line 544, in _read_chunked
line = self.fp.readline()
File "/usr/lib/python2.5/socket.py", line 331, in readline
data = recv(1)
timeout: timed out
The code does have exception handling for this case. However, the exception, which happens deep
in the socket module, is not correctly passed to upper layers. Thus, there doesn't seem to be a way
to get hold of it.
In trunk, I have now wrapped the entire thread code into another try-except block. That is ugly,
but helps.
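The approach of wrapping the whole thread body can be sketched like this. The names are hypothetical and this is not the mirrorprobe code from r8022; it only shows the pattern.

```python
import socket
import threading

def probe_safely(probe, mirror, failures):
    """Run one probe; record any exception instead of letting the thread die."""
    try:
        probe(mirror)
    except socket.timeout:
        failures.append((mirror, "timed out"))
    except Exception as exc:
        # Catch-all so that no traceback escapes from the thread.
        failures.append((mirror, repr(exc)))

def run_probes(probe, mirrors):
    """Probe all mirrors in parallel and return the recorded failures."""
    failures = []
    threads = [threading.Thread(target=probe_safely, args=(probe, m, failures))
               for m in mirrors]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures
```

A timeout raised anywhere below probe(), including deep inside the socket module, ends up in the failures list rather than as an "Exception in thread" traceback.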
r8022 | poeml | 2010-03-27 16:19:13 +0100 (Sat, 27 Mar 2010) | 8 lines
mirrorprobe:
History
Date User Action Args
2010-03-27 15:20:57 poeml set status: testing -> resolved
messages: + msg166
2010-03-27 15:10:54 poeml create
[ ]
Title add magnet links to Metalinks
Priority wish Status in-progress
Superseder Nosy List ant, poeml
Assigned To poeml Keywords
The planned redesign of the hash cache should also make it trivial to add
magnet links to Metalinks.
See http://en.wikipedia.org/wiki/Magnet_URI_scheme
For now, this is blocked by issue 40.
Issue 40 is resolved far enough that it no longer blocks this change.
For Magnet links, we need to base32-encode stuff in Apache.
>>> import base64, hashlib
>>> h = hashlib.sha1()
>>> h.update(b'asdf')
>>> base64.b32encode(h.digest())
b'HWSUCVMZDCUARQSAFO5FAEXWYYFSOZQ4'
Ah, I'm relieved:
http://wiki.bitcomet.com/Peers,_seeds,_torrent,_tracker,_DHT,_Peer_Exchange_(PEX),_Magnet_Links
The first implementations for Magnet Links required that the
BitTorrent hash values contained in Magnet Links be Base32
encoded.^[7] Later that was changed to hex encoding, which is the
format recommended at this point for magnet links by the official
BitTorrent specifications.^[8]
[8] http://www.bittorrent.org/beps/bep_0009.html#magnet-uri-format
Basic magnet links implemented in new metalinks now. (r7988)
I think the magnet links should be shown in the mirror list as well.
Also, there should be a handler that replies with the magnet link (or
redirects to one?) when .magnet is appended to a URL. Is that
possible?
Example:
% curl -s "http://192.168.0.115/extended/3.1.1rc2/OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe?meta4" | grep magnet
magnet:&xl=24822941&dn=OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe?xt=urn:sha1:0e1aefc1df0ba4c147fb36d3c62d000441e6b945?xt=urn:bith:0e1aefc1df0ba4c147fb36d3c62d000441e6b945&xt=urn:md5:c7feb5365c2e9ce205ba212c6fa19aa0
mediatype changed to "torrent". See metalink-discussion list for further
discussion.
The hashes that our current magnet links contain are not optimal.
(Info by Harold Feit following below.)
On the URL http://opensource.depthstrike.com/, several examples can be found.
Further note:
as= and xs= are compatible with the sha1 Magnets. But not btih ones.
Minimal example for an SHA1 Magnet:
xt=urn:sha1:<file's base32 encoded sha1 hash>&xl=<file's size>&dn=<file's name>&as=
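As an illustration of the minimal SHA1 magnet format quoted above, here is a small sketch. The function name sha1_magnet is made up, and the as= parameter is left out because its value is not given here; this is not the code that mod_mirrorbrain uses.

```python
import base64
import hashlib
from urllib.parse import quote


def sha1_magnet(name, data):
    """Build a minimal SHA1 magnet link: base32 SHA1 (xt), size (xl), name (dn)."""
    digest = hashlib.sha1(data).digest()
    b32 = base64.b32encode(digest).decode('ascii')
    return 'magnet:?xt=urn:sha1:%s&xl=%d&dn=%s' % (b32, len(data), quote(name))


print(sha1_magnet('asdf.txt', b'asdf'))
```

For real files, the hash would of course be computed in chunks or taken from the hash cache rather than from data held in memory.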
History
Date User Action Args
2013-10-18 20:49:40 funnycafeteria6 set files: + dax33.html
2010-04-23 02:53:24 poeml set messages: + msg180
2010-03-31 20:30:46 poeml set messages: + msg176
2010-03-29 06:45:04 ant set nosy: + ant
2010-03-12 02:47:41 poeml set messages: + msg157
2010-03-12 02:01:53 poeml set messages: + msg156
2010-03-12 02:00:40 poeml set status: deferred -> in-progress
messages: + msg155
2010-03-12 00:52:44 poeml set messages: + msg154
2010-03-12 00:33:54 poeml set messages: + msg153
2010-03-12 00:02:46 poeml set messages: + msg152
2010-03-10 00:18:06 poeml set assignedto: poeml
2010-03-10 00:16:17 poeml set status: chatting -> deferred
messages: + msg147
2010-02-16 17:29:37 poeml create
[ ]
Title The documentation needs to be better structured
Priority feature Status resolved
Superseder Nosy List poeml
Assigned To Keywords docs
Here's a proposal from Lars (translated from German):
It also strikes me that http://mirrorbrain.org/docs/ could be structured in a
somewhat more differentiated way.
Proposal for restructuring - the headings/contents stay the
same:
Introduction
Installation
Upgrading
Configuring MirrorBrain
Maintaining the mirror database
Tuning guide
Known problems
Documentation for Developers
Release Notes/Change History
What would then remain, from my point of view, is to define the steps in
"Installation" a bit more "precisely", so that one can, for example, directly
find information on the individual components. (The keyword "geoip", for
instance, should not only lead to "Using mod_mirrorbrain without GeoIP".) Then
the "Installation on ...." sections could perhaps cover only the
platform-specific particularities?
My proposal:
Installation => Prerequirements
             => Components => apache
                           => postgresql
                           => asn
                           => geoip
                           => ...
What do you think?
CU,
Lars
A current problem is that some configuration steps are scattered between the platform-
dependent installation sections. This should be cleaned up so that the platform-dependent
sections contain only what's needed to install there (and possibly platform-specific
configuration), while the general configuration instructions should go into their own
chapter.
A separate configuration chapter is warranted, I think, because the configuration is
generally more work than the installation, and it is generally complex enough to be worth
spending more words on it (which would result in considerable duplication otherwise).
I received valuable feedback from a user who installed the software (which worked
satisfactorily), and then the next steps weren't clear at all:
02:19 < TheUni> poeml: if i may suggest, it would be great to have a "I have mirrorbrain installed... now what" faq, or explanation in the guide
02:19 < TheUni> after it was all up and running, and i had added all of the sites to the db, i had no idea how to actually USE them.
02:20 < TheUni> it wasn't clear that i had to have a local copy, and even then, that the mirrors would intercept that local link.
02:37 < TheUni> and secondly, it would be nice to have the option of disabling that fallback. reason being, if we were flooded suddenly with downloads before our mirrors sync'd, it would be possible for those downloads to bring the server down. also possible that we hit our limits and the host takes the server down, thus it would no longer be able to act as the face of the mirrors
02:38 < TheUni> so in that case, it may in fact be better to have broken links for an hour or so.
And another related suggestion:
It's not clear from the documentation what needs to be done on a regular basis, as far as
probing/scanning the mirrors, as well as the geoip data, is concerned. Maybe add a few words on scheduling.
I think the above would be catered for by the following sequence of documentation
sections:
Introduction
Installation
Prerequirements
Platform-specific instructions
Debian
openSUSE
install from source
Basic setup
What next?
Getting started
Maintaining the mirror database
Configuring Reference
Tuning guide
Special setups
Upgrading
Known problems
In the above, it should be possible to read the intro, skim the prereqs, then choose the
platform-specific section. There should be clear guidance in the form of "Now, skip to
section XY for your operating system". At the end of the OS-specific section, there
should be a direct reference to the place where the documentation picks up again with the
platform-independent part. Very important.
All issues mentioned here have been fixed around r8000-r8004.
History
Date User Action Args
2010-03-14 21:59:53 poeml set status: chatting -> resolved
2010-03-14 21:59:18 poeml set messages: + msg162
2009-12-02 02:31:29 poeml set messages: + msg81
2009-10-26 19:49:13 poeml set messages: + msg41
2009-10-08 11:55:43 poeml create
[ ]
Title Hash cache needs to be more flexible
Priority feature Status resolved
Superseder Nosy List ant, poeml
Assigned To poeml Keywords
The hash cache is too inflexible in its current on-disk format. It was fine in the past,
when Apache included the contents into v3 Metalinks. The snippets on disk were prepared
just for that. However, it's difficult to add further features like
So this is blocking several good things that could be done.
Issue 15 contains some ramblings about this, but let's track this change here.
I currently think that moving the hashes into the database might be best. It would definitely
be a flexible option, without the need to invent an on-disk format and write parsers for it.
Also, it would make the data available to a web frontend.
Before the on-disk format is dropped, we can try how well it works with the database.
As a first step, I have now transferred all functionality from the external metalink-hasher
script into the "mb" tool. Thus, now the database functionality is available for no cost.
In svn trunk, there is now working code that saves the hashes also to the
database. Seems like a good step forward. The code needs more testing to become
robust enough to be used by mod_mirrorbrain.
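To make the idea of keeping the hashes in the database more concrete, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for the real PostgreSQL database, and the table layout, column names and function names are assumptions for illustration only, not the actual MirrorBrain schema.

```python
import hashlib
import sqlite3

ALGORITHMS = ('md5', 'sha1', 'sha256')


def compute_hashes(data):
    """Return a dict of hex digests for the given bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def store_hashes(conn, path, data):
    """Insert or refresh the hash row for one file (hypothetical schema)."""
    h = compute_hashes(data)
    conn.execute(
        'INSERT OR REPLACE INTO hash (path, size, md5, sha1, sha256) '
        'VALUES (?, ?, ?, ?, ?)',
        (path, len(data), h['md5'], h['sha1'], h['sha256']))
    conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hash (path TEXT PRIMARY KEY, size INTEGER, '
             'md5 TEXT, sha1 TEXT, sha256 TEXT)')
store_hashes(conn, 'pub/asdf.txt', b'asdf')
row = conn.execute('SELECT size, sha1 FROM hash WHERE path = ?',
                   ('pub/asdf.txt',)).fetchone()
print(row)
```

With the data in a table like this, Metalinks, mirror lists and a web frontend could all read the same row, instead of each parsing an on-disk snippet format.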
This is largely done.
Code in metalink-hasher seems to work well, and creates hashes in the
database in addition to the on-disk storage which we keep available for
transition.
The new hashes in the database are not cleaned up yet, if they become
obsolete. Maybe "mb db vacuum" should become involved in the cleanup,
but it would need to look into the file tree for that. It's probably
needed to let mb makehashes clean up per directory. Otherwise files
could very quickly accumulate.
mod_mirrorbrain uses the new hashes from the database and falls back to
on-disk hashes for transition. The new hashes are already used in old
Metalinks, new Metalinks, and also in the mirror lists!
Note to self: need to check whether empty files (0 byte size) are still
handled correctly, or if they need a special case.
What's also missing is a way to switch off (or on) (per /etc/mirrorbrain.conf)
generation of the "expensive" hashes, like torrents and zsync. Maybe with a file
mask or list of directories.
Generation of hashes for zsync and torrents can now be (separately) switched off
in /etc/mirrorbrain.conf.
For the zsync hashes, the default is "off", because Apache currently allocates
large amounts of memory for these large data.
On another note, empty files seem to be handled as they should.
Hence, I regard this bug resolved.
History
Date User Action Args
2010-09-01 16:13:33 poeml set status: testing -> resolved
messages: + msg204
2010-04-23 03:05:37 poeml set status: in-progress -> testing
2010-04-23 03:03:42 poeml set messages: + msg182
2010-03-29 06:44:31 ant set nosy: + ant
2010-03-12 02:57:45 poeml set messages: + msg160
2010-03-11 23:53:06 poeml set messages: + msg150
2010-03-10 00:17:41 poeml set messages: + msg148
2010-03-08 20:44:37 poeml create
[ ]
Title memory leak in the new mb makehashes command
Priority urgent Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
The new command 'mb makehashes' accumulates memory.
17622 mirrorbr 20 0 42136 35m 3088 R 65.6 7.3 0:47.82 /usr/bin/python
/usr/bin/mb makehashes /srv/ooo -t /srv/metalink-hashes/srv/ooo
This is after a minute or so. Saw 160m resident set size after half an hour
running before.
Easy fix: clean up the hash object, which holds a backreference to the parent
object.
Fixed in trunk, r8034.
--- mirrordoctor/mb/hashes.py (revision 8033)
+++ mirrordoctor/mb/hashes.py (working copy)
@@ -194,6 +194,8 @@
c.execute('commit')
self.hb = None
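One plausible way such a backreference keeps memory alive can be shown with a small sketch. The class names File and HashBag are made up for this example and are not meant to match mb/hashes.py. The garbage collector is switched off here only to make the effect of the reference cycle visible in CPython.

```python
import gc
import weakref


class HashBag(object):
    def __init__(self, parent):
        self.parent = parent      # backreference to the owning object


class File(object):
    def __init__(self):
        self.hb = HashBag(self)   # creates a cycle: File -> HashBag -> File

    def done(self):
        self.hb = None            # break the cycle, as in the fix above


gc.disable()

f = File()
ref = weakref.ref(f)
del f
leaked = ref() is not None        # the cycle keeps the object alive

gc.collect()                      # clean up the leaked pair

f = File()
ref2 = weakref.ref(f)
f.done()
del f
freed = ref2() is None            # without the cycle, refcounting frees it at once

gc.enable()
print(leaked, freed)
```

Setting the attribute to None after the commit means plain reference counting can release each object as soon as it is no longer needed.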
History
Date User Action Args
2010-03-29 05:25:39 poeml set status: chatting -> resolved
messages: + msg168
2010-03-29 05:16:25 poeml create
no content (issue created during migration from old issue tracker, as placeholder)
[ ]
Title mirrorprobe doesn't detect certain errors, where HTTP server replies
with 'OK'
Priority bug Status chatting
Superseder Nosy List poeml
Assigned Keywords
To
The mirror probe doesn't detect certain error scenarios. The most common case of failure of a mirror
is that the HTTP server doesn't reply in time, or returns an unusual HTTP status code (like 500).
This is detected by the mirror probe. This check is currently done only on the base URL (at the top
of the tree).
However, if a mirror administrator reconfigures a mirror, e.g. by installing a machine from scratch,
the base URL will often continue to work: a fresh webserver install can reply with a 200 OK
response, which makes the mirror probe believe that everything is fine, even though the file tree
served before is gone.
This happened with the mirror http://opensuse-linuxmigratio.at/ resp.
rsync://opensuse-linuxmigratio.at/opensuse/
(As an additional bug that occurred with this mirror, the scanner didn't clean up the file list. rsync
wasn't reachable anymore, and normally the scanner should then fall back to HTTP for scanning. But
that doesn't happen for this mirror. Will report that in a separate issue.)
What could be done to fix this is an additional check on some actual file in the tree. A file could
be chosen randomly from the database.
Checking on an actual file would mean that a HEAD request would be in order. Right now (when checking
on the base URL), we use a GET request; in the past it happened that a broken mirror would still
reply seemingly okay to HEAD requests but couldn't deliver files when requested with the GET method.
Maybe the "deeper" check could be performed less frequently than the base URL check (not every
minute), but include more thorough functional checks in return. It would be a great improvement,
because with simple plausibility checks a lot more error conditions could be detected.
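A rough sketch of such a deeper check follows. The function names, the known_files mapping (path to expected size, as it might be taken from the database) and the comparison against Content-Length are assumptions for illustration; the mirrorprobe does not currently work this way. The request function is passed in as a parameter so that the logic can be exercised without network access.

```python
import random
import urllib.request


def head_size(url, timeout=20):
    """Send a HEAD request and return (status, Content-Length or None)."""
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get('Content-Length')
        return resp.status, (int(length) if length is not None else None)


def deep_check(base_url, known_files, head=head_size, rng=random):
    """Check one randomly chosen file that the mirror is supposed to have.

    known_files maps relative paths to expected sizes (hypothetical input).
    Returns True only if the file is served with the expected size.
    """
    path, expected_size = rng.choice(sorted(known_files.items()))
    try:
        status, size = head(base_url.rstrip('/') + '/' + path)
    except OSError:
        # Covers connection errors, timeouts and HTTP error responses.
        return False
    return status == 200 and size == expected_size
```

A freshly installed webserver that answers 200 OK on the base URL would fail this check, because the randomly chosen file is missing or has a different size.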
Another such check would be a consistency check between the filelists seen through the different
protocols (HTTP/FTP/rsync), to rule out misconfiguration with broken URLs.
The scanner issue is tracked separately in issue 11.
The same happened with the openSUSE mirror operated by pop.com.br, who stopped
mirroring openSUSE: HTTP was still reachable, while rsync stopped being
reachable.
History
Date User Action Args
2014-01-22 20:05:04 poeml set files: - wf10.html
2013-11-05 19:26:02 funnycafeteria6 set files: + wf10.html
2009-10-08 23:39:07 poeml set messages: + msg34
2009-10-08 08:38:48 poeml set status: unread -> chatting
messages: + msg26
2009-10-08 08:34:37 poeml create
[ ]
Title "mb scan" may hang, because rsync's "--timeout" option is not
effective at all times
Priority bug Status resolved
Superseder Nosy List poeml
Assigned poeml Keywords scanner
To
rsync should never hang longer than 20 seconds when we probe for files -- since we run it with --timeout=20.
But this timeout doesn't seem to apply to the connect phase of rsync:
\_ /usr/bin/python /usr/bin/mb scan -j 8 -a -d distribution/11.2-Milestone7/iso
|   \_ sh -c { rsync -d --timeout=20 rsync://ftp.novell.co.jp/opensuse/distribution/11.2-Milestone7/iso /tmp/mb_probefile_PChalR/ --list-only; } 2>&1
|       \_ rsync -d --timeout=20 rsync://ftp.novell.co.jp/opensuse/distribution/11.2-Milestone7/iso /tmp/mb_probefile_PChalR/ --list-only
strace shows:
connect(3, {sa_family=AF_INET6, sin6_port=htons(873), inet_pton(AF_INET6, "2001:278:101f:1::2", &sin6_addr), sin6_flowinfo=0,
sin6_scope_id=0}, 28
The impact is that this may cause the whole "mb scan" command to hang in the
initial phase, where probing for directories is done.
Incidentally, I discovered in the rsync man page that there's an additional timeout setting:
--contimeout=SECONDS set daemon connection timeout in seconds
...exactly what we need.
It was new in rsync 3.0.0.
Thus, we can use it, except on platforms that still have rsync 2.6. Debian 5.0 is such a case.
So it could be worthwhile to detect the rsync version in the beginning of a scan. The detection can be used
to also make sure that rsync is installed at all, and give a meaningful error message if it isn't.
% python -c "import commands; status, output = commands.getstatusoutput('rsync --version'); print status;
ver = output.splitlines()[0].split()[2]; print ver"
0
3.0.4
Something along these lines needs to be added to mb/testmirror.py.
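A slightly more defensive variant of the one-liner above might look like the following sketch. The function names are invented here and this is not the code that was committed to mb/testmirror.py; it only shows the idea of detecting the rsync version once and adding --contimeout when the installed rsync is 3.0.0 or newer.

```python
import re
import subprocess


def parse_rsync_version(output):
    """Extract (major, minor, patch) from 'rsync --version' output, or None."""
    m = re.search(r'version\s+v?(\d+)\.(\d+)\.(\d+)', output)
    if not m:
        return None
    return tuple(int(x) for x in m.groups())


def rsync_version():
    """Return the installed rsync version, or None if rsync cannot be run."""
    try:
        out = subprocess.run(['rsync', '--version'], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_rsync_version(out)


def timeout_args(version, seconds=20):
    """Build timeout options; --contimeout only exists since rsync 3.0.0."""
    args = ['--timeout=%d' % seconds]
    if version is not None and version >= (3, 0, 0):
        args.append('--contimeout=%d' % seconds)
    return args
```

A caller could treat a None result from rsync_version() as "rsync is not installed" and print a meaningful error message before the scan starts.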
I committed a fix to trunk (r7953).
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7953
History
Date User Action Args
2010-02-10 04:41:58 poeml set status: testing -> resolved
2010-02-10 04:31:22 poeml set status: in-progress -> testing
messages: + msg129
2009-12-11 21:16:58 poeml set status: chatting -> in-progress
messages: + msg105
2009-12-01 20:55:44 poeml set keyword: + scanner
2009-10-08 11:50:55 poeml set status: unread -> chatting
messages: + msg32
2009-10-08 11:50:06 poeml create
no content (issue created during migration from old issue tracker, as placeholder)
no content (issue created during migration from old issue tracker, as placeholder)
[ ]
Title Torrents should contain a nodes key
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
Torrents should contain a nodes key.
Harold: uTorrent uses the nodes key for backup DHT bootstrapping information
in case other "more primary" information fails
It should be made configurable with different servers.
There is documentation at http://www.bittorrent.org/beps/bep_0005.html that
suggests that nodes keys should be added only for trackerless torrents, but Harold
said that information is outdated since the core change (BitTorrent.com switched
to the uTorrent core).
Note: remove source code comment in mod_mirrorbrain.c about this.
Fixed in trunk (r8068).
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8068
I used http://opensource.depthstrike.com/torrent.php/MPlayer-1.0rc2.tar.bz2.torrent
as (hopefully valid) example.
To configure the nodes, a MirrorBrainDHTNode config directive was added, which
takes two arguments (hostname and port).
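For illustration, this is roughly what a nodes key looks like once bencoded. BEP 5 describes it as a list of [host, port] pairs. The tiny encoder below is a sketch written for this example, not the C code in mod_mirrorbrain, and the host name is a placeholder.

```python
def bencode(obj):
    """Minimal bencoder for ints, strings/bytes, lists and dicts."""
    if isinstance(obj, int):
        return b'i%de' % obj
    if isinstance(obj, str):
        obj = obj.encode('utf-8')
    if isinstance(obj, bytes):
        return b'%d:%s' % (len(obj), obj)
    if isinstance(obj, list):
        return b'l' + b''.join(bencode(x) for x in obj) + b'e'
    if isinstance(obj, dict):
        # Dictionary keys must be byte strings and appear in sorted order.
        items = sorted((k.encode('utf-8') if isinstance(k, str) else k, v)
                       for k, v in obj.items())
        return b'd' + b''.join(bencode(k) + bencode(v) for k, v in items) + b'e'
    raise TypeError('cannot bencode %r' % type(obj))


# One (hostname, port) pair per configured DHT node; placeholder values.
nodes = [['router.example.org', 6881]]
print(bencode({'nodes': nodes}))
```

Each configured node simply becomes one more two-element list inside the outer list.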
History
Date User Action Args
2010-05-29 12:07:18 poeml set status: in-progress -> resolved
messages: + msg194
priority: feature -> wish
status: unread -> in-progress
2010-03-31 19:46:27 poeml set messages: + msg172
nosy: + poeml
assignedto: poeml
2010-03-31 19:44:12 poeml create
[ ]
Title include hashes into mirror lists
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
It would be really cool to include file hashes into mirror lists.
This requires a hash cache redesign -- which is tracked in issue 40.
Other useful details to be included in the mirror lists are file size and
modification time.
Fixed in trunk (r7984).
History
Date User Action Args
2010-03-12 00:01:17 poeml set status: done-cbb -> resolved
2010-03-11 23:59:03 poeml set status: deferred -> done-cbb
messages: + msg151
2010-03-10 15:11:44 poeml set messages: + msg149
2010-03-08 20:49:02 poeml create
[ ]
Title make MirrorBrain a "hash server"
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
MirrorBrain could become a "hash server": Whenever a suffix like ".md5" or
".sha256" is appended to a URL, MirrorBrain could return the corresponding hash.
Would be really cool.
Easy to implement, once issue 40 is done.
Serving the hashes by appending the corresponding file suffix, like .md5, could
shadow existing .md5 files; likewise for sha1 files. But I think it would still be
worthwhile and it wouldn't matter at all.
Implemented in trunk, r7991:
curl -s http://192.168.0.115/extended/3.1.1rc2/OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe.sha256
b8ac00bedff1e85fae9b3ca1af8ed616d1d177010156a97d1275d8b6035959d6 OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe
curl -s http://192.168.0.115/extended/3.1.1rc2/OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe.sha1
0e1aefc1df0ba4c147fb36d3c62d000441e6b945 OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe
curl -s http://192.168.0.115/extended/3.1.1rc2/OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe.md5
c7feb5365c2e9ce205ba212c6fa19aa0 OOo_3.1.1rc2_20090820_Win32Intel_langpack_sw-TZ.exe
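The suffix handling can be sketched as follows. This is an illustration in Python, not the Apache module code; the function name and the read_file callback are made up, and the hash is computed on the fly here, whereas the real implementation would look it up in the hash cache.

```python
import hashlib
import posixpath

# Request suffix -> hashlib algorithm name.
SUFFIXES = {'.md5': 'md5', '.sha1': 'sha1', '.sha256': 'sha256'}


def hash_response(url_path, read_file):
    """Return 'hash filename' for a hash request, or None for a normal request.

    read_file is a hypothetical callback that returns the file contents
    for a path without the hash suffix.
    """
    base, ext = posixpath.splitext(url_path)
    algo = SUFFIXES.get(ext)
    if algo is None:
        return None
    digest = hashlib.new(algo, read_file(base)).hexdigest()
    return '%s %s\n' % (digest, posixpath.basename(base))


print(hash_response('/pub/asdf.txt.sha1', lambda path: b'asdf'))
```

A request without one of the known suffixes returns None, so it would fall through to the normal redirect handling.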
History
Date User Action Args
2010-03-12 16:28:48 poeml set status: deferred -> resolved
messages: + msg161
2010-03-08 21:21:56 poeml set messages: + msg142
2010-03-08 20:51:24 poeml create
[ ]
Title mod_mirrorbrain victim of incompatibility between apr-util 1.2 and
1.3
Priority urgent Status resolved
Superseder Nosy List dfarning, poeml
Assigned To poeml Keywords
the next issues....
I am getting the following error report when trying to download
Ah. Interesting one.
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] MirrorBrainEngine On,
mirror_base '/var/www/downloads/', referer: http://mirrorbrain-
testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] URI:
'/sources/honey/InfoSlicer/InfoSlicer-4.tar.bz2', referer: http://mirrorbrain-
testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] filename:
'/var/www/downloads/sources/honey/InfoSlicer/InfoSlicer-4.tar.bz2', referer: http://mirrorbrain-
testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [error] [client 96.42.62.145] [mod_mirrorbrain] could not resolve continent,
referer: http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] Country 'US', Continent '--',
referer: http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] AS '--', Prefix '--', referer:
http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] Successfully acquired database
connection., referer: http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] SQL lookup for (canonicalized)
'sources/honey/InfoSlicer/InfoSlicer-4.tar.bz2', referer: http://mirrorbrain-
testing.sugarlabs.org/sources/honey/InfoSlicer/
[Tue Oct 06 11:53:46 2009] [warn] [client 96.42.62.145] [mod_mirrorbrain] Found 2 mirrors, referer:
http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
Apache successfully connects and talks to the database, and it does find the file on two
mirrors. That's great, and everything should work but...
[Tue Oct 06 11:53:46 2009] [error] client 96.42.62.145Unknown error 18446744073709551615:
[mod_mirrorbrain] Error looking up sources/honey/InfoSlicer/InfoSlicer-4.tar.bz2 in database, referer:
http://mirrorbrain-testing.sugarlabs.org/sources/honey/InfoSlicer/
...something else seems wrong, and I'm quite puzzled by this one.
In fact, after reading mod_mirrorbrain.c back and forth I'm quite sure
that this means that reading result lines from the database query fails.
The module correctly gets (and logs) the number of rows in the result,
but iterating over the result set fails (apr_dbd_get_row() call).
I suspect that the Apache Portable Runtime version of Ubuntu 9.04, and
the shipped PostgreSQL database adapter are a little bit too old (or
missing a bug fix) so it results in different behaviour.
Yes, this is probably the reason. On my Ubuntu 9.04 system I see a
libaprutil1 package with version 1.2.12, and in the ChangeLog I see:
That would explain the error.
I can probably come up with a workaround. Sorry about the inconvenience...
The following patch seemingly makes mod_mirrorbrain work with the old libaprutil:
--- mod_mirrorbrain.c (revision 7798)
+++ mod_mirrorbrain.c (working copy)
@@ -1127,7 +1127,7 @@
         const char *val = NULL;
         short col = 0; /* incremented for the column we are reading out */
-        rv = apr_dbd_get_row(dbd->driver, r->pool, res, &row, i);
+        rv = apr_dbd_get_row(dbd->driver, r->pool, res, &row, i-1);
         if (rv != 0) {
             ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                       "[mod_mirrorbrain] Error looking up %s in database", filename);
I think we can quite easily change mod_mirrorbrain to work with both APR versions,
either by detecting the APR version at build time, or by adding the patch to the Ubuntu
packages.
I don't know whether this fix works well, I can't really promise that at the
moment, without looking into the database adapter (and its changelog)
again.
I'm adding the proposed fix to the Ubuntu/Debian packages.
I have not learnt yet how to properly add the change as patch when packaging Debian packages in the openSUSE build
service, but a simple & stupid
sed -i 's/&row, i);$$/\&row, i-1);/' mod_mirrorbrain.c
in the debian/rules file has the desired effect at the moment. Not nice, but for pragmatic reasons (also because
I'll be away on vacation for a while) I'm fixing it this way now.
The packages with the fix are scheduled to be built in the openSUSE build service. They might
turn up later today, but as the build service is extremely slow currently, I have put manually
built packages here:
http://www.poeml.de/~poeml/DEBS/
Since sugarlabs has been running with the fix for a month already, I think we can close
this as fixed.
And since it is easy to do compile time detection of the APR version, I'll commit
a fix that does just that, so we can drop the patch from the Debian and Ubuntu
packages.
Fixed in trunk:
http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/mod_mirrorbrain/mod_mirrorbrain.c?r1=7887&r2=7886&pathrev=7887
Will be in the next release.
See also issue 29.
The check could be done at runtime, rather than at compile time.
That would protect the user in scenarios where a wrong package is installed (one
that is compiled against a different APR version than what the system has).
There is apr_version() which returns the runtime APR version. See apr_version.h.
Apache uses it:
Server loaded: APR 1.4.2, APR-Util 1.3.9
Compiled using: APR 1.4.2, APR-Util 1.3.9
It would also allow us to log the versions to the error log at Apache's start.
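The decision that a runtime check has to make can be sketched like this. The sketch is in Python for brevity, while the real check lives in the C module. It assumes, as the patch above suggests, that the 1.2 database adapter counts result rows from 0 while 1.3 counts from 1. The function names are made up.

```python
def parse_version(text):
    """Turn a version string like '1.3.9' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.')[:3])


def row_index(i, apu_version):
    """Translate a 1-based loop counter into the row number the driver expects.

    Assumption taken from the patch above: versions before 1.3 are 0-based.
    """
    if apu_version[:2] < (1, 3):
        return i - 1
    return i


for version in ('1.2.12', '1.3.9'):
    print(version, [row_index(i, parse_version(version)) for i in (1, 2)])
```

Doing this comparison against the version loaded at runtime, rather than the one seen at build time, is what protects against a module that was compiled against a different library version.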
mod_asn got a fix for this now (r86).
http://svn.mirrorbrain.org/viewvc/mod_asn?view=revision&revision=86
mod_mirrorbrain now has the same fix (r8107).
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8107
There is one remaining compile-time if-branch, which is not about database access
though. It takes care of choosing %d or %lld as format string for (potentially) large
integers in the SQL query (the one that retrieves hashes). apr-1.2 knew only %d,
while apr-1.3 knows %lld for 64-bit integers.
I'm leaving that as is, since the consequences are much less relevant (if it
results in breakage at all). Hence, I close this issue as resolved.
History
Date User Action Args
2010-09-05 23:36:13 poeml set status: chatting -> resolved
messages: + msg218
2010-09-05 20:10:07 poeml set messages: + msg217
2010-09-05 19:01:23 poeml set messages: + msg215
title: mod_mirrorbrain incompatible with
2010-09-03 12:36:16 poeml set libaprutil 1.2 -> mod_mirrorbrain victim of
incompatibility between apr-util 1.2 and 1.3
2010-09-03 12:13:01 poeml set status: resolved -> chatting
messages: + msg212
2009-12-03 23:05:36 poeml set status: chatting -> resolved
2009-12-03 23:05:30 poeml set status: resolved -> chatting
messages: + msg88
2009-12-01 21:34:13 poeml set status: chatting -> resolved
2009-12-01 21:32:28 poeml set status: resolved -> chatting
messages: + msg77
status: testing -> resolved
2009-12-01 21:24:40 poeml set nosy: + dfarning
messages: + msg76
2009-10-08 10:59:50 poeml set messages: + msg28
2009-10-08 07:52:36 poeml set status: in-progress -> testing
messages: + msg22
2009-10-06 23:19:52 poeml set messages: + msg19
2009-10-06 22:28:51 poeml create
[ ]
Title Tiger hash for torrents
Priority wish Status chatting
Superseder Nosy List poeml
Assigned To poeml Keywords torrents
A Tiger (top) hash is actively used in Torrents, and we could add it.
See
http://wiki.depthstrike.com/index.php/P2P:Protocol:Specifications:Optional_Hashes
We would need to use the reference implementation from
http://www.cs.technion.ac.il/~biham/Reports/Tiger/ and it looks simple enough that
it would be possible to cook up native Python bindings for it.
http://www.cs.technion.ac.il/~biham/Reports/Tiger/tiger-src.tar.gz
Or maybe there is already another implementation that could be used.
Priority of this feature is not very high, though.
History
Date User Action Args
2014-01-03 00:01:39 poeml set title: Summary Command #6123 -> Tiger hash for torrents
2014-01-03 00:01:25 poeml set files: - wc31.html
2014-01-03 00:01:23 poeml set files: - test13.html
2014-01-03 00:01:21 poeml set files: - rk34.html
2014-01-03 00:01:18 poeml set files: - ll39.html
2014-01-03 00:01:15 poeml set files: - kc34.html
2014-01-03 00:01:09 poeml set files: - ihi32.html
2014-01-03 00:01:06 poeml set files: - gc31.html
2014-01-03 00:01:03 poeml set files: - g-c39.html
2014-01-03 00:01:01 poeml set files: - cmr33.html
2014-01-03 00:00:59 poeml set files: - 998012514.html
2014-01-03 00:00:57 poeml set files: - 998-10316-24510.html
2014-01-03 00:00:55 poeml set files: - 957815518.html
2014-01-03 00:00:53 poeml set files: - 9135-5273-32555.html
2014-01-03 00:00:51 poeml set files: - 897469956.html
2014-01-03 00:00:49 poeml set files: - 855331029.html
2014-01-03 00:00:47 poeml set files: - 839771880.html
2014-01-03 00:00:45 poeml set files: - 7885-15286-18961.html
2014-01-03 00:00:42 poeml set files: - 808610981.html
2014-01-03 00:00:38 poeml set files: - 783251696.html
2014-01-03 00:00:35 poeml set files: - 727193920.html
2014-01-03 00:00:33 poeml set files: - 683443011.html
2014-01-03 00:00:31 poeml set files: - 6733-5689-14107.html
2014-01-03 00:00:27 poeml set files: - 655595725.html
2014-01-03 00:00:21 poeml set files: - 5873-20429-24995.html
2014-01-03 00:00:19 poeml set files: - 575185326.html
2014-01-03 00:00:16 poeml set files: - 5451-15709-6907.html
2014-01-03 00:00:12 poeml set files: - 506424319.html
2014-01-03 00:00:10 poeml set files: - 453205451.html
2014-01-03 00:00:07 poeml set files: - 452968870.html
2014-01-03 00:00:04 poeml set files: - 444542170.html
2014-01-03 00:00:02 poeml set files: - 416444568.html
2014-01-03 00:00:00 poeml set files: - 414094241.html
2014-01-02 23:59:58 poeml set files: - 32578-27743-26456.html
2014-01-02 23:59:56 poeml set files: - 32147-11853-3993.html
2014-01-02 23:59:53 poeml set files: - 32045-19479-12469.html
2014-01-02 23:59:51 poeml set files: - 298196731.html
2014-01-02 23:59:49 poeml set files: - 29344-8096-4793.html
2014-01-02 23:59:47 poeml set files: - 291638095.html
2014-01-02 23:59:44 poeml set files: - 26608-5325-6549.html
2014-01-02 23:59:40 poeml set files: - 25510-19616-18060.html
2014-01-02 23:59:38 poeml set files: - 25357-17571-31533.html
2014-01-02 23:59:36 poeml set files: - 223575510.html
2014-01-02 23:59:34 poeml set files: - 22346-5370-28480.html
2014-01-02 23:59:32 poeml set files: - 21781-1413-28807.html
2014-01-02 23:59:30 poeml set files: - 20222-26555-16039.html
2014-01-02 23:59:27 poeml set files: - 14543-21643-29780.html
2014-01-02 23:59:25 poeml set files: - 14021-30714-10528.html
2014-01-02 23:59:20 poeml set files: -
13869598845f9d76e37d5824e08b0b6eaabde159b89486048572.html
2014-01-02 23:59:16 poeml set files: -
1386879328e7dca28a0976594090d7dbf3d5b25ca93139613306.html
2014-01-02 23:59:13 poeml set files: -
13868784716389c092c368a367e894f1e131fd07f89247722895.html
2014-01-02 23:59:10 poeml set files: -
1386878164ef98fa6a69ee53ca45c552aae83d54e120999912514.html
2014-01-02 23:59:08 poeml set files: -
1386877205b1d2ddc58a8b76a98176b8c55fe4996790818630211.html
2014-01-02 23:59:05 poeml set files: -
1386875647bb02731327aacf2324a15400f4770c1b3807135923.html
2014-01-02 23:59:03 poeml set files: -
138687540358a0681f4e81d397c77419a34366a8e74960536218.html
2014-01-02 23:59:01 poeml set files: -
138687375353f6cdf6f027e6c7ad3b2fcd7fe6b25d5352536263.html
2014-01-02 23:58:59 poeml set files: -
13868735195d2e5bfb22c63db16b2dc91a3d87ad232606498340.html
2014-01-02 23:58:55 poeml set files: -
138687056446491bb535f612f56f43a6f26efcbc783616418747.html
2014-01-02 23:58:49 poeml set files: -
138687056446491bb535f612f56f43a6f26efcbc783616418746.html
2014-01-02 23:58:47 poeml set files: -
13867033626d679a0676c219359542b45aed998ad96872477218.html
2014-01-02 23:58:45 poeml set files: -
13867033626d679a0676c219359542b45aed998ad96872477219.html
2014-01-02 23:58:43 poeml set files: -
13867033626d679a0676c219359542b45aed998ad96872477217.html
2014-01-02 23:58:41 poeml set files: -
13867033626d679a0676c219359542b45aed998ad96872477216.html
2014-01-02 23:58:39 poeml set files: -
13867033626d679a0676c219359542b45aed998ad96872477215.html
2014-01-02 23:58:37 poeml set files: -
13867033626d679a0676c219359542b45aed998ad968724772114.html
2014-01-02 23:58:34 poeml set files: -
13867033626d679a0676c219359542b45aed998ad968724772113.html
2014-01-02 23:58:30 poeml set files: -
13867033626d679a0676c219359542b45aed998ad968724772112.html
2014-01-02 23:58:18 poeml set files: -
13867033626d679a0676c219359542b45aed998ad968724772111.html
2014-01-02 23:58:16 poeml set files: -
13867033626d679a0676c219359542b45aed998ad968724772110.html
2014-01-02 23:58:13 poeml set files: -
13866134844d3925faa098d575925f4d0258bedce29500906624.html
2014-01-02 23:49:59 poeml set files: -
13866134844d3925faa098d575925f4d0258bedce29500906623.html
2014-01-02 23:49:56 poeml set files: -
13866134844d3925faa098d575925f4d0258bedce29500906622.html
2014-01-02 23:49:52 poeml set files: -
13866134844d3925faa098d575925f4d0258bedce29500906621.html
2014-01-02 23:49:49 poeml set files: -
13866134844d3925faa098d575925f4d0258bedce29500906620.html
2014-01-02 23:49:46 poeml set files: -
13866125768fe1e7d23a11931442822ebaf1090d4793212269514.html
2014-01-02 23:49:43 poeml set files: -
13866125768fe1e7d23a11931442822ebaf1090d4793212269513.html
2014-01-02 23:49:41 poeml set files: -
13866125768fe1e7d23a11931442822ebaf1090d4793212269512.html
2014-01-02 23:49:38 poeml set files: -
13866125768fe1e7d23a11931442822ebaf1090d4793212269511.html
2014-01-02 23:49:36 poeml set files: -
13866125768fe1e7d23a11931442822ebaf1090d4793212269510.html
2014-01-02 23:49:32 poeml set files: - 13693-19664-9535.html
2014-01-02 23:48:44 poeml set files: - 1028-18503-18390.html
2014-01-02 23:48:38 poeml set files: - wl35.html
2013-12-27 21:56:43 funnycafeteria6 set files: + 20222-26555-16039.html
title: Summary Command #9412 -> Summary Command #6123
2013-12-27 21:56:12 funnycafeteria6 set files: + 9135-5273-32555.html
title: Summary Command #2726 -> Summary Command #9412
2013-12-27 21:55:00 funnycafeteria6 set files: + 1028-18503-18390.html
title: Summary Command #3545 -> Summary Command #2726
2013-12-27 21:54:28 funnycafeteria6 set files: + 25510-19616-18060.html
title: Summary Command #116 -> Summary Command #3545
2013-12-27 21:53:58 funnycafeteria6 set files: + 14543-21643-29780.html
title: Summary Command #105 -> Summary Command #116
2013-12-27 21:53:27 funnycafeteria6 set files: + 13693-19664-9535.html
title: Summary Command #1607 -> Summary Command #105
2013-12-27 21:52:56 funnycafeteria6 set files: + 998-10316-24510.html
title: Summary Command #6237 -> Summary Command #1607
2013-12-27 21:52:27 funnycafeteria6 set files: + 22346-5370-28480.html
title: Summary Command #4962 -> Summary Command #6237
2013-12-27 21:51:57 funnycafeteria6 set files: + 26608-5325-6549.html
title: Summary Command #6729 -> Summary Command #4962
2013-12-27 21:51:27 funnycafeteria6 set files: + 6733-5689-14107.html
title: Summary Command #7473 -> Summary Command #6729
2013-12-17 21:28:25 funnycafeteria6 set files: + 21781-1413-28807.html
title: Summary Command #1410 -> Summary Command #7473
2013-12-17 21:27:57 funnycafeteria6 set files: + 32578-27743-26456.html
title: Summary Command #2540 -> Summary Command #1410
2013-12-17 21:27:30 funnycafeteria6 set files: + 7885-15286-18961.html
title: Summary Command #5474 -> Summary Command #2540
2013-12-17 21:27:03 funnycafeteria6 set files: + 32147-11853-3993.html
title: Summary Command #5255 -> Summary Command #5474
2013-12-17 21:26:36 funnycafeteria6 set files: + 5873-20429-24995.html
title: Summary Command #3098 -> Summary Command #5255
2013-12-17 21:26:08 funnycafeteria6 set files: + 25357-17571-31533.html
title: Summary Command #6417 -> Summary Command #3098
2013-12-17 21:25:41 funnycafeteria6 set files: + 32045-19479-12469.html
title: Summary Command #193 -> Summary Command #6417
2013-12-17 21:25:15 funnycafeteria6 set files: + 14021-30714-10528.html
title: Summary Command #7940 -> Summary Command #193
2013-12-17 21:24:48 funnycafeteria6 set files: + 5451-15709-6907.html
title: Summary Command #7475 -> Summary Command #7940
2013-12-17 21:24:20 funnycafeteria6 set files: + 29344-8096-4793.html
title: Summary Command #2549 -> Summary Command #7475
2013-12-16 18:07:28 funnycafeteria6 set files: + 998012514.html
title: Summary Command #8751 -> Summary Command #2549
2013-12-16 18:06:31 funnycafeteria6 set files: + 783251696.html
title: Summary Command #6122 -> Summary Command #8751
2013-12-16 18:05:33 funnycafeteria6 set files: + 506424319.html
title: Summary Command #9476 -> Summary Command #6122
2013-12-16 18:04:32 funnycafeteria6 set files: + 453205451.html
title: Summary Command #6841 -> Summary Command #9476
2013-12-16 18:03:29 funnycafeteria6 set files: + 839771880.html
title: Summary Command #6645 -> Summary Command #6841
2013-12-16 18:02:28 funnycafeteria6 set files: + 444542170.html
title: Summary Command #4879 -> Summary Command #6645
2013-12-16 18:01:27 funnycafeteria6 set files: + 655595725.html
title: Summary Command #9825 -> Summary Command #4879
2013-12-16 18:00:30 funnycafeteria6 set files: + 855331029.html
title: Summary Command #8021 -> Summary Command #9825
2013-12-16 17:59:32 funnycafeteria6 set files: + 298196731.html
title: Summary Command #7460 -> Summary Command #8021
2013-12-16 17:58:34 funnycafeteria6 set files: + 291638095.html
title: Summary Command #5850 -> Summary Command #7460
2013-12-16 17:57:34 funnycafeteria6 set files: + 727193920.html
title: Summary Command #1256 -> Summary Command #5850
2013-12-16 17:55:58 funnycafeteria6 set files: + 452968870.html
title: Summary Command #8959 -> Summary Command #1256
2013-12-16 17:54:59 funnycafeteria6 set files: + 683443011.html
title: Summary Command #4498 -> Summary Command #8959
2013-12-16 17:54:00 funnycafeteria6 set files: + 808610981.html
title: Summary Command #7153 -> Summary Command #4498
2013-12-16 17:53:02 funnycafeteria6 set files: + 223575510.html
title: Summary Command #324 -> Summary Command #7153
2013-12-16 17:52:03 funnycafeteria6 set files: + 957815518.html
title: Summary Command #214 -> Summary Command #324
2013-12-16 17:51:03 funnycafeteria6 set files: + 416444568.html
title: Summary Command #5708 -> Summary Command #214
2013-12-16 17:50:03 funnycafeteria6 set files: + 897469956.html
title: Summary Command #3896 -> Summary Command #5708
2013-12-16 17:49:03 funnycafeteria6 set files: + 414094241.html
title: Summary Command #5112 -> Summary Command #3896
2013-12-16 17:48:04 funnycafeteria6 set files: + 575185326.html
title: Service Command #749 -> Summary Command #5112
2013-12-16 17:31:42 funnycafeteria6 set files: +
13869598845f9d76e37d5824e08b0b6eaabde159b89486048572.html
files: +
2013-12-13 01:12:29 funnycafeteria6 set 1386878164ef98fa6a69ee53ca45c552aae83d54e120999912514.html
title: Service Command #143 -> Service Command #749
files: +
2013-12-13 01:11:16 funnycafeteria6 set 1386875647bb02731327aacf2324a15400f4770c1b3807135923.html
title: Service Command #308 -> Service Command #143
files: +
2013-12-13 01:10:21 funnycafeteria6 set 138687056446491bb535f612f56f43a6f26efcbc783616418746.html
title: Service Command #695 -> Service Command #308
files: +
2013-12-13 01:09:27 funnycafeteria6 set 13868735195d2e5bfb22c63db16b2dc91a3d87ad232606498340.html
title: Service Command #529 -> Service Command #695
files: +
2013-12-13 01:08:19 funnycafeteria6 set 138687056446491bb535f612f56f43a6f26efcbc783616418747.html
title: Service Command #602 -> Service Command #529
files: +
2013-12-13 01:07:40 funnycafeteria6 set 1386879328e7dca28a0976594090d7dbf3d5b25ca93139613306.html
title: Service Command #233 -> Service Command #602
files: +
2013-12-13 01:06:36 funnycafeteria6 set 138687540358a0681f4e81d397c77419a34366a8e74960536218.html
title: Service Command #952 -> Service Command #233
files: +
2013-12-13 01:05:42 funnycafeteria6 set 1386877205b1d2ddc58a8b76a98176b8c55fe4996790818630211.html
title: Service Command #699 -> Service Command #952
files: +
2013-12-13 01:04:19 funnycafeteria6 set 138687375353f6cdf6f027e6c7ad3b2fcd7fe6b25d5352536263.html
title: Service Command #970 -> Service Command #699
files: +
2013-12-13 01:02:57 funnycafeteria6 set 13868784716389c092c368a367e894f1e131fd07f89247722895.html
title: Service Command #75 -> Service Command #970
files: + test13.html
2013-12-12 06:55:45 funnycafeteria6 set title: Torrents could contain a Tiger top hash -> Service
Command #75
2013-12-11 19:38:24 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad968724772114.html
2013-12-11 19:38:19 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad968724772113.html
2013-12-11 19:38:15 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad968724772112.html
2013-12-11 19:38:11 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad968724772111.html
2013-12-11 19:38:06 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad968724772110.html
2013-12-11 19:37:16 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad96872477219.html
2013-12-11 19:37:10 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad96872477218.html
2013-12-11 19:37:06 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad96872477217.html
2013-12-11 19:37:01 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad96872477216.html
2013-12-11 19:36:52 funnycafeteria6 set files: +
13867033626d679a0676c219359542b45aed998ad96872477215.html
2013-12-09 21:29:47 funnycafeteria6 set files: +
13866134844d3925faa098d575925f4d0258bedce29500906624.html
2013-12-09 21:29:43 funnycafeteria6 set files: +
13866134844d3925faa098d575925f4d0258bedce29500906623.html
2013-12-09 21:29:38 funnycafeteria6 set files: +
13866134844d3925faa098d575925f4d0258bedce29500906622.html
2013-12-09 21:29:35 funnycafeteria6 set files: +
13866134844d3925faa098d575925f4d0258bedce29500906621.html
2013-12-09 21:29:31 funnycafeteria6 set files: +
13866134844d3925faa098d575925f4d0258bedce29500906620.html
2013-12-09 21:28:50 funnycafeteria6 set files: +
13866125768fe1e7d23a11931442822ebaf1090d4793212269514.html
2013-12-09 21:28:44 funnycafeteria6 set files: +
13866125768fe1e7d23a11931442822ebaf1090d4793212269513.html
2013-12-09 21:28:40 funnycafeteria6 set files: +
13866125768fe1e7d23a11931442822ebaf1090d4793212269512.html
2013-12-09 21:28:35 funnycafeteria6 set files: +
13866125768fe1e7d23a11931442822ebaf1090d4793212269511.html
2013-12-09 21:28:30 funnycafeteria6 set files: +
13866125768fe1e7d23a11931442822ebaf1090d4793212269510.html
2013-12-06 17:28:38 funnycafeteria6 set files: + wl35.html
2013-12-04 18:49:12 funnycafeteria6 set files: + wc31.html
2013-12-04 18:49:06 funnycafeteria6 set files: - wc31.html
2013-11-26 19:01:21 funnycafeteria6 set files: + cmr33.html
2013-11-25 14:42:17 funnycafeteria6 set files: + g-c39.html
2013-11-19 20:04:01 funnycafeteria6 set files: + ll39.html
2013-11-13 19:59:33 funnycafeteria6 set files: + kc34.html
2013-11-08 20:36:44 funnycafeteria6 set files: + rk34.html
2013-11-01 17:34:47 funnycafeteria6 set files: + gc31.html
2013-10-31 17:51:42 funnycafeteria6 set files: + wc31.html
2013-10-30 13:03:10 funnycafeteria6 set files: + ihi32.html
2010-09-19 01:03:53 poeml set keyword: + torrents
2010-03-31 20:14:29 poeml create
[ ]
Title geographic distance ordering for finer mirror selection
Priority wish Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
In some countries (US, Germany) there can be a wealth of mirrors. A specifically
suited mirror can be picked when there's one in the same AS or in the same
network as the client. But otherwise, any mirror from the country will be
picked, which could be at the opposite end of the country (East coast vs. West
coast in the US; north vs. south in Germany).
Another scenario is that no mirror is found in the client's country, but there
are several in its continent. However, which one to choose? Currently, it is a
random choice, which means that, for instance, any European country could be
sent to any European mirror; in most cases, it would probably be better to use a
neighbouring country or the closest one that can be found.
Here's how a finer mirror selection could be achieved in both these cases.
The GeoLite city(!) database provides geographical coordinates. The coordinates
of the mirrors could be filled into the database's mirror records when mirrors
are created, or later, similarly to their network and AS data.
When clients' IP addresses are looked up via GeoIP during request processing,
their geographical coordinates would (should) be available as well.
Using the Haversine formula, or a planar approximation of it, the distance to
available mirrors can be calculated, and mirrors sorted by their closeness to the
client. A simple approximation, which is not only easy to implement but should also
incur very low overhead, is described by the following formula, where (x1, y1) and
(x2, y2) are the coordinates of the client and the mirror:
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
Thus, the available mirrors can be ordered according to their distance to the
client.
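The ordering described above can be sketched in a few lines of Python. This is only an illustration of the planar formula, not the code in mod_mirrorbrain; the function names, mirror names and coordinates are made up for the example.

```python
import math


def planar_distance(lat1, lng1, lat2, lng2):
    """Euclidean distance in degrees, treating latitude/longitude as flat x/y."""
    return math.sqrt((lat2 - lat1) ** 2 + (lng2 - lng1) ** 2)


def sort_by_distance(client, mirrors):
    """Return (distance, name) pairs, nearest mirror first.

    client  -- (lat, lng) tuple
    mirrors -- dict mapping a mirror name to its (lat, lng) tuple
    """
    return sorted(
        (planar_distance(client[0], client[1], lat, lng), name)
        for name, (lat, lng) in mirrors.items()
    )


# Hypothetical client and mirrors, for illustration only.
client = (50.83, 4.33)
mirrors = {
    "mirror-paris.example": (48.87, 2.33),
    "mirror-lisbon.example": (38.72, -9.14),
    "mirror-zurich.example": (47.37, 8.54),
}

for dist, name in sort_by_distance(client, mirrors):
    print("%-24s dist %.2f" % (name, dist))
```

The result is in degrees rather than kilometres, which is enough for ordering. Note that a degree of longitude is shorter than a degree of latitude away from the equator, so east-west distances are overstated at higher latitudes.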
References:
http://www.movable-type.co.uk/scripts/latlong.html (Haversine formula)
http://www.movable-type.co.uk/scripts/gis-faq-5.1.html (simple approximation)
(The simple approximation should be suitable insofar the 180° border is pretty
much between Alaska and Russia, and other than islands in the Pacific Ocean all
countries should be usefully covered by it. About the Pacific Ocean I don't know
much, but it is very likely that those clients resolve to satellite links
anyway, and need to be treated specially.)
There is one problem, though: We use a weighted randomization to assign mirrors
that would be equally suitable. That is useful (and used) for load balancing.
With introducing ordering by geographic distance, the question arises how to use
that together with mirror priorities. Alternatives / thoughts:
This needs to be discussed.
I just discovered that the database scheme already contains fields for longitude
and latitude -- I totally had forgotten that. They are just not shown by 'mb',
because they are not used. That's cool because no schema migration is needed...
Geographical coordinates are meanwhile stored in the database for mirrors.
(Was implemented around r7936-r7939).
I haven't received any comments on this, but I'd still regard geographical
distance as a very useful addition to the current mirror selection algorithm.
Something that I didn't realize earlier is that the city version of the GeoIP database
also contains "regions" on a level below country. Here are some examples:
Continent: NA
Country: US
Region id: CA
Region: California
City: Redwood City
Latitude: 37.491402
Longitude: -122.210999
It works not only for US states, but also for German Bundesländer and French
Departements:
Continent: EU
Country: DE
Region id: 02
Region: Bayern
City: Gunzenhausen
Latitude: 49.099998
Longitude: 10.750000
Continent: EU
Country: FR
Region id: A8
Region: Ile-de-France
City: Paris
Latitude: 48.866699
Longitude: 2.333300
So another approach would be to first introduce this additional level in mirror
selection. While this is a geographical approach, it may not give optimal results e.g.
in Germany, where there is at least one huge autonomous system (680) which comprises
most universities, but geographically spans whole Germany.
Another aspect is that when geographical mirror selection becomes too fine-grained,
then mirror prioritization becomes less effective: while it is currently possible to
assign only a few requests to certain mirrors, they would then mandatorily get all
traffic from their "region".
But in the US, adding the GeoIP region (state level) would likely be beneficial.
Two arguments for geographical distance:
Mirror selection by geographical distance could make the "region" level in the
US superfluous, achieve the same, and it would work also for locations where there
is no mirror in the exact same state. By geographical distance, it would be easy to
pick a mirror in a neighbour state, instead of falling back to country level.
In addition, we could artificially "increase" distances of mirrors with low
priority, to attract more requests to farther, but more powerful mirrors -- and
thereby keeping excessive traffic away from small mirrors.
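One possible way to "increase" the distance of low-priority mirrors is to scale it by the inverse of the mirror's score. The scaling rule below is purely an assumption for illustration; it is not taken from MirrorBrain. The scores and distances are copied from the Belgian example further down in this issue.

```python
def effective_distance(dist, score, full_score=100):
    """Inflate dist for mirrors whose score is below full_score.

    Hypothetical rule: a mirror with half the score appears twice as far away.
    """
    if score <= 0:
        raise ValueError("score must be positive")
    return dist * (float(full_score) / score)


# (name, score, dist) triples
candidates = [
    ("labby.co.uk", 80, 7.08),
    ("ftp.free.fr", 100, 2.80),
    ("ftp.heanet.ie", 50, 10.87),
]

ranked = sorted(candidates, key=lambda m: effective_distance(m[2], m[1]))
for name, score, dist in ranked:
    print("%-16s score %3d dist %5.2f effective %5.2f"
          % (name, score, dist, effective_distance(dist, score)))
```

A multiplicative penalty keeps the ordering among equally scored mirrors unchanged, and only pushes the weaker mirrors outwards.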
On the other hand, when we introduce "state" level in mirror selection, we would
also add a state_only flag that restricts mirror use to that state. Maybe also
other_states to add more states to deal with.
Also, an AS match would always take precedence, which helps if a mirror is in an AS
that spans many states.
Experimenting with geographical distance is promising. Here are the findings for a client in Belgium, with no
mirror in the country:
[Fri Country 'BE', Continent 'EU'
[Fri Nov 05 14:49:42 2010] [warn] [client 192.168.0.117] [mod_mirrorbrain] AS '5400', Prefix '82.138.160.0/19',
lat/lng 50.833302,4.333300 state id 11, state 'Brussels Hoofdstedelijk Gewest'
same region: ftp.astral.ro (score 100) (rank 6777729) (dist 19.69)
same region: labby.co.uk (score 80) (rank 8786956) (dist 7.08)
same region: ftp-stud.hs-esslingen.de (score 100) (rank 6322872) (dist 5.27)
same region: ftp.solnet.ch (score 100) (rank 8061858) (dist 4.86)
same region: ftp.cc.uoc.gr (score 100) (rank 8365641) (dist 25.94)
same region: neacm.fe.up.pt (score 100) (rank 502272) (dist 16.17)
same region: ftp.tu-chemnitz.de (score 100) (rank 2606190) (dist 8.58)
same region: ftp.free.fr (score 100) (rank 671658) (dist 2.80)
same region: ftp.fernuni-hagen.de (score 100) (rank 10110186) (dist 3.18)
same region: openoffice.dcc.fc.up.pt (score 100) (rank 3443310) (dist 16.17)
same region: mirror.switch.ch (score 100) (rank 2029689) (dist 5.46)
same region: ftp.uni-erlangen.de (score 100) (rank 3410610) (dist 6.79)
same region: ftp.udc.es (score 100) (rank 3767694) (dist 14.25)
same region: ooo.mirror.dkm.cz (score 100) (rank 2693499) (dist 10.16)
same region: openoffice.cict.fr (score 100) (rank 10623903) (dist 7.79)
same region: ftp.heanet.ie (score 50) (rank 19739735) (dist 10.87)
same region: ftp.snt.utwente.nl (score 100) (rank 8726322) (dist 2.92)
same region: ftp.proxad.net (score 100) (rank 8012808) (dist 2.80)
same region: ftp.sunet.se (score 80) (rank 12016420) (dist 16.07)
same region: ooo.mirror.garr.it (score 100) (rank 5669853) (dist 12.09)
[...]
By geographical distance, a mirror in France or Netherlands would have been chosen, which sounds better than e.g.
one in Portugal.
Example of a client in Germany:
Country 'DE', Continent 'EU'
AS '3320', Prefix '84.128.0.0/10', lat/lng 50.666698,12.616700 state id 13, state 'Sachsen'
same country: ftp-stud.hs-esslingen.de (score 100) (rank 4746078) (dist 3.92)
same country: ftp.tu-chemnitz.de (score 100) (rank 951243) (dist 0.34)
same country: ftp.fernuni-hagen.de (score 100) (rank 2549292) (dist 5.19)
same country: ftp.uni-erlangen.de (score 100) (rank 1006179) (dist 1.94)
[...]
Here, the mirror in Chemnitz (right around the corner) would have been preferred.
An example with another client in Germany, this time one in the huge AS 680 which sports a lot of mirrors (only 3
in my test setup):
Country 'DE', Continent 'EU'
AS '680', Prefix '128.176.0.0/16', lat/lng 51.966702,7.633300 state id 07, state 'Nordrhein-Westfalen'
same AS: ftp.tu-chemnitz.de (score 100) (rank 7522635) (dist 5.40)
same AS: ftp.fernuni-hagen.de (score 100) (rank 2116017) (dist 0.64)
same AS: ftp.uni-erlangen.de (score 100) (rank 3404724) (dist 4.12)
same country: ftp-stud.hs-esslingen.de (score 100) (rank 4702914) (dist 3.56)
[...]
Here, the mirror in Hagen is indeed much closer than the other two mirrors in the AS.
Too useful to not be implemented. Committed to trunk:
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8199
To be released with the upcoming version 2.14.0.
Looks very useful indeed, Peter, thanks!
Cory
The feature has been in production at a few sites for a few days, and I have got no
catastrophic failure reports so far.
On the centos-mirror mailing list, some interesting scenarios have been discussed
where the new feature is just what's needed!
Closing as "done"!
History
Date User Action Args
2010-11-11 16:23:55 poeml set status: testing -> resolved
messages: + msg309
2010-11-06 03:23:59 theuni set nosy: + theuni
messages: + msg306
2010-11-05 22:42:58 poeml set status: chatting -> testing
messages: + msg305
2010-11-05 14:10:47 poeml set messages: + msg304
2010-11-04 19:43:40 poeml set messages: + msg303
2010-11-04 19:32:15 poeml set messages: + msg302
2010-09-06 00:11:32 poeml set messages: + msg224
2010-09-05 23:53:25 poeml set assignedto: poeml
2010-04-23 03:11:34 poeml set messages: + msg184
2009-12-21 19:12:21 poeml set messages: + msg111
2009-12-11 15:56:04 poeml create
[ ]
Title Torrents could contain SHA1 top hash
Priority feature Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
Regarding BitTorrent support (see issue 37)
Some clients, like Shareaza, can make use of a top hash, if provided. They can use
it to find other sources.
The SHA1 top hash goes into the info dict with key "sha1", as 20-byte raw string.
Thanks Harold Feit for the suggestion!
Here's an example torrent file:
http://opensource.depthstrike.com/torrent.php/MPlayer-1.0rc2.tar.bz2.torrent
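As a rough sketch of the placement described above: the top hash is the raw 20-byte SHA1 digest of the whole file, stored under the key "sha1" in the info dict. The helper names below are made up for illustration, and this is not the code committed to MirrorBrain; bencoding the dict is left out.

```python
import hashlib


def sha1_top_hash(path, chunk_size=1024 * 1024):
    """Return the raw 20-byte SHA1 digest of the file at path."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        # Read in chunks so that large ISO images do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.digest()


def add_top_hash(info, path):
    """Return a copy of the torrent info dict with the "sha1" key added."""
    info = dict(info)
    info['sha1'] = sha1_top_hash(path)
    return info
```

The digest is stored as raw bytes, not as the 40-character hex string.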
fixed in trunk (r8066)
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8066
History
Date User Action Args
2010-05-28 17:03:24 poeml set status: in-progress -> resolved
messages: + msg193
2010-03-31 19:34:59 poeml set messages: + msg171
2010-03-31 19:31:07 poeml create
[ ]
Title mb scan/probefile leaves temporary directories behind
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
drwx------ 2 mirrorbrain mirrorbrain 48 2009-10-31 22:44 mb_probefile__30RC9
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_QPfHmA
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_-EUd23
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_NDtT2m
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_xFF50B
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile__-_NhU
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_EhW8Mx
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 06:44 mb_probefile_CBTamD
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 20:44 mb_probefile_keI-Xv
drwx------ 2 mirrorbrain mirrorbrain 48 2009-11-01 20:44 mb_probefile_fjmV_C
These are empty directories, left behind by "mb scan" I think.
While the above was observed on a 64-bit openSUSE 11.0 install, I can't for some reason reproduce it on two other
systems: one with 32-bit openSUSE 11.0, one with 32-bit openSUSE 11.1.
The temporary directories are created by testmirror.py when it looks up files per rsync.
Given that I understand Python's try..finally statement correctly, the cleanup should always happen, because it is
under the finally clause:
elif S.scheme == 'rsync':
    try:
        tmpdir = tempfile.mkdtemp(prefix='mb_probefile_')
        # [do stuff here]
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)
    return
Thus, it should always be executed. There is no reason for errors (shutil.rmtree is recursive, and there is no
problem with file ownership to be expected).
All the machines I tested on have Python 2.5, and as far as I know, the finally clause for try..except..finally was
(re-)added with Python 2.5.
However, on SLE10, where MirrorBrain is deployed extensively as well, there is Python 2.4. So I'm actually not sure why
the code works there at all. A small test program shows that 2.4.2 indeed has support for try..finally:
import sys
try:
    print 'in the try clause'
    sys.exit(0)
finally:
    print 'in the finally clause'
in the try clause
in the finally clause
Replacing the sys.exit(0) with raising an exception, or just removing it, all works correctly.
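For reference, here is a self-contained variant of the same check in current Python 3 syntax. It is a sketch, not the code in testmirror.py; the function names are made up. The one deliberate difference is that the directory is created before entering the try block, so that tmpdir is always bound when the finally clause runs.

```python
import os
import shutil
import sys
import tempfile


def probe_with_cleanup(work):
    """Run work(tmpdir) in a scratch directory and always remove it afterwards."""
    # Created outside try: if mkdtemp() itself fails, there is nothing to clean up.
    tmpdir = tempfile.mkdtemp(prefix='mb_probefile_')
    try:
        return work(tmpdir)
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)


seen = []


def exits_early(tmpdir):
    """Stand-in for the probe: remember the directory, then exit abruptly."""
    seen.append(tmpdir)
    sys.exit(0)


try:
    probe_with_cleanup(exits_early)
except SystemExit:
    pass

# The directory should be gone although the worker called sys.exit().
print(os.path.exists(seen[0]))
```

A finally clause is not run when the process ends through os._exit() or is killed by a signal, so a worker process terminated that way could still leave its directory behind. Whether that is what happened here is not known.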
The only thing that I can imagine is that the backported "processing" module on SLE10 influences the exception
handling in some unfortunate way. But that's only a very wild guess.
As the problem doesn't occur in most setups, and I can't reproduce it anymore (no
access to the previous machines anymore), I see no immediate need to pursue it.
Okay, I'll just close this now.
History
Date User Action Args
2009-12-07 03:18:38 poeml set status: deferred -> resolved
messages: + msg100
2009-11-30 01:00:47 poeml set status: chatting -> deferred
messages: + msg66
2009-11-04 20:45:18 poeml set status: unread -> chatting
messages: + msg49
2009-11-01 22:10:59 poeml create
[ ]
Title mirrorprobe: socket timeout doesn't trigger in some cases
Priority bug Status in-progress
Superseder Nosy List poeml
Assigned To poeml Keywords
I'm seeing the mirrorprobe taking more than 60 seconds sometimes, which shouldn't be the case since it runs with a network
timeout of 20 seconds.
This happens with a non-reachable mirror (the network stack returns "network not reachable")
Strangely enough, the timeout actually applied is three times the configured one!
A little test program reproduces this:
import sys
import socket
import urllib2
socket.setdefaulttimeout(float(sys.argv[1]))
req = urllib2.Request('http://openofficeorg.secsup.org/')
try:
    response = urllib2.urlopen(req)
except urllib2.URLError, e:
    sys.exit(e)
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 1 0.00s user 0.09s system 2% cpu 3.108 total
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 2 0.02s user 0.07s system 1% cpu 6.106 total
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 3 0.01s user 0.08s system 1% cpu 9.107 total
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 4 0.02s user 0.08s system 0% cpu 12.108 total
<urlopen error [Errno 101] Network is unreachable>
~/timeout-test.py 20 0.01s user 0.08s system 0% cpu 1:00.10 total
Usage should be correct:
socket.setdefaulttimeout = setdefaulttimeout(...)
setdefaulttimeout(timeout)
Set the default timeout in floating seconds for new socket objects.
A value of None indicates that new socket objects have no timeout.
When the socket module is first imported, the default is None.
This is with Python 2.6.0 on openSUSE 11.1, and I can reproduce on Python 2.5.2 on 11.0.
But the best is, the timeout works 100% as expected for other non-reachable addresses:
And, worse, the other mirror is now online again, so the bug isn't reproducible with it anymore:
~/timeout-test.py 4 http://openofficeorg.secsup.org/ 0.01s user 0.09s system 32% cpu 0.293 total
~/timeout-test.py 4 http://openofficeorg.secsup.org/ 0.04s user 0.07s system 34% cpu 0.301 total
Hrm. I think it might be a good idea to implement mirrorprobe's parallelization anew, using a process pool and simply killing off
all forked children after some time. That might be more robust, and might also fix the problems with too much thread memory
on small machines.
strace brings light into the dark:
13727 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("208.209.50.16")},
16) = -1 EINPROGRESS (Operation now in progress)
13727 poll([{fd=5, events=POLLOUT}], 1, 20000 <unfinished ...>
13725 <... select resumed> ) = 0 (Timeout)
13725 gettimeofday({1260397081, 562526}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:01 ooo DEBUG wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
13725 gettimeofday({1260397082, 566132}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:02 ooo DEBUG wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[...]
13727 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("208.209.50.19")},
16) = -1 EINPROGRESS (Operation now in progress)
13727 poll([{fd=5, events=POLLOUT}], 1, 20000 <unfinished ...>
13725 <... select resumed> ) = 0 (Timeout)
13725 gettimeofday({1260397101, 608601}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:21 ooo DEBUG wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
13725 gettimeofday({1260397102, 608581}, NULL) = 0
13725 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13725 write(3, "Dec 09 23:18:22 ooo DEBUG wai"..., 59) = 59
13725 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
[...]
The host has three addresses, all three are tried one after the other, and thus the timeout is applied three times:
openofficeorg.secsup.org is an alias for mirrors.secsup.org.
mirrors.secsup.org has address 208.209.50.19
mirrors.secsup.org has address 208.209.50.18
mirrors.secsup.org has address 208.209.50.16
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1111
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1112
mirrors.secsup.org has IPv6 address 2600:803:420:2:b00b:b00b:1234:1114
Not ideal.
This raises the question whether DNSrr'ed hosts should generally be handled as separate mirrors.
(For other mirrors it's often no problem to use the IP addresses directly, but in this particular
case the Apache virtual host for the OOo mirror isn't hit then.)
At the very least, if multiple addresses are indeed tested, it should happen in
parallel.
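Probing all addresses of a host in parallel could look roughly like the sketch below, written for current Python 3. It is not mirrorprobe code: the function names are made up, and the resolve and connect parameters exist only so the logic can be exercised without network access. With one thread per address, the total wait is bounded by roughly one timeout instead of one timeout per address.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def try_connect(sockaddr, family, timeout):
    """Return True if a TCP connection to sockaddr succeeds within timeout."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sockaddr)
        return True
    except OSError:
        # Covers timeouts as well as "network is unreachable".
        return False


def probe_all(host, port=80, timeout=20.0,
              resolve=socket.getaddrinfo, connect=try_connect):
    """Probe every address of host concurrently; map sockaddr -> reachable."""
    infos = resolve(host, port, 0, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr).
    targets = [(info[4], info[0]) for info in infos]
    if not targets:
        return {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {
            sockaddr: pool.submit(connect, sockaddr, family, timeout)
            for sockaddr, family in targets
        }
        return {sockaddr: f.result() for sockaddr, f in futures.items()}
```

Calling probe_all('openofficeorg.secsup.org', timeout=20.0) would return one entry per resolved address, IPv4 and IPv6 alike. This checks reachability only; it does not send an HTTP request with the right Host header, so the virtual-host problem mentioned above remains.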
History
Date User Action Args
2012-04-16 23:27:33 poeml set messages: + msg391
2009-12-09 22:45:13 poeml set messages: + msg103
2009-12-09 22:40:50 poeml create
[ ]
Title large file check not happening when scanning ftp.uni-muenster.de
Priority bug Status resolved
Superseder Nosy List ant, poeml
Assigned To Keywords scanner
In November 2008 I observed that the scanner didn't do the "large file check" that it
is supposed to do for files > 2GB, when it scanned the University of Muenster mirror.
(Reported in the openSUSE bug tracker at the time, and now moved here.
https://bugzilla.novell.com/show_bug.cgi?id=445831 )
I supposed that something was special about that mirror that prevented the check from
happening; for all other mirrors it seemed to work.
This is probably why:
[...]
ftp.uni-muenster.de: rsync ADD: 644 -1 Wed Jul 7 11:33:29 2010
distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso
ftp.uni-muenster.de: rsync ADD: 644 189 Thu Jul 8 11:01:22 2010
distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso.asc
ftp.uni-muenster.de: rsync ADD: 644 63 Sun Jul 11 07:36:12 2010
distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso.md5
ftp.uni-muenster.de: rsync ADD: 644 71 Sun Jul 11 07:38:53 2010
distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso.sha1
ftp.uni-muenster.de: rsync ADD: 644 343117 Thu Jul 8 10:58:43 2010
distribution/11.3/iso/openSUSE-11.3-DVD-x86_64.iso.torrent
ftp.uni-muenster.de: rsync ADD: 644 716177408 Tue Jul 6 09:44:46 2010
distribution/11.3/iso/openSUSE-11.3-GNOME-LiveCD-i686.iso
[...]
The rsync daemon, or the scanner, reports the file size of the DVD as -1. D'oh!
Using a real rsync client, the output does not look suspicious:
muenster.de/ftp/pub/linux/distributions/opensuse/distribution/11.3/iso/openSUSE-11.3-DVD-i586.iso
-rw-r--r-- 4346398720 2010/07/07 11:11:08 openSUSE-11.3-DVD-i586.iso
The rsync server says hello with "@rsyncd: 30.0", which means it's not an old one.
What's worse, I see that the same "broken" length (-1) comes from other mirrors, like ftp5.gwdg.de. So
we have a serious bug here that probably affects all mirrors running rsync 3.x (which were few, two
years ago, but now should be most).
I was fooled -- it is only the display of the number which was broken. I fixed
this in trunk (r8230).
Regarding the actual bug, I cannot reproduce it anymore: the large file check
happens correctly with that mirror. I'll assume that something has been changed
there and close the bug.
History
Date User Action Args
2010-11-14 17:27:34 poeml set status: deferred -> resolved
messages: + msg313
2010-11-14 16:49:17 poeml set messages: + msg312
2009-12-07 03:16:13 poeml set status: unread -> deferred
2009-12-01 20:55:19 poeml set keyword: + scanner
2009-11-04 16:33:50 ant set nosy: + ant
2009-10-07 20:40:18 poeml create
[ ]
Title ap_dbd_open/ap_dbd_close() might be more economic for db connection utilization
Priority wish Status resolved
Superseder Nosy List poeml
Assigned To Keywords mod_asn
from the old TODO file:
Those functions are probably much less used than ap_dbd_acquire(), and less
tested. If they work, it'd be nice; it would be important to make sure that no
close() is forgotten in one of the various "quick exit" code paths.
This change would actually be more interesting for mod_asn, than for
mod_mirrorbrain. mod_asn is usually configured to be used for all requests; like
mod_geoip, it runs early enough so it can't be known if the data looked up will
actually be used.
mod_mirrorbrain, however, can be configured with various exceptions to not handle a
request at all, and ap_dbd_acquire() runs late (not earlier than needed) and
might not be reached at all. Especially, that stage is not reached for those
requests that actually serve data to clients and might persist for a while
therefore.
Fixed in trunk, r75.
History
Date User Action Args
2010-03-26 23:05:29 poeml set status: testing -> resolved
2010-03-26 23:05:01 poeml set status: deferred -> testing
messages: + msg164
2010-03-09 18:16:43 poeml set messages: + msg146
keyword: + mod_asn
2010-03-08 21:49:41 poeml set messages: + msg145
2010-03-08 21:24:38 poeml create
[ ]
Title passwords in /etc/mirrorbrain.conf can't contain whitespace
Priority bug Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
Passwords containing spaces in /etc/mirrorbrain.conf are known not to work.
(Putting this issue into the tracker, after it had been mentioned only in
docs/bugs.rst so far.)
When I noticed the bug a while ago, I seem to remember that I traced the error
to be inside SQLObject's adapter to the database. See
http://www.sqlobject.org/sqlobject/postgres/pgconnection.py.html
(connectionFromURI() method or something like that), but I'm not sure.
In fact, the error is in the psycopg2 module, which is used by SQLObject. It gets a nice dictionary passed
for making the connection, but still manages to mess up. Maybe it passes strings to PostgreSQL where things
should be quoted...
(Pdb) print self.dsn_dict
{'user': 'mb', 'host': 'localhost', 'password': 'foo bar', 'port': 5432, 'database': 'mb_opensuse'}
(Pdb) self.module.connect(**self.dsn_dict)
*** OperationalError: missing "=" after "bar" in connection info string
Okay, got it. When my module quotes certain characters, psycopg2 will do the job.
I have tested with passwords containing the following characters:
space, tab, unicode characters, '`;#$!=
(also '=' after a space, so it's similar to the connection string syntax)
They all work now... however double quotes (") don't. No matter how I quote them, they are not accepted.
Another note, it is probably not possible to use whitespace as first character, though, due to parsing
limitations in the ini-style configuration format.
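The quoting described above can be sketched as follows. This is only an illustration of the libpq rule that a value in a "key=value" connection string may be wrapped in single quotes, with single quotes and backslashes inside it escaped by a backslash; it is not the code committed in r7885. The helper names quote_dsn_value() and build_dsn() are hypothetical, and "dbname" is used as the libpq keyword for the database name.

```python
def quote_dsn_value(value):
    """Quote one value for a libpq 'key=value' connection string.

    The value is wrapped in single quotes; backslashes and single quotes
    inside it are escaped with a backslash.
    """
    text = str(value)
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'%s'" % escaped


def build_dsn(params):
    """Join parameters into a connection string, quoting every value."""
    return " ".join("%s=%s" % (key, quote_dsn_value(params[key]))
                    for key in sorted(params))


# A password with a space no longer splits the string into extra tokens.
dsn = build_dsn({"user": "mb", "host": "localhost",
                 "password": "foo bar", "dbname": "mb_opensuse"})
print(dsn)
```

Quoting every value, not only the password, keeps the helper simple and avoids having to decide which characters are "dangerous".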
Fixed in trunk!
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7885
History
Date User Action Args
2009-12-01 18:11:42 poeml set status: deferred -> resolved
messages: + msg72
2009-12-01 16:08:24 poeml create
[ ]
Title 'mb edit' can fail to parse edited content for empty lines
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
When editing a mirror record with 'mb edit', and removing the trailing space
after the colon of an empty variable, a parse failure will result when trying to
save the data.
This can happen if a complete mirror record is copy&pasted from somewhere else,
and trailing spaces get lost during this mouse action.
Traceback is:
Save changes?
y)es, n)o, e)dit again: y
Traceback (most recent call last):
  File "/usr/bin/mb", line 1187, in <module>
    sys.exit( mirrordoctor.main() )
  File "/var/lib/python-support/python2.5/cmdln.py", line 257, in main
    return self.cmd(args)
  File "/var/lib/python-support/python2.5/cmdln.py", line 280, in cmd
    retval = self.onecmd(argv)
  File "/var/lib/python-support/python2.5/cmdln.py", line 412, in onecmd
    return self._dispatch_cmd(handler, argv)
  File "/var/lib/python-support/python2.5/cmdln.py", line 1100, in _dispatch_cmd
    return handler(argv[0], opts, *args)
  File "/usr/bin/mb", line 516, in do_edit
    if str(old_dict[i]) != new_dict[i]:
KeyError: 'publicNotes'
Another failure with copy and paste is this:
When copying output from commit diffs (captured via "mb export --commit=svn"),
statusBaseurl is missing in the output.
When pasting that output into "mb edit", mb edit will fail to save the entry for
this reason.
How to reproduce: run "mb edit", remove statusBaseurl line, and try to save.
Fixed in trunk, r8044.
I believe this is fixed.
About to appear in 2.13.0.
I believe that this fix caused a regression. It wasn't possible anymore to remove
an URL (setting the value to empty).
Fixed in trunk r8129.
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8129
(To become available in 2.13.1.)
...and r8130.
It turned out that the original issue, reported here, wasn't working anymore.
r8130 now properly ignores the trailing space after the colon.
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8130
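A tolerant parser for the edited record might look like the sketch below. It is not the code from r8044, r8129 or r8130; parse_record() and changed_fields() are hypothetical names, and the "key: value" line format is assumed from the description above. The idea is that a bare "key:" (trailing space lost) yields an empty value, and that a key missing from the edited text is skipped during comparison instead of raising KeyError.

```python
def parse_record(text):
    """Parse 'key: value' lines; a bare 'key:' yields an empty string."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("cannot parse line: %r" % line)
        record[key.strip()] = value.strip()
    return record


def changed_fields(old, new):
    """Return keys whose value differs; keys absent from 'new' are skipped."""
    return [key for key in old
            if key in new and str(old[key]) != new[key]]
```

With this approach an explicitly empty value ("baseurl:") still counts as a change to the empty string, which matters for the regression mentioned above, where removing a URL was no longer possible.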
History
Date User Action Args
2010-09-17 14:07:06 poeml set status: chatting -> resolved
2010-09-17 14:07:01 poeml set status: resolved -> chatting
messages: + msg247
2010-09-17 13:43:12 poeml set status: chatting -> resolved
2010-09-17 13:43:05 poeml set status: resolved -> chatting
messages: + msg246
2010-09-01 16:34:37 poeml set status: testing -> resolved
messages: + msg206
2010-05-04 16:47:02 poeml set status: in-progress -> testing
messages: + msg190
2010-05-04 16:02:49 poeml set messages: + msg189
2009-12-07 03:15:02 poeml set status: unread -> in-progress
2009-12-03 10:21:03 poeml create
[ ]
Title mod_geoip too old on Debian and Ubuntu
Priority feature Status deferred
Superseder Nosy List poeml
Assigned To poeml Keywords
I realised that this issue wasn't in the issue tracker yet, but only discussed in private
communication. I put it here so it can be better tracked.
Symptom:
error log is reporting 'could not resolve continent'
The problem is that mod_geoip is too old on Debian and Ubuntu, and it lacks the functionality to
yield region data from lookups.
Excerpts from communication:
Okay, I almost expected that -- I was already digging in the back of my head, what I remember
about the old mod_geoip version that Debian has. The problem with it is that at the time the
continent lookup wasn't implemented. This was actually one of the reasons why I implemented the
GeoIP lookup in mod_mirrorbrain itself in the beginning. Later, when mod_geoip became more
powerful, I switched to using it for the purposes.
I probably should have chosen a different route -- I should have put the country-continent mapping
statically right into mod_mirrorbrain, because then it would have this bit of information, without
being bound to a newer version of mod_geoip. The "new" version of mod_geoip is already two years
old or so, but the Debian package is very outdated unfortunately.
I think a fix could be either to implement the country-continent mapping inside mod_mirrorbrain,
or to package the newer mod_geoip for Debian/Ubuntu and provide it via the openSUSE build service
similar to the other stuff that's already there. (And we should work with Debian to update the
package; this was in fact one reason why I considered becoming a real Debian contributor)
I'm not sure right now though whether the continent lookup makes mod_mirrorbrain fail completely,
or if it is maybe still able to pick mirrors based on country? Do you see it doing random choices,
or does it pick mirrors by country?
I see it just sets continent_code = "--" when that lookup fails, so if the country of a client is
the same as the country of a mirror, it should redirect to that. If not, it'll use a random
mirror; yes, that's what probably happens.
(You could set "MirrorBrainDebug yes" in the Apache config to see a lot of detail.)
I'm sorry! But with newer mod_geoip (and if that fails, also newer libgeoip) this can be resolved.
Peter
Good morning,
I debianized my apache2-mod_geoip package now, and a Debian/Ubuntu libapache2-mod-geoip package
will become available in the repositories later today; that should fix the issue.
The GeoIP library itself is version 1.4.4 which should be new enough.
Peter
Thus, this issue is "solved" by providing the updated mod_geoip by us for installation through the
openSUSE buildservice for Debian and Ubuntu.
For the record, another possible solution would be to
I'm setting the issue to "fixed".
I opened a bug upstream: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556904
No news from Debian upstream.
I received mail today that the module has been updated in Debian, which closes:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=400980
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556904
Action item: check which Debian/Ubuntu versions already benefitted from the
update, and which versions still need a backport.
I just checked on Ubuntu Lucid (10.04), and mod_geoip still is at 1.1.8, while we
provide 1.2.5.
History
Date User Action Args
2010-09-26 12:06:17 poeml set messages: + msg296
2010-09-06 00:09:14 poeml set messages: + msg223
2010-05-21 07:27:56 poeml set messages: + msg191
2010-04-23 03:05:19 poeml set status: chatting -> deferred
2010-04-23 03:05:09 poeml set messages: + msg183
2009-12-07 03:17:26 poeml set priority: bug -> feature
2009-11-18 08:58:38 poeml set status: resolved -> chatting
messages: + msg65
2009-10-26 19:33:52 poeml set status: unread -> resolved
messages: + msg39
2009-10-26 19:33:03 poeml create
[ ]
Title Ubuntu 9.04: SQLObject gives Python deprecation warnings
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords Packaging
When running the "mb" tool on Ubuntu 9.04, the SQLObject Python module spits out a
deprecation warning each time it is run:
/usr/lib/python2.6/dist-packages/sqlobject/converters.py:165:
DeprecationWarning: the sets module is deprecated
from sets import Set, ImmutableSet
The deprecation warning is in the sqlobject module and is issued when Python 2.6 is
used, which is the default Python on Ubuntu 9.04. We might get rid of it by filing a bug
for the python-sqlobject package, or by using Python 2.5 instead.
I have fixed this in the following way:
See http://www.debian.org/doc/packaging-manuals/python-policy
A fixed package should become available in the build system later today at
http://download.opensuse.org/repositories/Apache:/MirrorBrain/
I'm closing the bug because I believe it's fixed!
The fixed package should automatically pull in the python2.5 package.
Problem: In Ubuntu 10.04, there is no Python 2.5 anymore. Thus, we need to get rid
of this dependency, or make it work in both cases...
http://packages.ubuntu.com/maverick/python-sqlobject
Links to the python-sqlobject packages on different Ubuntu versions.
Unfortunately, the python-sqlobject package still generates the deprecation
warning on 10.04...
Found a workaround.
The hack in debian/rules which fixed the Python scripts on the 2.5 interpreter version is now applied
only on Debian/Ubuntu platforms other than Ubuntu 10.04.
The amateurish "line" of code that achieves this is:
case 10.04 in $(shell cut -d" " -f 2 /etc/issue) ) echo not patching;; *) echo patching Python scripts to force Python 2.5 being used; sed -i 's/^#!\/usr\/bin\/python$$/&2.5/' mb/mb.py; sed -i 's/^#!\/usr\/bin\/python$$/&2.5/' mirrorprobe/mirrorprobe.py;; esac
Ugly but works.
I'm closing the issue again.
The workaround^Whack wasn't complete I think. Due to the fact that on older Ubuntu
versions the Python module was built twice, both for Python 2.5 and 2.6, only the
last installed script was patched. It is also necessary to patch the script that
temporarily resides in mb/build/scripts-2.6/mb.py. That's done now.
History
Date User Action Args
2010-09-08 00:00:38 poeml set status: chatting -> resolved
2010-09-08 00:00:30 poeml set status: resolved -> chatting
messages: + msg236
2010-09-05 23:59:20 poeml set status: chatting -> resolved
messages: + msg221
2010-09-02 23:28:23 poeml set messages: + msg211
2010-09-02 22:18:23 poeml set messages: + msg208
2010-09-02 21:19:05 poeml set status: resolved -> chatting
messages: + msg207
2009-10-08 10:56:23 poeml set status: testing -> resolved
messages: + msg27
2009-10-08 08:07:24 poeml set status: in-progress -> testing
messages: + msg23
2009-10-08 07:59:53 poeml set status: unread -> in-progress
assignedto: poeml
2009-10-06 22:00:46 poeml create
[ ]
Title Notes on Debian install
Priority wish Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
re: irc discussion with poeml.
When following the Debian guide here:
http://mirrorbrain.org/docs/installation/debian/
I can attest that the install went rather smoothly, here are some suggestions
for the guide:
Thanks for the report!
- mod_geoip is already configured on install, and is located at: /usr/share/GeoIP/ . I
updated for good measure.
Two notes:
I added a note to the docs at just that place where you probably have wondered.
- No need to enable the geoip/mod_form, they are both enabled upon install.
Right - cool thing!
For mod_form it wasn't the case, but I have stolen the postinstall/postrm scripts from
mod_geoip and added them to mod_form, so in the forthcoming builds of the package it'll also be
enabled automatically.
And while I'm at it, I'm adding the same for mod_mirrorbrain as well, plus mod_dbd that it
depends on. Great tip, it will shorten the install instructions considerably.
- After creating the mirrorbrain user, be sure to 'passwd mirrorbrain'. Also needs a 'sudo
mkdir /home/mirrorbrain && sudo chown mirrorbrain:mirrorbrain /home/mirrorbrain'
Great catch, -m was missing as option to the useradd call. Sorry. I added it; that takes care
of creating the home directory.
The next step would be to do this automatically upon installation of the package. This is
already tracked in issue 4.
- When importing the structure and data, needs to be done as mirrorbrain user.
- Note that the passwords in /etc/mirrorbrain.conf and /etc/apache2/mods-available/dbd.conf
are placeholders, they need to be changed to your db's password.
Ah, right. Note added.
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7888
Thanks a lot! Fabulous report!
History
Date User Action Args
2009-12-02 00:37:05 poeml set status: in-progress -> resolved
messages: + msg80
2009-12-01 20:56:40 poeml set status: unread -> in-progress
2009-11-30 23:00:03 theuni create
[ ]
Title add support for the new (proposed IETF) Metalink standard
Priority wish Status resolved
Superseder Nosy List ant, poeml
Assigned To poeml Keywords
Support for Metalinks as defined in the Internet-Draft http://tools.ietf.org/html/draft-bryan-metalink
should be implemented (as reference implementation and for testing reasons).
http://groups.google.com/group/metalink-discussion/web/internetdraft has additional information.
Note to self regarding implementation of the change in timestamp format:
We'll use RFC3339 timestamps instead of RFC822 timestamps in
the future. Cf.
r->request_time, which is attached to the request object, might be useful
instead of apr_time_now(), because it's already filled out and can save the
syscall to time().
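For illustration, the two timestamp formats differ as in the sketch below. mod_mirrorbrain itself is written in C, so this is not its implementation; the function names are hypothetical, and the example only shows what an RFC 822 style date and an RFC 3339 date look like for the same instant in UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_timestamp(epoch_seconds):
    """Format as an RFC 822 style date, e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def rfc3339_timestamp(epoch_seconds):
    """Format as an RFC 3339 date in UTC, e.g. '1970-01-01T00:00:00Z'."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(rfc822_timestamp(0))
print(rfc3339_timestamp(0))
```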
Largely implemented; one big thing left to do: the hash cache needs to be
reimplemented. That's something I planned to do anyway, but now I realize that it
is blocking this implementation. The hash cache currently contains preformatted
hashes ready for injection into v3 Metalinks.
But other than that, there is not much left to do. (Commit to trunk pending)
Working on the required hash cache redesign -- now tracked separately in issue 40.
Done!
Closing. 2.13.0 is shortly before becoming released, after months of testing.
History
Date User Action Args
2010-09-01 16:14:24 poeml set status: testing -> resolved
messages: + msg205
2010-03-12 02:49:04 poeml set status: in-progress -> testing
messages: + msg158
2010-03-08 20:47:06 poeml set messages: + msg137
2010-02-16 23:32:45 poeml set assignedto: poeml
messages: + msg131
2010-02-16 22:36:39 poeml set status: chatting -> in-progress
2009-11-04 16:32:08 ant set nosy: + ant
2009-10-09 00:43:25 poeml set title: add support for (new) Metalink standard -> add support for the new (proposed IETF) Metalink standard
2009-10-09 00:27:27 poeml set messages: + msg36
2009-10-09 00:23:43 poeml create
[ ]
Title a subdirectory scan may "lose" files outside that subdirectory
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
The list of filenames that is grabbed at the beginning of a "subdirectory scan" ('mb scan -d DIR') is too broad. It uses a string prefix
match that isn't terminated with a trailing slash. Thus, a scan in "factory" grabs also files in "factory-snapshot". This leads to
deletion of all the files outside of the directory at the end of the scan.
(The deletion is supposed to happen only for the files that have disappeared in the given subdirectory; this is implemented by copying
the list of known files into a temporary database table at the beginning, and each file that is seen on the mirror during scanning is
removed from that table. All remaining files are deleted in the end. Thus, if too many files are grabbed in the beginning, they'll be
deleted, too.)
The problem is here:
http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/scanner.pl?view=markup
if(length $start_dir) {
    $sql = "CREATE TEMPORARY TABLE temp1 AS
            SELECT id FROM filearr
            WHERE path LIKE '$start_dir%'
            AND $row->{id} = ANY(mirrors)";
The LIKE expression needs to be changed to '$start_dir/%'.
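The difference between the two patterns can be reproduced with a small, self-contained sketch. It uses an in-memory SQLite database instead of PostgreSQL and a simplified filearr table without the mirrors array, so it only demonstrates the prefix-matching behaviour, not the scanner's real query; paths_under() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filearr (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany("INSERT INTO filearr (path) VALUES (?)",
                 [("factory/repo/a.rpm",),
                  ("factory-snapshot/repo/a.rpm",)])


def paths_under(start_dir, terminate):
    """List paths matching the prefix, with or without a trailing slash."""
    pattern = start_dir + ("/%" if terminate else "%")
    rows = conn.execute(
        "SELECT path FROM filearr WHERE path LIKE ? ORDER BY path",
        (pattern,))
    return [row[0] for row in rows]


too_broad = paths_under("factory", terminate=False)  # also hits factory-snapshot
correct = paths_under("factory", terminate=True)     # only files below factory/
print(too_broad)
print(correct)
```

Every path returned by the unterminated pattern would end up in the temporary table and be deleted at the end of the scan unless it was seen on the mirror, which is how files in "factory-snapshot" were lost.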
The fix has been tested successfully in openSUSE's setup, where the bug was noticed.
Fixed in trunk
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7846
Fixed with the 2.10.2 release.
History
Date User Action Args
2009-11-04 19:04:02 poeml set status: testing -> resolved
messages: + msg48
2009-10-30 12:18:12 poeml set messages: + msg46
2009-10-30 12:12:50 poeml set messages: + msg45
2009-10-30 12:11:59 poeml set status: unread -> testing
2009-10-30 12:11:53 poeml create
[ ]
Title PATH_INFO is not ignored, as generally done by Apache for static
files
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
Here's a report from the openSUSE bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=546396
Description From Matt Barringer 2009-10-13 02:31:27 MDT (-) [reply]
A URL like
http://download.opensuse.org/repositories/FATE:/keeper/openSUSE_11.1/noarch/sxkeeper-suse-1.4.0-4.1.noarch.rpm/not/really/an/rpm/url
returns the RPM rather than returning a 404 error. This causes some bad
problems for SUSE Studio when users add build service repositories by URL.
------ Comment #2 From Matt Barringer 2009-10-26 12:19:19 MDT (-) [reply] -------
Is there a reason to not treat the RPM URLs as files, rather than as a CGI
script? I don't really understand why PATH_INFO would be useful here.
------- Comment #3 From Peter Poeml 2009-10-26 15:12:27 MDT (-) [reply] -------
The behaviour is indeed different than when you would request a static file
from Apache.
The debug log (where r is the request context that Apache gives my module)
debugLog(r, cfg, "URI: '%s'", r->unparsed_uri);
debugLog(r, cfg, "filename: '%s'", r->filename);
logs the following for a request on .../foo/bar:
[Mon Oct 26 21:49:14 2009] [warn] [client 10.10.3.98] [mod_mirrorbrain] URI:
'/zrkadlo/repositories/Apache/openSUSE_11.0/i586/apache2-devel-2.2.12-1.1.i586.rpm/foo/bar'
[Mon Oct 26 21:49:14 2009] [warn] [client 10.10.3.98] [mod_mirrorbrain]
filename:
'/srv/www/htdocs/zrkadlo/repositories/Apache/openSUSE_11.0/i586/apache2-devel-2.2.12-1.1.i586.rpm'
The documentation
(http://httpd.apache.org/docs/2.2/mod/core.html#acceptpathinfo) enlightens us:
The core handler for normal files defaults to rejecting PATH_INFO requests. Handlers that serve scripts, such as cgi-script and isapi-handler, generally accept PATH_INFO by default.
mod_mirrorbrain is not a script, but it runs a handler similar to a script.
If it makes sense, the behaviour could be changed. What is the motivation? What
kind of funny requests are causing the problem?
(For the record, AcceptPathInfo Off in the context of the MirrorBrain config
has the desired effect as well)
I committed a fix to the trunk. It'll appear in the next release, which I'll hopefully complete during
the next weeks.
With the following patch, requests with PATH_INFO correctly lead to 404s, unless the server is
configured explicitly to allow PATH_INFO. This follows the best practice that modules should respect
the default.
mod_mirrorbrain now no longer (falsely) behaves like mod_cgi*.
--- mod_mirrorbrain.c (revision 8042)
+++ mod_mirrorbrain.c (working copy)
@@ -1092,6 +1092,12 @@
         return DECLINED;
     }
+    /* is there PATH_INFO, and are we supposed to accept it? */
+    if ((r->path_info && *r->path_info)
+            && (r->used_path_info != AP_REQ_ACCEPT_PATH_INFO)) {
+        debugLog(r, cfg, "ignoring request with PATH_INFO");
+        return DECLINED;
+    }
     debugLog(r, cfg, "URI: '%s'", r->unparsed_uri);
     debugLog(r, cfg, "filename: '%s'", r->filename);
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=8043
I haven't received feedback about the effectiveness of the fix yet; but I'm closing
this issue as resolved nevertheless. Please reopen if necessary.
History
Date User Action Args
2010-09-06 00:01:52 poeml set status: testing -> resolved
messages: + msg222
2010-04-23 03:31:01 poeml set messages: + msg186
2010-04-23 03:27:46 poeml set status: in-progress -> testing
messages: + msg185
2010-02-10 04:43:26 poeml set status: unread -> in-progress
2009-10-26 22:10:19 poeml create
[ ]
Title filenames with spaces not always detected on mirrors
Priority bug Status resolved
Superseder Nosy List poeml, theuni
Assigned To poeml Keywords
See here for reference:
http://mirrors.xbmc.org/addons/plugins/pictures/The%20Big%20Picture.tar.gz?mirrorlist
Some mirrors have the file, but aren't added to the list. Using that example the
following mirrors are 2 of the ones missing from that list:
http://ftp.osuosl.org/pub/xbmc/addons/plugins/pictures/The%20Big%20Picture.tar.gz
http://mirror.netcologne.de/xbmc/addons/plugins/picture/The%20Big%20Picture.tar.gz
I haven't been able to track down why some work and others don't. But it's clear
that it's the whitespace that causes issues.
oops, typo in report.
http://mirror.netcologne.de/xbmc/addons/plugins/picture/The%20Big%20Picture.tar.gz
should be
http://mirror.netcologne.de/xbmc/addons/plugins/pictures/The%20Big%20Picture.tar.gz
I set up a little test instance here. The list of mirrors, and the URLs look fine, when I search in the database. Here's an example with one
file containing spaces, one not:
mirrorbrain@doozer:> mb file ls 'addons/plugins/pictures/iPhoto.tar.gz' -u
eu de 100 ok ok www.softliste.de http://www.softliste.de/xbmc/addons/plugins/pictures/iPhoto.tar.gz
eu de 100 ok ok mirror.netcologne.de http://mirror.netcologne.de/xbmc/addons/plugins/pictures/iPhoto.tar.gz
eu es 100 ok ok evorq.ugr.es http://evorq.ugr.es/xbmc/addons/plugins/pictures/iPhoto.tar.gz
eu fr 100 ok ok distrib-coffee.ipsl.jussieu.fr http://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/xbmc/addons/plugins/pictures/iPhoto.tar.gz
eu se 100 ok ok ftp.sunet.se http://ftp.sunet.se/pub/multimedia/xbmc/addons/plugins/pictures/iPhoto.tar.gz
na us 100 ok ok mirror.its.uidaho.edu http://mirror.its.uidaho.edu/pub/xbmc/addons/plugins/pictures/iPhoto.tar.gz
na us 100 ok ok www.gtlib.gatech.edu http://www.gtlib.gatech.edu/pub/xbmc/addons/plugins/pictures/iPhoto.tar.gz
na us 100 ok ok ftp.osuosl.org http://ftp.osuosl.org/pub/xbmc/addons/plugins/pictures/iPhoto.tar.gz
mirrorbrain@doozer:> mb file ls 'addons/plugins/pictures/The Big Picture.tar.gz' -u
eu de 100 ok ok www.softliste.de http://www.softliste.de/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu de 100 ok ok mirror.netcologne.de http://mirror.netcologne.de/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu es 100 ok ok evorq.ugr.es http://evorq.ugr.es/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu fr 100 ok ok distrib-coffee.ipsl.jussieu.fr http://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu se 100 ok ok ftp.sunet.se http://ftp.sunet.se/pub/multimedia/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok mirror.its.uidaho.edu http://mirror.its.uidaho.edu/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok www.gtlib.gatech.edu http://www.gtlib.gatech.edu/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok ftp.osuosl.org http://ftp.osuosl.org/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
I also get all 8 mirrors in the mirror list:
http://xbmc.mirrorbrain.org/addons/plugins/pictures/iPhoto.tar.gz?mirrorlist
http://xbmc.mirrorbrain.org/addons/plugins/pictures/The%20Big%20Picture.tar.gz?mirrorlist
However, I just added the mirrors with their HTTP URL. Thus, I use HTTP for scanning, which seems to work. Maybe you have FTP and rsync URLs
configured - which would be preferred by the scanner.
So, I added the FTP URL for the netcologne mirror, and scan it again:
mirrorbrain@doozer:> mb scan netcolo
Sat Dec 5 12:55:51 2009 mirror.netcologne.de: starting
Sat Dec 5 12:55:51 2009 mirror.netcologne.de: total files before scan: 96
Sat Dec 5 12:55:51 2009 mirror.netcologne.de: no rsync, trying ftp
Sat Dec 5 12:55:52 2009 mirror.netcologne.de: scanned 69 files (60/s) in 1s
Sat Dec 5 12:55:52 2009 mirror.netcologne.de: files to be purged: 39
Sat Dec 5 12:55:52 2009 mirror.netcologne.de: total files after scan: 69
Sat Dec 5 12:55:52 2009 mirror.netcologne.de: purged old files in 0s.
Sat Dec 5 12:55:52 2009 mirror.netcologne.de: done.
Completed in 1 seconds
mirrorbrain@doozer:> mb file ls 'addons/plugins/pictures/The Big Picture.tar.gz' -u
eu de 100 ok ok www.softliste.de http://www.softliste.de/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu es 100 ok ok evorq.ugr.es http://evorq.ugr.es/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu fr 100 ok ok distrib-coffee.ipsl.jussieu.fr http://distrib-coffee.ipsl.jussieu.fr/pub/mirrors/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
eu se 100 ok ok ftp.sunet.se http://ftp.sunet.se/pub/multimedia/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok ftp.osuosl.org http://ftp.osuosl.org/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok www.gtlib.gatech.edu http://www.gtlib.gatech.edu/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
na us 100 ok ok mirror.its.uidaho.edu http://mirror.its.uidaho.edu/pub/xbmc/addons/plugins/pictures/The Big Picture.tar.gz
It's gone from the list. Thus, the problem is in the FTP scanner.
A verbose scan shows the following:
% mb scan netcolo -vvvvv --sql-debug
[...]
mirror.netcologne.de: ftp dir: addons/plugins/pictures
mirror.netcologne.de: -rw-r--r-- 1 804 804 39209 Nov 09 03:13 Phonebin.tar.gz
-rw-r--r-- 1 804 804 436921 Nov 09 03:13 Picasa.tar.gz
-rw-r--r-- 1 804 804 255436 Nov 09 03:13 The Big Picture.tar.gz
-rw-r--r-- 1 804 804 479633 Nov 09 03:12 flickr.tar.gz
-rw-r--r-- 1 804 804 2351538 Nov 09 03:12 iPhoto.tar.gz
-rw-r--r-- 1 804 804 21191 Nov 09 03:13 widelec.org.tar.gz
SELECT mirr_add_bypath(?, ?); <-- 1, addons/plugins/pictures/Phonebin.tar.gz
DELETE FROM temp1 WHERE id = 4
SELECT mirr_add_bypath(?, ?); <-- 1, addons/plugins/pictures/Picasa.tar.gz
DELETE FROM temp1 WHERE id = 5
SELECT mirr_add_bypath(?, ?); <-- 1, addons/plugins/pictures/flickr.tar.gz
DELETE FROM temp1 WHERE id = 7
SELECT mirr_add_bypath(?, ?); <-- 1, addons/plugins/pictures/iPhoto.tar.gz
DELETE FROM temp1 WHERE id = 8
SELECT mirr_add_bypath(?, ?); <-- 1, addons/plugins/pictures/widelec.org.tar.gz
DELETE FROM temp1 WHERE id = 9
mirror.netcologne.de: committing ftp dir addons/plugins/pictures
The file is seen via FTP (first half of the log), but no action taken when it comes to storing the file into the
database (second part of log). Staring at the scanner, line 638 now.
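One way to keep names with spaces intact when parsing such a listing is to limit the number of splits, as in the sketch below. This is not the fix made in scanner.pl (which is Perl); parse_ftp_list_line() is a hypothetical helper that assumes the nine-column Unix "ls -l" layout shown in the log above and does not handle symlinks or other listing formats.

```python
def parse_ftp_list_line(line):
    """Split a Unix-style FTP LIST line into (size, name).

    Expected columns: permissions, link count, owner, group, size,
    month, day, time-or-year, name. Splitting at most eight times keeps
    any spaces inside the file name.
    """
    fields = line.rstrip("\r\n").split(None, 8)
    if len(fields) < 9:
        raise ValueError("unexpected LIST line: %r" % line)
    return int(fields[4]), fields[8]


size, name = parse_ftp_list_line(
    "-rw-r--r-- 1 804 804 255436 Nov 09 03:13 The Big Picture.tar.gz")
print(size, name)
```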
fixed in trunk:
http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/scanner.pl?r1=7905&r2=7904&pathrev=7905
I'm verifying the results that the new scanner brings by running it on all
OpenOffice.org mirrors that are scanned via FTP; it looks good.
fixed in 2.11.2. Thanks for the report!
Whoops, missed the reply.
Indeed this is fixed and working great. Thanks for the quick turnaround as usual.
History
Date User Action Args
2009-12-09 22:16:27 poeml set status: chatting -> resolved
2009-12-09 07:44:41 theuni set status: resolved -> chatting
messages: + msg101
2009-12-05 21:21:25 poeml set status: testing -> resolved
messages: + msg95
2009-12-05 16:10:37 poeml set messages: + msg94
2009-12-05 12:39:36 poeml set status: in-progress -> testing
messages: + msg93
2009-12-05 12:08:27 poeml set messages: + msg92
2009-12-05 11:57:43 poeml set messages: + msg91
2009-12-05 11:05:02 poeml set status: chatting -> in-progress
2009-12-05 06:09:26 theuni set status: unread -> chatting
messages: + msg90
2009-12-05 06:07:20 theuni create
[ ]
Title Things lacking in the Debian/Ubuntu packages
Priority wish Status in-progress
Superseder Nosy List poeml
Assigned To poeml Keywords Packaging
Files
File name Uploaded Type Edit Remove
dom_fail.html danmea77, 2014-02-19.15:14:40 text/html
In the recently created Debian packages, there are a number of things missing:
I think that the Debian dovecot package could be serving as perfect example; it
creates a group, a user, has the respective prerequires on "useradd" and stuff
like that, and it also installs config files. It has a -common package that
mirrorbrain could also have (more obvious than "mirrorbrain" for the common
files, and probably closer to the Debian packaging policies).
http://packages.debian.org/lenny/dovecot-common
user, group, /var/log/mirrorbrain are created automatically now.
Logrotate snippet left.
History
Date User Action Args
2014-02-19 15:14:40 danmea77 set files: + dom_fail.html
2010-09-07 18:04:06 poeml set status: chatting -> in-progress
messages: + msg232
2009-12-07 03:16:52 poeml set priority: bug -> wish
2009-10-06 20:51:00 poeml set status: unread -> chatting
messages: + msg14
2009-10-06 20:48:59 poeml create
[ ]
Title metalink-hasher: doesn't set correct mtime in some cases
Priority bug Status resolved
Superseder Nosy List poeml
Assigned To poeml Keywords
Checking whether a saved metalink hash file is up to date, the comparison for mtimes doesn't
work sometimes. I encountered this case:
hashes/ue/srv/mirrors/ue /srv/mirrors/ue -v
looking at /srv/mirrors/ue
locked /srv/metalink-hashes/ue/srv/mirrors/ue/LOCK
Hashing '/srv/mirrors/ue/bar' ... <-----------
Up to date: '/srv/metalink-hashes/ue/srv/mirrors/ue/ultimate-edition-2.4-x64.iso.size_2562793472'
Up to date: '/srv/metalink-hashes/ue/srv/mirrors/ue/ultimate-edition-2.4-x86.iso.size_2534909952'
unlocking /srv/metalink-hashes/ue/srv/mirrors/ue/LOCK
The log shows how /srv/mirrors/ue/bar is re-hashed. However, the script claims this on every run.
Inserting a debug print shows that the mtime on the source and destination file are
slightly different (in subsecond range):
hashes/ue/srv/mirrors/ue /srv/mirrors/ue -v
looking at /srv/mirrors/ue
locked /srv/metalink-hashes/ue/srv/mirrors/ue/LOCK
src mtime 1259537492.4343064 <---------
dst mtime 1259537492.4343059 <---------
Hashing '/srv/mirrors/ue/bar' ...
src mtime 1257621747.0
dst mtime 1257621747.0
Up to date: '/srv/metalink-hashes/ue/srv/mirrors/ue/ultimate-edition-2.4-x64.iso.size_2562793472'
src mtime 1257621923.0
dst mtime 1257621923.0
Up to date: '/srv/metalink-hashes/ue/srv/mirrors/ue/ultimate-edition-2.4-x86.iso.size_2534909952'
unlocking /srv/metalink-hashes/ue/srv/mirrors/ue/LOCK
This means that os.utime() doesn't set the correct mtime for some reason.
I'm fixing this in trunk by comparing int(mtime) to int(mtime) instead of a direct
comparison.
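The whole-second comparison can be sketched as below. This is a minimal illustration, not the code from r7881; same_mtime() and is_up_to_date() are hypothetical names.

```python
import os


def same_mtime(src_mtime, dst_mtime):
    """Compare two mtimes at whole-second resolution.

    Truncating to int ignores sub-second differences like the one seen
    in the debug output above.
    """
    return int(src_mtime) == int(dst_mtime)


def is_up_to_date(src_path, dst_path):
    """Return True if the hash file carries the same mtime as its source."""
    return same_mtime(os.stat(src_path).st_mtime,
                      os.stat(dst_path).st_mtime)


# The pair from the log above now counts as identical.
print(same_mtime(1259537492.4343064, 1259537492.4343059))
```

The trade-off is that a source file modified twice within the same second would not be detected as changed by the mtime check alone.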
fixed in r7881
http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=7881
No adverse effects observed with the fix. Seems to work. Closing this issue.
History
Date User Action Args
2009-12-01 20:58:37 poeml set status: testing -> resolved
messages: + msg75
2009-11-30 14:49:39 poeml set status: in-progress -> testing
messages: + msg68
2009-11-30 14:46:09 poeml create
[ ]
Title Don't send 404 to files that still exist on some of the mirrors
Priority bug Status chatting
Superseder Nosy List poeml, rhertzog
Assigned To poeml Keywords
When experimenting with a MirrorBrain setup that uses a dummy file tree, I ran into the
situation that the file tree wasn't complete, and I got 404s (file not found) in the
client. The same would happen if the tree is not up to date, and some new files are not
present yet.
When trying to keep the system running under adverse circumstances, it doesn't make sense
to error out in such a case, and it would probably make sense to redirect such requests to
one of the fallback servers. (Referring to the fallback servers that can be configured
since recently, r7880.) Or maybe a different set of servers, don't know.
An obvious disadvantage is that those fallback servers end up getting all
requests that lead to a 404. Those mirror servers must be assumed to be fairly complete
for the whole thing to make sense.
On the plus side, this way the redirector could keep running even when it loses its file
tree (disk crash).
Not to forget, this feature (and similar ones) could be made configurable, so the
behaviour could be switched on only in an emergency, thereby minimizing negative
consequences. Or, touching a file in the filesystem could signal to Apache that it needs
to switch into "degraded mode".
As a slight variant of this, Apache could still do database lookups, even if the file tree
is gone. That would preserve the ability to redirect to all mirrors that have a requested
file, and only those that have it (and not blindly).
The feature would need to hook in earlier in the request phase. It should be relatively
straightforward to implement.
I don't have a strong need to send 404 to fallback servers but I have a real
need to not send 404 when the requested file is still available on some of the
mirrors in the database, even though it's gone from the master copy.
I'm using mirrorbrain on top of Debian package archives. Some servers tend to
lag behind for a few hours/days for various reasons. Imagine a situation where
the package list references package_1.0_all.deb and all servers are in sync. The
master copy is updated with a new package list and file tree that contains
package_2.0_all.deb but not package_1.0_all.deb. Until the various mirrors get in
sync, people will be redirected to the old package list and they will request the
old package, but they will get back a 404 because the old package is gone from the
master copy, while the local mirror they are usually redirected to still has the
required file.
Thus the default setup is actively harmful in that regard. If you consider that
serving old files might be a security issue, you might want to add a
configuration parameter to limit how long you are willing to redirect to
obsolete files. But we need some time period where this is allowed or things
will break.
I'm thus taking the liberty to change the title because I believe that's the
better way to solve your initial problem too.
Thanks for this thoughtful comment. MirrorBrain is indeed a bit
narrow-minded in this regard, because it simply assumed that this case
doesn't occur (or is not wanted). I kind of accepted this but I also see
the limitation. But your suggestion makes a lot of sense. It would be
very clever to simply do a database lookup in case of a request on a
non-existing file. There might even be hashes in the database for such a
file, which a client could use to verify file integrity.
I have to think about the implementation. No time right now, but I
wanted to at least reply shortly for now. I heard you :-)
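The idea discussed in this thread (look in the database when the file is missing from the local tree, optionally only for a limited time) could be sketched roughly as follows. This is an illustration in Python, not mod_mirrorbrain code: the function decide, the mirror_db dictionary standing in for the database, and the removed_at and grace parameters are all assumptions made for the example.

```python
import os
import time


def decide(path, docroot, mirror_db, removed_at=None, grace=None, now=None):
    """Decide how to answer a request for `path` (relative to the tree root).

    mirror_db  -- hypothetical stand-in for the database:
                  path -> list of base URLs of mirrors that have the file
    removed_at -- optional dict: path -> epoch time the file vanished
                  from the master copy (hypothetical bookkeeping)
    grace      -- optional number of seconds during which vanished files
                  may still be redirected
    """
    rel = path.lstrip("/")
    if os.path.isfile(os.path.join(docroot, rel)):
        return ("local", rel)  # file is in the tree: normal handling applies

    # File is missing locally: ask the "database" which mirrors still have it.
    mirrors = mirror_db.get(rel, [])
    if mirrors and grace is not None and removed_at and rel in removed_at:
        now = time.time() if now is None else now
        if now - removed_at[rel] > grace:
            mirrors = []  # grace period expired: do not redirect to obsolete files

    if mirrors:
        return ("redirect", mirrors[0].rstrip("/") + "/" + rel)
    return ("404", None)
```

With this shape, a request for a file that is gone from the master but still known on a mirror yields a redirect, while a file unknown to both the tree and the database still yields a 404.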
Thanks for the answer. Let me know if I can help.
I don't know if I can bribe you to implement my requests (also #150) but I'd be
willing to upload (and maintain) mirrorbrain to the official Debian archive in
exchange (I'm a Debian developer). :-)
Of course you can bribe me :-) In fact, one of my biggest wishes would
come true! I actually started looking into becoming a Debian package
maintainer recently, because the lack of MirrorBrain packages became
already evident (and I stumbled over your fine manual!). I would be
thrilled if you could help out with packages. Then I can actually use
the time to work on MirrorBrain itself.
Having said that, you are of course also welcome to join hacking; and as
one of the next steps I'll collect ideas for implementation, as it's
always hard to find the right places in the code when not being familiar
with it. (And it serves as a refresher for myself.) This also applies to
the other request you sent. So stay tuned...
History
Date User Action Args
2014-02-20 00:49:04 poeml set messages: + msg546
2014-02-19 14:46:43 rhertzog set messages: + msg545
2014-02-19 12:42:57 poeml set messages: + msg544
priority: wish -> bug
nosy: + rhertzog
messages: + msg542
2014-02-17 14:44:23 rhertzog set title: Send 404s to certain fallback mirrors? -> Don't send 404 to files that still exist on some of the mirrors
2014-02-17 14:31:30 rhertzog set files: - ul36.html
2013-10-27 16:09:19 funnycafeteria6 set files: + ul36.html
2009-12-01 15:35:30 poeml create
[ ]
Title issue with mirrorprobe mail handler
Priority bug Status resolved
Superseder Nosy List dfarning, poeml
Assigned To poeml Keywords
Peter, here is an issue report on the mail handler.
mirrorbrain@Y650:/home/dfarning$ mirrorprobe
Traceback (most recent call last):
File "/usr/bin/mirrorprobe", line 334, in <module>
main()
File "/usr/bin/mirrorprobe", line 216, in main
'root@' + socket.gethostbyaddr(socket.gethostname())[0],
socket.gaierror: [Errno -2] Name or service not known
I just commented out lines 214-222 in /usr/bin/mirrorprobe .
david
Hi David, I remember the email where you already mentioned this. I didn't read closely enough though.
I thought the error came from some mirror hostname. In fact, it is the machine name itself (of the
MirrorBrain host) that the script tries to resolve.
The lookup is done to have a hostname/domain to append to mailed logs; the feature of mailing logs
isn't actually used anymore, but the code is still there.
On my test host, the hostname "ubuntu" resolves to 127.0.1.1:
ubuntu
('ubuntu', [], ['127.0.1.1'])
The code is obviously naively running on assumptions that can't be met everywhere.
Development plans for the mirrorprobe are:
Therefore, I'd think the code should be removed to make room for a better notification mechanism. A
future notification system should be integrated with a web frontend and, at the same time, allow for
mail notification for important things.
I'll commit a fix in SVN, but as it will take a while until that ends up in the Ubuntu packages, I'd
recommend a workaround for now. Maybe you can adjust /etc/hostname and /etc/hosts in a way that it
avoids the crash - or comment out the code as you did.
This works here:
root@ubuntu:# cat /etc/hostname
ubuntu
root@ubuntu:# grep ubuntu /etc/hosts
127.0.1.1 ubuntu
root@ubuntu:# python -c "import socket; socket.gethostbyaddr(socket.gethostname())[0]"
root@ubuntu:#
(Returns empty value, but doesn't crash at least)
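For illustration, the lookup from the traceback could be guarded so that an unresolvable hostname no longer aborts the script. This is only a sketch and not the change committed in r7838; the function name sender_address and the fallback_host parameter are hypothetical.

```python
import socket


def sender_address(user="root", fallback_host="localhost"):
    """Build a 'user@host' address without crashing on unresolvable hostnames."""
    hostname = socket.gethostname()
    try:
        # Same lookup as in the traceback above; it may fail when the
        # machine's own name is not resolvable.
        fqdn = socket.gethostbyaddr(hostname)[0]
    except (socket.gaierror, socket.herror):
        # Fall back to the plain hostname (or a fixed default if it is empty).
        fqdn = hostname or fallback_host
    return user + "@" + fqdn
```

Calling sender_address() then returns an address based on the plain hostname when the lookup fails, instead of raising socket.gaierror.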
Fix committed with r7838. Will be included in the next release (post-2.10.1).
History
Date User Action Args
2009-10-08 11:46:56 poeml set status: chatting -> resolved
messages: + msg30
status: unread -> chatting
assignedto: poeml
2009-10-08 11:33:53 poeml set messages: + msg29
title: issue with mail handler on ubuntu 9.04 -> issue with mirrorprobe mail handler
2009-10-08 07:08:24 poeml set nosy: + poeml
2009-10-07 23:17:12 dfarning create
no content (issue created during migration from old issue tracker, as placeholder)