
http-redirector's People

Contributors

mikaela, pabs3, rgeissert

http-redirector's Issues

display a map of the mirrors

The map would most likely display some details about each mirror, but an overall map would be a start, ideally one that also shows the visitor's location.

Move mirror type-specific knowledge into packages

As a first step, the checker should not need to have any mirror type-specific knowledge. At present time the code should be redundant enough in those areas as to make it easy to abstract those parts. The mirror type-specific stuff could go into Mirror::Type::foo.

Per IP family mirror subsets

It may happen that a subset is switched to a new serial even when there are no IPv6 mirrors that are up to date. This leads to v6 requests being redirected to another subset until a v6-enabled mirror is up to date and joins the subset.

The subsets should depend on the IP family, therefore allowing v6 mirrors to be older than their v4 counterpart.
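
A minimal sketch of what per-family subsets could look like, written in Python for brevity rather than in the redirector's Perl. The mirror records, the field names (`host`, `families`, `serial`) and the promotion rule (use the newest serial that at least one mirror of that family already has) are all assumptions made for illustration, not the redirector's actual data model.

```python
def newest_serial_per_family(mirrors):
    """Map each IP family to the newest serial held by a mirror of that family."""
    newest = {}
    for mirror in mirrors:
        for family in mirror["families"]:
            if family not in newest or mirror["serial"] > newest[family]:
                newest[family] = mirror["serial"]
    return newest


def subset_for(mirrors, family):
    """Return the hosts of that family that are at the family's own serial."""
    serial = newest_serial_per_family(mirrors).get(family)
    return [
        m["host"]
        for m in mirrors
        if family in m["families"] and m["serial"] == serial
    ]


# Hypothetical data: the only v6-enabled mirror is one serial behind.
mirrors = [
    {"host": "a.example", "families": {"v4"}, "serial": 2},
    {"host": "b.example", "families": {"v4", "v6"}, "serial": 1},
]
print(subset_for(mirrors, "v4"))
print(subset_for(mirrors, "v6"))
```

With this split, the v6 subset keeps serving from the older serial instead of sending v6 clients to another subset.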

Show additional Geo/IP information

Would it be possible to include additional location information showing where the redirector (geoip) believes the query is coming from?

For example:

IP: xxx.xxx.xxx.xx
AS: xxxx
Continent: NA
Country: xx
State: xxxxx
City: xxxxx

Check mirrors over IPv6

At present time, the field in the master list is blindly trusted. There should at least be a one-time check to make sure IPv6 connectivity does work.

Support v6 AS peers db

The v4 AS peering database can't be used for v6 clients, even if ticket #11 was fixed. Each IP version needs its own database.

RFC6249 Link headers could break APT

Pre-wheezy versions of APT abort on header lines of 360 characters or longer.
Whenever support for RFC6249 is enabled for GET requests, either:

  • the number of links should be cut down so that the length of the comma-separated list of links doesn't reach the limit, or
  • a Vary: User-Agent header should be used and some sort of white or black list implemented
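
The first option could be sketched roughly as follows (Python for brevity; the function name and the example URLs are made up). It assumes the limit applies to the whole header line including the `Link:` name but not the trailing CRLF, which would need to be checked against the affected APT versions.

```python
MAX_HEADER_LINE = 359  # stay strictly below the 360-character limit


def link_header(urls, max_len=MAX_HEADER_LINE):
    """Build one Link header line, dropping trailing links that would not fit."""
    line = "Link:"
    kept = 0
    for url in urls:
        value = f"<{url}>; rel=duplicate"
        separator = " " if kept == 0 else ", "
        candidate = line + separator + value
        if len(candidate) > max_len:
            break  # adding this link would reach the limit
        line = candidate
        kept += 1
    # No header at all is better than one that makes old APT abort.
    return line if kept else None


urls = [f"http://mirror{i}.example/debian/dists/sid/InRelease" for i in range(20)]
header = link_header(urls)
print(len(header), header.count("rel=duplicate"))
```

Since the candidates are presumably already ordered by preference, cutting from the end drops the least interesting links first.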

double-check country against GeoIP.dat db?

It appears that in some cases the GeoLiteCity.dat database incorrectly says a given IP is in one country, while the more generic GeoIP.dat database indicates the correct country. Ideally, with an AS match or AS-peer match this shouldn't be an issue, but we are not there yet, and it has happened.

Make it an application server

Turn the redirector from a CGI into an application server using Plack, to allow it to be run as a FastCGI or mod_perl application.
This requires a few changes as to the way the output is handled, but the main changes are related to the databases, and how they would need to be reloaded.

allow per-mirror restrictions

Some mirrors have good connectivity within their country but have poor connectivity to the outside world. Such kinds of restrictions (country-based, AS-based, etc) should be allowed to be taken into consideration by the redirector.

This kind of restriction should probably be added as another field in the master list. E.g.
Restricted-to: "AS" | "country" | "subnet"
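
A rough sketch of how such a field might be honoured when picking candidates (Python for brevity). The dictionary keys (`restricted_to`, `asn`, `country`, `subnets`, `ip`) are hypothetical; only the three restriction values come from the proposal above.

```python
import ipaddress


def may_serve(mirror, client):
    """Return True if the mirror's restriction, if any, allows this client."""
    scope = mirror.get("restricted_to")  # None, "AS", "country" or "subnet"
    if scope is None:
        return True
    if scope == "AS":
        return client["asn"] == mirror["asn"]
    if scope == "country":
        return client["country"] == mirror["country"]
    if scope == "subnet":
        ip = ipaddress.ip_address(client["ip"])
        return any(ip in ipaddress.ip_network(net) for net in mirror["subnets"])
    raise ValueError(f"unknown restriction: {scope!r}")


mirror = {"restricted_to": "country", "country": "CL", "asn": 64500}
print(may_serve(mirror, {"country": "CL", "asn": 64501, "ip": "192.0.2.10"}))
print(may_serve(mirror, {"country": "US", "asn": 64501, "ip": "192.0.2.10"}))
```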

Redirects to out-of-date mirror

I'm currently getting redirected to an out-of-date mirror. Since the redirector is supposed to detect these, I'm filing this as a bug.

% sudo apt-get update
Get:1 http://http.debian.net sid InRelease [268 kB]
E: Release file for http://http.debian.net/debian/dists/sid/InRelease is expired (invalid since 1d 20h 13min 24s). Updates for this repository will not be applied.

% wget http://http.debian.net/debian/dists/sid/InRelease
--2013-05-13 00:29:01--  http://http.debian.net/debian/dists/sid/InRelease
Resolving http.debian.net (http.debian.net)... 46.4.205.43, 2a01:4f8:131:152b::42
Connecting to http.debian.net (http.debian.net)|46.4.205.43|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://debian.utalca.cl/debian/dists/sid/InRelease [following]
--2013-05-13 00:29:01--  http://debian.utalca.cl/debian/dists/sid/InRelease
Resolving debian.utalca.cl (debian.utalca.cl)... 190.110.100.3
Connecting to debian.utalca.cl (debian.utalca.cl)|190.110.100.3|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 268429 (262K) [text/plain]
Saving to: ‘InRelease’

100%[======================================>] 268,429      748KB/s   in     0.4s   

2013-05-13 00:29:01 (748 KB/s) - ‘InRelease’ saved [268429/268429]

iquartile() is unnecessarily complicated

Hi,

iquartile() does a lot of pushing, popping, reimplementing ceil() and whatnot. It can be written much more simply as:

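use POSIX ();

# Return the elements from the lower to the upper quartile index
# (both indexes rounded up) of the list passed in.
sub iquartile(@) {
    my @elems = @_;
    my ($lower, $upper) = (0.25 * $#elems, 0.75 * $#elems);

    $lower = POSIX::ceil($lower);
    $upper = POSIX::ceil($upper);

    return @elems[$lower..$upper];
}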

Correctly check instances that are run on top of real mirrors

If a mirror runs a redirector on top of its real copy, the checker might be redirected to a different mirror when the trace file is requested. This will result in the mirror being disabled.
If the checker is aware of mirrors that run the redirector, it could download the file from serve/ instead and correctly monitor the mirror.

This most likely requires a new field in the master list.

Distance calculation is wrong

Hi,

The distance calculation assumes the Earth is a flat rectangle that doesn't wrap, which is obviously not the case. For small distances, the “flat rectangle” assumption is not so bad, but not properly wrapping, be it at the zero-meridian or between -180/+180 (I don't know offhand which one is used), can give completely bogus results.

You probably want http://en.wikipedia.org/wiki/Haversine_formula instead.
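
For reference, a minimal sketch of the haversine formula (in Python for brevity, not the redirector's Perl). The function name and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for ranking mirrors


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator on either side of the +/-180 meridian:
# they are 2 degrees apart, not 358.
print(round(haversine_km(0, 179, 0, -179), 1))
```

Because the longitude difference only enters through a squared sine, the wrap at -180/+180 is handled without any special casing.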

translate-log may fail or produce incorrect output when updating to a new database

Given that the checker may use an "incoming" database, translate-log may produce incorrect output or even fail due to the mismatch of mirror ids.
Since the only input to the translate-log script is the checker's output, the latter should probably tell the former the name of the mirrors database it is actually reading from.

Restricted-to mirrors are not checked for freshness

Commits 1e33f17 and 306b6d3 (part of issue #3) introduced support for the Restricted-to field in the master list. However, due to check.pl's per-continent age check and the way the restriction was implemented (by simply not adding the mirrors to some indexes), they are not checked for freshness.

This could lead to inconsistencies that would only be noticed in the scope of the mirror's restriction (AS, country, etc.)

should limit the number of alternative mirrors

Even after creating a geo location-based subset of the population of mirrors that may serve a file, the number of alternatives may go from one to over 15 based on some real data-backed tests.

Once APT starts handling external redirections better, this will cause issues. A great diversity of mirrors would only be beneficial if there's at least one file that needs to be downloaded from each, ideally more than just one.

The redirector should limit the candidates to about 5 and see how it works.

Bypass a mirror's redirector, if they run one

Similarly to issue #7, if one redirector redirects the request to a mirror that has an instance of the redirector, the mirror's instance may redirect the user away one more time. This could lead to redirection loops.
If the redirector is aware of a mirror running an instance, it could bypass the mirror's instance by redirecting the request to serve/.

This change could re-use the new field in the master list mentioned in issue #7.

Allow narrowing down the per-AS match by subnet(s)

Some geographically dispersed ASes have more than one mirror, and it would be convenient to redirect requests only to the local mirror. In general the use of geo location should address this kind of issue, but it is not uncommon for the free database to lack accuracy.

Trace files should be stored and I-M-S headers used

To reduce network traffic (and possibly skip some checks), the trace files downloaded from the mirrors should be stored. Whenever there's a local copy of a trace file, the GET request should include an If-Modified-Since.
LWP::UserAgent has a mirror method that might do the trick.
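
The idea, sketched in Python for brevity (in the checker itself the LWP::UserAgent route would be the natural one). The function name and the way the stored copy's mtime is used as the validator are assumptions of this example.

```python
import os
import urllib.request
from email.utils import formatdate


def conditional_request(url, local_path):
    """Build a GET request that carries If-Modified-Since when a copy is stored."""
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates are always expressed in GMT.
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    return request
```

When such a request is sent with `urllib.request.urlopen`, a `304 Not Modified` answer surfaces as an `HTTPError` with `code == 304`, which the caller would treat as "stored copy is still current".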

Include depth parameter in RFC6249-like Link headers

For requests for /dists/$dist/$comp/binary-$arch/Packages.gz, given that mirrors only differ in the set of architectures they include, it should be possible to include the depth parameter, as specified in RFC6249.

E.g. a request for /dists/sid/main/binary-armel/Packages.gz would include a
Link: <http://mirror.tld/path/to/dists/sid/main/binary-armel/Packages.gz>; rel=duplicate; depth=1

Another example, a request for /dists/sid/main/binary-armel/Packages.diff/Index:
Link: <http://mirror.tld/path/to/dists/sid/main/binary-armel/Packages.diff/Index>; rel=duplicate; depth=2

Similarly, this could be done for Contents-$arch.diff/, Translation-$lang.diff/ (or directly to i18n/, but would reduce parallelism for clients that do not actually use the alternative download locations), /tools/, and /project/.
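
A small sketch that reproduces the two example headers above (Python for brevity; both function names are made up). It assumes the depth is simply the number of path components below the directory that is known to be mirrored as a whole.

```python
def depth_below(path, mirrored_dir):
    """Count the components of `path` that lie below `mirrored_dir`."""
    prefix = mirrored_dir.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not below {mirrored_dir!r}")
    return len(path[len(prefix):].split("/"))


def duplicate_link(mirror_base, path, depth=None):
    """Format a rel=duplicate Link header, with depth when it is known."""
    url = mirror_base.rstrip("/") + "/" + path.lstrip("/")
    header = f"Link: <{url}>; rel=duplicate"
    if depth is not None:
        header += f"; depth={depth}"
    return header


arch_dir = "/dists/sid/main/binary-armel"
for path in (arch_dir + "/Packages.gz", arch_dir + "/Packages.diff/Index"):
    print(duplicate_link("http://mirror.tld/path/to", path, depth_below(path, arch_dir)))
```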

"site trace > master trace -> synchronised" assumption is not safe

If a mirror attempts to sync more than once between mirror pushes, the site's trace would be even more recent than it previously was. This doesn't affect the code's assumption that if site > master it is fully synchronised, unless the following sequence occurs:

  • Downstream mirror is up to date
  • Upstream mirror is up to date
  • Master sends a new push
  • Downstream attempts to sync and there's nothing new to sync. It still updates its trace file (date newer than master).
  • Upstream syncs
  • Downstream starts syncing. At the end of phase one, the old site's trace would still be there and would still be newer than the master's, causing havoc.

Correct geolocation of tunnelbroker users

The whole netblock used by tunnelbroker is currently detected as located in the US.

As for what can be done, the whois records should be more or less correct at least regarding country information.

non-ftpsync mirrors shouldn't be second-class citizens if they meet the criteria

There are currently some requests that are forced to be served by mirrors that use ftpsync. This is to guarantee that there will not be inconsistencies caused by some index files being synchronised too early.
Since there are quite a few mirrors that don't use ftpsync, and some of them do keep up with the recommended rsync settings and other changes introduced in ftpsync, it would be ideal not to treat them as second-class citizens.

Continuously performing all sorts of checks to determine if they don't follow the recommendations is doomed to fail. Perhaps a new field could be introduced in the trace file that states which "features"/changes they have been updated to.

For instance, mirrors that correctly sync the InRelease file in the second stage could include:
Revision: InRelease

The absence of the field would indicate that such mirror should not be used to serve InRelease files.

Similarly, for the translation files issues:
Revision: i18n

Whether or not this field should be included in ftpsync-generated trace files should be considered. For consistency, it probably should.
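
A sketch of how the checker might read such a field (Python for brevity). The trace contents below are invented, and whether values would be repeated on separate `Revision:` lines or listed on a single line is not decided above, so this example accepts both.

```python
def trace_revisions(trace_text):
    """Collect every value announced in 'Revision:' lines of a trace file."""
    revisions = set()
    for line in trace_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "revision":
            revisions.update(value.split())
    return revisions


def may_serve_inrelease(trace_text):
    """A mirror qualifies only if its trace explicitly announces InRelease."""
    return "InRelease" in trace_revisions(trace_text)


# Hypothetical trace file contents.
trace = "Sat May 11 12:00:00 UTC 2013\nRevision: InRelease\nRevision: i18n\n"
print(sorted(trace_revisions(trace)))
```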

A master stamp may be promoted for a subset too soon

This may occur, for example, when multiple mirrors that only include a few architectures and are updated from a mirror in another subset finish updating before other mirrors that include more architectures are also done updating. In this scenario, the subset may end up having no mirror that can satisfy requests for some architectures.

The rules for promoting a master stamp should take into account that at least files for the most popular architectures can be served from within the subset.
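
One way such a rule could look, as a sketch in Python for brevity. Which architectures count as "most popular" is a placeholder here, and the mirror fields (`stamp`, `archs`) are hypothetical.

```python
REQUIRED_ARCHS = frozenset({"amd64", "i386"})  # placeholder choice


def can_promote(stamp, subset_mirrors, required_archs=REQUIRED_ARCHS):
    """Allow promotion only if up-to-date mirrors cover every required arch."""
    covered = set()
    for mirror in subset_mirrors:
        if mirror["stamp"] >= stamp:
            covered |= set(mirror["archs"])
    return required_archs <= covered


subset = [
    {"host": "small.example", "stamp": 20, "archs": ["armel"]},
    {"host": "full.example", "stamp": 10, "archs": ["amd64", "i386", "armel"]},
]
print(can_promote(20, subset))  # the only mirror at stamp 20 lacks amd64/i386
print(can_promote(10, subset))
```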

Better handling of clients for which the geo lookup failed

There are very few times where a geo lookup fails because there's no entry for a given IP subnet. In such cases, the code ends with a 501 Not Implemented. It should probably send the client to some well-connected mirror in the US or the EU, making it configurable of course.

Allow an instance to push its db results to another, remote, instance

The current implementation for http.debian.net sends a db dump over ssh to a script on the remote server that validates it to avoid perl code and then imports it.

The validation could probably be omitted if the database was created using nstore, potentially making it compatible with the remote instance. Testing is needed to guarantee compatibility between versions of Storable. If a "network order" database were to be used, performance testing would also be needed.

Rate-limit the checking of bad mirrors

Many mirrors don't have a site trace, or at least it doesn't match their hostname. After a certain number of failures, the checking of those mirrors should be rate-limited: it is a waste of time to check them every X minutes when they are usually not going to be fixed.

Ditto for mirrors that go down, are very out of date, and the like.
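
A possible shape for the rate limit, sketched in Python for brevity: an exponential back-off that kicks in after a few consecutive failures. The base interval, the threshold and the one-day cap are made-up defaults.

```python
def next_check_delay(failures, base_minutes=10, threshold=3, max_minutes=24 * 60):
    """Minutes to wait before re-checking a mirror with `failures` in a row."""
    if failures < threshold:
        return base_minutes  # healthy or only briefly broken: normal schedule
    delay = base_minutes * 2 ** (failures - threshold + 1)
    return min(delay, max_minutes)


for failures in (0, 3, 4, 20):
    print(failures, next_check_delay(failures))
```

Resetting the failure counter on the first successful check would put a repaired mirror back on the normal schedule.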

the checker should probably be a daemon

By making it a daemon that constantly re-calculates the database even on partial mirror-rechecks, it should be possible to reduce the time it takes for some changes to take effect. It would also avoid the requirement of running a cronjob.

If this is done, the mirrors should probably be prioritised by: "reference mirrors" and later by continent (or sub-regions).

Should be possible to periodically re-run non-default checks

It would, for instance, be useful to detect dropped architectures after the database was built. Ideally mirrors should list the architectures they include, but...

At the moment trying to re-run non-default checks is tricky, as one needs to play the "write to the 'incoming' database" dance and that is not fun to do in a cronjob.

Ensure that IPv6 assumptions hold true

There's currently no code to confirm that the assumptions regarding IPv6 support actually hold true.
Some checks should be added, at least to know if and when a separate IPv6 database is necessary.

serial/date-based subsets

Many mirrors remain disabled because by the time they are up to date wrt their upstream, their subset is out of date already.
Instead of creating per-continent subsets (which sort of works for NA and EU), the subsets could be based on the date they were last updated.
I.e. a client from AS A should be redirected to the mirror in its own AS as long as it is not very out of date, even if their country's mirror is more up to date.
"Out of date" would still mean anything older than twelve hours (two archive pulses).

The trace-architectures check should be enabled by default

At present it is not enabled by default because it could potentially enable an architecture that was disabled by the architectures check (the one based on additional HTTP requests).
The traces check should be modified, probably in a similar way to the archs check, so that it does not re-enable an arch it didn't disable itself.

Cache-friendly redirections

I've read in some places that most HTTP caches don't cache redirections unless they include an Expires header. All this should be investigated and tested, keeping in mind that traditional web caching is problematic with APT's repository design.

Demo mode should list all candidate mirrors

The demo page only lists the subset of the population of candidate mirrors. However, it appears that some people would like to know about all the other candidates, and as such they should be displayed. Perhaps the whole demo mode should stop using HEAD and use some alternative way to request and return the desired information.

Error code 501 is not always appropriate

501 Not Implemented was chosen as the geo lookup failures are mostly caused by trying to look up an IP address in a private range. However, when the request doesn't come from a private address, it should send a more appropriate error code.
Related to issue #18
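
The distinction could be made along these lines (Python sketch for brevity; the function name is made up). Which status to send for public addresses is deliberately left as a parameter, since no replacement code has been picked above.

```python
import ipaddress


def geo_failure_status(client_ip, public_status):
    """Pick the status for a failed geo lookup based on the address range."""
    if ipaddress.ip_address(client_ip).is_private:
        return 501  # current behaviour, kept for private ranges
    return public_status  # placeholder: to be decided


print(geo_failure_status("10.0.0.1", public_status=503))
print(geo_failure_status("8.8.8.8", public_status=503))
```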

should be possible to blackhole certain requests

There are some files in dists/ that are known not to exist in certain releases. For example, there are no InRelease files in squeeze.
It should therefore be possible to "blackhole" (i.e. throw a 404) certain requests to avoid a useless request to a mirror.

To err on the side of safety, it should be T(mirror_type,codename,file_pattern)
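
A sketch of such a rule table (Python for brevity). The mirror type name `archive` and the use of shell-style patterns are assumptions; the squeeze/InRelease rule is the example from above.

```python
import fnmatch

# (mirror_type, codename, file_pattern) triples known not to exist.
BLACKHOLE_RULES = [
    ("archive", "squeeze", "InRelease"),
]


def is_blackholed(mirror_type, codename, filename, rules=BLACKHOLE_RULES):
    """Return True if the request should get a 404 without asking a mirror."""
    return any(
        rule_type == mirror_type
        and rule_codename == codename
        and fnmatch.fnmatchcase(filename, pattern)
        for rule_type, rule_codename, pattern in rules
    )


print(is_blackholed("archive", "squeeze", "InRelease"))
print(is_blackholed("archive", "sid", "InRelease"))
```

Requiring all three parts to match keeps a rule from accidentally hiding a file that does exist in another release or mirror type.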
