internetarchive / wayback
This project forked from iipc/openwayback
IA's public Wayback Machine (moved from SourceForge)
When page=X is added to the request, some rows disappear. For example:
http://web.archive.org/cdx/search/cdx?url=www.expertsender.ru&matchType=domain
document.body.innerText.split("\n").length = 6486
http://web.archive.org/cdx/search/cdx?url=www.expertsender.ru&matchType=domain&page=0
document.body.innerText.split("\n").length = 6246
http://web.archive.org/cdx/search/cdx?url=www.expertsender.ru&matchType=domain&page=1
returns an empty result.
Example of a row that disappears:
ru,expertsender,blog)/ispolzovanie-gif-v-emejl-rassylkax-kejs-ot-butik-ru 20160401182916 http://blog.expertsender.ru:80/ispolzovanie-gif-v-emejl-rassylkax-kejs-ot-butik-ru/ text/html 200 K6ZNHY3FGL6X67KDYYW5U7L3WEJRSIM5 12454
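For anyone reproducing this, each line of the default CDX output is space-separated into seven fields. A minimal parsing sketch (the helper name and field list follow the CDX server's default field order; this is illustrative, not Wayback's own code):

```python
# Parse one line of default CDX server output into its seven default fields.
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]

def parse_cdx_line(line):
    """Split a space-separated CDX line into a field dict."""
    return dict(zip(CDX_FIELDS, line.split(" ")))

row = parse_cdx_line(
    "ru,expertsender,blog)/ispolzovanie-gif-v-emejl-rassylkax-kejs-ot-butik-ru "
    "20160401182916 "
    "http://blog.expertsender.ru:80/ispolzovanie-gif-v-emejl-rassylkax-kejs-ot-butik-ru/ "
    "text/html 200 K6ZNHY3FGL6X67KDYYW5U7L3WEJRSIM5 12454")
print(row["timestamp"], row["statuscode"])
```

Counting lines parsed this way from the paged and unpaged responses makes the discrepancy easy to measure.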
Problems found while investigating WWM-163 (replay is blocked even though the robots.txt response is 403):
Wayback should differentiate 404 and 403 from other failures and treat them as a success, rather than a failure.
Let's say I want to get a list of all images under a given domain. It so happens that this query spans multiple pages.
If I use the parameter filter=image/jpeg and a given page happens to contain no images, that page appears blank instead of being filled with results from later pages.
The Content-Range header field in a capture needs to be passed through as-is to the replay response for audio file playback to work. Found as part of ARI-3774.
AlphaParitionIndexTest assumes HashMap.values() returns items in a particular order, and it breaks with OpenJDK 7u51 (more precisely 7u51-2.4.4-0ubuntu0.12.04.2 on AMD64).
Navigate to: https://web.archive.org/web/http://bugs.chromium.org/p/project-zero/issues/detail?id=1139
See that Wayback says it's blocked by robots.txt:
See that the robots.txt for that domain, while complicated, specifically allows that type of URL:
User-agent: *
# Start by disallowing everything.
Disallow: /
# Some specific things are okay, though.
Allow: /$
Allow: /hosting
Allow: /p/*/adminIntro
# Query strings are hard. We only allow ?id=N, no other parameters.
Allow: /p/*/issues/detail?id=*
Disallow: /p/*/issues/detail?id=*&*
Disallow: /p/*/issues/detail?*&id=*
# 10 second crawl delay for bots that honor it.
Crawl-delay: 10
Expected: complex robots.txt files are parsed and matched correctly by the Wayback Machine.
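The expected matching behavior can be illustrated with a simplified longest-match evaluator (this sketch follows Google-style precedence, where the longest matching pattern wins and Allow wins ties; it is an illustration, not OpenWayback's actual parser):

```python
import re

# The rules from the robots.txt above, in order.
RULES = [
    ("disallow", "/"),
    ("allow", "/$"),
    ("allow", "/hosting"),
    ("allow", "/p/*/adminIntro"),
    ("allow", "/p/*/issues/detail?id=*"),
    ("disallow", "/p/*/issues/detail?id=*&*"),
    ("disallow", "/p/*/issues/detail?*&id=*"),
]

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    regex = "".join(".*" if c == "*" else re.escape(c) for c in pattern)
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile("^" + regex)

def allowed(path):
    best = None  # ((pattern_length, is_allow), verdict); longest wins, Allow wins ties
    for verdict, pattern in RULES:
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), verdict == "allow")
            if best is None or key > best[0]:
                best = (key, verdict)
    return best is None or best[1] == "allow"

print(allowed("/p/project-zero/issues/detail?id=1139"))  # True
```

Under these semantics the URL in question matches `Allow: /p/*/issues/detail?id=*` (the longest applicable rule), so it should replay.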
Sometimes a website displays a favicon even though one isn't explicitly defined in the page; the site for Iridion II, for example.
It would be nice if, when no favicon is explicitly defined, the Wayback Machine looked for one at %DOMAIN%/favicon.ico.
It would probably be preferable, if more resource-consuming, to start in the same folder as the current URL and step backwards until it finds something or reaches the domain root. However, I've seen only one case where a favicon lived deeper than the root, and I'm not even sure it was ever used.
Playback of certain URLs fails with net::ERR_CONTENT_LENGTH_MISMATCH (Chrome error message). All captures of the URL are warc/revisit; there are no original captures.
Wayback is supposed to return a 404 response instead of 200 in this case, but it plays back content from a revisit record (which has no response payload). The closest capture has a WARC-Refers-To-Date pointing to another revisit capture. AccessPoint.retrievePayloadForIdenticalContentRevisit blindly believes WARC-Refers-To-Date always points to a non-revisit record (i.e. the original capture), and the subsequent CDX query does not exclude revisit captures.
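The fix implied here is to exclude warc/revisit records from the lookup and answer 404 when no original exists. A sketch of that logic (data shapes and names are ours, for illustration):

```python
# Resolve a revisit against the CDX rows for the URL, excluding other
# revisit records, and signal "not found" when no original capture exists.
def resolve_original(captures, refers_to_date):
    """captures: list of (timestamp, mimetype) CDX rows for the URL."""
    candidates = [c for c in captures
                  if c[0] == refers_to_date and c[1] != "warc/revisit"]
    if not candidates:
        return None  # caller should answer 404 instead of replaying an empty payload
    return candidates[0]

captures = [("20160101000000", "warc/revisit"),
            ("20160202000000", "warc/revisit")]
print(resolve_original(captures, "20160101000000"))  # None: only revisits exist
```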
Wayback repairs URLs like http:/example.com/ to http://example.com/, but does not repair https:/example.com/ to https://example.com/. It should.
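A repair rule covering both schemes can be sketched with a single regex (the pattern is ours, for illustration):

```python
import re

# Insert the missing slash after "http:" or "https:" when only one is present.
def repair_scheme(url):
    return re.sub(r"^(https?):/(?!/)", r"\1://", url)

print(repair_scheme("https:/example.com/"))  # https://example.com/
```

The negative lookahead leaves already-correct URLs untouched.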
Source: ARI-4337
Hi,
I want to get all archived pages for domain and all its subdomains. So I'm using the following url:
There are no records for the subdomain news.tut.by. But if I try the following URL, I get a lot of records for news.tut.by:
Thanks
http://web.archive.org/cdx/search/cdx?url=google.com&matchType=domain&output=json
It would be useful if output like this also included the total count of elements (the number of archived pages).
An XHTML capture results in an XML parse error in the browser, because the head insert is inserted before the XML declaration <?xml version="1.0" ... ?>.
CharsetDetector fails to detect the correct character encoding when the META tag says charset=UTF-16 but the content is in fact UTF-8. This is because CharsetDetector gives higher priority to the META tag than to the charset detected from the content. Reimplement CharsetDetector in reference to the WHATWG encoding-sniffing algorithm: http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#encoding-sniffing-algorithm
Known internally as ARI-3933.
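One relevant detail of the WHATWG algorithm: a charset=UTF-16 label found by scanning ASCII-compatible bytes cannot be correct (a real UTF-16 document would not be readable that way), so the spec says to use UTF-8 instead. A sketch of that normalization step (function name is ours):

```python
# Normalize a meta-declared charset label per the WHATWG prescan rules:
# UTF-16 labels become UTF-8, x-user-defined becomes windows-1252.
def effective_meta_charset(label):
    label = label.strip().lower()
    if label in ("utf-16", "utf-16le", "utf-16be"):
        return "utf-8"
    if label == "x-user-defined":
        return "windows-1252"
    return label

print(effective_meta_charset("UTF-16"))  # utf-8
```

Applying this rule alone would fix the exact failure described in this issue.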
FastArchivalUrlReplayParserEventHandler gets confused by what looks like an end-tag inside a script. A minimized test case is
<html><head><script>/</g;900>a;a<k;</script></head><body></body></html>
JspInsert is inserted between a;a and <k;, because </g;900> is parsed as an end-tag.
Internally known as WWM-118.
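Per the HTML parsing rules, script data ends only at a case-insensitive "</script" sequence, not at any "</x"-shaped run. A sketch of scanning for the true end of the script content in the minimized test case (helper name is ours):

```python
import re

# Find script content by scanning only for "</script", as the HTML spec requires.
def script_content(html, start):
    """start: index just past '<script>'. Returns content up to '</script'."""
    m = re.compile(r"</script", re.IGNORECASE).search(html, start)
    return html[start:m.start()] if m else html[start:]

html = "<html><head><script>/</g;900>a;a<k;</script></head><body></body></html>"
start = html.index("<script>") + len("<script>")
print(script_content(html, start))  # /</g;900>a;a<k;
```

A tokenizer using this rule would treat </g;900> as plain script text and not emit the insert there.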
We've gone through several iterations trying to come up with a good URL rewrite scheme for archival-URL mode. Our conclusion at this point is that we need to maintain the form of the original URL before and after rewrite. By form we mean the various degrees of absolute/relative-ness of a URL. In other words we want to rewrite a full URL to a full URL (http://www.example.com to http://web.archive.org/20140101121314/http://www.example.com), protocol-relative to protocol-relative (//www.example.com to //web.archive.org/20140101121314/http://www.example.com), a relative path to a relative path (styles/mobile.css to styles/mobile.css), etc.
We found this rather awkward to achieve with the existing framework for URI rewriting. The ResultURIConverter.makeReplayURI() method takes only two String parameters, datespec and url, so it has no access to the context in which url was found. To work around this, there is a lot of clumsy code around it, which results in an overly complex framework. Here are some observations:
- ReplayParseContext has ResultURIConverter instances for each of the context flags (e.g. cs_), built through ContextResultURIConverterFactory, just for including context flags in the replay URL. Those instances would be unnecessary if makeReplayURI() took context flags as an argument.
- ContextResultURIConverterFactory has two different uses. While its getContextConverter method has a single argument called flags, implying context flags, it can also receive the replay URL prefix (see AccessPointAdapter.getUriConverter()). I suppose the ContextResultURIConverterFactory implementations taking context flags would have been unnecessary if ResultURIConverter.makeReplayURI() took context flags as an argument.
- ReplayParseContext.contextualizeUrl(String, String) checks whether URL-rewrite is necessary, and then converts the URL to full absolute form before passing it to ResultURIConverter. This makes it impossible for a ResultURIConverter implementation to preserve the mode of URL described above. Considering ResultURIConverter's primary role, these steps should be left to the ResultURIConverter implementation.
- When the page being replayed is, e.g., http://www.example.com, relative URLs need to be converted to a full path. ResultURIConverter needs to know the URL being replayed to achieve this. This is another supporting case for additional parameters in ResultURIConverter.makeReplayURI().
- EmbeddedCDXServerIndex.addTimegateHeaders() prepends mementoPrefix to the URI returned by ResultURIConverter to ensure Memento URLs are always in absolute form. This is necessary because ResultURIConverter is used for two different purposes, and it breaks if ResultURIConverter returns different forms of URL depending on the context.
- We need to feed the X-Forwarded-Proto request header field into URL rewriting so that it can build absolute URLs with the appropriate protocol (http or https). We worked around this by storing the header value in a ThreadLocal.

Our JIRA ARI-4033 depends on the resolution of this issue.
Resolution Plan:
- Have AccessPointAdapter implement ResultURIConverter. This (along with the change below) should make ContextResultURIConverterFactory unnecessary.
- ReplayParseContext can implement (for better modularity and ease of testing)

When searching for an archived version of a URL with status code 2xx, it can currently take some time before an archived version is found that was captured while the page was still available. Finding the right version of an archived URL would become a lot easier if it were easy to see which archived versions returned status code 2xx or 3xx.
Currently an archived page is shown in the Wayback Machine as a blue circle on the date it was archived; see for example http://wayback.archive.org/web/20010501000000*/http://archive.org. Multiple colors could be used here to indicate the status code of a page, for example:
When a URL is archived multiple times on the same day, a larger circle is shown. Multiple colors could be added to this larger circle to show the status codes with which the page was archived, for example:
The same idea can be used for the black bars showing the number of archived versions per month.
I think implementing colors, or some other way of showing what status code a URL returned when it was archived, would be very helpful for finding the right version of a URL.
Let me know if this isn't the right repo, but ran into an issue when testing archival features on http://www.goodbyetohalos.com/
Like many webcomics using WordPress nowadays, Goodbye to Halos uses the HTML5 srcset attribute to display different image sizes to different devices:
<img
width="800" height="1200"
src="http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108.jpg"
class="attachment-full size-full" alt=""
srcset="http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108.jpg 800w,
http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108-480x720.jpg 480w,
http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108-96x144.jpg 96w"
sizes="(max-width: 800px) 100vw, 800px"
data-webcomic-parent="837"
>
So far, so good. However, after crawling/scraping these with Wayback, only the src URL is scraped and rewritten, leading to the image on the wayback'ed page still being served from the original server:
<img
width="800" height="1200"
src="/web/20170127042412im_/http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108.jpg"
class="attachment-full size-full" alt=""
srcset="http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108.jpg 800w,
http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108-480x720.jpg 480w,
http://www.goodbyetohalos.com/wp-content/uploads/2017/01/WEB_ch1_108-96x144.jpg 96w"
sizes="(max-width: 800px) 100vw, 800px"
data-webcomic-parent="837"
>
This is very obvious because the original site doesn't use https, so it leads to a broken image in the Wayback Machine view:
Obviously, the correct behavior here is that all of the images should be scraped (in this case they're just resizings, but in theory they could be completely different images; nothing prevents that) and rewritten.
Thanks! let me know if you need more information, or want me to whip up a more minimal test case
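The rewriting requested above can be sketched as splitting the attribute into candidates and prefixing each URL (naive comma splitting; URLs containing commas would need a smarter tokenizer; the replay prefix is taken from the example above):

```python
# Rewrite every candidate URL in a srcset attribute, not just src.
def rewrite_srcset(srcset, prefix="/web/20170127042412im_/"):
    out = []
    for candidate in srcset.split(","):
        parts = candidate.strip().split(None, 1)  # URL [descriptor]
        parts[0] = prefix + parts[0]
        out.append(" ".join(parts))
    return ", ".join(out)

srcset = ("http://example.com/a.jpg 800w, "
          "http://example.com/a-480.jpg 480w")
print(rewrite_srcset(srcset))
```

The descriptor tokens (800w, 480w, pixel densities like 2x) pass through unchanged; only the URL component is prefixed.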
ReplayParseContext has ad-hoc support for the case where URLs are escaped in the target resource. For example it recognizes URLs written in JavaScript as "http://example.com/..." as absolute URLs. This approach has a few problems:
- ReplayParseContext gets messy.
- JSStringTransformer is also used for other types of resources, which can have different ways of escaping characters.

It would be more robust to implement unescaping in JSStringTransformer, so it passes clean URLs to ReplayParseContext. It could also escape special characters back before inserting rewritten URLs.
Hi,
I was looking at syncing up our forks, and couldn't proceed because webarchive-commons was forked a while ago: 6555609
I've just pulled your changes to webarchive-commons into the IIPC version and rolled a 1.1.3 release including that change (and a number of bugfixes). Would you consider switching back to the IIPC version? It would make keeping our forks in sync much easier.
Thanks,
Andy
I have downloaded the wayback cdx-server API and imported it in Eclipse, but an HTTP 404 error occurs. I have checked the deployment assembly and also checked the pom.xml.
If a text (HTML, CSS, JavaScript) response is gzip-encoded (has Content-Encoding: gzip), the replay response has a weirdly-named header field: X-Archive-Orig-X-Archive-Orig-Encoding: gzip. It is supposed to be X-Archive-Orig-Encoding: gzip.
This is because TextReplayRenderer.decodeResource replaces the Content-Encoding header field with an X-Archive-Orig-Encoding header field while applying gzip-decode, and then RedirectRewritingHttpHeaderProcessor prepends X-Archive-Orig- to it (note this is configurable).
An easy solution would be to avoid prepending the prefix when the header field name already starts with X-Archive-Orig-, but this sounds too ad-hoc. The X-Archive-Orig- prefix is currently hard-coded, but we may want to make it configurable.
EmbeddedCDXServerIndex has a timestampDedupLength property for culling captures, to prevent the capture search result page from getting too crowded (we call this feature timestamp-dedup hereafter). This property applies to all capture search queries, whether for the capture list page or for looking up the closest capture for replay.
While we want timestamp-dedup for the capture list page, we learned it is problematic for capture lookup for replay, because it often breaks revisit resolution. We want to disable timestamp-dedup when the capture search query is for replay.
Internally known as ARI-3883.
RobotRule.blocksPathForUA(String, String) returns false for any path with this robots.txt:
User-agent: *
Disallow:
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Per the robots.txt specification, an empty Disallow: shall simply be ignored. RobotRules instead returns false when it hits an empty Disallow:, ignoring the rest of the rules.
Found by ARI-4212.
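The correct handling can be sketched as skipping empty Disallow: lines rather than aborting the record (illustrative parser, not the actual RobotRules code):

```python
# Collect Disallow paths, ignoring empty Disallow: lines as the spec requires.
def parse_disallows(lines):
    disallows = []
    for line in lines:
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        if path:  # an empty Disallow disallows nothing; skip it
            disallows.append(path)
    return disallows

def blocks_path(disallows, path):
    return any(path.startswith(d) for d in disallows)

rules = parse_disallows(["User-agent: *", "Disallow:", "Disallow: /wp-admin"])
print(blocks_path(rules, "/wp-admin/index.php"))  # True
```

With this handling, the rules after the empty Disallow: in the reported robots.txt still take effect.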
Originally reported in ARI-3880.
Failure case has HTML like this:
<html>
<head>
<title>...</title>
<script type="text/javascript" src="scripts/header.js"></script>
<p align="center">
...
FastArchivalUrlReplayParseEventHandler fails to insert the body-insert (jspInsertPath) because the relevant code block is skipped while the inHead flag is true (set by the appearance of a HEAD tag). This results in a failure to render the top-of-the-page banner (typically the disclaimer and navigation bar).
Currently Wayback does nothing special with range requests and simply renders whichever capture matches the URL + timestamp combination. This works as long as the capture is either a 200 response (the browser assumes the server does not support Range requests) or a 206 response with a matching Content-Range.
We recently found that some HTML5 browsers, when playing a video, first probe the server by making a range request for the entire file, then make another request for a small range near the end of the file. If the server does not return a 206 response matching the request, the browser stops video playback. To support HTML5 video playback, Wayback needs to implement range request handling of its own.
This issue is internally known as ARI-4254.
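What the replay side would need to do can be sketched as parsing a single "bytes=start-end" range over the stored payload and building the matching 206 response (names are ours; suffix ranges like "bytes=-500" and multi-range requests are omitted for brevity):

```python
# Serve one "bytes=start-end" range over a capture payload with a 206 response.
def serve_range(payload, range_header):
    total = len(payload)
    spec = range_header.split("=", 1)[1]
    start_s, end_s = spec.split("-", 1)
    start = int(start_s) if start_s else 0
    end = int(end_s) if end_s else total - 1
    end = min(end, total - 1)
    body = payload[start:end + 1]
    headers = {"Content-Range": "bytes %d-%d/%d" % (start, end, total),
               "Content-Length": str(len(body))}
    return 206, headers, body

status, headers, body = serve_range(b"0123456789", "bytes=8-")
print(status, headers["Content-Range"])  # 206 bytes 8-9/10
```

An open-ended range near the end of the file, as in the probing behavior described above, gets a 206 with the exact Content-Range the browser expects.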
(This is an issue item for already completed work)
Determine the mime type by looking into the payload when the mimetype in the search result is either suspected to have an incorrect value (e.g. text/html) or missing (e.g. unk).
Known internally as ARI-3822, ARI-3888, WWM-58. Bug fixes in ARI-4071 and ARI-4078.
Base work is done in commits 65dfc40 through 7d9d332; bug fixes are being tracked on the mimetype-detector branch.
A Resource record is always rendered as text/html, regardless of the Content-Type WARC header field.
This is due to a lack of metadata record support in JWATResource. It does not return the Content-Type header field from its getHttpHeaders() method, so Tomcat supplies the default value text/html.
Known internally as WWM-126.
Blocked captures are often referred to by later revisit captures, and there's a need for making such blocked captures available only for replaying revisit captures.
Internally known as ARI-3879 and ARI-4034.
Archive-It found an issue with the Referer header generated by the Flash plugin for Firefox (ARI-4169) and wants to extend ServerRelativeArchivalRedirect with a supplemental method for obtaining the ArchivalUrl context. As the method depends on a private JavaScript library, we'd like to keep the enhancement local to Archive-It for now. Unfortunately, ServerRelativeArchivalRedirect has no extension point to enable this.
The plan is to move the code in ServerRelativeArchivalRedirect that parses the Referer into a new method, so that a sub-class can override it.
Here, for example. Note how the URL has ampersands in it. If you were to click to another point in the timeline, the URL you would go to would have all of the ampersands replaced with &amp;, resulting in a different set of crawls being shown.
Sure, with this page you would still see something, but in other cases the user won't be as lucky.
It appears the core issue is that, for whatever reason, the wbCurrentUrl variable is HTML-encoded. Bizarrely, this does not happen on the "see all crawls" page.
This could probably be fixed by changing line 74 of that file to var wbCurrentUrl = "<%= StringEscapeUtils.unescapeHtml(searchUrlJS) %>";
Currently percent-encoded URLs are not rewritten. For example, the text from https://web.archive.org/web/20150804131701/http://blip.tv/file/get/NostalgiaCritic-NCPlanetOfTheApes401.m4v?showplayer=2014093037100220150422135039&referrer=http://blip.tv&mask=11&skin=flashvars&view=url should be rewritten like:
Original:
message=http%3A%2F%2Fj41.video2.blip.tv%2F5520014255207%2FNostalgiaCritic-NCPlanetOfTheApes401.m4v%3Fir%3D96428%26sr%3D2334
Rewritten:
message=http%3A%2F%2Fweb.archive.org%2Fweb%2F20150804131701%2Fhttp%3A%2F%2Fj41.video2.blip.tv%2F5520014255207%2FNostalgiaCritic-NCPlanetOfTheApes401.m4v%3Fir%3D96428%26sr%3D2334
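The transformation can be sketched as decode, prepend the archival prefix, re-encode (the prefix matches the example above; the helper name is ours):

```python
from urllib.parse import quote, unquote

# Rewrite a percent-encoded URL embedded in a query parameter value.
def rewrite_encoded(value, prefix="http://web.archive.org/web/20150804131701/"):
    return quote(prefix + unquote(value), safe="")

encoded = ("http%3A%2F%2Fj41.video2.blip.tv%2F5520014255207%2F"
           "NostalgiaCritic-NCPlanetOfTheApes401.m4v%3Fir%3D96428%26sr%3D2334")
print(rewrite_encoded(encoded))
```

Passing safe="" keeps the slashes in the rewritten value encoded, exactly as in the expected output above.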
Timestamp-collapsing returns the first best capture in each group. Another option is to return the last best capture in each group. There are cases where the latter works better than the former.
Internally known as ARI-3994.
The revisit record handling code makes a bad assumption that revisit records are always an instance of WarcResource. There is an alternative implementation, JWATResource, and revisit replay throws a ClassCastException with it.
Internally known as WWM-101.
From WWM-110.
Some UAs do more URL-encoding than strictly necessary. Notably, * is sometimes passed to Wayback %-encoded as %2A. Currently this results in a 404 error. There seems to be nothing against URL-decoding the date component of an Archival-URL before parsing, so that 2010%2A is recognized as 2010*.
At https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#advanced-usage:
- The "Closest Timestamp Match" link fails; no such section exists.
- The "Resumption Key" link fails; such a section exists, but with a different anchor.
- The "Resolve Revisits" link fails; no such section exists.
Internally known as ARI-4024.
Wayback has two distinct interfaces for rewriting text resources: StringTransformer and RewriteRule. It would be useful if we could somehow unify them. At the least, there is a need for using MultipleRegexReplaceStringTransformer as a RewriteRule. The first step is to have MultiRegexReplaceStringTransformer implement the RewriteRule interface.
Changes are on the unify-rewrite branch, and ready to merge into openwayback.
When the filter below is used, some rows are not present in the result list:
http://web.archive.org/cdx/search/cdx?url=http://www.expertsender.ru/bundles/core&matchType=prefix
Example row
ru,expertsender)/bundles/core?v=9nx-rocbnddl6mfbsncc8jgbjid4p8wyv00b9yjdxm81 20161015143737 http://www.expertsender.ru/bundles/core?v=9nX-roCbNddL6MFBsnCc8JGbjiD4p8wYv00b9YJdXm81 text/css 200 JROZKHBMIZC6TXGOLNJGIPRE73Q23WTD 28877
To find this row you have to specify the full URL:
http://web.archive.org/cdx/search/cdx?url=http://www.expertsender.ru/bundles/core?v=9nX-roCbNddL6MFBsnCc8JGbjiD4p8wYv00b9YJdXm81&matchType=prefix
EmbeddedCDXServer has a feature that turns off the robots exclusion check for embeds, but it is not working at all, because the PrivTokenAuthChecker.isAllUrlAccessAllowed() method turns the robots exclusion flag back on.
I consider it bad practice for a getter method to have this kind of side-effect.
Hey,
How can I get all the text/html pages that have been archived?
Thanks
UIResults.makeCaptureQueryUrl() generates a very long URL with a pile of unnecessary query parameters. This significantly increases the size of the URL query result page.
Embed-mode replay first searches for captures with the timestampSearchKey flag turned on, for faster lookup. If the URL has a long revisit history and replay therefore cannot resolve the revisit within the constrained time range for timestampSearchKey, it reruns the capture query with the timestampSearchKey flag turned off. It is supposed to re-initialize captureSelector at that point, but it doesn't. So the replay code moves on to the next capture, returns a redirect response, and repeats.
Currently the collection-sensitive exclusion filter provided by CompositeAccessPoint is inflexible:
- CustomPolicyOracleFilter is hard-coded, and it can only be combined with the ExclusionFilterFactorys configured in CompositeAccessPoint's staticExclusions property.
- Exclusion rules are matched on urlkey only, prohibiting time-ranged exclusion rules.
- CDXServer cannot pass oraclePolicy (used for delivering custom rewrite rules) from the ExclusionFilter to the capture search result.

As a result, EmbeddedCDXServerIndex has to inject the ExclusionFilterFactory from AccessPointAdapter's exclusionFactory into CDXToCaptureSearchResultWriter's exclusionFilter - that is, the Oracle exclusion filter runs at the final step of the CDX processing pipeline. This turned out to be problematic, since exclusion happens after timestamp-deduplication. Apparently CDXToCaptureSearchResultWriter's exclusionFilter is necessary solely to support use of the Oracle exclusion filter with CDXServer. Having multiple ways of configuring exclusion filters makes the code hard to follow, and customization painful.
Add a configuration option to PerfWritingHttpServletResponse for writing the perfStats response header field in JSON format. JSON is easier for monitoring tools to parse.
In making use of the Wayback CDX server API (documented here), I noticed that when using resumeKeys I get odd behavior when leaving the urlkey field out of the fieldOrder. Specifically, it looks like the CDX server jumps directly to the 2013 era, even though there are valid records before that:
$ wget -q -U '' -O - 'https://web.archive.org/cdx/search/cdx?collapse=timestamp%3A8&url=https%3A%2F%2Farchive.org&limit=5&fl=timestamp%2Cstatuscode&showResumeKey=true'
19970126045828 200
19971011050034 200
19971211122953 200
19980109140106 200
19980113025731 200
-+19980113025732
$ wget -q -U '' -O - 'https://web.archive.org/cdx/search/cdx?collapse=timestamp%3A8&url=https%3A%2F%2Farchive.org&limit=5&fl=timestamp%2Cstatuscode&showResumeKey=true&resumeKey=-+19980113025732'
20131019030216 502
20130818180757 502
20130402123654 502
20130902085637 502
20130903032956 502
Everything seems to work fine if I include the urlkey field:
$ wget -q -U '' -O - 'https://web.archive.org/cdx/search/cdx?collapse=timestamp%3A8&url=https%3A%2F%2Farchive.org&limit=5&fl=urlkey,timestamp%2Cstatuscode&showResumeKey=true'
org,archive)/ 19970126045828 200
org,archive)/ 19971011050034 200
org,archive)/ 19971211122953 200
org,archive)/ 19980109140106 200
org,archive)/ 19980113025731 200
org%2Carchive%29%2F+19980113025732
$ wget -q -U '' -O - 'https://web.archive.org/cdx/search/cdx?collapse=timestamp%3A8&url=https%3A%2F%2Farchive.org&limit=5&fl=urlkey,timestamp%2Cstatuscode&showResumeKey=true&resumeKey=org%2Carchive%29%2F+19980113025732'
org,archive)/ 19980129163431 200
org,archive)/ 19980501124530 200
org,archive)/ 19990116225149 200
org,archive)/ 19990117003935 200
org,archive)/ 19990202042615 200
org%2Carchive%29%2F+19990202042616
Perhaps there's an undocumented dependency on passing the urlkey field?
Thanks
Issue ARI-4272 reports that Wayback replay fails with ResourceNotInArchive for a URL ending with &* even though there are multiple captures of it. For example:
- http://example.com/*
- http://example.com/index.php?p=90575209& at 20031224194819
FlexResourceStore throws a NullPointerException if any of the configured PathIndex files is missing:
WARNING: Runtime Error
org.archive.wayback.exception.ResourceNotAvailableException: File not Found: aaa.warc.gz
at org.archive.wayback.resourcestore.FlexResourceStore.retrieveResource(FlexResourceStore.java:266)
I'm listening to the Dead on archive.org, and it would sure be swell to have gapless playback, to reduce the buzzkill when listening to Dead concerts (I'm currently working through 1989 :).
This is almost certainly the wrong repo for this ticket. Can you point me to the right repo?
I looked at the banner and "Help," "Jobs," and "Volunteer" all sound the same to me, and none of them answered the "which repo" question for me.
The closest was https://developers.archive.org/get-started/. That seems to be aimed more at developers using JSON APIs than developers interested in helping with the software itself.
I also browsed around https://github.com/internetarchive ... and even ended up on https://github.com/iipc (:scream_cat: good lord, what is this!?). I found a repo for Wayback, but I think that's different from what I want, right? Is there a repo for archive.org?
Thanks! :-)