domain-list-community's Introduction

Domain list community

This project manages a list of domains, to be used as geosites for routing purposes in Project V.

Purpose of this project

This project is not opinionated. In other words, it does NOT endorse, claim or imply that a domain should be blocked or proxied. It can be used to generate routing rules on demand.

Download links

Usage example

Each file in the data directory can be used as a rule in this format: geosite:filename.

"routing": {
  "domainStrategy": "IPIfNonMatch",
  "rules": [
    {
      "type": "field",
      "outboundTag": "Reject",
      "domain": [
        "geosite:category-ads-all",
        "geosite:category-porn"
      ]
    },
    {
      "type": "field",
      "outboundTag": "Direct",
      "domain": [
        "domain:icloud.com",
        "domain:icloud-content.com",
        "domain:cdn-apple.com",
        "geosite:cn",
        "geosite:private"
      ]
    },
    {
      "type": "field",
      "outboundTag": "Proxy-1",
      "domain": [
        "geosite:category-anticensorship",
        "geosite:category-media",
        "geosite:category-vpnservices"
      ]
    },
    {
      "type": "field",
      "outboundTag": "Proxy-2",
      "domain": [
        "geosite:category-dev"
      ]
    },
    {
      "type": "field",
      "outboundTag": "Proxy-3",
      "domain": [
        "geosite:geolocation-!cn"
      ]
    }
  ]
}

Generate dlc.dat manually

  • Install golang and git
  • Clone project code: git clone https://github.com/v2fly/domain-list-community.git
  • Navigate to project root directory: cd domain-list-community
  • Install project dependencies: go mod download
  • Generate dlc.dat (omitting the datapath option uses the domain lists in the data directory under the current working directory):
    • go run ./
    • go run ./ --datapath=/path/to/your/custom/data/directory

Run go run ./ --help for more usage information.

Structure of data

All data is stored under the data directory. Each file in the directory represents a sub-list of domains, named after the file. File content uses the following format.

# comments
include:another-file
domain:google.com @attr1 @attr2
keyword:google
regexp:www\.google\.com$
full:www.google.com

Syntax:

The following rule types are NOT fully compatible with the ones defined by users in a V2Ray config file. Do NOT copy and paste them directly.

  • Comment begins with #. It may begin anywhere in the file. The content in the line after # is treated as comment and ignored in production.
  • Inclusion begins with include:, followed by the file name of an existing file in the same directory.
  • Subdomain begins with domain:, followed by a valid domain name. The prefix domain: may be omitted.
  • Keyword begins with keyword:, followed by a string.
  • Regular expression begins with regexp:, followed by a valid regular expression (per Golang's standard).
  • Full domain begins with full:, followed by a complete and valid domain name.
  • Domains (including domain, keyword, regexp and full) may have one or more attributes. Each attribute begins with @, followed by the name of the attribute.
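The syntax rules above can be sketched as a small line parser. This is only an illustration in Python, not the project's actual Go implementation; the names Entry and parse_line are hypothetical, and the sketch assumes that tokens are separated by whitespace and that # never appears inside a rule.

```python
from dataclasses import dataclass, field

RULE_TYPES = ("include", "domain", "keyword", "regexp", "full")


@dataclass
class Entry:
    kind: str                      # one of RULE_TYPES
    value: str                     # file name, domain, keyword or regexp
    attrs: list = field(default_factory=list)


def parse_line(line):
    """Parse one data-file line; return None for blank or comment-only lines."""
    # Everything after '#' is treated as a comment (assumes no '#' inside rules).
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    head, *rest = line.split()
    kind, sep, value = head.partition(":")
    if not sep:
        # The "domain:" prefix may be omitted.
        kind, value = "domain", head
    elif kind not in RULE_TYPES:
        raise ValueError(f"unknown rule type: {kind!r}")
    attrs = []
    for token in rest:
        if len(token) < 2 or not token.startswith("@"):
            raise ValueError(f"invalid attribute: {token!r}")
        attrs.append(token[1:])
    return Entry(kind, value, attrs)


entry = parse_line("domain:google.com @attr1 @attr2")
print(entry.kind, entry.value, entry.attrs)
```

Splitting on the first colon only keeps a regexp value intact even if the expression itself contains a colon.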

How it works

The entire data directory will be built into an external geosite file for Project V. Each file in the directory represents a section in the generated file.

To generate a section:

  1. Remove all the comments in the file.
  2. Replace include: lines with the actual content of the file.
  3. Omit all empty lines.
  4. Generate each domain: line into a sub-domain routing rule.
  5. Generate each keyword: line into a plain domain routing rule.
  6. Generate each regexp: line into a regex domain routing rule.
  7. Generate each full: line into a full domain routing rule.
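The steps above can be sketched roughly as follows. This is a simplified Python illustration, not the project's actual generator: build_section is a hypothetical name, files are passed in as an in-memory dict instead of being read from the data directory, attributes are dropped, and the circular-include check is an addition of this sketch.

```python
def build_section(name, files, _stack=()):
    """Expand files[name] into a list of (rule_type, value) pairs."""
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join(_stack + (name,)))
    rules = []
    for raw in files[name].splitlines():
        line = raw.split("#", 1)[0].strip()   # step 1: remove comments
        if not line:                           # step 3: omit empty lines
            continue
        rule = line.split()[0]                 # attributes are ignored in this sketch
        kind, sep, value = rule.partition(":")
        if not sep:
            kind, value = "domain", rule       # "domain:" prefix omitted
        if kind == "include":                  # step 2: inline the included file
            rules.extend(build_section(value, files, _stack + (name,)))
        elif kind in ("domain", "keyword", "regexp", "full"):
            rules.append((kind, value))        # steps 4-7: one rule per line
        else:
            raise ValueError(f"unknown rule type: {kind!r}")
    return rules


# Made-up sample files, for illustration only.
sample = {
    "google": "# comments\ninclude:youtube\ngoogle.com\nkeyword:google\n\nfull:www.google.com\n",
    "youtube": "domain:youtube.com\n",
}
for rule in build_section("google", sample):
    print(rule)
```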

How to organize domains

File name

Theoretically any string can be used as the name, as long as it is a valid file name. In practice, we prefer names that denote a well-defined group of domains, such as the owner (usually a company name) of the domains, e.g., "google", "netflix". Names with unclear scope, such as "evil" or "local", are generally not recommended.

Attributes

Attributes are useful for marking sub-groups of domains, especially for filtering purposes. For example, the list of google domains may contain its main domains, as well as domains that serve ads. The ads domains may be marked with the attribute @ads, and can be used as geosite:google@ads in V2Ray routing.
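As a rough illustration of how a reference such as geosite:google@ads narrows a list, the Python sketch below filters entries by attribute. The function name resolve_geosite and the sample entries are hypothetical; V2Ray's real lookup works on the compiled dlc.dat file, not on a Python dict.

```python
def resolve_geosite(ref, lists):
    """Return the domains selected by 'geosite:name' or 'geosite:name@attr'."""
    prefix = "geosite:"
    if not ref.startswith(prefix):
        raise ValueError(f"not a geosite reference: {ref!r}")
    name, _, attr = ref[len(prefix):].partition("@")
    # Without an attribute every entry is selected; with one, only tagged entries.
    return [domain for domain, attrs in lists[name] if not attr or attr in attrs]


# Made-up sample data: (domain, set of attributes).
lists = {
    "google": [
        ("main.example.com", set()),
        ("ads.example.com", {"ads"}),
    ],
}
print(resolve_geosite("geosite:google", lists))
print(resolve_geosite("geosite:google@ads", lists))
```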

Contribution guideline

  • Fork this repo, make modifications to your own repo, file a PR.
  • Please begin with small PRs, for example a modification to a single file.
  • A PR must be reviewed and approved by another member.
  • A script verifies your pull request every time you update it. Only PRs that pass the test will be merged. If your PR fails, please go to the Actions tab for detailed information. The generated file is also provided so that you can test it yourself.
  • After a few successful PRs, you may apply for manager access to this repository.

domain-list-community's People

Contributors

bijoni, chunyunghsu, darienraymond, database64128, dependabot[bot], ducksoft, epliar, fduzs, funray, icecodenew, kevinzonda, kslr, kukisa, loyalsoldier, m2kar, marcushsu, molink36, muenyu, opportunityliu, peeweep, rhjdvsgsgks, robot-daneelolivaw, rootmelo92118, tooadstool, toutyrater, victoriaraymond, wevsty, xjasonlyu, zhullyb, zzqqbs

domain-list-community's Issues

How do I restrict file uploads?

I apologize if this is a basic question, but I searched and found nothing.
How do I restrict file uploads, for example of zip files?

update.googleapis.com wrongly tagged with 'cn' attribute

Describe the bug

Environment

Shadowsocks client version: 4.4.0.0, prefer direct mode.
OS version: Windows 10 20H2
.NET version:

Steps you have tried

Check update in Google Chrome.

What did you expect to see?

Successfully checking and updating Google Chrome

What did you see instead?

Checking failed.
An error occurred while checking for updates: Unable to connect to the Internet. If you use a firewall, please whitelist GoogleUpdate.exe
Error code 0x80072F7D

I went through the traffic logs and found a possible cause:
The domain 'update.googleapis.com' carries the 'cn' attribute in the DLC data, while in fact 'update.googleapis.com' is not directly reachable.

shadowsocks/shadowsocks-windows#3065

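Suggestion: split `amazon`

Amazon's business is vast, and the existing geosite:amazon and geosite:amazon-ads are not enough for finer-grained routing needs.

Following what was done for the facebook and google domains, I suggest subdividing the amazon domains into geosite:aws, geosite:kindle, and geosite:primevideo.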

fonts.googleapis.com has 'cn' attribute

It should not. For example, when accessing

https://fonts.googleapis.com/css?family=Libre+Franklin%3A300%2C300i%2C400%2C400i%2C600%2C600i%2C800%2C800i&subset=latin%2Clatin-ext&display=fallback

the browser gives a connection timeout error.

Remove `@cn` from `gstatic.com`

The CDNs of gstatic.com in Mainland China don't serve the same content as the international ones. This causes issues when using some of Google's services.

Steps to replicate

  1. Add a rule to use direct connection for geosite:geolocation-!cn@cn.
  2. Use your phone's browser to google something that has an AMP preview.
  3. Tap on the link to the preview. The page shows a 404 error.
  4. Add another rule before the one added in step 1 to use proxy for domain:gstatic.com.
  5. Tap on the same link. The AMP preview now loads.
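Why step 4 changes the outcome can be illustrated with a first-match router. The Python sketch below assumes that routing rules are evaluated in order and that the first matching rule decides the outbound; the function names and the single-entry stand-in for geosite:geolocation-!cn@cn are hypothetical.

```python
def subdomain_of(base):
    """Matcher in the style of a domain: rule (the base domain and its subdomains)."""
    return lambda host: host == base or host.endswith("." + base)


def route(host, rules, default="default"):
    """Return the outbound of the first rule whose matcher accepts the host."""
    for matches, outbound in rules:
        if matches(host):
            return outbound
    return default


# Hypothetical stand-in for geosite:geolocation-!cn@cn, which contains gstatic.com.
cn_tagged = subdomain_of("gstatic.com")

rules_step1 = [(cn_tagged, "direct")]
# Step 4: a proxy rule placed *before* the direct rule takes precedence.
rules_step4 = [(subdomain_of("gstatic.com"), "proxy")] + rules_step1

print(route("www.gstatic.com", rules_step1))
print(route("www.gstatic.com", rules_step4))
```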

Related issues

A user of shadowsocks-windows has also found it to be causing issues with Google's translation service: shadowsocks/shadowsocks-windows#3028.

The Chinese CDNs of gstatic.com are basically enforcing censorship. They don't serve the same content, and therefore can't be treated as Chinese access points of gstatic.com. The cn tag on these domain names should be removed.

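geosite:cn does not include bfsu.edu.cn

I use Arch Linux, with the Beijing Foreign Studies University mirror as my package source.

With the following configuration, I often run into errors because bfsu.edu.cn goes through the proxy.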

{
    "type": "field",
    "domain": [
        // Add whitelisted domains here.
        "geosite:cn",
        "dl.google.com",
        "sourceforge.net"
    ],
    "outboundTag": "direct"
},

$ pacman -Syu
:: Synchronizing package databases...
 core.db failed to download
error: failed retrieving file 'core.db' from mirrors.bfsu.edu.cn : Resolving timed out after 10000 milliseconds
error: failed to synchronize all databases (download library error)

If I then change to the configuration below (manually adding bfsu.edu.cn to the whitelist), everything works.

{
    "type": "field",
    "domain": [
        // Add whitelisted domains here.
        "geosite:cn",
        "dl.google.com",
        "sourceforge.net",
        "bfsu.edu.cn"
    ],
    "outboundTag": "direct"
},

My question is: shouldn't all domains ending in cn be included automatically?

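Suggest removing paid databases such as elsevier and ieee

Databases such as ieeexplore.ieee.org and engineeringvillage.com are reachable directly, and access rights are tied to the IP addresses paid for by the school or institution. Going through a proxy may make the sites load faster, but the content itself is then no longer viewable, so proxying them is pointless.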

gstatic.com is blocked

Without proxying gstatic.com, the Google Drive login page opens, but clicking the login button does nothing. The console shows that gstatic.com reports: Unable to establish SSL connection

Some questions about PRs

I saw that some files in /data contain only domains, without any tags.

  1. If I want to add some domains here, should I do the same (just put the domains in a file)?
  2. If so, will you classify them afterwards?
  3. I would like to help with this (domain collection). If the approach above is wrong, please show me the right way!

Thanks to every developer of this program; it's an excellent program.

downloaddispatch.itunes.apple.com resolves to Chinese CDN addresses, but is not classified as apple@cn

➜  ~ doggo downloaddispatch.itunes.apple.com @udp://119.29.29.29
NAME                                             	TYPE 	CLASS	TTL 	ADDRESS                                          	NAMESERVER      
downloaddispatch.itunes.apple.com.               	CNAME	IN   	300s	downloaddispatch-cdn.itunes-apple.com.akadns.net.	119.29.29.29:53	
downloaddispatch-cdn.itunes-apple.com.akadns.net.	CNAME	IN   	60s 	ioshost.qtlcdn.com.                              	119.29.29.29:53	
ioshost.qtlcdn.com.                              	A    	IN   	60s 	61.132.231.223                                   	119.29.29.29:53	
ioshost.qtlcdn.com.                              	A    	IN   	60s 	58.220.72.166                                    	119.29.29.29:53	

Domains to be added to current lists

I believe that the following domains can be added to the corresponding lists:

domain-list-community/data/rarbg:

rarbgway.org
rarbgaccessed.org

domain-list-community/data/signal:

whispersystems.org

domain-list-community/data/bbc:

bbcpersian.com

googleapis.com appears to be classified under geosite:cn

My v2ray config.json has the following rule for connecting directly to domestic sites:

        {
          "outboundTag": "direct",
          "type": "field",
          "domain": [
            "geosite:cn",
            "geosite:speedtest"
          ]
        },

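However, when visiting YouTube the page keeps loading and stays stuck on a blank white screen. After debugging, I found that a CSS file fails to load. Its URL is

https://fonts.googleapis.com/css?family=Roboto:500,300,700,400 . After testing, I found that while the geosite:cn rule is present, googleapis.com is attempted over a direct connection; once the rule is removed, everything works.

Reproduction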

curl 'https://fonts.googleapis.com'
2021/03/20 14:50:56 [Info] [548106236] v2ray.com/core/proxy/http: request to Method [CONNECT] Host [fonts.googleapis.com:443] with URL [//fonts.googleapis.com:443]
2021/03/20 14:50:56 127.0.0.1:51693 accepted //fonts.googleapis.com:443 [direct]
2021/03/20 14:50:56 [Info] [548106236] v2ray.com/core/app/dispatcher: taking detour [direct] for [tcp:fonts.googleapis.com:443]
2021/03/20 14:50:56 [Info] [548106236] v2ray.com/core/proxy/freedom: opening connection to tcp:fonts.googleapis.com:443
2021/03/20 14:50:56 [Info] [548106236] v2ray.com/core/transport/internet/tcp: dialing TCP to tcp:fonts.googleapis.com:443
2021/03/20 14:51:12 [Info] [548106236] v2ray.com/core/transport/internet/tcp: dialing TCP to tcp:fonts.googleapis.com:443
2021/03/20 14:51:28 [Info] [548106236] v2ray.com/core/transport/internet/tcp: dialing TCP to tcp:fonts.googleapis.com:443

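Merge the domains listed in lists such as MVPS hosts

Projects such as MVPS hosts and Dan Pollock's hosts provide very detailed and extensive lists of ad/tracking/spam domains. I suggest merging them into this project.

Because these lists mix ad, tracking, and spam domains, they do not seem suitable for filing directly under the existing categories.

I suggest either creating a new general-ads-tracking-spam category, or creating one category per list, named after that list.

If there are no objections, I will open a PR.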

Remove gfwlist generation

I previously thought ss-windows needed it, but it has since switched to geosite, so I think it can be removed.

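Why is cn.bing.com no longer in the list?

It seems it was always there before.

Anyway, today I found that it no longer goes direct.
The key point is that the bing file in data has not been modified recently, yet after updating to the new version I noticed something was wrong.

I found that the record disappeared at this point:
983b604
f59afd9#diff-899344237445ec1c576b5d6ec26dc46d51c664fb986306a4efdeb29bd49915fa

It is gone in the commit ending in 666, but the diff does not show it.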
https://github.com/v2fly/domain-list-community/blob/983b60497a3e32d945c203ff52c2a0496043bc51/data/geolocation-cn
https://github.com/v2fly/domain-list-community/blob/f59afd97019d5aa9ec24ffb732908c2e4c5f0666/data/geolocation-cn

geoip:ir didn't work

Hi there. I tried to bypass Iranian IPs with the advanced routing settings,
but it only works if I put sites in the domain field, not the IP field.
Only axbabak.ir is bypassed.

Accessing Google Scholar over IPv6

I added the Google Scholar domains to the IPv6 routing rules. Why can I still not access Google Scholar?

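Route technews.tw as a foreign domain

With a domestic DNS, technews.tw resolves to a domestic Tencent Cloud address, which shows a "not ICP-registered" notice, but resolving it with Google's DNS gives normal access. Please add it to the foreign domain list.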

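Request for write access to this repo

As the title says: after several successful commits, and having read the discussion in Issue #28, I now feel confident enough to take part in managing this project.

Best regards.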

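Syntax extensions for @attr

In issue #28 I proposed allowing multiple attrs to coexist.
Some time has passed, and judging from the discussion, the community as a whole is fairly receptive to the multiple-attr approach, so we can basically reach a consensus.

To put multiple attributes into practice properly, the include syntax must also be extended,
as in the syntax extension proposed in issue #256.
However, I personally feel that the include:filename@attribute syntax still falls a little short.
I think the form include:filename @attribute1 @attribute2 would be more flexible and also simpler.

So, to put multiple attributes into practice, I would like to summarize the changes to the @attr syntax here, and I will open a PR to implement them.

  1. Support setting multiple attributes on one domain (already supported on the master branch).
    For example:
domain:google.com @attr1 @attr2
  2. Extend the include syntax.
    Syntax:
# Include all entries in filename (same as the current syntax)
include:filename

# Include the entries in filename that carry both @attribute1 and @attribute2.
include:filename @attribute1 @attribute2

# Include the entries in filename that lack attribute1 and the entries that carry attribute2
include:filename @!attribute1 @attribute2

The benefit is that, for companies serving several countries at once, we can achieve the most suitable access path, and later maintenance becomes easier.
For example: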

# filename hsbc
hsbc.com
hsbc.com.hk @hk
hsbc.com.cn @cn
hsbc.co.uk @uk

# geolocation-cn file
include:hsbc @cn

# geolocation-!cn file
include:hsbc @!cn

I personally find this style more intuitive and easier to work with.
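The proposed include filters can be sketched in Python using the hsbc example above. This is only one reading of the proposal: the sketch assumes that every listed filter must hold (@attr means the entry has the attribute, @!attr means it does not), and the name include_filtered is hypothetical.

```python
def include_filtered(entries, filters):
    """Select domains from entries [(domain, attrs)] that satisfy every filter.

    A filter "@cn" requires the attribute; "@!cn" requires its absence.
    An empty filter list selects everything, like a plain include:filename.
    """
    def satisfies(attrs, flt):
        name = flt[1:]                      # drop the leading '@'
        if name.startswith("!"):
            return name[1:] not in attrs
        return name in attrs

    return [d for d, attrs in entries if all(satisfies(attrs, f) for f in filters)]


# The hsbc file from the example above.
hsbc = [
    ("hsbc.com", set()),
    ("hsbc.com.hk", {"hk"}),
    ("hsbc.com.cn", {"cn"}),
    ("hsbc.co.uk", {"uk"}),
]
print(include_filtered(hsbc, ["@cn"]))    # what geolocation-cn would include
print(include_filtered(hsbc, ["@!cn"]))   # what geolocation-!cn would include
```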

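  3. Standardize attribute tags

On attribute tags, I am following Loyalsoldier's suggestion here, which also serves as a convention.
We can standardize attributes into two classes.

3.1. Traditional regional attributes
Classify by the region where the company or website operates.

For example:
Chinese companies serving Chinese users use @cn
A US branch of a Chinese company that serves only the US may use @us
US companies serving US users use @us
US companies serving specifically Chinese users use @cn

All country codes (two lowercase letters) are accepted as attributes; for convenience and compatibility, omitting the attribute is also accepted.
This way any domain can be classified with this method, and existing data can continue to be used without changes.

If the convention is followed, the end result might look like this.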

amazon.com
163.com @cn
github.com @us
samsung.com @kr
rakuten.co.jp @jp

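Note 1: The @!cn syntax on domain entries is retained for compatibility, pending later revision.
Note 2: All two-letter combinations other than country codes are defined as reserved words.

3.2. Add the following usage attributes

# anycast: has access points in multiple countries
@anycast
# gfw: polluted by the GFW
@gfw
# ads: the domain is used to serve ads
@ads

Implemented along these lines, I believe this is easy to carry out on the current version and will not cause incompatibilities.
These are my thoughts on the @attr syntax; everyone is welcome to discuss.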

Add some scientific research databases

www.osapublishing.org
airitilibrary.cn
www.umajor.org
cssci.nju.edu.cn
kml.kuke.com
sciencechina.cn
www.htcases.com
www.cnbksy.com
www.libvideo.com
www.angle.com.tw

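bwh88.net should not be in the proxy list

This is a mirror site of BandwagonHost. It has been available for a long time and is still directly reachable.
For sites like this, a proxy should be avoided where possible; otherwise it increases the risk of triggering fraud risk controls at payment time.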

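Add services.googleapis.cn to the blacklist so that it goes through the proxy

On a Huawei phone, after setting up the Google suite and signing in, I can open Play and find apps, but cannot install any of them; it always shows "Waiting" and never responds. Meanwhile, YouTube videos play normally. The fix is to add services.googleapis.cn to the blacklist of the V2Ray transparent proxy so that it goes through the proxy. Fixed!

Does this project have a similar feature? My suggestion is that .cn domains should not all be treated alike and sent direct. Thanks.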

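Refactor the build process and add new features

As this project has grown and its influence has widened, several problems have emerged over the past year or so:

  • Because of the @cn attribute, the geolocation-!cn category has come to contain many "mainland domains" (domains that belong to non-mainland companies but have access points in the mainland)
  • Domain rules in each list cannot be deduplicated (for example, geolocation-!cn contains many top-level domains, which could be deduplicated with a tree to reduce the size of the generated file)

I hereby propose introducing several options and features into the build process:

  • Automatically locate the data folder by priority (command-line option)
  • Customizable output directory for generated files (command-line option)
  • Customizable choice of which list is used to generate gfwlist.txt (command-line option; geolocation-cn or cn yields a whitelist, geolocation-!cn yields a blacklist)
  • Customizable removal of rules with specific attributes (command-line option): when generating files, drop rules carrying certain attributes, e.g. drop @cn rules from the geolocation-!cn list, and drop @!cn rules from the geolocation-cn list (no such rules exist yet, but such domains could later be added to the geolocation-cn list)
  • Extend the include syntax: support include:filename@attribute (so that geolocation-cn can include:google@cn and geolocation-!cn can include:alibaba@!cn)

⚠️ Note: rule deduplication only handles Domain-type rules without attributes.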
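The deduplication idea can be illustrated with a short Python sketch. It is not the project's implementation: instead of the tree mentioned above it uses a set of already-kept domains and checks each domain's parent suffixes, which gives the same result for plain Domain rules without attributes. The name dedupe_domains is hypothetical.

```python
def dedupe_domains(domains):
    """Drop Domain rules that are already covered by a kept parent domain."""
    kept = set()
    result = []
    # Visit domains with fewer labels first so that parents are seen before children.
    for domain in sorted(set(domains), key=lambda d: (d.count("."), d)):
        labels = domain.split(".")
        # Proper suffixes of a.b.com are b.com and com.
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if any(parent in kept for parent in parents):
            continue  # a parent rule already matches this subdomain
        kept.add(domain)
        result.append(domain)
    return result


print(dedupe_domains(["www.google.com", "google.com", "mail.google.com", "example.org"]))
```

A trie keyed on reversed labels would avoid rebuilding suffix strings, at the cost of more code; for a sketch the set-based check is easier to verify.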

[Discuss] About new category `geolocation-global`(Temp name)

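This discussion arose because a user earlier raised the issue "Some Google services are directly reachable from the mainland; can DLC distinguish them?" (#487).

Since PR #88, where most of the discussion took place, has been merged, I am opening a separate issue for the related discussion.

The following is only my personal opinion; everyone is welcome to suggest changes.

  1. Change the definition of geolocation-!cn to "a list of domains with connection points outside mainland China"
  2. Define @cn as "foreign domains with connection points in mainland China", used in geolocation-!cn and its included sub-lists; create geolocation-global to collect such domains
  3. Define @!cn as "domestic domains with connection points outside mainland China", used in geolocation-cn and its included sub-lists, likewise collected in geolocation-global (I guess few users need this kind of domain, so the tagging work can be deferred)
  4. Add the include:list@attr form of inclusion


Combining and simplifying these points, they can be split into 3 major categories:

  1. geolocation-cn: list of domains owned by mainland Chinese companies
  2. geolocation-!cn: list of domains owned by foreign companies
  3. geolocation-global (tentative name): list of domains with usable connection points both in mainland China and abroad
    • Mainly contains ambiguous domains, e.g. domains owned by Chinese companies but mainly serving users abroad, or domains owned by foreign companies but with usable connection points in China
    • Domains owned by foreign companies but with usable connection points in China are tagged @cn (to be decided)
    • Domains owned by Chinese companies but mainly serving users abroad are tagged @!cn (to be decided)

This satisfies users both inside and outside China at once; users can choose for themselves whether geolocation-global[@cn|@!cn] goes direct or through a proxy.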

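The currently available approaches are:

  1. Users in China, proxying only the list of domains owned by companies outside mainland China:
CN User -> (Optional)(Direct global) -> CN
           (Optional)(Proxy  global) -> Outside
                     (Proxy     !cn) -> Outside
                     (Direct    all) -> CN
  2. Users in China, proxying everything except the list of domains owned by companies inside mainland China:
CN User -> (Optional)(Direct global) -> CN
           (Optional)(Proxy  global) -> Outside
                     (Direct     cn) -> CN
                     (Proxy     all) -> Outside
  3. Users abroad, proxying only the list of domains owned by companies inside mainland China:
!CN User -> (Optional)(Direct global) -> Outside
            (Optional)(Proxy  global) -> CN         # this one is completely unnecessary
                      (Proxy      cn) -> CN
                      (Direct    all) -> Outside

^: Because of the priority of v2ray's routing rules, as long as the global rules are placed before !cn, it does not matter if the domains in global are also contained in !cn.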


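The implementation would roughly be:

  1. Sort the domains contained in geolocation-cn and geolocation-!cn according to the 3 category definitions above.
    • The exact category definitions can be discussed gradually
  2. Collect the domains with usable connection points both in mainland China and abroad into <company-name>-global.
    • See #89
    • Alternatively, do not create <company-name>-global and implement it with @attr in the original <company-name> file
  3. Create the new major category geolocation-global (tentative name), bringing together all <company-name>-attr.
    • If implemented with @attr, then include:<company-name>@attr

This approach involves relatively little work: only @attr needs to be added to the relevant entries.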


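Some lists place CDNs at the end, and those domains do not necessarily belong to the list's entity. The scheme above does not consider this special case, but I personally think the same approach can be used

If the CDN provides both domestic and foreign access points, then per the above @global can be added directly

Can the tagging process be automated? For example, with the domain validation feature mentioned in #54

Feasible. We can analyze the IPs that domains owned by foreign companies resolve to; if the IP returned by a mainland China DNS is within geoip:cn, the domain can be tagged @global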
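The check described above can be sketched in Python with the standard ipaddress module. This is only an illustration under stated assumptions: the resolved IPs are passed in rather than looked up, the CIDR list is a placeholder made of documentation-only ranges standing in for geoip:cn, and the name resolves_into is hypothetical.

```python
import ipaddress


def resolves_into(ips, cidrs):
    """Return True if any resolved IP falls inside any of the given CIDR ranges."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return any(ipaddress.ip_address(ip) in net for ip in ips for net in networks)


# Placeholder for geoip:cn (documentation-only ranges, not real data).
PLACEHOLDER_CN_CIDRS = ["203.0.113.0/24", "2001:db8::/32"]

# IPs that a mainland China DNS is assumed to have returned for some foreign-owned domain.
answers = ["203.0.113.7"]
if resolves_into(answers, PLACEHOLDER_CN_CIDRS):
    print("candidate for @global")
```

A real check would still need a maintainer's review, since a single DNS answer says little about whether the domestic access point serves the same content.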

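Is the name geolocation-global appropriate? If so, should @cn and @!cn be merged into @global?

On naming: by my category definition above, the included domains are those owned by foreign companies but with usable access points in China.
geolocation-cn-available would actually be more fitting, but it is too long...
Following !cn, it could be shortened to something like geolocation-&cn or ~cn, which might be better.

With @cn and @!cn in place, does adding @dl to tag download and update domains for systems, software, and games still make sense?

@dl is probably of little use: for users in China downloading foreign content, the speed without a proxy is so slow as to be almost unusable.
And downloads of domestic content already go direct via geolocation-cn, so @dl is not needed there either.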


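Jul 29, 16:00. Added omissions, and revised parts of the text above according to what follows.

geolocation-global (tentative name) should also be divided into two sub-classes:

  • Domains owned by mainland Chinese companies but mainly serving users abroad, tagged @!cn (to be decided)
    • data/alibabacloud
  • Domains owned by foreign companies but with usable connection points in mainland China, tagged @cn (to be decided)
    • full:fonts.googleapi.com

Currently some domains are contained in both geolocation-cn and geolocation-!cn, for example: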

geolocation-cn  <- include:alibaba <- include:alibabacloud
geolocation-!cn <- include:alibabacloud

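The purpose of the new geolocation-global (tentative) category is to resolve these ambiguous rule conflicts.
Taking alibabacloud as the example again:
remove alibabacloud from alibaba,
tag the domains in alibabacloud with @!cn,
add alibabacloud to geolocation-global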

geolocation-cn     <- include:alibaba
geolocation-!cn    <- include:alibabacloud
geolocation-global <- include:alibabacloud

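This resolves the conflict, and users can use geolocation-global@!cn to override cn and !cn, choosing whether the relevant domains go direct or through a proxy,
rather than running into surprises because the domains belong to both cn and !cn.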

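[Proposal] Introduce domain validation for `geolocation-cn`

Yesterday I implemented code that fully expands geolocation-cn, writing the data of the included sub-categories at every level back into the geolocation-cn file.
I would first like to ask for everyone's opinion on whether the generated domain list should be published (others might use it to convert to PAC or for other purposes), or kept only as an intermediate file for scheduled validation and not published.

Once there is a shared decision, someone would still need to implement calling an existing domain ICP-registration lookup API to verify whether the expanded geolocation-cn file contains domains for which no registration record can be found. (One possible way to handle this is to raise a notice for maintainers to review and decide whether to delete.)

In PR #53 you can see that geolocation-cn previously included domains it should not have. I do not think this can rely entirely on manual checking; automated validation is imperative.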

Discussion on domain classification data

The geolocation-cn file contains this passage:

# The following domains are carried over from geosite:cn.
# TODO: Decide how to deal with these domains

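The domains below it are uncategorized and simply sorted together in lexicographic order, which is really bad for both use and maintenance.

First, several domains often belong to the same platform, yet lexicographic order scatters them among other domains. This hampers reading and analysis, and when the service or platform goes offline its domains may not be removed completely. This is the maintenance difficulty (such a huge list must already contain many sites with very little traffic, some of which may be short-lived).

Second, @attr is not yet fully used. If this database expands further, the uncategorized domains mentioned here will be very unsuitable for adding @attr in place (imagine you only want to block one particular platform's ads, but the platform's domains sit under geolocation-cn, mixed with other uncategorized domains tagged @ads). This is the usage difficulty.

So I think the best direction is for geolocation-cn to have as many include: lines and as few uncategorized domains as possible. This is the point I want to raise for discussion, because just now my PR #25 was judged to be abusing categories, so collaborators clearly differ in their views here.

I hope we can reach a consensus soon, to avoid piling further onto an already enormous legacy problem when a change of course is needed.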
