publicsuffix's Introduction

PublicSuffix

A native Rust library for Mozilla's Public Suffix List

This library uses Mozilla's Public Suffix List to reliably determine the suffix of a domain name. This crate provides a dynamic list that can be updated at runtime. If you need a faster, though static list, please use the psl crate instead.

NB: v1 of this crate contained logic to validate domain names and email addresses. Since v2, this functionality was moved to the addr crate. This crate also no longer downloads the list for you.

Setting Up

Add this crate to your Cargo.toml:

[dependencies]
publicsuffix = "2"

Examples

use publicsuffix::{List, Psl, Type};

// the official list can be found at
// https://publicsuffix.org/list/public_suffix_list.dat
let list: List = "<-- your public suffix list here -->".parse()?;

let suffix = list.suffix(b"www.example.com")?;
assert_eq!(suffix, "com");
assert_eq!(suffix.typ(), Some(Type::Icann));

let domain = list.domain(b"www.example.com")?;
assert_eq!(domain, "example.com");
assert_eq!(domain.suffix(), "com");

let domain = list.domain("www.食狮.中国".as_bytes())?;
assert_eq!(domain, "食狮.中国");
assert_eq!(domain.suffix(), "中国");

let domain = list.domain(b"www.xn--85x722f.xn--55qx5d.cn")?;
assert_eq!(domain, "xn--85x722f.xn--55qx5d.cn");
assert_eq!(domain.suffix(), "xn--55qx5d.cn");

let domain = list.domain(b"a.b.example.uk.com")?;
assert_eq!(domain, "example.uk.com");
assert_eq!(domain.suffix(), "uk.com");

let domain = list.domain(b"_tcp.example.com.")?;
assert_eq!(domain, "example.com.");
assert_eq!(domain.suffix(), "com.");

publicsuffix's Issues

If local-part is missing parse_email() validates successfully

Hi all, first of all great crate, it really can save a lot of time and trouble!

While writing tests I noticed that my panic test can't pass, because parse_email("@mail.com") validates without errors.

So I tried searching across the RFC and some other sources to see if {local-part} is permitted to be empty due to some alias (or catch-all specification) or something similar, however I couldn't find anything on the topic.

In my opinion this should fail validation, since I don't believe the local part may be empty. Or am I drawing the wrong conclusion?

During debugging I noticed that at /src/lib.rs:500 the local variable is returned as an empty &str. I believe a simple change to the condition on line 505 would fix this, but I'm not sure whether the current behavior is a mistake or intentional.
I also don't see a test covering this case.
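
A minimal sketch of the kind of check suggested above, using only the standard library. `split_email` is a hypothetical helper, not the crate's `parse_email`, and it deliberately ignores quoted local parts and every other validation rule:

```rust
/// Hypothetical helper (not part of publicsuffix): splits an address at the
/// last '@' and rejects an empty local part or an empty domain.
fn split_email(address: &str) -> Option<(&str, &str)> {
    // `rsplit_once` returns None when there is no '@' at all.
    let (local, domain) = address.rsplit_once('@')?;
    if local.is_empty() || domain.is_empty() {
        return None;
    }
    Some((local, domain))
}
```

With this check, "@mail.com" is rejected while "user@mail.com" splits into its two parts.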

Clarify relationship with addr-rs

The owner of this library seems to also have worked on github.com/addr-rs/addr, which looks similar but abandoned?

Which should we be using?

Personally I was interested in just the domain parsing, so tried to use addr, but noticed it looking outdated. I guess the intention is to use this crate, but to disable the "remote-list" feature?

Trivial case does not parse correctly

I hit a case where cdn.fbsbx.com does not parse correctly:

let domain = dbg!(list.parse_domain("cdn.fbsbx.com")).unwrap();
assert_eq!(Some("com"), domain.suffix());

This outputs:

[src/main.rs:46] list.parse_domain("cdn.fbsbx.com") = Ok(
    Domain {
        full: "cdn.fbsbx.com",
        typ: None,
        suffix: None,
        registrable: None,
    },
)
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some("com")`,
 right: `None`', src/main.rs:47:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The same thing happens with fbsbx.com, foo.cdn.fbsbx.com, foo.fbsbx.com, etc. The closest thing in the public suffix list I can find is apps.fbsbx.com, so I'm not sure what's happening.

Inconsistent handling of trailing dots can lead to panic

// Domain { full: "com", typ: Some(Icann), suffix: Some("com"), registrable: None }
eprintln!("{:?}", list.parse_domain("com").unwrap());
// Domain { full: "localhost", typ: None, suffix: None, registrable: None }
eprintln!("{:?}", list.parse_domain("localhost").unwrap());
// Domain { full: "test.com.", typ: None, suffix: Some("com"), registrable: Some("test.com") }
eprintln!("{:?}", list.parse_domain("test.com.").unwrap());
// Domain { full: "test.localhost.", typ: None, suffix: Some("localhost"), registrable: Some("test.localhost") }
eprintln!("{:?}", list.parse_domain("test.localhost.").unwrap());
// panicked at 'index 2 out of range for slice of length 1'
eprintln!("{:?}", list.parse_domain("com.").unwrap());
// panicked at 'index 2 out of range for slice of length 1'
eprintln!("{:?}", list.parse_domain("localhost.").unwrap());

Notice how the suffix differs between localhost (None) and test.localhost. (Some("localhost")). I am not sure whether this is the same issue or a different one.

Stack trace:

  10: <core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index
             at /checkout/src/libcore/slice/mod.rs:866
  11: <core::ops::range::RangeTo<usize> as core::slice::SliceIndex<[T]>>::index
             at /checkout/src/libcore/slice/mod.rs:912
  12: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at /checkout/src/libcore/slice/mod.rs:717
  13: <alloc::vec::Vec<T> as core::ops::index::Index<core::ops::range::RangeTo<usize>>>::index
             at /checkout/src/liballoc/vec.rs:1584
  14: publicsuffix::Domain::assemble
             at /cargo/registry/src/github.com-1ecc6299db9ec823/publicsuffix-1.4.0/src/lib.rs:600
  15: publicsuffix::Domain::find_match
             at /cargo/registry/src/github.com-1ecc6299db9ec823/publicsuffix-1.4.0/src/lib.rs:647
  16: publicsuffix::Domain::parse::{{closure}}
             at /cargo/registry/src/github.com-1ecc6299db9ec823/publicsuffix-1.4.0/src/lib.rs:677
  17: <core::result::Result<T, E>>::and_then
             at /checkout/src/libcore/result.rs:602
  18: publicsuffix::Domain::parse
             at /cargo/registry/src/github.com-1ecc6299db9ec823/publicsuffix-1.4.0/src/lib.rs:676
  19: publicsuffix::List::parse_dns_name
             at /cargo/registry/src/github.com-1ecc6299db9ec823/publicsuffix-1.4.0/src/lib.rs:484

Changing lib.rs:671 to

        let input = domain.trim_right_matches('.');

fixes the problem. I will provide a pull request with matching tests later.
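
The idea behind that fix can be sketched without the crate. `labels` below is a hypothetical helper, not the crate's code. It uses `trim_end_matches`, which replaced the now-deprecated `trim_right_matches` in newer Rust versions:

```rust
/// Hypothetical helper: strips trailing dots, then splits the name into
/// labels, so that "com." and "com" produce the same label list.
fn labels(domain: &str) -> Vec<&str> {
    // Removes every trailing '.', so "com.." is also reduced to "com".
    let input = domain.trim_end_matches('.');
    if input.is_empty() {
        // The input was empty or consisted only of dots.
        return Vec::new();
    }
    input.split('.').collect()
}
```

Because the trailing dot is gone before any slicing by label count happens, a single-label input such as "com." can no longer be indexed past its length.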

Support for invalid hostnames, e.g., in DNS

This crate currently seems to be the only project implementing the PSL algorithm for Rust. Unfortunately, it requires the input to be a valid domain.

This requirement is too strict for my use case. I want to analyze DNS data, but DNS is much more generous in what it allows in its labels. RFC 2181 specifies:

The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators).

For my use cases supporting the printable ASCII subset is plenty as I am mostly interested in the effective second level domain. A concrete example of what does not work currently is _tcp.example.com., which is often found for SRV resource records.

  1. Is this a use case this crate/their maintainers want to support?
  2. What would be the best way to implement support for this? Removing this line would remove the validity check. However, this check is deep down in the code such that I don't see a good way to change the API in this unfamiliar code.
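
As an illustration of how little RFC 2181 requires, here is a sketch of a length-only check. `is_valid_dns_name` is a hypothetical function, not something the crate provides, and it applies the 255-octet limit to the textual form as a simplification:

```rust
/// Hypothetical check based only on the RFC 2181 length limits quoted above.
/// It places no restriction on which bytes a label contains.
fn is_valid_dns_name(name: &[u8]) -> bool {
    // Accept the fully qualified form by ignoring one trailing dot.
    let name = match name {
        [rest @ .., b'.'] => rest,
        _ => name,
    };
    // Simplification: the limit is checked on the text, not the wire format.
    if name.len() > 255 {
        return false;
    }
    // Every label must be 1 to 63 octets long; this also rejects empty
    // input and empty labels such as the one in "a..b".
    name.split(|&b| b == b'.')
        .all(|label| (1..=63).contains(&label.len()))
}
```

Under this check `_tcp.example.com.` is accepted, since the underscore is just another octet.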

OpenSSL requirement is onerous (use native TLS instead)

Switching from reqwest to hyper had the effect of introducing an OpenSSL requirement on Windows and macOS that wasn’t there before, and getting OpenSSL working properly on Windows is a nuisance. reqwest does what a normal person would expect: it uses the platform’s native TLS implementation, via the native-tls crate.

It’d be good for this crate to switch back to using native-tls so that it can be used on Windows without having to have OpenSSL set up on platforms where it’s tricky to do so.

Provide alternate tls implementation options

Per an issue raised on user_agent, the default features of publicsuffix end up pulling in a native-tls/openssl dependency that some users would like to avoid. cookie_store (the intermediate dependency relying on publicsuffix) was updated to disable default features, which removes the issue, but raising the issue here as well for your consideration to either:

  1. support alternate TLS implementations (rustls-tls), although issue #1 suggests this may be more of a burden than you'd like in this crate,
  2. move the remote_list functionality out of the publicsuffix crate and into a separate crate, or
  3. stop including it as a default feature.

Exclusion rules are not implemented correctly

At the time of this writing, the public suffix list contains the rules

*.ck
!www.ck

So with these rules e.g. foo.www.ck is an invalid domain because it does not have a valid suffix. However, the code produces this parse result:

[src/main.rs:46] list.parse_domain("foo.www.ck") = Ok(
    Domain {
        full: "foo.www.ck",
        typ: Some(
            Icann,
        ),
        suffix: Some(
            "ck",
        ),
        registrable: Some(
            "www.ck",
        ),
    },
)

which is incorrect, because ck is not a valid suffix on its own (it needs one label in front of it, and it can't be www).

You see a similar issue with other exclusion rules, e.g.

[src/main.rs:46] list.parse_domain("foo.city.kawasaki.jp") = Ok(
    Domain {
        full: "foo.city.kawasaki.jp",
        typ: Some(
            Icann,
        ),
        suffix: Some(
            "kawasaki.jp",
        ),
        registrable: Some(
            "city.kawasaki.jp",
        ),
    },
)

When evaluating exclusion rules, the code currently seems to just strip off the first label and treat the rest as the suffix. If I'm not mistaken, the correct thing to do here is to return an error indicating that the input domain is invalid.
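
For comparison, the algorithm published on publicsuffix.org says that an exception rule, when it matches, prevails over all other rules, and that the public suffix is then the exception rule with its leftmost label removed. The sketch below follows that reading. `split_domain` and `last_labels` are hypothetical names, and the code is not the crate's implementation:

```rust
/// Joins the last `n` labels back into a dotted name.
fn last_labels(labels: &[&str], n: usize) -> String {
    labels[labels.len() - n..].join(".")
}

/// Returns (public suffix, registrable domain) for `domain` under `rules`.
/// Sketch only: no normalization, no section types, no error handling.
fn split_domain(domain: &str, rules: &[&str]) -> (String, Option<String>) {
    let labels: Vec<&str> = domain.split('.').collect();
    // Number of labels in the suffix; the implicit "*" rule gives 1.
    let mut suffix_len = 1;
    for rule in rules {
        let (exception, body) = match rule.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *rule),
        };
        let rule_labels: Vec<&str> = body.split('.').collect();
        if rule_labels.len() > labels.len() {
            continue;
        }
        // Compare right to left; "*" matches any single label.
        let matches = rule_labels
            .iter()
            .rev()
            .zip(labels.iter().rev())
            .all(|(r, l)| *r == "*" || r == l);
        if !matches {
            continue;
        }
        if exception {
            // An exception rule prevails; drop its leftmost label.
            suffix_len = rule_labels.len() - 1;
            break;
        }
        // Otherwise the rule with the most labels wins.
        suffix_len = suffix_len.max(rule_labels.len());
    }
    let registrable = if labels.len() > suffix_len {
        Some(last_labels(&labels, suffix_len + 1))
    } else {
        None
    };
    (last_labels(&labels, suffix_len), registrable)
}
```

With the rules `*.ck` and `!www.ck`, this sketch gives foo.www.ck the suffix ck and the registrable domain www.ck, while foo.bar.ck gets the suffix bar.ck.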

Patch release 1.5.5 has a breaking change

7120730 moved List::from_str behind std::str::FromStr. This breaks the build if there is no use std::str::FromStr already in the code.

If such a change is done again in the future, please make either a new major release (https://doc.rust-lang.org/cargo/reference/manifest.html#the-version-field) or keep a (deprecated) from_str alias.

#[deprecated(
    since = "1.5.5",
    note = "Use std::str::FromStr::from_str instead"
)]
pub fn from_str(string: &str) -> Result<List> {
    <Self as FromStr>::from_str(string)
}
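
For callers, one way to stay independent of whether `from_str` is an inherent method or a trait method is to go through `str::parse`, which does not need `FromStr` in scope. A small sketch with a hypothetical `RuleSet` type standing in for `List`:

```rust
/// Hypothetical stand-in for a list type; it only collects rule lines.
#[derive(Debug, PartialEq)]
struct RuleSet(Vec<String>);

// The trait is named by its full path, so it is never imported here.
impl std::str::FromStr for RuleSet {
    type Err = std::convert::Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let rules = s
            .lines()
            .map(str::trim)
            // Skip blank lines and "//" comment lines.
            .filter(|line| !line.is_empty() && !line.starts_with("//"))
            .map(String::from)
            .collect();
        Ok(RuleSet(rules))
    }
}

/// Compiles without `use std::str::FromStr`, because `parse` is an
/// inherent method on `str`.
fn load(text: &str) -> Result<RuleSet, std::convert::Infallible> {
    text.parse()
}
```

Calling `RuleSet::from_str(text)` in the same scope would not compile without the import, which is the breakage described above.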

Question: Single label domain and suffix

As mentioned in #8, there is an inconsistency between the handling of registered and unregistered single-label domains. Namely, only suffixes that appear in the public suffix list are parsed into a suffix and an empty registrable.

// Domain { full: "com", typ: Some(Icann), suffix: Some("com"), registrable: None }
eprintln!("{:?}", list.parse_domain("com").unwrap());
// Domain { full: "localhost", typ: None, suffix: None, registrable: None }
eprintln!("{:?}", list.parse_domain("localhost").unwrap());

If we look at two-label domains, this is different:

// Domain { full: "test.com.", typ: None, suffix: Some("com"), registrable: Some("test.com") }
eprintln!("{:?}", list.parse_domain("test.com.").unwrap());
// Domain { full: "test.localhost.", typ: None, suffix: Some("localhost"), registrable: Some("test.localhost") }
eprintln!("{:?}", list.parse_domain("test.localhost.").unwrap());

The code which fills the suffix and registrable in the second case is lines 645-648. I assume this is to try to give as much information as possible for suffixes which are not in the list (maybe because the list is outdated).

Would it make sense to add a case like

else if suffix.is_none() && d_labels.len() == 1 && no_possible_matches_found {
    suffix = Some(Self::assemble(input, 1));
    registrable = None;
}

to make the handling of single-label domains more uniform?

Why does `has_known_suffix` return true for a domain whose suffix is not on the list?

I couldn't find ae1 in the list, so why is the domain's type Icann?

if let Ok(domain) = PUBLIC_SUFFIX_LIST.parse_domain(domain_name) {
    println!("domain: {:#?}", domain);
    if domain.has_known_suffix() {
        println!("Yes, this domain has a known suffix");
    }
}
domain: Domain {
    full: "co.ae1",
    typ: Some(
        Icann,
    ),
    suffix: Some(
        "ae1",
    ),
    registrable: Some(
        "co.ae1",
    ),
}
Yes, this domain has a known suffix
domain: Domain {
    full: "co.ae11",
    typ: Some(
        Icann,
    ),
    suffix: Some(
        "ae11",
    ),
    registrable: Some(
        "co.ae11",
    ),
}
Yes, this domain has a known suffix

`has_known_suffix` always returns true

This is a security vulnerability, as programs may rely on this to screen out local domains, e.g. "example.svc.local".

I understand that the algorithm described on https://publicsuffix.org/list/ specifies that:

If no rules match, the prevailing rule is "*".

However, this is for a specific use-case: when determining what part of the domain is the public suffix. Using this rule when determining whether the suffix is "known" is a huge security hole, as it essentially treats all domains as "known".

Specifically the "type" should be None, if the wildcard rule is used as a fallback.
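
The distinction asked for here can be sketched as a lookup that records whether an explicit rule matched or only the "*" fallback applied. `Suffix` and `find_suffix` are hypothetical names, and the sketch ignores wildcard and exception rules:

```rust
use std::collections::HashSet;

/// Hypothetical result type: the suffix text plus whether it came from an
/// explicit rule (`known == true`) or from the "*" fallback.
#[derive(Debug, PartialEq)]
struct Suffix<'a> {
    text: &'a str,
    known: bool,
}

fn find_suffix<'a>(domain: &'a str, rules: &HashSet<&str>) -> Suffix<'a> {
    // Try candidates from longest to shortest: "a.b.c", then "b.c", then "c".
    let mut candidate = domain;
    loop {
        if rules.contains(candidate) {
            return Suffix { text: candidate, known: true };
        }
        match candidate.split_once('.') {
            Some((_, rest)) => candidate = rest,
            // No explicit rule matched: fall back to the last label,
            // but mark the suffix as not known.
            None => return Suffix { text: candidate, known: false },
        }
    }
}
```

With only `com` and `uk.com` in the rule set, "example.svc.local" still gets the suffix "local", but with `known` set to false.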

[noise] Any feedback about rspec?

Hello @rushmorem, I am glad to see you are using our crate, rspec! :)

If you have any feedback or remarks about it, please share them. We are looking for feedback to move forward and prioritize features. What do you miss the most? What has been painful? What would you like to see in the docs or in the README?

By the way, I see in your code test in the form:

    rdescribe("the list", |ctx| {
        ctx.it("should not be empty", || {
            assert!(!list.all().is_empty());
            pass!()
        });
        (...)
    });

Is there any problem with this form?

    rdescribe("the list", |ctx| {
        ctx.it("should not be empty", || {
            !list.all().is_empty() // <- returns bool, should be fine
        });
    });

(Sorry for opening an issue, I couldn't find your email.)

Please support accessing an offline version of the list

Some other implementations of the public suffix list use (and share) an offline version of the list. For instance, libpsl, Python's publicsuffix, Perl's Domain::PublicSuffix, and Haskell's publicsuffixlist all use the same shared copy, packaged in Debian as publicsuffix.

Please consider supporting the use of a shared copy on the system. You might also support falling back to a compiled-in version (perhaps provided in a separate crate to avoid having to update this one too frequently). And then, for anyone who wants to download the list themselves and keep it updated, you can provide a URL as a constant, and let callers download that using whatever HTTP library they already use and provide its contents. (That would also avoid having to deal with caching policies within this library.)
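
A sketch of the lookup order proposed above: try a shared system copy first, then fall back to a compiled-in one. The path is the one I believe Debian's publicsuffix package uses, so treat it as an assumption. `BUILTIN_LIST` is a tiny stand-in for a real bundled list, and all names are hypothetical:

```rust
use std::fs;
use std::path::Path;

/// Assumed location of the shared copy on Debian-like systems.
const SYSTEM_LIST: &str = "/usr/share/publicsuffix/public_suffix_list.dat";

/// Stand-in for a compiled-in copy; a real crate would more likely embed
/// the full file with `include_str!`.
const BUILTIN_LIST: &str = "// stand-in for a bundled list\ncom\n";

/// Reads the list text from `system_path`, falling back to the built-in
/// copy when the file is missing or unreadable.
fn load_list_text(system_path: &Path) -> String {
    fs::read_to_string(system_path).unwrap_or_else(|_| BUILTIN_LIST.to_owned())
}

/// Convenience wrapper that uses the assumed system path.
fn load_default_list_text() -> String {
    load_list_text(Path::new(SYSTEM_LIST))
}
```

The returned text could then be handed to whatever parser the caller uses, for example via `.parse()` into a list type.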

This would also address issue #1.

1.5.2 release

Similar to this issue, native-tls 0.2 is now required to build on hosts with newer versions of openssl. This dependency bump already exists on master, could you release 1.5.2?

Static lists

I'd like to create Lists statically at compile-time (I don't want to rely on the internet or ship the dat file). I've implemented this here using lazy_static! and include_str!. Currently, this should extract the &str at compile-time and lazily build the list at run-time. Would you be interested in including this static list in this crate?

I just created the pull request #13, which adds a function from_str to create a List. This should eliminate the need to call to_string which might save some memory.
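
The compile-time embedding plus lazy run-time construction described above can be sketched with the standard library's `OnceLock` in place of `lazy_static!`. `LIST_TEXT` is a short stand-in for what `include_str!` would embed, and `rules` is a hypothetical accessor that only splits the text into rule lines:

```rust
use std::sync::OnceLock;

/// Stand-in for `include_str!("public_suffix_list.dat")`; the text is part
/// of the binary, so nothing is read from disk or the network.
const LIST_TEXT: &str = "// stand-in for the embedded list\ncom\nuk.com\n";

/// Builds the rule list on first use and returns the same slice afterwards.
fn rules() -> &'static [&'static str] {
    static RULES: OnceLock<Vec<&'static str>> = OnceLock::new();
    RULES
        .get_or_init(|| {
            LIST_TEXT
                .lines()
                .map(str::trim)
                // Skip blank lines and "//" comment lines.
                .filter(|line| !line.is_empty() && !line.starts_with("//"))
                .collect()
        })
        .as_slice()
}
```

Because the rules borrow directly from the embedded `&'static str`, no copy of the list text is made.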
