The Faup Library provides:
- A static library you can embed in your software (faup_static)
- A dynamic library you can link against (faupl)
- A command-line tool you can use to extract various parts of a URL (faup)
Faup is written to be fast and resilient to badly formatted URLs. It extracts any required URL field by checking each character once.
Why yet another URL extraction library? Because they all suck. Find a library that can extract, say, a TLD even if you have an IP address, or http://localhost, or anything else that may confuse your regex so much that you end up with an unmaintainable one.
[ URL ] -> [ Features discovery ] -> [ Decoding ] -> [ URL Fields ]
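To make the pipeline above concrete, here is a minimal Python sketch of the same idea: one left-to-right scan records where each part starts and how long it is (features discovery), then the fields are sliced out of the original string (decoding). This is not Faup's actual implementation. The names discover_features and decode are hypothetical, the sketch peeks two characters ahead to recognise "://", and it ignores credentials, IPv6 literals and the subdomain/domain/TLD split.

```python
FIELDS = ("scheme", "host", "port", "resource_path", "query_string", "fragment")

def discover_features(url):
    """Scan url once and return {field name: (position, size)}. Hypothetical helper."""
    features = {}
    name, start = "host", 0          # field currently being scanned
    i, n = 0, len(url)
    while i < n:
        ch = url[i]
        nxt = None                   # (next field, its start) when ch is a delimiter
        if ch == "#":
            nxt = ("fragment", i + 1)
        elif name != "query_string" and ch == "?":
            nxt = ("query_string", i + 1)
        elif name in ("host", "port") and ch == "/":
            nxt = ("resource_path", i)
        elif name == "host" and ch == ":":
            if "scheme" not in features and url.startswith("//", i + 1):
                # What we took for the host was the scheme; the host starts after "://".
                features["scheme"] = (start, i - start)
                name, start = "host", i + 3
                i += 3
                continue
            nxt = ("port", i + 1)
        if nxt:
            features[name] = (start, i - start)   # close the field being scanned
            name, start = nxt
            if name == "fragment":
                break                # everything after '#' is the fragment
        i += 1
    features[name] = (start, n - start)
    return features

def decode(url):
    """Slice each discovered field out of url; missing or empty fields are None."""
    fields = dict.fromkeys(FIELDS)
    for name, (pos, size) in discover_features(url).items():
        fields[name] = url[pos:pos + size] or None
    return fields

# Made-up URL, used only to exercise every field of the sketch.
print(decode("https://example.com:8080/a/b?x=1#top"))
```

Keeping positions and sizes separate from the decoded strings means the scan itself never copies any part of the URL.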
Simply pipe your URL to faup or give it as a parameter:
$ echo "www.github.com" | faup
scheme,credential,subdomain,domain,host,tld,port,resource_path,query_string,fragment
,,www,github.com,www.github.com,com,,,,
$ faup www.github.com
scheme,credential,subdomain,domain,host,tld,port,resource_path,query_string,fragment
,,www,github.com,www.github.com,com,,,,
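Since the output is a header line followed by comma-separated rows, it is easy to consume from a script. The sketch below parses the exact output shown above with Python's standard csv module; the function name parse_faup_csv is hypothetical, and the text is pasted in rather than captured from a live faup process.

```python
import csv
import io

# Output copied from the `faup www.github.com` run above.
output = (
    "scheme,credential,subdomain,domain,host,tld,port,resource_path,query_string,fragment\n"
    ",,www,github.com,www.github.com,com,,,,\n"
)

def parse_faup_csv(text):
    """Turn the header + rows into a list of dicts; empty fields become None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: (value or None) for key, value in row.items()} for row in reader]

rows = parse_faup_csv(output)
print(rows[0]["domain"], rows[0]["tld"])
```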
With the Python bindings (pyfaup), here's what you can do:
>>> from pyfaup.faup import Faup
>>> f = Faup()
>>> f.decode("https://www.slashdot.org")
>>> f.get()
{'credential': None, 'domain': 'slashdot.org', 'subdomain': 'www', 'fragment': None, 'host': 'www.slashdot.org', 'resource_path': None, 'tld': 'org', 'query_string': None, 'scheme': 'https', 'port': None}
>>>
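The dictionary returned by f.get() uses None for every field the URL does not contain. As a small sketch of how you might consume it, the snippet below works on a copy of the result shown above (so it runs without pyfaup installed) and keeps only the fields that are present; the variable names are illustrative.

```python
# Result copied from the f.get() call above.
fields = {
    'credential': None, 'domain': 'slashdot.org', 'subdomain': 'www',
    'fragment': None, 'host': 'www.slashdot.org', 'resource_path': None,
    'tld': 'org', 'query_string': None, 'scheme': 'https', 'port': None,
}

# Keep only the parts that were actually found in the URL.
present = {name: value for name, value in fields.items() if value is not None}

for name in sorted(present):
    print(name, "=", present[name])
```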
With the C API, again, things are basic:
const char *url = "https://wallinfire.net";
faup_handler_t *fh;
int tld_pos, tld_size;

fh = faup_init();
faup_decode(fh, url, strlen(url));
tld_pos = faup_get_tld_pos(fh);   /* will return 19 */
tld_size = faup_get_tld_size(fh); /* will return 3 */
faup_show(fh, ',', stdout);       /* print all fields, comma-separated */
faup_terminate(fh);
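The position and size pair describes a slice of the original string: the TLD starts at offset 19 and is 3 characters long. The short Python check below illustrates that arithmetic using the values quoted in the comments above; it does not call Faup.

```python
url = "https://wallinfire.net"
tld_pos, tld_size = 19, 3  # values quoted in the C comments above

# "https://" is 8 characters and "wallinfire." is 11, so the TLD starts at 8 + 11 = 19.
tld = url[tld_pos:tld_pos + tld_size]
print(tld)  # net
```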