UrlLinker

UrlLinker is a PHP module that converts plain text snippets to HTML and turns any web addresses in the text into HTML hyperlinks.

Usage:

print(htmlEscapeAndLinkUrls($text));

For a longer example, see UrlLinker-example.php.

UrlLinker assumes plain text input, and returns HTML. If your input is already HTML, but it contains URLs that have not been marked up, UrlLinker can handle that as well:

print(linkUrlsInTrustedHtml($html));

Warning: The latter function must only be used on trusted input, as rendering HTML provided by a malicious user can lead to system compromise through cross-site scripting. The htmlEscapeAndLinkUrls function, on the other hand, can safely be used on untrusted input. (You can remove existing tags from untrusted input via PHP's strip_tags function.)
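As a rough illustration of the strip_tags step mentioned in the warning, the sketch below removes tags from untrusted input and then escapes what is left. The sample string and variable names are made up for this example, and htmlspecialchars only stands in for the escaping half of htmlEscapeAndLinkUrls so that the snippet runs without UrlLinker loaded.

```php
<?php
// Hypothetical untrusted input: markup, a script element, and a bare URL.
$untrusted = 'Hello <b>world</b> & <script>alert(1)</script> visit example.com';

// strip_tags() removes the tags themselves but keeps the text between them,
// so the script body survives as harmless plain text.
$withoutTags = strip_tags($untrusted);

// Stand-in for the escaping done by htmlEscapeAndLinkUrls(). With UrlLinker
// loaded, you would call htmlEscapeAndLinkUrls($withoutTags) here instead,
// which also turns example.com into a hyperlink.
$escaped = htmlspecialchars($withoutTags, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
```

Note that strip_tags alone does not make a string safe to print: characters such as & and any stray < still need escaping, which is why the escaping step follows it.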

Note: This repository is no longer being actively maintained. If you want to use UrlLinker with Composer, consider this fork.

Recognized addresses

  • Web addresses
    • Recognized URL schemes: "http" and "https"
      • The http:// prefix is optional.
      • Support for additional schemes, e.g. "ftp", can easily be added by tweaking $rexScheme.
      • The scheme must be written in lower case. This requirement can be lifted by adding an i (the PCRE_CASELESS modifier) to $rexUrlLinker.
    • Hosts may be specified using domain names or IPv4 addresses.
      • IPv6 addresses are not supported.
    • Port numbers are allowed.
    • Internationalized Resource Identifiers (IRIs) are allowed. Note that the job of converting IRIs to URIs is left to the user's browser.
    • To reduce false positives, UrlLinker verifies that the top-level domain is on the official IANA list of valid TLDs.
      • UrlLinker is updated from time to time as the TLD list is expanded.
      • In the future, this approach may collapse under ICANN's ill-advised new policy of selling arbitrary TLDs for large amounts of cash, but for now it is an effective method of rejecting invalid URLs.
      • Internationalized top-level domain names must be written in Punycode in order to be recognized.
      • If you need to support unqualified domain names, such as localhost, you may disable the TLD check by 1) replacing + with * in the $rexDomain value and 2) replacing the if statement line beneath the "Check that the TLD is valid" comment with if (true). This is obviously a quick-and-dirty hack, and may cause false positives.
  • Email addresses
    • Supports the full range of commonly used address formats, including "plus addresses" (as popularized by Gmail).
    • Does not recognize the more obscure address variants that are allowed by the RFCs but never seen in practice.
    • Simplistic spam protection: The at-sign is converted to an HTML entity, foiling naive email address harvesters.
  • Addresses are recognized correctly in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.
  • User input is properly sanitized to prevent cross-site scripting (XSS), and ampersands in URLs are correctly escaped as &amp;amp; (this does not apply to the linkUrlsInTrustedHtml function, which assumes its input to be valid HTML).
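
A few of the rules above (optional http:// prefix, TLD check, trailing punctuation, escaping) can be sketched in a few lines. This is a deliberately simplified illustration, not UrlLinker's actual code: the function name sketchLinkUrls, the constant SKETCH_TLDS, and the regular expression are all invented for this example, the TLD list is trimmed to three entries, and email addresses, IPv4 hosts, and IRIs are ignored.

```php
<?php
// Hypothetical, heavily trimmed allowlist. UrlLinker itself checks against
// the full IANA list of valid TLDs.
const SKETCH_TLDS = ['com', 'org', 'net'];

function sketchLinkUrls(string $text): string
{
    // Optional lowercase scheme, dotted host name, optional port, optional path.
    $rex = '~\b(https?://)?((?:[a-z0-9-]+\.)+([a-z]{2,}))(:\d+)?(/[^\s]*)?~';
    $html = '';
    $pos = 0;

    while (preg_match($rex, $text, $m, PREG_OFFSET_CAPTURE, $pos)) {
        $start = $m[0][1];
        // Trailing sentence punctuation belongs to the sentence, not the URL.
        $url = rtrim($m[0][0], '.,;:!?');
        $end = $start + strlen($url);

        // Escape the plain text that precedes the candidate.
        $html .= htmlspecialchars(substr($text, $pos, $start - $pos), ENT_QUOTES, 'UTF-8');

        if (in_array($m[3][0], SKETCH_TLDS, true)) {
            // Add the scheme for the href when the text omitted it.
            $href = ($m[1][0] === '' ? 'http://' : '') . $url;
            $html .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
                . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</a>';
        } else {
            // Unknown TLD: treat the candidate as ordinary text.
            $html .= htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        }
        $pos = $end;
    }

    // Escape whatever follows the last candidate.
    return $html . htmlspecialchars(substr($text, $pos), ENT_QUOTES, 'UTF-8');
}

echo sketchLinkUrls('Visit stackoverflow.com.'), "\n";
// prints: Visit <a href="http://stackoverflow.com">stackoverflow.com</a>.
```

The TLD check is what keeps a string like notes.txt from being linked in this sketch, and escaping both the href and the link text is what turns an & in a query string into &amp;amp; in the output.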

Background

A Stackoverflow.com question prompted me to consider the difficulty of this task. Initially, it seemed easy, but like an itch you just have to scratch, I kept coming back to it, to fix just one more little thing.

Feel free to upvote my answer if you find this code useful.

There's also a C# implementation by Antoine Sottiau.

Public Domain Dedication

To the extent possible under law, the author has waived all copyright and related or neighboring rights to UrlLinker.

For more information see: http://creativecommons.org/publicdomain/zero/1.0/