phergie-irc-plugin-react-url's Introduction

This project is abandoned

This repo is being kept for posterity and will be archived in a read-only state. If you're interested, it can be forked under a new Composer namespace/GitHub organization.

Url Plugin

Phergie plugin for displaying URL information about links.

Install

To install via Composer, use the command below. It will automatically detect the latest version and bind it with ~.

composer require phergie/phergie-irc-plugin-react-url 

See Phergie documentation for more information on installing and enabling plugins.

Configuration

return array(
    'plugins' => array(
        // dependencies
        new \Phergie\Plugin\Dns\Plugin, // Handles DNS lookups for the HTTP plugin
        new \Phergie\Plugin\Http\Plugin, // Handles the HTTP requests for this plugin

        // configuration
        new \Phergie\Irc\Plugin\React\Url\Plugin(array(
            // All configuration is optional

            'hostUrlEmitsOnly' => false, // Emit url.host.(all|<host>) events only; no further URL handling or shortening

            // or

            'handler' => new \Phergie\Irc\Plugin\React\Url\DefaultUrlHandler(), // URL handler that creates a formatted message based on the URL

            // or

            'shortenTimeout' => 15, // If no URL shortener has produced a short URL after this many seconds, the normal URL is used. (Not in effect when no shorteners are listening.)

            // or

            'filter' => null, // Any valid filter implementing Phergie\Irc\Plugin\React\EventFilter\FilterInterface, used to decide which messages are handled
        )),
    ),
);

Events

This plugin emits the following generic events, which you can use however you like.

  • url.host.HOSTNAME For example url.host.twitter.com (www. is stripped from the hostname).
  • url.host.all For all hostnames.

This plugin also emits two events for URL shortening. They are only emitted when listeners are registered. Each emitted event is passed a UrlshorteningEvent; when a shortener has resolved a short URL, it calls the resolve method on the promise.

  • url.shorten.HOSTNAME For example url.shorten.twitter.com (www. is stripped from the hostname).
  • url.shorten.all For all hostnames.

Placeholders

The following placeholders can be used to compose a message pattern, which is passed as the first argument to DefaultUrlHandler to create custom messages:

  • %url% - Full URL
  • %url-short% - Shortened URL
  • %http-status-code% - HTTP status code
  • %timing% - Time in seconds it took for the request to complete
  • %timing2% - Time in seconds it took for the request to complete, rounded to at most two decimals
  • %response-time% - Time in seconds it took for the request to complete
  • %response-time2% - Time in seconds it took for the request to complete, rounded to at most two decimals
  • %title% - Page title
  • %composed-title% - Page title
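
As a minimal sketch of how these placeholders might be combined, the configuration below passes a custom pattern as the first argument to DefaultUrlHandler. The pattern string is an illustrative assumption, not the plugin's default.

```php
<?php
return array(
    'plugins' => array(
        new \Phergie\Plugin\Dns\Plugin,
        new \Phergie\Plugin\Http\Plugin,
        new \Phergie\Irc\Plugin\React\Url\Plugin(array(
            // Hypothetical pattern: short URL, status code, title, rounded response time
            'handler' => new \Phergie\Irc\Plugin\React\Url\DefaultUrlHandler(
                '[ %url-short% ] %http-status-code% - %composed-title% (%response-time2%s)'
            ),
        )),
    ),
);
```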

Header Placeholders

Selection of response headers from: en.wikipedia.org/wiki/List_of_HTTP_header_fields#Response_Headers

  • %header-age%
  • %header-content-type%
  • %header-content-length%
  • %header-content-language%
  • %header-date%
  • %header-etag%
  • %header-expires%
  • %header-last-modified%
  • %header-server%
  • %header-x-powered-by%

UrlSectionFilter

This plugin comes with the UrlSectionFilter, which lets you filter on the key-value pairs returned by parse_url. The following example filter allows www.phergie.org, www2.phergie.org, and phergie.org:

new OrFilter([
    new UrlSectionFilter('host', '*.phergie.org'),
    new UrlSectionFilter('host', 'phergie.org'),
])

The filter accepts a third parameter, strict. When strict is enabled and the URL is missing the requested part, the filter returns false instead of declaring the event out of scope.
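
To see why the example needs both patterns, here is a small standalone sketch of wildcard host matching. It is not the plugin's UrlSectionFilter: the hostMatches function is hypothetical, the use of fnmatch for the * wildcard is an assumption, and null is used here to stand in for "out of scope".

```php
<?php
/**
 * Hypothetical stand-in for a host check; NOT the plugin's UrlSectionFilter.
 * Assumption: '*' behaves like a shell wildcard, as implemented by fnmatch().
 * Returns null ("out of scope") when the URL has no host, or false if $strict.
 */
function hostMatches(string $pattern, string $url, bool $strict = false): ?bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $strict ? false : null;
    }

    return fnmatch($pattern, $parts['host']);
}

var_dump(hostMatches('*.phergie.org', 'http://www.phergie.org/'));  // bool(true)
var_dump(hostMatches('*.phergie.org', 'http://www2.phergie.org/')); // bool(true)
var_dump(hostMatches('*.phergie.org', 'http://phergie.org/'));      // bool(false)
var_dump(hostMatches('phergie.org', 'http://phergie.org/'));        // bool(true)
```

Under these assumptions the wildcard pattern alone does not cover the bare domain, which is why the example combines two filters with OrFilter.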

Tests

To run the unit test suite:

curl -s https://getcomposer.org/installer | php
php composer.phar install
./vendor/bin/phpunit

License

Released under the MIT License. See LICENSE.

phergie-irc-plugin-react-url's People

Contributors

clue, elazar, matthewtrask, meroje, pschwisow, scrutinizer-auto-fixer, sitedyno, svpernova09, wyrihaximus

phergie-irc-plugin-react-url's Issues

http -> https redirect results in endless loop

When posting a domain with a http -> https 301 redirect the plugin falls into an endless loop:

2015-02-07 22:15:40 DEBUG [email protected] :hashworks!~hashworks@0Quoten3Ossi PRIVMSG moonbasetest :hashworks.net []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Found url: hashworks.net []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Corrected url: http://hashworks.net/ []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Emitting: http.request []
2015-02-07 22:15:40 DEBUG [Http]Creating new HttpClient []
2015-02-07 22:15:40 DEBUG [Http]Requesting DNS Resolver []
2015-02-07 22:15:40 DEBUG [Dns]dns.resolver called []
2015-02-07 22:15:40 DEBUG [Dns]Creating new Resolver []
2015-02-07 22:15:40 DEBUG [Http]DNS Resolver received []
2015-02-07 22:15:40 DEBUG [Http]Requesting DNS Resolver []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Sending request []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Emitting: url.host.all []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Writing body []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Response received []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Reponse (after 0.12431311607361s): 301 []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Data received []
2015-02-07 22:15:40 DEBUG [Http][54d6807cc14d6]Request done []
2015-02-07 22:15:40 DEBUG [Url][54d6807cc0df0]Download complete (after 0.12470602989197s): 0 in length length []
2015-02-07 22:15:41 DEBUG [email protected] PRIVMSG moonbasetest :[ http://hashworks.net/ ] []
2015-02-07 22:15:41 DEBUG [email protected] :[email protected] PRIVMSG moonbasetest :[ http://hashworks.net/ ] []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Found url: http://hashworks.net/ []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Emitting: http.request []
2015-02-07 22:15:41 DEBUG [Http]Existing HttpClient found, using it []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Sending request []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Emitting: url.host.all []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Writing body []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Response received []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Reponse (after 0.11247301101685s): 301 []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Data received []
2015-02-07 22:15:41 DEBUG [Http][54d6807d4800a]Request done []
2015-02-07 22:15:41 DEBUG [Url][54d6807d47e58]Download complete (after 0.11286997795105s): 0 in length length []
<REPEAT>

This happens because the bot responds to its own message, which contains the URL prefixed with the http protocol, over and over again.

(Originally opened by @hashworks on a different repo, moving issue to this repo)

Feature Request: Ignore some URLs

When third-party bots co-exist in the same channel as Phergie, the URL plugin can contribute to the noise-to-signal ratio.

travis-ci
12:58 phergie/plugin-dns#14 (version-2-updates - 672b7b9 : Joe Ferguson): The build passed.
12:58 Change view : https://github.com/phergie/plugin-dns/compare/a8c7322d1833...672b7b9d0af8
12:58 Build details : https://travis-ci.org/phergie/plugin-dns/builds/99308672
12:58 travis-ci left the room.
Phergie
12:58 [http://gsc.io/u/11] Travis CI - Test and Deploy Your Code with Confidence
12:58 [http://gsc.io/u/10] Comparing a8c7322d1833...672b7b9d0af8 · phergie/plugin-dns · GitHub

It would be nice if it was possible to ignore some URLs based on specific components (e.g. hostname). This could be done by implementing a FilterInterface with a single method that accepts a URL and returns a boolean value indicating whether that URL should be handled or not. To maintain BC, a default that returns true for all URLs could be implemented. This would allow for maximum flexibility and customization by the end-user.
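
The proposal above could be sketched roughly as follows. This illustrates the interface suggested in this issue, not the filter mechanism the plugin ships; the names UrlFilterInterface, AllowAllFilter, IgnoreHostsFilter, and shouldHandle are all hypothetical.

```php
<?php
// Hypothetical interface: one method that takes a URL and returns a boolean.
interface UrlFilterInterface
{
    public function shouldHandle(string $url): bool;
}

// Default that handles every URL, preserving backwards compatibility.
final class AllowAllFilter implements UrlFilterInterface
{
    public function shouldHandle(string $url): bool
    {
        return true;
    }
}

// Example filter that ignores URLs whose hostname is on a deny list.
final class IgnoreHostsFilter implements UrlFilterInterface
{
    private $hosts;

    public function __construct(array $hosts)
    {
        $this->hosts = array_map('strtolower', $hosts);
    }

    public function shouldHandle(string $url): bool
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            return true; // no hostname to compare, so do not block the URL
        }

        return !in_array(strtolower($host), $this->hosts, true);
    }
}

$filter = new IgnoreHostsFilter(array('travis-ci.org'));
var_dump($filter->shouldHandle('https://travis-ci.org/phergie/plugin-dns/builds/99308672')); // bool(false)
var_dump($filter->shouldHandle('https://github.com/phergie/plugin-dns'));                    // bool(true)
```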

301 and 302 responses not being handled

Looks like 301 Moved Permanently and 302 Found responses are not being handled.

From #phpc on Freenode:

3:48 terratoma: http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriages
3:48 Phergie: [http://gsc.io/u/57]

Note the lack of a title in the output.

When I hit the URL as it's given above, I get this:

04:09:24 ~ $ curl -v "http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage"
* Adding handle: conn: 0x7f91b380aa00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f91b380aa00) send_pipe: 1, recv_pipe: 0
* About to connect() to www.rawstory.com port 80 (#0)
*   Trying 104.239.182.155...
* Connected to www.rawstory.com (104.239.182.155) port 80 (#0)
> GET /2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage HTTP/1.1
> User-Agent: curl/7.30.0
> Host: www.rawstory.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
* Server nginx is not blacklisted
< Server: nginx
< Content-Type: text/html
< Date: Fri, 01 May 2015 21:10:42 GMT
< Keep-Alive: timeout=20
< Location: http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriage/
< X-Type: default
< Connection: keep-alive
< Set-Cookie: X-Mapping-fjhppofk=8D623D1BB3EE0A25628811CAA06CFFB8; path=/
< Content-Length: 178
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host www.rawstory.com left intact

But if I append a /, to make the URL consistent with what's shown in the Location response header above, I get a 200 OK response and the body contains a title:

04:10:43 ~ $ curl -v "http://www.rawstory.com/2015/04/texas-gop-lawmaker-what-is-going-on-in-baltimore-is-because-of-too-many-gay-marriages/" 2>&1 | grep '<title>'
<title>  Texas GOP lawmaker: &#8216;What is going on in Baltimore&#8217; is because of too many gay marriages</title>

(Originally opened by @elazar on a different repo, moving issue to this repo)

Some entities aren't decoded in titles from HTML responses

elazar 10:22 Speaking of, if anyone wants to take on some low-hanging fruit: https://github.com/phergie/phergie-irc-plugin-react-twitter/issues/7
Phergie 10:22 [http://gsc.io/u/35] Entities aren&#39;t decoded · Issue #7 · phergie/phergie-irc-plugin-react-twitter · GitHub

Seems like they should be, but as the output above indicates, they aren't. :(

Related: phergie/phergie-irc-plugin-react-twitter#7
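
One possible cause, offered as an assumption rather than a confirmed diagnosis of this plugin: with the ENT_COMPAT flag (the default for html_entity_decode before PHP 8.1), single-quote entities such as &#39; are left untouched. A minimal sketch of the difference:

```php
<?php
$title = 'Entities aren&#39;t decoded';

// ENT_COMPAT decodes double-quote entities but leaves single-quote entities alone.
$compat = html_entity_decode($title, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES decodes both kinds; ENT_HTML5 selects the HTML5 entity table.
$quotes = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $compat, PHP_EOL; // Entities aren&#39;t decoded
echo $quotes, PHP_EOL; // Entities aren't decoded
```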

(Originally opened by @elazar on a different repo, moving issue to this repo)

Try HEAD vs GET requests first

GET requests can fetch large resources that consume lots of bandwidth unnecessarily.

Example:

2015-06-12 15:05:56 DEBUG [Url][557b3ba416813]Found url: http://upload.wikimedia.org/wikipedia/commons/4/4f/Funny_Car_AAA.JPG []
...snip...
2015-06-12 15:05:56 DEBUG [Url][557b3ba416813]Download complete (after 0.18480515480042s):

Implement a strategy whereby HEAD requests are tried first, so that the resource body isn't downloaded since a lot of desired information is generally in the response headers anyway.

There are instances where this is not the case. For example, if chunked transfer encoding is used, the size of the resource won't be available. While most responses should include a content type header, it's possible they may not. Finally, some resources won't support HEAD (in which case a 405 response should be returned). So, use GET as a fallback in cases where desired information isn't available in HEAD responses.
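
The fallback rule described above can be sketched as a small decision function. The helper name needsGetFallback and the lower-cased header keys are assumptions made for illustration; no HTTP client is involved.

```php
<?php
/**
 * Decide whether a HEAD response is sufficient or a GET is still needed.
 * Hypothetical helper; $headers is expected to use lower-case header names.
 */
function needsGetFallback(int $status, array $headers): bool
{
    if ($status === 405) {
        return true; // the resource does not support HEAD
    }
    if (!isset($headers['content-type'])) {
        return true; // content type missing from the HEAD response
    }
    if (!isset($headers['content-length'])) {
        return true; // size unknown, e.g. with chunked transfer encoding
    }

    return false; // headers carry the desired information
}

var_dump(needsGetFallback(200, array(
    'content-type'   => 'image/jpeg',
    'content-length' => '123456',
))); // bool(false)
var_dump(needsGetFallback(405, array())); // bool(true)
```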

(Originally opened by @elazar on a different repo, moving issue to this repo)

Parser handles XXX scene dirnames as URLs

18:22:35 <%someone> GirlsDoPorn.E157.21.Years.Old.XXX.720p.WMV-KTR
18:22:35 <+Phergie> [ http://GirlsDoPorn.E157.21.Years.Old.XXX/ ]  

Yes, I'm in weird channels. My guess is that Twitter_Extractor looks for the .xxx TLD, not sure how to change this without altering Twitter_Extractor.

(Originally opened by @hashworks on a different repo, moving issue to this repo)

GuzzleHttp\Exception\RequestException in resolveCallback throwing fatal error

From the PHP log:

PHP Catchable fatal error:  Argument 1 passed to Phergie\Irc\Plugin\React\Url\Plugin::Phergie\Irc\Plugin\React\Url\{closure}() must be an instance of GuzzleHttp\Message\Response, instance of GuzzleHttp\Exception\RequestException given, called in /home/phergie/phergie-freenode/vendor/phergie/phergie-irc-plugin-http/src/Request.php on line 98 and defined in /home/phergie/phergie-freenode/vendor/phergie/phergie-irc-plugin-react-url/src/Plugin.php on line 180

Both plugins (http and url) are at their current tagged versions, running on version 2 of the bot.

Url that caused the issue: http://www.senzati.com/jet-sprinter/
