
roboxt's People

Contributors

barryvdh, benja-m-1, jshedde, omansour


roboxt's Issues

Unit test integration

To make class behavior reliable, we should consider adding unit tests.

User-agent names case-insensitive match

According to the draft at http://www.robotstxt.org/norobots-rfc.txt, User-agent names are case-insensitive.
Currently, a crawler with the User-agent 'Crawler' does not match a record like:

User-agent: crawleR
Disallow: /no

Also, if no matching record is found (i.e. there is no '*' record either), the following error occurs when you call isUrlAllowedByUserAgent():
Error: Call to a member function isUrlAllowed() on null in /path/to/vendor/m6web/roboxt/src/Roboxt/File.php:90
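
A minimal sketch of a case-insensitive name comparison, assuming a plain equality check is enough. The function name userAgentMatches() is hypothetical and is not part of roboxt:

```php
<?php
// Hypothetical helper, not part of roboxt: compares a record's User-agent
// value with the crawler's name without regard to letter case.
function userAgentMatches(string $recordName, string $crawlerName): bool
{
    // strcasecmp() returns 0 when both strings are equal ignoring ASCII case.
    return strcasecmp(trim($recordName), trim($crawlerName)) === 0;
}
```

With this check, userAgentMatches('crawleR', 'Crawler') returns true, so the record above would apply to 'Crawler'.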

Fallback on user-agent "*"

If there is no matching directive for the given user agent, it should fall back on "*" and run the same test.
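
One possible shape for that fallback, as a sketch only. It assumes records are kept in an array keyed by lower-cased user-agent name, each holding a list of disallowed path prefixes; findRecord() and isUrlAllowed() here are hypothetical stand-ins, not roboxt's actual methods. Treating "no record at all" as "allowed" is also an assumption:

```php
<?php
/**
 * Hypothetical lookup: returns the record for the user agent, else the "*"
 * record, else null. Assumes keys of $records are already lower-cased.
 */
function findRecord(array $records, string $userAgent): ?array
{
    return $records[strtolower($userAgent)] ?? $records['*'] ?? null;
}

/** Hypothetical check: true unless an applicable record disallows the path. */
function isUrlAllowed(array $records, string $path, string $userAgent): bool
{
    $disallowed = findRecord($records, $userAgent);
    if ($disallowed === null) {
        return true; // assumption: no applicable record means nothing is disallowed
    }
    foreach ($disallowed as $prefix) {
        // An empty Disallow value disallows nothing, so skip it.
        if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}
```

Guarding the null case explicitly is what avoids the "Call to a member function ... on null" style of error when neither a named record nor "*" exists.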

Dropping php-5.x versions

Since PHP 7.0 has been out for a while, we should consider dropping support for PHP 5.x versions.

Regex delimiter must be escaped using preg_quote

Input that is placed into a preg_match() pattern should be escaped with preg_quote($pattern, '@'), which also escapes the '@' delimiter.
Currently, a warning like one of the following occurs when a path contains regex metacharacters:

PHP Warning: preg_match(): Compilation failed: missing ) at offset 13 in /path/to/vendor/m6web/roboxt/src/Roboxt/Directive/Directive.php on line 84
PHP Warning: preg_match(): Compilation failed: unmatched parentheses at offset 2 in /path/to/vendor/m6web/roboxt/src/Roboxt/Directive/Directive.php on line 84
PHP Warning: preg_match(): Unknown modifier 'c' in /path/to/vendor/m6web/roboxt/src/Roboxt/Directive/Directive.php on line 84
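
A sketch of how the escaping could look, assuming '@' is the pattern delimiter as suggested above. pathToRegex() and pathMatches() are hypothetical helpers, not the code in Directive.php, and treating '*' as a wildcard is an assumption made for illustration:

```php
<?php
// Hypothetical helper: turns a robots.txt path value into a regex that
// matches URL paths starting with that value.
function pathToRegex(string $path): string
{
    // Escape all regex metacharacters plus the '@' delimiter.
    $quoted = preg_quote($path, '@');
    // Assumption: '*' acts as a wildcard. preg_quote() turned it into '\*',
    // so swap that escaped form back for '.*'.
    $quoted = str_replace('\*', '.*', $quoted);
    return '@^' . $quoted . '@';
}

// Hypothetical helper: true when $urlPath is covered by the path value.
function pathMatches(string $path, string $urlPath): bool
{
    return preg_match(pathToRegex($path), $urlPath) === 1;
}
```

Paths such as '/foo(bar' or '/a@c' then compile without warnings, because the parenthesis and the delimiter are escaped before the pattern is built.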

Parser parse from content string

From the example:

$parser = new \Roboxt\Parser();
$file = $parser->parse("http://www.google.com/robots.txt");

Perhaps it would be more logical for the parse() method to accept the robots.txt content as a string, with a new load() method added to load a URL directly. Currently, you can't really use this library when you already have the content, or when you want to use a different HTTP client (instead of file_get_contents()).

Or, if you don't want to break BC, you could add a new method instead. But the name parse() doesn't imply that it fetches the URL.
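
To illustrate the proposed split, here is a rough sketch under stated assumptions. RobotsTxtReader is a hypothetical class, not roboxt's Parser; its parse() returns simple [field, value] pairs rather than roboxt's File object, and the injectable fetcher is one possible way to allow a different HTTP client:

```php
<?php
// Hypothetical sketch, not roboxt's API: parsing works on a string,
// fetching is a separate step.
final class RobotsTxtReader
{
    /** @var callable(string): string */
    private $fetcher;

    public function __construct(?callable $fetcher = null)
    {
        // Default to file_get_contents(); callers may inject any HTTP client.
        $this->fetcher = $fetcher ?? static function (string $url): string {
            $content = file_get_contents($url);
            if ($content === false) {
                throw new RuntimeException("Could not fetch $url");
            }
            return $content;
        };
    }

    /** Parses robots.txt content into [field, value] pairs; performs no I/O. */
    public function parse(string $content): array
    {
        $pairs = [];
        foreach (preg_split('/\R/', $content) as $line) {
            $line = trim(preg_replace('/#.*$/', '', $line)); // drop comments
            if ($line === '' || strpos($line, ':') === false) {
                continue; // skip blank or malformed lines
            }
            [$field, $value] = explode(':', $line, 2);
            $pairs[] = [strtolower(trim($field)), trim($value)];
        }
        return $pairs;
    }

    /** Fetches the URL with the configured fetcher, then delegates to parse(). */
    public function load(string $url): array
    {
        return $this->parse(($this->fetcher)($url));
    }
}
```

With this shape, content that is already in memory goes straight to parse(), and only load() touches the network.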
