readability.php's Introduction

readability.php's People

Contributors

andreskrey, castrocrea, davidfricker, fivefilters, ninoskopac, topotru

readability.php's Issues

->textContent is not available

Hello.

->getContent()->textContent is not available, it's available in readability.js:
textContent: text content of the article, with all the HTML tags removed;

Is it possible to add this function? Thank you.
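
Until such a property exists, the plain text can be approximated from the HTML string that getContent() returns. The following is a minimal sketch, assuming the content is a reasonably well-formed HTML fragment. The helper name htmlToText is hypothetical and not part of readability.php, and the result may differ in whitespace from what readability.js produces.

```php
<?php
// Hypothetical helper, not part of readability.php.
function htmlToText(string $html): string
{
    // Remove the tags, then turn entities such as &amp; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse the whitespace runs left behind by the removed markup.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}

// Stand-in for the string returned by $readability->getContent().
$content = "<div>\n<h1>Title</h1>\n<p>Fish &amp; chips</p>\n</div>";

echo htmlToText($content), PHP_EOL; // prints "Title Fish & chips"
```

In real use, $content would be replaced by the return value of getContent() after a successful parse().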

Exception: The scheme `href='https` is invalid.

I'm getting a lot of errors related to the URI lib, such as:
Exception: The scheme `href='https` is invalid. /home/[...REMOVED...]/public_html/application/third_party/Readability.php/v3.1.2/vendor/league/uri/src/Uri.php 249

Attempt to read property "length" on null in src/Readability.php on line 1514

Related to: #12
Hey @fivefilters, we are getting this error instead on PHP 8.1:
Attempt to read property "length" on null in /PATH/vendor/fivefilters/readability.php/src/Readability.php on line 1514

It's probably better to make the fix something like this:
vendor/fivefilters/readability.php/src/Readability.php:1506 : $siblings = $parentOfTopCandidate->childNodes ?? [];

Thanks for your attention to this!
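
The behaviour of the proposed guard can be checked in isolation. This is only a sketch: siblingCount is a hypothetical function, not code from Readability.php, and it uses count() instead of ->length so that the fallback array and a real node list are handled the same way (reading ->length from the [] fallback would itself raise a warning).

```php
<?php
// Hypothetical stand-in for the code around lines 1506-1514; not the library's code.
function siblingCount(?object $parentOfTopCandidate): int
{
    // With ??, reading childNodes from a null parent yields the fallback
    // instead of an "Attempt to read property on null" warning.
    $siblings = $parentOfTopCandidate->childNodes ?? [];

    // count() works for arrays and for Countable objects such as DOMNodeList.
    return count($siblings);
}

$parent = new stdClass();
$parent->childNodes = ['a', 'b'];

echo siblingCount(null), PHP_EOL;    // prints 0
echo siblingCount($parent), PHP_EOL; // prints 2
```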

Notice: Trying to get property '' of non-object

Hey there,

I am facing the following notices about getting properties of non-objects when using this library for articles from this feed:

Trying to get property 'length' of non-object in /PATH/vendor/fivefilters/readability.php/src/Readability.php on line 1514
Trying to get property 'textContent' of non-object in /PATH/vendor/fivefilters/readability.php/src/Readability.php on line 207
Trying to get property 'childNodes' of non-object in /PATH/vendor/fivefilters/readability.php/src/Readability.php on line 1506

minify html removes <html> attributes

When I enabled HTML minification, it removed the attributes from the <html> tag.

<html lang="en" class="dark-mode"> -> minify -> <html>

Versions:

"laravel/framework": "^10.10",
"fivefilters/readability.php": "3.1.6",

Issue with relative URLs that contain special characters

I am having problems with relative URLs from parsed content. The getAbsoluteURI() method seems to choke on relative URLs returned from the parsed document (for me in postProcessContent()'s array walk) if they contain, for example, whitespace. Instead of being escaped, the URL seems to be cut at the whitespace, which produces incorrect URIs.

Example:

The URL parsed from the document is https://media.sleep-hero.de/MDE/uploads/product/Tauro Matratzenbezug.jpg?p=n&vh=840500&width=390&height=360&func=bound 2x, and is passed to getAbsoluteURI. It should likely be https://media.sleep-hero.de/MDE/uploads/product/Tauro%20Matratzenbezug.jpg?p=n&vh=840500&width=390&height=360&func=bound%202x to correctly encode whitespaces.

(using Readability via Nextcloud bookmarks, see nextcloud/bookmarks#1965)
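
As a workaround on the caller's side, the spaces can be percent-encoded before the URL is resolved. A minimal sketch, assuming only literal spaces need escaping; encodeSpaces is a hypothetical helper, not a readability.php method. Note that a trailing " 2x" may be a srcset density descriptor rather than part of the URL, in which case it would need to be split off first.

```php
<?php
// Hypothetical helper, not part of readability.php.
function encodeSpaces(string $url): string
{
    // Replace each literal space with its percent-encoded form.
    return str_replace(' ', '%20', $url);
}

// The URL quoted in the report above.
$raw = 'https://media.sleep-hero.de/MDE/uploads/product/Tauro Matratzenbezug.jpg'
     . '?p=n&vh=840500&width=390&height=360&func=bound 2x';

echo encodeSpaces($raw), PHP_EOL;
```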

Error when using the mentioned function to get a website readability score

inputUrl, FILTER_VALIDATE_URL) == true) {
            $readability = new Readability(new Configuration());
            $html = file_get_contents($request->inputUrl);
            try {
                $readability->parse($html);
                $readObj = new ReadabilityAlgorithm();
                $titleOfThePage = $readability->getTitle();
                $excerpt = $readability->getExcerpt();
                $totalReadabiltyScore = $readObj->calculateReadabilityScore($readability->getContent());
                $wordsCount = str_word_count($readability->getContent());
                $readabiltyScore = (int)$totalReadabiltyScore;
                if($readabiltyScore > 100){
                    $readabiltyScore = 95;
                    return view('readabilityScore.index',compact('titleOfThePage','excerpt','readabiltyScore','wordsCount'));
                }else{
                    return view('readabilityScore.index',compact('titleOfThePage','excerpt','readabiltyScore','wordsCount'));
                }
            } catch (ParseException $e) {
                return redirect()->back()->with('danger', 'Sorry! unable to process the url try again');
            }
        }else{
            return redirect()->back()->with('danger', 'Incorrect Url');
        }
    }
}

Deprecated: Return type of andreskrey\Readability\Nodes\NodeTrait::getAttribute($attributeName) should either be compatible with DOMElement::getAttribute(string $qualifiedName)

`Deprecated: Return type of andreskrey\Readability\Nodes\NodeTrait::getAttribute($attributeName) should either be compatible with DOMElement::getAttribute(string $qualifiedName): string, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/users/dokukanal/www/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php on line 172

Deprecated: Return type of andreskrey\Readability\Nodes\NodeTrait::hasAttribute($attributeName) should either be compatible with DOMElement::hasAttribute(string $qualifiedName): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /home/users/dokukanal/www/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php on line 190`
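
The notice itself names two remedies: declare a compatible return type, or add the #[\ReturnTypeWillChange] attribute. Both are sketched below on a stand-alone subclass. PatchedElement is an illustrative class, not the library's NodeTrait, and the example assumes the dom extension is loaded.

```php
<?php
// Illustrative subclass, not the library's NodeTrait. Requires ext-dom.
class PatchedElement extends DOMElement
{
    // Remedy 1: declare the return type named in the deprecation message.
    public function getAttribute($qualifiedName): string
    {
        return parent::getAttribute($qualifiedName);
    }

    // Remedy 2: keep the untyped signature and suppress the notice.
    // On PHP 7 the attribute line is parsed as a plain comment.
    #[\ReturnTypeWillChange]
    public function hasAttribute($qualifiedName)
    {
        return parent::hasAttribute($qualifiedName);
    }
}

$doc = new DOMDocument();
// Make the document create PatchedElement objects instead of DOMElement.
$doc->registerNodeClass(DOMElement::class, PatchedElement::class);
$doc->loadHTML('<p id="intro">Hello</p>');

$p = $doc->getElementsByTagName('p')->item(0);
var_dump($p->getAttribute('id'));    // string(5) "intro"
var_dump($p->hasAttribute('class')); // bool(false)
```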

Namespace includes are for fivefilters in the Readme.md, but the project uses the andreskrey namespace

The namespace includes in the Readme.md reference fivefilters:

use fivefilters\Readability\Readability;
use fivefilters\Readability\Configuration;
use fivefilters\Readability\ParseException;

But the project itself still uses the (old) andreskrey namespaces throughout. E.g.:

use andreskrey\Readability\Nodes\DOM\DOMDocument;
use andreskrey\Readability\Nodes\DOM\DOMElement;
use andreskrey\Readability\Nodes\DOM\DOMNode;

Either the project should be updated to reflect the fivefilters fork namespaces, or the Readme updated to use the original namespaces.
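
Until the two agree, calling code can check which namespace the installed release actually provides. A minimal sketch, assuming Composer's autoloader has already been loaded; resolveReadabilityClass is a hypothetical helper, and the two class names are the ones quoted in this thread.

```php
<?php
// Assumes vendor/autoload.php has been required beforehand.
// Hypothetical helper, not part of readability.php.
function resolveReadabilityClass(): ?string
{
    $candidates = [
        'fivefilters\\Readability\\Readability', // namespace used in the Readme
        'andreskrey\\Readability\\Readability',  // namespace used by older releases
    ];

    foreach ($candidates as $class) {
        // class_exists() triggers autoloading by default.
        if (class_exists($class)) {
            return $class;
        }
    }

    // Neither namespace is installed.
    return null;
}

$class = resolveReadabilityClass();
echo $class ?? 'Readability class not found', PHP_EOL;
```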

Deprecated error

I am getting the following error:

Deprecated: Return type of fivefilters\Readability\Nodes\NodeTrait::getAttribute($attributeName) should either be compatible with DOMElement::getAttribute(string $qualifiedName): string, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /var/www/html/xxx/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php on line 170

I am using PHP 8. Please advise.

Composer error

If I use
composer require fivefilters/readability.php it downloads only version 1.
If I run
composer require fivefilters/readability.php:^3
I get this error:
Problem 1
- fivefilters/readability.php[v3.1.5, ..., v3.1.6] require league/uri ~6.7.2 -> satisfiable by league/uri[6.7.2].
- fivefilters/readability.php[v3.0.0, ..., v3.1.4] require psr/log ^1.0 -> found psr/log[1.0.0, ..., 1.1.4] but the package is fixed to 3.0.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
- league/uri 6.7.2 requires psr/http-message ^1.0 -> found psr/http-message[1.0, 1.0.1, 1.1] but the package is fixed to 2.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
- Root composer.json requires fivefilters/readability.php ^3 -> satisfiable by fivefilters/readability.php[v3.0.0, ..., v3.1.6].

PHP Deprecated: dirname(): Passing null to parameter #1 ($path) of type string is deprecated

Hello,

I get PHP deprecation notices with some URLs.

Here is my sample code:

$url = 'https://investor.cummins.com';
$html = file_get_contents($url);

$config = new fivefilters\Readability\Configuration();
$config->setFixRelativeURLs(true);
$config->setOriginalURL($url);

$readability = new fivefilters\Readability\Readability($config);
$readability->parse($html);

Current result:

PHP Deprecated: dirname(): Passing null to parameter #1 ($path) of type string is deprecated in vendor/fivefilters/readability.php/src/Readability.php on line 842
PHP Deprecated: dirname(): Passing null to parameter #1 ($path) of type string is deprecated in vendor/fivefilters/readability.php/src/Readability.php on line 842

I'm using PHP 8.2.7 with fivefilters/readability.php v3.1.6.

This is not a blocker, but I like clean logs ;)

Thanks!
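
One plausible cause, which the thread does not confirm, is that the URL has no path component: parse_url() returns null for the path in that case, and passing null on to dirname() triggers the notice on PHP 8.1 and later. A minimal sketch of a guard under that assumption; urlDirectory is a hypothetical helper, not the code at line 842.

```php
<?php
// Hypothetical helper; not the library's code at line 842.
function urlDirectory(string $url): string
{
    // parse_url() yields null for the path of a URL such as https://example.com,
    // so fall back to the root path before calling dirname().
    $path = parse_url($url, PHP_URL_PATH) ?? '/';

    return dirname($path);
}

var_dump(parse_url('https://investor.cummins.com', PHP_URL_PATH)); // NULL
echo urlDirectory('https://example.com/a/b.html'), PHP_EOL;        // prints "/a"
echo urlDirectory('https://investor.cummins.com'), PHP_EOL;        // no deprecation notice
```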

Classes not being found

I installed this library by running composer require fivefilters/readability.php:dev-master, and I am getting this error:

Fatal error: Uncaught Error: Class 'fivefilters\Readability\Readability' not found

I used the example code given here

composer require not pulling latest package

Package install in a clean directory

~/Sites/package-test $ composer require fivefilters/readability.php
Using version ^2.1 for fivefilters/readability.php
./composer.json has been updated
Running composer update fivefilters/readability.php
Loading composer repositories with package information
Updating dependencies
Lock file operations: 2 installs, 0 updates, 0 removals

  • Locking fivefilters/readability.php (v2.1.0)
  • Locking psr/log (1.1.4)
    Writing lock file
    Installing dependencies from lock file (including require-dev)
    Package operations: 2 installs, 0 updates, 0 removals
  • Installing psr/log (1.1.4): Extracting archive
  • Installing fivefilters/readability.php (v2.1.0): Extracting archive
    1 package suggestions were added by new dependencies, use composer suggest to see details.
    Generating autoload files

Opening the Readability file in vi shows that the namespaces are all incorrect.

namespace andreskrey\Readability;

use andreskrey\Readability\Nodes\DOM\DOMDocument;
use andreskrey\Readability\Nodes\DOM\DOMElement;
use andreskrey\Readability\Nodes\DOM\DOMNode;
use andreskrey\Readability\Nodes\DOM\DOMText;
use andreskrey\Readability\Nodes\NodeUtility;
use Psr\Log\LoggerInterface;

Just to be sure, I cleaned out the Composer cache, then uninstalled and reinstalled. Same issues.

From composer this is broken.

Could not parse text error

I am trying to use the library for the link and it is giving the error

Error processing text: Could not parse text.

Below are my local machine details:

PHP Version: 7.4.21
Readability Version "fivefilters/readability.php": "dev-master"

On my remote machine, the same URL gives an error.

The remote DigitalOcean machine has PHP 8.1.2 installed.

Deprecated: Return type of fivefilters\Readability\Nodes\NodeTrait::getAttribute($attributeName) should either be compatible with DOMElement::getAttribute(string $qualifiedName): string, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /var/www/html/xx/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php on line 170

Deprecated: Return type of fivefilters\Readability\Nodes\NodeTrait::hasAttribute($attributeName) should either be compatible with DOMElement::hasAttribute(string $qualifiedName): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /var/www/html/xx/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php on line 188

Can you please tell me what I should do?
