xray's Introduction

X-Ray your TYPO3 installation

This extension is a collection of utility commands that scan a TYPO3 installation for potential integrity improvements.

External links that could be internal links

The command

./bin/typo3 xray:external-links --dry-run

lists all external links that could be converted to internal links. This supports links to pages and files.

Without the --dry-run option, the migration is performed and the links are rewritten in the t3:// syntax.

Sharing our expertise

Find more TYPO3 extensions we have developed that help us deliver value in client projects. As part of the way we work, we focus on testing and best practices to ensure long-term performance, reliability, and results in all our code.

xray's People

Contributors

bmack, davidsteeb, ervaude, peterkraume, sypets

xray's Issues

Converting page links only evaluates subpages of root page

In a project where we tested the extension for the first time, the console command returned no results. We suspect this is because only the direct subpages of the root page are evaluated when checking for URLs.

The method B13\Xray\ExternalLinks\Converter\PageLinkConverter::getUrlCandidates() uses TYPO3\CMS\Core\Domain\Repository\PageRepository::getMenu() which does not return all subpages recursively.
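
A recursive traversal could look roughly like the following sketch. It is not the extension's code: collectSubpageUids() and the $fetchChildren callable are hypothetical names, and the callable merely stands in for a call that returns the direct children of a page only.

```php
<?php
declare(strict_types=1);

/**
 * Collects the uids of all descendants of $rootUid, depth-first.
 *
 * $fetchChildren is a hypothetical stand-in for a lookup that returns only
 * the direct child uids of a page (the behaviour described for getMenu()).
 */
function collectSubpageUids(int $rootUid, callable $fetchChildren, array &$seen = []): array
{
    $result = [];
    foreach ($fetchChildren($rootUid) as $childUid) {
        if (isset($seen[$childUid])) {
            continue; // guard against duplicates and cycles
        }
        $seen[$childUid] = true;
        $result[] = $childUid;
        // Descend into the child and append its descendants.
        foreach (collectSubpageUids($childUid, $fetchChildren, $seen) as $uid) {
            $result[] = $uid;
        }
    }
    return $result;
}

// Toy page tree (assumption, for illustration only): parent uid => child uids.
$tree = [1 => [2, 3], 2 => [4], 3 => [], 4 => [5], 5 => []];
$fetchChildren = static function (int $uid) use ($tree): array {
    return $tree[$uid] ?? [];
};

$allSubpages = collectSubpageUids(1, $fetchChildren);
```

With a non-recursive lookup, only pages 2 and 3 would be considered in this toy tree; the recursive variant also reaches pages 4 and 5.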

Check if translated pages are properly handled?

I checked if translated pages are handled. This does not look right:

| 31678 | tt_content | 123225 | bodytext | http://site-uol/en/students | t3://page?uid=15145   |

It should be: t3://page?uid=15145&_language=1

The URL http://site-uol/en/students resolves to a translated page (sys_language_uid=1).

base: 'https://example.org'
baseVariants:
  -
    base: '%env(URL)%'
    condition: 'applicationContext == "Development/%env(SHORTCUT)%"'
languages:
  -
    title: Deutsch
    enabled: true
    languageId: 0
    base: /
    typo3Language: de
    locale: de_DE.UTF-8
    iso-639-1: de
    navigationTitle: Deutsch
    hreflang: de-DE
    direction: ltr
    flag: de
  -
    title: English
    enabled: true
    languageId: 1
    base: /en/
    typo3Language: default
    locale: en_US.UTF8
    iso-639-1: en
    navigationTitle: English
    hreflang: en-US
    direction: ltr
    fallbackType: strict
    fallbacks: '0'
    flag: gb
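
As a minimal sketch of the expected result, the language could be appended to the generated link whenever the URL resolves to a non-default language. buildPageLink() is a hypothetical helper, not part of the extension; only the link format is taken from the expectation stated above.

```php
<?php
declare(strict_types=1);

/**
 * Builds a t3://page link and appends the _language parameter for
 * translated pages (hypothetical helper, for illustration only).
 */
function buildPageLink(int $pageUid, int $languageId = 0): string
{
    $link = 't3://page?uid=' . $pageUid;
    if ($languageId > 0) {
        // Only non-default languages need the explicit parameter.
        $link .= '&_language=' . $languageId;
    }
    return $link;
}

$translated = buildPageLink(15145, 1);
$default = buildPageLink(15145);
```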

dry-run will abort with exception if file does not exist

There are a number of files in our site whose location was changed (with a redirect installed). Of course I don't expect the extension to be able to convert these, but it should handle them robustly and not abort with an error.


php vendor/bin/typo3 xray:external-links --dry-run -v

In LocalDriver.php line 270:
                                                                                                           
  [InvalidArgumentException (1314516809)]                                                                  
  File /user_upload/musik/download/MA_Muwi/Modul_BM3_Musikwissenschaft_Erlaeuterungen.pdf does not exist.  
                                                                                                           

Exception trace:
  at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Resource/Driver/LocalDriver.php:270
 TYPO3\CMS\Core\Resource\Driver\LocalDriver->getFileInfoByIdentifier() at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Resource/ResourceStorage.php:1503
 TYPO3\CMS\Core\Resource\ResourceStorage->getFileInfoByIdentifier() at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Resource/Index/Indexer.php:318
 TYPO3\CMS\Core\Resource\Index\Indexer->gatherFileInformationArray() at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Resource/Index/Indexer.php:81
 TYPO3\CMS\Core\Resource\Index\Indexer->createIndexEntry() at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Resource/ResourceFactory.php:421
 TYPO3\CMS\Core\Resource\ResourceFactory->getFileObjectByStorageAndIdentifier() at /var/www/t3dev.uni-oldenburg.de/t3dev2.uol.de/htdocs/typo3conf/ext/xray/Classes/ExternalLinks/Converter/FileLinkConverter.php:72
 B13\Xray\ExternalLinks\Converter\FileLinkConverter->finMatchingFileId() at /var/www/t3dev.uni-oldenburg.de/t3dev2.uol.de/htdocs/typo3conf/ext/xray/Classes/ExternalLinks/Converter/FileLinkConverter.php:53
 B13\Xray\ExternalLinks\Converter\FileLinkConverter->convert() at /var/www/t3dev.uni-oldenburg.de/t3dev2.uol.de/htdocs/typo3conf/ext/xray/Classes/ExternalLinks/ExternalLinkCollection.php:78
 B13\Xray\ExternalLinks\ExternalLinkCollection->convertAll() at /var/www/t3dev.uni-oldenburg.de/t3dev2.uol.de/htdocs/typo3conf/ext/xray/Classes/Command/ExternalLinksCommand.php:51
 B13\Xray\Command\ExternalLinksCommand->execute() at /var/www/mysite/htdocs/vendor/symfony/console/Command/Command.php:255
 Symfony\Component\Console\Command\Command->run() at /var/www/mysite/htdocs/vendor/symfony/console/Application.php:971
 Symfony\Component\Console\Application->doRunCommand() at /var/www/mysite/htdocs/vendor/symfony/console/Application.php:290
 Symfony\Component\Console\Application->doRun() at /var/www/mysite/htdocs/vendor/symfony/console/Application.php:166
 Symfony\Component\Console\Application->run() at /var/www/mysite/htdocs/typo3/sysext/core/Classes/Console/CommandApplication.php:91
 TYPO3\CMS\Core\Console\CommandApplication->run() at /var/www/mysite/htdocs/typo3/sysext/core/bin/typo3:28
 {closure}() at /var/www/mysite/htdocs/typo3/sysext/core/bin/typo3:29
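
One possible way to make the conversion tolerant of missing files is to catch the exception and skip the link. The sketch below assumes a hypothetical helper findFileUidOrNull() and a $lookup callable that stands in for the storage lookup; only the exception class and code are taken from the trace above.

```php
<?php
declare(strict_types=1);

/**
 * Returns the file uid for $identifier, or null if the file cannot be resolved.
 * $lookup is a hypothetical stand-in for the real storage lookup.
 */
function findFileUidOrNull(string $identifier, callable $lookup): ?int
{
    try {
        return $lookup($identifier);
    } catch (\InvalidArgumentException $e) {
        // Missing file: skip this link instead of aborting the whole run.
        return null;
    }
}

// Toy lookup that mimics the exception shown in the trace.
$lookup = static function (string $identifier): int {
    $known = ['/user_upload/existing.pdf' => 42];
    if (!isset($known[$identifier])) {
        throw new \InvalidArgumentException('File ' . $identifier . ' does not exist.', 1314516809);
    }
    return $known[$identifier];
};

$found = findFileUidOrNull('/user_upload/existing.pdf', $lookup);
$missing = findFileUidOrNull('/user_upload/moved.pdf', $lookup);
```

A caller would then treat null as "cannot convert" and continue with the next link.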

Extension will result in syntax errors with PHP 7.3

protected string $baseUrl;

will result in a syntax error with PHP < 7.4.

php -l Classes/ExternalLinks/ExternalLink.php 
PHP Parse error:  syntax error, unexpected 'string' (T_STRING), expecting function (T_FUNCTION) or const (T_CONST) in Classes/ExternalLinks/ExternalLink.php on line 70

Errors parsing Classes/ExternalLinks/ExternalLink.php


If that is changed, the rest of the extension is PHP 7.3 compatible as far as I can tell.

It might be best to either change that or add a PHP version constraint (TYPO3 10 still supports PHP 7.3 and even 7.2, though that is EOL).
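
For the first option, a PHP 7.3 compatible declaration might look like this sketch: the property type moves into a docblock and is enforced through the constructor signature. The class name ExternalLinkSketch is hypothetical and the real class has more members.

```php
<?php
declare(strict_types=1);

// Hypothetical, reduced stand-in for the class in Classes/ExternalLinks/ExternalLink.php.
class ExternalLinkSketch
{
    /**
     * Typed properties need PHP 7.4, so the type is documented here instead.
     *
     * @var string
     */
    protected $baseUrl;

    public function __construct(string $baseUrl)
    {
        // The scalar type hint (available since PHP 7.0) still enforces a string.
        $this->baseUrl = $baseUrl;
    }

    public function getBaseUrl(): string
    {
        return $this->baseUrl;
    }
}

$link = new ExternalLinkSketch('https://example.org');
```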

Dry-run result contains rows with empty link columns

Generally, the functionality of converting links works. But I noticed another problem:

(The problem with file links to non-existent files still exists as well, see #2, but I have worked around it for now.)

bin/typo3 xray:external-links --dry-run

DRY RUN. This is what would happen: 

+-------+------------+--------+----------+--------------------------------------------+-----------------------+
| PID   | Table      | UID    | Field    | Found external Link                        | Would be converted to |
+-------+------------+--------+----------+--------------------------------------------+-----------------------+
| 13493 | tt_content | 147092 | bodytext |                                            |                       |
| 61656 | tt_content | 299647 | bodytext |                                            |                       |
| 59    | tt_content | 111    | bodytext |                                            |                       |
... 
| 44744 | tt_content | 178196 | bodytext | https://dev.mydomain.de/studium              | t3://page?uid=15145   |
| 44732 | tt_content | 178197 | bodytext | https://dev.mydomain.de/studium              | t3://page?uid=15145   |

etc.

  • many rows are displayed without URLs (in the last 2 columns): 27137
  • rows with URLs: 108

Where the URLs are displayed, the result is correct. (Also, if it is run without --dry-run, the links are converted correctly; at least I verified that in one CE 😄.)


  1. I looked at the first row (content of 147092) where the URLs are not displayed. It is a content element in English (sys_language_uid=1) with content:
<h4>Maps &amp; Directions</h4>

<p class="mit-icon map"><a href="https://dev.mydomain.de/en/contact/">Directions to the University</a></p>

<p class="mit-icon map"><a href="https://dev.mydomain.de/uni/lageplan.php?wo=A5">Site Map Building A5</a></p>

The first link could be converted to a page link (the URL works and page exists).

  2. The second row where URLs are not displayed is 299647, in the default language. I don't think the language is the problem, because most CEs are in the default language and most links will be as well.

The content is:

<ul>
	<li><strong><a href="https://dev.mydomain.de/adapt-lockin">Climate adaptation policy lock-ins: a 3x3 approach</a></strong></li>
</ul>

The page https://dev.mydomain.de/adapt-lockin exists.


To sum up: I don't know the reason yet. I would have to debug, but I can't do that on this site.


dev.mydomain.de and mydomain.de are fictional domains. I am working on a copy of the production site mydomain.de with a different dev domain. To test this, I rewrote the links in tt_content.bodytext:

MariaDB> update tt_content set bodytext=REPLACE(bodytext,'"https://mydomain.de/','"https://dev.mydomain.de/') where bodytext like '%https://mydomain.de%';
Query OK, 21913 rows affected (7.274 sec)
Rows matched: 23571  Changed: 21913  Warnings: 0

Because of #2 (problem with links to non-existent files), the file links are converted back to the original domain and will not be considered for now:

MariaDB> update tt_content set bodytext=REPLACE(bodytext,'"https://dev.mydomain.de/fileadmin/','"https://mydomain.de/fileadmin/') where bodytext like '%"https://dev.mydomain.de/fileadmin/%';
Query OK, 884 rows affected (7.094 sec)
Rows matched: 884  Changed: 884  Warnings: 0
MariaDB> update tt_content set bodytext=REPLACE(bodytext,'"https://dev.mydomain.de/f/','"https://mydomain.de/f/') where bodytext like '%"https://dev.mydomain.de/f/%';

Test report (preliminary)

I have found some minor problems (most of which are already fixed); otherwise the extension works nicely.

So far I have tested the following:

  • Site configuration is considered correctly
  • Links to files are detected and converted correctly (other configured storages besides fileadmin are considered as well)
  • Links to pages are detected and converted correctly (including links to translated pages; this already uses the _language parameter instead of the L parameter, which requires a new TYPO3 core patch until the next release, see #11 (comment))

...

If that is OK, I would like to keep this issue open and add more of my findings as I make progress.

It might also be nice to add tests. If that is OK, I can push a patch to start adding tests.

If configuring a site without trailing slash, duplicate results may be displayed

The following only happens if the site base is configured without a trailing slash. I am not sure whether the trailing slash is required; TYPO3 seems to work either way. In any case, it would be good if the extension handled this in a fault-tolerant way.

Example:

base: 'https://example.org'

=> Problem

base: 'https://example.org/'

=> no problem

Result

For the site without trailing slash, I get duplicate results with xray:

+-------+------------+--------+----------+------------------------------------------------------------------------------------+-----------------------+
| PID   | Table      | UID    | Field    | Found external Link                                                                | Would be converted to |
+-------+------------+--------+----------+------------------------------------------------------------------------------------+-----------------------+
| 85803 | tt_content | 407936 | bodytext | http://site-uol/fileadmin/user_upload/aktuelles/medizin/medizinische-forschung.jpg | t3://file?uid=263545  |
| 85803 | tt_content | 407936 | bodytext | http://site-uol/fileadmin/user_upload/aktuelles/medizin/medizinische-forschung.jpg | t3://file?uid=263545  |
+-------+------------+--------+----------+------------------------------------------------------------------------------------+-----------------------+

This

$this->addExternalLinksWithBaseUrlToCollection(

looks like it might have to be inside the if.

When debugging, I see that the same link is added twice to $this->collection->links.

The siteBase and languageBase are not always the same; specifically, they differ in the trailing slash:

languageBase=http://site-uol/
siteBase=http://site-uol

This might be an error in my configuration, but the resolving of URLs generally works, so the trailing slash should probably be handled in a fault-tolerant way here as well.

I am currently on a development system with different domain.

config

.env

URL="http://site-uol"

config.yaml

base: 'https://uol.de'
baseVariants:
  -
    base: '%env(URL)%'
    condition: 'applicationContext == "Development/%env(SHORTCUT)%"'

site resolves to http://site-uol
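
One way to make the comparison tolerant of the trailing slash is to normalize all base URLs before they are used and to drop duplicates. This is only a sketch under that assumption; uniqueBaseUrls() is a hypothetical helper and not part of the extension.

```php
<?php
declare(strict_types=1);

/**
 * Strips trailing slashes from all base URLs and removes duplicates,
 * so that each base is only searched for once (hypothetical helper).
 */
function uniqueBaseUrls(array $baseUrls): array
{
    $normalized = array_map(
        static function (string $url): string {
            return rtrim($url, '/');
        },
        $baseUrls
    );
    // array_unique() keeps the first occurrence; array_values() reindexes.
    return array_values(array_unique($normalized));
}

// languageBase and siteBase from the debugging output above.
$bases = uniqueBaseUrls(['http://site-uol/', 'http://site-uol']);
```

With the two values from the debugging output, only one base remains, so the same link would be collected once.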

Extraction of links is too greedy

While writing tests for ExternalLink, I found some cases where wrong results are generated.

E.g. content in bodytext:

  • 'Content https://example.com/abc, hello' => 'https://example.com/abc,' (the trailing comma is included)
  • 'Content <a href="https://no.match.example.com/path">https://example.com/abc</a>,' => 'https://example.com/abc</a>,' (should not match at all, and the trailing </a>, is included)

Also, xray does not use the softref parsers but applies a regular expression directly, which makes it inflexible.

I had similar problems in Linkvalidator (at least the issue with the comma). For my site I would probably want to do away with handling plain URIs in bodytext as links entirely. I find this messy. I am thinking about changing the softref parser list to remove url from it:

softref = rtehtmlarea_images,typolink_tag,email[subst],url

Maybe we can think about doing that in the core as well, but that is outside the scope of this patch.

Another idea is to have more low-level public functions in typo3/cms-core and reuse them. This way, problems can be fixed in the core (and the fix will work for linkvalidator, xray and other use cases).

code

public function prepareMatchedLinks(): void
{
    $matches = [];
    preg_match('#' . $this->baseUrl . '/?[^" ]*#', $this->fieldContent, $matches);
    $this->matches = $matches;
}
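
A less greedy variant might look like the following sketch. It is an assumption about one possible fix, not the extension's code: extractLinks() is a hypothetical function that escapes the base URL, stops at whitespace, quotes and angle brackets, and trims trailing punctuation. It still matches a URL that appears as link text, so the second case above is only partly addressed.

```php
<?php
declare(strict_types=1);

/**
 * Extracts all URLs starting with $baseUrl from $content (hypothetical helper).
 */
function extractLinks(string $baseUrl, string $content): array
{
    // preg_quote() escapes regex metacharacters such as "." in the base URL.
    // The path stops at whitespace, quotes and angle brackets.
    $pattern = '#' . preg_quote($baseUrl, '#') . '(?:/[^\s"\'<>]*)?#';
    preg_match_all($pattern, $content, $matches);

    $links = [];
    foreach ($matches[0] as $match) {
        // Punctuation at the end usually belongs to the sentence, not the URL.
        $links[] = rtrim($match, '.,;:!?)');
    }
    return array_values(array_unique($links));
}

$plain = extractLinks('https://example.com', 'Content https://example.com/abc, hello');
$tagged = extractLinks(
    'https://example.com',
    'Content <a href="https://no.match.example.com/path">https://example.com/abc</a>,'
);
```

Unlike preg_match(), preg_match_all() also returns every occurrence rather than only the first one.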

Do comparison with mediafile_ext (file extension) case insensitive

protected function canConvert(ExternalLink $link): bool
{
    return in_array(
        $link->getExtension(),
        explode(',', $GLOBALS['TYPO3_CONF_VARS']['SYS']['mediafile_ext'] ?? '')
    );
}

In the core, the comparison is done case-insensitive:

    public function isMediaFile($ext)
    {
        return GeneralUtility::inList(strtolower($GLOBALS['TYPO3_CONF_VARS']['SYS']['mediafile_ext']), strtolower($ext));
    }

Also, maybe the core function could be used.
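
If the core function is not used, a case-insensitive check could be sketched as follows. isAllowedExtension() is a hypothetical stand-alone function; it takes the comma-separated list as a parameter instead of reading $GLOBALS so that it can run outside TYPO3.

```php
<?php
declare(strict_types=1);

/**
 * Checks whether $extension is contained in the comma-separated
 * $allowedList, ignoring case (hypothetical helper).
 */
function isAllowedExtension(string $extension, string $allowedList): bool
{
    if ($extension === '') {
        return false; // an empty extension never matches, even for an empty list
    }
    // Lowercase both sides so that "JPG" matches "jpg".
    $allowed = array_map('trim', explode(',', strtolower($allowedList)));
    return in_array(strtolower($extension), $allowed, true);
}

// Example list (assumption, not the actual TYPO3 default).
$list = 'gif,jpg,jpeg,PDF';
$upper = isAllowedExtension('JPG', $list);
$lower = isAllowedExtension('pdf', $list);
$other = isAllowedExtension('exe', $list);
```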
