
purple-a11y's Introduction

Purple A11y

Purple A11y is a customisable, automated accessibility testing tool that allows software development teams to assess whether their products are user-friendly to persons with disabilities (PWDs).

This is the engine and command-line interface (CLI) for Purple A11y. For a desktop graphical user interface (GUI), check out Purple A11y Desktop. The official application can only be downloaded at https://go.gov.sg/purple-a11y-cicd. We recommend that you download the software only from the official link, as other sources and/or third party links may pose risks and/or compromise your system.

Technology Stack

  1. Crawlee
  2. Axe-core
  3. Node.js
  4. Playwright
  5. Pixelmatch
  6. Corretto
  7. VeraPDF

Using Purple A11y as a NodeJS module

If you wish to use Purple A11y as a NodeJS module that can be integrated with end-to-end testing frameworks, refer to the integration guide on how you can do so.

Prerequisites and Installations

Portable Purple A11y

Portable Purple A11y is the recommended way to run Purple A11y, as it reduces the difficulty of installation. Refer to the Installation Guide for step-by-step instructions.

Manual Installation

Please ensure the following requirements are met:

  • Node.js version 15.10.0 or above.
  • To check your version of Node.js, open a terminal and paste the command below:
node -v

Usage of Node Version Manager (NVM)

# If you have not installed a version >= v15, install a Node.js version with NVM
nvm install <nodejs_version_greater_than_15>

# For subsequent use, you will need to run the command below each time you open a new terminal
nvm use <nodejs_version_greater_than_15>
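For example, assuming you choose Node.js 18 (any version above 15.10.0 works):

nvm install 18
nvm use 18
node -v   # should print v18.x.x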

Facing issues?

Please refer to the Troubleshooting section for more information.


Features

Purple A11y performs the following when scanning a target URL.

  • To run Purple A11y in the terminal, run node index. Questions will be prompted to assist you in providing the right inputs.
  • Results will be compiled in JSON format, followed by the generation of an HTML report.

NOTE: For your initial scan, there may be some loading time required before use. Purple A11y will also ask for your name and email address and collect your app usage data to personalise your experience. The collection and use of your information fully complies with GovTech’s Privacy Policy.

Delete/Edit Details

You may delete or edit your cached name and e-mail address by running the following command to delete userData.txt:

  • Windows (PowerShell): rm "$env:APPDATA\Purple A11y\userData.txt"
  • MacOS (Terminal): rm "$HOME/Library/Application Support/Purple A11y/userData.txt"

If userData.txt does not exist, just run node index.

Scan Selection

You can navigate the prompts with your arrow keys.

% node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                    │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!!                                        │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? (Use arrow keys)
❯ Sitemap
  Website
  Custom

Headless Mode

Headless mode allows you to run the scan in the background. If you would like to observe the scraping process, enter n.

 % node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                    │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!                                         │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? Sitemap
? Do you want purple-a11y to run in the background? (Y/n) No

Sitemap Scan

% node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                     │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!                                         │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? Sitemap
? Do you want purple-a11y to run in the background? No
? Which screen size would you like to scan? (Use arrow keys) Desktop
? Please enter URL or file path to sitemap, or drag and drop a sitemap file here:  https://www.sitemaps.org/sitemap.xml


 Scanning website...


 Fetching URLs. This might take some time...


Scanning website...

Purple A11y will then start scraping from the sitemap provided above and print the results to the console.

If the sitemap URL provided is invalid, an error message will prompt you to provide a valid input.

>> Invalid sitemap format. Please provide a URL with a valid sitemap.

Website Scan

% node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                    │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!                                         │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? Website
? Do you want purple-a11y to run in the background? Yes
? Which screen size would you like to scan? (Use arrow keys) Desktop
? Please enter URL of website:  https://www.domain.org

If the website URL provided is invalid, an error message will prompt you to provide a valid input.

>> Cannot resolve URL. Please provide a valid URL.

Customised Mobile Device Scan

% node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                   │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!                                         │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? Website
? Do you want purple-a11y to run in the background? No
? Which screen size would you like to scan? (Use arrow keys)
❯ Desktop 
  Mobile
  Custom

Choose Mobile for a default mobile screen size scan and Custom to choose a device or specify viewport width options.

Custom flow (Preview)

Custom flow allows you to record a series of actions in the browser and replay them; Purple A11y will trigger an accessibility scan at each step. This is useful for scanning websites that require user and form input. The recorded script will be stored as generatedScript*.js.

  1. Start by choosing the Custom flow in the menu selection.
% node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver      )                                   │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back User!                                         │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan?
  Sitemap
  Website
❯ Custom
  2. Specify the URL of the starting page you wish to scan.
  3. A Chrome and Playwright Inspector window will appear. Navigate through the pages on which you would like to conduct an accessibility scan.
  4. Close the Chrome window. Purple A11y will then proceed to re-run your recorded actions and scan each page for accessibility.

Other options:

  • You can specify sites to exclude from the accessibility scan (e.g. a login page) by adding a pattern of the domain to exclusions.txt; see the fuller example after this list. An example of exclusions.txt:
\.*login.singpass.gov.sg\.*
  • You can re-run your accessibility scan by running node on the generatedScript-PHScan_...js file that is generated.
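A fuller exclusions.txt might look like the sketch below; the patterns are illustrative placeholders and assume each line is treated as a regular expression matched against page URLs:

\.*login.singpass.gov.sg\.*
\.*example.com/admin\.*
\.*example.com/logout\.*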

Caution: During the custom flow, sensitive information such as usernames and passwords might be stored in generatedScript*.js as part of the recording.

Known Issues

If the custom flow fails to start, you might be running multiple versions of Playwright. Re-install Playwright:

  1. On Windows, delete the folder %USERPROFILE%\AppData\Local\ms-playwright where %USERPROFILE% is typically located at C:\Users\<username>.
  2. On MacOS, delete the folder ~/Library/Caches/ms-playwright where ~ refers to /Users/<username>.
  3. Within PowerShell (Windows) or Terminal (MacOS), run the following command to re-install Playwright:
npx [email protected] install

CLI Mode

CLI mode is designed to be run in a continuous integration (CI) environment. Run node cli.js to see the available command-line parameters.

Usage: node cli.js -c <crawler> -d <device> -w <viewport> -u <url> OPTIONS

Options:
      --help                         Show help                                       [boolean]
  -c, --scanner                      Type of scan, 1) sitemap, 2) website crawl,
                                     3) custom flow, 4) custom flow 2.0, 5) intelligent
                                     [required] [choices: "sitemap", "website", "custom",
                                     "custom2", "intelligent"]
  -u, --url                          Website URL you want to scan          [string] [required]
  -d, --customDevice                 Device you want to scan                          [string]
  -w, --viewportWidth                Viewport width (in pixels) you want to scan      [number]
  -o, --zip                          Zip filename to save results                     [string]
  -p, --maxpages                     Maximum number of pages to scan (default: 100).
                                     Only available in website and sitemap scans      [number]
  -f, --safeMode                     Option to disable dynamic clicking of page buttons and
                                     links to find links, which resolves issues on some
                                     websites. Defaults to no.
                                     [string] [choices: "yes", "no"] [default: "no"]
  -h, --headless                     Whether to run the scan in headless mode. Defaults to
                                     yes.
                                     [string] [choices: "yes", "no"] [default: "yes"]
  -b, --browserToRun                 Browser to run the scan on: 1) Chromium, 2) Chrome,
                                     3) Edge. Defaults to Chromium.
                                     [choices: "chromium", "chrome", "edge"] [default: "chrome"]
  -s, --strategy                     Crawls up to general (same parent) domains, or only the
                                     specific hostname. Defaults to "same-domain".
                                     [choices: "same-domain", "same-hostname"]
  -e, --exportDirectory              Preferred directory to store scan results. Path is
                                     relative to your home directory.                 [string]
  -j, --customFlowLabel              Give the Custom Flow Scan a label for easier reference
                                     in the report                                    [string]
  -k, --nameEmail                    To personalise your experience, we will be collecting
                                     your name, email address and app usage data. Your
                                     information fully complies with GovTech’s Privacy
                                     Policy. Please provide your name and email address in
                                     this format "John Doe:[email protected]".
                                     [string] [required]
  -t, --specifiedMaxConcurrency      Maximum number of pages to scan concurrently. Use for
                                     sites with throttling. Defaults to 25.           [number]
  -i, --fileTypes                    File types to include in the scan. Defaults to
                                     html-only.
                                     [string] [choices: "all", "pdf-only", "html-only"]
                                     [default: "html-only"]
  -x, --blacklistedPatternsFilename  Txt file that has a list of domain patterns to exclude
                                     from the accessibility scan, separated by new lines
                                     [string] [default: "exclusions.txt"]
  -a, --additional                   Additional features to include in the report:
                                     screenshots - include element screenshots in the
                                     generated report; none - exclude all additional
                                     features in the generated report
                                     [string] [choices: "screenshots", "none"]
                                     [default: "screenshots"]
  -q, --metadata                     JSON string that contains additional scan metadata for
                                     telemetry purposes. Defaults to "{}"
                                     [string] [default: "{}"]
  -r, --followRobots                 Option for crawler to adhere to robots.txt rules if it
                                     exists
                                     [string] [choices: "yes", "no"] [default: "no"]
  -m, --header                       The HTTP authentication header keys and their
                                     respective values to enable crawler access to
                                     restricted resources.                            [string]

Examples:
  To scan the sitemap of a website:
    node cli.js -c [ 1 | sitemap ] -u <url_link> [ -d <device> | -w <viewport_width> ]
  To scan a website:
    node cli.js -c [ 2 | website ] -u <url_link> [ -d <device> | -w <viewport_width> ]
  To start a custom flow scan:
    node cli.js -c [ 3 | custom ] -u <url_link> [ -d <device> | -w <viewport_width> ]
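For unattended runs (for example in a CI pipeline), the CLI can also be invoked from a small Node.js wrapper. The sketch below is only an illustration: it assumes the script is saved as run-scan.mjs next to cli.js, and the target URL, page limit and name/email values are placeholders you would replace. Only flags documented above (-c, -u, -p, -o, -k) are used.

// run-scan.mjs - a minimal sketch that shells out to the Purple A11y CLI.
import { execFileSync } from 'child_process';

const url = process.env.SCAN_URL || 'https://example.com'; // placeholder target
const args = [
  'cli.js',
  '-c', 'website',                    // website crawl
  '-u', url,
  '-p', '50',                         // limit the number of pages scanned
  '-o', 'a11y-scan-results.zip',      // zip filename for the report
  '-k', 'CI Bot:ci-bot@example.com',  // placeholder name:email required by the CLI
];

try {
  execFileSync('node', args, { stdio: 'inherit' });
} catch (err) {
  // Propagate a non-zero exit code so the CI job fails when the scan fails.
  process.exit(typeof err.status === 'number' ? err.status : 1);
}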

Device Options

The following device options are supported:
  • "Desktop" (defaults to a 1280x720 viewport)
  • "Mobile" (defaults to iPhone 11 viewport)
  • "Desktop Chrome HiDPI"
  • "Desktop Edge HiDPI"
  • "Desktop Firefox HiDPI"
  • "Desktop Safari"
  • "Desktop Chrome"
  • "Desktop Edge"
  • "Desktop Firefox"
  • "Blackberry PlayBook"
  • "Blackberry PlayBook landscape"
  • "BlackBerry Z30"
  • "BlackBerry Z30 landscape"
  • "Galaxy Note 3"
  • "Galaxy Note 3 landscape"
  • "Galaxy Note II"
  • "Galaxy Note II landscape"
  • "Galaxy S III"
  • "Galaxy S III landscape"
  • "Galaxy S5"
  • "Galaxy S5 landscape"
  • "Galaxy S8"
  • "Galaxy S8 landscape"
  • "Galaxy S9+"
  • "Galaxy S9+ landscape"
  • "Galaxy Tab S4"
  • "Galaxy Tab S4 landscape"
  • "iPad (gen 6)"
  • "iPad (gen 6) landscape"
  • "iPad (gen 7)"
  • "iPad (gen 7) landscape"
  • "iPad Mini"
  • "iPad Mini landscape"
  • "iPad Pro 11"
  • "iPad Pro 11 landscape"
  • "iPhone 6"
  • "iPhone 6 landscape"
  • "iPhone 6 Plus"
  • "iPhone 6 Plus landscape"
  • "iPhone 7"
  • "iPhone 7 landscape"
  • "iPhone 7 Plus"
  • "iPhone 7 Plus landscape"
  • "iPhone 8"
  • "iPhone 8 landscape"
  • "iPhone 8 Plus"
  • "iPhone 8 Plus landscape"
  • "iPhone SE"
  • "iPhone SE landscape"
  • "iPhone X"
  • "iPhone X landscape"
  • "iPhone XR"
  • "iPhone XR landscape"
  • "iPhone 11"
  • "iPhone 11 landscape"
  • "iPhone 11 Pro"
  • "iPhone 11 Pro landscape"
  • "iPhone 11 Pro Max"
  • "iPhone 11 Pro Max landscape"
  • "iPhone 12"
  • "iPhone 12 landscape"
  • "iPhone 12 Pro"
  • "iPhone 12 Pro landscape"
  • "iPhone 12 Pro Max"
  • "iPhone 12 Pro Max landscape"
  • "iPhone 12 Mini"
  • "iPhone 12 Mini landscape"
  • "iPhone 13"
  • "iPhone 13 landscape"
  • "iPhone 13 Pro"
  • "iPhone 13 Pro landscape"
  • "iPhone 13 Pro Max"
  • "iPhone 13 Pro Max landscape"
  • "iPhone 13 Mini"
  • "iPhone 13 Mini landscape"
  • "Kindle Fire HDX"
  • "Kindle Fire HDX landscape"
  • "LG Optimus L70"
  • "LG Optimus L70 landscape"
  • "Microsoft Lumia 550"
  • "Microsoft Lumia 550 landscape"
  • "Microsoft Lumia 950"
  • "Microsoft Lumia 950 landscape"
  • "Nexus 10"
  • "Nexus 10 landscape"
  • "Nexus 4"
  • "Nexus 4 landscape"
  • "Nexus 5"
  • "Nexus 5 landscape"
  • "Nexus 5X"
  • "Nexus 5X landscape"
  • "Nexus 6"
  • "Nexus 6 landscape"
  • "Nexus 6P"
  • "Nexus 6P landscape"
  • "Nexus 7"
  • "Nexus 7 landscape"
  • "Nokia Lumia 520"
  • "Nokia Lumia 520 landscape"
  • "Nokia N9"
  • "Nokia N9 landscape"
  • "Pixel 2"
  • "Pixel 2 landscape"
  • "Pixel 2 XL"
  • "Pixel 2 XL landscape"
  • "Pixel 3"
  • "Pixel 3 landscape"
  • "Pixel 4"
  • "Pixel 4 landscape"
  • "Pixel 4a (5G)"
  • "Pixel 4a (5G) landscape"
  • "Pixel 5"
  • "Pixel 5 landscape"
  • "Moto G4"
  • "Moto G4 landscape"

If the device name contains ( and ), wrap the device name in single quotes when entering it into the CLI. Please note that -d and -w are mutually exclusive. If neither is specified, the default device used for the CLI scan is Desktop.

For example, to conduct a website scan of the URL "http://localhost:8000", write the results to "a11y-scan-results.zip", and use the 'iPad (gen 7) landscape' screen, run

node cli.js -c 2 -o a11y-scan-results.zip -u http://localhost:8000 -d 'iPad (gen 7) landscape'

If the site you want to scan has a query string, wrap the link in quotes when entering it into the CLI.

For example, to conduct a website scan of the URL "http://localhost:8000", write the results to "a11y-scan-results.zip", and use a custom screen width of 360, run

node cli.js -c 2 -o a11y-scan-results.zip -u "http://localhost:8000" -w 360

Report

Once a scan of the site is completed, a report will be downloaded into the current working directory.

Accessibility Scan Results

Each issue has its own severity, "Must Fix" or "Good to Fix", based on WCAG conformance.

For details on which accessibility scan results trigger "Must Fix" or "Good to Fix" findings, refer to Scan Issue Details.

Troubleshooting

Please refer to the information below to assist in debugging. Most errors below are due to switching between Node.js versions.

Incompatible Node.js versions

Issue: When your Node.js version is incompatible, you may face the following syntax error.

const URL_NO_COMMAS_REGEX = RegExp('https?://(www\\.)?[\\p{L}0-9][-\\p{L}0-9@:%._\\+~#=]{0,254}[\\p{L}0-9]\\.[a-z]{2,63}(:\\d{1,5})?(/[-\\p{L}0-9@:%_\\+.~#?&//=\\(\\)]*)?', 'giu'); // eslint-disable-line
                            ^
SyntaxError: Invalid regular expression: /https?://(www\.)?[\p{L}0-9][-\p{L}0-9@:%\._\+~#=]{0,254}[\p{L}0-9]\.[a-z]{2,63}(:\d{1,5})?(/[-\p{L}0-9@:%_\+.~#?&//=\(\)]*)?/: Invalid escape

Solution: Install Node.js versions > v15.10.0, i.e. Node.js v16 and above.

Compiled against a different Node.js version

Issue: When you switch between different versions of Node.js in your environment, you may face the following error.

<user_path>/purple-a11y/node_modules/bindings/bindings.js:91
        throw e
        ^

Error: The module '<module_file_path>'
was compiled against a different Node.js version using
NODE_MODULE_VERSION 57. This version of Node.js requires
NODE_MODULE_VERSION 88. Please try re-compiling or re-installing
the module (for instance, using `npm rebuild` or `npm install`).

Solution: As recommended in the error message, run npm rebuild or npm install.

dyld Error

Issue: Unable to run Purple A11y due to the error shown below.

dyld: lazy symbol binding failed: Symbol not found: __ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEx
  Referenced from: <user_path>/purple-a11y/node_modules/libxmljs/build/Release/xmljs.node
  Expected in: flat namespace

dyld: Symbol not found: __ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEx
  Referenced from: <user_path>/PURPLE_A11y/purple-a11y/node_modules/libxmljs/build/Release/xmljs.node
  Expected in: flat namespace

zsh: abort      node index.js

Solutions:

  1. Delete the existing node_modules folder and re-install the NPM packages with npm install.
  2. Refer to this GitHub issue for alternative solutions.

Element Screenshot Limitation

Limitation: Because animations can cause elements to shift out of the viewport after an Axe scan, element screenshots may time out (after 5 seconds) if the element cannot be found. This known issue is particularly prevalent in scenarios like carousels with interval-based transitions.

FAQ

How do I limit the number of pages scanned?

If you find that a scan takes too long to complete due to a large website, or there are too many pages in a sitemap to scan, you may choose to limit the number of pages scanned.

To do this, run CLI mode (node cli.js) with the needed settings and specify -p 10, where 10 is the number of pages you wish to scan.
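For example, to crawl a website but stop after 10 pages (the URL and name/email below are placeholders):

node cli.js -c 2 -u https://example.com -p 10 -k "Your Name:your@email.com"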

I am a new developer and I have some knowledge gaps.

We recommend looking at our Technology Stack to understand the usage of each component. Take your time to understand it.

Additional Information on Data

Purple A11y uses third-party open-source tools that may be downloaded over the Internet during the installation process of Purple A11y. Users can see which libraries are used by examining package.json.

For telemetry purposes, Purple A11y may send information about the website, the URL and HTML code snippets when the user chooses to initiate a Purple A11y scan.

purple-a11y's Issues

Provide Proof of Progress

If you are scanning pages with Purple Hats, demonstrate that there is improvement over time.

Build a score based on axe criteria which addresses severity - Score = (critical*3 + serious*2 + moderate*1.5 + minor) / (urls*5) - which allows you to easily compare a 1000-page site to a 10000-page site with a numeric value that lets you determine whether the site is getting better or worse for users. https://github.com/CivicActions/purple-hats/blob/master/mergeAxeResults.js#L383
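For example (hypothetical numbers, reading the formula as dividing by urls*5): a 100-page scan with 10 critical, 5 serious, 20 moderate and 40 minor findings would score (10*3 + 5*2 + 20*1.5 + 40) / (100*5) = 110 / 500 = 0.22, while a 1000-page scan with ten times as many findings would score 1100 / 5000 = 0.22 as well, making the two sites directly comparable.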

This just makes it easier for people to see relative scores.

It is also critical to remind folks about Goodhart's law, which states: "When a measure becomes a target, it ceases to be a good measure". A score based on axe results is just one way to measure accessibility.

Make it easier to find Sitemap.xml files that can be used.

There are a number of sites that have sitemaps but it isn't always easy to find them from a list of URLs.

To reduce the amount of time required to find them, it would be helpful to have a script to seek out common places where sitemaps may be defined.

Make it easier to amplify the sitemap.xml crawl

The ability to scan sitemap.xml files in Purple A11y is powerful. However, the sitemaps generally are too big to be effectively crawled by this script.

I've created a script that I think can help make existing sitemap.xml files more powerful.

Scanning just a handful of sites is a problem. Scanning all web pages in a site also brings challenges with it, particularly for larger sites. To have confidence in accessibility, a random sampling of pages should allow us to get statistical certainty of our knowledge about the accessibility of a whole site.

The trouble is that there are often too many URLs, and you have site-maps of sitemaps.

My script aggregates the XML files into just one and then removes all the stuff that we don’t want to be analyzing (.doc, .pdf, .zip, etc.). This produces a random sampling of URLs. This XML file can be used in the future to test whether the exact same URLs have improved over time (or not). I’m capping sitemap size at 2000 URLs as that is a pretty decent sampling for the sites we work with.

It could be enhanced in the future to ensure that files like the home page, search page, representative landing pages, and any unusual pages are included in the scan. This could be something that is just appended.

Are there lists of sitemap.xml tools that folks find useful?

Upgrade to latest version of axe

I've got this in a PR here:
#15

Just making sure it is also in the issue queue.

The current version of Purple Hats is using an older version of the axe-core engine.

Not scanning all pages in sitemap.xml files: fewer in some, more in others.

I'd like to be able to scan more than a hundred pages. Actually more than a thousand. I try this:

node cli.js -c 1 -u file:///Users/mgifford/myOwnSitemap.xml -p 1500 -k "Me Really:[email protected]"

And it only gets me:

Sitemap crawl (139 pages)
Purple A11y Version 0.9.43

Any thoughts?

I can see another with:

Sitemap crawl (955 pages)
Purple A11y Version 0.9.43

But both of these sitemap.xml files have more than 1000 URLs.

Include xpath & severity from axe

Axe includes the XPath in its CSV reports, but Purple Hats does not.

XPaths are useful for identifying where there may be multiple instances across multiple pages. They are also very useful for finding the exact reference that caused the error.

I also do think that the Critical, Serious, Moderate & Minor impact is useful. You split yours up into WCAG vs Deque best practices, which is useful too. However, it should be possible to capture this:
https://github.com/dequelabs/axe-core/blob/develop/doc/issue_impact.md

I also liked how, in earlier versions, you showed what types of disabilities were affected by a given SC. That was useful.

More data that may be useful to folks:
https://github.com/CivicActions/accessibility-data-reference

PDFs are being scanned when they shouldn't be.

I am not setting the filetype

  -i, --fileTypes                   

With

node --max-old-space-size=6000 --no-deprecation purple-a11y/cli.js -u https://www.whitehouse.gov -c 2 -s same-domain -p 50  -a none --blacklistedPatternsFilename ./pa-gTracker-exclude-medicare.csv -k "Random Example:[email protected]"

But I am still finding PDFs in the list of URLs crawled. This shouldn't be the case. If the default is html-only, then I shouldn't see any PDFs (or other docs) in my results.
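As a quick check, one could pass the documented -i flag explicitly rather than relying on the default; the command below simply adds -i html-only to the same invocation (whether this changes the behaviour is exactly what this issue is about):

node --max-old-space-size=6000 --no-deprecation purple-a11y/cli.js -u https://www.whitehouse.gov -c 2 -s same-domain -p 50 -a none -i html-only --blacklistedPatternsFilename ./pa-gTracker-exclude-medicare.csv -k "Random Example:[email protected]"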

Couldn't keep to the domain

When I added -a, but I should be able to do either:

  -a, --additional

With

node --max-old-space-size=6000 --no-deprecation purple-a11y/cli.js -u https://www.whitehouse.gov -c 2 -s same-domain -p 50  -a none --blacklistedPatternsFilename ./pa-gTracker-exclude-medicare.csv -k "Random Example:[email protected]"

It ran fine, but I found sub-domains in the returned results.

yargs needed for CLI

I tried to run the CLI using Node.js v20.0.0:

% node cli.js -c -u https://www.va.gov

Got this error:

node:internal/errors:490
   ErrorCaptureStackTrace(err);
   ^

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/Users/mgifford/node_modules/yargs/helpers' imported from /Users/mgifford/purple-hats/cli.js
...

I think this resolved the problem:

% npm install yargs

Filenames should be unique

The reports should have the URL & date hard-coded into the names, e.g. 18f.gov-25012020-report.html & 18f.gov-25012020-compiledResult.json.

This would make it easier to track.

ERR_DLOPEN_FAILED

From /purple-hats-portable-mac-arm64/purple-hats, when I run $ node index I get the following error.

node:internal/modules/cjs/loader:1243
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^
Error: dlopen(/Users/mgifford/node_modules/canvas/build/Release/canvas.node, 0x0001): tried: '/Users/mgifford/node_modules/canvas/build/Release/canvas.node' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/mgifford/node_modules/canvas/build/Release/canvas.node' (no such file), '/Users/mgifford/node_modules/canvas/build/Release/canvas.node' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
    at Module._extensions..node (node:internal/modules/cjs/loader:1243:18)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/Users/mgifford/node_modules/canvas/lib/bindings.js:3:18)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12) {
  code: 'ERR_DLOPEN_FAILED'
}


Node.js v18.12.1
bash-3.2$

Document how to run against local lists

From this:

  1. A text file with a list of links
  2. A XML file with XML tags in Sitemap Protocol format

It sounds like I should be able to run this against a local file on my local system, but I can't seem to get the formatting to accept a local path.

Closest I can come gives me this error:

Scanning website...
(node:76046) UnhandledPromiseRejectionWarning: Error: Cannot fetch a request list from file:/Users/mikegifford/purple-hats/url_list_Olivero.xml: RequestError: Error: Invalid URI "file:///Users/mikegifford/purple-hats/url_list_Olivero.xml"
    at RequestList._fetchRequestsFromUrl (/Users/mikegifford/purple-hats/node_modules/apify/build/request_list.js:564:13)
    at process._tickCallback (internal/process/next_tick.js:68:7)
    at evalScript (internal/bootstrap/node.js:585:13)
    at startup (internal/bootstrap/node.js:267:9)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:739:3)
(node:76046) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:76046) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
The file /Users/mikegifford/purple-hats/results/2020-9-10/159975560855e6613c80/reports/report.html does not exist.

Mac Installation Error

After cloning the repo I get this:

$ chmod +x mac-installer.sh 
$ bash mac-installer.sh 
: command not foundine 2: 
mac-installer.sh: line 49: syntax error: unexpected end of file

Ended up running:

$ dos2unix mac-installer.sh 
dos2unix: converting file mac-installer.sh to Unix format...

And it's working fine now.

Ability to parse list of URLs for scan

A user might have a list of URLs to scan. Other than the existing sitemap scan, a feature to upload a CSV file to scan would help users scan a subsite with a list of URLs.

BasicCrawler limit

Is there a way to get around this?

INFO: BasicCrawler: Crawler reached the maxRequestsPerCrawl limit of 100 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO: BasicCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 100 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 104 requests and will shut down.

Exclude urls matching a pattern

The sites I am scanning all have RSS links in the format example.com/page?format=feed&type=rss

Is it possible to exclude all URLs that have this pattern? I tried adding an entry in exclusions.txt, \.*format=feed&type=rss\.*, but that had no impact.

Respect Robots.txt Files

The scanner should respect the robots.txt files that some sites use to manage traffic.

It would be great if, by default, the scanner respected the wishes of the site owner.

01/03/2024 Not able to run on Windows 11

Hello,
I am currently able to run the 12/14 release, but I get an error with the 01/03/2024 release. One of the changes must have had a negative impact, or the installation instructions need to be modified for something. I'm unsure.

Error:
{"timestamp":"2024-01-12 17:50:10","level":"error","message":"uncaughtException: Invalid URL\nTypeError: Invalid URL\n at new URL (node:internal/url:775:36)\n at getHost (file:///C:/Users/User/Desktop/Purple_A11Y_01032024/purple-a11y/utils.js:15:31)\n at combineRun (file:///C:/Users/User/Desktop/Purple_A11Y_01032024/purple-a11y/combine.js:42:81)\n at runScan (file:///C:/Users/User/Desktop/Purple_A11Y_01032024/purple-a11y/index.js:62:11)\n at file:///C:/Users/User/Desktop/Purple_A11Y_01032024/purple-a11y/index.js:99:11\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"}

Error when Chrome is running

There is nothing that says Chrome must not be running before starting a scan.
If that is a requirement then there should be a friendly message and not an error.


PS D:\repos\purple-a11y> node index
┌────────────────────────────────────────────────────────────┐
│ Purple A11y (ver 0.9.46)                                   │
│ We recommend using Chrome browser for the best experience. │
│                                                            │
│ Welcome back Brian Teeman!                                 │
│ (Refer to readme.txt on how to change your profile)        │
└────────────────────────────────────────────────────────────┘
? What would you like to scan? (Use arrow keys)
❯ Sitemap
? What would you like to scan? Sitemap
? Do you want purple-a11y to run in the background? Yes
? Which screen size would you like to scan? (Use arrow keys) Desktop
? Please enter URL or file path to sitemap, or drag and drop a sitemap file here:  https://brianstest.site

 Error: EBUSY: resource busy or locked, copyfile 'C:\Users\brian\AppData\Local\Google\Chrome\User Data\Default\Network\Cookies' -> 'C:\Users\brian\AppData\Local\Google\Chrome\User Data\Purple-A11y\Default\Network\Cookies' 


C:\Users\brian\AppData\Local\Microsoft\Edge\User Data\Purple-A11y destDir
true cloneLocalStateFileSuccess


 Error: EBUSY: resource busy or locked, copyfile 'C:\Users\brian\AppData\Local\Microsoft\Edge\User Data\Default\Network\Cookies' -> 'C:\Users\brian\AppData\Local\Microsoft\Edge\User Data\Purple-A11y\Default\Network\Cookies'  


C:\Users\brian\AppData\Local\Microsoft\Edge\User Data null getEdgeData

Error from purple hats, but not axe

I crawled this site lflegal.com and got a number of errors on pages like this:

<html> element must have a lang attribute WCAG 3.1.1

<html dir="ltr">

One of the errors that popped up was:

https://www.lflegal.com/category/articles/settlement-agreement-press-releases/point-of-sale-press-releases/page/2/

HTML on the page is:

<!DOCTYPE html>

<!--[if IE 9]>
<html class="ie ie9" lang="en" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb#">
 <![endif]-->
<html lang="en" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb#">

It did catch some other useful errors, but this one seems to be a false positive.

What do you mean by "customisable"?

By customisable, do you mean that Purple A11y can be rebranded into a theme with custom colours, designs, behaviors, and images?

Or by customisable do you mean that we can create custom flows for scanning?

Unable to run an audit starting with the 07/14/2023 portable release

Hello,
I've not been able to use PurpleHats with any release after 07/04.

I execute "Node Index" and it just stops. I found this in "errors.txt":
{"timestamp":"2023-08-22 11:50:11","level":"error","message":"uncaughtException: Unexpected end of JSON input\nSyntaxError: Unexpected end of JSON input\n at JSON.parse ()\n at getUserDataTxt (file:///C:/Users/Jackc/Desktop/Purple0822/purple-hats/utils.js:68:27)\n at file:///C:/Users/Jackc/Desktop/Purple0822/purple-hats/constants/questions.js:12:18\n at ModuleJob.run (node:internal/modules/esm/module_job:193:25)\n at async Promise.all (index 0)\n at async ESMLoader.import (node:internal/modules/esm/loader:530:24)\n at async loadESM (node:internal/process/esm_loader:91:5)\n at async handleMainPromise (node:internal/modules/run_main:65:12)"}

Can you help? I'd like to take advantage of the latest updates. I'm on Windows 10.

Report crashes - Allocation failed - JavaScript heap out of memory

If I crawl this sitemap.xml file https://cnib.ca/en/sitemap.xml?region=on

The scan fails. There are over 1800 pages in it, but I had hoped it would be able to manage sites larger than this.

INFO: BasicCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
INFO: Crawler final request statistics: {"avgDurationMillis":8803,"perMinute":34,"finished":1813,"failed":3,"retryHistogram":[1811,2,null,3]}
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[9364:0x102679000]  3255724 ms: Mark-sweep 1306.0 (1388.6) -> 1305.9 (1357.1) MB, 68.6 / 0.0 ms  (average mu = 0.719, current mu = 0.000) last resort GC in old space requested
[9364:0x102679000]  3255797 ms: Mark-sweep 1305.9 (1357.1) -> 1305.9 (1357.1) MB, 72.7 / 0.0 ms  (average mu = 0.556, current mu = 0.000) last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x200b585878a1]
Security context: 0x1c9e2dd1e6e1 <JSObject>
    1: byteLength(aka byteLength) [0x1c9eb2305ef1] [buffer.js:~510] [pc=0x200b58ca5d2c](this=0x1c9e693826f1 <undefined>,string=0x1c9e2920a331 <Very long string[460960089]>,encoding=0x1c9e2dd3d4f1 <String[4]: utf8>)
    2: arguments adaptor frame: 3->2
    3: fromString(aka fromString) [0x1c9eb231c639] [buffer.js:~335] [pc=0x200b58cec790](this=0x1c9e693826f1 ...

 1: 0x10003ae75 node::Abort() [/usr/local/bin/node]
 2: 0x10003b07f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001a7ae5 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x100572ef2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x10057c3f4 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 6: 0x10054e1e4 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 7: 0x10067fd99 v8::internal::String::SlowFlatten(v8::internal::Handle<v8::internal::ConsString>, v8::internal::PretenureFlag) [/usr/local/bin/node]
 8: 0x1001c587d v8::String::Utf8Length() const [/usr/local/bin/node]
 9: 0x10004e7b6 node::Buffer::(anonymous namespace)::ByteLengthUtf8(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
10: 0x200b585878a1 
The file /Users/mikegifford/purple-hats/results/2020-9-24/160096044136e10467e2/reports/report.html does not exist.

What's the best way to deal with this memory error?

Make it easier to analyze reports

I created another script to help analyze the reports.

Right now if you scan a few sites with Purple A11y's CLI tool you'll be left with a bunch of directories with HTML and CSV files in them. That's great; however, it doesn't allow you to really gather the metadata behind the scan. I wanted a snapshot and not the detailed reports.

So I created a little script to crawl the CSV files and extract some very basic information to form a basic status report.

I wanted a consistent benchmark for our sites that allows us to demonstrate improvement over time. Seeing the individual errors is useful, but if I have scanned 1000 pages, it would make sense that there would be more errors than if I'd just scanned 100.

Knowing how many URLs, WCAG errors (for each type), axe impact status (for each type) gives a better snapshot.

Perfection is nice, but for most sites that may be unattainable, and it would be better to strive to be able to prove progress.
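A rough Node.js sketch of such a tally script is below. It only counts data rows per report CSV; the results directory layout, the report.csv filename and the "severity" column name are assumptions for illustration, not the actual Purple A11y output schema.

// tally-reports.mjs - rough sketch: walk a results directory and count CSV rows per scan.
import { readdirSync, readFileSync } from 'fs';
import { join } from 'path';

const resultsDir = process.argv[2] || 'results';

for (const scanDir of readdirSync(resultsDir, { withFileTypes: true })) {
  if (!scanDir.isDirectory()) continue;
  const csvPath = join(resultsDir, scanDir.name, 'report.csv'); // assumed filename
  let lines;
  try {
    lines = readFileSync(csvPath, 'utf8').trim().split('\n');
  } catch {
    continue; // no CSV in this directory
  }
  const header = lines[0].split(',');           // naive CSV split, fine for a sketch
  const severityIdx = header.indexOf('severity'); // assumed column name
  const counts = {};
  for (const line of lines.slice(1)) {
    const severity = severityIdx >= 0 ? line.split(',')[severityIdx] : 'unknown';
    counts[severity] = (counts[severity] || 0) + 1;
  }
  console.log(scanDir.name, `rows: ${lines.length - 1}`, counts);
}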

Add to text summary

Right now when running the cli.js file I get output like:
 

Purple A11y version: 0.9.41

Starting scan...

Fetching URLs. This might take some time...

Fetch URLs completed. Beginning scan

┌─────────────────────────────────────────┐
│ Scan Summary                            │
│                                         │
│ Must Fix: 0 issues / 0 occurrences      │
│ Good to Fix: 1 issues / 353 occurrences │
│ Passed: 135610 occurrences              │
└─────────────────────────────────────────┘

┌────────────────────────────────────────────────────────┐
│ Report of this run is at a11y-scan-results.zip         │
│ Results directory is at results/20240105_181127_custom │
└────────────────────────────────────────────────────────┘

Which kind of works when you're running one domain.

However, you could add:
- Domain name
- Number of URLs crawled
- axeImpact scores
- WCAG issue numbers

Add a score for better comparison

I also like developing a score so that scans can be more easily compared. This is one approach.

Use the axe criteria to address severity - Score = (critical*3 + serious*2 + moderate*1.5 + minor) / (urls*5) - which allows you to easily compare a 1000-page site to a 10000-page site with a numeric value that lets you determine whether the site is getting better or worse for users. https://github.com/CivicActions/purple-hats/blob/master/mergeAxeResults.js#L383

Grade that score to a letter grade that is easier for folks to understand (a sketch of this mapping follows the list below):

  •     // A+ = 0 ; A <= 0.1 ; A- <= 0.3 ;
  •     // B+ <= 0.5 ; B <= 0.7 ; B- <= 0.9 ;
  •     // C+ <= 2 ; C <= 4 C- <= 6 ;
  •     // D+ <= 8 ; D <= 10 ; D- <= 13 ;
  •     // F+ <= 15 ; F <= 20 ; F- >= 20 ;
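A small Node.js sketch of that mapping, using the thresholds listed above; the function name is illustrative, not part of the project:

// scoreToGrade - map a numeric score to the letter grades listed above.
function scoreToGrade(score) {
  if (score === 0) return 'A+';
  if (score <= 0.1) return 'A';
  if (score <= 0.3) return 'A-';
  if (score <= 0.5) return 'B+';
  if (score <= 0.7) return 'B';
  if (score <= 0.9) return 'B-';
  if (score <= 2) return 'C+';
  if (score <= 4) return 'C';
  if (score <= 6) return 'C-';
  if (score <= 8) return 'D+';
  if (score <= 10) return 'D';
  if (score <= 13) return 'D-';
  if (score <= 15) return 'F+';
  if (score <= 20) return 'F';
  return 'F-';
}

console.log(scoreToGrade(0.22)); // -> 'A-'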

Combine efforts

Just saw you put out a new release.

Would love to see some of the work we've done here get pulled back into the main repo - https://github.com/civicactions/purple-hats

I haven't seen what you've done recently but will definitely take a look. Might be worthwhile talking about some of the places where I've worked to extend the code. Sorry for the messy JS. I'm still very much learning Node.

Option to eliminate Axe rules best practices

Hello,
How do we eliminate best practices? Currently all issues are reported, both violations & best practices. Many best practices are not required to be implemented, & some look like false positives.

Out of one hundred issues, I see that ninety of them are Axe best-practice rules.

It should be possible to script the execution

I'd love to be able to run this on a bunch of sites from the command line. Even just being able to execute it on cron every Sunday night, so we have a fresh report to look at every week.

The current approach doesn't allow for an execution like this. The Desktop and CLI interfaces are nice if you want to scan a web site every once in a while, but after a few weeks you really don't want to go through the steps of doing this by hand.
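As a rough workaround today, a small Node.js script can loop over a list of sites and invoke the CLI for each, and cron (or Task Scheduler) can run that script weekly. A sketch, with placeholder URLs and name/email and only the documented flags:

// weekly-scan.mjs - rough sketch: scan a list of sites with the Purple A11y CLI.
// Schedule it with cron, e.g.:  0 2 * * 0  cd /path/to/purple-a11y && node weekly-scan.mjs
import { execFileSync } from 'child_process';

const sites = [
  'https://example.com',   // placeholder URLs, replace with your own
  'https://example.org',
];

const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD for unique filenames

for (const url of sites) {
  const zipName = `${new URL(url).hostname}-${date}.zip`;
  console.log(`Scanning ${url} -> ${zipName}`);
  execFileSync('node', [
    'cli.js', '-c', 'website', '-u', url, '-p', '100',
    '-o', zipName, '-k', 'Scan Bot:scan-bot@example.com', // placeholder name:email
  ], { stdio: 'inherit' });
}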

Make crawls efficient and fast, improve issue coverage by 23%, and domain coverage

Hi, I saw this project recommended on my GitHub feed. I thought I'd share this project to help save some time and effort on some challenges that were solved at scale, so our environment does not suffer, since automated web testing is not a simple thing that should be done in one language (Node.js is not meant for concurrency: the runtime is forked for each pool and does not scale; also, the community usually confuses concurrency with parallelism, and they are not the same thing).
The job requires a micro-service setup to scale (a crawler in Rust, C, C++, etc.; HTML needs dedicated browsers with configs that render background processing and script transformation to prevent abuse; RPC streams to communicate, since REST does not scale with this type of workload and should not be used; a security layer; resource control when rendering headlessly; and efficient algorithms, since most of the ones today have loops on loops on loops with no data-structure architecting, bad mutations, and ownership instead of using iterators. We rewrote the old runners and made them efficient for when people want to use things like axe or codesniffer for results, and much more. Our goal is to be accurate and performant without any drips or leaks along the way).

We left a Lite version for the community to use, as it is far superior to any combination of tools for dynamic automated web accessibility testing, with 23% more coverage than alternatives and the most efficient crawling at scale: https://github.com/a11ywatch/a11ywatch.

  • Here is a video of a crawl completing with 63% accessibility coverage, subdomain and TLD coverage, code fixes, detailed AI alt enhancement, and, most importantly, the most efficient crawling at scale (millions of pages within seconds to minutes).
demo.mp4

I hope this helps solve some of the challenges being built, as the system has many ways to integrate. We actually built several technologies along the way that are used in big tech companies today, e.g. https://github.com/spider-rs/spider.

Here is an example of a PHP integration with a tool called Equalify https://github.com/bbertucc/equalify/tree/176-a11ywatch-integration - exact commit for main code required https://github.com/j-mendez/equalify/commit/5eddd04653bf91eca15435465b78dab6c30920d8

Integration for nodejs

You can use the sidecar to integrate without going over the wire.

npm i @a11ywatch/a11ywatch --save

import { scan, multiPageScan, crawlList } from "@a11ywatch/a11ywatch";

// single page website scan.
await scan({ url: "https://jeffmendez.com" });

// single page website scan with lighthouse results.
await scan({ url: "https://jeffmendez.com", pageInsights: true });

// all pages
await multiPageScan({ url: "https://a11ywatch.com" });

// all pages and subdomains
await multiPageScan({
  url: "https://a11ywatch.com",
  subdomains: true,
});

// all pages and tld extensions
await multiPageScan({ url: "https://a11ywatch.com", tld: true });

// all pages, subdomains, and tld extensions
await multiPageScan({
  url: "https://a11ywatch.com",
  subdomains: true,
  tld: true,
});

// all pages, subdomains, sitemap extend, and tld extensions
await multiPageScan({
  url: "https://a11ywatch.com",
  subdomains: true,
  tld: true,
  sitemap: true
});

// multi page scan with callback on each result asynchronously
const callback = ({ data }) => {
  console.log(data);
};
await multiPageScan(
  {
    url: "https://a11ywatch.com",
  },
  callback
);

PS: the only reason for multiple languages, and not Rust for everything, is the need to customize browsers that are made in different languages, or tweaking.
