CLI Website Analyzer for GDPR, Performance and Accessibility Evaluation

Problem Statement

In order to develop an appealing website, requirements from various categories must be met. Users expect a fast-loading website that is pleasant to use (UX) and visually appealing (UI). At the same time, governments require site owners to comply with data privacy regulations such as the EU’s General Data Protection Regulation (GDPR), better known as the Datenschutz-Grundverordnung (DSGVO) in German-speaking countries. While some legal requirements for accessible websites exist for public bodies, the topic still does not get the attention it deserves. Just try keyboard navigation on the next website you visit, and you will most likely notice shortcomings. If you hit Tab on your keyboard and nothing happens, chances are high that you are browsing an inaccessible website.

Many of these requirements can be validated automatically and potential problems identified without manual effort. Although several solutions exist, from web services to standalone tools, none of the publicly available ones ticks all the boxes. Existing open-source tools mainly focus on a specific category such as cookie scans or privacy policy checks and neglect everything else. A further shortcoming is the lack of an automated website crawler, which restricts the analysis to a single page. A large image used only in a single blog post, for example, could never be detected in one go, even though a single such issue can significantly deteriorate page load time.

Hence, a thorough website analysis is crucial to ensure that all images load as expected and that the GDPR is respected on all pages and not violated by, say, the video embed on the third blog post. A common GDPR issue is missing user consent for an interactive map added through a third-party service on a contact page. Developers commonly assess these things based on the data shown in the browser’s developer tools. While it is possible to do many of these checks manually, it is a time-consuming and repetitive task that you do not want to repeat for a dozen, let alone a few hundred pages. Even once you have gathered the data for a single page, you still need to evaluate it. The proposed solution is a tool that automates all of these steps. In addition, it always applies the same evaluation standard and produces reproducible analysis results. It is useful for determining the status quo when clients hire you to improve their existing site and you need a quick overview. You can document the current state and discuss suggestions for improvement based on the automatically generated report. Re-evaluating the site after a rebuild or after improvement tasks saves the same amount of time again and again.

It should now be clear how much time such an automated website analyzer can save, and that some tests can only be carried out meaningfully with an automated solution. The proposed tool collects information for all pages of a website at the push of a button and creates a user-friendly PDF report with all of its findings, making otherwise cumbersome manual checks viable in the first place. Additionally, reports generated before and after a change document the website’s state and the impact of your improvements.
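To illustrate the crawling step, here is a minimal sketch of how such a tool might discover all internal pages of a site. It assumes the Python libraries requests and BeautifulSoup purely for illustration; the function name crawl and all details are hypothetical and do not reflect the actual implementation.

    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url: str, max_pages: int = 500) -> set[str]:
        """Collect all internal page URLs reachable from start_url."""
        site = urlparse(start_url).netloc
        to_visit, seen = [start_url], set()
        while to_visit and len(seen) < max_pages:
            url = to_visit.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                response = requests.get(url, timeout=10)
            except requests.RequestException:
                continue  # a page load error; the real tool would record it
            soup = BeautifulSoup(response.text, "html.parser")
            for anchor in soup.find_all("a", href=True):
                link = urljoin(url, anchor["href"]).split("#")[0]
                if urlparse(link).netloc == site:  # stay on the same site
                    to_visit.append(link)
        return seen

Every page found this way can then be fed into the per-page checks described in the next section.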

Aim of the Project

The goal of this project is to develop a website analyzer as a standalone application built as an easy-to-use command-line tool. It checks for GDPR compliance, including cookie scans and external network requests, for all pages of a website. It detects externally embedded fonts and services such as Google Maps, Google Analytics, Google Fonts, and YouTube, as well as any other third-party request. Furthermore, it determines the total page sizes and warns about oversized pages. It also finds bad practices, such as hotlinked images and unusually large images on any of your pages.
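As an illustration of what such a per-page check involves, the following sketch flags third-party resources and unusually large images on a single page. It again assumes requests and BeautifulSoup; the helper name audit_page and the size threshold are made up for this example and do not reflect the tool’s actual API or defaults.

    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    MAX_IMAGE_BYTES = 500_000  # illustrative threshold for "unusually large"

    def audit_page(url: str) -> None:
        """Flag third-party resources and oversized images on a single page."""
        site = urlparse(url).netloc
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        references = [tag.get("src") or tag.get("href")
                      for tag in soup.find_all(["img", "script", "iframe", "link"])]
        for reference in filter(None, references):
            resource = urljoin(url, reference)
            host = urlparse(resource).netloc
            if host and host != site:
                # external request: a candidate GDPR finding
                # (fonts, maps, analytics, video embeds, hotlinked images)
                print(f"third-party request: {resource}")
            if resource.lower().endswith((".jpg", ".jpeg", ".png", ".gif", ".webp")):
                head = requests.head(resource, timeout=10, allow_redirects=True)
                size = int(head.headers.get("Content-Length", 0))
                if size > MAX_IMAGE_BYTES:
                    print(f"unusually large image ({size} bytes): {resource}")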

Conclusion

The application has been tested with websites ranging from a dozen pages up to several hundred pages, and it has proven useful as a quick way to find shortcomings or to validate that a website is in good shape. In particular, issues that appear only on individual pages, such as image and page load errors or large resources, were detected successfully and brought new information to light for site owners.

On the other hand, the tool does not check bandwidth-dependent figures or other metrics influenced by the machine running the analysis, such as page load time in seconds. The focus lies on reproducible figures like the total page size. Some metrics are therefore intentionally not evaluated, although they could be.
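This design choice becomes clear in a small example: a figure like the total page size can be derived from the byte sizes of a page’s resources alone and is therefore the same on every machine, unlike a wall-clock load time. A minimal sketch (the function total_page_size is again hypothetical):

    import requests

    def total_page_size(resource_urls: list[str]) -> int:
        """Sum the transferred bytes of a page's resources. The result is
        reproducible, unlike load time, which depends on bandwidth."""
        total = 0
        for url in resource_urls:
            head = requests.head(url, timeout=10, allow_redirects=True)
            total += int(head.headers.get("Content-Length", 0))
        return total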

With additional checks, the CLI Website Analyzer can become even more helpful and support more use cases in the future.

Contact

The implementation was done by Matthias Hagmann. If you have any questions about the project, we will be happy to put you in touch.