This project is a repository of files that shouldn't exist. Bioinformatics is known for its many file formats, some with complex specifications, others without any formal specification at all. This leads to incompatibilities between implementations and to cumbersome transformation steps in pipelines. This repository contains files that violate the specifications in subtle ways and can therefore be used to test implementations of these formats.
This repository is organised by file format. Each format has its own directory with its own readme. The readme describes the file format, explains how it can be broken, and goes through the kinds of brokenness that the files in that directory exhibit.
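As a rough illustration of how this layout could be used for testing, here is a minimal Python sketch. It assumes one directory per format at the top level of a checkout, treats every non-readme file in it as a test file, and assumes a strict parser ought to reject all of them. The function names and the `parsers` mapping are hypothetical and not part of this repository; you would plug in the parser of the implementation you want to test.

```python
from pathlib import Path
from typing import Callable, Iterator


def iter_test_files(repo_root: Path) -> Iterator[tuple[str, Path]]:
    """Yield (format_name, file_path) pairs, one top-level directory per format."""
    for format_dir in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        if format_dir.name.startswith("."):
            continue  # skip hidden directories such as .git
        for path in sorted(format_dir.rglob("*")):
            # Assumption: anything that is not a readme is a test file.
            if path.is_file() and not path.name.lower().startswith("readme"):
                yield format_dir.name, path


def find_accepted_files(
    repo_root: Path,
    parsers: dict[str, Callable[[Path], object]],
) -> list[Path]:
    """Return the files that a parser accepted without raising an exception.

    `parsers` maps a directory name to a callable that parses one file and
    raises on invalid input (a hypothetical interface for this sketch).
    """
    accepted = []
    for format_name, path in iter_test_files(repo_root):
        parser = parsers.get(format_name)
        if parser is None:
            continue  # no implementation under test for this format
        try:
            parser(path)
        except Exception:
            continue  # rejected, which is what a strict parser should do
        accepted.append(path)
    return accepted
```

Any file returned by `find_accepted_files` is worth a closer look: either the parser is more lenient than the specification, or the readme in that directory explains why accepting the file is defensible.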
The contents of this repository are released under the permissive ISC License. However, I'm uncertain whether that even applies to data files. (I'm not a lawyer.) Let's assume the data in this repo is free as in beer and as in freedom.