At least one valid, representative file for every mimetype, because examples are useful and file extensions are unreliable.
Example files are chosen by the following criteria, in order of priority:
- Public domain or Creative Commons licensing.
- Correctly represents the mimetype.
- Stable URIs (permalinks are best).
- Relatively small file size (to save disk space and bandwidth).
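The size criterion is easy to check mechanically. Here is a minimal sketch, assuming the examples live under a `media-types/` directory as in the search example in this document. The helper name `oversized` and the 1 MiB default limit are illustrative assumptions, not project rules:

```shell
# List example files larger than a byte limit.
# Both the function name and the 1 MiB default are hypothetical.
oversized() {
    dir="${1:-media-types}"
    limit_bytes="${2:-1048576}"
    # The "c" suffix makes find compare sizes in bytes; "+" means "greater than".
    find "$dir" -type f -size +"${limit_bytes}c"
}
```

For instance, `oversized media-types 524288` would list anything over 512 KiB, which makes candidates for replacement by a smaller example easy to spot.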
IANA lists about 1500 registered mimetypes, some of which are deprecated. Many more unofficial ones are in use.
It's a big job, but not impossible. See the coverage report for details.
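One way to picture such a report is to compare a list of registered types against the directories that contain at least one file. The sketch below is not the project's actual coverage script. It assumes a hypothetical `registered.txt` with one `type/subtype` per line, the `media-types/` layout used in this document, and a made-up function name `coverage`:

```shell
# Print "covered/total" for the types listed in a file.
# A type counts as covered if its directory holds at least one regular file.
coverage() {
    list="$1"                   # hypothetical file: one type/subtype per line
    root="${2:-media-types}"
    total=0
    covered=0
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r mt || [ -n "$mt" ]; do
        [ -n "$mt" ] || continue            # skip blank lines
        total=$((total + 1))
        # Any output from find means the directory has at least one file.
        if [ -d "$root/$mt" ] && [ -n "$(find "$root/$mt" -type f | head -n 1)" ]; then
            covered=$((covered + 1))
        fi
    done < "$list"
    echo "$covered/$total"
}
```

Called as `coverage registered.txt media-types`, it prints a single fraction. An empty directory deliberately does not count as coverage.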
Suppose you want an example of a VRML file.
Wikipedia discusses VRML and links to examples, but finding an actual VRML file via web search is non-trivial.
A Google filetype search for .vrml files? Not helpful.
https://google.com/search?q=filetype%3Avrml
Oops, that should be .wrl files.
https://google.com/search?q=filetype%3Awrl
But most of the results still aren't relevant.
Even if you find a good link, you either have to keep track of the file or bookmark the link, which could give a 404 the next time you open it.
With a mimetype menagerie, it's as simple as a link to a directory:
Finding files from scratch is a simple search with a file manager or on the command line, like this:
    $ cd media-types/
    $ find . -iname '*VRML*'
    ./model/vrml
    $ ls model/vrml/
    HelloWorld.wrl
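Because file extensions are unreliable, it can also be worth confirming that a file found this way really is what its directory claims. VRML files begin with a `#VRML` header line (for example `#VRML V2.0 utf8` for VRML97), so a minimal sketch of such a check might look like the following. The helper name `is_vrml` is hypothetical, and a matching header does not prove the rest of the file is valid:

```shell
# Succeed if the file's first line starts with the "#VRML " header,
# regardless of the file's extension. This checks the header only.
is_vrml() {
    first_line=
    # Read just the first line; tolerate a file with no trailing newline.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case "$first_line" in
        '#VRML '*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

A possible use is `is_vrml model/vrml/HelloWorld.wrl && echo "looks like VRML"`. Other formats would need their own header checks.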
Some mimetypes may not have any public domain examples, or may be patent-encumbered. These can be linked to, but will have to be omitted from the actual git repository.
The fuzzing program American Fuzzy Lop (afl) includes a number of test cases, such as archives and image formats: