Simon Willison pioneered a technique he calls
git scraping. The idea is to use GitHub Actions and the
git commit history to build time series datasets.
I’m currently building two datasets:
In November 2021, CISA announced its Known Exploited Vulnerabilities Catalog. Binding Operational Directive 22-01 uses the catalog as the foundation for requiring federal agencies to patch their systems. Git scraping will enable several analyses: How long does CISA give federal agencies to patch once a vulnerability is known to be exploited? How are these vulnerabilities distributed across vendors? Is there a pattern to how regularly CISA updates the list or requires patching? Data available here.
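The three questions above could be answered from a single snapshot of the catalog along these lines. This is only a rough sketch: it assumes each entry is a JSON object with `cveID`, `vendorProject`, `dateAdded`, and `dueDate` fields holding ISO dates, and the records in `SAMPLE` are made-up placeholders, not real catalog entries. The function names are hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import median

# Placeholder records for illustration only, NOT real catalog entries.
# Field names are an assumption about the catalog's JSON layout.
SAMPLE = [
    {"cveID": "CVE-0000-0001", "vendorProject": "ExampleVendorA",
     "dateAdded": "2021-11-03", "dueDate": "2021-11-17"},
    {"cveID": "CVE-0000-0002", "vendorProject": "ExampleVendorA",
     "dateAdded": "2021-11-03", "dueDate": "2022-05-03"},
    {"cveID": "CVE-0000-0003", "vendorProject": "ExampleVendorB",
     "dateAdded": "2021-12-01", "dueDate": "2021-12-15"},
]


def patch_windows(entries):
    """Days between an entry being added and its patch deadline, keyed by CVE ID."""
    return {
        e["cveID"]: (date.fromisoformat(e["dueDate"])
                     - date.fromisoformat(e["dateAdded"])).days
        for e in entries
    }


def vendor_counts(entries):
    """Number of catalog entries per vendor."""
    return Counter(e["vendorProject"] for e in entries)


def additions_per_day(entries):
    """Number of entries added on each date, a rough view of update cadence."""
    return Counter(e["dateAdded"] for e in entries)


if __name__ == "__main__":
    windows = patch_windows(SAMPLE)
    print("median patch window (days):", median(windows.values()))
    print("entries per vendor:", dict(vendor_counts(SAMPLE)))
    print("entries added per day:", dict(additions_per_day(SAMPLE)))
```

Running the same functions against every committed version of the file, rather than one snapshot, is what would turn these summaries into a time series.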