Consider using VDB6 as a data source #1155
Thanks for the suggestion @prabhu, will definitely have a look! If we end up pulling in a pre-compiled / curated database, it would be great to be able to fetch only deltas, as in: "Only give me data that changed since I last checked." Having to pull an entire 1-N GB blob for only minor changes in the dataset would be expensive both on the network and on the processing side. I know this is a tricky problem, which may not work well when distributing the data as SQLite. Did you look into this aspect before, by chance?
@nscuro Thank you so much for looking into this. Note that the entire compressed database is only 188MB:
total 188M
-rw-r--r-- 1 prabhu prabhu 45M Mar 22 10:05 data.index.vdb6.tar.xz
-rw-r--r-- 1 prabhu prabhu 144M Mar 22 10:05 data.vdb6.tar.xz
Regarding a delta database, the larger database has a source_data_hash column that could be used for this in the future. I am happy to collaborate and improve this.
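To illustrate how a per-row hash like source_data_hash could support delta fetching, here is a minimal Python sketch. It assumes each snapshot has already been read into a mapping from a row key to its hash; the key choice is an assumption (the real tables may need a composite key), and `compute_delta` is a hypothetical helper, not part of VDB.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    added: List[str]
    changed: List[str]
    removed: List[str]


def compute_delta(old: Dict[str, str], new: Dict[str, str]) -> Delta:
    """Compare two snapshots that map a row key to its content hash.

    Hypothetical helper: keys identify entries, values are per-row hashes
    such as source_data_hash.
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    # A key present in both snapshots counts as changed only if its hash differs.
    changed = sorted(k for k in new if k in old and old[k] != new[k])
    return Delta(added, changed, removed)
```

A consumer would then only need to fetch the rows listed in `added` and `changed`, and delete those in `removed`, instead of re-importing the whole database.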
Apologies for the delay. The compression is definitely a good thing here, thanks for pointing it out! Regarding the … Would that be a viable thing for VDB to add? I reckon it would require some sort of state-keeping between successive builds of the DB...
Responding to my own question above, I think the point … from the issue description kinda covers that already. Essentially, we can do the state-keeping and enrichment with …
@nscuro I will look into the updated timestamp to see if there is a way to expose it as a column. At this point, I am not sure that all sources update this timestamp correctly, and some sources have no timestamps at all, which is why I went with a hash of the metadata. Shall we explore alternatives for syncing the database, such as a temporary table for VDB6, or searching the SQLite database directly for any hits from the index database?
Another option is to use sqldiff to find the differing rows, though I have not tried this command yet.
Update: Download sqldiff from here: https://www.sqlite.org/download.html
To quickly get a summary of the differences:
To create SQL update statements for only the changed rows (this took a few minutes for me):
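For comparison, a rough plain-SQL equivalent can be run from Python's built-in sqlite3 module: attach the old database to a connection on the new one and use EXCEPT to list rows that are new or changed. This is a swapped-in technique, not what sqldiff does internally; the helper `changed_rows` is hypothetical, and both files are assumed to contain the given table with the same columns.

```python
import sqlite3
from typing import List, Tuple


def changed_rows(old_db: str, new_db: str, table: str) -> List[Tuple]:
    """Return rows of `table` in new_db that do not appear identically in old_db."""
    conn = sqlite3.connect(new_db)
    try:
        # Make the old snapshot visible under the schema name "old".
        conn.execute("ATTACH DATABASE ? AS old", (old_db,))
        # EXCEPT compares whole rows, so a change in any column surfaces the row.
        # The table name is interpolated, so it must come from trusted input.
        query = f'SELECT * FROM main."{table}" EXCEPT SELECT * FROM old."{table}"'
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Swapping the `main` and `old` operands yields the rows that were removed or replaced. Because EXCEPT compares entire rows, large text columns make this slower than comparing a per-row hash.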
While the created/updated timestamps of the upstream sources are nice to have, for our use case we are more interested in when VDB6 updated a given entry. Say we fix a bug in how …
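One way VDB6 could record when it changed an entry, independent of upstream timestamps, is to carry per-entry state between builds: keep the previous timestamp when an entry's hash is unchanged, and stamp it with the current build time otherwise. The sketch below assumes a content hash is available per entry; `carry_forward` and `updated_since` are hypothetical names. A parser bug fix that alters an entry's stored content would change its hash, and therefore its timestamp.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical per-build state: entry key -> (content hash, time VDB last changed it).
State = Dict[str, Tuple[str, datetime]]


def carry_forward(previous: State, current_hashes: Dict[str, str], build_time: datetime) -> State:
    """Stamp each entry with the build time at which its content last changed."""
    state: State = {}
    for key, digest in current_hashes.items():
        prev = previous.get(key)
        if prev is not None and prev[0] == digest:
            # Unchanged content: keep the earlier timestamp.
            state[key] = prev
        else:
            # New entry, or its content changed in this build.
            state[key] = (digest, build_time)
    return state


def updated_since(state: State, since: datetime) -> List[str]:
    """Keys a consumer would need to re-fetch after its last sync."""
    return sorted(k for k, (_, ts) in state.items() if ts > since)


if __name__ == "__main__":
    build = datetime.now(timezone.utc)
    print(carry_forward({}, {"example-key": "example-hash"}, build))
```

Entries dropped from a build simply disappear from the state here; a real implementation would likely need tombstones so that consumers learn about deletions.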
Side note: the selection of ORAS clients is rather sparse right now. The library proposed in the issue description might work, but would pull in Kotlin as an additional dependency. It's also fairly new, with only a single maintainer. Since we won't need the full capabilities of ORAS, we should implement the "pull" functionality ourselves, without adding new dependencies. In the end, it's just an HTTP API. The spec is here: https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pull
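A minimal sketch of the pull flow from the distribution spec, using only Python's standard library: fetch the manifest with `GET /v2/<name>/manifests/<reference>`, then fetch each layer with `GET /v2/<name>/blobs/<digest>` and verify its digest. Assumptions: the reference resolves to a single OCI image manifest (not an index), the caller already has a bearer token if the registry requires one (obtaining it is registry-specific and not shown), and file names come from the `org.opencontainers.image.title` annotation that ORAS sets on pushed files. Blobs are held in memory here; a production version would stream them to disk.

```python
import hashlib
import json
import urllib.request
from typing import Dict, Optional

OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"


def manifest_url(registry: str, name: str, reference: str) -> str:
    # GET /v2/<name>/manifests/<reference>
    return f"https://{registry}/v2/{name}/manifests/{reference}"


def blob_url(registry: str, name: str, digest: str) -> str:
    # GET /v2/<name>/blobs/<digest>
    return f"https://{registry}/v2/{name}/blobs/{digest}"


def verify_digest(data: bytes, digest: str) -> bool:
    """Check content against an OCI digest; only sha256 is handled in this sketch."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(data).hexdigest() == expected


def _get(url: str, token: Optional[str], accept: Optional[str] = None) -> bytes:
    headers: Dict[str, str] = {}
    if accept:
        headers["Accept"] = accept
    if token:
        # Obtaining this token is registry-specific and not shown here.
        headers["Authorization"] = f"Bearer {token}"
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        return resp.read()


def pull(registry: str, name: str, reference: str, token: Optional[str] = None) -> Dict[str, bytes]:
    """Fetch every layer listed in an OCI image manifest, keyed by title or digest."""
    raw = _get(manifest_url(registry, name, reference), token, OCI_MANIFEST)
    manifest = json.loads(raw)
    files: Dict[str, bytes] = {}
    for layer in manifest.get("layers", []):
        data = _get(blob_url(registry, name, layer["digest"]), token)
        if not verify_digest(data, layer["digest"]):
            raise ValueError(f"digest mismatch for {layer['digest']}")
        # ORAS records the original file name in this annotation.
        title = layer.get("annotations", {}).get("org.opencontainers.image.title")
        files[title or layer["digest"]] = data
    return files
```

For example, `pull("registry.example", "org/vdb", "v6")` would return a mapping of file names to bytes; the registry, repository, and tag there are placeholders. Real registries may also redirect blob downloads to separate storage, which this sketch does not treat specially.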
Some of the observations I found:
@sahibamittal, thank you. Re (1), I am not a fan of EPSS, so I am unlikely to ever add support for it. For (2), we can enhance this code to accept a comma-separated list of OSV keys and create a new …
Inclusion of EPSS is something we could add as an additional enrichment on our side. @prabhu Any thoughts on resolving alias relationships? We did some research on this a while back and found that alias data from some sources is the wild west (mostly OSV), whereas data from GHSA is usually reliable. I'd assume the same to be true for Linux distro feeds. Alias resolution is easiest when all relevant data is present, so VDB6 is in a great position to make this happen as a post-build enrichment.
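As an illustration of alias resolution as a post-build enrichment, alias pairs can be merged into groups with a union-find (disjoint-set) structure, applying only pairs from sources considered trustworthy. This is a generic technique, not something VDB currently does; the `(source, id, alias)` tuple format, the source labels, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


class AliasGroups:
    """Union-find (disjoint-set) over vulnerability identifiers."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self._parent.setdefault(x, x)
        # Path halving: point nodes closer to the root while walking up.
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self) -> List[Set[str]]:
        by_root: Dict[str, Set[str]] = {}
        for x in list(self._parent):
            by_root.setdefault(self.find(x), set()).add(x)
        return list(by_root.values())


def resolve_aliases(pairs: Iterable[Tuple[str, str, str]], trusted: Set[str]) -> List[Set[str]]:
    """Group IDs linked by alias pairs, ignoring pairs from untrusted sources."""
    uf = AliasGroups()
    for source, vuln_id, alias in pairs:
        if source in trusted:
            uf.union(vuln_id, alias)
    return uf.groups()
```

Filtering before the union step matters: a single bad alias pair from an unreliable feed would otherwise merge two unrelated groups, and union-find has no way to undo that.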
@nscuro, interesting idea! For some sources, aliases are currently set in the description section. VDB tries to resolve the CVE ID, if available, to reduce duplicates. It's definitely an idea for a future enhancement.
@nscuro, now that CVE 5.1 has been released with support for purl, I am thinking of prioritizing VDB 6.1, which will use the 5.1 schema with a couple of breaking changes. Additionally, we can support the vulnrichment repo (auto-upgraded to the 5.1 format). Are you OK with parking this issue and revisiting it around September 2024?
@prabhu Most certainly.
AppThreat vulnerability-db is an MIT-licensed database used by tools such as depscan for scanning. VDB6 is now available as a downloadable SQLite database. This data would help Dependency-Track (DT) support containers, Linux OS packages, and some C/C++ projects via purl-based searches.
The easiest way to download the databases is with the ORAS CLI tool.
Use any SQLite browser tool to inspect and query the databases.
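For scripted inspection instead of a GUI browser, Python's standard library can unpack the `.tar.xz` archives and list each database's tables with their row counts. This is a minimal sketch: it assumes nothing about the extracted file names or the table layout, and the helpers `extract` and `table_row_counts` are hypothetical.

```python
import sqlite3
import sys
import tarfile
from pathlib import Path
from typing import Dict


def extract(archive: Path, dest: Path) -> None:
    """Unpack a .tar.xz archive such as data.vdb6.tar.xz into dest."""
    with tarfile.open(archive, "r:xz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def table_row_counts(db_path: Path) -> Dict[str, int]:
    """List the tables in an SQLite file with their row counts (opened read-only)."""
    conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0] for n in names}
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python inspect_vdb.py <extracted .vdb6 files...>
    for db in sys.argv[1:]:
        print(db, table_row_counts(Path(db)))
```

Opening the files read-only avoids accidentally modifying the downloaded databases while exploring them.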
Proposed integration
Possible challenges