Partial repopulation in case NVD API fails #299

Closed
oh2fih opened this issue Jul 2, 2024 · 6 comments · Fixed by #300

Comments

@oh2fih
Collaborator

oh2fih commented Jul 2, 2024

There have been problems with the NVD API for a few days: it has been either slow or returning 503 without completely failing. As a result, some updates are only partly successful. Currently, the only option would be to repopulate the entire database, but because of the same problems that would result in even more CVEs missing.

Currently, source_process.py:158-161 compares the lastModified date of the most recent CPE, and 703-706 does the same for CVEs:

            else:
                last_mod_start_date = self.database[self.feed_type.lower()].find_one(
                    {}, {"lastModified": 1}, sort=[("lastModified", -1)]
                )

We need a way to define a timestamp that could be used instead, so that all items modified after that timestamp are fetched and updated regardless of whether they are already there, and the missing documents are inserted.

@P-T-I
Member

P-T-I commented Jul 2, 2024

@oh2fih That's a very good point, that would be great! The NIST API gives a lot of causes of concern.... I've opened #221 in the past; if NIST's API keeps erroring out, this might be a feasible alternative; don't you think?

@oh2fih
Collaborator Author

oh2fih commented Jul 2, 2024

Manual solution using mongosh

There is a manual solution using mongosh alone that does not require any modifications to CveXplore.

Commands

The mongosh commands, for easier copy & paste:

var updateSince = ISODate('2024-06-29T00:00:00.000Z')
db.cpe.updateMany({ lastModified: { $gt: updateSince }}, { $set: { lastModified: updateSince }})
db.cves.updateMany({ lastModified: { $gt: updateSince }}, { $set: { lastModified: updateSince }})

Example results

cvedb> var updateSince = ISODate('2024-06-29T00:00:00.000Z')

cvedb> db.cpe.updateMany({ lastModified: { $gt: updateSince }}, { $set: { lastModified: updateSince }})
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 797,
  modifiedCount: 797,
  upsertedCount: 0
}
cvedb> db.cves.updateMany({ lastModified: { $gt: updateSince }}, { $set: { lastModified: updateSince }})
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 347,
  modifiedCount: 347,
  upsertedCount: 0
}
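
For anyone who prefers Python over mongosh, the same rewind could be sketched roughly as below. This is only an illustration: `rewind_last_modified` is a hypothetical helper, not part of CveXplore. It relies on the updater taking the newest `lastModified` in the collection as its start date (see the snippet from source_process.py above), so clamping newer values back to `since` makes the next update fetch everything modified after that point.

```python
from datetime import datetime, timezone


def rewind_last_modified(collection, since):
    """Clamp every lastModified newer than `since` back to `since`.

    `collection` is assumed to be a pymongo-style collection exposing
    update_many(filter, update). Hypothetical helper, not a CveXplore API.
    """
    result = collection.update_many(
        {"lastModified": {"$gt": since}},
        {"$set": {"lastModified": since}},
    )
    # Number of documents whose timestamp was actually changed.
    return result.modified_count


# Usage sketch (requires pymongo and a running MongoDB; the database
# name "cvedb" is taken from the mongosh prompt above):
#
#   from pymongo import MongoClient
#   db = MongoClient()["cvedb"]
#   since = datetime(2024, 6, 29, tzinfo=timezone.utc)
#   for name in ("cpe", "cves"):
#       print(name, rewind_last_modified(db[name], since))
```

As with the mongosh version, this overwrites the stored `lastModified` values until the next successful update replaces them.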

Possible feature

This could be added as a feature that does not require mongosh at all. Example usage would be:

from CveXplore import CveXplore
cvx = CveXplore()
cvx.database.update("cpe", 7)
cvx.database.update("cves", 7)

That would mean that, in MainUpdater(), update() would be defined as:

    def update(self, update_source: str | list = None, manual_days: int = 0):

That value would then be passed all the way through to the comparisons in each source; if it is set, it should override the value read from the database.
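
A minimal sketch of that override logic, under some assumptions: `resolve_start_date` is a hypothetical helper name, and treating `manual_days=0` as "no override" is inferred from the default in the proposed signature. The real implementation in each source may look different.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_start_date(
    db_last_modified: Optional[datetime],
    manual_days: int = 0,
    now: Optional[datetime] = None,
) -> Optional[datetime]:
    """Pick the timestamp from which items should be fetched.

    db_last_modified: newest lastModified found in the database, or None.
    manual_days: if > 0, ignore the database value and go back this many days.
    now: injectable clock for testing; defaults to the current UTC time.
    """
    if manual_days > 0:
        current = now or datetime.now(timezone.utc)
        return current - timedelta(days=manual_days)
    # No override requested: fall back to what the database says.
    return db_last_modified


if __name__ == "__main__":
    db_value = datetime(2024, 7, 2, tzinfo=timezone.utc)
    print(resolve_start_date(db_value))                 # database value
    print(resolve_start_date(db_value, manual_days=7))  # seven days ago
```

Keeping the decision in one small function would let the CPE and CVE sources share the same rule instead of duplicating the comparison.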

@oh2fih
Collaborator Author

oh2fih commented Jul 2, 2024

> That's a very good point, that would be great! The NIST API gives a lot of causes of concern.... I've opened #221 in the past; if NIST's API keeps erroring out, this might be a feasible alternative; don't you think?

@P-T-I Using the files might work for now, but we moved to the API mainly for the reason that the JSON feeds had been planned to retire in 2023, although it has other advantages, too. The retirement is currently postponed to "in 2024" in the timeline.

Timeline
December 2023: The NVD will retire all 1.0 APIs on December 18th.
2024: The NVD will retire the Legacy Data Feed Files once improvements for bulk download capabilities of the NVD dataset are implemented.

While we could guess that it might be postponed again, I'm not so sure we should be building new features on top of deprecated data sources.

The latest status update says:

May 29, 2024: NIST has awarded a contract for additional processing support for incoming Common Vulnerabilities and Exposures (CVEs) for inclusion in the National Vulnerability Database. We are confident that this additional support will allow us to return to the processing rates we maintained prior to February 2024 within the next few months.

I hope the current timeouts and 503 errors are caused by the increasing usage instead of Analygence's consultants breaking up the API. 😅 We should get quite good services with the $125M of U.S. tax money invested in this! 💸

oh2fih changed the title from "Partial repopulation in case NVD NIST API fails" to "Partial repopulation in case NVD API fails" on Jul 2, 2024
@P-T-I
Member

P-T-I commented Jul 2, 2024

> @P-T-I Using the files might work for now, but we moved to the API mainly for the reason that the JSON feeds had been planned to retire in 2023, although it has other advantages, too.

I don't mean falling back to the json feeds from NIST; I was looking more along the lines of https://github.com/fkie-cad/nvd-json-data-feeds which has cached NIST data; and would be a nice fall back?

@oh2fih
Collaborator Author

oh2fih commented Jul 2, 2024

Ah, yes, those might live longer, and the repository states they are using the same API for collecting the data.

@oh2fih
Collaborator Author

oh2fih commented Jul 3, 2024

@P-T-I cve-search/cve-search#1110 is on tap to add a command line option -d to db_updater.py for using this feature through CVE-Search; could be merged after there is a release of CveXplore that supports it. 😉
