Official way to synchronize the JSON 5.0 feeds #16

Closed
ncrocfer opened this issue Mar 26, 2023 · 6 comments
Labels
question Further information is requested

Comments

@ncrocfer

ncrocfer commented Mar 26, 2023

Hello,

First of all thank you for the awesome work you do concerning the CVE ecosystem!

I'm the developer of a CVE-related tool, and I would like to add MITRE as a data source (instead of relying only on NVD, as I do now). But to be honest, I don't really know how to parse your feed.

So I would like to ask what the official, recommended way is to synchronize our local databases with the new JSON 5.0 CVE list.

I searched your blog posts and, if I'm not wrong, you're currently in the "Soft Deploy" state, meaning CNAs now use the new format to declare CVEs. The "Hard Deploy" is targeted for Q1 2023. From then on we (as consumers) will be able to officially use the JSON 5.0 feeds.

But where can we find the list? I think the old formats (csv, html, text, xml) will be removed, so maybe you will provide an API (or something similar to what NVD offers) to fetch the latest changes?

Or maybe this repo (cveproject/cvelistv5) will become our official data feed? If so, do you recommend using the recent_activities.json file to detect changes, or simply running git pull periodically and parsing the new diffs?

Thank you in advance for your answer,
Nicolas

@hkong-mitre
Collaborator

As of 3/28/2023, this repository is now the official way to download/update all published CVEs from the official CVE Project. You can think of it as a cache that is updated multiple times an hour.

There are now 3 methods to download/sync the CVEs:

  1. If you are comfortable with git, use any git client and run git clone https://github.com/CVEProject/cvelistV5.git as you would for any GitHub repository. The initial git clone is quite large (about 1.7 GB), but each successive git pull will quickly update your local clone. This is the preferred approach and can be easily automated.
  2. If you prefer zip files, use this repository's Releases page, where you can choose to download a "baseline" zip containing all CVEs as of midnight (GMT), an hourly zip containing all new/updated CVEs since midnight (GMT), and/or a release note enumerating all the new/updated CVEs since midnight (GMT). This approach uses about 1.5 GB of storage. Use this method if you need a daily sync (e.g., at or close to midnight GMT every night) or hourly syncs throughout the day.
  3. If you want to download all current CVEs infrequently, use GitHub's "Download Zip" link. This downloads all of the current CVEs in a single large zip file. This method is not recommended for sync purposes, since it downloads all CVEs every time.
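
The git-based approach (method 1) can be automated with a small wrapper. The following is a minimal sketch rather than an official tool: `sync_command` and `sync` are hypothetical helper names, the local directory is whatever you choose, and `--ff-only` is an added safety choice (an assumption, not something the maintainers prescribe) so that a clone with stray local commits fails loudly instead of merging.

```python
import subprocess
from pathlib import Path

# Repository URL as given in method 1 above.
REPO_URL = "https://github.com/CVEProject/cvelistV5.git"


def sync_command(local_dir: Path) -> list[str]:
    """Return the git command that brings local_dir up to date.

    Clones when no checkout exists yet, otherwise pulls.
    """
    if (local_dir / ".git").is_dir():
        return ["git", "-C", str(local_dir), "pull", "--ff-only"]
    return ["git", "clone", REPO_URL, str(local_dir)]


def sync(local_dir: Path) -> None:
    """Run the clone or pull; raises CalledProcessError if git fails."""
    subprocess.run(sync_command(local_dir), check=True)


# Usage (the first run downloads the full repository, later runs are incremental):
# sync(Path("cvelistV5"))
```

Separating the command construction from its execution keeps the decision logic testable without touching the network.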

@hkong-mitre hkong-mitre added good first issue Good for newcomers question Further information is requested labels Apr 3, 2023
@hkong-mitre hkong-mitre pinned this issue Apr 3, 2023
@hkong-mitre hkong-mitre removed the good first issue Good for newcomers label Apr 3, 2023
@bytedancer1

None of these options are attractive. It should be easy for people to subscribe to a JSON feed, or even just a list of the CVE IDs of updated CVEs, as they happen.

Sure, you can do this by cloning the repo, but that requires a large amount of disk space and an environment where you can run git. This is very inconvenient in many contexts where it would be nice to be able to obtain this information: containers, FaaS, etc.

The hourly release files are, first of all, hourly, but OK. They are also a lot of work to deal with: you have to unzip them, then sort out which files are new since the last hourly release, because for some reason each one includes everything since midnight. You also republish the last 10 years up until midnight every day, which seems like an awful waste of disk space and bandwidth, though that is beside the point.

And it's unnecessary. Publishing anything solely via git requires subscribers to run git, which does not encourage people to subscribe. This information really deserves to be easy to obtain.

@ncrocfer
Author

I received a notification since I’m the creator of this thread, so I’m taking the liberty of responding 🙂

First, a quick disclosure: I am not part of MITRE, and the project I work on is not affiliated with MITRE in any way. The response below reflects only my personal opinion and not that of MITRE.

I disagree with your statement. MITRE’s role is clear and well-defined: The CVE Program partners with community members worldwide to grow CVE content and expand its usage. They manage the CVE dictionary by working with trusted partners. They also provide an online service (www.cve.org) that allows users to search for CVEs (e.g., SQL injection-related CVEs). However, they cannot cover all possible user needs—including yours: "subscribe to a JSON feed or even just a list of CVE IDs of updated CVEs as they happen."

Retrieving the latest updates from the Git repository and extracting the necessary information is quite straightforward. This is what most users do, and it's also the approach we took when developing OpenCVE. If the existing solutions don’t fully meet your needs, I encourage you to develop your own tooling.

And while we're on the topic, I’d like to take this opportunity to once again thank MITRE for the outstanding work they do in maintaining such a high-quality database!

@M-nj
Collaborator

M-nj commented Feb 28, 2025

@bytedancer1 I believe the delta files might be of interest to you?

For changes that happened in the latest commit, see cves/delta.json.

For changes that happened in the latest few commits, see cves/deltaLog.json.
Note: deltaLog.json does not guarantee a set time window (e.g. last X days) or number of commits (e.g. last X commits). Entries older than X days are rotated out of the log, but a few of the older commits may also be dropped if the file holds too many. The source code can be found in the cvelist-bulk-download repo (TS/JS), specifically the automated GitHub action that runs the update/commit command, and the UpdateCommand, Delta, and DeltaLog files.

Quick rundown for the delta file contents:

{
  // fetchTime: the UTC+0/Z timestamp for when this update command started
  "fetchTime": "1970-01-01T00:00:00.000Z", 

  // numberOfChanges: the total number of CVE IDs that have changed since the last update command/commit (new + updated)
  "numberOfChanges": 3,  

  // new: An array for CVEs that have been newly published since the last update command/commit (newly published, including any number of updates since it has been published until now)
  "new": [ 
    {
      // [new/updated].cveId: The CVE ID (that was newly published or updated)
      "cveId": "CVE-1970-0001",

      // [new/updated].cveOrgLink: Where you can find this record on the official CVE Project website. (The version here does not lag behind the live data).
      "cveOrgLink": "https://www.cve.org/CVERecord?id=CVE-1970-0001",

      // [new/updated].githubLink: Where you can find the raw JSON contents of this CVE (via this github repo).
      "githubLink": "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/1970/0xxx/CVE-1970-0001.json",

      // [new/updated].dateUpdated: The CVEID's cveMetadata.dateUpdated value (the last time it was updated before this update command/commit)
      "dateUpdated": "1969-12-31T23:59:59.999Z"
    }
  ],

  // updated: An array for CVEs that have been updated since the last update command/commit (CVE contents may have changed multiple times, but only the latest content is made available here). 
  // The items in the updated array follow the same pattern as the new array.
  "updated": [
    {
      "cveId": "CVE-1970-0002",
      "cveOrgLink": "https://www.cve.org/CVERecord?id=CVE-1970-0002",
      "githubLink": "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/1970/0xxx/CVE-1970-0002.json",
      "dateUpdated": "1969-12-31T23:59:59.999Z"
    },
    {
      "cveId": "CVE-1970-0003",
      "cveOrgLink": "https://www.cve.org/CVERecord?id=CVE-1970-0003",
      "githubLink": "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/1970/0xxx/CVE-1970-0003.json",
      "dateUpdated": "1969-12-31T23:59:59.999Z"
    }
  ],

  // error: This is meant to show CVE IDs that had an issue getting updated or for if something went wrong. 
  "error": []
}
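
To illustrate how a consumer might read this structure, here is a minimal sketch. It assumes the published delta.json is plain JSON without the `//` annotations shown above (Python's `json` module would reject them); `changed_cve_ids` is a hypothetical helper name, and the sample payload is a trimmed copy of the example rather than real data.

```python
import json


def changed_cve_ids(delta: dict) -> dict[str, list[str]]:
    """Group the CVE IDs in one delta.json payload by change type."""
    return {
        kind: [entry["cveId"] for entry in delta.get(kind, [])]
        for kind in ("new", "updated")
    }


# Trimmed copy of the annotated example, with the comments removed.
sample = json.loads("""
{
  "fetchTime": "1970-01-01T00:00:00.000Z",
  "numberOfChanges": 3,
  "new": [{"cveId": "CVE-1970-0001"}],
  "updated": [{"cveId": "CVE-1970-0002"}, {"cveId": "CVE-1970-0003"}],
  "error": []
}
""")

print(changed_cve_ids(sample))
```

Since the delta files are described as subject to change, a real consumer would probably want to tolerate missing keys, which is why the sketch uses `delta.get(kind, [])`.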

The deltaLog file is just an array of the historical delta.json contents (including the latest delta.json contents).
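
Because deltaLog.json is an array of delta objects, collecting everything fetched after a given point reduces to a filter on `fetchTime`. The sketch below assumes each element has the same shape as delta.json; `parse_utc` and `ids_fetched_since` are hypothetical names. Keep in mind that `fetchTime` is when the update command started, not when a record changed, and that the log has no guaranteed time window.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse timestamps such as '1970-01-01T00:00:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def ids_fetched_since(delta_log: list[dict], since: str) -> list[str]:
    """Return sorted, de-duplicated CVE IDs from deltas fetched at or after `since`."""
    cutoff = parse_utc(since)
    ids: set[str] = set()
    for delta in delta_log:
        if parse_utc(delta["fetchTime"]) >= cutoff:
            for kind in ("new", "updated"):
                ids.update(entry["cveId"] for entry in delta.get(kind, []))
    return sorted(ids)
```

A CVE that changed several times in the window appears once in the result, which matches the "only the latest content is made available" behavior described for the `updated` array.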

Please note that the delta files are subject to change. If you wish to provide a new solution that the CVE team should implement, I suggest joining the Automation Working Group (AWG) which would be an appropriate place to suggest new features.

@bytedancer1

Thanks for the response. I will look into the AWG.

I have looked at the delta files, but the same issue remains: you are forced to run git and keep a local copy of the repository, or you are beholden to GitHub's API with its rate limits, token limits, and costs. This is not how free information is supposed to be distributed. GitHub can cut off anyone at any time, and it requires registration with a commercial entity just to use the API without extreme rate limit constraints.

About your specific suggestions: delta.json is promising, but it changes very quickly, so unless you get it from git, it's not useful. deltaLog.json can be obtained with the GitHub API on a regular basis, but it is SIX MEGABYTES. I want to check for new CVE releases/updates every minute or more, and I don't want to have to download 6 megabytes to find out what might have happened in the last minute or five.

I'll go check out the AWG, and thanks again for your response.

A delta log for the last 24 hours or so, published on a MITRE HTTPS server for anonymous users to consume, would be awesome: something you can download quickly to see what happened over the last 24 hours, without needing to register or pay. deltaLog also leaves out the most interesting link, which is the best way to get any given CVE's record: https://cveawg.mitre.org/api/cve/CVE-2025-0001 . This is much more useful than a link to the git repo. You can't really reliably automate anything with those GitHub links; they are made for people to use in browsers.
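
For reference, both link styles mentioned in this thread can be derived from a CVE ID alone. This is a sketch under assumptions: the function names are hypothetical, the CVE Services URL pattern is taken from the example link in the comment above, and the directory rule (last three digits of the sequence number replaced by `xxx`) is inferred from the `githubLink` values in the delta example rather than from any documented contract.

```python
import re

# CVE IDs have a four-digit year and a sequence number of four or more digits.
CVE_ID = re.compile(r"CVE-(\d{4})-(\d{4,})")


def cve_services_url(cve_id: str) -> str:
    """Build the CVE Services record URL for a CVE ID."""
    if CVE_ID.fullmatch(cve_id) is None:
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    return f"https://cveawg.mitre.org/api/cve/{cve_id}"


def raw_github_url(cve_id: str) -> str:
    """Build the raw GitHub URL for a CVE ID (directory rule is an assumption)."""
    match = CVE_ID.fullmatch(cve_id)
    if match is None:
        raise ValueError(f"not a CVE ID: {cve_id!r}")
    year, number = match.groups()
    bucket = number[:-3] + "xxx"  # e.g. "0001" -> "0xxx"
    return (
        "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/"
        f"cves/{year}/{bucket}/{cve_id}.json"
    )
```

Validating the ID before building a URL avoids sending malformed requests to either service.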

@M-nj
Collaborator

M-nj commented Mar 3, 2025

These are valid concerns that can absolutely be brought up to the AWG.

For the time being, this repo is about as real-time as an unauthenticated/anonymous user can get in a bulk format. I am honestly not sure if there will be a better option anytime in the near future. There is a cost concern with having many people hammer the CVE Services API with constant 'get changes since arbitrary timestamp' requests.

These might not quash your concerns but here are some dev scripts that might be of interest to you:

Check to see if remote git repo's latest commit matches your local version:

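# Run from inside your local clone.
local=$(git log -n 1 --format=%H)
# Ask for refs/heads/main explicitly so that no other ref ending in "main" can match.
remote=$(git ls-remote https://github.com/CVEProject/cvelistV5.git refs/heads/main | awk '{print $1}')
if [ "$local" != "$remote" ]; then echo 'out of sync, needs git pull'; else echo 'up to date'; fi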
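
The same check can be done from Python when a shell is not convenient. This is a minimal sketch assuming git is installed and `repo_dir` points at a local clone; `parse_ls_remote` and `is_up_to_date` are hypothetical names.

```python
import subprocess
from typing import Optional

REPO_URL = "https://github.com/CVEProject/cvelistV5.git"


def parse_ls_remote(output: str, ref: str = "refs/heads/main") -> Optional[str]:
    """Extract the commit hash for `ref` from `git ls-remote` output."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def is_up_to_date(repo_dir: str) -> bool:
    """Compare the local HEAD with the remote main branch."""
    local = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    listing = subprocess.run(
        ["git", "ls-remote", REPO_URL, "refs/heads/main"],
        check=True, capture_output=True, text=True,
    ).stdout
    return local == parse_ls_remote(listing)
```

`git ls-remote` transfers only the ref listing, so this check is cheap compared with a pull.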

Get CVE IDs changed since timestamp in repo (after a git pull)

timestamp="2025-03-03T05:00:00Z"
git pull
# List the files touched since the timestamp, keep only the CVE IDs, and de-duplicate them.
git log --since="$timestamp" --name-only --pretty=format: | grep -oE 'CVE-[0-9]{4}-[0-9]{4,}' | sort -u

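NOTE: the query filters on commit time, not on the time the CVE record itself changed. If a change was made at minute A and the CVE update command fetched it at minute C, the change will still show up in a query that starts at minute B (where A < B < C).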
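
The shell pipeline above translates directly to Python for anyone embedding it in a larger tool. This is a sketch assuming git is installed and `repo_dir` is an up-to-date local clone; `cve_ids_in` and `changed_since` are hypothetical names.

```python
import re
import subprocess

CVE_ID_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def cve_ids_in(text: str) -> list[str]:
    """Return the sorted, de-duplicated CVE IDs found in `text`."""
    return sorted(set(CVE_ID_PATTERN.findall(text)))


def changed_since(repo_dir: str, timestamp: str) -> list[str]:
    """List CVE IDs whose files were touched by commits made since `timestamp`."""
    log = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--since={timestamp}",
         "--name-only", "--pretty=format:"],
        check=True, capture_output=True, text=True,
    ).stdout
    return cve_ids_in(log)


# Usage:
# changed_since("cvelistV5", "2025-03-03T05:00:00Z")
```

As with the shell version, the filter applies to commit time, so a record changed shortly before the timestamp can still appear if it was committed afterwards.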
