duplicated projects in landscape.yml #4223

Open
ronaldpetty opened this issue Jan 26, 2025 · 2 comments

@ronaldpetty
Contributor

Hi,
I am doing some research for the CN AI WG and was processing the landscape.yml list. I'm not sure whether this is intended, but I found a few projects that are listed more than once. I can see how that might happen legitimately, but given the first one I checked (Apache Spark), it is unclear whether it's right.

% curl -s https://raw.githubusercontent.com/cncf/landscape/refs/heads/master/landscape.yml | yq -p yaml -o json | jq '.landscape[].subcategories[].items[].name'  | sort | uniq -c | sort -n
... (all above here are listed 1 time)
   1 "youki"
   1 "zot"
   2 "Alluxio"
   2 "Apache Spark"
   2 "Cassandra"
   2 "ClickHouse"
   2 "DeepFlow"
   2 "Grafana"
   2 "Kafka"
   2 "Numaflow"
   2 "Okahu"
   2 "OpenLIT"
   2 "OpenLLMetry"
   2 "Permify"
   2 "Presto"
   2 "Prometheus"
   2 "Pulsar"
   2 "Redis"
   2 "SpiceDB"
   2 "Upbound (member)"
   2 "Weaviate"

Just to focus on Apache Spark: it is listed under two subcategories:

  • Streaming & Messaging
  • Data Architecture

I would not list Spark under Streaming & Messaging. Sure, it streams things, but that's not the point of it. I think Data Architecture is fine.

Not to nitpick one entry, but is there a rule about a project being listed more than once?
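
In case it helps anyone reviewing the duplicates, here is a rough Python sketch that reports where each repeated name is listed, rather than only how many times. It assumes the YAML has already been converted to JSON (for example with the same `yq -p yaml -o json` step as above) and that categories and subcategories each carry a `name` field alongside `subcategories` and `items`. The function name `find_duplicates`, the category labels, and the sample data are made up for illustration.

```python
from collections import defaultdict

def find_duplicates(landscape):
    """Map each item name listed more than once to the places it appears."""
    places = defaultdict(list)
    for category in landscape.get("landscape") or []:
        for sub in category.get("subcategories") or []:
            for item in sub.get("items") or []:
                # Record "category / subcategory" for every occurrence of the name.
                places[item["name"]].append(f'{category["name"]} / {sub["name"]}')
    return {name: locs for name, locs in places.items() if len(locs) > 1}

# Hand-made sample in the shape the jq path implies; the category names are
# placeholders, not the real categories in landscape.yml.
sample = {
    "landscape": [
        {"name": "Example Category A", "subcategories": [
            {"name": "Streaming & Messaging",
             "items": [{"name": "Apache Spark"}, {"name": "Kafka"}]},
        ]},
        {"name": "Example Category B", "subcategories": [
            {"name": "Data Architecture",
             "items": [{"name": "Apache Spark"}]},
        ]},
    ]
}

for name, locs in sorted(find_duplicates(sample).items()):
    print(name, "->", "; ".join(locs))
```

To run it on the real file, one could save the JSON output of the command above to a file and pass the result of `json.load` to `find_duplicates` instead of `sample`.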

@ronaldpetty
Contributor Author

Adding another related question: is there a general set of rules? Another thing that came to mind while looking was the CNCF level (graduated, sandbox, etc.). Some entries don't have one. Others (Spark) are not CNCF projects at all. What led me here was trying to classify CNCF projects by their maturity level (and then do other investigations). It is unclear whether I can do that with this list (or maybe I can, if the missing levels mean non-CNCF projects rather than missing data).

Happy to help fill things in (I did the initial CNAI subcategory). Any guidance appreciated.

@ronaldpetty
Contributor Author

ronaldpetty commented Jan 26, 2025

The numbers on the CNCF maturity blog post do seem to match the numbers I am calculating here:

% curl -s https://raw.githubusercontent.com/cncf/landscape/refs/heads/master/landscape.yml | yq -p yaml -o json | jq '.landscape[].subcategories[].items[].project' | grep incubating | wc -l
      37
% curl -s https://raw.githubusercontent.com/cncf/landscape/refs/heads/master/landscape.yml | yq -p yaml -o json | jq '.landscape[].subcategories[].items[].project' | grep sandbox | wc -l   
     128
% curl -s https://raw.githubusercontent.com/cncf/landscape/refs/heads/master/landscape.yml | yq -p yaml -o json | jq '.landscape[].subcategories[].items[].project' | grep graduated | wc -l 
      30

So maybe the projects missing a "project" field are sandbox hopefuls or not CNCF-hosted?

Another way to merge the above into a single command:

% curl -s https://raw.githubusercontent.com/cncf/landscape/refs/heads/master/landscape.yml | yq -p yaml -o json | jq -r '.landscape[].subcategories[].items[] | select(.project == "sandbox" or .project == "incubating" or .project == "graduated") | "\(.project) \(.name) \(.repo_url)"' | sort | awk '{print $1}' | uniq -c
  30 graduated
  37 incubating
 128 sandbox
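
Since some names appear more than once in the list, a tally like this could double count if a duplicated entry also carries a `project` value. Here is a minimal Python sketch that counts each name once and puts entries without a `project` field in their own bucket. It assumes the file has been converted to JSON first and that a repeated name always refers to the same project, which I have not verified. The function name `maturity_counts`, the `MISSING` label, and the sample entries are made up for illustration.

```python
from collections import Counter

MISSING = "(no project field)"  # label chosen here, not a value from landscape.yml

def maturity_counts(landscape):
    """Count distinct item names per `project` value, bucketing items without one."""
    level_by_name = {}
    for category in landscape.get("landscape") or []:
        for sub in category.get("subcategories") or []:
            for item in sub.get("items") or []:
                level = item.get("project") or MISSING
                name = item["name"]
                # Assumption: a repeated name is the same project, so it is
                # counted once; a real level wins over a missing one.
                if level_by_name.get(name, MISSING) == MISSING:
                    level_by_name[name] = level
    return Counter(level_by_name.values())

# Hypothetical entries: proj-a and proj-b are each listed in two subcategories.
sample = {"landscape": [{"name": "Example Category", "subcategories": [
    {"name": "Sub 1", "items": [
        {"name": "proj-a", "project": "graduated"},
        {"name": "proj-b"},
    ]},
    {"name": "Sub 2", "items": [
        {"name": "proj-a", "project": "graduated"},
        {"name": "proj-b"},
        {"name": "proj-c", "project": "sandbox"},
    ]},
]}]}

counts = maturity_counts(sample)
for level in sorted(counts):
    print(counts[level], level)
```

If the per-name totals from a run on the real data differ from the `uniq -c` totals above, that would point at duplicated entries that carry a maturity level.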

ronaldpetty changed the title from "duplicates projects" to "duplicated projects in landscape.yml" on Jan 26, 2025