March 2025 ASF Board Report (March 12) #13713

Closed
4 tasks done
alamb opened this issue Dec 10, 2024 · 11 comments

alamb commented Dec 10, 2024

Related Items:

Is your feature request related to a problem or challenge?

Per https://whimsy.apache.org/roster/committee/datafusion the DataFusion ASF board report schedule is

March, June, September, December

Describe the solution you'd like

I would like to draft a board report for the ASF board meeting, ideally with community help.

The meetings are typically in the second or third week of the month

Describe alternatives you've considered

I plan to do this in the same style that worked well in Arrow (see an example from @andygrove
here: https://lists.apache.org/thread/7w4mgy98qomc6drvj2fo81gvhq6p0boc):

  • Make a Google doc (in addition to this issue) that people can add relevant content to
  • Solicit community feedback for the various sub projects
  • The chair (me for the time being) submits it to the board
  • Create a ticket for the next report

Here is a past example:

Additional context

No response

alamb added the enhancement (New feature or request) label Dec 10, 2024
alamb self-assigned this Feb 28, 2025

alamb commented Mar 1, 2025

Here is a google doc to coordinate board reporting: https://docs.google.com/document/d/11b2GEmPh5gblWWegeZi3G38e97vRqHSRElkLTwZHrjY/edit?tab=t.0

Please feel free to post comments to the doc or this ticket.

Current draft is below.

Description:

The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

Project Status:

Current project status: New + Ongoing (high activity)
Issues for the board: None

Membership Data:

Apache DataFusion was founded 2024-04-16 (10 months ago)
There are currently 43 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:1.

Community changes, past quarter:

  • Jonah Gao was added to the PMC on 2024-12-16
  • Piotr Findeisen was added as committer on 2024-12-03
  • Ruiqiu Cao was added as committer on 2024-12-10
  • Yongting You was added as committer on 2025-01-18

Note that almost all communication for DataFusion and its subprojects happens on GitHub, so our dev mailing list traffic is fairly light.

Project Activity:

Overall

DataFusion core

45.0.0 was released on 2025-02-07.
44.0.0 was released on 2024-12-31.

https://github.com/apache/datafusion

Releases continue monthly and the project has been very active with many commits a day. It seems many new projects have been using DataFusion for query processing, which brings more contributors but also means we are spending more time fielding questions and figuring out how many more features to accept.

Bruce Ritchie recently authored a blog about some of the features and the outlook for the next 6 months.

We have been focusing more recently on pre-release testing and making it easier for downstream consumers to use DataFusion, which is still a challenge given how fast the project is moving.

Sub project: DataFusion Python

https://github.com/apache/datafusion-python

PYTHON-45.2.0 was released on 2025-02-23.
PYTHON-44.0.0 was released on 2025-02-07.
PYTHON-43.1.0 was released on 2024-12-12.

Sub project: DataFusion Comet

https://github.com/apache/datafusion-comet

COMET-0.6.0 was released on 2025-02-17.
COMET-0.5.0 was released on 2025-01-17.

Sub project: DataFusion Ballista

https://github.com/apache/datafusion-ballista

(New!) Sub project: DataFusion Ray

https://github.com/apache/datafusion-ray

This is a new project that aims to make it easier to run DataFusion in a distributed environment using the https://www.ray.io/ compute engine.

Sub project: Sqlparser

We have made two releases since sqlparser became part of DataFusion.

  • SQLPARSER-0.54.0 was released on 2025-01-23.
  • SQLPARSER-0.53.0 was released on 2024-12-18.

Ifeanyi / iffyio is doing a great job reviewing PRs to keep the code consistent and flowing.

Community Health:

While we as always struggle to get enough code review capacity, we have many
active committers, and the community in general helps each other out with
reviews. We continue to actively grow our committer and PMC ranks.

We had several in person meetups in Chicago, Boston, and Amsterdam, though we don’t have any more

alamb commented Mar 1, 2025

FYI @iffyio and @robtandy in case you wanted to suggest any additions for datafusion ray / sqlparser

alamb commented Mar 1, 2025

Mailing list announcement: https://lists.apache.org/thread/7g8b66wdhpdj9tn77ptzy2790bj3l47d

kevinjqliu commented

I feel like there are a few items from the blog post that would be great to include in the report.

Specifically,

> In the core DataFusion repo alone we reviewed and accepted almost 1600 PRs from 206 different committers, created over 1100 issues and closed 751 of them 🚀. All changes are listed in the detailed changelogs.

and

> DataFusion has put in an application to be part of Google Summer of Code with a number of ideas for projects with mentors already selected. Additionally, some ideas on how to make DataFusion an ideal selection for university database projects such as the CMU database classes have been put forward.

For DataFusion Python, I think support for FFI TableProvider (#12920) would be a great callout, along with the new user documentation on FFI (apache/datafusion-python#1031)

robtandy commented Mar 5, 2025

Working so hard at the moment to get DataFusionRay 0.1.0 out! Hopefully we can do that before the announcement and then there should be plenty to add.

alamb commented Mar 5, 2025

> Working so hard at the moment to get DataFusionRay 0.1.0 out! Hopefully we can do that before the announcement and then there should be plenty to add.

I don't think the board report needs lots of detail -- just highlights are fine (that show progress). Your update is already good in my opinion @robtandy -- no need to do more.

robtandy commented Mar 5, 2025

ha! ok good. I consider myself off the hook for a more detailed update!

alamb commented Mar 5, 2025

> ha! ok good. I consider myself off the hook for a more detailed update!

Indeed -- ship the code! Not reports!

alamb changed the title from "March 2025 ASF Board Report" to "March 2025 ASF Board Report (March 12)" Mar 7, 2025

alamb commented Mar 10, 2025

I have incorporated @robtandy's and @kevinjqliu's comments. Here is the current draft:

## Description:
The mission of Apache DataFusion is the creation and maintenance of software 
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (10 months ago)
There are currently 43 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:1.

Community changes, past quarter:
- Jonah Gao was added to the PMC on 2024-12-16
- Piotr Findeisen was added as committer on 2024-12-03
- Ruiqiu Cao was added as committer on 2024-12-10
- Yongting You was added as committer on 2025-01-18

Note that almost all communication for DataFusion and its subprojects happens on GitHub, so our dev mailing list traffic is fairly light.

## Project Activity:

### Overall

DataFusion is participating in Google Summer of Code with a number of ideas for projects with mentors already selected[1][2][3]. Additionally, some ideas on how to make DataFusion an ideal selection for university database projects such as the CMU database classes have been put forward.

[1]: https://github.com/apache/datafusion/issues/14577
[2]: https://summerofcode.withgoogle.com/programs/2025/organizations/apache-datafusion
[3]: https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html


### DataFusion core

https://github.com/apache/datafusion

- 46.0.0 was released on 2025-03-07.
- 45.0.0 was released on 2025-02-07.
- 44.0.0 was released on 2024-12-31.

Releases continue monthly and the project has been very active with many commits a day. It seems more new projects have been using DataFusion for query processing, which brings more contributors but also means we are spending more time fielding questions and figuring out how many more features to accept.

Bruce Ritchie recently authored a [blog] about some of the features and the outlook for the next 6 months. A relevant quote:

> In the core DataFusion repo alone we reviewed and accepted almost 1600 PRs from 206 different committers, created over 1100 issues and closed 751 of them 🚀. 

We have been focusing more recently on pre-release testing and making it easier for downstream consumers to use DataFusion, which is still a challenge given how fast the project is moving. 

[blog]: https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0/

### Sub project: DataFusion Python

https://github.com/apache/datafusion-python

- PYTHON-45.2.0 was released on 2025-02-23.
- PYTHON-44.0.0 was released on 2025-02-07.
- PYTHON-43.1.0 was released on 2024-12-12.

We have been working on making it easier to interoperate with other systems, including support for FFI TableProvider ([#12920]) and new user documentation on FFI ([#1031]).

[#12920]: https://github.com/apache/datafusion/pull/12920 
[#1031]: https://github.com/apache/datafusion-python/pull/1031 

### Sub project: DataFusion Comet

https://github.com/apache/datafusion-comet

- COMET-0.6.0 was released on 2025-02-17.
- COMET-0.5.0 was released on 2025-01-17.

You can read about the recent happenings in Comet in the [0.6.0 blog]


[0.6.0 blog]: https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0/


### Sub project: DataFusion Ballista

https://github.com/apache/datafusion-ballista

- BALLISTA-44.0.0 was released on 2025-03-05.

There has been some renewed interest in this project as the foundation for distributed query engines, and we made a new release recently.

### (New!) Sub project: DataFusion Ray

https://github.com/apache/datafusion-ray 

This is a new project that aims to make it easier to run DataFusion in a distributed environment using the https://www.ray.io/ compute engine.

Contributors are working hard at the moment to get DataFusionRay 0.1.0 out! Hopefully we can do that before the announcement and then there should be plenty to add.

### Sub project: sqlparser-rs

https://github.com/apache/datafusion-sqlparser-rs 

We have made three releases since sqlparser became part of DataFusion.

- SQLPARSER-0.55.0 was released on 2025-03-05.
- SQLPARSER-0.54.0 was released on 2025-01-23.
- SQLPARSER-0.53.0 was released on 2024-12-18.

Ifeanyi Ubah (iffyio) is doing a great job reviewing PRs to keep the code consistent and flowing. 

## Community Health:

While we as always struggle with code review capacity, we have many
active committers, and the community in general helps each other out with
reviews. We continue to actively grow our committer and PMC ranks.

We had several in-person meetups in Chicago, Boston, and Amsterdam, and are working on organizing one in London in April 2025[4].

[4]: https://github.com/apache/datafusion/discussions/14647

alamb commented Mar 12, 2025

Submitted!

Final Draft

## Description:
The mission of Apache DataFusion is the creation and maintenance of software 
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None


## Membership Data:
Apache DataFusion was founded 2024-04-16 (a year ago)
There are currently 44 committers and 15 PMC members in this project.
The Committer-to-PMC ratio is roughly 3:1.

Community changes, past quarter:
- Jonah Gao was added to the PMC on 2024-12-16
- Parth Chandra was added as committer on 2025-03-10
- Yongting You was added as committer on 2025-01-18


## Project Activity:

### Overall

DataFusion is participating in Google Summer of Code with a number of ideas
for projects with mentors already selected[1][2][3]. Additionally, some ideas
on how to make DataFusion an ideal selection for university database projects
such as the CMU database classes have been put forward.

[1]: https://github.com/apache/datafusion/issues/14577
[2]: https://summerofcode.withgoogle.com/programs/2025/organizations/apache-datafusion
[3]: https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html


### DataFusion core

https://github.com/apache/datafusion

- 46.0.0 was released on 2025-03-07.
- 45.0.0 was released on 2025-02-07.
- 44.0.0 was released on 2024-12-31.

Releases continue monthly and the project has been very active with many
commits a day. It seems more new projects have been using DataFusion for query
processing, which brings more contributors but also means we are spending more
time fielding questions and figuring out how many more features to accept.

Bruce Ritchie recently authored a [blog] about some of the features and the
outlook for the next 6 months. A relevant quote:

> In the core DataFusion repo alone we reviewed and accepted almost 1600 PRs
  from 206 different committers, created over 1100 issues and closed 751 of
  them 🚀.

We have been focusing more recently on pre-release testing and making it
easier for downstream consumers to use DataFusion, which is still a challenge
given how fast the project is moving.

[blog]: https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0/

### Sub project: DataFusion Python

https://github.com/apache/datafusion-python

- PYTHON-45.2.0 was released on 2025-02-23.
- PYTHON-44.0.0 was released on 2025-02-07.
- PYTHON-43.1.0 was released on 2024-12-12.

We have been working on making it easier to interoperate with other systems,
including support for FFI TableProvider ([#12920]) and new user documentation
on FFI ([#1031]).

[#12920]: https://github.com/apache/datafusion/pull/12920
[#1031]: https://github.com/apache/datafusion-python/pull/1031

### Sub project: DataFusion Comet

https://github.com/apache/datafusion-comet

- COMET-0.6.0 was released on 2025-02-17.
- COMET-0.5.0 was released on 2025-01-17.

You can read about the recent happenings in Comet in the [0.6.0 blog]


[0.6.0 blog]: https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0/


### Sub project: DataFusion Ballista

https://github.com/apache/datafusion-ballista

- BALLISTA-44.0.0 was released on 2025-03-05.

There has been some renewed interest in this project as the foundation for
distributed query engines, and we made a new release recently.

### (New!) Sub project: DataFusion Ray

https://github.com/apache/datafusion-ray

This is a new project that aims to make it easier to run DataFusion in a
distributed environment using the https://www.ray.io/ compute engine.

Contributors are working hard at the moment to get DataFusionRay 0.1.0 out!
Hopefully we can do that before the announcement and then there should be
plenty to add.

### Sub project: sqlparser-rs

https://github.com/apache/datafusion-sqlparser-rs

We have made three releases since sqlparser became part of DataFusion.

- SQLPARSER-0.55.0 was released on 2025-03-05.
- SQLPARSER-0.54.0 was released on 2025-01-23.
- SQLPARSER-0.53.0 was released on 2024-12-18.

Ifeanyi Ubah (iffyio) is doing a great job reviewing PRs to keep the code
consistent and flowing.

## Community Health:

While we as always struggle with code review capacity, we have many active
committers, and the community in general helps each other out with reviews. We
continue to actively grow our committer and PMC ranks.

We had several in-person meetups in Chicago, Boston, and Amsterdam, and are
working on organizing one in London in April 2025[4].

[4]: https://github.com/apache/datafusion/discussions/14647

I will file a follow-on ticket for the next report and then close this ticket.

alamb commented Mar 12, 2025

Next report tracked in
