@@ -25,3 +25,77 @@ possible. You can find the most up to date version in the [source code].
[crates.io documentation]: https://docs.rs/datafusion/latest/datafusion/index.html#architecture
[source code]: https://github.com/apache/datafusion/blob/main/datafusion/core/src/lib.rs
+
+ ## Forks vs Extension APIs
+
+ DataFusion is a fast-moving project, which results in frequent internal changes.
+ This benefits DataFusion by allowing it to evolve and respond quickly to
+ requests, but also means that maintaining a fork with major modifications
+ sometimes requires non-trivial work.
+
+ The public API (what is accessible if you use the DataFusion releases from
+ crates.io) is typically much more stable (though it does change from release to
+ release as well).
+
+ Thus, rather than forks, we recommend using one of the many extension APIs (such
+ as `TableProvider`, `OptimizerRule`, or `ExecutionPlan`) to customize
+ DataFusion. If you cannot do what you want with the existing APIs, we would
+ welcome you working with us to add new APIs to enable your use case, as
+ described in the "Creating new Extension APIs" section below.
+
+ ## `datafusion-contrib`
+
+ While DataFusion comes with enough features "out of the box" to quickly start
+ with a working system, it can't include every useful feature (e.g.
+ `TableProvider`s for all data formats). The [`datafusion-contrib`] project
+ contains a collection of community-maintained extensions that are not part of
+ the core DataFusion project and not under Apache Software Foundation governance,
+ but may be useful to others in the community. If you are interested in adding a
+ feature to DataFusion, a new extension in `datafusion-contrib` is likely a good
+ place to start. Please [contact] us via GitHub issue, Slack, or Discord and
+ we'll gladly set up a new repository for your extension.
+
+ [`datafusion-contrib`]: https://github.com/datafusion-contrib
+ [contact]: ../contributor-guide/communication.md
+
+ ## Creating new Extension APIs
+
+ DataFusion aims to be a general-purpose query engine, and thus the core crates
+ contain features that are useful for a wide range of use cases. Use-case-specific
+ functionality (such as very specific time series or stream processing features)
+ is typically implemented using the extension APIs.
+
+ If you have a use case that is not covered by the existing APIs, we would love to
+ work with you to design a new general-purpose API. There are often others who are
+ interested in similar extensions, and the act of defining the API often improves
+ the code overall for everyone.
+
+ Extension APIs that provide "safe" default behaviors are more likely to be
+ suitable for inclusion in DataFusion, while APIs that require major changes to
+ built-in operators are less likely. For example, it might make less sense
+ to add an API to support a stream processing feature if that would result in
+ slower performance for built-in operators. It may still make sense to add
+ extension APIs for such features, but leave the implementation of such operators
+ to downstream projects.
+
+ The process to create a new extension API is typically:
+
+ - Look for an existing issue describing what you want to do, and file one if it
+   doesn't yet exist.
+ - Discuss what the API would look like. Feel free to ask contributors (via `@`
+   mentions) for feedback (you can find such people by looking at the most
+   recently changed PRs and issues).
+ - Prototype the new API, typically by adding an example (in
+   `datafusion-examples` or refactoring existing code) to show how it would work.
+ - Create a PR with the new API, and work with the community to get it merged.
+
+ Some benefits of using an example-based approach are:
+
+ - Any future API changes will also have to keep your example working, ensuring no
+   regression in functionality.
+ - There will be a blueprint of any needed changes to your code if the APIs do change
+   (just look at what changed in your example).
+
+ An example of this process was [creating a SQL Extension Planning API].
+
+ [creating a sql extension planning api]: https://github.com/apache/datafusion/issues/11207