CodeFuse-Query is a powerful static code analysis platform suitable for large-scale, complex codebase analysis scenarios. Its data-centric approach and high scalability give it a unique advantage in the modern software development environment. In the future, as static code analysis technology continues to evolve, CodeFuse-Query is expected to play an even more significant role in this field.

Overall, the CodeFuse-Query code data platform is divided into three main parts: code data model, code query DSL, and platform productization services.

### Code Data Model: COREF

We have defined a standardized code data model, COREF. All code must be converted to this model by language-specific extractors.

Note: Since the computation difficulty of each type of information varies, not every language's COREF data includes all of the above. The basic information mainly consists of AST, ASG, Call Graph, Class Hierarchy, and Documentation, while other information (CFG and PDG) is still under construction and will be supported gradually.

### Code Query DSL

Based on the generated COREF code data, CodeFuse-Query uses a custom DSL called **Gödel** for queries to meet code analysis needs.

Gödel is a logic language based on Datalog, which derives new facts from existing "facts" and "rules". Gödel is also a declarative language: compared to imperative programming, it focuses on describing "what is needed" and leaves the implementation to the computation engine.

Since the code has been transformed into relational data (COREF data is stored as relational tables), one might wonder why users should learn a new DSL instead of using SQL or an SDK directly. The reason is that Datalog evaluation is monotonic and guaranteed to terminate. In exchange for these guarantees, Datalog gives up some expressive power, and Gödel inherits this trade-off.

- Compared to SDKs, Gödel's main advantage is ease of learning and use; its declarative nature means users do not need to focus on intermediate computations but can describe their needs simply, as with SQL.
- Compared to SQL, Gödel's advantages are stronger descriptive ability and faster computation, for example when describing recursive algorithms and multi-table joins, which are difficult to express in SQL.
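To make the "facts and rules" idea concrete, here is a minimal Python sketch of how a Datalog-style engine could evaluate a recursive rule bottom-up. It is not Gödel syntax and not a description of how CodeFuse-Query is implemented; the `calls` facts, the function names in them, and the `reachable` helper are all made up for illustration. The loop only ever adds facts (monotonicity) and stops once nothing new can be derived (termination).

```python
# Hypothetical call-graph facts: calls(caller, callee).
CALLS = {("main", "parse"), ("parse", "lex"), ("lex", "read_char")}


def reachable(calls):
    """Naive bottom-up evaluation of two Datalog-style rules:

    reach(x, y) :- calls(x, y).
    reach(x, z) :- reach(x, y), calls(y, z).
    """
    reach = set(calls)  # first rule: every direct call is reachable
    while True:
        # Second rule: join reach with calls on the shared middle function.
        derived = {(x, z) for (x, y) in reach for (y2, z) in calls if y == y2}
        new_facts = derived - reach
        if not new_facts:
            # Fixpoint: no rule yields anything new, so evaluation stops.
            return reach
        reach |= new_facts  # facts are only ever added, never retracted


if __name__ == "__main__":
    for caller, callee in sorted(reachable(CALLS)):
        print(f"{caller} -> {callee}")
```

With the three sample facts, the rules derive transitive pairs such as `main -> read_char`. Because the set of possible pairs is finite and only grows, the loop always reaches a fixpoint, even when the call graph contains cycles.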

### Platformization, Productization

CodeFuse-Query includes the **Sparrow CLI** and the online service **Query Center**. Sparrow CLI contains all components and dependencies, such as extractors, data model, compiler, etc., allowing users to generate code data and conduct queries locally (for Sparrow CLI usage, please see Section 3: Installation, Configuration, and Running). If users require online queries, they can experiment using the Query Center.

## Supported Programming Languages for Analysis

As of now, CodeFuse-Query supports data analysis for 11 programming languages. Support for 5 of them (Java, JavaScript, TypeScript, XML, Go) is very mature, while the remaining 6 (Objective-C, C++, Python3, Swift, SQL, Properties) are in beta and still have room for improvement. The specific support status is shown in the table below:

| Language | Status | COREF Model Node Count |
| --- | --- | --- |
| Java | mature | 162 |
| XML | mature | 12 |
| TS/JS | mature | 392 |
| Go | mature | 40 |
| OC/C++ | beta | 53/397 |
| Python3 | beta | 93 |
| Swift | beta | 248 |
| SQL | beta | 750 |
| Properties | beta | 9 |

Note: The maturity level of each language is judged by the types of information its COREF data contains and by how it is used in practice. Except for OC/C++, all languages support complete AST and Documentation information. Taking Java as an example, COREF for Java also supports ASG, Call Graph, Class Hierarchy, and some CFG information.

## Use Cases

### Querying Code Features

A developer wants to know which String-typed variables are used in Repo A, so they write a Gödel query and submit it to CodeFuse-Query, which returns the results.

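For the feature-query use case (finding the String-typed variables in a repository), the original Gödel query is not reproduced here. As a rough stand-in, the sketch below asks the same question in plain SQL through Python's built-in `sqlite3` module, relying only on the fact that COREF data is stored as relational tables. The `types` and `variables` tables, their columns, and the sample rows are hypothetical and far simpler than the real COREF schema.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real COREF schema differs.
SCHEMA = """
CREATE TABLE types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variables (id INTEGER PRIMARY KEY, name TEXT, type_id INTEGER);
"""


def string_variables(conn):
    """Return the names of all variables whose declared type is String."""
    rows = conn.execute(
        "SELECT v.name FROM variables AS v "
        "JOIN types AS t ON v.type_id = t.id "
        "WHERE t.name = 'String' ORDER BY v.name"
    )
    return [name for (name,) in rows]


def demo_db():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO types VALUES (?, ?)", [(1, "String"), (2, "int")])
    conn.executemany(
        "INSERT INTO variables VALUES (?, ?, ?)",
        [(1, "userName", 1), (2, "retryCount", 2), (3, "filePath", 1)],
    )
    return conn


if __name__ == "__main__":
    print(string_variables(demo_db()))
```

A single join like this is easy to write in SQL; the case for a Datalog-style language is stronger for recursive questions, such as following a call chain to arbitrary depth.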
## Quick Start

[Installation, Configuration, and Running](./doc/3_install_and_run.md)

## Documentation

- [Abstract](./doc/1_abstract.md)
- [Introduction](./doc/2_introduction.md)
- [User Case](./doc/user_case.en.md)
- [Installation, Configuration, and Running](./doc/3_install_and_run.md)
- `cli`: The entry point for the command-line tool. It provides a unified command-line interface and calls other modules to complete specific functions.
- `language`: Core data and data modeling (lib) for various languages. For the degree of openness, please refer to the section "Some Notes on the Scope of Open Source".
- `doc`: Reference documents
- `examples`: Gödel query language examples
- `tutorial`: CodeFuse-Query development container usage tutorial

As of now, it is **not possible** to build an executable program from the source code, because not all modules have been open-sourced in this release; the missing modules will be released over the next year. Nevertheless, to ensure a complete experience, we have released **complete installation packages** for download. Please see the Release page.

Regarding the openness of languages, you can refer to the table below:

| Language | Data Modeling Open Source | Data Core Open Source | Maturity |

[Star History](https://star-history.com/#codefuse-ai/CodeFuse-Query&Date)