Merge branch 'main' into feature-fix-array-lowcard
sharang authored Sep 13, 2023
2 parents e9e5266 + 19422a3 commit 73f38b5
Showing 19 changed files with 176 additions and 155 deletions.
29 changes: 13 additions & 16 deletions README-CN.md
@@ -1,7 +1,8 @@
<p align="center">
<img src="./docs/deepflow-logo.png" alt="DeepFlow" width="300" />

<p align="center">DeepFlow is an automated observability platform for cloud-native developers.</p>
<p align="center">Instant Observability for Cloud-Native Applications</p>
<p align="center">Zero Code, Full Stack, eBPF & Wasm</p>
</p>
<p align="center">
<a href="https://zenodo.org/badge/latestdoi/448599559"><img src="https://zenodo.org/badge/448599559.svg" alt="DOI"></a>
@@ -16,16 +17,15 @@

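# What is DeepFlow

DeepFlow is a **highly automated** observability platform for cloud-native developers. Using new technologies such as **eBPF**, WASM, and OpenTelemetry, DeepFlow implements core mechanisms such as **AutoTracing**, **AutoMetrics**, **AutoTagging**, and **SmartEncoding**, which largely avoids manual code instrumentation and significantly reduces the resource overhead of back-end data warehouses. With DeepFlow's programmability and open interfaces, developers can quickly integrate it into their own observability stack.
The DeepFlow open-source project aims to provide deep observability for complex cloud infrastructures and cloud-native applications. Based on eBPF, DeepFlow implements **Zero Code** collection of metrics, distributed traces, request logs, and function profiling data, and combines it with **SmartEncoding** to achieve **Full Stack** correlation and efficient access for all observability data. With DeepFlow, cloud-native applications automatically gain deep observability, which removes the heavy burden of continual instrumentation from developers and gives DevOps/SRE teams monitoring and diagnostic capabilities from code to infrastructure.

# Six Key Features
# Core Features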

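- **Full stack**: Using the **AutoMetrics** mechanism built on eBPF and cBPF, DeepFlow automatically collects RED (Request, Error, Delay) performance metrics for any application, down to every single application call, covering every software stack from application to infrastructure. In cloud-native environments, DeepFlow's **AutoTagging** mechanism automatically discovers the attributes of services, instances, and APIs and injects rich tags into every piece of observability data, eliminating data silos and unlocking drill-down capabilities.
- **Full path**: DeepFlow uses eBPF to implement the **AutoTracing** mechanism, which automatically traces the distributed call chains of any microservice or infrastructure service in cloud-native environments. On top of this, by integrating and automatically correlating data from OpenTelemetry, DeepFlow delivers complete full-stack, full-path distributed tracing, eliminating all blind spots.
- **High performance**: DeepFlow's **SmartEncoding** tag injection mechanism improves data storage performance by 10x, ending anxiety over high cardinality and sampling. The Agent is implemented in Rust, giving it extreme processing performance together with memory safety. The Server is implemented in Golang and rewrites Golang's map and pool base libraries, yielding nearly 10x performance improvements in both data queries and memory GC.
- **Programmable**: DeepFlow currently parses the HTTP(S), Dubbo, MySQL, PostgreSQL, Redis, Kafka, MQTT, and DNS protocols, and will keep iterating to support more application protocols. In addition, DeepFlow provides a programmable interface based on WASM that lets developers quickly add parsing for private protocols and build business analysis capabilities for specific scenarios, such as 5GC signaling analysis, financial transaction analysis, and in-vehicle computer communication analysis.
- **Open interfaces**: DeepFlow embraces the open-source community, accepts a wide range of observability data sources, and uses AutoTagging and SmartEncoding to provide high-performance, unified tag injection. DeepFlow supports a pluggable database interface, so developers can freely add or swap in the most suitable database. DeepFlow provides a unified, standard SQL query capability for all observability data, making it easy for users to integrate it quickly into their own observability platforms.
- **Easy to maintain**: The DeepFlow core consists of only two components, Agent and Server, which hides complexity inside the processes and minimizes maintenance effort. A DeepFlow Server cluster can monitor multiple Kubernetes clusters, legacy server clusters, and cloud server clusters in a unified way, and it scales horizontally and load-balances without depending on any external components.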
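- **Universal map for any Service**: Using eBPF, DeepFlow draws a universal map of the production environment with **Zero Code**, covering services written in any language, third-party services whose code is unknown, and all cloud-native infrastructure services. It has built-in parsing for standard protocols and offers a Wasm plugin mechanism for parsing any private protocol. It computes **Full Stack** golden metrics for every call across applications and infrastructure with zero code, so performance bottlenecks can be localized quickly.
- **Distributed tracing for any Request**: **Zero Code** distributed tracing based on eBPF supports applications in any language and fully covers infrastructure such as gateways, service meshes, databases, message queues, DNS, and NICs, leaving no tracing blind spots. **Full Stack**: the network performance metrics and file read/write events associated with each Span are collected automatically. Distributed tracing thereby enters a new era of zero instrumentation.
- **Continuous profiling for any Function**: DeepFlow collects profiling data from production processes with zero code at an overhead below 1%, draws function-level OnCPU and OffCPU flame graphs, quickly locates full-stack performance bottlenecks in application, library, and kernel functions, and automatically correlates them with distributed tracing data. Even on kernel versions 2.6+, it can still provide network profiling to give insight into code performance bottlenecks.
- **Seamless integration with popular observability stacks**: DeepFlow can act as a storage backend for Prometheus, OpenTelemetry, SkyWalking, and Pyroscope, and it also provides **SQL, PromQL, and OTLP** data interfaces so that it can serve as a data source for popular stacks. It automatically injects unified tags into all observability signals, including cloud resources, K8s container resources, K8s Labels/Annotations, and business attributes from the CMDB, eliminating data silos.
- **Storage performance 10x ClickHouse**: Based on the **SmartEncoding** mechanism, DeepFlow injects standardized, pre-encoded meta tags into all observability signals, reducing storage overhead by 10x compared with ClickHouse's String or LowCard schemes. Custom tags are stored separately from observability data, so you can inject tags with nearly unlimited dimensions and cardinality and still get an easy query experience like **BigTable**.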

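# Documentation

@@ -46,21 +46,17 @@ The DeepFlow Community Edition consists of the core components of the Enterprise Edition. By open-sourcing it, we hope

We have also set up a complete [DeepFlow Community Demo](https://ce-demo.deepflow.yunshan.net/?from=github); you are welcome to try it. Login account / password: deepflow / deepflow.

## Try DeepFlow Cloud

[DeepFlow Cloud](https://deepflow.yunshan.net/) is DeepFlow's fully managed SaaS service. It is currently in beta and supports Chinese only.

## Try DeepFlow Enterprise

[DeepFlow Enterprise](https://www.yunshan.net/products/deepflow.html) supports full-stack, full-path monitoring of hybrid clouds, covering containers, cloud servers, hosts, and NFV gateways. It currently supports Chinese only; contact us to try it.
You can visit the [DeepFlow Enterprise Demo](https://deepflow.yunshan.net/), which currently supports Chinese only.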

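# Compile DeepFlow from Source

- [Compile deepflow-agent](./agent/build_cn.md)

# Software Architecture

DeepFlow Community Edition consists mainly of two processes, Agent and Server. One Agent runs on each K8s container node, legacy server, or cloud server and collects AutoMetrics and AutoTracing data for all application processes on that server. Server runs in a K8s cluster and provides Agent management, tag injection, data ingestion, and data query services.
DeepFlow Community Edition consists mainly of two processes, Agent and Server. One Agent runs on each K8s container node, legacy server, or cloud server and collects data for all application processes on that server. Server runs in a K8s cluster and provides Agent management, tag injection, data ingestion, and data query services.

![DeepFlow Software Architecture](./docs/deepflow-architecture.png)

@@ -80,7 +76,8 @@ DeepFlow Community Edition consists mainly of two processes, Agent and Server. Each
- Thanks to [eBPF](https://ebpf.io/), a revolutionary Linux kernel technology
- Thanks to [OpenTelemetry](https://opentelemetry.io/), which provides standard APIs for collecting application observability data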

# Landscapes
# Honors

- The DeepFlow paper [Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code](https://dl.acm.org/doi/10.1145/3603269.3604823) has been accepted by ACM SIGCOMM 2023, a top international conference
- DeepFlow has joined the <a href="https://landscape.cncf.io/?selected=deep-flow">CNCF CLOUD NATIVE Landscape</a>
- DeepFlow has joined the <a href="https://ebpf.io/applications#deepflow">eBPF Project Landscape</a>
25 changes: 11 additions & 14 deletions README.md
@@ -1,7 +1,8 @@
<p align="center">
<img src="./docs/deepflow-logo.png" alt="DeepFlow" width="300" />

<p align="center">DeepFlow is an automated observability platform for cloud-native developers.</p>
<p align="center">Instant Observability for Cloud-Native Applications</p>
<p align="center">Zero Code, Full Stack, eBPF & Wasm</p>
</p>
<p align="center">
<a href="https://zenodo.org/badge/latestdoi/448599559"><img src="https://zenodo.org/badge/448599559.svg" alt="DOI"></a>
@@ -16,16 +17,15 @@ English | [简体中文](./README-CN.md)

# What is DeepFlow

DeepFlow is a **highly automated** observability platform for cloud-native developers. Using new technologies such as **eBPF**, WASM, and OpenTelemetry, DeepFlow innovatively implements core mechanisms such as **AutoTracing**, **AutoMetrics**, **AutoTagging**, and **SmartEncoding**, which greatly avoids code instrumentation and significantly reduces the resource overhead of back-end data warehouses. With the programmability and open API of DeepFlow, developers can quickly integrate it into their own observability stack.
The DeepFlow open-source project aims to provide deep observability for complex cloud infrastructures and cloud-native applications. DeepFlow implements **Zero Code** data collection with eBPF for metrics, distributed tracing, request logs and function profiling, and is further integrated with **SmartEncoding** to achieve **Full Stack** correlation and efficient access to all observability data. With DeepFlow, cloud-native applications automatically gain deep observability, which removes the heavy burden of continual code instrumentation from developers and gives DevOps/SRE teams monitoring and diagnostic capabilities that cover everything from code to infrastructure.

# Key Features

- **Any Stack**: With the **AutoMetrics** mechanism implemented by **eBPF** and cBPF, DeepFlow can automatically collect RED (Request, Error, Delay) performance metrics of any application, down to every request, covering all software technology stacks from application to infrastructure. In cloud-native environments, the **AutoTagging** mechanism automatically discovers the attributes of services, instances and APIs, and automatically injects rich tags into every piece of observability data, thereby eliminating data silos and unlocking data drill-down capabilities.
- **End to End**: DeepFlow implements the **AutoTracing** mechanism using **eBPF** technology. It automatically traces the distributed requests of any application and infrastructure service in cloud-native environments. On this basis, by integrating and automatically correlating data from OpenTelemetry, DeepFlow implements complete full-stack, full-path distributed tracing, eliminating all blind spots.
- **High Performance**: The **SmartEncoding** tag injection mechanism can improve storage performance by 10 times, removing the anxiety over high cardinality and sampling. DeepFlow Agent is implemented in Rust for extreme processing performance and memory safety. DeepFlow Server is implemented in Golang, and rewrites the standard library map and pool for a nearly 10x performance improvement in data query and memory GC.
- **Programmability**: DeepFlow currently supports collecting HTTP(S), Dubbo, MySQL, PostgreSQL, Redis, Kafka, MQTT and DNS, and will iterate to support more application protocols. In addition, DeepFlow provides a programmable interface based on WASM technology, which allows developers to add private protocols quickly and can be used to build business analysis capabilities for specific scenarios, such as 5GC signaling analysis, financial transaction analysis, vehicle computer communication analysis, etc.
- **Open Interface**: DeepFlow embraces the open source community, supports a wide range of observability data sources, and uses AutoTagging and SmartEncoding to provide high-performance, unified tag injection capabilities. DeepFlow has a pluggable database interface, so developers can freely add and replace the most suitable database. DeepFlow provides a unified standard SQL query capability for all observability data, which makes it convenient for users to integrate it quickly into their own observability platform.
- **Easy to Maintain**: The core of DeepFlow consists of only two components, Agent and Server, hiding the complexity within the process and reducing the maintenance difficulty to the extreme. The DeepFlow Servers can manage Agents in multiple Kubernetes clusters, legacy hosts and cloud hosts in a unified manner, and can achieve horizontal scaling and load balancing without any external components.
- **Universal Map for Any Service**: DeepFlow provides a universal map with **Zero Code** by eBPF for production environments, including your services in any language, third-party services without code and all cloud-native infrastructure services. In addition to analyzing common protocols, Wasm plugins are supported for your private protocols. **Full Stack** golden signals of applications and infrastructures are calculated, pinpointing performance bottlenecks with ease.
- **Distributed Tracing for Any Request**: **Zero Code** distributed tracing powered by eBPF supports applications in any language and infrastructures including gateways, service meshes, databases, message queues, DNS and NICs, leaving no blind spots. **Full Stack** network performance metrics and file I/O events are automatically collected for each Span. Distributed tracing enters a new era: Zero Instrumentation.
- **Continuous Profiling for Any Function**: DeepFlow collects profiling data at a cost of below 1% with **Zero Code**, plots OnCPU/OffCPU function call stack flame graphs, locates **Full Stack** performance bottlenecks in application, library and kernel functions, and automatically relates them to distributed tracing data. DeepFlow can even analyze code performance through network profiling under old kernel versions (2.6+).
- **Seamless Integration with Popular Stack**: DeepFlow can serve as a storage backend for Prometheus, OpenTelemetry, SkyWalking and Pyroscope. It also provides **SQL, PromQL and OTLP** APIs to work as a data source in popular observability stacks. It injects meta tags for all observability signals including cloud resource, K8s container, K8s labels, K8s annotations, CMDB business attributes, etc., eliminating data silos.
- **Performance 10x ClickHouse**: **SmartEncoding** injects standardized and pre-encoded meta tags into all observability data, reducing storage overhead by 10x compared to the ClickHouse String or LowCard method. Custom tags and observability data are stored separately, making tags available for almost unlimited dimensions and cardinalities with an uncompromised query experience like **BigTable**.

# Documentation

@@ -46,13 +46,9 @@ Please refer to [the deployment documentation](https://deepflow.yunshan.net/docs

We have also built a complete [DeepFlow Community Demo](https://ce-demo.deepflow.yunshan.net/?from=github) that you are welcome to try. Login account/password: deepflow/deepflow.

## DeepFlow Cloud

[DeepFlow Cloud](https://deepflow.yunshan.net/) is the fully managed service of DeepFlow. It is currently in beta and supports Chinese only.

## DeepFlow Enterprise

[DeepFlow Enterprise](https://www.yunshan.net/products/deepflow.html) supports full-stack and end-to-end monitoring of hybrid cloud, covering containers, cloud servers, hosts, and NFV gateways. It currently supports Chinese only; contact us to try it.
You can visit the [DeepFlow Enterprise Demo](https://deepflow.yunshan.net/), currently available in Chinese only.

# Compile DeepFlow from Source

@@ -80,7 +76,8 @@ Here is our [future feature plan](https://deepflow.yunshan.net/docs/about/milest
- Thanks to [eBPF](https://ebpf.io/), a revolutionary Linux kernel technology.
- Thanks to [OpenTelemetry](https://opentelemetry.io/), which provides vendor-neutral APIs to collect application telemetry data.

# Landscapes
# Honors

- The paper [Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code](https://dl.acm.org/doi/10.1145/3603269.3604823) has been accepted by ACM SIGCOMM 2023.
- DeepFlow enriches the <a href="https://landscape.cncf.io/?selected=deep-flow">CNCF CLOUD NATIVE Landscape</a>.
- DeepFlow enriches the <a href="https://ebpf.io/applications#deepflow">eBPF Project Landscape</a>.
10 changes: 8 additions & 2 deletions agent/src/ebpf/kernel/include/protocol_inference.h
@@ -896,12 +896,13 @@ static __inline enum message_type infer_dns_message(const char *buf,
		return MSG_UNKNOWN;
	}

	bool update_tcp_dns_prev_count = false;
	struct dns_header *dns = (struct dns_header *)buf;
	if (conn_info->tuple.l4_protocol == IPPROTO_TCP) {
		if (__bpf_ntohs(dns->id) + 2 == count) {
			dns = (void *)dns + 2;
		} else {
			conn_info->prev_count = 2;
			update_tcp_dns_prev_count = true;
		}
	}
}

@@ -941,7 +942,12 @@ static __inline enum message_type infer_dns_message(const char *buf,
			break;
		}
	}

	// coreDNS will first send the length in two bytes. If it recognizes
	// that it is TCP DNS and does not have a length field, it will modify
	// the offset to correct the TCP sequence number.
	if (update_tcp_dns_prev_count) {
		conn_info->prev_count = 2;
	}
	return (qr == 0) ? MSG_REQUEST : MSG_RESPONSE;
}
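
A note on the `infer_dns_message` hunks above: DNS over TCP prefixes each message with a two-byte, big-endian length field, and the check `__bpf_ntohs(dns->id) + 2 == count` relies on that. The sketch below restates the check in plain user-space C, as an illustration only. The names `tcp_dns_framing` and `classify_tcp_dns` are hypothetical and do not exist in the DeepFlow sources, and the sketch leaves out the eBPF-specific parts such as `conn_info`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of DeepFlow. */
enum tcp_dns_framing {
    DNS_FRAME_TOO_SHORT,      /* fewer than two bytes captured */
    DNS_FRAME_WITH_PREFIX,    /* buffer starts with the 2-byte length field */
    DNS_FRAME_WITHOUT_PREFIX, /* length field was presumably sent separately */
};

/*
 * Classify a TCP payload that is suspected to carry DNS.
 * On success, *hdr_off receives the offset of the DNS header in buf.
 */
static enum tcp_dns_framing classify_tcp_dns(const uint8_t *buf, size_t count,
                                             size_t *hdr_off)
{
    if (count < 2)
        return DNS_FRAME_TOO_SHORT;

    /* Read the first two bytes as a big-endian (network order) integer. */
    uint16_t prefix = (uint16_t)((buf[0] << 8) | buf[1]);

    /* If they equal "bytes that follow", they are the length field. */
    if ((size_t)prefix + 2 == count) {
        *hdr_off = 2;
        return DNS_FRAME_WITH_PREFIX;
    }

    /* Otherwise treat the buffer as starting at the DNS header itself. */
    *hdr_off = 0;
    return DNS_FRAME_WITHOUT_PREFIX;
}
```

In the "without prefix" case, the hunks above appear to record the decision in a local flag and write `conn_info->prev_count = 2` only once the payload has been accepted as DNS, rather than before validation.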

13 changes: 12 additions & 1 deletion agent/src/ebpf/user/common.c
@@ -43,6 +43,8 @@
#include "log.h"
#include "string.h"

static u64 g_sys_btime_msecs;

bool is_core_kernel(void)
{
	return (access("/sys/kernel/btf/vmlinux", F_OK) == 0);
@@ -382,6 +384,9 @@ uint64_t gettime(clockid_t clk_id, int flag)
 */
u64 get_sys_btime_msecs(void)
{
	if (g_sys_btime_msecs > 0)
		goto done;

	char buff[4096];

	FILE *fp = fopen("/proc/stat", "r");
@@ -396,7 +401,11 @@ u64 get_sys_btime_msecs(void)
	fclose(fp);
	ASSERT(sys_boot > 0);

	return (sys_boot * 1000UL);
	if (g_sys_btime_msecs == 0)
		g_sys_btime_msecs = sys_boot * 1000UL;

done:
	return g_sys_btime_msecs;
}

/*
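
A note on the `get_sys_btime_msecs` hunks above: the change caches the boot time in a file-scope variable so that `/proc/stat` is read at most once per successful lookup. The self-contained sketch below shows the same caching pattern under stated assumptions. `boot_time_msecs` and `cached_btime_msecs` are hypothetical names, the `sscanf`-based parsing is an assumption (the parsing lines are not visible in this diff), and the sketch returns 0 on failure where the original uses `ASSERT`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Cached boot time in milliseconds; 0 means "not read yet". */
static uint64_t cached_btime_msecs;

/*
 * Hypothetical stand-in for get_sys_btime_msecs().
 * Returns the system boot time in milliseconds since the epoch,
 * or 0 if /proc/stat cannot be read or has no "btime" line.
 */
static uint64_t boot_time_msecs(void)
{
    /* Fast path: reuse the value from an earlier successful read. */
    if (cached_btime_msecs > 0)
        return cached_btime_msecs;

    FILE *fp = fopen("/proc/stat", "r");
    if (fp == NULL)
        return 0;

    char line[4096];
    uint64_t btime_secs = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* The line of interest looks like: "btime 1694569200". */
        if (sscanf(line, "btime %" SCNu64, &btime_secs) == 1)
            break;
    }
    fclose(fp);

    /* Only cache a valid value, so a failed read can be retried later. */
    if (btime_secs > 0)
        cached_btime_msecs = btime_secs * 1000ULL;

    return cached_btime_msecs;
}
```

Like the global in the hunk, the cached variable in this sketch is not synchronized, so the sketch assumes the first call is not raced by multiple threads.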
@@ -935,11 +944,13 @@ int exec_command(const char *cmd, const char *args)
		return -1;
	}

#ifdef PROFILE_JAVA_DEBUG
	/* Read and print the output */
	char buffer[1024];
	while (fgets(buffer, sizeof(buffer), fp) != NULL) {
		ebpf_info("%s", buffer);
	}
#endif

	rc = pclose(fp);
	if (-1 == rc) {
4 changes: 2 additions & 2 deletions cli/ctl/trisolaris_check.go
@@ -527,10 +527,10 @@ func Uint64ToMac(v uint64) net.HardwareAddr {

func formatString(data *trident.Interface) string {
	buffer := bytes.Buffer{}
	format := "Id: %d Mac: %s VMac: %s EpcId: %d DeviceType: %d DeviceId: %d IfType: %d" +
	format := "Id: %d Mac: %s EpcId: %d DeviceType: %d DeviceId: %d IfType: %d" +
		" LaunchServer: %s LaunchServerId: %d RegionId: %d AzId: %d, PodGroupId: %d, " +
		"PodNsId: %d, PodId: %d, PodClusterId: %d, NetnsId: %d, VtapId: %d, IsVipInterface: %t "
	buffer.WriteString(fmt.Sprintf(format, data.GetId(), Uint64ToMac(data.GetMac()), Uint64ToMac(data.GetVmac()),
	buffer.WriteString(fmt.Sprintf(format, data.GetId(), Uint64ToMac(data.GetMac()),
		data.GetEpcId(), data.GetDeviceType(), data.GetDeviceId(), data.GetIfType(),
		data.GetLaunchServer(), data.GetLaunchServerId(), data.GetRegionId(),
		data.GetAzId(), data.GetPodGroupId(), data.GetPodNsId(), data.GetPodId(),
2 changes: 1 addition & 1 deletion cli/go.mod
@@ -4,7 +4,7 @@ go 1.18

require (
	github.com/bitly/go-simplejson v0.5.0
	github.com/deepflowio/deepflow/message v0.0.0-20230824032327-76b439a46f5f
	github.com/deepflowio/deepflow/message v0.0.0-20230912070610-38fcd29f7ea4
	github.com/deepflowio/deepflow/server v0.0.0-20230829023524-a5893729cef4
	github.com/golang/protobuf v1.5.2
	github.com/mattn/go-runewidth v0.0.14