docs: add apache remove plan document (#1173)
HeyJavaBean authored Nov 29, 2024
1 parent 26300cc commit 8c4e0d3
Showing 2 changed files with 350 additions and 0 deletions.
174 changes: 174 additions & 0 deletions content/en/docs/kitex/Best Practice/remove_apache_codec.md
---
title: "Kitex Remove Apache Thrift User Guide"
linkTitle: "Kitex Remove Apache Thrift User Guide"
weight: 4
date: 2024-11-29
keywords: ["Kitex", "thrift", "apache"]
description: "This document describes Kitex's plan for removing its Apache Thrift dependency and what users need to know."
---

# Background

Kitex uses the github.com/apache/thrift library for some encoding and decoding tasks, and it also generates native Apache Codec encoding and decoding code (the Read, Write, and related methods in kitex_gen).

However, most services never use this code in practice. It adds redundant code, and the dependency on Apache Thrift causes a series of problems of its own.

In future versions, we plan to gradually remove the dependency on Apache Thrift. The expected outcomes are:
- kitex_gen output nearly 50% smaller
- No more problems caused by being locked to Apache Thrift v0.13.0


# Remove Apache Thrift Dependency

To keep things concise, the following libraries are referred to as:

> - github.com/apache/thrift => apache thrift
> - github.com/cloudwego/kitex/pkg/protocol/bthrift => kitex bthrift
> - github.com/cloudwego/gopkg/protocol/thrift => gopkg thrift

Kitex will remove the Apache Thrift dependency from its go.mod in two phases.

## Phase 1 (v0.11.0)

Kitex will remove the Apache Thrift imports from the generated code and replace them with kitex bthrift and gopkg thrift.

The imports in the generated code, along with the corresponding Read, Write, and related methods, will be replaced accordingly. After you regenerate your code, kitex_gen will import these two libraries instead of Apache Thrift.

## Phase 2 (v0.12.0)

Kitex will turn the bthrift library into an independent submodule, confining the Apache Thrift dependency to that submodule's go.mod. Combined with changes to the generated code, if you do not generate Apache Codec interfaces in kitex_gen (see below), a project using Kitex will no longer pull in github.com/apache/thrift.

In addition, all Thrift encoding and decoding interfaces will be consolidated into the independently maintained gopkg thrift (for example, the FastCodec interface definition and the fastthrift utility functions).

At this point, Kitex's go.mod will no longer depend on Apache Thrift, which resolves problems such as being pinned to v0.13.0.
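To check the Phase 2 outcome in your own module, you can look for github.com/apache/thrift among the requirements in go.mod. Below is a minimal stdlib-only Go sketch; `requiresApacheThrift` is a hypothetical helper (not part of Kitex) that does simple line matching on `require` directives. It does not tell you why the module is required; `go mod why -m github.com/apache/thrift` answers that.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requiresApacheThrift reports whether go.mod content lists
// github.com/apache/thrift in a require directive, either in the
// single-line form or inside a "require ( ... )" block.
// Hypothetical helper for illustration; it is not part of Kitex.
func requiresApacheThrift(goMod string) bool {
	const mod = "github.com/apache/thrift"
	inRequireBlock := false
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Drop trailing comments such as "// indirect".
		if i := strings.Index(line, "//"); i >= 0 {
			line = strings.TrimSpace(line[:i])
		}
		switch {
		case line == "require (":
			inRequireBlock = true
		case inRequireBlock && line == ")":
			inRequireBlock = false
		case inRequireBlock:
			if fields := strings.Fields(line); len(fields) > 0 && fields[0] == mod {
				return true
			}
		case strings.HasPrefix(line, "require "):
			if fields := strings.Fields(line); len(fields) > 1 && fields[1] == mod {
				return true
			}
		}
	}
	return false
}

func main() {
	goMod := `module demo

go 1.20

require (
	github.com/apache/thrift v0.13.0 // indirect
	github.com/cloudwego/kitex v0.11.0
)
`
	fmt.Println(requiresApacheThrift(goMod)) // true
}
```

Line matching keeps the sketch dependency-free; a stricter tool could parse go.mod with a real go.mod parser instead.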

# Remove Apache Codec Codegen

Within kitex_gen, two sets of serialization code will be generated: FastCodec (cloudwego) and Apache Codec (apache).

Apache Codec is rarely used: most services encode and decode with FastCodec and never call the Apache native interfaces.

To go along with the dependency removal, kitex_gen will gradually stop generating Apache Codec code and interfaces, which also significantly reduces output size.

This will happen in three phases.

## Phase 1 (<= v0.11.0)

Passing `-thrift no_default_serdes` to the kitex tool prevents Apache Codec from being generated.

In versions up to and including v0.11.0, the default generation behavior remains exactly the same, giving users a transition period.

If you want to remove Apache Codec proactively and cut the output size roughly in half, see the appendix at the end of this document.

## Phase 2 (v0.12.0)

We will add warnings and logging to the Apache Codec Read/Write methods generated in kitex_gen to alert users prominently. We will also help migrate some critical services.

If you see warnings mentioning Apache Codec after your service starts, follow the "How to Actively Remove Apache Codec" section in the appendix at the end of this document. Replacing Apache Codec with FastCodec gives you faster encoding and decoding and protects you from the impact of the upcoming Apache Codec removal.

## Phase 3 (v0.13.0)

Starting with version 1.18.0 of the Kitex tool, Apache Codec code will no longer be generated by default, and FastCodec will be used exclusively.

This roughly halves the output size, and go.mod will no longer depend on the github.com/apache/thrift library.

- User impact: code that calls Apache Codec directly for serialization will break because the interfaces are missing. RPC calls are not affected.
- If needed, you can keep generating Apache Codec by passing a parameter. Detailed instructions will be published after the release.
- To prevent compilation failures caused by missing Read/Write methods, we will also provide an Apache Adaptor that generates bridge code. Usage instructions will follow.

## How to check if you're using Apache Codec

Although kitex_gen no longer depends on apache/thrift, other parts of your project may still use apache/thrift for encoding and decoding. This approach is relatively inefficient and depends heavily on the Apache native Read and Write methods generated in kitex_gen.

Kitex plans to stop generating this code by default, which will affect the scenarios below where apache/thrift is still used. We recommend switching to Kitex's more efficient FastCodec, as follows:

### 1. Using the apache/thrift library to marshal and unmarshal

If you have code such as:
```go
import (
	"context"

	apache_thrift "github.com/apache/thrift/lib/go/thrift"
)

// Apache Thrift v0.13.0 API. msg must implement TStruct, i.e. the Apache
// native Read/Write methods generated in kitex_gen.
func GetThriftBinary(ctx context.Context, msg apache_thrift.TStruct) ([]byte, error) {
	t := apache_thrift.NewTMemoryBufferLen(1024)
	p := apache_thrift.NewTBinaryProtocolFactoryDefault().GetProtocol(t)

	tser := &apache_thrift.TSerializer{
		Transport: t,
		Protocol:  p,
	}

	bs, err := tser.Write(ctx, msg)
	if err != nil {
		return nil, err
	}
	return bs, nil
}

func ParseThriftBinary(msg apache_thrift.TStruct, by []byte) error {
	t := apache_thrift.NewTMemoryBufferLen(1024)
	p := apache_thrift.NewTBinaryProtocolFactoryDefault().GetProtocol(t)

	deser := &apache_thrift.TDeserializer{Transport: t, Protocol: p}
	_ = deser.Transport.Close()
	err := deser.Read(msg, by)
	if err != nil {
		return err
	}
	return nil
}
```

Instead, you can use Kitex [FastCodec](https://github.com/cloudwego/gopkg/blob/main/protocol/thrift/fastcodec.go):

```go
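import (
	"fmt"

	"github.com/cloudwego/gopkg/protocol/thrift"
)

// data is typically a pointer to a struct in kitex_gen, which has methods such
// as FastRead and FastWriteNocopy and therefore implements thrift.FastCodec.
// Marshal and Unmarshal are illustrative wrapper names.

// Marshal encodes data in the Thrift Binary protocol using FastCodec.
func Marshal(data interface{}) ([]byte, error) {
	msg, ok := data.(thrift.FastCodec)
	if !ok {
		return nil, fmt.Errorf("%T does not implement thrift.FastCodec", data)
	}
	return thrift.FastMarshal(msg), nil
}

// Unmarshal decodes buf (your Thrift Binary-encoded bytes) into data using FastCodec.
func Unmarshal(buf []byte, data interface{}) error {
	msg, ok := data.(thrift.FastCodec)
	if !ok {
		return fmt.Errorf("%T does not implement thrift.FastCodec", data)
	}
	return thrift.FastUnmarshal(buf, msg)
}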
```
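To make the FastCodec contract more concrete, the following self-contained sketch mimics it without importing gopkg. The method names follow the FastCodec interface in gopkg thrift (`BLength`, `FastWriteNocopy`, `FastRead`), but the `NocopyWriter` placeholder, the local `FastMarshal`/`FastUnmarshal`, and the hypothetical `Req` type with hand-written methods are assumptions for illustration; real kitex_gen structs get these methods from code generation. `Req` encodes one i32 field (id 1) in the Thrift Binary protocol.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// NocopyWriter is a placeholder for gopkg's no-copy writer parameter.
type NocopyWriter interface{}

// FastCodec mirrors the method set that kitex_gen structs implement.
type FastCodec interface {
	BLength() int
	FastWriteNocopy(buf []byte, w NocopyWriter) int
	FastRead(buf []byte) (int, error)
}

// Thrift Binary protocol type ids used by this sketch.
const (
	typeStop byte = 0
	typeI32  byte = 8
)

// Req is a hypothetical struct equivalent to the IDL `struct Req { 1: i32 ID }`.
type Req struct{ ID int32 }

// BLength: field type (1) + field id (2) + i32 value (4) + STOP (1).
func (p *Req) BLength() int { return 8 }

// FastWriteNocopy writes directly into a buffer pre-sized with BLength.
func (p *Req) FastWriteNocopy(buf []byte, _ NocopyWriter) int {
	buf[0] = typeI32
	binary.BigEndian.PutUint16(buf[1:3], 1)
	binary.BigEndian.PutUint32(buf[3:7], uint32(p.ID))
	buf[7] = typeStop
	return 8
}

// FastRead parses fields until STOP and returns the number of bytes consumed.
func (p *Req) FastRead(buf []byte) (int, error) {
	off := 0
	for {
		if off >= len(buf) {
			return off, errors.New("missing STOP byte")
		}
		t := buf[off]
		if t == typeStop {
			return off + 1, nil
		}
		if off+7 > len(buf) {
			return off, errors.New("truncated i32 field")
		}
		id := binary.BigEndian.Uint16(buf[off+1:])
		if t != typeI32 || id != 1 {
			return off, fmt.Errorf("unsupported field %d of type %d", id, t)
		}
		p.ID = int32(binary.BigEndian.Uint32(buf[off+3:]))
		off += 7
	}
}

// FastMarshal and FastUnmarshal are local stand-ins for the gopkg helpers.
func FastMarshal(msg FastCodec) []byte {
	b := make([]byte, msg.BLength())
	msg.FastWriteNocopy(b, nil) // this sketch never uses the writer
	return b
}

func FastUnmarshal(buf []byte, msg FastCodec) error {
	_, err := msg.FastRead(buf)
	return err
}

func main() {
	payload := FastMarshal(&Req{ID: 42})
	fmt.Printf("% x\n", payload) // 08 00 01 00 00 00 2a 00

	var out Req
	if err := FastUnmarshal(payload, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.ID) // 42
}
```

The point of `BLength` is that the encoder can allocate the exact buffer once and write straight into it, without the intermediate transport and protocol objects that the Apache TSerializer path creates.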

### 2. Make sure that FastCodec is the codec for Kitex RPC

Search your code for: `WithPayloadCodec(thrift.NewThriftCodecDisableFastMode(true, true))`

If this snippet appears in your repository, your RPC requests have FastCodec disabled and use the slower Apache native codec.

It is recommended to remove this option to enhance encoding and decoding performance.
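Both checks can be automated. The sketch below is a hypothetical helper, not a Kitex tool: `findApacheCodecUsage` uses Go's standard `go/parser` package to report imports of `github.com/apache/thrift` in one file's source, and a plain text search to flag `NewThriftCodecDisableFastMode`. In practice you would run it over each hand-written `.go` file in your repository, skipping `kitex_gen`.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

const apacheThriftPrefix = "github.com/apache/thrift"

// findApacheCodecUsage returns human-readable findings for one Go file.
// Hypothetical helper for illustration.
func findApacheCodecUsage(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var findings []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.HasPrefix(path, apacheThriftPrefix) {
			pos := fset.Position(imp.Pos())
			findings = append(findings, fmt.Sprintf("%s: imports %s", pos, path))
		}
	}
	// The RPC option is a call, not an import, so a text search is enough here.
	if strings.Contains(src, "NewThriftCodecDisableFastMode") {
		findings = append(findings, filename+": disables FastCodec via NewThriftCodecDisableFastMode")
	}
	return findings, nil
}

func main() {
	src := `package demo

import apache_thrift "github.com/apache/thrift/lib/go/thrift"

var _ = apache_thrift.NewTMemoryBufferLen
`
	findings, err := findApacheCodecUsage("demo.go", src)
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```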

# Appendix: How to Actively Remove Apache Codec
> Removing Apache Codec can cut the output size nearly in half. If your project does not fall into the uncommon usage scenarios described above, you can configure this in advance so that this code is not generated.

Make sure your Kitex version is above v0.11.0.

1. Regenerate the code with the `-thrift no_default_serdes` option:

```shell
kitex -module xxx -thrift no_default_serdes xxx.thrift
```

The regenerated kitex_gen no longer contains Apache Codec and is roughly half the size.

2. Enable the Skip Decoder by adding the following option when creating the client or server:
```go
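package main

import (
	"log"

	"github.com/cloudwego/kitex/client"
	"github.com/cloudwego/kitex/pkg/remote/codec/thrift"
	"github.com/cloudwego/kitex/server"

	"demo/kitex_gen/kitex/samples/echo/echoservice"
)

func main() {
	// Client side: FastCodec for reads and writes, plus SkipDecoder so that
	// Thrift Buffered payloads are also decoded with FastCodec.
	cli := echoservice.MustNewClient("kitex.samples.echo",
		client.WithPayloadCodec(thrift.NewThriftCodecWithConfig(thrift.FastRead|thrift.FastWrite|thrift.EnableSkipDecoder)),
	)
	_ = cli // issue RPCs with cli

	// Server side: handler is your implementation of the generated service interface.
	srv := echoservice.NewServer(handler,
		server.WithPayloadCodec(thrift.NewThriftCodecWithConfig(thrift.FastRead|thrift.FastWrite|thrift.EnableSkipDecoder)),
	)
	if err := srv.Run(); err != nil {
		log.Fatal(err)
	}
}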
```

With this configuration, when your service receives a Thrift Buffered message, it is decoded and encoded through SkipDecoder + FastCodec and no longer relies on Apache Codec. Other scenarios, such as TTHeader or Mesh, are unchanged: they already use FastCodec directly.
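Why a skip decoder is needed: a Thrift Buffered message has no length prefix, so the receiver cannot tell in advance how many bytes to hand to FastRead. A skip decoder walks the Thrift Binary encoding, skipping each field by its type, until it reaches the STOP byte that ends the struct. The Go sketch below illustrates that idea for a struct payload; it is a conceptual illustration written for this guide, not Kitex's SkipDecoder implementation, and it leaves out the message header, recursion-depth limits, and other safeguards a production decoder needs.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("buffer too short")

// Thrift Binary protocol type ids.
const (
	tStop, tBool, tByte, tDouble, tI16 = 0, 2, 3, 4, 6
	tI32, tI64, tString, tStruct       = 8, 10, 11, 12
	tMap, tSet, tList                  = 13, 14, 15
)

var fixedSize = map[byte]int{tBool: 1, tByte: 1, tI16: 2, tI32: 4, tDouble: 8, tI64: 8} // widths of fixed-size scalars

func need(buf []byte, n int) error {
	if len(buf) < n {
		return errShort
	}
	return nil
}

// skipValue returns how many bytes a value of type t occupies at the start of buf.
func skipValue(buf []byte, t byte) (int, error) {
	if n, ok := fixedSize[t]; ok {
		return n, need(buf, n)
	}
	switch t {
	case tString: // i32 length prefix followed by the bytes
		if err := need(buf, 4); err != nil {
			return 0, err
		}
		// A negative i32 length becomes a huge value here and fails the bounds check (assumes 64-bit int).
		n := 4 + int(binary.BigEndian.Uint32(buf))
		return n, need(buf, n)
	case tStruct:
		return skipStruct(buf)
	case tMap, tSet, tList:
		hdr := 5 // set/list header: element type + i32 size
		if t == tMap {
			hdr = 6 // map header: key type + value type + i32 size
		}
		if err := need(buf, hdr); err != nil {
			return 0, err
		}
		types := buf[:hdr-4] // element type(s), repeated for every entry
		size := binary.BigEndian.Uint32(buf[hdr-4:])
		off := hdr
		for i := uint32(0); i < size; i++ {
			for _, et := range types {
				n, err := skipValue(buf[off:], et)
				if err != nil {
					return 0, err
				}
				off += n
			}
		}
		return off, nil
	}
	return 0, fmt.Errorf("unknown thrift type %d", t)
}

// skipStruct returns the length of an encoded struct, including its STOP byte.
func skipStruct(buf []byte) (int, error) {
	off := 0
	for {
		if off >= len(buf) {
			return 0, errShort
		}
		t := buf[off]
		if t == tStop {
			return off + 1, nil
		}
		off += 3 // field type byte + i16 field id
		if off > len(buf) {
			return 0, errShort
		}
		n, err := skipValue(buf[off:], t)
		if err != nil {
			return 0, err
		}
		off += n
	}
}

func main() {
	// struct { 1: i32 id = 42; 2: list<string> tags = ["a"] }, then trailing bytes.
	stream := []byte{
		tI32, 0x00, 0x01, 0x00, 0x00, 0x00, 0x2a,
		tList, 0x00, 0x02, tString, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 'a',
		tStop, 0xde, 0xad, // STOP, then the first bytes of whatever follows
	}
	n, err := skipStruct(stream)
	fmt.Println(n, err) // 21 <nil>
}
```

Once the struct length is known, the decoder can hand exactly that slice to FastRead, which is what makes FastCodec usable without framing.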
176 changes: 176 additions & 0 deletions content/zh/docs/kitex/Best Practice/remove_apache_codec.md