New Result Serialization (#26)
* added result serializer

* fixed some bugs for exotic types

* use internal function for type names

* added missing cmath include

* Added e2e tests

* Cleanup

* Moved up clang-format and tidy config so IDE and make format picks up on the right formatting rules

* Moved most logic into new compact serializer

* Added some docs for contributing

* Ignore pycache

* Cleanup

---------

Co-authored-by: Niclas Haderer <[email protected]>
gropaul and NiclasHaderer authored Dec 23, 2024
1 parent bd381c1 commit 8015a57
Showing 18 changed files with 784 additions and 125 deletions.
1 change: 1 addition & 0 deletions .clang-format
1 change: 1 addition & 0 deletions .clang-tidy
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -6,3 +6,5 @@ duckdb_unittest_tempdir/
testext
test/python/__pycache__/
.Rhistory
__pycache__
venv
25 changes: 11 additions & 14 deletions CMakeLists.txt
@@ -6,24 +6,20 @@ set(EXTENSION_NAME ${TARGET_NAME}_extension)
set(LOADABLE_EXTENSION_NAME ${TARGET_NAME}_loadable_extension)

project(${TARGET_NAME})
include_directories(
src/include
${CMAKE_CURRENT_BINARY_DIR}
duckdb/third_party/httplib
duckdb/parquet/include
)
include_directories(src/include ${CMAKE_CURRENT_BINARY_DIR}
duckdb/third_party/httplib duckdb/parquet/include)

# Embed ./src/assets/index.html as a C++ header
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/playground.hpp
COMMAND ${CMAKE_COMMAND} -P ${PROJECT_SOURCE_DIR}/embed.cmake ${PROJECT_SOURCE_DIR}/src/assets/index.html ${CMAKE_CURRENT_BINARY_DIR}/playground.hpp playgroundContent
DEPENDS ${PROJECT_SOURCE_DIR}/src/assets/index.html
)
COMMAND
${CMAKE_COMMAND} -P ${PROJECT_SOURCE_DIR}/embed.cmake
${PROJECT_SOURCE_DIR}/src/assets/index.html
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp playgroundContent
DEPENDS ${PROJECT_SOURCE_DIR}/src/assets/index.html)

set(EXTENSION_SOURCES
src/httpserver_extension.cpp
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp
)
set(EXTENSION_SOURCES src/httpserver_extension.cpp src/result_serializer.cpp
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp)

if(MINGW)
set(OPENSSL_USE_STATIC_LIBS TRUE)
@@ -36,7 +32,8 @@ build_static_extension(${TARGET_NAME} ${EXTENSION_SOURCES})
build_loadable_extension(${TARGET_NAME} " " ${EXTENSION_SOURCES})

include_directories(${OPENSSL_INCLUDE_DIR})
target_link_libraries(${LOADABLE_EXTENSION_NAME} duckdb_mbedtls ${OPENSSL_LIBRARIES})
target_link_libraries(${LOADABLE_EXTENSION_NAME} duckdb_mbedtls
${OPENSSL_LIBRARIES})
target_link_libraries(${EXTENSION_NAME} duckdb_mbedtls ${OPENSSL_LIBRARIES})

if(MINGW)
37 changes: 37 additions & 0 deletions docs/README.md
@@ -203,6 +203,43 @@ Check out this flocking macro from fellow _Italo-Amsterdammer_ @carlopi @ DuckDB
<br>
## Development
### Cloning the Repository
Clone the repository and all its submodules:
```bash
git clone <your-fork-url>
git submodule update --init --recursive
```
### Setting up CLion
**Opening project:**
Configuring CLion with the extension template requires a little work. First, make sure that the DuckDB submodule is available.
Then open `./duckdb/CMakeLists.txt` (not the top-level `CMakeLists.txt` file from this repo) as a project in CLion.
To fix the project path, go to `Tools -> CMake -> Change Project Root` ([docs](https://www.jetbrains.com/help/clion/change-project-root-directory.html)) and set the project root to the root directory of this repo.
**Debugging:**
Setting up debugging in CLion takes two steps. First, in `CLion -> Settings / Preferences -> Build, Execution, Deployment -> CMake`, add the desired builds (e.g. Debug, Release, RelDebug). There are different ways to configure this, but the easiest is to leave every field empty except the `build path`, which needs to be set to `../build/{build type}`. On a clean repository you first need to run `make {build type}` to initialize the CMake build directory. After running make, you can (re)build from CLion using the build target you just created. Alternatively, you can create CLion CMake profiles that match the CMake variables described in the Makefile, in which case you do not need to invoke the Makefile at all.
The second step is to configure the unittest runner as a run/debug configuration. Go to `Run -> Edit Configurations` and click `+ -> CMake Application`. The target and executable should both be `unittest`. This runs all the DuckDB tests. To run only the extension-specific tests, add `--test-dir ../../.. [sql]` to the `Program Arguments`. Using the `unittest` executable is recommended for testing and development within CLion, because the actual DuckDB CLI currently does not work reliably as a run target in CLion.
### Testing
To run the E2E tests, first install the required packages:
```bash
pip install -r requirements.txt
```
Then run the test suite:
```bash
pytest test_http_api
```
##### :black_joker: Disclaimers
[^1]: DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. [^1]
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
httpx==0.28.1
pytest==8.3.4
127 changes: 17 additions & 110 deletions src/httpserver_extension.cpp
@@ -1,32 +1,30 @@
#define DUCKDB_EXTENSION_MAIN
#define CPPHTTPLIB_OPENSSL_SUPPORT

#include <chrono>
#include <cstdlib>
#include <thread>
#include "httpserver_extension.hpp"
#include "query_stats.hpp"
#include "duckdb.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/function/scalar_function.hpp"
#include "duckdb/main/extension_util.hpp"
#include "duckdb/common/atomic.hpp"
#include "duckdb/common/exception/http_exception.hpp"
#include "duckdb/common/allocator.hpp"
#include <chrono>
#include <thread>
#include <memory>
#include <cstdlib>

#ifndef _WIN32
#include <syslog.h>
#endif

#define CPPHTTPLIB_OPENSSL_SUPPORT
#include "result_serializer.hpp"
#include "result_serializer_compact_json.hpp"
#include "httplib.hpp"
#include "yyjson.hpp"

#include "playground.hpp"

using namespace duckdb_yyjson; // NOLINT
#ifndef _WIN32
#include <syslog.h>
#endif

namespace duckdb {

using namespace duckdb_yyjson; // NOLINT(*-build-using-namespace)

struct HttpServerState {
std::unique_ptr<duckdb_httplib_openssl::Server> server;
std::unique_ptr<std::thread> server_thread;
@@ -40,98 +38,6 @@ struct HttpServerState {

static HttpServerState global_state;

std::string GetColumnType(MaterializedQueryResult &result, idx_t column) {
if (result.RowCount() == 0) {
return "String";
}
switch (result.types[column].id()) {
case LogicalTypeId::FLOAT:
return "Float";
case LogicalTypeId::DOUBLE:
return "Double";
case LogicalTypeId::INTEGER:
return "Int32";
case LogicalTypeId::BIGINT:
return "Int64";
case LogicalTypeId::UINTEGER:
return "UInt32";
case LogicalTypeId::UBIGINT:
return "UInt64";
case LogicalTypeId::VARCHAR:
return "String";
case LogicalTypeId::TIME:
return "DateTime";
case LogicalTypeId::DATE:
return "Date";
case LogicalTypeId::TIMESTAMP:
return "DateTime";
case LogicalTypeId::BOOLEAN:
return "Int8";
default:
return "String";
}
return "String";
}

struct ReqStats {
float elapsed_sec;
int64_t read_bytes;
int64_t read_rows;
};

// Convert the query result to JSON format
static std::string ConvertResultToJSON(MaterializedQueryResult &result, ReqStats &req_stats) {
auto doc = yyjson_mut_doc_new(nullptr);
auto root = yyjson_mut_obj(doc);
yyjson_mut_doc_set_root(doc, root);
// Add meta information
auto meta_array = yyjson_mut_arr(doc);
for (idx_t col = 0; col < result.ColumnCount(); ++col) {
auto column_obj = yyjson_mut_obj(doc);
yyjson_mut_obj_add_str(doc, column_obj, "name", result.ColumnName(col).c_str());
yyjson_mut_arr_append(meta_array, column_obj);
std::string tp(GetColumnType(result, col));
yyjson_mut_obj_add_strcpy(doc, column_obj, "type", tp.c_str());
}
yyjson_mut_obj_add_val(doc, root, "meta", meta_array);

// Add data
auto data_array = yyjson_mut_arr(doc);
for (idx_t row = 0; row < result.RowCount(); ++row) {
auto row_array = yyjson_mut_arr(doc);
for (idx_t col = 0; col < result.ColumnCount(); ++col) {
Value value = result.GetValue(col, row);
if (value.IsNull()) {
yyjson_mut_arr_append(row_array, yyjson_mut_null(doc));
} else {
std::string value_str = value.ToString();
yyjson_mut_arr_append(row_array, yyjson_mut_strncpy(doc, value_str.c_str(), value_str.length()));
}
}
yyjson_mut_arr_append(data_array, row_array);
}
yyjson_mut_obj_add_val(doc, root, "data", data_array);

// Add row count
yyjson_mut_obj_add_int(doc, root, "rows", result.RowCount());
//"statistics":{"elapsed":0.00031403,"rows_read":1,"bytes_read":0}}
auto stat_obj = yyjson_mut_obj_add_obj(doc, root, "statistics");
yyjson_mut_obj_add_real(doc, stat_obj, "elapsed", req_stats.elapsed_sec);
yyjson_mut_obj_add_int(doc, stat_obj, "rows_read", req_stats.read_rows);
yyjson_mut_obj_add_int(doc, stat_obj, "bytes_read", req_stats.read_bytes);
// Write to string
auto data = yyjson_mut_write(doc, 0, nullptr);
if (!data) {
yyjson_mut_doc_free(doc);
throw InternalException("Failed to render the result as JSON, yyjson failed");
}

std::string json_output(data);
free(data);
yyjson_mut_doc_free(doc);
return json_output;
}

// New: Base64 decoding function
std::string base64_decode(const std::string &in) {
std::string out;
@@ -300,7 +206,8 @@ void HandleHttpRequest(const duckdb_httplib_openssl::Request& req, duckdb_httpli
std::string json_output = ConvertResultToNDJSON(*result);
res.set_content(json_output, "application/x-ndjson");
} else if (format == "JSONCompact") {
std::string json_output = ConvertResultToJSON(*result, stats);
ResultSerializerCompactJson serializer;
std::string json_output = serializer.Serialize(*result, stats);
res.set_content(json_output, "application/json");
} else {
// Default to NDJSON for DuckDB's own queries
@@ -325,9 +232,9 @@ void HttpServerStart(DatabaseInstance& db, string_t host, int32_t port, string_t
global_state.is_running = true;
global_state.auth_token = auth.GetString();

// Custom basepath, defaults to root /
// Custom basepath, defaults to root /
const char* base_path_env = std::getenv("DUCKDB_HTTPSERVER_BASEPATH");
std::string base_path = "/";
std::string base_path = "/";

if (base_path_env && base_path_env[0] == '/' && strlen(base_path_env) > 1) {
base_path = std::string(base_path_env);
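
The base-path check in the hunk above can be isolated as a small function. The following is a minimal sketch, not code from the commit: `ResolveBasePath` is a hypothetical name, and because the hunk is cut off right after the assignment, any later normalization of the path is not modeled here.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the commit). It mirrors the condition shown
// in the hunk: the value is used only if it is set, starts with '/', and is
// longer than one character. Otherwise the default "/" is kept.
std::string ResolveBasePath(const char *env_value) {
    std::string base_path = "/";
    if (env_value && env_value[0] == '/' && std::strlen(env_value) > 1) {
        base_path = std::string(env_value);
    }
    return base_path;
}

// Convenience wrapper reading the variable named in the hunk.
std::string ResolveBasePathFromEnv() {
    return ResolveBasePath(std::getenv("DUCKDB_HTTPSERVER_BASEPATH"));
}
```

Under this rule an unset variable, an empty value, a bare `/`, and a value without a leading slash all fall back to `/`.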
1 change: 0 additions & 1 deletion src/include/httpserver_extension.hpp
@@ -1,7 +1,6 @@
#pragma once

#include "duckdb.hpp"
#include "duckdb/common/file_system.hpp"

namespace duckdb {

12 changes: 12 additions & 0 deletions src/include/query_stats.hpp
@@ -0,0 +1,12 @@
#pragma once
#include <cstdint>

namespace duckdb {

struct ReqStats {
float elapsed_sec;
uint64_t read_bytes;
uint64_t read_rows;
};

} // namespace duckdb
46 changes: 46 additions & 0 deletions src/include/result_serializer.hpp
@@ -0,0 +1,46 @@
#pragma once

#include "duckdb/main/query_result.hpp"
#include "yyjson.hpp"

namespace duckdb {
using namespace duckdb_yyjson; // NOLINT(*-build-using-namespace)

class ResultSerializer {
public:
explicit ResultSerializer(const bool _set_invalid_values_to_null = false)
: set_invalid_values_to_null(_set_invalid_values_to_null) {
doc = yyjson_mut_doc_new(nullptr);
}

virtual ~ResultSerializer() {
yyjson_mut_doc_free(doc);
}

std::string YY_ToString() {
auto data = yyjson_mut_write(doc, 0, nullptr);
if (!data) {
throw SerializationException("Could not render yyjson document");
}
std::string json_output(data);
free(data);
return json_output;
}

protected:
void SerializeInternal(QueryResult &query_result, yyjson_mut_val *append_root, bool values_as_array);

void SerializeChunk(const DataChunk &chunk, vector<string> &names, vector<LogicalType> &types,
yyjson_mut_val *append_root, bool values_as_array);

yyjson_mut_val *SerializeRowAsArray(const DataChunk &chunk, idx_t row_idx, vector<LogicalType> &types);

yyjson_mut_val *SerializeRowAsObject(const DataChunk &chunk, idx_t row_idx, vector<string> &names,
vector<LogicalType> &types);

void SerializeValue(yyjson_mut_val *parent, const Value &value, optional_ptr<string> name, const LogicalType &type);

yyjson_mut_doc *doc;
bool set_invalid_values_to_null;
};
} // namespace duckdb
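
For orientation, the removed `ConvertResultToJSON` produced an object with a `meta` array of column descriptors, a `data` array of row arrays (values as strings, or `null`), and a `rows` count. The sketch below reproduces that layout with the standard library only. It is an illustration under assumptions, not the new serializer: `Column`, `Cell`, `Row`, `EscapeJson`, and `ToCompactJson` are hypothetical names, the `statistics` object is left out for brevity, and the implementation of `ResultSerializerCompactJson` is not shown in this diff, so its exact output may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these exist in the commit.
struct Column {
    std::string name;
    std::string type;
};
struct Cell {
    bool is_null;     // true models SQL NULL
    std::string text; // string form of the value, ignored when is_null is true
};
using Row = std::vector<Cell>;

// Escape characters that must not appear raw inside a JSON string.
std::string EscapeJson(const std::string &in) {
    static const char *hex = "0123456789abcdef";
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) { // remaining control characters become \u00XX
                out += "\\u00";
                out += hex[c >> 4];
                out += hex[c & 0xF];
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Build {"meta":[...],"data":[[...],...],"rows":N} as a string.
std::string ToCompactJson(const std::vector<Column> &meta, const std::vector<Row> &rows) {
    std::ostringstream os;
    os << "{\"meta\":[";
    for (std::size_t i = 0; i < meta.size(); ++i) {
        if (i) os << ',';
        os << "{\"name\":\"" << EscapeJson(meta[i].name)
           << "\",\"type\":\"" << EscapeJson(meta[i].type) << "\"}";
    }
    os << "],\"data\":[";
    for (std::size_t r = 0; r < rows.size(); ++r) {
        if (r) os << ',';
        os << '[';
        for (std::size_t c = 0; c < rows[r].size(); ++c) {
            if (c) os << ',';
            if (rows[r][c].is_null) {
                os << "null";
            } else {
                os << '"' << EscapeJson(rows[r][c].text) << '"';
            }
        }
        os << ']';
    }
    os << "],\"rows\":" << rows.size() << '}';
    return os.str();
}
```

The commit itself builds the document through yyjson's mutable API rather than by string concatenation, which avoids hand-written escaping; the string-based version here only keeps the example free of external dependencies.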