resync branch #13

Closed
wants to merge 25 commits into from
177 changes: 132 additions & 45 deletions docs/README.md
@@ -1,45 +1,107 @@
<img src="https://github.com/user-attachments/assets/35bfded5-3f21-46b5-91f7-014f5a09fac3" width=200 />
<!-- <img src="https://github.com/user-attachments/assets/35bfded5-3f21-46b5-91f7-014f5a09fac3" width=200 /> -->

<img src="https://github.com/user-attachments/assets/46a5c546-7e9b-42c7-87f4-bc8defe674e0" width=250 />


# DuckDB HTTP Server Extension
This very experimental extension spawns an HTTP Server from within DuckDB serving query requests.<br>
The extension goal is to replace the functionality currently offered by [Quackpipe](https://github.com/metrico/quackpipe)
This extension transforms **DuckDB** instances into tiny multi-player **HTTP OLAP API** services.<br>
Supports authentication _(Basic Auth or `X-API-Key` token)_ and includes the _play_ SQL user interface.

The extension's goal is to replace the functionality currently offered by [quackpipe](https://github.com/metrico/quackpipe).

### Features

![image](https://github.com/user-attachments/assets/180fdcec-2cae-4909-a7a2-28333cd7dd44)
- Turn any [DuckDB](https://duckdb.org) instance into an **HTTP OLAP API** Server
- Use the embedded **Play User Interface** to query and visualize data
- Pair with [chsql](https://community-extensions.duckdb.org/extensions/chsql.html) extension for **ClickHouse flavoured SQL**
- Work with local and remote datasets including [MotherDuck](https://motherduck.com) 🐤
- _100% open source, ready to use and extend by the Community!_

<br>

![image](https://github.com/user-attachments/assets/e930a8d2-b3e4-454e-ba12-e5e91b30bfbe)

#### Extension Functions
- `httpserve_start(host, port)`
- `httpserve_stop()`
- `httpserve_start(host, port, auth)`: starts the server using provided parameters
- `httpserve_stop()`: stops the server thread

#### API Endpoints
- `/` `GET`, `POST`
- `default_format`: Supports `JSONEachRow` or `JSONCompact`
- `query`: Supports DuckDB SQL queries
- `/ping` `GET`
#### Notes

### Installation
🛑 Run DuckDB in `-readonly` mode for enhanced security

<br>

### 📦 [Installation](https://community-extensions.duckdb.org/extensions/httpserver.html)
```sql
INSTALL httpserver FROM community;
LOAD httpserver;
```

### Usage
Start the HTTP server providing the `host` and `port` parameters
### 🔌 Usage
Start the HTTP server by providing the `host`, `port` and `auth` parameters.<br>
> * To disable authentication, pass an empty string as the `auth` parameter.<br>
> * To run the API in the foreground, set `DUCKDB_HTTPSERVER_FOREGROUND=1` _(not yet supported on Windows)_

#### Basic Auth
```sql
D SELECT httpserve_start('0.0.0.0',9999);
┌─────────────────────────────────────┐
│ httpserve_start('0.0.0.0', 9999) │
│ varchar │
├─────────────────────────────────────┤
│ HTTP server started on 0.0.0.0:9999 │
└─────────────────────────────────────┘
D SELECT httpserve_start('localhost', 9999, 'user:pass');

┌─────────────────────────────────────────────────┐
│ httpserve_start('localhost', 9999, 'user:pass') │
│                     varchar                     │
├─────────────────────────────────────────────────┤
│ HTTP server started on localhost:9999           │
└─────────────────────────────────────────────────┘
```
```bash
curl -X POST -d "SELECT 'hello', version()" "http://user:pass@localhost:9999/"
```
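
When `curl` is given `user:pass@` in the URL, it sends the credentials as an `Authorization: Basic …` header containing the Base64 form of `user:pass`. For clients that do not build this header automatically, a minimal Python sketch could look like the following. The helper name `basic_auth_header` is hypothetical and only the standard library is used.

```python
import base64


def basic_auth_header(credentials: str) -> str:
    """Return the Authorization header value for 'user:pass' style credentials.

    Hypothetical helper for illustration; it is not part of the extension.
    """
    encoded = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + encoded


# The same credentials that were passed to httpserve_start(...)
header_value = basic_auth_header("user:pass")
print(header_value)  # Basic dXNlcjpwYXNz
```

The resulting value goes into the `Authorization` request header; the string before encoding should match the `auth` parameter exactly.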

#### Token Auth
```sql
SELECT httpserve_start('localhost', 9999, 'secretkey');

┌─────────────────────────────────────────────────┐
│ httpserve_start('localhost', 9999, 'secretkey') │
│                     varchar                     │
├─────────────────────────────────────────────────┤
│ HTTP server started on localhost:9999           │
└─────────────────────────────────────────────────┘
```

Query your endpoint, passing the token in the `X-API-Key` header:

```bash
curl -X POST --header "X-API-Key: secretkey" -d "SELECT 'hello', version()" "http://localhost:9999/"
```
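
The same request can be prepared from a script. The sketch below uses only Python's standard library; the function name `build_token_request` is an assumption made for illustration, and the URL and key are the example values from this section.

```python
import urllib.request


def build_token_request(url: str, sql: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST request that carries the SQL text as the body
    and the token in the X-API-Key header (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        headers={"X-API-Key": api_key},
        method="POST",
    )


req = build_token_request(
    "http://localhost:9999/", "SELECT 'hello', version()", "secretkey"
)

# Sending it requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Note that `urllib` normalises the stored header name to `X-api-key`; HTTP header names are case-insensitive, so this should not matter to the server.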

#### QUERY UI
You can perform the same action from DuckDB using HTTP `extra_http_headers`:

```sql
D CREATE SECRET extra_http_headers (
TYPE HTTP,
EXTRA_HTTP_HEADERS MAP{
'X-API-Key': 'secretkey'
}
);

D SELECT * FROM duck_flock('SELECT version()', ['http://localhost:9999']);
┌─────────────┐
│ "version"() │
│ varchar │
├─────────────┤
│ v1.1.1 │
└─────────────┘
```
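
Both modes are checked against the same `auth` string. The following Python sketch mirrors, in simplified form, the decision order of the `IsAuthenticated` function added to `src/httpserver_extension.cpp` in this PR: no token configured means open access, then the `X-API-Key` header is compared, then a decoded `Authorization: Basic` value. It is an illustration of the logic, not the extension's code, and the `headers` dict stands in for the real request object.

```python
import base64


def is_authenticated(headers: dict, auth_token: str) -> bool:
    """Simplified model of the server-side check (illustrative only)."""
    if not auth_token:
        return True  # no authentication required when no token is set

    # Token mode: the header must equal the configured string
    if headers.get("X-API-Key") == auth_token:
        return True

    # Basic mode: decode the Base64 payload and compare it to the same string
    authorization = headers.get("Authorization", "")
    if authorization.startswith("Basic "):
        try:
            decoded = base64.b64decode(authorization[6:]).decode("utf-8")
        except ValueError:  # covers malformed Base64 and invalid UTF-8
            return False
        return decoded == auth_token

    return False
```

One consequence visible in this model: a server started with `'user:pass'` also accepts `X-API-Key: user:pass`, because both branches compare against the same value.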



#### 👉 QUERY UI
Browse to your endpoint and use the built-in quackplay interface _(experimental)_

![image](https://github.com/user-attachments/assets/0ee751d0-7360-4d3d-949d-3fb930634ebd)

#### QUERY API
#### 👉 QUERY API
Query your API endpoint using curl GET/POST requests

```bash
@@ -72,7 +134,9 @@ curl -X POST -d "SELECT 'hello', version()" "http://localhost:9999/?default_form
}
```

You can also have DuckDB instances query each other using `NDJSON`
#### 👉 CROSS-OVER EXAMPLES

You can now have DuckDB instances query each other and... _themselves!_

```sql
D LOAD json;
@@ -93,30 +157,53 @@ D SELECT * FROM read_json_auto('http://localhost:9999/?q=SELECT version()');
└─────────────┘
```

#### Flock Macro by @carlopi
Check out this flocking macro from fellow _Italo-Amsterdammer_ @carlopi @ DuckDB Labs

![image](https://github.com/user-attachments/assets/b409ec0e-86e0-4a8d-822c-377ddbae524d)

* a DuckDB CLI, running httpserver extension
* a DuckDB from Python, running httpserver extension
* a DuckDB from the Web, querying all 3 DuckDB at the same time

<br>

<hr>

<br>

### Build steps
Now to build the extension, run:
```sh
make
```
The main binaries that will be built are:
```sh
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/<extension_name>/<extension_name>.duckdb_extension
```
- `duckdb` is the binary for the duckdb shell with the extension code automatically loaded.
- `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary.
- `<extension_name>.duckdb_extension` is the loadable binary as it would be distributed.
### API Documentation

## Running the extension
To run the extension code, simply start the shell with `./build/release/duckdb`. This shell will have the extension pre-loaded.
#### Endpoints Overview

## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```
| Endpoint | Methods | Description |
|----------|---------|-------------|
| `/` | GET, POST | Query API endpoint |
| `/ping` | GET | Health check endpoint |

#### Detailed Endpoint Specifications

##### Query API

**Methods:** `GET`, `POST`

**Parameters:**

| Parameter | Description | Supported Values |
|-----------|-------------|-------------------|
| `default_format` | Specifies the output format | `JSONEachRow`, `JSONCompact` |
| `query` | The DuckDB SQL query to execute | Any valid DuckDB SQL query |
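
`JSONEachRow` is newline-delimited JSON: one object per line, one line per result row. A client can therefore parse a response body line by line. The sketch below assumes that shape; the sample body is made up for illustration and is not real server output.

```python
import json


def parse_json_each_row(body: str) -> list:
    """Parse a newline-delimited JSON body into a list of row dicts.

    Blank lines are skipped. Hypothetical helper for illustration.
    """
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Made-up sample body, not captured from a real server
sample_body = '{"id": 1, "name": "duck"}\n{"id": 2, "name": "goose"}\n'
rows = parse_json_each_row(sample_body)
print(rows[0]["name"])  # duck
```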

##### Notes

- Ensure that your queries are properly formatted and escaped when sending them as part of the request.
- The root endpoint (`/`) supports both GET and POST methods, but POST is recommended for complex queries or when the query length exceeds URL length limitations.
- Always specify the `default_format` parameter to ensure consistent output formatting.
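
For GET requests, the escaping mentioned above amounts to percent-encoding the SQL text before placing it in the URL. A minimal sketch using Python's standard library follows. It assumes the parameter names `query` and `default_format` from the table above; the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url: str, sql: str, default_format: str = "JSONCompact") -> str:
    """Build a GET URL with the SQL text percent-encoded (illustrative helper).

    quote_via=quote encodes spaces as %20 rather than '+'.
    """
    params = {"query": sql, "default_format": default_format}
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_query_url("http://localhost:9999/", "SELECT 'hello', version()")
print(url)
# http://localhost:9999/?query=SELECT%20%27hello%27%2C%20version%28%29&default_format=JSONCompact
```

For long or complex statements, sending the SQL as a POST body avoids the encoding step entirely.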

<br>

##### :black_joker: Disclaimers

[^1]: DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. [^1]
[^2]: ClickHouse ® is a trademark of ClickHouse Inc. No direct affiliation or endorsement. [^2]
[^3]: Released under the MIT license. See LICENSE for details. All rights reserved by their respective owners. [^3]
95 changes: 90 additions & 5 deletions src/httpserver_extension.cpp
@@ -31,6 +31,7 @@ struct HttpServerState {
std::atomic<bool> is_running;
DatabaseInstance* db_instance;
unique_ptr<Allocator> allocator;
std::string auth_token;

HttpServerState() : is_running(false), db_instance(nullptr) {}
};
@@ -131,6 +132,51 @@ static std::string ConvertResultToJSON(MaterializedQueryResult &result, ReqStats
return json_output;
}

// New: Base64 decoding function
std::string base64_decode(const std::string &in) {
std::string out;
std::vector<int> T(256, -1);
for (int i = 0; i < 64; i++)
T["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i]] = i;

int val = 0, valb = -8;
for (unsigned char c : in) {
if (T[c] == -1) break;
val = (val << 6) + T[c];
valb += 6;
if (valb >= 0) {
out.push_back(char((val >> valb) & 0xFF));
valb -= 8;
}
}
return out;
}

// Auth Check
bool IsAuthenticated(const duckdb_httplib_openssl::Request& req) {
if (global_state.auth_token.empty()) {
return true; // No authentication required if no token is set
}

// Check for X-API-Key header
auto api_key = req.get_header_value("X-API-Key");
if (!api_key.empty() && api_key == global_state.auth_token) {
return true;
}

// Check for Basic Auth
auto auth = req.get_header_value("Authorization");
if (!auth.empty() && auth.compare(0, 6, "Basic ") == 0) {
std::string decoded_auth = base64_decode(auth.substr(6));
if (decoded_auth == global_state.auth_token) {
return true;
}
}

return false;
}


// Convert the query result to NDJSON (JSONEachRow) format
static std::string ConvertResultToNDJSON(MaterializedQueryResult &result) {
std::string ndjson_output;
@@ -210,6 +256,13 @@ static void HandleQuery(const string& query, duckdb_httplib_openssl::Response& r
void HandleHttpRequest(const duckdb_httplib_openssl::Request& req, duckdb_httplib_openssl::Response& res) {
std::string query;

// Check authentication
if (!IsAuthenticated(req)) {
res.status = 401;
res.set_content("Unauthorized", "text/plain");
return;
}

// CORS allow
res.set_header("Access-Control-Allow-Origin", "*");
res.set_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS, PUT");
@@ -297,14 +350,15 @@ void HandleHttpRequest(const duckdb_httplib_openssl::Request& req, duckdb_httpli
}
}

void HttpServerStart(DatabaseInstance& db, string_t host, int32_t port) {
void HttpServerStart(DatabaseInstance& db, string_t host, int32_t port, string_t auth = string_t()) {
if (global_state.is_running) {
throw IOException("HTTP server is already running");
}

global_state.db_instance = &db;
global_state.server = make_uniq<duckdb_httplib_openssl::Server>();
global_state.is_running = true;
global_state.auth_token = auth.GetString();

// CORS Preflight
global_state.server->Options("/",
@@ -331,12 +385,41 @@ void HttpServerStart(DatabaseInstance& db, string_t host, int32_t port) {
});

string host_str = host.GetString();
global_state.server_thread = make_uniq<std::thread>([host_str, port]() {

const char* run_in_same_thread_env = std::getenv("DUCKDB_HTTPSERVER_FOREGROUND");
bool run_in_same_thread = (run_in_same_thread_env != nullptr && std::string(run_in_same_thread_env) == "1");

if (run_in_same_thread) {
#ifdef _WIN32
throw IOException("Foreground mode not yet supported on WIN32 platforms.");
#else
// POSIX signal handler for SIGINT (Linux/macOS)
signal(SIGINT, [](int) {
if (global_state.server) {
global_state.server->stop();
}
global_state.is_running = false; // Update the running state
});

// Run the server in the same thread
if (!global_state.server->listen(host_str.c_str(), port)) {
global_state.is_running = false;
throw IOException("Failed to start HTTP server on " + host_str + ":" + std::to_string(port));
}
});
#endif

// The server has stopped (due to CTRL-C or other reasons)
global_state.is_running = false;
} else {
// Run the server in a dedicated thread (default)
global_state.server_thread = make_uniq<std::thread>([host_str, port]() {
if (!global_state.server->listen(host_str.c_str(), port)) {
global_state.is_running = false;
throw IOException("Failed to start HTTP server on " + host_str + ":" + std::to_string(port));
}
});
}

}

void HttpServerStop() {
@@ -361,17 +444,19 @@ static void HttpServerCleanup() {

static void LoadInternal(DatabaseInstance &instance) {
auto httpserve_start = ScalarFunction("httpserve_start",
{LogicalType::VARCHAR, LogicalType::INTEGER},
{LogicalType::VARCHAR, LogicalType::INTEGER, LogicalType::VARCHAR},
LogicalType::VARCHAR,
[&](DataChunk &args, ExpressionState &state, Vector &result) {
auto &host_vector = args.data[0];
auto &port_vector = args.data[1];
auto &auth_vector = args.data[2];

UnaryExecutor::Execute<string_t, string_t>(
host_vector, result, args.size(),
[&](string_t host) {
auto port = ((int32_t*)port_vector.GetData())[0];
HttpServerStart(instance, host, port);
auto auth = ((string_t*)auth_vector.GetData())[0];
HttpServerStart(instance, host, port, auth);
return StringVector::AddString(result, "HTTP server started on " + host.GetString() + ":" + std::to_string(port));
});
});