
The Deephaven Core Roadmap 1H24


Our roadmap for the first half of 2024 is centered around Deephaven’s core usage patterns as (1) a UI framework for live data (and batch data, too), (2) a versatile query engine for live workloads in Python and Java, and (3) a live data pipeline utility.

Annotations:

| Mark | Status |
| ---- | ------ |
| * | work not yet started |
| 🏃 | work in progress |
| ✅ | work completed |
| 💪 | stretch goal |
| 💡 | needs research |
| 🟡 | particularly important |

Project organization

We intend to release new versions of the project at the end of each month. At any time, the deliveries planned for each of the next two months can be found in the corresponding GitHub Milestone.

Themes

Work will fall into the following categories:

  • UI/UX framework capabilities. (To be delivered in this deephaven-core project, as well as web-client-ui, plugins, ipywidgets, and other complementary projects.)
  • Live ingestion.
  • Data lake interoperability.
  • Engine capabilities and Python interoperability.
  • Client APIs and the “Barrage” wire protocol (as found in this repo and github/barrage).

UI/UX framework

  • 🟡 🏃 MVP version of deephaven.ui, a complete framework for live dashboards and widgets (a component sketch follows this list).

    • 🏃 Integration of React Spectrum library of UI interactive and adaptive experiences.
    • 🏃 A rich interface for client-side callbacks and interactivity.
    • 🏃 Programmatic layouts via Python scripts.

  • 🟡 🏃 Ease-of-deployment related to Deephaven’s plug-in infrastructure.

    • 🏃 Deephaven’s integration with Plotly Express (“Deephaven Express”).
      • Integrate Deephaven’s smart downsampler into line plots (including real-time, ticking ones).
      • Comprehensive documentation of Deephaven Express, Plotly, Matplotlib, seaborn, and Java plotting.
  • 💪 An interactive SQL UX (similar to the Python and Groovy exploratory and development IDEs provided today).

  • 💡 🟡 UI/UX for writing queries using natural language.

    • Slick integration of LLM-to-SQL utilities, leveraging Deephaven’s declarative Query Syntax Tree (QST) API and Apache Calcite.
    • GUI experiences for typing English and inheriting tables, plots, widgets, and visualizations that update in real time.
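
As a point of reference for the deephaven.ui work above, here is a minimal sketch of what a component might look like. The `@ui.component` decorator, the `use_state` hook, and the `button` element follow the React-style conventions the MVP targets, but the exact names and signatures are assumptions rather than a settled API.

```python
# Minimal sketch of a deephaven.ui component. The @ui.component decorator,
# ui.use_state hook, and ui.button element are assumptions based on the
# React-style conventions the MVP targets; names may differ in the release.
from deephaven import ui


@ui.component
def click_counter():
    # use_state returns the current value and a setter, mirroring React hooks
    count, set_count = ui.use_state(0)
    return ui.button(
        f"Clicked {count} times",
        on_press=lambda: set_count(count + 1),
    )


# Assigning the component call to a variable renders it as a live widget in the IDE
counter = click_counter()
```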

Live ingestion

  • 🟡 Packaging of powerful, general-purpose abstractions for ingesting live data from a variety of sources.
  • 🏃 1-to-N mapping of streaming data with nested formats into live Deephaven tables.
  • 🏃 Improved Kafka support: payload coverage and performance (a consumer sketch follows this list).
  • 🏃 Improved JSON capabilities.
  • Support for _metadata and _common_metadata files in Parquet (this enables adding partitions to Parquet datasets).
  • 💪 💡 Integration with Iceberg’s dynamic capabilities.
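
To ground the Kafka item above, here is a rough sketch of ingesting a JSON payload into a live table with the existing consumer API. The broker address, topic, and column layout are hypothetical placeholders.

```python
# Sketch of Kafka-to-table ingestion with the existing consumer API; the broker
# address, topic, and column layout are hypothetical placeholders.
from deephaven import kafka_consumer as kc
from deephaven.stream.kafka.consumer import KeyValueSpec, TableType
from deephaven import dtypes as dht

orders = kc.consume(
    {"bootstrap.servers": "localhost:9092"},  # hypothetical broker
    "orders",                                 # hypothetical topic
    key_spec=KeyValueSpec.IGNORE,
    value_spec=kc.json_spec([
        ("Symbol", dht.string),
        ("Price", dht.double),
    ]),
    table_type=TableType.append(),            # append the full history into a live table
)
```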

Data lake interoperability

  • 🟡 🏃 Efficient reading of Parquet files from S3 (a read sketch follows this list).
    • 🟡 🏃 Continued Parquet feature coverage; predicate pushdown.
    • 🟡 💡 Iceberg integration.
      • Metadata management.
      • Data cataloging and discovery.
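
To ground the Parquet-from-S3 items above, a rough sketch of what a read might look like. The S3Instructions class and the special_instructions argument track in-progress work, so those names are assumptions; the bucket and region are placeholders.

```python
# Sketch of reading Parquet from S3. S3Instructions and the special_instructions
# argument reflect in-progress work and are assumptions; bucket/region are placeholders.
from deephaven import parquet
from deephaven.experimental import s3

instructions = s3.S3Instructions(region_name="us-east-1")  # assumed constructor
orders = parquet.read(
    "s3://example-bucket/orders.parquet",
    special_instructions=instructions,
)
```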

Engine capabilities & Python interoperability

  • 🟡 🏃 Multi-key indexes for batch and live data.

  • 🏃 A multi-dimensional integration with NumPy (an interop sketch follows this list).

  • Data exhaust utilities:

    • More elegant publishing of cell-, array-, and chunk-data to client applications.
    • General-purpose abstraction for streaming egress.
  • 🟡 Formula decomposition and improved parsing to accelerate the processing of UDFs.

  • 🟡 💡 Battle-hardening of the deephaven.learn library to support bidirectional movement of live (real-time, updating) arrays into and out of NumPy, Torch, TensorFlow, and RAG-related libraries.

  • Increased concurrency for more sophisticated uses of Deephaven’s select() and where() operations.
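
For the NumPy item above, a sketch of the existing table-to-array round trip that the multi-dimensional work would extend; the table and column names are illustrative.

```python
# Sketch of the existing NumPy round trip; the multi-dimensional integration on
# the roadmap would extend this beyond 2-D, column-oriented arrays.
from deephaven import empty_table
from deephaven.numpy import to_numpy, to_table

t = empty_table(5).update(["X = i", "Y = X * X"])  # i is the row index in formulas
arr = to_numpy(t, ["X", "Y"])                      # 2-D ndarray, one column per table column
t2 = to_table(arr * 2, ["X2", "Y2"])               # back to a Deephaven table
```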

Deephaven's wire format & client APIs

  • The JavaScript API:

    • Refactor the JS API’s Barrage subscription.
    • Better documentation of the JS API.
    • More examples of JS applications.
  • Greater coverage of operating systems for the C++ API.
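
The items above target the JavaScript and C++ clients; for orientation on what a Barrage-backed client session looks like today, here is a rough sketch using the existing Python client (pydeephaven). The host, port, and table name are hypothetical, and the exact method names may differ across client releases.

```python
# Rough sketch of a client session over Barrage using pydeephaven; the host,
# port, and table name are hypothetical.
from pydeephaven import Session

session = Session(host="localhost", port=10000)

# Run a script on the server, then fetch the resulting table
session.run_script(
    'from deephaven import time_table\n'
    't = time_table("PT1S").update(["X = ii"])'
)
t = session.open_table("t")
print(t.to_arrow().to_pandas())  # snapshot of the ticking table fetched over Arrow Flight

session.close()
```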

Leading possibilities for the remainder of 2024

  • Support for custom data types.
  • 💡 Direct-from-CDC connector (without a Debezium translation).
  • Support for multi-dimensional arrays for Python data science and Gen-AI-integrated use cases.
  • 💡 Integration with RAG frameworks to deliver Deephaven’s live arrays into best-of-breed, enterprise-grade “LLM-meets-proprietary-data-sets” solutions.

The Deephaven team is excited by the strategic and tactical initiatives underway. Please help us imagine, design, and prioritize work by tracking our GitHub milestones, filing issues, and communicating with us on Slack.

Deephaven’s live dataframes offer unique, broad, and exciting capabilities at the intersection of modern data-driven workloads, analytics, and applications – particularly those driven by live, real-time data.