Tao Wu | Product Manager
One and a half years ago, in April 2022, we open-sourced RisingWave, a distributed SQL streaming database. A quarter ago, in July 2023, we released RisingWave 1.0, the first official version and a battle-tested system that can be used in production. More recently, we released RisingWave 1.3.
RisingWave is an open-source streaming database released under the Apache 2.0 license. The development team behind it actively collects feedback from users and strives to democratize stream processing: to make it simple, affordable, and accessible.
Now that RisingWave has been deployed in production at dozens of enterprises and fast-growing startups, how will it evolve? We plan to keep our roadmap transparent and update it periodically. Here’s what you can anticipate in future releases of RisingWave.
Note that the roadmap is not final. We will update it frequently to reflect changing priorities and better serve users.
Short-term goals (within the next 3 months)
Adaptive Scaling
Implement adaptive scaling to automatically adjust materialized view parallelism based on the number of CPU cores in the cluster.
Improvements to the Existing External Sinks
Optimize the performance and improve the stability of supported external sinks such as Doris, ClickHouse, and Elasticsearch. We’ll also expand the encoding formats supported by the Kafka sink to include Protobuf and Avro, and add support for Schema Registry.
Iceberg Sink V2
We recently introduced a native integration with Iceberg that is no longer based on the official Java library. It has been fully rewritten in Rust for performance and stability. We plan to stabilize it in the next few months.
Enhanced Observability
Expand system tables and add metrics for stateful operators to provide greater visibility into system health and performance.
Improved Open-source Web UI
Enhance RisingWave's open-source web UI with additional system information and monitoring capabilities.
Sink into Table
Users may want to dynamically union the results of multiple views into a single table. For example, each view might correspond to one department in a company, and new departments are added from time to time. With this feature, users can seamlessly merge data from new views as they are added.
CDC Connection Sharing
RisingWave currently creates one CDC connection per table. Each connection consumes the replication log on its own, and that log contains transactions not only for the source table but also for other tables in the same database. Multiple connections therefore lead to duplicate consumption and a heavy load on the upstream database. Shared CDC connections can reduce this load and improve the stability of CDC.
Recoverable CREATE MATERIALIZED VIEW
Persist materialized view progress to allow recovering from failures without losing work already completed.
CDC Transaction Atomicity
RisingWave currently applies CDC changes event by event, and a single event may carry only part of a transaction. With this feature, RisingWave will buffer all CDC events within a transaction until the transaction can be applied atomically.
Parallel CDC Snapshot Loading
Introduce parallelism during CDC snapshot loading to improve the user experience for large upstream tables.
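One common way to parallelize a snapshot is to split the upstream table's primary-key range into chunks and read the chunks concurrently. The sketch below illustrates that general idea only; it is not RisingWave's implementation. The function names, the integer `id` key, and the in-memory list standing in for the upstream table are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_key_range(min_key, max_key, num_chunks):
    """Split the inclusive range [min_key, max_key] into contiguous chunks."""
    total = max_key - min_key + 1
    size = -(-total // num_chunks)  # ceiling division
    return [
        (lo, min(lo + size - 1, max_key))
        for lo in range(min_key, max_key + 1, size)
    ]


def load_chunk(table, lo, hi):
    """Hypothetical stand-in for a ranged read such as
    SELECT * FROM t WHERE id BETWEEN lo AND hi."""
    return [row for row in table if lo <= row["id"] <= hi]


def parallel_snapshot(table, min_key, max_key, num_workers=4):
    """Load every chunk on a worker thread and concatenate the results."""
    chunks = split_key_range(min_key, max_key, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves chunk order, so rows come back in key order.
        parts = list(pool.map(lambda c: load_chunk(table, c[0], c[1]), chunks))
    return [row for part in parts for row in part]
```

Calling `parallel_snapshot(rows, 1, 10)` on ten rows keyed 1 through 10 reads four ranges concurrently and returns the same rows a single sequential scan would. A real loader would also have to reconcile the snapshot with changes arriving on the replication log, which this sketch leaves out.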
Mid-term goals (within the next 6 months)
SSL/TLS Secured Connection
Implement SSL/TLS encryption for client/server communications to enhance security.
Alter Materialized View
Add the ability to modify existing materialized views.
Session Window
Introduce session window functionality for advanced streaming analytics.
MemTable Spill
A refresh of a small table can suddenly amplify write throughput by a factor of 1,000. This typically happens with 10+ way joins. One way to mitigate it is to use the local disk as a buffer for the flood of writes, thus avoiding out-of-memory (OOM) errors.
Dedicated Computes for Materialized View Creation
Some users have reported that materialized view creation in RisingWave is too slow, because it requires a resource-intensive ad-hoc computation. Streaming jobs (incremental computations), by contrast, are long-running and need fewer resources at any given moment. It is therefore possible to allocate dedicated resources for MV creation when needed and release them once creation finishes.
More External Sinks
Redshift and Snowflake sinks are planned.
Recursive CTE
Enable recursive common table expressions (CTEs) to traverse hierarchical data such as a company's organizational tree.
Shared Meta Plane
Enable RisingWave clusters to share the meta plane, including etcd (or Postgres in the future), to better utilize compute resources across clusters.
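To make the Recursive CTE item above concrete, here is a minimal sketch of the organizational-tree traversal it mentions. Because the feature is still on the roadmap, the query runs against SQLite through Python's standard `sqlite3` module rather than against RisingWave; the syntax RisingWave eventually supports may differ. The `employees` table, its columns, and the `reporting_chain` helper are hypothetical.

```python
import sqlite3


def reporting_chain(rows, root_name):
    """Return (name, depth) for root_name and everyone who reports up to them.

    `rows` is a list of (name, manager) pairs; manager is None for the top.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, manager TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE org(name, depth) AS (
            -- Anchor member: start at the chosen root.
            SELECT name, 0 FROM employees WHERE name = ?
            UNION ALL
            -- Recursive member: add the direct reports of rows found so far.
            SELECT e.name, org.depth + 1
            FROM employees AS e
            JOIN org ON e.manager = org.name
        )
        SELECT name, depth FROM org ORDER BY depth, name
    """
    result = conn.execute(query, (root_name,)).fetchall()
    conn.close()
    return result
```

The anchor member selects the starting employee, and the recursive member repeatedly joins the table against the rows found so far, so the query walks the tree one level at a time without knowing its depth in advance.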
Long-term goals
Optimize analytical query performance on third-party systems like Presto and Trino
GraphQL API
Allow results to be retrieved from RisingWave directly through the browser.
Serverless Compaction
Automatically scale Compactor instances in and out to match workload demands in a serverless model.
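As a rough illustration of scaling in and out to match demand, the sketch below derives a target instance count from the size of the pending compaction backlog. The backlog metric, the per-instance capacity, and the bounds are all hypothetical; the roadmap does not say which signals RisingWave will use.

```python
import math


def desired_compactors(pending_bytes, bytes_per_compactor,
                       min_instances=0, max_instances=16):
    """Pick an instance count that covers the backlog, within fixed bounds.

    All parameters are illustrative assumptions, not RisingWave settings.
    """
    if bytes_per_compactor <= 0:
        raise ValueError("bytes_per_compactor must be positive")
    needed = math.ceil(pending_bytes / bytes_per_compactor)
    # Clamp so the pool can scale to zero when idle but never grows unbounded.
    return max(min_instances, min(needed, max_instances))
```

With `min_instances=0`, an empty backlog yields zero instances, which is the property that makes the model serverless: no compaction capacity is held while there is nothing to compact.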
CONCLUSION
RisingWave is an open-source streaming database that aims to democratize stream processing: to make it easy to use and cost-efficient. Its development direction is strongly influenced by user requests. We would love to hear from the community and update our agenda accordingly. If you have any questions or comments regarding RisingWave’s roadmap, please don’t hesitate to let us know by commenting here. Your voice will help shape the future of real-time stream processing!
About RisingWave Labs
RisingWave is an open-source distributed SQL database for stream processing. It is designed to reduce the complexity and cost of building real-time applications. RisingWave offers users a PostgreSQL-like experience specifically tailored for distributed stream processing.
Official Website: https://www.risingwave.com/
Documentation: https://docs.risingwave.com/docs/current/intro/
Tutorial: https://tutorials.risingwave.com/
Slack: https://risingwave-community.slack.com
GitHub: https://github.com/risingwavelabs/risingwave
LinkedIn: linkedin.com/company/risingwave-labs