- Configurable reorg buffer
- Create the table before spinning up parallel workers so it is ready for all of them, avoiding the complexity of thread locking
- SQL variables for string replacement
- Better docs, including limitations
docs/parallel_streaming_usage.md: 27 additions & 4 deletions
@@ -233,7 +233,23 @@ except KeyboardInterrupt:
     print("\nStopped by user")
 ```
 
-**Note on Reorg Buffer**: When transitioning from parallel catchup to continuous streaming, the system automatically starts continuous streaming from `detected_max_block - 200`. This 200-block overlap ensures that any reorgs that occurred during the parallel catchup phase are detected and handled properly. With reorg detection enabled, duplicate blocks are automatically handled correctly.
+**Note on Reorg Buffer**: When transitioning from parallel catchup to continuous streaming, the system automatically starts continuous streaming from `detected_max_block - reorg_buffer` (default: 200 blocks). This overlap ensures that any reorgs that occurred during the parallel catchup phase are detected and handled properly. With reorg detection enabled, duplicate blocks are automatically handled correctly. The `reorg_buffer` can be customized via `ParallelConfig(reorg_buffer=N)`.
+
+## Limitations
+
+Currently, parallel streaming has the following limitations:
+
+1. **Block-based partitioning only**: Only supports partitioning by block number columns (`block_num` or `_block_num`). Tables without block numbers cannot use parallel execution.
+
+2. **Schema detection requires data**: Pre-flight schema detection requires at least 1 row in the source table. Empty tables will skip pre-flight creation and let workers handle it.
+
+3. **Static partitioning**: Partitions are created upfront based on the block range. The system does not support dynamic repartitioning during execution.
+
+4. **Thread-level parallelism**: Uses Python threads (ThreadPoolExecutor), not processes. For CPU-bound transformations, performance may be limited by the GIL.
+
+5. **Single table queries**: The partitioning strategy works best with queries against a single table. Complex joins or unions may require careful query structuring.
+
+6. **Reorg buffer configuration**: The `reorg_buffer` parameter (default: 200 blocks) is configurable but applies uniformly. Per-chain customization requires separate `ParallelConfig` instances.
 
 ## Performance Characteristics
 
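The reorg-buffer handoff and the static, block-based partitioning described in this hunk can be illustrated with a minimal sketch. This is not the library's implementation: `BlockRange`, `make_partitions`, `streaming_start_block`, and `load_partition` are hypothetical names introduced here, and clamping the start block at 0 is an assumption. Only the `detected_max_block - reorg_buffer` rule, the default of 200 blocks, and the use of `ThreadPoolExecutor` come from the documentation change itself.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRange:
    """Hypothetical container for one partition; both bounds are inclusive."""
    start: int
    end: int


def make_partitions(min_block: int, max_block: int, num_workers: int) -> list[BlockRange]:
    """Split [min_block, max_block] into contiguous ranges, fixed upfront."""
    total = max_block - min_block + 1
    if total <= 0 or num_workers <= 0:
        return []
    size = -(-total // num_workers)  # ceiling division
    return [
        BlockRange(start, min(start + size - 1, max_block))
        for start in range(min_block, max_block + 1, size)
    ]


def streaming_start_block(detected_max_block: int, reorg_buffer: int = 200) -> int:
    """Resume continuous streaming reorg_buffer blocks behind the detected tip.

    Clamping at 0 is an assumption made for this sketch.
    """
    return max(0, detected_max_block - reorg_buffer)


def load_partition(part: BlockRange) -> int:
    """Placeholder worker: a real worker would query and load this block range."""
    return part.end - part.start + 1


# Partitions are computed once, before any worker starts (static partitioning).
partitions = make_partitions(0, 999, 4)

# Threads, not processes: suitable for I/O-bound loads, limited by the GIL otherwise.
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(load_partition, partitions))

print(partitions)
print(sum(counts))
print(streaming_start_block(1000))
```

Because the ranges are fixed before the pool starts, a slow partition cannot hand work to an idle thread, which is the trade-off limitation 3 describes. The overlap returned by `streaming_start_block` means some blocks are seen twice, which the note says reorg detection handles.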
@@ -301,16 +317,23 @@ Result: Zero data gaps, all reorgs caught ✓