Skip to content

Conversation

@MarcusHsieh
Copy link
Collaborator

No description provided.

Marcus Jake Hsieh and others added 30 commits July 8, 2025 19:23
+ Performance limit tests
+ Pytest performance markers
bgunnar5 and others added 6 commits August 7, 2025 13:24
TWO LAYERS:
1. Task Distribution Layer (TaskServerInterface)
2. Worker Management Layer (MerlinWorkerHandler)
Integration Components
- Kafka worker extended by worker interface
- Handler for managing Kafka workers
- WorkerHandlerFactory to register and support Kafka backend
- --backend flag
- Replace direct Celery function calls with script references
- Context-free execution enabling any message broker backend
- Message size optimization: 75% reduction (427-442 bytes vs 1,715 bytes)
- All messages under 1MB target for efficient network transmission

- Add TaskScriptGenerator class for converting steps to executable scripts
- Add MessageOptimizer for compact task message generation
- Rename kafka_worker.py to kafka_worker_manager.py
- Rename implementations/kafka_worker.py to kafka_task_consumer.py
…factory system

Core Components:
- Universal Task Factory (merlin/factories/) - Backend-agnostic task creation
- Signature Adapters (merlin/adapters/) - Celery & Kafka backend support
- Sample Expansion Optimization (merlin/optimization/) - Range-based batching
- Compressed JSON Serialization (merlin/serialization/) - 63-85% compression

Features:
- 1,123,274 tasks/sec creation rate
- 63-85% message size reduction via gzip+field optimization
- Complete coordination patterns (group: group_id + partitions, chain: task dependencies, chord: group_id + callback tasks)
- Pytest suite (+5 test files)

Performance Results:
- Simple tasks: 433B -> 157B (63.7% reduction)
- Large tasks: 3466B -> 536B (84.5% reduction)
- Sample processing: 158M+ samples/sec (data organization/division)
@MarcusHsieh MarcusHsieh requested a review from bgunnar5 August 18, 2025 21:19
@MarcusHsieh MarcusHsieh self-assigned this Aug 18, 2025
@MarcusHsieh MarcusHsieh added the enhancement New feature or request label Aug 18, 2025
- merlin/common/tasks.py: +Universal Task support
- merlin/coordination/: +Coordination pattern modules
- merlin/task_servers/: +Celery server implementation
- merlin/workers/: +Universal Task System worker integration
- Fix test_kafka_worker.py broken import of non-existent KafkaWorker class
- Update imports to use actual KafkaTaskConsumer implementation
- Fix kafka_server.py import paths to reference correct modules
- Add compatibility methods to KafkaTaskConsumer for test interface
- Fix test_integration.py broken StudyExecutor reference

NOTE:
- KafkaWorkerManager: Worker lifecycle management (merlin/workers/)
- KafkaTaskConsumer: Actual task processing implementation (merlin/task_servers/)
- KafkaTaskWorkerRuntime: Legacy embedded runtime for compatibility
- Fix mock path from module-level to correct kafka.KafkaConsumer
- Update NameError references from KafkaWorker to KafkaTaskConsumer
- Fix task_registry import path to merlin.execution.task_registry
- Fix _initialize_consumer to properly call consumer.subscribe()
- Separate consumer creation from subscription for proper test mocking
- Add merlin_control topic to all subscription calls
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants