Conversation

@elijah-rou

Introduces a new DaemonSet deployment mode for JuiceFS mount pods when using StorageClass with mount sharing enabled. This feature provides better resource management by deploying mount pods as DaemonSets instead of individual shared pods, with configurable node affinity control through ConfigMaps.

Key features:

- **Three mount modes**: per-pvc (default), shared-pod, and daemonset
- **ConfigMap-based configuration**: control mount mode and node affinity via the `juicefs-mount-config` ConfigMap
- **Node affinity support**: restrict DaemonSet deployment to specific nodes using standard Kubernetes node affinity
- **Automatic fallback**: falls back to shared-pod mode if the DaemonSet cannot schedule on a node
- **Seamless transition**: works with existing StorageClasses without modification

New components:

- `pkg/juicefs/mount/mount_selector.go`: dynamic mount type selection based on configuration
- `pkg/juicefs/mount/daemonset_mount.go`: DaemonSet mount implementation with scheduling error handling
- `pkg/config/mount_config.go`: ConfigMap parsing for mount mode and node affinity (see the parsing sketch below)
- `pkg/config/mount_config_helper.go`: helper functions for DaemonSet configuration
- `pkg/juicefs/mount/builder/daemonset.go`: DaemonSet resource builder
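
As a rough illustration of the per-StorageClass parsing that `pkg/config/mount_config.go` performs, the sketch below decodes one ConfigMap entry. The struct name, function name, and exact field layout are assumptions for this sketch, not the PR's actual API:

```go
package config

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// MountConfig is a hypothetical shape for one ConfigMap entry
// (one entry per StorageClass, keyed by the StorageClass name).
type MountConfig struct {
	// Mode is one of "per-pvc", "shared-pod" or "daemonset".
	Mode string `json:"mode"`
	// NodeAffinity restricts where the DaemonSet may run; required for daemonset mode.
	NodeAffinity *corev1.NodeAffinity `json:"nodeAffinity,omitempty"`
}

// ParseMountConfig decodes the YAML value stored under one ConfigMap key.
func ParseMountConfig(raw string) (*MountConfig, error) {
	cfg := &MountConfig{Mode: "per-pvc"} // default when the entry omits the mode
	if err := yaml.Unmarshal([]byte(raw), cfg); err != nil {
		return nil, fmt.Errorf("invalid mount config: %w", err)
	}
	if cfg.Mode == "daemonset" && cfg.NodeAffinity == nil {
		return nil, fmt.Errorf("daemonset mode requires nodeAffinity")
	}
	return cfg, nil
}
```

Presumably the selector looks up the entry keyed by the StorageClass name and falls back to the `default` entry shown in the configuration example below.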

Mount modes are configured via a ConfigMap in the kube-system namespace:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: juicefs-mount-config
  namespace: kube-system
data:
  default: |
    mode: shared-pod  # Options: per-pvc, shared-pod, daemonset
  my-storageclass: |
    mode: daemonset
    nodeAffinity:  # Required for daemonset mode
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/workload
            operator: In
            values: ["compute"]
```
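
The per-StorageClass keys in the ConfigMap are matched against the StorageClass name (as also seen in the review discussion below, where the `juicefs-local-fs-sc` key matches a StorageClass of the same name). A minimal StorageClass that would pick up the `my-storageclass` entry above might look like the following; the secret name and namespace are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storageclass   # must match the ConfigMap key above
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret       # placeholder
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system     # placeholder
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret      # placeholder
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system    # placeholder
```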
Modified components:

- **pod_driver.go**: skip DaemonSet pods in deletion/recreation handlers
- **juicefs.go**: add MountSelector for dynamic mount type selection and AuthFs serialization
- **k8sclient**: add DaemonSet CRUD operations (a sketch follows this list)
- **RBAC**: add DaemonSet permissions to the CSI node service account
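
The k8sclient additions themselves are not shown in this thread; as a sketch of what DaemonSet CRUD helpers built on client-go typically look like (the wrapper type and method names here are assumptions):

```go
package k8sclient

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// K8sClient wraps a Kubernetes clientset; the field layout is illustrative.
type K8sClient struct {
	kubernetes.Interface
}

// CreateDaemonSet creates the DaemonSet, tolerating the case where it already exists.
func (c *K8sClient) CreateDaemonSet(ctx context.Context, ds *appsv1.DaemonSet) (*appsv1.DaemonSet, error) {
	created, err := c.AppsV1().DaemonSets(ds.Namespace).Create(ctx, ds, metav1.CreateOptions{})
	if k8serrors.IsAlreadyExists(err) {
		return c.AppsV1().DaemonSets(ds.Namespace).Get(ctx, ds.Name, metav1.GetOptions{})
	}
	return created, err
}

// GetDaemonSet fetches a DaemonSet by namespace and name.
func (c *K8sClient) GetDaemonSet(ctx context.Context, namespace, name string) (*appsv1.DaemonSet, error) {
	return c.AppsV1().DaemonSets(namespace).Get(ctx, name, metav1.GetOptions{})
}

// DeleteDaemonSet removes the DaemonSet, ignoring not-found errors.
func (c *K8sClient) DeleteDaemonSet(ctx context.Context, namespace, name string) error {
	err := c.AppsV1().DaemonSets(namespace).Delete(ctx, name, metav1.DeleteOptions{})
	if k8serrors.IsNotFound(err) {
		return nil
	}
	return err
}
```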

Documentation and examples:

- `docs/en/guide/daemonset-mount.md`: comprehensive usage documentation
- `docs/en/guide/mount-pod-configuration.md`: mount pod configuration guide
- `deploy/kubernetes/csi-daemonset-mount/`: example DaemonSet configurations
- `deploy/kubernetes/mount-config/`: example ConfigMap configurations

Tests:

- Unit tests for mount selector logic
- Unit tests for DaemonSet mount implementation
- ConfigMap parsing and validation tests

Benefits:

1. **Better resource utilization**: one mount pod per node instead of per PVC
2. **Improved control**: node affinity ensures mount pods only run where needed
3. **Simplified operations**: easier to manage and monitor as DaemonSets
4. **Automatic lifecycle**: the DaemonSet controller handles pod creation/deletion
5. **Backward compatible**: works with existing StorageClasses and mount sharing

Migration steps (a sample env setting follows this list):

1. Enable `STORAGE_CLASS_SHARE_MOUNT` in the CSI Controller and Node
2. Grant DaemonSet RBAC permissions (included in the updated manifests)
3. Create a ConfigMap with the desired mount mode configuration
4. New PVCs will automatically use the configured mount mode
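
For step 1, a minimal sketch of the env setting, assuming the CSI containers are named `juicefs-plugin` as in the upstream manifests:

```yaml
# Add to both the CSI Controller StatefulSet and the CSI Node DaemonSet.
containers:
  - name: juicefs-plugin            # container name assumed
    env:
      - name: STORAGE_CLASS_SHARE_MOUNT
        value: "true"
```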

Deployment changes:

- Updated `deploy/k8s.yaml` and `deploy/k8s_before_v1_18.yaml` with DaemonSet RBAC (an example rule is sketched below)
- Added DaemonSet resource configurations
- Updated Docker build process for consistency
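
The exact RBAC additions live in the updated manifests; granting the CSI node service account access to DaemonSets would look roughly like the rule below (the ClusterRole name is a placeholder):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: juicefs-csi-daemonset-access   # placeholder name
rules:
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```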

This feature enhances the JuiceFS CSI Driver's flexibility in managing mount pods, particularly beneficial for large-scale deployments where mount pod proliferation can become a resource management challenge.

@zwwhdls zwwhdls requested review from zwwhdls and zxh326 and removed request for CaitinChen September 8, 2025 02:34
@zwwhdls
Member

zwwhdls commented Sep 8, 2025

@elijah-rou Hi, please fix the conflicts, thanks!

@elijah-rou elijah-rou force-pushed the feat/daemonset-mounts-for-shared-mountpod branch from c5579c6 to f3fe857 on September 8, 2025 03:29
@elijah-rou elijah-rou force-pushed the feat/daemonset-mounts-for-shared-mountpod branch from f3fe857 to 025e12d on September 8, 2025 03:39
@elijah-rou
Author

@zwwhdls I based this off the latest release version; has MountMode been refactored to MountShareMode in master?

@zwwhdls
Member

zwwhdls commented Sep 12, 2025

@elijah-rou Thanks a lot. It seems like this is the whole process, please correct me if I say something wrong.

When mounting:

  1. Set nodeAffinity in the ConfigMap for the StorageClass;
  2. When an app pod mounts the PVC, CSI creates a DaemonSet; if it already exists, it adds a ref in an annotation;
  3. The mount pods will then be created on the nodes described in the ConfigMap;
  4. Only if the app pod is on a node within the scope described in the ConfigMap can it mount successfully and work well. Otherwise, the app pod fails.

When unmounting:

  1. Remove its ref from the annotation;
  2. When there are no refs left, delete the DaemonSet.

Here are some issues that may not have been considered:

  1. The DaemonSet will be updated when the config is changed. But its updateStrategy is RollingUpdate, which means all existing pods will be recreated; in this situation, automatic recovery of the JuiceFS client does not work and all mount points in app pods will be broken.
  2. We have a periodic background task in pod_driver.go which detects all mount pods by polling kubelet (or watching the apiserver) and recovers them when something is wrong or they have been upgraded. In daemonset mode, this does not work either.
  3. Cache cleaning does not work.

And the most important thing I want to know is: what is the biggest benefit of daemonset mode compared with mount pods? In other words, what problem does it solve?

@zwwhdls
Member

zwwhdls commented Sep 12, 2025

Besides, CSI already has a global ConfigMap where users can put mount pod configurations. There is no need to introduce a new one; please use the global one. Thanks!

@zxh326
Member

zxh326 commented Sep 12, 2025

I cannot test this mode in my clusters because it always falls back to per-pvc mode

```
I0912 03:28:53.521652       7 mount_config.go:170] "Loaded mount configuration" logger="mount-config" storageClass="juicefs-local-fs-sc" deploymentMode="daemonset" hasNodeAffinity=true
I0912 03:28:53.521686       7 mount_selector.go:89] "Using per-PVC pod mount" logger="NodePublishVolume" appName="busybox-default-6d884fb446-wpgg9" volumeId="pvc-47cbeefe-6c18-4dee-ba80-f9ec4076e7ee"
```

config:

```yaml
apiVersion: v1
data:
  juicefs-local-fs-sc: |
    deploymentMode: daemonset
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: juicefs.io/cg-worker
            operator: In
            values:
            - "true"
kind: ConfigMap
```

This PR will introduce breaking changes. I updated to this version and it broke my existing service; this is unacceptable.

@elijah-rou
Author

elijah-rou commented Sep 12, 2025

> And the most important thing I want to know is: what is the biggest benefit of daemonset mode compared with mount pods? In other words, what problem does it solve?

This mount mode has 2 main benefits:

  1. When a workload that requires a JuiceFS mount comes onto a node, the CSI driver does not need to start a mount pod on demand. It will already be there, since it is a DaemonSet. This significantly reduces startup time (the mount pod is both already running and has already mounted the correct FS).
  2. When a node is shutting down, the DaemonSet pods are taken down after the usual workload pods. This guarantees at the k8s level that the user pods will finish their workloads and interactions with JuiceFS before shutting down. In cases of high node churn (like ours), this guarantees that fs operations will complete, and avoids the current fault in the CSI driver that causes the mount pod to instantly shut down when a node goes into termination.

> Besides, CSI already has a global ConfigMap where users can put mount pod configurations. There is no need to introduce a new one; please use the global one. Thanks!

I can put this in the global ConfigMap.

> I cannot test this mode in my clusters because it always falls back to per-pvc mode

This is likely due to the rebase onto master. Based off the 0.29.2 branch (the latest release version), it works fine and is not a breaking change. The issue with the rebase is likely coming from the MountShareMode changes in the master branch. Since this is new, I cannot find any documentation on it, and I am not sure what the intention of the change in master is, so I am unsure how to fully integrate the change proposed in this PR.

@zxh326
Member

zxh326 commented Sep 15, 2025


Version 0.30.0 introduces a new mode similar to shareStorage that reuses mount points for the same file system.

The docs are here: https://juicefs.com/docs/csi/guide/resource-optimization/#share-mount-pod-for-the-same-file-system.

The breaking change might be here: you shouldn't return the unique ID of the storage class for all modes.

@zxh326 zxh326 marked this pull request as draft September 22, 2025 06:58