Introduction

It's recommended to familiarize yourself with inspecting workloads on Kubernetes. The official kubectl cheat sheet is very useful to have readily available.
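
For quick reference, a few generic inspection commands are shown below. The Pod name is a placeholder taken from the example output further down and the Namespace assumes a default "hpe-storage" install.

# List Pods and their status in the driver Namespace
kubectl get pods -n hpe-storage
# Inspect events and state transitions of a particular Pod (name is a placeholder)
kubectl describe pod/hpe-csi-node-dk5t4 -n hpe-storage
# List recent events in the Namespace, sorted by time
kubectl get events -n hpe-storage --sort-by=.metadata.creationTimestamp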

Sanity Checks

Once the CSI driver has been deployed, whether through object configuration files, Helm, or an Operator, the output below is representative of what a healthy system should look like after install. If any of the workloads lists a status other than Running, proceed to inspect the logs of the problematic workload.

kubectl get pods --all-namespaces -l 'app in (nimble-csp, hpe-csi-node, hpe-csi-controller)'
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
hpe-storage   hpe-csi-controller-7d9cd6b855-zzmd9   9/9     Running   0          15s
hpe-storage   hpe-csi-node-dk5t4                    2/2     Running   0          15s
hpe-storage   hpe-csi-node-pwq2d                    2/2     Running   0          15s
hpe-storage   nimble-csp-546c9c4dd4-5lsdt           1/1     Running   0          15s
kubectl get pods --all-namespaces -l 'app in (primera3par-csp, hpe-csi-node, hpe-csi-controller)'
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
hpe-storage   hpe-csi-controller-7d9cd6b855-fqppd   9/9     Running   0          14s
hpe-storage   hpe-csi-node-86kh6                    2/2     Running   0          14s
hpe-storage   hpe-csi-node-k8p4p                    2/2     Running   0          14s
hpe-storage   hpe-csi-node-r2mg8                    2/2     Running   0          14s
hpe-storage   hpe-csi-node-vwb5r                    2/2     Running   0          14s
hpe-storage   primera3par-csp-546c9c4dd4-bcwc6      1/1     Running   0          14s

A Custom Resource Definition (CRD) named hpenodeinfos.storage.hpe.com holds important network and host initiator information.

Retrieve the list of nodes:

kubectl get hpenodeinfos
NAME               AGE
tme-lnx-worker1   57m
tme-lnx-worker3   57m
tme-lnx-worker2   57m
tme-lnx-worker4   57m

Inspect an individual node:

kubectl get hpenodeinfos/tme-lnx-worker1 -o yaml
apiVersion: storage.hpe.com/v1
kind: HPENodeInfo
metadata:
  creationTimestamp: "2020-08-24T23:50:09Z"
  generation: 1
  managedFields:
  - apiVersion: storage.hpe.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:chap_password: {}
        f:chap_user: {}
        f:iqns: {}
        f:networks: {}
        f:uuid: {}
    manager: csi-driver
    operation: Update
    time: "2020-08-24T23:50:09Z"
  name: tme-lnx-worker1
  resourceVersion: "30337986"
  selfLink: /apis/storage.hpe.com/v1/hpenodeinfos/tme-lnx-worker1
  uid: 3984752b-29ac-48de-8ca0-8381532cbf06
spec:
  chap_password: RGlkIHlvdSByZWFsbHkgZGVjb2RlIHRoaXM/
  chap_user: chap-user
  iqns:
  - iqn.1994-05.com.redhat:828e7a4eef40
  networks:
  - 10.2.2.2/16
  - 172.16.6.115/24
  - 172.16.8.115/24
  - 172.17.0.1/16
  - 10.1.1.0/12
  uuid: 0242f811-3995-746d-652d-6c6e78352d77
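
To retrieve a single field rather than the full object, jsonpath may be used. This is a minimal sketch using the node and field names from the example above:

kubectl get hpenodeinfos/tme-lnx-worker1 -o jsonpath='{.spec.iqns}{"\n"}'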

NFS Server Provisioner Resources

The NFS Server Provisioner consists of a number of Kubernetes resources per PVC. The default Namespace where the resources are deployed is "hpe-nfs", but this is configurable in the StorageClass. See the base StorageClass parameters for more details.

Object       Name             Purpose
ConfigMap    hpe-nfs-config   This ConfigMap holds the configuration file for the NFS server. Local tweaks may be wanted. Please see the config file reference for more details.
Deployment   hpe-nfs-UID      The Deployment that is running the NFS Pod.
Service      hpe-nfs-UID      The Service the NFS clients perform mounts against.
PVC          hpe-nfs-UID      The RWO claim serving the NFS workload.
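
To get a quick overview of all NFS Server Provisioner resources in one go, the objects above can be listed together. This sketch assumes the default "hpe-nfs" Namespace:

kubectl get configmap,deploy,svc,pvc -n hpe-nfs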

Tip

The UID stems from the user-requested RWX PVC for easy tracking. Use kubectl get pvc/my-pvc -o jsonpath='{.metadata.uid}{"\n"}' to retrieve it.

Tracing NFS Resources

When troubleshooting NFS deployments, it's common that only the source RWX PVC and Namespace are known. The next few steps explain how the resources can be easily traced.

Retrieve the "hpe-nfs-UID" from the NFS Pod by specifying PVC and Namespace of the RWX PVC:

kubectl get pods -l provisioned-by=my-pvc,provisioned-from=my-namespace -A -o jsonpath='{.items[].metadata.labels.app}{"\n"}'

Next, enumerate the resources from the "hpe-nfs-UID":

kubectl get pvc,svc,deploy -A -o name --field-selector metadata.name=hpe-nfs-UID

Example output:

persistentvolumeclaim/hpe-nfs-UID
service/hpe-nfs-UID
deployment.apps/hpe-nfs-UID

If only the PV name is known, for instance when looking at it from the backend storage perspective, note that the PV name (and its .spec.claimRef.uid) contains the UID, for example "pvc-UID".
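
As a sketch, the claim a PV is bound to (and its UID) may be looked up directly from the PV; "pvc-UID" is a placeholder for the actual PV name:

kubectl get pv/pvc-UID -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name} {.spec.claimRef.uid}{"\n"}'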

Clarification

The hpe-nfs-UID name is abbreviated throughout this section; in reality it contains the full UID, for example "hpe-nfs-98ce7c80-13f9-45d0-9609-089227bf97f1".

Volume and Snapshot Groups

If VolumeSnapshots are not being created when performing SnapshotGroup snapshots, check the logs of the "csi-volume-group-provisioner" and "csi-volume-group-snapshotter" containers in the "hpe-csi-controller" Deployment.

kubectl logs -n hpe-storage deploy/hpe-csi-controller csi-volume-group-provisioner
kubectl logs -n hpe-storage deploy/hpe-csi-controller csi-volume-group-snapshotter

Logging

The HPE CSI Driver logs to the standard output stream. If the logs need to be retained long term, use a standard logging solution for Kubernetes such as Fluentd. Some of the logs on the host are persisted and follow standard logrotate policies.

CSI Driver Logs

Node driver:

kubectl logs -f daemonset.apps/hpe-csi-node hpe-csi-driver -n hpe-storage

Controller driver:

kubectl logs -f deployment.apps/hpe-csi-controller hpe-csi-driver -n hpe-storage

Tip

The logs for both the node and controller drivers are persisted at /var/log/hpe-csi.log.
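
When logged on to a node, the persisted log file can be inspected directly, for example:

# Follow the driver log in real time
tail -f /var/log/hpe-csi.log
# Search the persisted log for errors
grep -i error /var/log/hpe-csi.log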

Log Level

Log levels for both the CSI controller and node driver can be controlled with the LOG_LEVEL environment variable. Possible values are info, warn, error, debug, and trace. Apply the changes with kubectl apply -f <yaml> after adding the variable to the CSI controller and node container specs as shown below. For Helm charts, this is controlled through the logLevel variable in values.yaml.

          env:
            - name: LOG_LEVEL
              value: trace
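
For Helm deployments, a sketch of changing the log level is shown below. The release name "my-hpe-csi-driver" and the "hpe-storage" chart repository are assumptions based on a default install and may differ in your environment; logLevel is the values.yaml variable mentioned above.

helm upgrade my-hpe-csi-driver hpe-storage/hpe-csi-driver -n hpe-storage --reuse-values --set logLevel=trace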

CSP Logs

CSP logs can be accessed from their respective Deployments:

kubectl logs -f deploy/nimble-csp -n hpe-storage
kubectl logs -f deploy/primera3par-csp -n hpe-storage

Log Collector

The log collector script hpe-logcollector.sh can be used to collect logs from any node that has kubectl access to the cluster.

curl -O https://raw.githubusercontent.com/hpe-storage/csi-driver/master/hpe-logcollector.sh
chmod 555 hpe-logcollector.sh

Usage:

./hpe-logcollector.sh -h
Collect HPE storage diagnostic logs using kubectl.

Usage:
     hpe-logcollector.sh [-h|--help] [--node-name NODE_NAME] \
                         [-n|--namespace NAMESPACE] [-a|--all]
Options:
-h|--help                  Print this usage text
--node-name NODE_NAME      Collect logs only for Kubernetes node
                           NODE_NAME
-n|--namespace NAMESPACE   Collect logs from HPE CSI deployment in namespace
                           NAMESPACE (default: kube-system)
-a|--all                   Collect logs from all nodes (the default)
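
An example invocation collecting logs for a single node; the node name is a placeholder and the Namespace assumes the CSI driver is deployed in "hpe-storage":

./hpe-logcollector.sh --node-name my-worker-node -n hpe-storage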

Tuning

HPE provides a set of well-tested defaults for the CSI driver and all the supported CSPs. In certain cases it may be necessary to fine-tune the CSI driver to accommodate a certain workload or behavior.

Data Path Configuration

The HPE CSI Driver for Kubernetes automatically configures Linux iSCSI/multipath settings based on config.json. To tune these values, edit the ConfigMap with kubectl edit configmap hpe-linux-config -n hpe-storage and restart the node plugin with kubectl delete pod -l app=hpe-csi-node to apply the changes.
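
The same steps, expressed as commands for convenience; the Namespace and label assume a default "hpe-storage" install as shown earlier in this section.

# Edit the iSCSI/multipath settings held in the ConfigMap
kubectl edit configmap hpe-linux-config -n hpe-storage
# Restart the node plugin Pods to apply the changes
kubectl delete pod -l app=hpe-csi-node -n hpe-storage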

Important

HPE provides a set of general-purpose default values for the I/O paths; tuning is only required if prescribed by HPE.