Introduction and Prerequisites

This document describes how to deploy and consume CNF-related operators and additional objects. The goal is to understand how to deploy those elements and how to use them properly.

The topics covered in this document are:

  • Performance Addon Operator
  • SR-IOV operator
  • PTP operator
  • SCTP module
  • DPDK s2i

NOTE: In each case, we deliberately use plain YAML manifests to deploy the components, to give the user a clear view of which assets are involved.

General Prerequisites

The following items are needed in order to be able to complete the lab from beginning to end:

  • An OpenShift cluster with at least one physical worker, ideally two.
  • In a disconnected environment, image content sources and catalog sources in place.
  • oc installed and configured with the KUBECONFIG environment variable.
  • The image registry properly configured with a valid storage backend, so that the DPDK s2i push works.
  • The git tool, and the repository https://github.com/karmab/performance-operators-lab cloned.
  • The tasty binary, available here, to simplify OLM operator installation.

NOTE: Both control plane nodes and workers can be virtual. If using virtual workers, you will only be able to properly use the Performance Addon Operator and the SCTP module, as the other components have hardware dependencies.

NOTE: If running disconnected, you will have to adapt the source field in the install samples to match your environment.

Clone the repo where we host the sample assets:

git clone https://github.com/karmab/performance-operators-lab
cd performance-operators-lab

Performance Addon Operator

In this section, we install and configure the Performance Addon Operator, which aims at simplifying the configuration of workers that need enhancements in order to run compute-sensitive workloads.

Deployment

Launch the following command:

tasty install performance-addon-operator

Expected Output

operatorgroup.operators.coreos.com/openshift-performance-addon-operatorgroup created
subscription.operators.coreos.com/performance-addon-operator-subscription created

You can then wait for the operator pod to show up in the openshift-performance-addon namespace with oc get pod -n openshift-performance-addon

Expected Output

NAME                                    READY   STATUS    RESTARTS   AGE
performance-operator-6694c8c9fb-q64qw   1/1     Running   1          123m

How to use

The Performance Addon Operator provides a custom resource (CR) named PerformanceProfile that we can use to configure the nodes. Creating such a profile triggers the creation of underlying objects such as machine configs or tuned objects, with the handling done by lower-level operators.

The full list of attributes handled by the operator can be seen here.

The workflow goes as follows:

  • Label the workers that need specific configuration. We will use the commonly used label worker-cnf:
oc label node your_worker node-role.kubernetes.io/worker-cnf='' --overwrite
  • Create a machine config pool that matches the previously set label. As with the previous task, it is the administrator's responsibility to create those assets to match their platform.

For instance, we can use the following definition and create it with oc apply -f:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
      - {
          key: machineconfiguration.openshift.io/role,
          operator: In,
          values: [worker-cnf, worker],
        }
  paused: false
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""

In this definition, note that:

  • We can only indicate one extra label beyond worker in the machineConfigSelector, which implies that a given node can only belong to a single machine config pool at a time.
  • The machine config pool has the paused field set to false. If paused were set to true, the Performance Addon Operator would create the assets, but nothing would effectively be applied.
  • We specify which labels to match in the nodeSelector section.
  • Create a performance profile CR that matches the label:
apiVersion: performance.openshift.io/v1alpha1
kind: PerformanceProfile
metadata:
  name: worker-cnf
spec:
  cpu:
    isolated: 0-8
    reserved: 9-15
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      count: 16
      node: 0
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

In this definition, note the following elements:

  • We are setting 16 hugepages of 1GB on NUMA node 0. For a testing/virtual environment, you will want to lower this number to 1.
  • We are enabling the realtime kernel, which effectively adds an extra label to one of the created machine configs, so that the machine config operator reboots the node into the already installed realtime kernel.
  • We target the machine config pool indirectly by matching the proper label in the nodeSelector section.

After creating this CR, you can monitor the machine config pools master and worker-cnf with oc get mcp to see the progress towards enabling the features.
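
For instance, a simple way to follow the rollout with plain oc (we watch all pools and focus on master and worker-cnf) is:

oc get mcp -w

The pools report UPDATING as True while their nodes reboot, and go back to UPDATED True once the profile is fully applied.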

NOTE: All the nodes will be rebooted the first time, as a feature gate for the topology manager gets enabled.

SR-IOV Operator

Deployment

NOTE: In order to connect our pod to a real DHCP network, we need to patch the OpenShift network operator to add a dummy DHCP network, so that the operator starts the DHCP daemonset.

oc patch networks.operator.openshift.io cluster --type='merge' -p='{"spec":{"additionalNetworks":[{"name":"dummy-dhcp-network","simpleMacvlanConfig":{"ipamConfig":{"type":"dhcp"},"master":"eth0","mode":"bridge","mtu":1500},"type":"SimpleMacvlan"}]}}'

Launch the following command:

tasty install sriov-network-operator

Expected Output

namespace/openshift-sriov-network-operator created
operatorgroup.operators.coreos.com/sriov-network-operators created
subscription.operators.coreos.com/sriov-network-operator created

You can then wait for the operator pods to show up in the openshift-sriov-network-operator namespace with oc get pod -n openshift-sriov-network-operator

Expected Output

NAME                                      READY   STATUS        RESTARTS   AGE
network-resources-injector-hntx4          1/1     Running       0          176m
network-resources-injector-rdgqt          1/1     Running       0          176m
network-resources-injector-zth79          1/1     Running       0          176m
operator-webhook-8npdk                    1/1     Running       0          176m
operator-webhook-hnnz2                    1/1     Running       0          176m
operator-webhook-vqhjg                    1/1     Running       0          176m
sriov-cni-zjff2                           1/1     Running       0          3m50s
sriov-device-plugin-4wf9x                 1/1     Running       0          109s
sriov-network-config-daemon-bwdw9         1/1     Running       0          88m
sriov-network-config-daemon-jhhwp         1/1     Running       1          88m
sriov-network-operator-5f8cb9fb58-ql648   1/1     Running       0          113m

Beyond the operator pod, sriov-network-config-daemon pods appear for each node.

How to use

After the operator is installed, we have the following CRs:

  • SriovNetworkNodeState
  • SriovNetwork
  • SriovNetworkNodePolicy

SriovNetworkNodeState CRs are read-only and provide information about SR-IOV capable devices in the cluster. We can list them with oc get sriovnetworknodestates.sriovnetwork.openshift.io -n openshift-sriov-network-operator -o yaml

Expected Output

apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2020-05-25T22:08:04Z"
    generation: 19
    name: cnf10-worker-0.xxx.kni.lab.bonka.mad.hendrix.com
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: 642fc098-d30c-4638-8851-edaf68b00357
    resourceVersion: "426718"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/cnf10-worker-0.xxx.lab.mad.hendrix.com
    uid: b92037d2-c1bb-43c6-84a0-59973e7815bd
  spec:
    dpConfigVersion: "425914"
    interfaces:
    - name: eno1
      numVfs: 5
      pciAddress: "0000:19:00.0"
      vfGroups:
      - deviceType: netdevice
        resourceName: testresource
        vfRange: 2-4
  status:
    interfaces:
    - Vfs:
      - deviceID: "1016"
        driver: mlx5_core
        mtu: 1500
        pciAddress: "0000:19:00.2"
        vendor: 15b3
        vfID: 0
      - deviceID: "1016"
        driver: mlx5_core
        mtu: 1500
        pciAddress: "0000:19:00.3"
        vendor: 15b3
        vfID: 1
      - deviceID: "1016"
        driver: mlx5_core
        mtu: 1500
        pciAddress: "0000:19:00.4"
        vendor: 15b3
        vfID: 2
      - deviceID: "1016"
        driver: mlx5_core
        mtu: 1500
        pciAddress: "0000:19:00.5"
        vendor: 15b3
        vfID: 3
      - deviceID: "1016"
        driver: mlx5_core
        mtu: 1500
        pciAddress: "0000:19:00.6"
        vendor: 15b3
        vfID: 4
      deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno1
      numVfs: 5
      pciAddress: "0000:19:00.0"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno2
      pciAddress: "0000:19:00.1"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: ens1f0
      pciAddress: 0000:3b:00.0
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: ens1f1
      pciAddress: 0000:3b:00.1
      totalvfs: 5
      vendor: 15b3
    syncStatus: Succeeded
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovNetworkNodeState
  metadata:
    creationTimestamp: "2020-05-26T09:21:48Z"
    generation: 2
    name: cnf11-worker-0.xxx.lab.mad.hendrix.com
    namespace: openshift-sriov-network-operator
    ownerReferences:
    - apiVersion: sriovnetwork.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: SriovNetworkNodePolicy
      name: default
      uid: 642fc098-d30c-4638-8851-edaf68b00357
    resourceVersion: "425937"
    selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodestates/cnf11-worker-0.xxx.lab.mad.hendrix.com
    uid: fcda2f57-b0bf-444f-ae8d-c9329f574544
  spec:
    dpConfigVersion: "425914"
  status:
    interfaces:
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno1
      pciAddress: "0000:19:00.0"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: eno2
      pciAddress: "0000:19:00.1"
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: ens1f0
      pciAddress: 0000:3b:00.0
      totalvfs: 5
      vendor: 15b3
    - deviceID: "1015"
      driver: mlx5_core
      mtu: 1500
      name: ens1f1
      pciAddress: 0000:3b:00.1
      totalvfs: 5
      vendor: 15b3
    syncStatus: Succeeded
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

We can get a given NIC configured by the operator by creating a SriovNetworkNodePolicy CR, selecting the NIC with nicSelector and targeting specific nodes with nodeSelector. For instance, to configure eno1:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-network-node-policy
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    pfNames:
      - eno1
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  resourceName: sriovnic

Once the node policy is created, the operator will update the node (and its NIC) accordingly, which can be checked using the previous SriovNetworkNodeState. Note that this might imply a node reboot, as some elements are BIOS specific.
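
For instance, the sync status of a given node can be retrieved with a jsonpath query (replace your_worker with the node name, as listed in the SriovNetworkNodeState objects above):

oc get sriovnetworknodestates.sriovnetwork.openshift.io your_worker -n openshift-sriov-network-operator -o jsonpath='{.status.syncStatus}'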

NOTE: You might have to adapt the spec depending on your NIC model. Consult https://docs.openshift.com/container-platform/4.4/networking/hardware_networks/about-sriov.html#supported-devices_about-sriov for details.

Finally, we create a SriovNetwork CR which refers to the resourceName defined in the SriovNetworkNodePolicy. A network-attachment-definitions CR will then be generated by the operator with the same name, in the namespace given by networkNamespace. For instance:

---
apiVersion: v1
kind: Namespace
metadata:
  name: sriov-testing
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-network
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "dhcp"
    }
  networkNamespace: sriov-testing
  resourceName: sriovnic
  vlan: 0

A new network-attachment-definition gets created, which we can then use in our pod definition, as we would for other Multus backends.

oc get network-attachment-definitions -n sriov-testing

Expected Output

NAME           AGE
sriov-network   3d14h
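
To inspect the CNI configuration that got generated for this network, the object can be dumped as YAML:

oc get network-attachment-definitions sriov-network -n sriov-testing -o yaml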

Pods can be created making use of this network attachment definition:

apiVersion: v1
kind: Pod
metadata:
  name: sriovpod
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks:  sriov-network
spec:
  containers:
  - name: sriovpod
    command: ["/bin/sh", "-c", "trap : TERM INT; sleep 600000& wait"]
    image: alpine
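
Once the pod is running, we can check that the additional SR-IOV interface got attached and received a DHCP lease. Note that net1 is the name Multus typically gives to the first additional interface, and that ip is provided by busybox in the alpine image:

oc exec -n sriov-testing sriovpod -- ip a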

PTP Operator

Deployment

Launch the following command:

tasty install ptp-operator

Expected Output

namespace/openshift-ptp created
operatorgroup.operators.coreos.com/ptp-operators created
subscription.operators.coreos.com/ptp-operator-subscription created

We then wait for the operator pods to show up in the openshift-ptp namespace with oc get pod -n openshift-ptp

Expected Output

NAME                           READY   STATUS    RESTARTS   AGE
linuxptp-daemon-9tvk8          1/1     Running   0          18m
linuxptp-daemon-qv9w9          1/1     Running   0          18m
linuxptp-daemon-r6dr4          1/1     Running   0          18m
linuxptp-daemon-sbfgs          1/1     Running   0          18m
linuxptp-daemon-w4tbx          1/1     Running   0          18m
ptp-operator-8844cc676-7d6hc   1/1     Running   0          112m

Beyond the operator pod, we can see a linuxptp-daemon pod for each node, which encapsulates the ptp4l daemon.

How to use

The operator provides the PtpConfig CR that we can use to configure the nodes by matching them against a specific label. For instance, to configure a node as a PTP grandmaster, we can inject the following CR:

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: grandmaster
  namespace: openshift-ptp
spec:
  profile:
  - name: "grandmaster"
    interface: "eno1"
    ptp4lOpts: ""
    phc2sysOpts: "-a -r -r"
  recommend:
  - profile: "grandmaster"
    priority: 4
    match:
    - nodeLabel: "ptp/grandmaster"

Or, for a slave:

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: slave
  namespace: openshift-ptp
spec:
  profile:
  - name: "slave"
    interface: "eno1"
    ptp4lOpts: "-s"
    phc2sysOpts: "-a -r"
  recommend:
  - profile: "slave"
    priority: 4
    match:
    - nodeLabel: "ptp/slave"

The difference between those two CRs lies in:

  • the ptp4lOpts and phc2sysOpts attributes of the profile.
  • the matching done between a profile and nodeLabel.

We can then label the workers that need a specific configuration. For instance, for the nodes to be used as grandmaster:

oc label node your_worker ptp/grandmaster='' --overwrite

We can then monitor the linuxptp-daemon pods of each node to check how the profile gets applied (and sync occurs, if a grandmaster is found).
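
For instance, using one of the pod names from the earlier output (pick the daemon running on the node you labeled):

oc logs -n openshift-ptp linuxptp-daemon-9tvk8 --tail=20

The logs show which profile got applied and, on slaves, the offset reported against the master clock.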

SCTP module

Launch the following command:

oc create -f sctp/install.yml

Expected Output

machineconfig.machineconfiguration.openshift.io/load-sctp-module created

The SCTP module consists of a single machine config, which makes sure that the sctp module is not blacklisted and is loaded at boot time. We can inject the following manifest with oc apply -f:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
  name: load-sctp-module
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf

Once done, and provided there is a matching machine config pool in the cluster, the node will get rebooted and have the module loaded, which can be checked by sshing into the node (or running oc debug node/$node) and running sudo lsmod | grep sctp.
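
For instance, without sshing into the node, the check can be run through a debug pod (replace your_worker with the node name; the grep runs locally on the command output):

oc debug node/your_worker -- chroot /host lsmod | grep sctp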

DPDK base image

This part covers a base image containing the DPDK framework. It depends on SR-IOV being deployed and working on the cluster.

This image can be used as a base to build and package a DPDK-based application.

We will use source-to-image (s2i) as an example of a mechanism allowing to build from a git repository.

Deployment

We first create the following assets, which set up the DPDK network definition and the required SCC:

oc create -f dpdk/dpdk-network.yml
oc create -f dpdk/scc.yml

We then create a secret so that we can pull the DPDK base image from registry.redhat.io:

SECRET='registrysecret'
REGISTRY='registry.redhat.io'
USERNAME='XXX'
PASSWORD='YYY'
MAIL="ZZZ"
oc create secret docker-registry $SECRET --docker-server=$REGISTRY --docker-username=$USERNAME --docker-password=$PASSWORD --docker-email=$MAIL
oc secrets link default $SECRET --for=pull
oc secrets link builder $SECRET --for=pull

Then we launch the build of the image from source code and its push to the image registry:

oc create -f dpdk/build-config.yml

NOTE: the build config points to https://github.com/openshift-kni/cnf-features-deploy/tree/master/tools/s2i-dpdk/test/test-app as a sample app. In a real-world scenario, you would point it to the source code where your application lives.

NOTE: the build config makes use of the dpdk-base-rhel8 image, fetching it from registry.redhat.io. In a disconnected environment, you would edit the YAML so that it points to your disconnected registry.

Once we create those assets, we can check the building of the image in the dpdk namespace with oc get pod -n dpdk, which will eventually show a completed pod.

NAME               READY   STATUS    RESTARTS   AGE
s2i-dpdk-1-build   1/1     Running   0          80s
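
While the build runs, its logs can be followed through the build config (assuming it is named s2i-dpdk, consistently with the build pod name above):

oc logs -f bc/s2i-dpdk -n dpdk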

How to use

We can then create a node policy to configure a given NIC, the corresponding SriovNetwork, and a deployment config to actually launch the resulting application:

oc create -f dpdk/sriov-networknodepolicy-dpdk.yml
oc create -f dpdk/deployment-config.yml

The app will show up as a pod named s2i-dpdk-app-* in the dpdk namespace. One can then oc rsh into the pod and run testpmd commands.
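
For instance, replacing the pod name with the actual one reported by the first command:

oc get pod -n dpdk
oc rsh -n dpdk s2i-dpdk-app-1-xxxxx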