Node memory hog
Node memory hog causes memory resource exhaustion on the Kubernetes node.
- It is injected using a helper pod running the Linux stress-ng tool.
- The chaos affects the application for a specific duration.

Use cases
Node memory hog fault:
- Verifies application restarts on OOM kills.
- Verifies the resilience of applications whose replicas may be evicted on account of nodes becoming unschedulable (in the NotReady state) due to a lack of memory resources.
- Simulates memory leaks in microservice deployments.
- Simulates application slowness due to memory starvation.
- Simulates noisy neighbour problems due to memory hogging.
- Verifies pod priority and QoS settings for eviction purposes.
Permissions required
Below is a sample Kubernetes ClusterRole that defines the permissions required to execute the fault. Because the fault targets nodes, which are cluster-scoped resources, a namespaced Role is not sufficient.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-memory-hog
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines", "chaosexperiments", "chaosresults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "list", "create"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
Prerequisites
- Kubernetes > 1.16
- The target nodes should be in the Ready state before and after injecting chaos.
Mandatory tunables
| Tunable | Description | Notes | 
|---|---|---|
| TARGET_NODES | Comma-separated list of nodes subjected to the node memory hog. | For example, node-1,node-2. For more information, go to target nodes. |
| NODE_LABEL | Node label used to filter the target nodes. | It is mutually exclusive with the TARGET_NODES environment variable. If both are provided, TARGET_NODES takes precedence. For more information, go to target nodes with labels. |
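The following YAML snippet is a minimal sketch of setting the mandatory tunables on a ChaosEngine, following the same pattern as the examples later in this section. The node names are placeholders; set NODE_LABEL instead to select the target nodes by label.
# target specific nodes by name (placeholder values)
# alternatively, set NODE_LABEL to filter the target nodes by label
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # comma-separated list of target node names
        - name: TARGET_NODES
          value: 'node-1,node-2'
        - name: TOTAL_CHAOS_DURATION
          value: '60'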
Optional tunables
| Tunable | Description | Notes | 
|---|---|---|
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource (in seconds). | Default: 120 s. For more information, go to duration of the chaos. |
| LIB_IMAGE | Image used to run the stress command. | Default: harness/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. | 
| MEMORY_CONSUMPTION_PERCENTAGE | Percentage of the total node memory capacity to consume. | Default: 30. For more information, go to memory consumption percentage. |
| MEMORY_CONSUMPTION_MEBIBYTES | Amount of memory consumed (in mebibytes). It is mutually exclusive with MEMORY_CONSUMPTION_PERCENTAGE. | For example, 256. For more information, go to memory consumption mebibytes. |
| NUMBER_OF_WORKERS | Number of VM workers involved in the stress. | Default: 1. For more information, go to workers for stress. | 
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. | 
| NODES_AFFECTED_PERC | Percentage of the total nodes to target. It takes numeric values only. | Default: 0 (corresponds to 1 node). For more information, go to node affected percentage. | 
| SEQUENCE | Sequence of chaos execution for multiple target nodes. | Default: parallel. Supports serial sequence as well. For more information, go to sequence of chaos execution. |
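NODES_AFFECTED_PERC and SEQUENCE have no dedicated sections below, so the following minimal sketch shows how they can be combined, following the same ChaosEngine pattern as the other examples. The node label is a placeholder.
# stress 50 percent of the nodes matching the label, one node at a time
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # label used to filter the target nodes (placeholder value)
        - name: NODE_LABEL
          value: 'nodetype=target'
        # percentage of the matching nodes to target
        - name: NODES_AFFECTED_PERC
          value: '50'
        # run the chaos on the selected nodes one after another
        - name: SEQUENCE
          value: 'serial'
        - name: TOTAL_CHAOS_DURATION
          value: '60'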
Memory consumption percentage
Percentage of the total node memory capacity to consume. Tune it by using the MEMORY_CONSUMPTION_PERCENTAGE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# stress the memory of the targeted node with MEMORY_CONSUMPTION_PERCENTAGE of the node capacity
# it is mutually exclusive with MEMORY_CONSUMPTION_MEBIBYTES
# if both are provided, MEMORY_CONSUMPTION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # percentage of total node capacity to be stressed
        - name: MEMORY_CONSUMPTION_PERCENTAGE
          value: '10' # in percentage
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Memory consumption mebibytes
Amount of memory consumed (in mebibytes). Tune it by using the MEMORY_CONSUMPTION_MEBIBYTES environment variable. It is mutually exclusive with the MEMORY_CONSUMPTION_PERCENTAGE environment variable. If both are set, the fault uses MEMORY_CONSUMPTION_PERCENTAGE for the stress.
The following YAML snippet illustrates the use of this environment variable:
# stress the memory of the targeted node with the given MEMORY_CONSUMPTION_MEBIBYTES
# it is mutually exclusive with MEMORY_CONSUMPTION_PERCENTAGE
# if both are provided, MEMORY_CONSUMPTION_PERCENTAGE is used for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # node memory to be stressed
        - name: MEMORY_CONSUMPTION_MEBIBYTES
          value: '500' # in mebibytes
        - name: TOTAL_CHAOS_DURATION
          value: '60'
Workers for stress
Number of VM workers used for the stress. Tune it by using the NUMBER_OF_WORKERS environment variable.
The following YAML snippet illustrates the use of this environment variable:
# provide the workers count for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: node-memory-hog
    spec:
      components:
        env:
        # total number of workers involved in stress
        - name: NUMBER_OF_WORKERS
          value: '1'
        - name: TOTAL_CHAOS_DURATION
          value: '60'