Scaling Best Practices - Self-Managed Repository Integrations
To estimate the number of container replicas required for your workload, please contact Mend.io Professional Services.
The following guidance is designed for large deployments of 10,000 or more repositories.
Cluster Worker Nodes
Recommended per node compute resources:
8 CPU cores, 64 GB RAM
1 TB SSD storage
AWS: r6i.2xlarge (or equivalent)
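If you run on AWS EKS, an eksctl managed node group matching this profile might look as follows. This is a minimal sketch under that assumption; the cluster name, region, and node count are placeholders and should be sized to your replica estimate.

```yaml
# Minimal eksctl sketch (assumption: AWS EKS; names, region, and node count are placeholders)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: mend-integrations      # placeholder
  region: us-east-1            # placeholder
managedNodeGroups:
  - name: mend-workers         # placeholder
    instanceType: r6i.2xlarge  # 8 vCPU / 64 GB RAM
    volumeSize: 1000           # ~1 TB SSD per node
    volumeType: gp3
    desiredCapacity: 3         # placeholder; size to your workload
```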
Controller
Scaling out the controller requires an ingress load balancer configured to round-robin requests across controller replicas. Controllers are stateless. Do not configure sticky sessions.
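As an illustration, a Kubernetes Service in front of the Controller replicas leaves session affinity at its default (None) so the ingress load balancer can round-robin requests across pods. The name, labels, and ports below are placeholders.

```yaml
# Excerpt of a Service fronting the Controller replicas.
# sessionAffinity: None (the Kubernetes default) avoids sticky sessions.
apiVersion: v1
kind: Service
metadata:
  name: controller            # placeholder
spec:
  selector:
    app: controller           # placeholder label
  sessionAffinity: None
  ports:
    - port: 80
      targetPort: 8080        # placeholder container port
```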
Container request/limit sizes:
Request:
CPU: 2 core (2000m)
Memory: 6GB RAM (6G)
Limit:
CPU: 2 core (2000m)
Memory: 6GB RAM (6G)
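Expressed as a Kubernetes container spec, these values look as follows; this is a sketch, and the container name is a placeholder.

```yaml
# Excerpt from the Controller container spec (container name is a placeholder)
containers:
  - name: controller
    resources:
      requests:
        cpu: "2000m"
        memory: "6G"
      limits:
        cpu: "2000m"
        memory: "6G"
```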
JVM
Specify the following JAVA_OPTS settings:
Environment Variable | Value |
JAVA_OPTS | -Xms4G -Xmx4G |
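In a Kubernetes manifest, this corresponds to a container environment entry such as:

```yaml
# Excerpt: Controller container environment (v23.7.1 or later)
env:
  - name: JAVA_OPTS
    value: "-Xms4G -Xmx4G"
```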
To monitor and validate Scanner horizontal scaling, track pending and active scans via the scan queue statistics API.
Monitor controller logs for JVM out-of-memory errors.
Prior to v23.7.1, setting JVM options requires modifying the Controller shell.sh launch script; contact Mend Professional Services for specific guidance. Upgrading to the latest version is recommended.
Specifying 4GB for the JVM heap space leaves ~2GB for the system and other JVM memory requirements.
Scanner
Scanners clone repositories and cache open source artifacts resolved from package manager manifest files. Storage requirements are determined by the size of the cloned repositories and the number of dependencies they contain, and vary greatly from customer to customer and repository to repository.
Scanner container memory and storage limits should match the requirements of the largest build machine used to build the largest repositories being scanned. In the absence of that information, use the following values and adjust as necessary based on scanner performance.
Container request/limit sizes:
Request:
CPU: 0.5 core (500m)
Memory: 2.5GB RAM (2500M)
Ephemeral Storage: 250GB (250G)
Limit:
CPU: 1 core (1000m)
Memory: 5GB RAM (5G)
Ephemeral Storage: 500GB (500G)
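As a sketch, the equivalent Kubernetes container spec, with storage expressed via the ephemeral-storage resource; the container name is a placeholder.

```yaml
# Excerpt from the Scanner container spec (container name is a placeholder)
containers:
  - name: scanner
    resources:
      requests:
        cpu: "500m"
        memory: "2500M"
        ephemeral-storage: "250G"
      limits:
        cpu: "1000m"
        memory: "5G"
        ephemeral-storage: "500G"
```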
AutoScale
DO NOT configure your container orchestrator (e.g., Kubernetes) to automatically scale out the number of Scanner replicas based on CPU or memory utilization. These system metrics do not indicate that additional scanners are needed. The only metric useful for scaling out is the number of pending scans from the Controller scan queue statistics API mentioned above.
JVM
Do not adjust the default JVM options.
Git Connector
The Scanner uses JGit by default to clone repositories. JGit has several limitations, including file size constraints and a lack of support for shallow cloning.
For large-scale deployments, switch to the Git Connector to take advantage of shallow Git clones, which reduce ephemeral storage requirements during scans.
If the Git Connector is enabled and your environment requires custom certificate authorities or a proxy, Git itself must be configured for them. See our Custom Certificate guidance for more information; for proxy support, see the Git documentation.
To enable the Git Connector, set the following environment variable:
Environment Variable | Value |
WS_GIT_CONNECTOR | true |
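For example, as an environment entry in the Scanner container spec:

```yaml
# Excerpt: Scanner container environment
env:
  - name: WS_GIT_CONNECTOR
    value: "true"
```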
Remediate
Remediate scales out using a Server/Worker model. The Remediate Server is stateful and manages an in-memory job queue. There can only be one instance of a Remediate Server per SCM integration cluster. Remediate Workers pull jobs off the Server queue and perform the R/R work. Workers are stateless and can scale out as required. Worker/Server modes are controlled by environment variables. See the product documentation for more information.
Server request/limit sizes:
Request:
CPU: 0.6 core (600m)
Memory: 1GB (1Gi)
Disk: 20GB
Limit:
CPU: 1.0 core (1000m)
Memory: 4GB (4Gi)
Disk: 20GB
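A corresponding container spec sketch; the container name is a placeholder, and representing the disk figure as Kubernetes ephemeral-storage is an assumption.

```yaml
# Excerpt from the Remediate Server container spec (container name is a placeholder;
# disk is expressed as ephemeral-storage, which is an assumption)
containers:
  - name: remediate-server
    resources:
      requests:
        cpu: "600m"
        memory: "1Gi"
        ephemeral-storage: "20G"
      limits:
        cpu: "1000m"
        memory: "4Gi"
        ephemeral-storage: "20G"
```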
Worker request/limit sizes:
Request:
CPU: 0.6 core (600m)
Memory: 1GB (1Gi)
Disk: 20GB
Limit:
CPU: 1.0 core (1000m)
Memory: 2GB (2Gi)
Disk: 40GB (minimum, depending on repo size)
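And the Worker equivalent, with the same assumptions; size ephemeral storage above the 40GB minimum based on the repositories being remediated.

```yaml
# Excerpt from the Remediate Worker container spec (container name is a placeholder;
# disk is expressed as ephemeral-storage, which is an assumption)
containers:
  - name: remediate-worker
    resources:
      requests:
        cpu: "600m"
        memory: "1Gi"
        ephemeral-storage: "20G"
      limits:
        cpu: "1000m"
        memory: "2Gi"
        ephemeral-storage: "40G"   # minimum; increase for large repositories
```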
JVM - N/A
Remediate is a Node.js application and does not require JVM settings.
Other Settings
To reduce the load on your SCM system and avoid hitting API rate limits, reduce the Remediate Server cron schedule from the default (hourly) to daily at midnight UTC:
Add the following environment variable to the Remediate-Server pod:
Environment Variable | Value |
SCHEDULER_CRON | 0 0 * * * |
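For example, in the Remediate-Server container spec:

```yaml
# Excerpt: Remediate-Server container environment
env:
  - name: SCHEDULER_CRON
    value: "0 0 * * *"   # daily at 00:00 UTC
```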
Add the following environment variable to the Worker service if you experience hung Remediate Workers:
Environment Variable | Value |
REMEDIATE_JOB_TIMEOUT_MINUTES | 60 |
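For example, in the Worker container spec:

```yaml
# Excerpt: Remediate Worker container environment
env:
  - name: REMEDIATE_JOB_TIMEOUT_MINUTES
    value: "60"   # job timeout in minutes
```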
Deploy Redis for caching across your Remediate container pool.
Monitor the Remediate status API, and track the number of queued jobs over time. Use this information to determine the number of R/R workers required to handle your workload. See the Remediate docs for more detail on this status API.