NIST SPECIAL PUBLICATION 1800-16B


Securing Web Transactions

TLS Server Certificate Management


Volume B:

Security Risks and Recommended Best Practices



Donna Dodson

William Haag

Murugiah Souppaya

NIST


Paul Turner

Venafi


William C. Barker

Strativia


Mary Raguso

Susan Symington

The MITRE Corporation



June 2020


Final


This publication is available free of charge from: http://doi.org/10.6028/NIST.SP.1800-16


The first draft of this publication is available free of charge from: https://www.nccoe.nist.gov/projects/building-blocks/tls-server-certificate-management






DISCLAIMER

Certain commercial entities, equipment, products, or materials may be identified by name or company logo or other insignia in order to acknowledge their participation in this collaboration or to describe an experimental procedure or concept adequately. Such identification is not intended to imply special status or relationship with NIST or recommendation or endorsement by NIST or NCCoE; neither is it intended to imply that the entities, equipment, products, or materials are necessarily the best available for the purpose.

National Institute of Standards and Technology Special Publication 1800-16B, Natl. Inst. Stand. Technol. Spec. Publ. 1800-16B, 108 pages, (June 2020), CODEN: NSPUE2

FEEDBACK

As a private-public partnership, we are always seeking feedback on our practice guides. We are particularly interested in seeing how businesses apply NCCoE reference designs in the real world. If you have implemented the reference design, or have questions about applying it in your environment, please email us at tls-cert-mgmt-nccoe@nist.gov.

All comments are subject to release under the Freedom of Information Act.

National Cybersecurity Center of Excellence
National Institute of Standards and Technology
100 Bureau Drive
Mailstop 2002
Gaithersburg, MD 20899

NATIONAL CYBERSECURITY CENTER OF EXCELLENCE

The National Cybersecurity Center of Excellence (NCCoE), a part of the National Institute of Standards and Technology (NIST), is a collaborative hub where industry organizations, government agencies, and academic institutions work together to address businesses’ most pressing cybersecurity issues. This public-private partnership enables the creation of practical cybersecurity solutions for specific industries, as well as for broad, cross-sector technology challenges. Through consortia under Cooperative Research and Development Agreements (CRADAs), including technology partners—from Fortune 50 market leaders to smaller companies specializing in information technology (IT) security—the NCCoE applies standards and best practices to develop modular, easily adaptable example cybersecurity solutions using commercially available technology. The NCCoE documents these example solutions in the NIST Special Publication 1800 series, which maps capabilities to the NIST Cybersecurity Framework and details the steps needed for another entity to recreate the example solution. The NCCoE was established in 2012 by NIST in partnership with the State of Maryland and Montgomery County, Maryland.

To learn more about the NCCoE, visit https://www.nccoe.nist.gov/. To learn more about NIST, visit

NIST CYBERSECURITY PRACTICE GUIDES

NIST Cybersecurity Practice Guides (Special Publication 1800 series) target specific cybersecurity challenges in the public and private sectors. They are practical, user-friendly guides that facilitate the adoption of standards-based approaches to cybersecurity. They show members of the information security community how to implement example solutions that help them align more easily with relevant standards and best practices, and provide users with the materials lists, configuration files, and other information they need to implement a similar approach.

The documents in this series describe example implementations of cybersecurity practices that businesses and other organizations may voluntarily adopt. These documents do not describe regulations or mandatory practices, nor do they carry statutory authority.

ABSTRACT

Transport Layer Security (TLS) [B5] server certificates [B3] are critical to the security of both internet-facing and private web services. A large- or medium-scale enterprise may have thousands or even tens of thousands of such certificates, each identifying a specific server in its environment. Despite the critical importance of these certificates, many organizations lack a formal TLS certificate management program and do not have the ability to centrally monitor and manage their certificates. Instead, certificate management tends to be spread across each of the different groups responsible for the various servers and systems in an organization. Central security teams struggle to make sure that certificates are being properly managed by each of these disparate groups. Where there is no central certificate management service, the organization is at risk because once certificates are deployed, it is necessary to maintain current inventories to support regular monitoring and certificate maintenance. Organizations that do not properly manage their certificates face significant risks to their core operations, including:

- application outages caused by expired TLS [B5] server certificates

- hidden intrusion, exfiltration, disclosure of sensitive data, or other attacks resulting from encrypted threats or server impersonation

- application outages or attacks resulting from delayed replacement of certificates and private keys in response to either certificate authority compromise or discovery of vulnerabilities in cryptographic algorithms or libraries

Despite the mission-critical nature of TLS server certificates, many organizations have not defined the clear policies, processes, roles, and responsibilities needed for effective certificate management. Moreover, many organizations do not leverage available automation tools to support effective management of the ever-growing number of certificates. The consequence is continuing susceptibility to security incidents.

This NIST Cybersecurity Practice Guide shows large and medium enterprises how to employ a formal TLS certificate management program to address certificate-based risks and challenges. It describes the TLS certificate management challenges faced by organizations; provides recommended best practices for large-scale TLS server certificate management; describes an automated proof-of-concept implementation that demonstrates how to prevent, detect, and recover from certificate-related incidents; and provides a mapping of the demonstrated capabilities to the recommended best practices and to NIST security guidelines and frameworks.

This NIST Cybersecurity Practice Guide consists of the following volumes:

- Volume A: Executive Summary

- Volume B: Security Risks and Recommended Best Practices (you are here)

- Volume C: Approach, Architecture, and Security Characteristics

- Volume D: How-To Guides – instructions for building the example solution

KEYWORDS

Authentication; certificate; cryptography; identity; key; key management; PKI; private key; public key; public key infrastructure; server; signature; TLS; Transport Layer Security

ACKNOWLEDGMENTS

We are grateful to the following individuals for their generous contributions of expertise and time.

Name                     Organization
Dean Coclin              DigiCert
Tim Hollebeek            DigiCert
Clint Wilson             DigiCert
Dung Lam                 F5
Robert Smith             F5
Elaine Barker            NIST
William Polk             NIST
Andrew Regenscheid       NIST
Rob Clatterbuck          Thales Trusted Cyber Technologies (Thales TCT)
Jane Gilbert             Thales TCT
Alexandros Kapasouris    Symantec
Mehwish Akram            The MITRE Corporation
Brian Johnson            The MITRE Corporation
Sarah Kinling            The MITRE Corporation
Bob Masucci              The MITRE Corporation
Susan Prince             The MITRE Corporation
Mary Raguso              The MITRE Corporation
Aaron Aubrecht           Venafi
Justin Hansen            Venafi

DOCUMENT CONVENTIONS

The terms “shall” and “shall not” indicate requirements to be followed strictly in order to conform to the publication and from which no deviation is permitted.

The terms “should” and “should not” indicate that among several possibilities one is recommended as particularly suitable, without mentioning or excluding others, or that a certain course of action is preferred but not necessarily required, or that (in the negative form) a certain possibility or course of action is discouraged but not prohibited.

The terms “may” and “need not” indicate a course of action permissible within the limits of the publication.

The terms “can” and “cannot” indicate a possibility and capability, whether material, physical or causal.

CALL FOR PATENT CLAIMS

This public review includes a call for information on essential patent claims (claims whose use would be required for compliance with the guidance or requirements in this Information Technology Laboratory (ITL) draft publication). Such guidance and/or requirements may be directly stated in this ITL Publication or by reference to another publication. This call also includes disclosure, where known, of the existence of pending U.S. or foreign patent applications relating to this ITL draft publication and of any relevant unexpired U.S. or foreign patents.

ITL may require from the patent holder, or a party authorized to make assurances on its behalf, in written or electronic form, either:

a) assurance in the form of a general disclaimer to the effect that such party does not hold and does not currently intend holding any essential patent claim(s); or

b) assurance that a license to such essential patent claim(s) will be made available to applicants desiring to utilize the license for the purpose of complying with the guidance or requirements in this ITL draft publication either:

i) under reasonable terms and conditions that are demonstrably free of any unfair discrimination; or
ii) without compensation and under reasonable terms and conditions that are demonstrably free of any unfair discrimination.

Such assurance shall indicate that the patent holder (or third party authorized to make assurances on its behalf) will include in any documents transferring ownership of patents subject to the assurance, provisions sufficient to ensure that the commitments in the assurance are binding on the transferee, and that the transferee will similarly include appropriate provisions in the event of future transfers with the goal of binding each successor-in-interest.

The assurance shall also indicate that it is intended to be binding on successors-in-interest regardless of whether such provisions are included in the relevant transfer documents.

Such statements should be addressed to: tls-cert-mgmt-nccoe@nist.gov

List of Figures

Figure 2‑1 TLS Certificates Are Broadly Used for Communications in Organizations

Figure 2‑2 Server Address, Public Key, and Issuer Information on Four of the Organization’s TLS Server Certificates

Figure 2‑3 Upon Connecting to the Server, the Client Receives the Server’s TLS Certificate, Which Includes the Server’s Public Key

Figure 2‑4 Browsers and Various Automated Processes (Web Servers, Containers, and IoT Devices) Connect as Clients to TLS Servers

Figure 2‑5 A Public Root CA’s Root Certificate Is Delivered to the User, Installed on a Software Vendor’s Software

Figure 2‑6 A Root CA Issues a Certificate to an Intermediate/Issuing CA, Which Issues TLS Server Certificates

Figure 2‑7 Upon Connecting to the Server, the Client Receives Both the Server’s TLS Certificate and Its CA Certificate Chain

Figure 2‑8 Certificate Issuance Process

Figure 3‑1 How an Attacker Leverages Encrypted Connections to Hide Attacks

Figure 3‑2 Methods for Gaining Visibility into Encrypted Communications

Figure 4‑1 TLS Certificates Are Distributed Broadly Across Enterprise Environments and Groups

Figure 5‑1 Various Options for Automated Discovery and the Import of Certificates

Figure 5‑2 Example Timeline of Processes and Notifications Triggered by Impending Certificate Expiration

List of Tables

Table 1 Mapping the Recommended Best Practices for TLS Server Certificate Management to the Cybersecurity Framework

Table 2 Application of Specific Controls to TLS Server Certificate Management Recommended Best Practices

1 Introduction

Organizations risk losing revenue, customers, and reputation, and exposing internal or customer data to attackers if they do not properly manage Transport Layer Security (TLS) server certificates. TLS is the most widely used security protocol to secure web transactions and other communications on the internet and internal networks. TLS server certificates are central to the security and operation of internet-facing and internal web services. Improper TLS server certificate management results in significant outages to web applications and services—such as government services, online banking, flight operations, and mission-critical services within an organization—and increased risk of security breaches. Organizations should ensure that TLS server certificates are properly managed to avoid these issues.

The broad distribution of TLS server certificates across multiple groups and technologies within an enterprise requires that organizations establish formal management programs that include clear policies and responsibilities, a central Certificate Service, automation, and education. Successful implementation of a certificate management program relies on executive sponsorship, clear objectives, an action plan, and regular progress reviews.

1.1 Objective

The objective of this volume is to describe risks and challenges related to TLS server certificates and address those challenges by providing recommended best practices for large-scale TLS server certificate management. This document recommends that organizations establish a formal TLS certificate management program, and it enumerates elements that should be considered for inclusion in such a program. It is important to note that the best practices recommended in this guide are just that—recommendations.

1.2 Scope

The scope of this document is confined to recommendations regarding TLS server certificate management. TLS client certificate management is out of scope. This document is not intended to provide an extensive explanation of what TLS certificates and keys are or how they are used. Also, certificate management policies need to be considered within the context of an organization’s overall enterprise security policies.

It is also beyond the scope of this document to discuss the broader aspects of organizational policies and procedures [B1] with which TLS server certificate management should be consistent. For example, general recommendations regarding security policy, vulnerability management, incident response, disaster recovery, security testing, etc. that are not specifically related to certificate management are out of scope. Discussion of general security protections for certificate management system components is also beyond the scope of this document. This document assumes the security of these components is protected by recommended security best practices, e.g., patching, strong authentication, and access control that the organization has in place as part of its overall security policy.

An organization’s business operations may be internally or externally supported. For those organizations that have third parties supporting key business operations, those third parties may use TLS certificates. If a function is outsourced, the organization should ensure that its requirements are met by the third party performing the function. The TLS certificate management recommendations provided in this document can be applied to these third parties as well as to the organization itself.

In accordance with their security policies, some organizations may choose to perform inspection of internal traffic that has been encrypted using TLS, by intercepting and decrypting TLS traffic at the network edge or by performing passive decryption at locations deeper within the network. The question of whether to perform such inspection is complex, and it involves important tradeoffs between traffic security and traffic visibility that organizations should weigh carefully. It is beyond the scope of this document to advocate for or against TLS traffic inspection. Some organizations have determined that the security risks posed by inspection of internal TLS traffic are not worth the potential benefits of having visibility into the encrypted traffic. Other organizations, however, have determined that it is in their best interests to perform TLS traffic inspection. For those organizations that have a policy of performing TLS traffic inspection, this document provides recommended best practices regarding how to securely manage the TLS private keys required for this purpose.

The security and integrity of TLS relies on secure implementation and configuration of TLS servers and effective TLS server certificate management. Guidance regarding the implementation and configuration of TLS servers is outside the scope of this document. The secure implementation and configuration of TLS servers is addressed in NIST Special Publication (SP) 800-52 [B13]. Organizations should provide clear instruction to groups and individuals deploying TLS servers in their environments to read, understand, and follow the guidance provided in SP 800-52.

Lastly, the recommendations included in this document are generic. Each organization should determine for itself how to best apply these recommendations to its own enterprise. Volumes C and D of this Practice Guide describe a specific implementation used to demonstrate the application of these recommendations.

2 TLS Server Certificate Background

TLS [B5] is the security protocol used to authenticate and protect internet and internal network communications for a broad range of other protocols, including Hypertext Transfer Protocol (HTTP) [B17] for web servers; Lightweight Directory Access Protocol (LDAP) [B18] for directory servers; and Simple Mail Transfer Protocol [B7], Post Office Protocol [B10], and Internet Message Access Protocol [B4] for email.

TLS server certificates serve as machine identities that enable clients to authenticate servers via cryptographic means. For example, when a bank customer connects across the internet to an online banking website, the customer’s browser (i.e., the TLS client) will present an error message if the server does not provide a valid certificate that matches the address the user entered in the browser. Further, TLS server certificates are used extensively inside corporate and government networks to establish trust between machines — servers, applications, devices, micro-services, etc. Most large enterprises have thousands of certificates, each identifying a specific server in their environment. (Note: Web browsers play the role of clients to web servers. As such, they contain functionality to automatically establish TLS connections on behalf of users, evaluate certificates received during the TLS handshake process, and present errors when unexpected certificate issues are encountered.) Figure 2-1 illustrates the pervasive use of certificates within organizations.

Figure 2‑1 TLS Certificates Are Broadly Used for Communications in Organizations

Figure showing an organization that has four tiers of interconnected TLS servers as well as TLS servers in the cloud, all of which have TLS certificates.

Each TLS server certificate contains the address of the server that it identifies (e.g., www.organization1.com) and a cryptographic key, called a public key, which is unique to the server and used by clients in securely authenticating the server (see Figure 2-2).

Figure 2‑2 Server Address, Public Key, and Issuer Information on Four of the Organization’s TLS Server Certificates

Figure depicts servers holding private keys corresponding to public keys in associated certificates.

As shown in Figure 2-3, each server holds a private key that corresponds to the public key in the certificate so each server can prove it is the holder of the certificate. While the certificate is shared with any client that connects to the server, it is critical that the private key is kept secure and secret so it cannot be obtained by an attacker and used to impersonate the server. However, common operational practices may increase the risk of private key disclosure. Many private keys used with TLS are stored in plaintext files on TLS servers. Alternatively, private keys can be stored in files encrypted with a password; however, the passwords are generally stored in plaintext configuration files so they are accessible by the TLS server software when it is started. These common practices make it possible for private keys to be viewed and copied by system administrators or malicious actors.

Figure 2‑3 Upon Connecting to the Server, the Client Receives the Server’s TLS Certificate, Which Includes the Server’s Public Key

Figure showing the server sending its certificate, which contains its public key, to clients so they can authenticate the server. The server keeps its private key secret.
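The risk of private keys sitting in readable files can be checked mechanically. The following is a minimal sketch, assuming a POSIX file system and a hypothetical helper name (`key_file_is_overexposed`); it only flags key files that group or other users can access and says nothing about whether the key is encrypted or who holds administrative access.

```python
import os
import stat


def key_file_is_overexposed(path: str) -> bool:
    """Return True if group or other users have any access to the file.

    Assumes POSIX permission bits; the result is not meaningful on Windows.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any read, write, or execute bit for group or others counts as exposure.
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

A key file with mode 0600 passes this check, while one with mode 0644 is flagged. The check does not address system administrators who can read the file as its owner or as root.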

In addition to users with browsers connecting to servers that have TLS server certificates, automated processes also connect as clients to TLS servers and must trust TLS server certificates. Examples of automated processes acting as TLS clients include a web server making requests to an application server, one cloud container connecting to another, or an Internet of Things (IoT) device connecting to a cloud service. (See Figure 2-4.)

Figure 2‑4 Browsers and Various Automated Processes (Web Servers, Containers, and IoT Devices) Connect as Clients to TLS Servers

Figure depicts browsers, web servers, containers, and IoT devices connected as clients to TLS servers.
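For automated clients, certificate checking has to be enabled in code rather than by a browser. As an illustration only, the sketch below uses Python's standard `ssl` module to build a client context that validates the server certificate and its host name; the function name is an assumption made for this example.

```python
import ssl


def make_verifying_client_context() -> ssl.SSLContext:
    """Build a client-side context that validates server certificates."""
    # Loads the platform's trusted root CA certificates.
    context = ssl.create_default_context()
    # These are already the defaults for this context; they are restated
    # here so that the intent is explicit and hard to remove by accident.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

A connection wrapped with this context fails its handshake with `ssl.SSLCertVerificationError` when the server certificate is expired, untrusted, or does not match the requested host name, which is the automated equivalent of the browser error described above.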

2.1 Certificate Authorities

TLS server certificates are issued by entities called certificate authorities (CAs). CAs digitally sign certificates so that their authenticity can be validated — to prevent attackers from easily impersonating servers. Clients (e.g., browsers, devices, applications, services) validate certificates by using a CA’s certificate to verify the signature. Clients, such as browsers, are configured to trust specific CAs (called root CAs). This is done by installing a CA’s certificate, commonly called a root certificate, on the client.

Some CAs arrange for their root certificate to get installed by software manufacturers in their software (e.g., browser, application, or operating system) so the certificates issued by the CAs are trusted broadly. These CAs are commonly called public root CAs. (See Figure 2-5.)

Figure 2‑5 A Public Root CA’s Root Certificate Is Delivered to the User, Installed on a Software Vendor’s Software

Figure depicting a public root CA creating a self-signed root certificate, securely delivering this certificate to a software vendor, and the software vendor delivering software with the root certificate embedded in it to the user for installation on the user's machine.

To protect them from attacks, root CAs are generally not connected to the internet and do not issue TLS server certificates directly. Root CAs certify other CAs, generally called intermediate or issuing CAs, which issue TLS server certificates. (See Figure 2-6.)

Figure 2‑6 A Root CA Issues a Certificate to an Intermediate/Issuing CA, Which Issues TLS Server Certificates

Figure 2-6 depicts a client connecting to a TLS server and the server returning its certificate as well as the certificate for the CA that issued its certificate.

As shown in Figure 2-7, when a client, such as a browser, connects to a TLS server, the server will return its certificate as well as the certificate for the CA that issued its certificate (called the CA certificate chain).

Figure 2‑7 Upon Connecting to the Server, the Client Receives Both the Server’s TLS Certificate and Its CA Certificate Chain

Figure depicting a three-step process of (1) the user's client initiating a TLS session with a server; (2) the server responding by returning its certificate and the CA chain; and (3) the client using the root certificate and the CA chain to verify signatures, thereby validating the server certificate.

Public CAs are regularly audited to ensure they operate in compliance with the CA/Browser Forum Baseline Requirements, which are standards intended to minimize the possibility of CA compromises and fraudulent certificates. When CAs have been found to violate the requirements, their root certificates have been removed from and distrusted by browsers, requiring customers of those CAs to rapidly replace their TLS server certificates.

There are three different types of certificates issued by public CAs (as specified by the CA/Browser Forum, which defines standards for public CAs), each with a different level of validation required by the CA to confirm the identity of the requester and its authority to receive a certificate for the domain in question:

  • Domain Validated (DV): The CA validates that the requester is the owner of the domain, by verifying that the requester can reply to an email address associated with the domain, has operational control of the website at the domain address, or is able to make modifications to the Domain Name System (DNS) [B8] record for the domain.

  • Organization Validated (OV): In addition to the checks for DV certificates, the CA conducts additional vetting of the requester’s organization.

  • Extended Validation (EV): EV certificates undergo the most rigorous checks, including verifying the identity and the legal, physical, and operational existence of the entity requesting the certificate, by using official records.

Organizations that wish to issue certificates to their internal TLS servers can establish their own CAs, commonly called internal CAs. Organizations using internal CAs must ensure that all clients connecting to their servers trust the internal CAs by installing the internal CAs’ root certificates on each system acting as a client (e.g., browsers, operating systems, applications, appliances).
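For a client written in Python, trusting an internal CA might look like the sketch below. It is illustrative only: the function name and the `ca_bundle_path` parameter are assumptions, and the bundle is expected to be a PEM file containing the internal root certificate(s).

```python
import ssl
from typing import Optional


def make_internal_client_context(ca_bundle_path: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that trusts only the CAs in ca_bundle_path.

    With no bundle loaded the context trusts no CA at all, so every
    handshake fails verification until a root certificate is installed.
    """
    # PROTOCOL_TLS_CLIENT turns on certificate and host name checks by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_bundle_path is not None:
        context.load_verify_locations(cafile=ca_bundle_path)
    return context
```

Starting from an empty trust store, rather than the platform defaults, limits the client to the internal CAs. Whether that restriction is appropriate depends on which servers the client has to reach.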

2.2 Certificate Request and Installation Process

The following steps, shown in Figure 2‑8 and detailed below, are typically followed by a system administrator to get a TLS certificate for a server that he or she manages.

Figure 2‑8 Certificate Issuance Process

Figure depicting the nine steps a system administrator takes to get a TLS certificate for a server.

  1. The system administrator for the TLS server uses utilities on the server to generate a cryptographic key pair (a public key and a private key).

  2. The system administrator enters the address of the server (e.g., www.organization1.com). The utilities create a request for a certificate, called a certificate signing request (CSR), which contains the address of the server and the public key. The system administrator retrieves a copy of the CSR (which is contained in a file) from the server.

  3. The system administrator submits the CSR to the registration authority (RA), who acts as a reviewer and approver of the certificate request.

  4. The RA/approver reviews the CSR, performs necessary checks to confirm the validity of the request and the authority of the requester, and then sends an approval to the CA.

  5. The CA issues the certificate.

  6. The CA notifies the system administrator that the certificate is ready, either by emailing a copy of the certificate or providing a link from which it can be downloaded. The system administrator retrieves the server certificate.

  7. The system administrator retrieves the CA certificate chain from the CA.

  8. The system administrator installs the server certificate on the server.

  9. The system administrator installs the CA certificate chain on the server.

The CA certificate chain is used by TLS clients to validate the signature on the server certificate. When a client connects to a TLS server, the server returns its certificate and the CA certificate chain, which can contain one or more CA certificates. The client starts with one of its locally trusted root CA certificates and successively validates the signatures on certificates in the CA certificate chain until it reaches the server certificate.
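The linking step can be illustrated with a deliberately simplified model. The sketch below follows issuer and subject names only; `CertRecord` and `build_path` are hypothetical names, and no signatures, validity dates, or revocation status are checked, all of which a real TLS client also verifies.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical stand-in for a certificate, reduced to two names."""
    subject: str
    issuer: str


def build_path(server: CertRecord, chain: List[CertRecord], trusted_roots: Set[str]) -> List[str]:
    """Return subject names from a trusted root down to the server certificate.

    Raises ValueError when the chain does not link the server certificate
    to any trusted root (for example, a missing intermediate certificate).
    """
    by_subject = {cert.subject: cert for cert in chain}
    path = [server.subject]
    current = server
    # Bounding the loop by the chain length guards against issuer cycles.
    for _ in range(len(chain) + 1):
        if current.issuer in trusted_roots:
            path.append(current.issuer)
            return list(reversed(path))
        if current.issuer not in by_subject:
            break
        current = by_subject[current.issuer]
        path.append(current.subject)
    raise ValueError(f"no path from {server.subject!r} to a trusted root")
```

With a server certificate issued by "Issuing CA 1", which is in turn issued by the trusted "Root CA", the function returns the three names in root-to-server order. If the intermediate certificate is left out of the chain, it raises `ValueError`.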

The system administrator must note the expiration date in the certificate to ensure that a new certificate is requested and installed before the existing certificate expires.
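Tracking that date by hand is error-prone, so it helps to turn it into an explicit deadline. The sketch below is one way to do so with Python's standard library. The 30-day lead time is an arbitrary assumption for illustration, not a recommendation from this guide, and `renewal_deadline` is a hypothetical name.

```python
import ssl
from datetime import datetime, timedelta, timezone


def renewal_deadline(not_after: str, lead_time_days: int = 30) -> datetime:
    """Return the UTC time by which a replacement should be requested.

    not_after uses the "Jan  5 09:34:43 2018 GMT" style in which Python's
    ssl module reports a certificate's notAfter field.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    # The lead time leaves room for approval, issuance, and installation.
    return expires - timedelta(days=lead_time_days)
```

For a certificate expiring on January 5, 2018, the default lead time gives a deadline of December 6, 2017.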

3 TLS Server Certificate Risks

When TLS server certificates are not properly managed, organizations risk negative impacts to their revenue, customers, and reputation. There are four primary types of negative incidents that result from certificate mismanagement: outages to important business applications, caused by expired certificates; security breaches resulting from server impersonation; outages or security breaches resulting from a lack of crypto-agility; and increased vulnerability to attack via encrypted threats. (Note: While TLS server certificates enable confidentiality for legitimate communications, they can also allow attackers to hide their malicious activities within encrypted TLS connections. When a TLS server certificate is installed and enabled on a server, all users who connect (including attackers) can establish an encrypted connection to the server.)

3.1 Outages Caused by Expired Certificates

TLS server certificates contain an expiration date to ensure that the cryptographic keys are changed regularly; this reduces the impact of a security breach caused by a compromised private key. If a server certificate is not changed before its expiration date, then clients should generate an error message and stop the connection process to the server. This causes the application supported by the server with the expired certificate to become unavailable.

Application outages can also be caused by the mismanagement of CA certificate chains that results in expired intermediate CA certificates. The TLS server is responsible for providing the client with the intermediate CA certificates (CA certificate chain) necessary for the client to link the server’s end-entity certificate with the root CA certificate trusted by the client. The absence or expiration of an intermediate certificate means the client will not trust the server, even though the server may have a perfectly trustworthy end-entity certificate. Intermediate CA certificates are typically renewed every few years, and it is possible for a TLS server to fail to use the most current version. As a result, although the server certificate has been updated, the installed intermediate CA certificate may expire, resulting in an outage due to expiration. Such outages are often difficult to diagnose because the focus of investigation is typically on the server certificate, which is still valid and not the cause of the outage.
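Because an intermediate CA certificate can expire before the server certificate does, an expiry check is more useful when it covers every certificate the server presents. The following sketch assumes the expiration dates have already been collected into a dictionary keyed by a label of the operator's choosing; the function name is hypothetical.

```python
import ssl
from typing import Dict, Tuple


def first_to_expire(not_after_by_cert: Dict[str, str]) -> Tuple[str, int]:
    """Return (label, expiry in seconds since the epoch) for the certificate
    in the supplied set that expires soonest.

    Raises ValueError if the dictionary is empty.
    """
    expiries = {
        label: ssl.cert_time_to_seconds(value)
        for label, value in not_after_by_cert.items()
    }
    label = min(expiries, key=expiries.get)
    return label, expiries[label]
```

Given a server certificate valid until June 2026 and an installed intermediate valid only until March 2025, the function reports the intermediate. That is the case in which an investigation focused on the server certificate alone would miss the cause.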

Nearly every enterprise has experienced an application outage due to an expired certificate, including outages to major applications such as online banking, stock trading, health records access, and flight operations. As organizations use more TLS server certificates to secure their applications, the likelihood of outages increases, because there are more certificates to track and more certificates per business application that can impact operations.

Various scenarios result in a certificate expiring while still in use, causing an outage, including these:

  • The system administrator forgets about the certificate.

  • The system administrator ignores notifications that the certificate will soon expire.

  • The system administrator does not properly install or update the CA certificate chain.

  • The system administrator is reassigned, and nobody else receives expiry notifications.

  • The system administrator enrolls for a new certificate but does not install it on the server(s) in time or installs it incorrectly.

  • The application relies on multiple load-balanced servers, and the certificate is not updated on all of them.

  • The certificate is installed on a backup system, but the certificate has expired before the backup system is brought online.

Troubleshooting an incident where an application is unavailable due to an expired certificate can be complex and often requires hours to discover the source of the problem. If the server on which an expired certificate is deployed is being accessed by people using browsers, then each of those people will receive an error message, making it clear that the cause of the issue is an expired certificate. If, on the other hand, the clients connecting to the server with the expired certificate are automated systems (e.g., the clients are web servers and the server with the expired certificate is an application server) then the web servers acting as clients will stop operations when they encounter the expired certificate. They may log an error message, but that message may not be immediately discovered in the log file, increasing the amount of time required to identify the root cause of the outage and fix it. If certificates that are deployed on backup systems are not updated when they expire, an outage can occur if operations are shifted to the backup systems.
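The load-balancing and backup-system scenarios listed above can be checked for with a simple consistency test. The following Python sketch groups servers by the SHA-256 fingerprint of the certificate each one presents. It assumes the DER-encoded certificate bytes have already been gathered from each server individually (for example, with ssl.SSLSocket.getpeercert(binary_form=True)); the host labels, function names, and placeholder byte strings are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_fingerprint(served_certs):
    """Map SHA-256 fingerprint -> sorted list of hosts presenting that certificate.

    served_certs: dict of host label -> DER-encoded certificate bytes.
    """
    groups = defaultdict(list)
    for host, der_bytes in served_certs.items():
        groups[hashlib.sha256(der_bytes).hexdigest()].append(host)
    return {fingerprint: sorted(hosts) for fingerprint, hosts in groups.items()}


def is_consistent(served_certs):
    """True when every server in the cluster presents the same certificate."""
    return len(group_by_fingerprint(served_certs)) <= 1


# Placeholder bytes stand in for real DER-encoded certificates.
cluster = {
    "web-1": b"renewed-certificate",
    "web-2": b"renewed-certificate",
    "backup-1": b"previous-certificate",
}
for fingerprint, hosts in group_by_fingerprint(cluster).items():
    print(fingerprint[:16], hosts)
print("consistent:", is_consistent(cluster))
```

A cluster that reports more than one fingerprint has at least one member, such as a backup system, that did not receive the updated certificate.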

3.2 Server Impersonation

An attacker may be able to impersonate a legitimate TLS server (e.g., a banking website) if the attacker is able to get a fraudulent certificate containing the address of the server and the attacker’s own public key by tricking a trusted CA into issuing the certificate to the attacker or by compromising the CA and issuing the certificate. A client connecting to the attacker’s server will accept the certificate because the certificate contains the address to which the client intended to connect and because the certificate has been issued by a trusted CA. Because the certificate contains the attacker’s public key (and the attacker also holds the private key corresponding to this public key), the attacker can decrypt the communications from the client (including passwords intended for login to the legitimate server). Alternatively, if the attacker can access a copy of the legitimate server’s private key, then the attacker can eavesdrop or impersonate that server by using the legitimate server’s certificate. To successfully perform these attacks, the attacker must redirect traffic destined for the legitimate server to a system that the attacker is operating (e.g., using Border Gateway Protocol [BGP] hijacking or DNS compromise). (Note: BGP [B16] is used to communicate optimal routes between internet service providers on the internet. It is possible for an attacker to hijack traffic by falsely advertising that the fastest route to one or more internet protocol [IP] addresses is via systems that the attacker is operating, thereby causing traffic to be rerouted through the attacker’s systems. The DNS provides translation between human‑readable addresses [e.g., www.company123.com] and IP addresses. If an attacker can compromise an organization’s DNS account, then the attacker can change the IP address to which traffic intended for that organization will be sent.)

Most private keys used on TLS servers are stored in files. The private keys are directly managed and handled by system administrators, who can make copies of the private keys. In addition, many TLS servers are clustered (for load balancing); in many cases, the same TLS server certificate and the private key will be copied to each server in the cluster. The manual handling and copying of private keys significantly increase the possibility of a key compromise and the confidentiality and data integrity consequences of key compromise (including but not limited to server impersonation).

3.3 Lack of Crypto-Agility

There are several types of incidents that have required organizations to replace [B2] large numbers of TLS certificates and private keys, including the following:

  • CA compromise: If a CA is breached by an attacker, then the attacker can cause that CA to issue fraudulent certificates. After the CA breach is discovered and forensics are performed, it may be concluded that certificates issued by the CA cannot be trusted and that new certificates must be installed on all servers with certificates from the compromised CA.

  • Vulnerable algorithm: Cryptographic algorithms are constantly evaluated for vulnerabilities, by parties with both positive and negative intent. When an algorithm is found to be vulnerable (e.g., Secure Hash Algorithm 1 (SHA-1) [B6] for signature generation), TLS server certificates that are dependent on the algorithm must be replaced. Ongoing advancements in quantum computing require that organizations establish the ability to rapidly replace all existing certificates and keys and be prepared for implementation of post-quantum algorithms.

  • Cryptographic library bug: Because cryptographic operations are quite complex, a few groups have specialized in developing cryptographic libraries that are used by TLS servers and other systems. If a bug is found with the key-generation functions of a cryptographic library, then all keys generated since the bug was introduced must be replaced. (Note: In 2008, a key-generation bug in the cryptographic libraries in Debian Linux was discovered. That bug was introduced in 2006. In 2017, a key-generation bug was discovered in the Infineon cryptographic libraries used in smart cards and trusted platform module chips.)

Most enterprises are not prepared to respond to the large-scale cryptographic failure that results from these types of incidents. Many organizations do not have comprehensive inventories of their TLS server certificates. In addition, they cannot contact the certificate owners, because they do not have up-to-date information about the certificate owners responsible for each certificate. Finally, many organizations rely on manual processes to manage certificates and do not have processes for tracking the progress in replacing large numbers of certificates — leaving the organizations to guess how many systems have been updated. All these factors can result in organizations requiring several weeks or months to replace all affected certificates, during which time business applications can be unavailable or vulnerable to security breaches.
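The inventory gaps described above can be made concrete with a small Python sketch. It assumes a hypothetical inventory record that holds the issuing CA, signature algorithm, owner, and replacement status of each certificate. The record fields, function names, and sample values are illustrative assumptions, not a schema defined by this guide.

```python
from dataclasses import dataclass


@dataclass
class CertRecord:
    common_name: str
    issuer: str
    signature_algorithm: str
    owner: str  # empty string when no owner is recorded
    replaced: bool = False


def affected(inventory, issuer=None, signature_algorithm=None):
    """Return records issued by the given CA or signed with the given algorithm."""
    return [
        record
        for record in inventory
        if (issuer is not None and record.issuer == issuer)
        or (
            signature_algorithm is not None
            and record.signature_algorithm == signature_algorithm
        )
    ]


def replacement_progress(records):
    """Return (replaced count, total count, names of records with no owner)."""
    replaced = sum(1 for record in records if record.replaced)
    unowned = [record.common_name for record in records if not record.owner]
    return replaced, len(records), unowned


# Hypothetical inventory; CA1 is treated as the compromised CA.
inventory = [
    CertRecord("shop.example.com", "CA1", "sha256WithRSAEncryption", "web-team", True),
    CertRecord("api.example.com", "CA1", "sha1WithRSAEncryption", ""),
    CertRecord("hr.example.com", "CA2", "sha1WithRSAEncryption", "hr-it"),
    CertRecord("vpn.example.com", "CA2", "sha256WithRSAEncryption", "net-team"),
]
print(replacement_progress(affected(inventory, issuer="CA1")))
```

With records like these, an organization can list the certificates affected by an incident, see which of them have no contactable owner, and measure replacement progress instead of guessing.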

3.4 Encrypted Threats

Many organizations are working to encrypt all communications by using TLS server certificates to prevent interception of plaintext credentials and eavesdropping on communications. While TLS server certificates enable confidentiality for legitimate communications, they can also allow attackers to hide their malicious activities within encrypted TLS connections. When a TLS server certificate is installed and enabled on a server, all users who connect (including attackers) can establish an encrypted connection to the server. An attacker who establishes an encrypted connection can then begin to probe the server for vulnerabilities within that encrypted connection.

The following steps, shown in Figure 3-1 and detailed below, describe how an attacker can leverage encrypted connections in his or her attacks.

Figure 3‑1 How an Attacker Leverages Encrypted Connections to Hide Attacks

Figure depicting four steps an attacker could take to leverage encrypted connections to hide an attack: (1) exploit a vulnerability, (2) install a web shell, (3) send command and control messages over TLS connections and pivot to attack other systems, and (4) exfiltrate data

  1. The attacker begins by connecting to a server and establishing an encrypted TLS session. Within that encrypted session, the attacker can probe for vulnerabilities that exist on the server and its software.

  2. If the attacker discovers a vulnerability and sufficiently elevates his or her privileges, then the attacker can load malware, generally called a “web shell,” onto the server.

  3. With this web shell loaded, the attacker can send commands over TLS connections (i.e., encrypted connections facilitated by the server’s certificate). The attacker can then work to pivot to other systems by probing for vulnerabilities in servers accessible from the compromised system. The increased use of encryption enables an attacker who has compromised one system to pivot and attack other systems via encrypted connections, without being detected.

  4. Once the attacker has successfully reached data that he or she desires, the attacker is able to use the web shell to exfiltrate data. Because the attacker is establishing TLS connections by using the server’s certificate to connect to the web shell, all the exfiltrated data is encrypted while in transit.

As stated in Section 1.2, in accordance with their security policies, some organizations may choose to perform inspection of internal traffic that has been encrypted using TLS. The question of whether to perform such inspection is complex, and it involves important tradeoffs between traffic security and traffic visibility that each organization should weigh for itself.

Some organizations are concerned about the risk posed by attackers who leverage encrypted connections to hide their attacks, as illustrated in Figure 3-1 above. If these attackers gain access to trusted internal systems via malware or some other exploit, they may be able to move about the network without being detected by hiding their traffic within TLS connections. Organizations that are concerned about these risks want the option of decrypting internal TLS traffic so it can be inspected. Such inspection may be used not only for intrusion and malware detection, but also for troubleshooting, fraud detection, forensics, and performance monitoring. These organizations have concluded that the visibility into their internal traffic that can be provided by TLS inspection is worth the tradeoff of the weaker encryption and other risks that come with such inspection. For these organizations, TLS inspection may be considered standard practice and may represent a critical component of their threat detection and service assurance strategies. Some of these organizations have complex networks that are several tiers deep, so it would not be realistic to expect them to be able to manage the movement of keys required to perform such inspection securely using purely manual processes. For those organizations that have a policy to perform inspection of TLS traffic, this document provides recommendations regarding how to securely move the TLS private keys needed for this inspection.

On the other hand, inspection creates a single location where traffic may be decrypted, creating an attractive target for hackers. It also may have compliance implications if sensitive data is being decrypted. An organization that performs decryption on border devices or that performs passive internal decryption runs the risk of such devices being taken over by a malicious attacker who would then have access to private keys and traffic. In addition, passive decryption requires the use of static key exchange, which results in weaker encryption than can be achieved when using ephemeral key exchange methods. If an attacker captures a server’s private key and the sessions protected by that key were established using static key exchange, the attacker will also be able to decrypt traffic that had been captured in the past. If, instead, those sessions were established using an ephemeral key exchange method, they have forward secrecy, meaning the attacker will not be able to decrypt past traffic. For some organizations, the reduced security of performing inspection or using static keys is unacceptable. These organizations have determined that the security risks posed by inspection of internal TLS traffic are not worth the potential benefits of having visibility into the encrypted traffic. These organizations should have a policy against performing TLS inspection. As an alternative to inspection, they may choose to perform traffic analysis to try to detect illegitimate internal TLS traffic. None of the discussion or recommendations in this document are intended to mandate or encourage an organization to begin performing TLS inspection of its traffic if that organization has determined that the risks of TLS inspection are not worth the benefits.
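The distinction between static and ephemeral key exchange is visible in the names of TLS 1.2 and earlier cipher suites. The following Python sketch classifies a suite by its name prefix. It is a simplified illustration that assumes IANA-style suite names; the function and constant names are hypothetical, and any suite that does not match a listed prefix is reported as "unknown" rather than guessed at.

```python
# Prefixes of suites whose key exchange uses ephemeral (EC)DH keys.
EPHEMERAL_PREFIXES = ("TLS_ECDHE_", "TLS_DHE_")
# Prefixes of suites whose key exchange depends on the server's long-term key.
STATIC_PREFIXES = ("TLS_RSA_", "TLS_DH_", "TLS_ECDH_")


def key_exchange_class(suite_name):
    """Classify a cipher suite name as 'ephemeral', 'static', or 'unknown'."""
    if "_anon_" in suite_name:
        # Anonymous suites are unauthenticated and are not classified here.
        return "unknown"
    if suite_name.startswith(EPHEMERAL_PREFIXES):
        return "ephemeral"
    if suite_name.startswith(STATIC_PREFIXES):
        return "static"
    # TLS 1.3 suite names (e.g., TLS_AES_128_GCM_SHA256) do not encode the
    # key exchange method, so they fall through to this branch.
    return "unknown"


for name in (
    "TLS_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_AES_128_GCM_SHA256",
):
    print(name, "->", key_exchange_class(name))
```

Suites classified as "static" are the ones a passive decryption device can decrypt with a copy of the server's private key, and they are also the ones that do not provide forward secrecy.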

An organization that has a policy to perform inspection of TLS traffic so it can monitor and detect malicious activity has several methods it can use to gain visibility into encrypted communications. Some examples are listed below and are illustrated in Figure 3-2:

  • placing a threat detection system that acts as a reverse proxy in front of servers

  • installing end point software on each server to monitor communications

  • passively decrypting communications

Figure 3‑2 Methods for Gaining Visibility into Encrypted Communications

Figure depicts proxy, end point software, and passive decryption methods for gaining visibility into encrypted communications.

The use of threat detection proxies is ideal at the perimeters of organizations for monitoring inbound internet communications for attacks. The threat detection proxy is connected in-line, requiring all inbound traffic to pass through it before moving on to the next device. The threat detection proxy terminates the TLS connection. It decrypts and examines incoming traffic. If the traffic is determined to be malicious, the proxy drops it. Because the threat detection proxy is terminating all TLS connections, it must have a certificate for each server to which clients are attempting to connect. After the threat detection proxy decrypts and examines the traffic, it can establish a TLS session with the appropriate server behind it and send the traffic to that server in an encrypted TLS session.

While a threat detection proxy is ideal for use at the perimeter of an organization, many organizations also want to inspect their internal TLS traffic. Many enterprise applications include multiple tiers of servers and services (e.g., load balancers, web servers, application servers, databases, identity services) that communicate with each other internally via encrypted TLS sessions, making it impractical to place threat detection proxies between all systems on internal networks.

End point software can be installed on each server to monitor communications, alleviating the need to install proxies, but may impose additional processing requirements on servers that are already under a high load. In addition, because of the diversity of TLS server systems, it may be difficult to find an end point solution that operates on all platforms and provides comprehensive and consistent visibility and monitoring of all communications.

Passive, out-of-band decryption and threat analysis are performed by using devices that decrypt TLS‑encrypted communications but that do not terminate TLS connections. The TLS connection is established between the client and the server. The passive decryption device listens to the TLS traffic without affecting it and decrypts it. Threat analysis is performed either by the passive decryption device or via other systems to which decrypted traffic is forwarded. Security-focused passive decryption devices can detect malicious traffic that has been sent on TLS connections, but these devices do not react in real time to block this traffic. Passive decryption does not require a change in network architecture or loading additional software on TLS servers. However, passive decryption poses a TLS server certificate management challenge, because private keys must be copied to decryption devices from each TLS server whose communications will be monitored. The transfer of private keys must be done securely to avoid a key compromise and rapidly to avoid blind spots in monitoring for attacks. Automation can significantly aid in securely transferring private keys from TLS servers to the decryption device and keeping keys up-to-date when certificates are replaced.
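The monitoring blind spots mentioned above arise when a server presents a certificate whose private key has not yet reached the decryption device. A minimal Python sketch of that comparison follows. It assumes that certificate fingerprints have already been collected from the servers and from the decryption device; the function names, host labels, and fingerprint strings are hypothetical placeholders.

```python
def monitoring_blind_spots(server_fingerprints, device_fingerprints):
    """Return servers whose current certificate has no matching key on the device.

    server_fingerprints: dict of host label -> fingerprint currently served.
    device_fingerprints: set of fingerprints whose keys the device holds.
    """
    return sorted(
        host
        for host, fingerprint in server_fingerprints.items()
        if fingerprint not in device_fingerprints
    )


def stale_device_keys(server_fingerprints, device_fingerprints):
    """Return fingerprints held on the device that no monitored server presents."""
    in_use = set(server_fingerprints.values())
    return sorted(device_fingerprints - in_use)


# Placeholder fingerprints: app-2 was renewed, but its new key was not transferred.
servers = {"app-1": "fp-aaa", "app-2": "fp-ccc", "db-1": "fp-ddd"}
device = {"fp-aaa", "fp-bbb", "fp-ddd"}
print("blind spots:", monitoring_blind_spots(servers, device))
print("stale keys:", stale_device_keys(servers, device))
```

Running a comparison like this whenever certificates are replaced shows which servers are not being monitored and which keys on the decryption device are candidates for removal.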

4 Organizational Challenges

Despite the mission-critical nature of TLS server certificates, many organizations do not have clear policies, processes, and roles and responsibilities defined to ensure effective certificate management. Moreover, many organizations do not leverage available technology and automation to effectively manage the large and growing number of TLS server certificates. As a result, many organizations continue to experience significant incidents related to TLS server certificates.

As illustrated by Figure 4-1, the management of TLS server certificates is challenging due to the broad distribution of certificates across enterprise environments and groups, the complex processes needed to manage certificates, the multiple roles involved in certificate management and issuance, and the speed at which new TLS servers are being deployed. TLS server certificates are typically issued by a Certificate Services team (often called the public key infrastructure team). However, the certificates are commonly installed and managed by the certificate owners — the groups and the system administrators responsible for individual web servers, application servers, network appliances, and other devices for which certificates are used.

Figure 4‑1 TLS Certificates Are Distributed Broadly Across Enterprise Environments and Groups

Figure depicting a variety of servers distributed throughout an enterprise and organized into groups that represent various lines of business; each server has its own TLS certificate

4.1 Certificate Owners

The term “certificate owner” is used to denote a group responsible for systems where certificates are deployed. Typically, there are several roles within a certificate owner group, including executives who have ultimate accountability for ensuring that certificate-related responsibilities are addressed, system administrators who are responsible for managing individual systems and the certificates on them, and application owners who can review and approve certificate requests from system administrators to ensure that only authorized certificates are issued. The certificate owners typically are not knowledgeable about the risks associated with certificates or the best practices for effectively managing certificates.

With the advent of virtualization, the development and operations (DevOps) teams provision systems and software through programmatic means. This introduces a new type of certificate owner and new TLS server certificate challenges for organizations. As organizations push for more rapid and efficient deployment of business applications, many DevOps teams deploy certificates without coordination with the Certificate Services team. This can result in certificates for mission-critical applications not being tracked. This can be particularly problematic if bugs in DevOps programs/scripts cause certificates to be improperly deployed or updated. In addition, as DevOps teams adopt newer frameworks and tools, it is important to continue to monitor certificates and applications deployed and maintained by older DevOps frameworks and tools.

4.2 Certificate Services Team

The Certificate Services team is typically the group that has been given responsibility for managing relationships with public CAs and for the internal CAs. The Certificate Services team typically comprises one to three people. Though the team members have good knowledge and expertise about TLS server certificates, they do not have the resources or access required to directly manage certificates on the extensive number of systems where certificates are deployed. However, the Certificate Services team is often blamed when TLS certificate incidents, such as outages, occur.

6 Implementing a Successful Program

The broad distribution of TLS server certificates across distinct groups, networks, and systems can present unique challenges in implementing an effective certificate management program across an enterprise environment. The following resources are helpful for successful implementation:

  • Executive owner: It is essential to have an executive owner for the certificate management program. This executive owner should be prepared to educate the executives of each group of certificate owners on TLS server certificate risks and the executives’ responsibilities.

  • Prioritization of risks: Each organization has different challenges and priorities related to TLS server certificates. Although the best practices detailed in this practice guide are intended to help address all the risks related to TLS server certificates, it is helpful to prioritize those risks based on historical certificate issues and business needs. This prioritization can help in communications with certificate owners and with setting objectives and prioritizing tasks.

  • Objectives: Establishing clear and achievable objectives provides targets, helps focus efforts, and improves the likelihood of successful implementation. For example, if an organization finds it does not have an inventory and recognizes there are two groups that may be difficult to inventory in the near term, then one objective may be to create an inventory of all other groups’ TLS server certificates in the next 12 months.

  • Action plan: An action plan with specific tasks, responsibilities, and milestones, geared to achieve the objectives, should be created, communicated, and reviewed by all stakeholders (e.g., certificate owners, Certificate Services team, executive owner). The action plan should be prioritized to address the most important objectives first. For example, an action plan might include the following objectives:

    • 30 days from the start of the project:

      • complete certificate imports from CA1, CA2, and CA3

      • require certificate enrollment through the central Certificate Service portal and prevent enrollment directly to CAs

    • 90 days from the start of the project:

      • complete network discovery across all North American and European data centers

      • complete the assignment of certificate owners for all certificates in inventory

    • 180 days from the start of the project:

      • automate certificate enrollment and installation on all load balancers

      • automate certificate enrollment and installation for all e-commerce web servers

      • complete network discovery across all Asia-Pacific data centers

  • Regular executive reviews: The objectives and action plan should be reviewed with the executive owner at commencement of the project, and regular reviews should be scheduled (e.g., every 90 days) to track progress. During these reviews, the executive owner should note areas where additional action by certificate owners is needed so the executive owner can proactively communicate with peer executives to ensure action is taken.

  • Periodic audits: Due to the critical role that TLS server certificates play in the security and operations of organizations, and the risks resulting from improper management, regular audits should confirm the Certificate Services team and certificate owners are fulfilling their responsibilities in TLS server certificate management.

Security testing should be defined as part of the organization’s policies. Before going live with any recommendations in this document, authorization from the security team should be provided, as specified by security policy.