GDPR Compliance in Elasticsearch

Managing data in Elasticsearch in compliance with European regulation GDPR

May 30, 2020

Achieving GDPR Compliance with ELK Stack

 
The acronym ELK stands for three open-source projects (Elasticsearch, Logstash, and Kibana) and is a common term among software developers. Elasticsearch is a search and analytics engine. Logstash is a tool used to manage events and logs. It collects data from multiple sources and transforms it before sending it to your preferred “stash.” Kibana lets users visualize and analyze Elasticsearch data with charts and graphs.
The EU’s General Data Protection Regulation (also known as GDPR) is a law that was put into place to protect the privacy of individuals in the EU and the European Economic Area (EEA). Its primary objective is to give individuals control over their personal data, while also simplifying the regulatory environment for international businesses by unifying regulations within the EU. The GDPR defines personal data as any information that is related to an identified or identifiable natural person.
In this article, we'll cover how you can ensure that your ELK Stack, recently rebranded as Elastic Stack, is GDPR compliant in three stages: Prepare, Protect, and Privacy Processes.

Using the Elastic Stack in the Prepare Stage

In this stage, an organization must identify all data flows where personal data is controlled or processed. Elastic Stack can be used to assist in GDPR readiness in the following ways:

Data Flow Mapping

Mapping data flow is usually the first step in GDPR preparation and is essential if an organization’s GDPR initiative is to be effective. Information about data flow should be indexed into Elasticsearch, which is fast and has full-text search capabilities for quick identification of queries, reports, tables, and applications that depend on personal data.
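
As a rough illustration of the idea above, the sketch below shows what a data flow record and a full-text search body might look like. The index name `data-flows` and every field name are our own assumptions for the example, not a prescribed schema.

```python
import json

# Hypothetical data-flow record; every field name here is an illustrative assumption.
data_flow = {
    "system": "crm-export-job",
    "owner": "sales-ops",
    "description": "Nightly export of customer email and phone number to the billing database",
    "personal_data_categories": ["email", "phone"],
    "destination": "billing-db",
}


def personal_data_query(term: str) -> dict:
    """Build a full-text search body matching `term` in the free-text fields."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["description", "personal_data_categories"],
            }
        }
    }


# The body would be sent as: POST /data-flows/_search (index name is an assumption)
print(json.dumps(personal_data_query("email"), indent=2))
```

Records like `data_flow` would be indexed once per flow, and the query body could then be reused to list every flow that mentions a given category of personal data.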

Personal Data Retention Planning

In the Prepare stage, an organization also needs to decide how long it will store collected personal data. GDPR does not prescribe fixed retention periods; instead, it requires that personal data be kept no longer than necessary for the purposes for which it was collected. GDPR-compliant organizations are therefore required to delete personal data that is no longer needed, or when a data subject (person) withdraws consent.
Elasticsearch’s index management makes data retention straightforward. With time-based indices, an entire index can be deleted when its retention period runs out. This operation can also be automated via cron, Ansible, Chef, or Puppet.
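
A minimal sketch of the selection logic such a scheduled job might use is shown below. It assumes daily indices named like `logs-2020.05.30`; the prefix, the naming pattern, and the 30-day period are assumptions for illustration only.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days, prefix="logs-"):
    """Return the names of daily indices older than the retention period.

    Assumes names look like 'logs-2020.05.30'; other names are ignored.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logs-2020.03.01", "logs-2020.05.29", "kibana-settings"]
print(expired_indices(names, today=date(2020, 5, 30), retention_days=30))
```

Each returned name could then be removed with the Delete Index API (`DELETE /<index>`). Deleting a whole index is generally cheaper than deleting individual documents, which is the main reason to organize personal data into time-based indices.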

Vendor and Sub-Processor Review

Today, supply chains in many domains can comprise thousands of vendors and sub-processors. GDPR requires a data processing agreement with each supply chain partner that accesses or processes an organization’s personal data. The data from these agreements can be stored and indexed in Elasticsearch so that rapid full-text searches can be performed across what may be thousands of agreements, facilitating real-time vendor status reports.

Using the Elastic Stack in the Protect Stage

In this stage, Elastic Stack can help with GDPR readiness in various ways. These include utilizing security measures to protect the Elastic Stack when used as the main data store for personal data and acting as a security and analytics platform if personal data is stored in a different data store.
The entire Elastic Stack should be treated as a personal data store, since log data can also contain personal data. With the help of extensions like ReadonlyREST, the Elastic Stack can be configured to meet GDPR data security requirements through several approaches, including:

Access Control

Authentication measures are put in place to prevent unauthorized access to personal data that is stored in an Elasticsearch cluster. This simply means that a user’s identity is verified to ensure they are who they claim to be. Elasticsearch’s security measures include an authentication mechanism that enables the password protection of a particular cluster. These security measures can also integrate with external authentication mechanisms like Active Directory, LDAP, or PKI, and can work with those mechanisms to provide user authentication. These authentication connectors are not part of the free edition of Elastic Stack; ReadonlyREST offers them at no charge.
Another security feature of the ReadonlyREST plugin for Elasticsearch is IP-based filtering. This makes it easy to whitelist or blacklist specified IP addresses and to control network-level access to the Elasticsearch cluster.
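
To make the idea of IP-based filtering concrete, here is a small conceptual sketch using Python's standard `ipaddress` module. It is not how ReadonlyREST is implemented or configured; the address ranges and the "deny wins over allow" precedence are assumptions chosen for the example.

```python
import ipaddress

# Hypothetical allow and deny lists, written as CIDR ranges.
ALLOWED = [ipaddress.ip_network("10.0.0.0/8")]
DENIED = [ipaddress.ip_network("10.0.5.0/24")]


def is_permitted(client_ip: str) -> bool:
    """Deny wins over allow; anything not explicitly allowed is rejected."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in DENIED):
        return False
    return any(addr in net for net in ALLOWED)
```

Rejecting everything that is not explicitly allowed is the safer default for a cluster that holds personal data.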

Logging and Auditing

The Elastic Stack helps organizations maintain audit trails by auditing security events. ReadonlyREST for Elasticsearch can produce an audit log, which makes it easy to see who is accessing a cluster and what that person is doing. Access patterns and failed attempts to access a cluster can be analyzed to gain insights about attempted attacks and possible data breaches.
Elastic Stack can also be used as a security analytics solution if another primary store is utilized to store personal data. In this case, Elasticsearch is used as a centralized logging platform to manage security-related logs for the whole organization’s infrastructure and application base.
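
The kind of analysis described in this section can be sketched in a few lines. The records below are made up, and the field names `origin` and `final_state` are assumptions that should be checked against the mapping of your own audit index.

```python
from collections import Counter

# Hypothetical audit records; the field names "origin" and "final_state"
# are assumptions and should be checked against your audit index mapping.
events = [
    {"origin": "203.0.113.9", "user": "alice", "final_state": "FORBIDDEN"},
    {"origin": "203.0.113.9", "user": "root", "final_state": "FORBIDDEN"},
    {"origin": "10.0.0.4", "user": "bob", "final_state": "ALLOWED"},
    {"origin": "203.0.113.9", "user": "admin", "final_state": "FORBIDDEN"},
]


def suspicious_origins(records, threshold):
    """Return origins with at least `threshold` rejected requests."""
    failures = Counter(r["origin"] for r in records if r["final_state"] == "FORBIDDEN")
    return {origin: n for origin, n in failures.items() if n >= threshold}


print(suspicious_origins(events, threshold=3))
```

In practice the same count would more likely be computed inside Elasticsearch with an aggregation over the audit indices; the Python version only shows the logic.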

Monitoring and Threat Detection

Monitoring is a free feature of the Elastic Stack. It supports basic data security tasks such as overseeing data store health and log continuity (i.e., ensuring that the flow of logs from data stores and other infrastructure is uninterrupted), and it helps detect malicious or suspicious activity within the environment. With Elastic Stack monitoring, administrators can closely track the health of the Elasticsearch cluster, automate the monitoring of log continuity, and send alert notifications when an interruption or failure occurs.
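
Log continuity checking boils down to looking for unusually long silences in a stream of event timestamps. The following sketch shows that logic in isolation; the five-minute threshold and the sample timestamps are assumptions for the example.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive events are further apart than max_silence."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_silence
    ]


seen = [
    datetime(2020, 5, 30, 12, 0),
    datetime(2020, 5, 30, 12, 1),
    datetime(2020, 5, 30, 12, 20),
]
gaps = find_gaps(seen, max_silence=timedelta(minutes=5))
```

Here `gaps` holds the single interval between 12:01 and 12:20, which is the sort of interruption an alert would be raised for.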
Users who install ReadonlyREST as a security plugin can continue to use Elastic Stack’s monitoring capabilities as before. However, they will also benefit from ReadonlyREST’s advanced, highly customizable, and security-focused audit logging collector.

Resilience and Disaster Recovery

Elasticsearch is primarily designed to be a distributed data store and search engine. It is capable of handling extremely high event rates while automatically managing the distribution of indices and queries across the cluster for smooth operations. Its architecture includes replicas of index components (shards) to add resilience and failover. The Elastic Stack’s Snapshot and Restore functionality is built in to assist with backups.
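
As a small sketch of how a daily backup could be requested, the helper below builds the path and body for the Create Snapshot API. The repository name `gdpr_backups`, the index names, and the snapshot naming scheme are assumptions; the repository must already be registered in the cluster.

```python
from datetime import date


def snapshot_request(repository, indices, day):
    """Build the path and body for a Create Snapshot call."""
    path = f"/_snapshot/{repository}/snapshot-{day:%Y.%m.%d}"
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
    }
    return path, body


path, body = snapshot_request("gdpr_backups", ["customers", "logs-*"], date(2020, 5, 30))
```

The resulting path and body would be sent with an HTTP `PUT` by whichever client or scheduler you already use.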

Cryptography and Pseudonymization

Multiple levels of protection are put in place to ensure data is not lost, destroyed, or accessed by unauthorized users. Elasticsearch can be deployed on systems with enabled system-wide, disk-based encryption at rest. This reduces the chances of an unauthorized person accessing the underlying file system to read cleartext personal data.
Pseudonymization is the processing of personal data so that it can no longer be attributed to a specific data subject without the use of additional information. It is one of the measures GDPR names for securing personal data. Logstash’s fingerprint filter plugin offers a set of capabilities that can be used to implement pseudonymization.
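
One common way to pseudonymize a value is to replace it with a keyed hash (HMAC), so that the original cannot be recovered or recomputed without the key. The sketch below shows the technique in plain Python rather than in Logstash configuration; the field name and the placeholder key are assumptions.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of `value` as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration only; a real key belongs in a secret store.
SECRET_KEY = b"replace-with-a-secret-key"

event = {"user_email": "jane@example.com", "action": "login"}
event["user_email"] = pseudonymize(event["user_email"], SECRET_KEY)
```

Because the same input and key always produce the same digest, events from one data subject can still be correlated, while the key acts as the "additional information" that must be stored separately.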
ReadonlyREST supports the Protect stage with the following capabilities:
  • Encryption: You can encrypt data flows between Elasticsearch, Logstash, Beats, and Kibana. In addition, ReadonlyREST can serve the Elasticsearch API over HTTPS instead of plain HTTP.
  • Authentication: ReadonlyREST supports a wide array of authentication protocols, such as basic auth, JWT, LDAP, API Key, and proxy auth.
  • Authorization: Permissions to access indices and other actions may be assigned to either individual users or groups. Groups of users can be defined in the Users section of the ACL document itself, or pulled from external systems like LDAP, SAML, JWT claims, or custom JSON API services.
  • Access Control: You can define an ACL (access control list) in easy-to-understand YAML, using logic as simple or as complex as the task requires. Each ReadonlyREST ACL block contains rules that grant permissions to access resources (indices, actions, fields, and so on).
Notably, the “filter” rule enables document-level security, letting you filter the results of a READ request using a boolean query written in Elasticsearch Query DSL. The “fields” rule offers field-level security by allowing you to selectively omit specific fields from documents returned as a search response to certain users.
Once a ReadonlyREST ACL is defined, you can associate it with a user or a group.
  • Audit Logs: To keep tabs on the cluster activity, access events are written into a series of indices named by default readonlyrest_audit-YYYY-MM-DD. Access requests can be logged to file or index (or both).
Overall, our open-source Elasticsearch plugin has an efficient and simple architecture. It is easy to configure, which makes it less error-prone and, by extension, more secure. Additionally, it comes with free enterprise features, such as LDAP integration. Our plugin preceded X-Pack and is currently utilized by many banks and top tech companies, often in conjunction with our commercial Pro and Enterprise Kibana plugins that offer additional features such as secure login/logout and various levels of Kibana UX customization.
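
To illustrate what document-level and field-level security mean in practice, here is a conceptual sketch in Python. It is not ReadonlyREST's internal implementation or its configuration syntax; the function names, the sample query, and the field names are hypothetical.

```python
def apply_filter_rule(user_query: dict, acl_filter: dict) -> dict:
    """Wrap the caller's query so only documents matching acl_filter can be returned."""
    return {"bool": {"must": [user_query], "filter": [acl_filter]}}


def apply_fields_rule(document: dict, hidden_fields: set) -> dict:
    """Return a copy of the document without the hidden fields."""
    return {k: v for k, v in document.items() if k not in hidden_fields}


restricted = apply_filter_rule(
    {"match": {"notes": "invoice"}},
    {"term": {"department": "billing"}},  # hypothetical restriction
)
visible = apply_fields_rule(
    {"name": "Jane", "email": "jane@example.com", "department": "billing"},
    {"email"},
)
```

The first helper narrows which documents a user can see by combining their query with a mandatory filter in Query DSL; the second narrows what they can see inside each document by dropping the restricted fields.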

Using the Elastic Stack in the Privacy Processes Stage

Elasticsearch enables quick identification of a data subject’s personal data in tables, queries, reports, or applications. Features like “Delete by Query API” and “Update by Query API” let teams take appropriate action to satisfy GDPR’s requirements regarding upholding data subject rights.
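
As a sketch of how these two APIs might be used for erasure and rectification requests, the helpers below build the request path and body for each call. The index name and the `subject_id` field (assumed to be an exact-match keyword field) are assumptions for illustration.

```python
def erase_subject(index: str, subject_id: str):
    """Path and body for a Delete by Query call removing one subject's documents."""
    return f"/{index}/_delete_by_query", {"query": {"term": {"subject_id": subject_id}}}


def redact_field(index: str, subject_id: str, field: str):
    """Path and body for an Update by Query call that drops one field."""
    body = {
        "query": {"term": {"subject_id": subject_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }
    return f"/{index}/_update_by_query", body
```

Both requests are sent with an HTTP `POST`. Running the same `term` query as a plain search first is a reasonable way to confirm which documents will be affected before deleting or changing them.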

Conclusion

You can make your Elastic Stack GDPR compliant in three stages: Prepare, Protect, and Privacy Processes. Here is a summary of what to do in each of those stages:
  • Prepare: data flow mapping, personal data retention planning, and vendor and sub-processor review
  • Protect: data protection, access controls, logging and auditing, monitoring and threat detection, resilience and disaster recovery, and cryptography and pseudonymization
  • Privacy Processes: upholding data subject rights
As we discussed, Elasticsearch technology can be used to ensure that data management processes are suitable for long-term use. Using Elasticsearch as a data store for personal data can provide a strong starting point for organizations looking to build a data store that is GDPR compliant. Taking this a step further, enterprise tools like those we offer at ReadonlyREST can simplify GDPR compliance tasks, allowing you to stop worrying about compliance and focus instead on running your business.