Compliance is all about adhering to regulations established by governments and industries. The main focus of those regulations as they pertain to IT professionals can be summed up in one word: privacy.
Most of our compliance efforts are directed at protecting the privacy of personally identifiable data coming through our networks, stored on our systems, or processed by our organizations in the course of doing business.
In August, I wrote about how data privacy is the heart and soul of your IT compliance strategy, and the first step in protecting the privacy of data is accurate and efficient identification and classification of that data. This time, we’ll delve deeper into that aspect and why and how you should implement a good data ID/classification system.
Laws and industry regulations that mandate privacy protection generally lay out specific types of data that must be protected. Some regulations define different or stricter protections for some types of data than for others. As an example, we’ll look at the EU’s General Data Protection Regulation (GDPR), since it’s one of the broadest of all compliance laws: it can apply to organizations of any size, in any industry, anywhere in the world, if they process the personal data of people in the EU.
The GDPR devotes Article 9 to the processing of special categories of personal data. While the regulation sets many requirements for processing any personally identifiable data, this article specifically prohibits processing the following types of data except in the specific circumstances it lists. These data types include information pertaining to:
- racial or ethnic origin
- political opinions
- religious or philosophical beliefs
- trade union membership
- genetic data
- biometric data for the purpose of uniquely identifying a natural person
- data concerning health
- data concerning a natural person’s sex life or sexual orientation
You can find out more about the processing of these special categories in the full text of Article 9 of the GDPR. Another, separate category of personal data that must be treated differently under the GDPR is that of personal data relating to criminal convictions and offences. This is addressed in Article 10.
This means you not only need to be able to identify which data on your network is personal data, but you also must be able to classify it to identify these subclasses so you can apply the proper controls to them.
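As a minimal sketch of what that subclassing might look like in practice, the Python below maps descriptive tags on a record to the GDPR groupings discussed above. The tag names and the `gdpr_classes` function are hypothetical and chosen only for illustration; the sketch also assumes the record has already been identified as personal data.

```python
from enum import Enum


class GdprClass(Enum):
    PERSONAL = "personal data"
    SPECIAL_CATEGORY = "special category (Article 9)"
    CRIMINAL = "criminal convictions and offences (Article 10)"


# Hypothetical tag vocabulary mirroring the Article 9 list above.
SPECIAL_CATEGORY_TAGS = {
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "genetic",
    "biometric_identification",
    "health",
    "sex_life_or_sexual_orientation",
}
# Hypothetical tag for the separate Article 10 category.
CRIMINAL_TAGS = {"criminal_convictions_and_offences"}


def gdpr_classes(tags):
    """Return the GDPR subclasses for a record already known to be personal data."""
    tags = set(tags)
    classes = {GdprClass.PERSONAL}
    if tags & SPECIAL_CATEGORY_TAGS:
        classes.add(GdprClass.SPECIAL_CATEGORY)
    if tags & CRIMINAL_TAGS:
        classes.add(GdprClass.CRIMINAL)
    return classes
```

A record tagged with both a name and health information would come back as personal data and as special category data, which tells you the stricter Article 9 controls apply to it.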
Setting up an efficient process for identifying and classifying your data can be a complex undertaking, but it’s made easier if you follow a standard protocol:
Define objectives. The first step is to define and prioritize the objectives of your data classification strategy, and document them in a written policy. While our focus in this article is data classification for the purpose of applying mandated privacy protections and responding to data subject requests, secondary objectives may be to make data easier to search and find for other purposes, such as user access or e-discovery, or to make better use of storage space.
Inventory your data. Once you’ve determined your objectives and priorities, the next step is to determine what types of data you have and where it’s located, and what security measures are currently in place to protect it. For the purpose of privacy compliance, this includes backups and copies of the data as well as the primary data sources. There are data discovery solutions that can automate this process.
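To give a rough idea of what an automated inventory does at its simplest, here is a sketch in Python that walks a file tree and records each file's path, extension, and size. The `inventory` and `summarize` functions are hypothetical names for illustration. The sketch covers only a local or mounted file system; databases, mailboxes, cloud storage, and backup media would each need their own connectors, which is where commercial data discovery solutions do the heavy lifting.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """List every regular file under root with its extension and size."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            records.append({
                "path": str(path),
                "extension": path.suffix.lower(),
                "size_bytes": path.stat().st_size,
            })
    return records


def summarize(records):
    """Count files per extension to show which data types are present."""
    return Counter(record["extension"] for record in records)
```

Running `summarize(inventory(some_folder))` against a share gives a quick picture of what kinds of files live there, and the full record list shows where each one is located.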
Determine your data categories. What are the categories into which you’ll place data? For compliance purposes, you’ll probably want to create categories based on the sensitivity level of the data, e.g. personal data, sensitive personal data, etc. You can also classify based on impact levels (would the impact of a breach of this data be high, moderate, or low?).
Some organizations that deal with many different types of data will want to further classify data into additional categories such as attorney/client privilege, health/medical, financial, law enforcement/criminal information, proprietary intellectual property, student education information, and so forth. There are various regulations that apply individually to some of these different categories.
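One way to picture a category scheme is as a small lookup table that pairs each category with a description and an impact level. The Python sketch below is only an illustration: the three categories, their descriptions, and the `strictest` helper are assumptions, and your own policy would define its own list.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class Category:
    name: str
    description: str
    impact: Impact


# Hypothetical scheme; a real policy would list its own categories.
CATEGORIES = {
    "public": Category(
        "public", "Approved for release outside the organization", Impact.LOW),
    "personal": Category(
        "personal", "Identifies or relates to a natural person", Impact.MODERATE),
    "sensitive_personal": Category(
        "sensitive_personal", "Special categories such as health data", Impact.HIGH),
}


def strictest(category_names):
    """Return the category with the highest impact among those that apply."""
    return max((CATEGORIES[name] for name in category_names),
               key=lambda category: category.impact)
```

When a file falls into more than one category, `strictest` picks the one with the highest impact, so the file is protected at the most demanding level that applies to it.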
Consider the storage architecture. Since different data categories may have different levels of security, you need to think about how to design a tiered storage architecture for efficiently applying those security measures. Is some or all of your data stored in the cloud? Appropriate controls can and should be applied regardless of the physical location of the files.
Organize and label data. Data classification and labeling can be done manually by the employees who create or own the data, by automation tools that classify data based on its content or context (location, application, how and by whom it’s used, etc.), or by a combination of the two.
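The following Python sketch shows the basic idea behind automated labeling, using one content rule for email addresses, one content rule for a few health-related terms, and one context rule based on where the file lives. The patterns, the `/hr/` folder convention, and the label names are all assumptions made for illustration; real classification tools rely on far richer rules and still benefit from human review.

```python
import re

# Simplified pattern for email addresses (content rule).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A few illustrative health-related terms (content rule).
HEALTH_TERMS = re.compile(r"\b(diagnosis|prescription|blood type)\b", re.IGNORECASE)


def classify(text, location=""):
    """Suggest labels for a piece of text from its content and its location."""
    labels = set()
    if EMAIL.search(text):
        labels.add("personal")
    if HEALTH_TERMS.search(text):
        labels.add("sensitive_personal")
    # Context rule: assume anything under a hypothetical HR share is personal.
    if "/hr/" in location.lower():
        labels.add("personal")
    return labels or {"unclassified"}
```

Anything that comes back as `unclassified` is a candidate for manual labeling by the data owner rather than proof that the content is harmless.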
Determine who will have access. Next you need to figure out which users will have access to which data categories. This should be included in your data protection policy. Role-based access controls can help you enforce these policies based on who actually needs access in order to do their jobs.
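At its core, a role-based check is a mapping from roles to the data categories each role is cleared for. The Python sketch below illustrates that idea with hypothetical role and label names; in production you would enforce this through your directory service or access management platform rather than application code like this.

```python
# Hypothetical roles and the data labels each one is cleared to access.
ROLE_ACCESS = {
    "hr_manager": {"public", "personal", "sensitive_personal"},
    "support_agent": {"public", "personal"},
    "contractor": {"public"},
}


def can_access(role, labels):
    """Allow access only if the role is cleared for every label on the data."""
    allowed = ROLE_ACCESS.get(role, set())
    labels = set(labels)
    # Unknown roles and unlabeled data are denied by default.
    return bool(labels) and labels <= allowed
```

Denying unknown roles and unlabeled data by default is a deliberate choice in this sketch: a file that has not been classified yet is treated as off limits until someone labels it.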
Develop training. Data classification policies should be understood by all users who will work with the data. Employees, independent contractors, and others who will create or access data should be required to complete formal training that covers their responsibilities for labeling user-created data, how to determine which classification a piece of data belongs in, data sharing policies, and the consequences of compromising personal data or sensitive personal data.
Evaluate and fine-tune. When you have all your policies and controls in place, it’s time for a thorough evaluation of their effectiveness. You can then identify weak areas and tweak the plan to fill any gaps.
One of the most common mistakes organizations make in implementing a data classification program is failing to consider all of the data processed and stored on their networks. Don’t forget data in email, log files, graphics and video files, chat messages, PDFs, on the hard drives of individual client systems, and all sorts of unstructured data (data that isn’t stored in traditional relational databases).
Structured data is easily understood by computers, and automated classification systems work well with it. Humans communicate via unstructured data and are better able to understand and accurately classify it. Sensitive information can take either form, but unstructured data is more likely to fly under the radar because it isn’t automatically identified, labeled, and secured.
Another common mistake is not knowing where your sensitive data resides. You can’t properly protect it if you don’t know where it is, and in today’s hybrid datacenter/cloud environment, it’s easy to lose track of where all of the data is located.
Beware of making your data classification system overly complex. Too many classification categories with only subtle differences between them will confuse data owners and users and lead to mistakes in classification. Detailed descriptions of each category should be part of your written policy so there are guidelines to ensure consistency and nobody has to “wing it.”
Finally, don’t attempt to do it all, all at once. If you are just now implementing a data classification system, there’s no need to approach it in a monolithic fashion. Not all data needs to be classified immediately. Set priorities and focus on protecting the important information. For compliance purposes, that’s usually personally identifiable data of various types.
Take it one step at a time. A good data classification program will make it easier for you to comply with regulatory requirements and pass those audits with flying colors.