Further advice on defining a taxonomy
Published 07 March 2019
Who else is involved?
Different groups in the company may have an interest in Data Catalog, each with different needs. Typically, the following groups are involved: the information security team, database administrators, application developers, the CTO, the CSO, and non-technical people interested in what data is stored within the organization and how it is protected.
What do you ultimately want to achieve?
To share information across functions
To support and evidence technical policy (whether formally defined or more pragmatic), such as which columns to mask in dev/test copies or which databases to protect with TDE
To guide remediation work; where are the priorities for security access reviews?
How do people in your organization want to view the SQL data estate?
What questions do you think they’ll ask? Some may want to know: ‘where is the data behind system x or application y?’, ‘which systems is this data exposed in?’, ‘who looks after this database?’, ‘which data is externally accessible?’, or ‘where did this come from?’
We have found that defining a custom taxonomy is a key step for most companies looking to achieve data classification.
As a guide, the most common additions to the taxonomy we have seen are:
- IT Ops
- Web Team
Treatment (also called Intended Protection or Protection Measures; sometimes split into the current and the intended or recommended protection measure), e.g.
- Static data masking
- Encryption (TDE)
- Encryption (Always Encrypted)
- Row-level security
- Dynamic data masking
Systems used (normally a multi-value tag category). Which applications or services make use of this data, e.g.
- Settlement Services
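To make the difference between single-value and multi-value tag categories concrete, here is a minimal sketch in Python. It does not use the Data Catalog API; the `TagCategory` class, its `validate` method, and the "CRM" tag are hypothetical and only illustrate how a custom taxonomy might be modelled and checked before tags are applied.

```python
from dataclasses import dataclass


@dataclass
class TagCategory:
    """One custom taxonomy category and the tags it allows (hypothetical model)."""
    name: str
    tags: set[str]
    multi_valued: bool = False

    def validate(self, chosen: list[str]) -> None:
        """Raise ValueError if the chosen tags do not fit this category."""
        unknown = [t for t in chosen if t not in self.tags]
        if unknown:
            raise ValueError(f"{self.name}: unknown tag(s) {unknown}")
        if len(chosen) > 1 and not self.multi_valued:
            raise ValueError(f"{self.name}: only one tag allowed")


# Tag names for Treatment come from the list above.
treatment = TagCategory(
    "Treatment",
    {
        "Static data masking",
        "Encryption (TDE)",
        "Encryption (Always Encrypted)",
        "Row-level security",
        "Dynamic data masking",
    },
)

# "Systems used" is normally multi-value; "CRM" is an illustrative extra tag.
systems_used = TagCategory(
    "Systems used", {"Settlement Services", "CRM"}, multi_valued=True
)

# A column can be used by several systems...
systems_used.validate(["Settlement Services", "CRM"])
# ...but in this sketch it gets exactly one Treatment.
treatment.validate(["Encryption (TDE)"])
```

Whether Treatment should itself be multi-value (for example, to record both current and intended protection) is a design choice for your own taxonomy; the sketch treats it as single-value only for illustration.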
Pick an approach
There is a split between those who opt for completeness (specifying a classification label of some sort for every column in a given database or schema, even if it’s just ‘Out of Scope’) and those who prioritize seeking out the highest-risk columns.
You can start by reviewing the suggestions.
Then you can start using the API from PowerShell; there are also examples in the #automation Slack channel. From there you can apply simple rules, such as:
De-scope empty tables
De-scope key columns (unless you’re using ‘natural’ keys, such as SSN)
Programmatically set a default across a whole database (e.g. Owner = ‘Marketing’ for a CRM system)
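The rules above can be sketched as plain logic over column metadata. The Python below is a minimal illustration only: it does not call the Data Catalog API, and the record fields (`table_row_count`, `is_key`), the `Scope` category name, and the `apply_simple_rules` function are all assumptions made for the example. In practice the metadata would come from the API and the resulting tags would be written back through it.

```python
def apply_simple_rules(columns, default_owner=None, natural_keys=frozenset()):
    """Return copies of the column records with simple rule-based tags applied.

    columns: list of dicts with hypothetical fields
             'name', 'is_key', 'table_row_count' and optional 'tags'.
    """
    result = []
    for col in columns:
        tags = dict(col.get("tags", {}))
        if col["table_row_count"] == 0:
            # Rule 1: de-scope columns in empty tables.
            tags["Scope"] = "Out of Scope"
        elif col["is_key"] and col["name"] not in natural_keys:
            # Rule 2: de-scope key columns, except natural keys such as SSN.
            tags["Scope"] = "Out of Scope"
        if default_owner is not None:
            # Rule 3: set a database-wide default without overwriting
            # an owner that has already been assigned.
            tags.setdefault("Owner", default_owner)
        result.append({**col, "tags": tags})
    return result


# Illustrative sample data, not real catalog output.
sample_columns = [
    {"table": "Customer", "name": "CustomerId", "is_key": True, "table_row_count": 10},
    {"table": "Customer", "name": "SSN", "is_key": True, "table_row_count": 10},
    {"table": "Audit", "name": "Note", "is_key": False, "table_row_count": 0},
    {"table": "Customer", "name": "Email", "is_key": False, "table_row_count": 10,
     "tags": {"Owner": "Sales"}},
]

classified = apply_simple_rules(
    sample_columns, default_owner="Marketing", natural_keys={"SSN"}
)
```

Here the surrogate key `CustomerId` and the column in the empty `Audit` table are de-scoped, while `SSN` stays in scope because it is listed as a natural key. Using `setdefault` for the owner means a bulk default never overrides a value someone has set deliberately.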