SQL Data Catalog

Further advice on defining a taxonomy

Who else is involved?

Different groups in the company may have an interest in Data Catalog, and their needs may differ. Typically, we find that the following groups are involved: the information security team, database administrators, application developers, the CTO, the CSO, and non-technical people interested in what data is stored within the organization and how it is protected.

What do you ultimately want to achieve?

  1. To share information across functions

  2. To support and evidence technical policy (whether formally defined or more pragmatic), such as which columns to mask in dev/test copies or which databases to protect with TDE

  3. To guide remediation work, e.g. where are the priorities for security access reviews?

How do people in your organization want to view the SQL data estate?

What questions do you think they’ll ask? Will some want to know ‘where is the data behind system x or application y?’, ‘what systems is this data exposed in?’, ‘who looks after this database?’, ‘which data is externally accessible?’ or ‘where did this come from?’


Custom taxonomy

We have found that defining a custom taxonomy is a key step for most companies looking to achieve data classification.

As a guide, the most common additions to the taxonomy we have seen are:

  1. Ownership, e.g.

    • HR

    • IT Ops

    • Web Team

  2. Treatment (also called Intended Protection or Protection Measures; sometimes split into the current and the intended or recommended protection measure), e.g.

    • Static data masking

    • Encryption (TDE)

    • Encryption (Always Encrypted)

    • Row-level security

    • Dynamic data masking

  3. Systems used (normally a multi-value tag category): which applications or services make use of this data, e.g.

    • Procurement

    • TradeOrders

    • WebSales

    • Settlement Services
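
To make the difference between single-value and multi-value tag categories concrete, the example taxonomy above can be sketched as a small data structure. This is only an illustration in plain Python, not the SQL Data Catalog data model or API: the `TagCategory` class, the `TAXONOMY` dictionary and the `classify` function are hypothetical names, and treating Ownership and Treatment as single-value categories is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class TagCategory:
    """One category in a custom taxonomy (hypothetical structure)."""
    name: str
    tags: Set[str]
    multi_value: bool = False  # True if a column may carry several tags

    def validate(self, chosen: Set[str]) -> None:
        """Reject tags outside the category, or too many tags."""
        unknown = chosen - self.tags
        if unknown:
            raise ValueError(f"{self.name}: unknown tags {sorted(unknown)}")
        if not self.multi_value and len(chosen) > 1:
            raise ValueError(f"{self.name}: only one tag is allowed")


# Example values taken from the lists above; single-value vs multi-value
# for Ownership and Treatment is an assumption made for this sketch.
TAXONOMY: Dict[str, TagCategory] = {
    "Ownership": TagCategory("Ownership", {"HR", "IT Ops", "Web Team"}),
    "Treatment": TagCategory("Treatment", {
        "Static data masking",
        "Encryption (TDE)",
        "Encryption (Always Encrypted)",
        "Row-level security",
        "Dynamic data masking",
    }),
    "Systems used": TagCategory("Systems used", {
        "Procurement", "TradeOrders", "WebSales", "Settlement Services",
    }, multi_value=True),
}


def classify(assignments: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Check a column's proposed tags against the taxonomy and return them."""
    for category, chosen in assignments.items():
        if category not in TAXONOMY:
            raise KeyError(f"unknown category: {category}")
        TAXONOMY[category].validate(chosen)
    return assignments


# A column owned by HR whose data is used by two systems.
column_tags = classify({
    "Ownership": {"HR"},
    "Systems used": {"Procurement", "WebSales"},
})
```

Validating assignments in one place like this makes it easier to spot a tag that has drifted from the agreed taxonomy before it is shared across teams.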


Pick an approach

There is a split between those who opt for completeness, specifying a classification label of some sort for every column in a given database or schema (even if it’s just ‘Out of Scope’), and those who prioritize seeking out the highest-risk columns.

You can start by reviewing suggestions.

Then, you can start using the API, typically consumed with PowerShell; there are also examples in the #automation Slack channel. With the API, you can start to apply simple rules, such as:

  1. De-scope empty tables

  2. De-scope key columns (unless you’re using ‘natural’ keys, such as SSN)

  3. Programmatically set a default across a whole database (e.g. Owner = ‘Marketing’ for CRM system)
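
The logic of the three rules above can be sketched as follows. This is a minimal illustration in Python rather than PowerShell, and it works on in-memory records instead of calling the SQL Data Catalog API. The `Column` class, its fields, the `NATURAL_KEYS` set and the `apply_rules` function are all hypothetical; in practice the column metadata would come from the catalog and the resulting tags would be written back through the API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Key column names that carry real-world meaning and so stay in scope.
# Hypothetical list; adjust it to your own schemas.
NATURAL_KEYS = {"SSN"}


@dataclass
class Column:
    """Minimal stand-in for a catalogued column (hypothetical shape)."""
    database: str
    table: str
    name: str
    table_row_count: int
    is_key: bool = False
    tags: Dict[str, str] = field(default_factory=dict)


def apply_rules(columns: Iterable[Column],
                default_owners: Dict[str, str]) -> List[Column]:
    """Apply the three simple rules in place and return the columns."""
    result = list(columns)
    for col in result:
        if col.table_row_count == 0:
            # Rule 1: de-scope columns in empty tables.
            col.tags["Scope"] = "Out of Scope"
        elif col.is_key and col.name not in NATURAL_KEYS:
            # Rule 2: de-scope surrogate key columns, keep natural keys.
            col.tags["Scope"] = "Out of Scope"
        owner = default_owners.get(col.database)
        if owner is not None:
            # Rule 3: database-wide default that never overwrites
            # an owner that has already been set.
            col.tags.setdefault("Owner", owner)
    return result


columns = apply_rules(
    [
        Column("CRM", "Archive", "Notes", table_row_count=0),
        Column("CRM", "Customer", "CustomerId", 120, is_key=True),
        Column("CRM", "Customer", "SSN", 120, is_key=True),
    ],
    default_owners={"CRM": "Marketing"},
)
for col in columns:
    print(col.table, col.name, col.tags)
```

Using `setdefault` for the owner means a manually assigned owner is kept, so the rules can be re-run safely as the estate changes.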

