Enabling deterministic masking
Published 07 June 2024
Deterministic masking ensures that the same input value always produces the same masked output value. This can be useful when you need to maintain consistency across multiple databases, tables and columns.
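The property can be illustrated with a small sketch that derives each replacement from a keyed hash of the input. This is not Anonymize's actual algorithm, which this page does not describe. The `mask_value` function, the sample `GIVEN_NAMES` list and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical replacement list; not the real GivenNames dataset.
GIVEN_NAMES = ["Alice", "Bruno", "Chen", "Divya", "Elena", "Farid"]


def mask_value(value: str, seed: str, dataset: list[str]) -> str:
    """Pick a replacement from `dataset` based on a keyed hash of `value`."""
    digest = hmac.new(
        seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).digest()
    # Map the first 8 bytes of the digest onto an index into the dataset.
    index = int.from_bytes(digest[:8], "big") % len(dataset)
    return dataset[index]


if __name__ == "__main__":
    seed = "example-seed"
    # The same input and seed give the same replacement on every call.
    print(mask_value("John", seed, GIVEN_NAMES))
    print(mask_value("John", seed, GIVEN_NAMES))
```

Because the result depends only on the input value and the seed, a value that appears in several tables or databases maps to the same replacement everywhere.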
Enabling Deterministic Masking
By default, some datasets in Anonymize are masked deterministically (see Default classifications and datasets for more information).
To enable or disable deterministic masking for a specific column, you can set the deterministic property in the masking file:
    "tables": [
      {
        "schema": "Person",
        "name": "Address",
        "columns": [
          {
            "name": "FirstName",
            "dataset": "GivenNames",
            "deterministic": true,
            "maxLength": 50
          }
        ]
      }
    ]
In this example, the FirstName column will be masked deterministically using the GivenNames dataset.
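If you maintain masking files with a script, the property can be toggled programmatically. The sketch below assumes the masking file is JSON whose top-level object contains the "tables" array shown in the fragment above. The `set_deterministic` helper is hypothetical and is not part of Anonymize.

```python
import json


def set_deterministic(
    masking: dict, schema: str, table: str, column: str, enabled: bool
) -> bool:
    """Set `deterministic` on one column; return True if the column was found."""
    for tbl in masking.get("tables", []):
        if tbl.get("schema") == schema and tbl.get("name") == table:
            for col in tbl.get("columns", []):
                if col.get("name") == column:
                    col["deterministic"] = enabled
                    return True
    return False


if __name__ == "__main__":
    # Assumed file layout: a top-level object wrapping the "tables" array.
    masking = json.loads("""
    {
      "tables": [
        {
          "schema": "Person",
          "name": "Address",
          "columns": [
            {"name": "FirstName", "dataset": "GivenNames", "maxLength": 50}
          ]
        }
      ]
    }
    """)
    set_deterministic(masking, "Person", "Address", "FirstName", True)
    print(json.dumps(masking, indent=2))
```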
Deterministic Seed
To keep the output of deterministic masking consistent across multiple runs of Anonymize, you can provide a seed value using the --deterministic-seed command-line option.
When the same seed is used, the same input value always produces the same masked output value, even in separate runs.
Example usage:
rganonymize mask --deterministic-seed "my-secret-seed"
Seed Requirements
The deterministic seed must meet the following requirements:
- It must be at least 4 characters long
- It cannot consist of a single repeated character (e.g., "111111")
- It cannot be an empty GUID (e.g., "00000000-0000-0000-0000-000000000000")
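A seed can be checked against these rules before it is passed to Anonymize. The sketch below is a hypothetical pre-check that mirrors the three requirements as listed. The `seed_problems` helper is not part of Anonymize, and the tool's own validation may be stricter or differ in detail (for example, in how it recognises an empty GUID).

```python
EMPTY_GUID = "00000000-0000-0000-0000-000000000000"


def seed_problems(seed: str) -> list[str]:
    """Return the listed requirements that `seed` fails (empty list if none)."""
    problems = []
    if len(seed) < 4:
        problems.append("must be at least 4 characters long")
    # A non-empty string with only one distinct character is a repeat.
    if seed and len(set(seed)) == 1:
        problems.append("cannot consist of a single repeated character")
    # Assumption: only the exact lowercase, hyphenated form is compared.
    if seed == EMPTY_GUID:
        problems.append("cannot be an empty GUID")
    return problems


if __name__ == "__main__":
    for candidate in ["my-secret-seed", "abc", "111111", EMPTY_GUID]:
        print(repr(candidate), seed_problems(candidate))
```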
Security Considerations
By default, Anonymize uses random, single-use seeds and does not store them. Where necessary, you can provide your own seed. If an attacker gains access to both the seed and the masked data, they could use the seed to reverse engineer the masked data back to its original values. It is therefore crucial to treat the seed as a sensitive secret and store it securely, for example in a key vault or secret management system.
Avoid sharing the seed widely or including it in easily accessible locations like source code repositories or configuration files. Limit access to the seed to only those individuals who absolutely require it.
Remember, the security of your deterministic masking depends on your ability to keep the seed secret. Always follow best practices for managing sensitive information and consult your organization's security team for guidance on securely storing and handling the deterministic seed. If you don't need to retain the seed, don't: use a random string and discard it after use.
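One way to follow the "random string, discard after use" advice is to generate the seed in memory just before the run. The sketch below uses Python's standard `secrets` module. The helper names are hypothetical, and the command contains only the option shown on this page, so a real invocation will likely need further arguments.

```python
import secrets


def new_single_use_seed(num_bytes: int = 32) -> str:
    """Return a random, URL-safe seed string from the OS secure random source."""
    return secrets.token_urlsafe(num_bytes)


def build_mask_command(seed: str) -> list[str]:
    """Build the argument list for the `rganonymize mask` call shown earlier."""
    return ["rganonymize", "mask", "--deterministic-seed", seed]


if __name__ == "__main__":
    seed = new_single_use_seed()
    command = build_mask_command(seed)
    # Print the command with the seed redacted so it does not end up in logs.
    print(" ".join(command[:-1] + ["<redacted>"]))
```

The seed exists only in the process that generated it and is never written to disk. Note that command-line arguments can be visible to other users on the same machine while the process runs, so run the command on a host you trust.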