Data masking software, also known as “data obfuscation” or “data sanitization” software, protects sensitive data against unauthorized viewing and exfiltration. It does this by replacing the values of the original, sensitive dataset with non-sensitive “dummy” data that’s structurally and functionally similar to the original. This enables organizations to maintain the authenticity and structure of their data so they can still use it, whilst ensuring its confidentiality.
Data masking can be used for software development and testing, sharing data with third parties, sales demos, user training, and even when uploading data to the cloud. It can also help organizations achieve compliance with data privacy and protection regulations.
To do this, data masking software offers a variety of features, including data discovery and classification; real-time, automatic masking using multiple masking methods; compatibility with multiple data types; and masking policy management. The features you need will largely depend on the type and scale of the data you need to protect, your budget, and the level of security you’d like to implement.
To help you find the right solution for your business, we’ve put together a guide to the best data masking software currently on the market, including an overview of the key use cases and features of each solution.
Broadcom Test Data Manager is a comprehensive solution designed to help organizations efficiently locate, secure, design, create, and provision test data. The platform supplements production data by filling gaps in test data coverage and ensuring that teams receive the right data at the right time. This synthetic data creation can help accelerate the delivery of quality software, as well as help organizations achieve compliance.
Test Data Manager’s Discovery and Profiling feature enables the identification of personally identifiable information (PII) across multiple data sources. A heat map is displayed, classifying PII according to severity level. Test data engineers and compliance officers can then review and tag the data, as well as generate detailed reports in PDF format to demonstrate compliance. Broadcom Test Data Manager also offers synthetic test data generation, combining powerful data synthesis with sophisticated coverage analysis to produce the minimal dataset needed for comprehensive testing and to model future scenarios and unexpected results for boundary-condition testing.
The platform also allows users to centrally store data as reusable assets and create virtual copies of test data on-demand for individual testers. This approach reduces data volumes, test durations, and costs, ultimately enabling development and test teams to deliver better applications to the market faster and more cost-effectively.
Delphix Data Masking is a solution that identifies sensitive data, such as names, email addresses, and credit card numbers across various data sources including relational databases and files. The software offers over 50 out-of-the-box profile sets that cover 30 types of sensitive data and allows users to define custom profiling expressions. With Delphix, sensitive data can be automatically masked with no programming required, producing realistic values while retaining referential integrity within and across sources. Customizable algorithm frameworks and the ability to define new algorithms ensure masked data remains effective for development, testing, and analysis purposes.
Delphix Data Masking helps organizations comply with privacy regulations like GDPR, CCPA, and HIPAA, and enables them to implement reversible or irreversible masking based on internal standards. Delphix also provides tokenization capabilities for obfuscating sensitive data for analysis or processing, maintaining referential integrity to ensure consistent masking across different tables and databases. The software facilitates a consistent masking policy across non-production environments, where the majority of sensitive data copies reside. In addition, it seamlessly integrates data masking with data virtualization, allowing teams to promptly deliver masked virtual data copies on-premises or in multi-cloud environments.
IBM InfoSphere Optim Data Privacy is a comprehensive solution for masking sensitive data across nonproduction environments, including development, testing, QA, and training. The platform protects confidential information by using a variety of transformation techniques that substitute sensitive data with realistic, fully functional masked data, while maintaining contextual accuracy.
InfoSphere Optim Data Privacy offers prepackaged data masking routines to transform complex data elements while retaining contextual meaning. It integrates with the Information Governance Catalog and provides 30 predefined data classifications and data privacy rules. IBM InfoSphere Optim Data Privacy also features a stand-alone API to access predefined and user-developed data masking services, as well as a data privacy app for masking data in CSV, XML, and Hadoop formats. Optim Data Privacy aids organizations in meeting compliance requirements such as HIPAA, GLBA, DDP, and PIPEDA. It supports Format Preserving Encryption (FPE) based on the AES-256 algorithm, producing varied masked values without discernible patterns and repeatable masked values when using the same encryption key.
Additionally, predefined data privacy reports offer insights into risk exposure and compliance. The platform also integrates with commonly used applications like Oracle E-Business Suite, PeopleSoft Enterprise, and Siebel, and supports various database management systems including IBM Db2, IBM Information Management System, Postgres, Informix, Oracle, and Sybase.
Informatica Cloud Data Masking is a scalable solution for creating secure, anonymized data that can be used in various environments. It anonymizes sensitive information such as personal details, payment card data, and identification numbers.
The software is compatible with a range of databases, giving businesses a consistent approach to data masking policies and a single audit trail for tracking results.
As part of the Intelligent Data Management Cloud (IDMC), Informatica Cloud Data Masking provides a single, high-performance cloud-native environment for centrally managing data masking processes. It leverages IDMC’s scalability and robustness to handle large volumes of data from various database sources, platforms, and locations.
The software supports various masking algorithms including substitution, blurring, sequential, randomization, shuffling, and nullification. Prepackaged proxy data and custom data sets can also be used to replace production data while maintaining its structural integrity. Informatica Cloud Data Masking also offers broad connectivity and custom application support, enabling businesses to apply masking algorithms to different data formats across a wide variety of databases, mainframes, and business applications, including Oracle and Microsoft SQL Server.
Mage offers iScramble and iMask, two data masking solutions designed for businesses of all sizes. iScramble focuses on providing static data masking, offering over 60 anonymization methods to protect sensitive information, while maintaining referential integrity between applications. This ensures data compliance with regulations such as HIPAA, GDPR, and CCPA. Fuzzy logic and artificial intelligence help generate a fake dataset similar to the original, preserving demographic information while minimizing re-identification risk. Mage iScramble also employs AI and natural language processing to detect sensitive data within unstructured fields and log files.
On the other hand, iMask offers a comprehensive dynamic data masking solution for both the application and database layers. It allows businesses to create customizable role-based, user-based, program-based, and location-based access controls to sensitive data. iMask’s data classification-centric anonymization techniques are designed to preserve the integrity of the data while complying with the highest security standards, including NIST-approved, FIPS 140-validated encryption and tokenization. This tool focuses on providing secure anonymization without compromising performance, maintaining data consistency between production and non-production instances, and facilitating secure cloud migration for individual applications.
Oracle Data Masking and Subsetting enables organizations to securely utilize data without increasing risk, while also minimizing storage costs. This solution is designed for different scenarios including testing, development, and partner environments, and it ensures that application integrity is maintained throughout the masking and subsetting process, offering efficient and secure access to data for various use cases.
Oracle Data Masking and Subsetting automates the discovery of sensitive data columns and the corresponding parent-child relationships in the database. The platform then provides an extensive library of masking formats for sensitive data such as credit card numbers, national identifiers, and personally identifiable information (PII). Users can create custom masking formats to address specific requirements. Available options include shuffle masking, encryption, format-preserving randomization, conditional masking, compound masking, deterministic masking, and user-defined PL/SQL masking. These options cater to various needs like preserving data format, retaining relationships between related columns, and generating consistent outputs across application schemas and databases.
OpenText Voltage SecureData Enterprise is a data protection solution designed to address compliance, privacy, and data security needs across multi-cloud, on-premises, and hybrid IT infrastructures. It employs format-preserving data protection techniques, is validated against the FIPS 140-2 and Common Criteria standards, and uses the NIST-standard FF1 mode of AES encryption.
Voltage SecureData Enterprise offers a range of interfaces—including REST APIs, local client libraries, proxy and driver interceptors, and cloud-native functions—to integrate with various databases, operating systems, applications, and platforms. It also features integrations with major cloud service providers, stateless key management, and Voltage Structured Data Manager for data discovery, analysis, and classification. Voltage SecureData provides flexible deployment options for high availability and performance across hybrid IT infrastructures, supporting both virtual appliances and containerized microservices. The platform’s SecureData Sentry feature enables transparent data protection, simplifying hybrid IT migration and accelerating time to value for security compliance and end-to-end data protection.
Everything You Need To Know About Data Masking Software (FAQs)
What Is Data Masking Software?
Most organizations today must comply with strict data privacy and protection regulations, which often require them to prove that they’re taking steps to secure any sensitive data they handle—including customer data such as personally identifiable information (PII), protected health information (PHI), and financial information. Data masking software, also known as “data obfuscation” or “data sanitization” software, can help achieve compliance with data privacy regulations by hiding the original values (letters and numbers) of sensitive data, and replacing them with a realistic and structurally similar, but fictitious counterpart.
There are numerous techniques for data masking, which you can read more about below. Once the data has been masked, only someone with the original dataset can restore it to its original values. This keeps the original data safe from unauthorized viewing and exfiltration, while maintaining most of its functional properties so it can still be used in situations where the real values aren’t needed.
Because of this, achieving compliance isn’t the only use case for data masking—it can also be applied in user training and sales demos, and is particularly useful in the world of software development. Software developers need to use real-world data for testing purposes. However, they need to do so without compromising security. Data masking enables developers to build and test their products effectively using realistic, but non-sensitive data—eliminating exposure of production data and allowing them to share and innovate freely.
How Does Data Masking Software Work?
There are a few different types of data masking:
Static data masking enables you to create a realistic, fictitious copy of an entire database. Usually, static data masking software creates a backup copy of the database, loads it into a separate masking environment, and removes any traces such as logs or changes. It then masks the data while it’s at rest in the masking environment. The masked dataset can be used to generate test and analytical results that mirror those of the original dataset. Because of this, static data masking is commonly used to create “sanitized” versions of production databases, which can then be used in non-production environments such as development and testing.
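To make this concrete, here’s a minimal Python sketch of the static approach, using an in-memory SQLite database as a stand-in for production; the customers table, its columns, and the fake_email helper are all hypothetical, and a real tool would also handle logs, constraints, and referential integrity:

```python
import random
import sqlite3
import string

# A stand-in "production" database with one sensitive table (hypothetical schema).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
prod.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada Lovelace", "ada@example.com"), (2, "Alan Turing", "alan@example.com")],
)

# Step 1: copy the whole database into a separate masking environment.
masked_env = sqlite3.connect(":memory:")
prod.backup(masked_env)

def fake_email() -> str:
    """Generate a dummy address with the same structure as a real one."""
    user = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"{user}@example.org"

# Step 2: mask the sensitive columns in place, in the copy only.
for (row_id,) in masked_env.execute("SELECT id FROM customers").fetchall():
    masked_env.execute(
        "UPDATE customers SET name = ?, email = ? WHERE id = ?",
        (f"User {row_id}", fake_email(), row_id),
    )
masked_env.commit()

# The sanitized copy can now be handed to development and testing.
print(masked_env.execute("SELECT name, email FROM customers").fetchall())
```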
On-the-fly data masking enables users to read and mask a small subset of data as required. It masks data while it’s being transferred from production environments to development or testing environments, before the data is saved. This means that the data is never present in its unmasked form in the dev/testing environment, or in that environment’s transaction log. On-the-fly data masking is commonly used in continuous software development environments, where developers need to stream data continuously from production to test environments without backing up the entire source database and masking it each time, as is done with static data masking.
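A minimal sketch of the on-the-fly pattern, again with hypothetical field names: rows are masked while streaming out of production, so unmasked values never land in the test environment:

```python
from typing import Iterable, Iterator

def mask_in_transit(rows: Iterable[dict]) -> Iterator[dict]:
    """Mask sensitive fields while rows are in flight from production to test."""
    for row in rows:
        yield {
            **row,
            "name": "REDACTED",                  # hide the value entirely
            "ssn": "***-**-" + row["ssn"][-4:],  # keep the format, hide the rest
        }

production_rows = [{"id": 1, "name": "Ada Lovelace", "ssn": "123-45-6789"}]
test_environment = list(mask_in_transit(production_rows))  # only masked rows are saved
print(test_environment)
```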
Dynamic data masking streams data from the production environment directly to a system in the dev/testing environment in response to a request, without saving it in a secondary database. As with on-the-fly masking, it masks data in real-time while the data is in transit.
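Dynamic masking can be pictured as a thin layer that rewrites each query result per request, with nothing persisted; the roles and fields in this sketch are hypothetical:

```python
def serve_row(row: dict, requester_role: str) -> dict:
    """Return a masked view of the row unless the requester is authorized."""
    if requester_role == "dba":  # authorized roles see the real values
        return row
    masked = dict(row)
    masked["card_number"] = "****-****-****-" + row["card_number"][-4:]
    return masked  # masking happens in transit, per request

row = {"id": 7, "card_number": "4111-1111-1111-1234"}
print(serve_row(row, "developer"))  # masked view
print(serve_row(row, "dba"))        # unmasked view
```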
Deterministic data masking uses two sets of data, replacing each value from one dataset with the corresponding value from the other dataset wherever it appears. For example, you could use deterministic data masking to replace the name “Robert” with “Jack”, and that change would be made wherever “Robert” had previously appeared in the dataset.
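In code, deterministic masking amounts to a consistent lookup: the same input always maps to the same substitute, which keeps joins and cross-table references intact. A minimal sketch with an invented mapping:

```python
# Fixed substitution table: every occurrence of a value maps the same way.
NAME_MAP = {"Robert": "Jack", "Alice": "Mary"}

records = [
    {"customer": "Robert", "referred_by": "Alice"},
    {"order_owner": "Robert"},
]

masked = [
    {field: NAME_MAP.get(value, value) for field, value in record.items()}
    for record in records
]
print(masked)  # "Robert" becomes "Jack" everywhere it appears
```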
Synthetic data generation is not a data masking technique, but some data masking software solutions still offer it, so it’s worth mentioning here. Instead of replacing the values of a dataset with fictitious ones, it generates a completely separate, synthetic dataset that captures and reflects the relationships and distributions within the original dataset. This enables the synthetic dataset to function just as the original dataset would, making synthetic data generation useful for application development.
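The difference is easy to see in a sketch: rather than transforming real rows, you fit simple statistics to the original data and sample entirely new rows from them. The normal-distribution assumption below is deliberately simplistic; real tools also model the relationships between columns:

```python
import random
import statistics

# Original (sensitive) values: only their statistical shape is reused.
real_salaries = [52_000, 61_000, 58_500, 75_000, 49_000]
mu = statistics.mean(real_salaries)
sigma = statistics.stdev(real_salaries)

# Generate a brand-new dataset that mirrors the original distribution.
synthetic_salaries = [round(random.gauss(mu, sigma)) for _ in range(5)]
print(synthetic_salaries)  # plausible salaries, none belonging to a real person
```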
What Are The Different Data Masking Techniques?
Most data masking solutions offer a variety of masking techniques, i.e., ways in which they can make your original data unreadable. Here are some of the most common techniques used for data masking, several of which are illustrated in a short code sketch after this list:
- Encryption uses a mathematical algorithm to turn the data into a seemingly random collection of characters (“ciphertext”) that’s completely illegible. The data can only be read by someone with the correct decryption key. Encryption is one of the most secure forms of masking, but it requires technology for continuous encryption and for encryption key management. It’s best applied when you plan to restore the data to its original values later.
- Scrambling re-orders the alphanumeric characters in your dataset in a completely random order. For example, the phone number 3332221234 in a production environment could be replaced with 1223432332 in a dev/testing environment. Scrambling is an easy method of data masking, but it only works on some types of data and is also less secure than most other methods.
- Nulling out replaces data values with a “null” value, which causes the data to appear missing when viewed by an unauthorized person. Nulling out is easy to implement, but it makes the data less useful for development and testing as the nullified data cannot be used in queries or analysis.
- Value variance applies a variance to each value in the original dataset, modifying it within an allowed range. For example, if you wanted to mask salary information, you could apply a variance of 5%, and the original values would be replaced with new ones within 5% of the original. You could also apply a variance that allows new values to sit anywhere between the lowest and highest values in the original dataset. Value variance is good for providing useful, realistic datasets.
- Substitution swaps out values for fictitious but realistic alternatives, often using a lookup table. For example, you could swap out a list of names with a different list of names, or a list of phone numbers with a different list of numbers that all meet the criteria needed to be a phone number (e.g., correct length and format).
- Shuffling is similar to substitution, except that it randomly swaps values for other values within the same dataset. The result is a dataset in which the values within each column are re-ordered, so the dataset looks accurate but doesn’t actually reveal any sensitive information. For example, “Bob Smith” and “Jack Jones” could be shuffled to “Bob Jones” and “Jack Smith”.
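To make the differences concrete, here’s a single sketch applying several of these techniques (scrambling, nulling out, value variance, substitution, and shuffling) to a toy dataset; every value and field name is invented:

```python
import random

rows = [
    {"first": "Bob", "last": "Smith", "phone": "3332221234", "salary": 60000, "notes": "VIP"},
    {"first": "Jack", "last": "Jones", "phone": "5551230000", "salary": 72000, "notes": "late payer"},
]

FIRST_NAMES = ["Tom", "Ann", "Lee"]  # lookup table for substitution

def scramble(value: str) -> str:
    """Scrambling: re-order the characters of a value at random."""
    chars = list(value)
    random.shuffle(chars)
    return "".join(chars)

for row in rows:
    row["phone"] = scramble(row["phone"])                              # scrambling
    row["notes"] = None                                                # nulling out
    row["salary"] = round(row["salary"] * random.uniform(0.95, 1.05))  # value variance (±5%)
    row["first"] = random.choice(FIRST_NAMES)                          # substitution

# Shuffling: swap last names among rows of the same dataset.
last_names = [row["last"] for row in rows]
random.shuffle(last_names)
for row, last in zip(rows, last_names):
    row["last"] = last

print(rows)
```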
What Features Should You Look For In Data Masking Software?
Data masking solutions offer a variety of different features and capabilities to meet specific use cases, so it’s important that you identify your most critical needs—such as the type of data you need to mask, how frequently you need to mask data, and how secure you need that data to be—before you start comparing solutions. That being said, there are a few key features that you should look for in any strong data masking software:
- While the data produced by data masking software will be fictitious, it still needs to be realistic, so that you can use it for non-production use cases. This means that the masked data needs to be the same format and structure as the original data; you can’t swap out a list of names with a list of numbers because that wouldn’t be functional.
- Your chosen solution needs to be compatible with all the different data sources and types that your organization uses.
- The masking process should be automatic and fast—particularly if you need to mask data continuously to reflect changes in the original dataset.
- If you’re working with particularly large datasets, you may want your masking solution to help you identify and classify sensitive information within your dataset that needs to be masked, such as names, contact details, and financial information.
- If you’re operating in a highly regulated industry and need to comply with strict data privacy regulations, you should look for policy-based data masking. This will enable you to tokenize and mask data in accordance with specific compliance requirements.