Preventing disclosures by ‘data-masking’ your database
Suppose you’re the Chief Information Officer for a major federal department or agency, such as DHS, the FBI or TSA, and your database administration team is planning to change the way your organization retrieves and outputs data from your main database (which might contain hundreds of millions of individual records).
No doubt, the team will want to run a series of tests on a subset of your entire database, which we’ll call the “test database,” to see if their proposed changes will function properly in the real world. The test database they use might include only thousands, or perhaps tens of thousands, of unique records.
Nevertheless, the question arises: In this age of calamitous data breaches, in which “Insider Threats” remain worrisome, could a member of the database administration team gain access to private and personally identifiable information that is stored in that test database? In other words, could an employee of the department, whose job was to improve the functioning of the database, be in a position to steal valuable data and, perhaps, disclose it to evil actors on the outside?
To address this possibility, a relatively new field of data-masking has emerged, in which software automatically changes private information (such as a social security number, personal name, medical diagnosis, salary level or account number) when it is initially “extracted” from the giant database to create that smaller test database.
SoftBase, based in Asheville, NC, has developed a software tool it calls Test Base, which is intended to enable database administrators to mask the sensitive and private information contained in their test databases. Government Security News spoke on Nov. 18 with Steve Woodard, the newly named CEO, and Neal Lozins, the company’s product manager for Test Base, to learn about the mission and mechanics of data-masking software. SoftBase was acquired last month by a Boston-based private equity firm known as Candescent Partners.
The key to effective data-masking is converting the original data – whether it is text or numerals – into substitute data that maintains the length and character of the original data, and that remains consistent across the test database. In other words, Test Base (and any similar data-masking software) ought to convert a nine-digit social security number into a new number that also contains nine digits, and that remains the same whenever it is attached to the same individual. (Incidentally, the Social Security Administration has set aside a large group of social security numbers that it has not issued to real U.S. citizens or residents, and these social security numbers can be used in test databases without fear of disclosure.)
In another example, a medical diagnosis field that holds, say, a maximum of 50 letters and numbers should be converted into a substitute “data-masked” entry that is also limited to 50 letters and numbers. An original salary figure would be changed to an equally long string of numerals.
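The two properties described above (the substitute keeps the length and character type of the original, and the same input always yields the same substitute) can be illustrated with a short Python sketch. This is not SoftBase's method, which the article does not describe; it is one common approach based on a keyed hash. The function name `mask_value` and the key are hypothetical, and a production tool would more likely use a vetted format-preserving encryption scheme.

```python
import hashlib
import hmac
import string

# Hypothetical secret key for illustration; a real deployment would
# store and rotate this outside the code.
SECRET_KEY = b"example-key-not-for-production"


def mask_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a substitute with the same length and character classes.

    Digits map to digits, letters map to letters (case preserved), and
    punctuation or spaces are kept. The same input and key always give
    the same output, so a masked value stays consistent across tables.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]  # reuse digest bytes for long values
        if ch.isdigit():
            out.append(string.digits[b % 10])
        elif ch.isalpha():
            alphabet = (
                string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            )
            out.append(alphabet[b % 26])
        else:
            out.append(ch)  # keep separators such as "-" or " "
    return "".join(out)


masked_ssn = mask_value("123-45-6789")        # still looks like NNN-NN-NNNN
masked_diagnosis = mask_value("Type 2 diabetes")  # same length as the input
```

Because the substitute is derived from a keyed hash of the whole value, two records that share a social security number receive the same masked number, which keeps joins between test tables working.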
SoftBase’s Lozins pointed out that it is relatively easy to spot a personal name when it stands by itself in a document – for example, at the top of a letter – but much more difficult when the personal name is “embedded” amidst a paragraph of surrounding text.
In some instances, the database administrators using Test Base replace their original data with meaningful words and letters. In other cases, they simply replace the original data with gibberish. The choice can be made by the customer, said Lozins.
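The choice between meaningful substitutes and gibberish might look like the following Python sketch. The substitute name list, the key and both function names are made up for illustration and do not reflect how Test Base is configured. Note that in this sketch the "meaningful" mode does not preserve the original length, while the gibberish mode does.

```python
import hashlib
import hmac

KEY = b"demo-key-only"  # hypothetical key, for illustration

# Hypothetical pool of realistic-looking but fictitious names.
SUBSTITUTE_NAMES = ["Alice Ames", "Brian Bell", "Carla Cole", "David Dunn"]


def _digest(value: str) -> bytes:
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name_meaningful(name: str) -> str:
    """Pick a readable substitute name, consistently for the same input."""
    index = int.from_bytes(_digest(name)[:4], "big") % len(SUBSTITUTE_NAMES)
    return SUBSTITUTE_NAMES[index]


def mask_name_gibberish(name: str) -> str:
    """Replace each letter with a pseudo-random lowercase letter."""
    d = _digest(name)
    return "".join(
        chr(ord("a") + d[i % len(d)] % 26) if ch.isalpha() else ch
        for i, ch in enumerate(name)
    )


readable = mask_name_meaningful("Jane Doe")   # one of SUBSTITUTE_NAMES
scrambled = mask_name_gibberish("Jane Doe")   # eight characters, space kept
```

Readable substitutes make test output easier for people to scan, while gibberish makes it obvious at a glance that a field has been masked.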
Another key is ensuring that the database administrator cannot see the “genuine” data at the time the test database is first generated. To prevent this, SoftBase’s software converts the data as it is being extracted. “We’re not even going to give them the chance to look at it,” said Woodard.
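Masking at extraction time, rather than after the test database already exists, can be sketched as follows. This is a simplified Python illustration using in-memory SQLite databases, not a description of Test Base internals; the table layout, the column names, the key and the functions `mask` and `extract_masked` are all assumptions made for the example.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key-only"               # hypothetical key
SENSITIVE_COLUMNS = {"name", "ssn"}  # hypothetical column names


def mask(value: str) -> str:
    """Deterministic substitute: digits stay digits, letters stay letters."""
    digest = hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).digest()
    return "".join(
        str(digest[i % 32] % 10) if ch.isdigit()
        else chr(ord("a") + digest[i % 32] % 26) if ch.isalpha()
        else ch
        for i, ch in enumerate(value)
    )


def extract_masked(source: sqlite3.Connection,
                   target: sqlite3.Connection,
                   limit: int) -> int:
    """Copy up to `limit` rows, masking sensitive columns in flight.

    Each row is masked before it is written, so the test database never
    holds the genuine values.
    """
    cursor = source.execute("SELECT id, name, ssn FROM people LIMIT ?", (limit,))
    columns = [d[0] for d in cursor.description]
    target.execute("CREATE TABLE people (id INTEGER, name TEXT, ssn TEXT)")
    count = 0
    for row in cursor:
        masked = tuple(
            mask(v) if c in SENSITIVE_COLUMNS else v
            for c, v in zip(columns, row)
        )
        target.execute("INSERT INTO people VALUES (?, ?, ?)", masked)
        count += 1
    target.commit()
    return count


# Stand-in for the production database, holding two fictitious records.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (id INTEGER, name TEXT, ssn TEXT)")
source.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Jane Doe", "123-45-6789"), (2, "John Roe", "987-65-4321")],
)

target = sqlite3.connect(":memory:")  # stand-in for the test database
copied = extract_masked(source, target, limit=10)
```

In this arrangement the person running tests only ever connects to `target`; the extraction step is the single place where genuine values are read.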
Woodard, the CEO who joined SoftBase after it was acquired by Candescent Partners, is planning a major effort in 2012 to educate potential government customers about the importance of data-masking. In an era when HIPAA, Sarbanes-Oxley and other federal and state laws require vigorous efforts to prevent the public disclosure of private data, the practice of data-masking of test databases may not be required by such laws, but it certainly makes good sense, Woodard argues. One need only look at the numerous incidents in which government or commercial data was accidentally disclosed, or intentionally stolen, to recognize the financial downside that could result from allowing such data to be compromised.
“You may have secured yourself from outside threats,” said Woodard, “but when you’re worried about insider threats, this Test Base software provides one huge area that you don’t have to worry about.”