Secure but accessible

New encryption scheme aims to keep personal data protected while in use

Format Preserving Encryption:
A chronology

1997: At the National Institute of Standards and Technology's National Information Systems Security Conference, Michael Brightwell and Harry Smith propose that preserving the format of encrypted data could simplify protection of databases and data warehouses.

2002: John Black of the University of Colorado and Phil Rogaway of the University of California-Davis present a paper at the RSA Conference on how to encrypt a variety of data items based on results from a 1988 Luby-Rackoff paper on constructing pseudo-random permutations. However, the security bounds don't hold for Social Security and credit card numbers.

2004: Jacques Patarin of the University of Versailles publishes a set of papers on how a simple modification of the Luby-Rackoff technique would allow security bounds equivalent to 256-bit Advanced Encryption Standard encryption.

2006-07: The Voltage Security crypto team, led by the company's Chief Technology Officer Terence Spies, proposes Format Preserving Encryption, applying Patarin's results to the Black-Rogaway approach and adding techniques to build a practical scheme for securing structured data, such as Social Security numbers, credit card numbers and other account information.

Source: Voltage Security

The purpose of cryptography is to scramble data to the point that it cannot be recognized. But scrambled data is difficult if not impossible to use, so it has to be decrypted first, which creates the risk of exposing the data you wanted to protect. It would be preferable to encrypt some data but keep it in a usable format. That would let you use sensitive information such as credit card and Social Security numbers stored in databases without exposing them.

Format Preserving Encryption, a technique developed by Terence Spies, chief technology officer at Voltage Security, is designed to do that.

'It allows us to provide strong encryption to structured data without changing the format,' said Dan Beck, director of product management at Voltage.

Format Preserving Encryption is employed in SecureData, a tool that enables the use of encrypted data, as opposed to encrypting data at rest. Encrypted database fields still can be used to index and recover information.

'It allows you to work within the existing framework,' Beck said. 'If you don't change the size of the field, you don't need to overhaul your database and application.'

The idea of Format Preserving Encryption goes back at least to 1997, and during the following decade, a number of cryptographers in the United States and France worked on the idea.

Spies cobbled together the results of this work in 2006 and 2007, building on modifications to cryptographic techniques that he applied to the Advanced Encryption Standard (AES) algorithm to produce a practical commercial tool.

'The majority of my life for the last year or so has been working with cryptographers,' Spies said.

The 10-year development time for the cryptographic tool is not unusual. 'Things happen unbelievably slowly in the crypto world because you want to be sure of the security,' he said. 'It has taken the academic community a while to get to the point that they believe they have provable security.'

Where's the value?

The need for such a tool is not always apparent. Bill Burr, manager of the security technology group at NIST's Information Technology Lab, said the value of Format Preserving Encryption was not clear to him until he talked to Spies.

'He convinced me there is a real application for it,' Burr said. 'It seems like it would be useful in an era when we are sensitive about things like Social Security numbers.'

Credit card, identification and Social Security numbers, which are considered personally identifiable information that under some laws and regulations must be protected, often are used as unique identifiers to link records in databases. Applications also use them as indexes to retrieve records, even when the actual numbers are not necessary to the application.

'To make sense of these databases, we need to preserve the relationships that the numbers enable,' Burr said. 'They are the keys that make the different records in the databases hang together.'

Start making sense

Encrypting the numbers can make them meaningless to the database and applications, which expect Social Security numbers to be nine digits, for example.

Moreover, if a number produces a different encrypted text when it is encrypted in different files, the numbers cannot be used to link the files.

Because of that disconnect, encrypted data loses much of its value unless databases and applications are modified, a process that can be expensive and time-consuming.

Burr said he has received calls from several agencies, including the Veterans Affairs Department, where researchers are doing analytical work on patient records. The identity of patients is not needed for the research, but the identifiers are needed to locate and link records.

'They had people walking around with laptops with hundreds of thousands of patient records they were trying to analyze, tied together with Social Security numbers,' he said. They needed some way to protect the identifiers while keeping the data accessible. 'I didn't have any good advice to give them. [Format Preserving Encryption] looks like it might solve that problem.'

Test environment

Organizations sometimes need to use production databases for application testing before an application is deployed live.

In these cases, sensitive data must be protected in a test environment that might not be as well-secured as a production environment. Call centers also routinely access data by using sensitive identifiers to retrieve and verify records even though the identifiers are not necessary to the transaction.

SecureData includes a key server, and the encryption is deployed as a command-line, Web services or toolkit application. The command-line format typically is used in an enterprise to run against and encrypt existing data in a database.

The fields or type of data to be encrypted are specified, and the Web service can automatically encrypt specified data as it comes into a system during a transaction.

The Format Preserving Encryption technique is not tied to any algorithm, but SecureData uses AES with a 256-bit key. It cycles the field to be encrypted multiple times, disposing of some digits in each cycle until it arrives at an encrypted field in the same format as the original.

Despite the shortened format, 'our encryption is as strong as 256-bit AES,' Beck said. 'We can prove mathematically it cannot be broken' any more easily than AES.
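The cycling described above resembles a Feistel network run over decimal digits rather than bits. The following is a minimal sketch of that general idea, not SecureData's actual algorithm: it assumes HMAC-SHA256 as the round function in place of AES (Python's standard library has no AES), and the round count, key and function names are all hypothetical. It has not been vetted and should not be used to protect real data.

```python
import hashlib
import hmac

ROUNDS = 10  # assumption: an even, illustrative round count


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Derive a pseudo-random integer in [0, modulus) from one half of the field."""
    message = bytes([round_no, len(half)]) + half.encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Discard most of the digest so the result fits the other half's digit count.
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if not (digits.isascii() and digits.isdigit() and len(digits) >= 2):
        raise ValueError("expected a string of at least two decimal digits")


def fpe_encrypt(key: bytes, digits: str) -> str:
    """Encrypt a digit string into a digit string of the same length."""
    _check(digits)
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for round_no in range(ROUNDS):
        width = len(left)
        modulus = 10 ** width
        mixed = (int(left) + _round_value(key, round_no, right, modulus)) % modulus
        # Swap halves; the mixed half keeps its original digit count.
        left, right = right, str(mixed).zfill(width)
    return left + right


def fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert fpe_encrypt by running the rounds backward."""
    _check(digits)
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for round_no in reversed(range(ROUNDS)):
        width = len(right)
        modulus = 10 ** width
        unmixed = (int(right) - _round_value(key, round_no, left, modulus)) % modulus
        left, right = str(unmixed).zfill(width), left
    return left + right


key = b"demo key, not a real secret"    # hypothetical key for illustration
token = fpe_encrypt(key, "123456789")
print(token)                            # nine digits, fits a nine-digit column
print(fpe_decrypt(key, token))          # 123456789
```

Because each round only adds a keyed value to one half modulo a power of 10, every round can be undone, and the output always has exactly as many digits as the input.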

By encrypting different parts of a field with separate keys, a customer service representative could be allowed to decrypt and view only the last four digits of a Social Security or credit card number for verification, keeping the rest hidden.

To make the scheme work, the same key should always produce the same cipher text when run against a number, without producing collisions; that is, no two numbers will produce the same cipher text. This allows use of the encrypted numbers for indexing.

Finding a way to avoid collisions might be the most significant advancement in Format Preserving Encryption, Burr said.

'That's the property that is hard to get,' he said. 'I don't know that it's earth-shattering, but it may be a pretty useful thing.'
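To see why the collision-free property is hard to get, compare two ways of turning a four-digit number into another four-digit number. The sketch below is illustrative only and rests on assumptions: the key, the function names and the use of HMAC-SHA256 are hypothetical choices, not Voltage's method. The naive approach truncates a keyed hash; the second runs Feistel rounds over the two halves of the number, which is invertible and therefore cannot collide.

```python
import hashlib
import hmac

KEY = b"demo key, not a real secret"  # hypothetical key for illustration


def prf(tag: bytes, data: str, modulus: int) -> int:
    """Keyed pseudo-random integer in [0, modulus)."""
    digest = hmac.new(KEY, tag + data.encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def truncated_hash(number: str) -> str:
    """Naive approach: a keyed hash cut down to four digits. Not invertible."""
    return f"{prf(b'trunc', number, 10_000):04d}"


def feistel_permute(number: str, rounds: int = 8) -> str:
    """Feistel rounds over two 2-digit halves. Each round is reversible."""
    left, right = int(number[:2]), int(number[2:])
    for i in range(rounds):
        left, right = right, (left + prf(bytes([i]), f"{right:02d}", 100)) % 100
    return f"{left:02d}{right:02d}"


# Run every four-digit input through both functions and count distinct outputs.
domain = [f"{n:04d}" for n in range(10_000)]
distinct_truncated = len({truncated_hash(n) for n in domain})
distinct_feistel = len({feistel_permute(n) for n in domain})

print(distinct_truncated)  # fewer than 10,000: some inputs collide
print(distinct_feistel)    # 10000: every input maps to its own output
```

The truncated hash is deterministic, so it could still link records, but two different numbers can land on the same output and corrupt an index. The Feistel version gives up nothing in format and keeps a one-to-one mapping.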

On the rise

SecureData was released in October, but Voltage began promoting it widely this year after the March release of a new version with improved data-masking capability and better event reporting for audit and regulatory compliance.

The driver for the tool's adoption so far has been the Payment Card Industry rules for protecting data maintained by transaction and payment processors. Beck said he has been surprised that the Health Insurance Portability and Accountability Act has not encouraged more adoption in the health care industry and that recent federal requirements for agencies to protect personally identifiable information have not driven its use in government.

'We currently don't have any federal government customers,' Beck said.

This raises the question of certification under the Federal Information Processing Standards. All cryptographic modules used by agencies must be certified to FIPS 140-2. Voltage maintains that because Format Preserving Encryption is a mode of AES, it does not need to be validated under FIPS as long as it is implemented in a validated cryptographic module.

'Our belief is we don't need FIPS certification,' Spies said. The company is working with NIST to establish that [Format Preserving Encryption] is an approved mode of AES, he said.

'My feeling is they probably don't need [FIPS validation],' Burr said. 'But the truth is, agencies are nervous about this and would probably prefer them to have the FIPS blessing. I don't think it's required, but people would appreciate it.'

