PGCrypto: Column Level Encryption in Rails

The Problem at Hand: we need to encrypt our data

Recently we picked up a project from a global health organization that requires some level of encryption. After a series of conversations with the client we decided that we didn't need a truly HIPAA compliant solution. Instead, we settled on a HIPAA-ish solution that involves encryption at the disk level of the server and a column-level encryption in the database. In this note I'm going to lay out what I've learned so far in implementing our chosen solution to column-level encryption.

The PGCrypto Gem and PGP

After perusing the internet for a good column level encryption solution in rails we happened upon PGCrypto. However we chose a more recently maintained fork of the repo that can be found here.

PGcrypto utilizes [Pretty Good Privacy (PGP)] which is an open source symmetric-key encryption method. The PGCrypto gem packages up this method and makes it available in rails by essentially monkey patching over some of the built in postgresql adapter functionality. This is done in a fashion that abstracts almost all of the encryption out of your code aside from a few small things. It's also really easy to setup.

Some Shortcomings of the gem

As PGCrypto currently exists there are some fundamental functionality gaps that we as a company may or may not try to tackle in the not so distant future.

One of the main drawbacks staring you down the minute you crack open the gem is the lack of typing for any field that is encrypted. PGCrypto's adapter functions convert everything to a string (sometimes unsuccessfully in the case of integers) and after encrypting them, stores them as binary. As a result of the adapter functions, we no longer have native datatypes for encrypted fields as far as postgres and ActiveRecord is concerned. This poses some obvious problems with writing readable and maintainable code.

So what's the solution? Well we looked at two:

  1. Get down and dirty in the gem code and modify the adapter to allow us to maintain some semblance of datatypes on an ActiveRecord level for encrypted fields.

      This allows us to keep a normalized relational model with columns representing each field individually

  2. Store all of our column values as one big long hash, convert that to a string before it's encrypted, and convert it out of a string and parse the hash into individual fields when reporting on the data.

      In this method we would store all columns for a given row in one single field

      Each entry would have it's own row, but there would only be a primary key column and a 'data' column which stored the information for all fields for that model

      This option was nice because it gave us a solution immediately. Allowing us a little bit of wriggle room to add some custom code that would help us type the data

      This is obviously the more hackish method of utilizing an already very hackish gem

Load Testing

One of the main issues with option 2 that seemed clear from the very beginning was searching on a single encrypted column (option 1) would be much faster that searching over (and having to decrypt) an entire hash of columns worth of data for each row just to search on a single field within that hash. Just to be sure, we did some feasibility tests, results shown below:

For 100,000 rows of data in two respective models:

One with all fields stored in a hash in one encrypted column..

..and the other more normalized model with separate columns for each field:

See the screenshot of console output below (numbers are in seconds):

Summary of load tests

As you can see from the console output above, both methods were pretty slow (understandably), because we are doing a search on encrypted data. Now, using the PGP mechanism built into the PGCrypto adapter to decrypt the data in the given column represents the lion's share of the processing time for both requests. It really comes down to how large the data is for each column you have to decrypt. The results came out as you would expect. For each data structure, we are going through and decrypting every single row for a given column in order to search on its value. However in one case (the mock table), we are only have to decrypt the value of an individual field, and in the other case (the form table), we are decrypting a hash containing all columns for a given row. Not to mention that with the column level encryption you can search non-encrypted fields about 100 times faster than you could in the hash model.

The Hash route is significantly less performant, and along with the coding complexities that come along with this solution, we decided that going with option 1 was the best way to go.

OK OK, So then which solution already?

While modifying the existing gem to meet our needs could prove to be quite painful in the short term, it will pay dividends in readability, maintainability, and as our load tests have shown, performance. Please stay tuned, as we may very well contribute to [this repo]( in the near future.

Feb. 01 2016