Recently we picked up a project from a global health organization that requires some level of encryption. After a series of conversations with the client we decided that we didn't need a truly HIPAA compliant solution. Instead, we settled on a HIPAA-ish solution that involves encryption at the disk level of the server and a column-level encryption in the database. In this note I'm going to lay out what I've learned so far in implementing our chosen solution to column-level encryption.
After perusing the internet for a good column level encryption solution in rails we happened upon PGCrypto. However we chose a more recently maintained fork of the repo that can be found here. PGcrypto utilizes [Pretty Good Privacy (PGP)] which is an open source symmetric-key encryption method. The PGCrypto gem packages up this method and makes it available in rails by essentially monkey patching over some of the built in postgresql adapter functionality. This is done in a fashion that abstracts almost all of the encryption out of your code aside from a few small things. It's also really easy to setup.
As PGCrypto currently exists there are some fundamental functionality gaps that we as a company may or may not try to tackle in the not so distant future. One of the main drawbacks staring you down the minute you crack open the gem is the lack of typing for any field that is encrypted. PGCrypto's adapter functions convert everything to a string (sometimes unsuccessfully in the case of integers) and after encrypting them, stores them as binary. As a result of the adapter functions, we no longer have native datatypes for encrypted fields as far as postgres and ActiveRecord is concerned. This poses some obvious problems with writing readable and maintainable code.
One of the main issues with option 2 that seemed clear from the very beginning was searching on a single encrypted column (option 1) would be much faster that searching over (and having to decrypt) an entire hash of columns worth of data for each row just to search on a single field within that hash. Just to be sure, we did some feasibility tests, results shown below: For 100,000 rows of data in two respective models: One with all fields stored in a hash in one encrypted column..
..and the other more normalized model with separate columns for each field:
See the screenshot of console output below (numbers are in seconds):
As you can see from the console output above, both methods were pretty slow (understandably), because we are doing a search on encrypted data. Now, using the PGP mechanism built into the PGCrypto adapter to decrypt the data in the given column represents the lion's share of the processing time for both requests. It really comes down to how large the data is for each column you have to decrypt. The results came out as you would expect. For each data structure, we are going through and decrypting every single row for a given column in order to search on its value. However in one case (the mock table), we are only have to decrypt the value of an individual field, and in the other case (the form table), we are decrypting a hash containing all columns for a given row. Not to mention that with the column level encryption you can search non-encrypted fields about 100 times faster than you could in the hash model. The Hash route is significantly less performant, and along with the coding complexities that come along with this solution, we decided that going with option 1 was the best way to go.
While modifying the existing gem to meet our needs could prove to be quite painful in the short term, it will pay dividends in readability, maintainability, and as our load tests have shown, performance. Please stay tuned, as we may very well contribute to [this repo](https://github.com/BlinkerGit/pgcrypto) in the near future.