Anonymizing Records in Ruby on Rails

Introduction

In modern web applications, especially those handling sensitive user data, it is common to soft delete records rather than remove them from the database permanently. This approach, often dictated by application design or business requirements, keeps data available for administrative purposes while hiding it from the user interface. However, as data privacy laws and regulations grow in importance worldwide, soft deleting alone may not be enough to comply with legal standards. Some soft-deleted records also need to be anonymized to protect user privacy.

In this tutorial, we use a Rails model concern to add anonymizing functionality to our models. Step by step, we build a customizable and reusable anonymization strategy that any model holding personal data can include. Keeping this logic in one place makes it easier to apply the same privacy rules consistently across the application.

Setting Up: Creating a New Rails Application and Its Core Models

We'll begin by setting up a new project. Our application, named "Slacker", will exemplify how to integrate anonymizing functionality within a Rails application. To streamline our setup and focus on the essentials, we'll omit certain components that aren't directly relevant to our objectives.

We create our new Rails application in a terminal window:

rails new slacker --skip-jbuilder --skip-test
cd slacker
rails db:create

Next, we'll create the core models for our application: Account, User, and Message. The User model will belong to an Account, so each user is associated with a single account. We'll also set up a one-to-many relationship between Users and Messages, allowing each user to have multiple messages.

Here are the commands to generate each model along with their respective attributes:

rails generate model Account email:string password:string
rails generate model User name:string account:references
rails generate model Message content:text user:references
rails db:migrate

After creating the models and migrating our database, the next step is to set up the associations between these models to reflect their relationships accurately.

class Account < ApplicationRecord
  has_one :user, dependent: :destroy
end

class User < ApplicationRecord
  belongs_to :account
  has_many :messages, dependent: :destroy
end

class Message < ApplicationRecord
  belongs_to :user
end

Implementing a Basic Rails Model Concern

We aim to anonymize sensitive information in the User and Account models by setting their relevant fields (name for User and email for Account) to nil when necessary.

To achieve this in a clean and reusable manner, we will employ a Rails concern. A concern allows us to encapsulate this shared behavior in a module that can be easily included in any model requiring anonymization capabilities. This approach not only keeps our code DRY (Don't Repeat Yourself) but also enhances the modularity and maintainability of our application.

One key aspect of this implementation is allowing each model to specify which of its columns are anonymizable. This flexibility is crucial for tailoring the anonymization process to the specific needs of each model, ensuring that we only anonymize the data that truly requires it.

To set the stage for this functionality, we will begin by creating a concern named Anonymizable:

# app/models/concerns/anonymizable.rb

module Anonymizable
  extend ActiveSupport::Concern

  included do
    class_attribute :anonymizable_columns

    def anonymize!
      anonymizable_columns.each do |column|
        public_send("#{column}=".to_sym, nil)
      end
      save!
    end

    private

    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end

  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns = columns
    end
  end
end

In the initial version of our Anonymizable concern, we've laid down a foundation for a flexible yet straightforward approach to data anonymization within our Rails application.

  • Module Definition: The Anonymizable module extends ActiveSupport::Concern, a Rails module that provides a structured way to enhance models with additional capabilities. This choice facilitates the inclusion of shared methods and logic across different models in a clean and maintainable manner.
  • Class Attribute: Within the included block, we define a class_attribute named :anonymizable_columns. This attribute will store an array of symbols representing the columns each model wishes to anonymize. Using a class attribute allows each model to maintain its own list of anonymizable columns, providing the necessary customization for the anonymization process.
  • Anonymize! Method: The anonymize! instance method is the heart of this concern. When invoked on a model instance, it iterates over the anonymizable_columns, setting each specified column's value to nil. This effectively anonymizes the data by removing any identifiable information. The method concludes by saving the changes to the database with save!.
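
The dynamic setter call at the center of anonymize! can be tried outside Rails. The following is a minimal sketch in plain Ruby, assuming a hypothetical FakeUser struct in place of an ActiveRecord model; it only illustrates how public_send builds and invokes each setter, and it skips persistence entirely.

```ruby
# Plain-Ruby stand-in for a model; FakeUser and its fields are hypothetical.
FakeUser = Struct.new(:name, :email, keyword_init: true) do
  # Columns this "model" wants cleared, mirroring anonymizable_columns.
  def self.anonymizable_columns
    %i[name]
  end

  def anonymize!
    self.class.anonymizable_columns.each do |column|
      # Builds the setter name ("name=") and calls it dynamically.
      public_send("#{column}=", nil)
    end
    self # a real model would call save! here
  end
end

user = FakeUser.new(name: "Jay Dilla", email: "jd@beatz.com")
user.anonymize!
p user.name  # => nil
p user.email # => "jd@beatz.com"
```

Only the columns listed by the class are touched; every other attribute keeps its value.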

Understanding the included Block

The included block within an ActiveSupport::Concern module is a special hook that Rails calls when the module is included in another class. This block is where we place code that we want to be executed in the context of the class that includes the module. In the case of our Anonymizable concern, the included block defines the :anonymizable_columns class attribute and the anonymize! instance method. This setup ensures that any model including the Anonymizable module will automatically have these attributes and methods injected into its class context, enabling the anonymization functionality without additional boilerplate code.
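
To make the hook less magical, here is a rough plain-Ruby approximation of what an included block does, without ActiveSupport. The Greeter and Host names are hypothetical, and the real ActiveSupport::Concern implementation handles more cases (such as dependencies between concerns), so treat this as a sketch of the idea only.

```ruby
# Hypothetical module showing roughly what `included do ... end` expands to.
module Greeter
  # Ruby calls this hook whenever the module is included somewhere.
  def self.included(base)
    # class_eval runs the block as if it were written in the class body.
    base.class_eval do
      attr_accessor :greeting

      def greet
        "#{greeting}, world"
      end
    end
  end
end

class Host
  include Greeter
end

host = Host.new
host.greeting = "Hello"
puts host.greet # => Hello, world
p Host.instance_methods(false).include?(:greet) # => true (defined on Host itself)
```

Because the block is evaluated in the including class, class-level macros and method definitions placed there behave as if they had been typed directly into that class.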

The Role of class_methods Block

The class_methods block provided by ActiveSupport::Concern offers a clean, organized way to add class methods to the including class. When you define methods within the class_methods block of the concern, these methods become available on the class itself, not just on instances of the class. In our Anonymizable concern, the class_methods block is used to define the anonymizable method. This method allows any model that includes the concern to specify which columns should be anonymizable by setting the anonymizable_columns class attribute. This design pattern simplifies the process of extending class functionality across multiple models, maintaining a DRY approach while providing the necessary hooks for customization.
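
The same idea can be approximated in plain Ruby by extending the including class with a nested module. This is a sketch under stated assumptions: Taggable, Post, and Comment are hypothetical names, and the per-class list is kept in a class-level instance variable rather than a Rails class_attribute.

```ruby
# Hypothetical module; ClassMethods mirrors what a class_methods block defines.
module Taggable
  def self.included(base)
    base.extend(ClassMethods) # methods below become class-level methods
  end

  module ClassMethods
    # Class-level macro, analogous to `anonymizable :name`.
    def taggable(*columns)
      @taggable_columns = columns
    end

    def taggable_columns
      @taggable_columns || []
    end
  end
end

class Post
  include Taggable
  taggable :title, :body
end

class Comment
  include Taggable
  taggable :text
end

p Post.taggable_columns           # => [:title, :body]
p Comment.taggable_columns        # => [:text]
p Post.new.respond_to?(:taggable) # => false (class method only)
```

One difference worth noting: a class-level instance variable is not inherited by subclasses, whereas class_attribute values are, which is one reason the concern uses class_attribute.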

Now that we understand how our Rails concern works, let's include it in our models to enable the anonymization functionality. By incorporating the Anonymizable concern into our User and Account models, we can specify which fields should be anonymized and easily anonymize records with a simple method call. Here’s how we can do it:

class User < ApplicationRecord
  include Anonymizable

  anonymizable :name

  belongs_to :account
  has_many :messages, dependent: :destroy
end

class Account < ApplicationRecord
  include Anonymizable

  anonymizable :email

  has_one :user, dependent: :destroy
end

Let's test it in a sandbox console (rails console --sandbox):

# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)

# Anonymize the user and account
user.anonymize!
user.account.anonymize!

# Check the anonymized fields
user.reload
user.name
# => nil
user.account.reload
user.account.email
# => nil

Enhancing Anonymization: Custom Placeholder Values

A specific business requirement has emerged: when displaying messages from users who have been anonymized, we want to replace their name attribute, which would typically become nil upon anonymization, with a placeholder text '(deactivated)'. Conversely, for accounts, the email attribute should still be anonymized to nil. This requirement introduces a new layer of complexity: our Anonymizable concern must now support not only the specification of which columns to anonymize but also allow for the customization of the anonymization value for each column.

To accommodate this, we've updated our Anonymizable concern with enhanced functionality:

module Anonymizable
  extend ActiveSupport::Concern

  included do
    class_attribute :anonymizable_columns

    def anonymize!
      anonymizable_columns.to_h.each do |column, value|
        public_send("#{column}=".to_sym, value)
      end
      save!
    end

    private

    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end

  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns = (anonymizable_columns || []).dup # copy so a subclass never mutates its parent's list

      columns.flatten.each do |column|
        case column
        when Symbol, String
          self.anonymizable_columns << [column.to_sym, nil]
        when Hash
          column.each do |key, value|
            self.anonymizable_columns << [key.to_sym, value]
          end
        end
      end

      self.anonymizable_columns
    end
  end
end

The core logic of the Anonymizable concern has been expanded to allow for a more versatile anonymization process:

  • Anonymizable Columns as Column/Value Pairs: Previously, anonymizable_columns was an array of symbols representing the columns to be anonymized. It is now an array of two-element arrays, each holding a column name and its anonymization value, which converts cleanly into a hash with to_h. This change provides the granularity needed to specify distinct anonymization values for different columns.
  • Enhanced Anonymize! Method: The anonymize! method now converts anonymizable_columns with to_h and iterates over the resulting column-to-value mapping. Each specified column is set to its corresponding anonymization value, fulfilling the requirement to replace certain data with custom placeholders.
  • Flexible Class Method for Specifying Anonymization: The class_methods block within the concern now includes a more sophisticated anonymizable method. This method accepts arguments in various formats (symbols, strings, or hashes), allowing us to specify columns with default anonymization values (i.e., nil) or custom values as needed. This flexibility is key to accommodating our business case, providing the ability to specify that the name column in the User model should be anonymized to '(deactivated)'.
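
The argument handling described above can be tried on its own. Below is a minimal sketch that pulls the normalization step into a standalone method; the name normalize_anonymizable is hypothetical and exists only for this illustration, and the "phone" column is an invented example input.

```ruby
# Standalone version of the normalization step; the method name is hypothetical.
def normalize_anonymizable(*columns)
  columns.flatten.each_with_object([]) do |column, pairs|
    case column
    when Symbol, String
      pairs << [column.to_sym, nil] # bare column => anonymize to nil
    when Hash
      column.each { |key, value| pairs << [key.to_sym, value] }
    end
  end
end

pairs = normalize_anonymizable(:email, "phone", name: "(deactivated)")
p pairs
# => [[:email, nil], [:phone, nil], [:name, "(deactivated)"]]
p pairs.to_h[:name]
# => "(deactivated)"
```

Trailing keyword-style arguments arrive as a single Hash in columns, which is why the Hash branch is enough to support the name: '(deactivated)' form.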

Our models now declare their anonymization rules like this:

class User < ApplicationRecord
  include Anonymizable

  anonymizable name: '(deactivated)'

  belongs_to :account
  has_many :messages, dependent: :destroy
end

class Account < ApplicationRecord
  include Anonymizable

  anonymizable :email

  has_one :user, dependent: :destroy
end

Let's test it in a sandbox console (rails console --sandbox):

# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)

# Anonymize the user and account
user.anonymize!
user.account.anonymize!

# Check the anonymized fields
user.reload
user.name
# => "(deactivated)"
user.account.reload
user.account.email
# => nil

Advanced Anonymization: Tracking and Extending Anonymization Behavior

Building upon our foundational anonymization functionality, we will now introduce advanced requirements to further enhance how our application handles data anonymization. These new features include tracking the anonymization time, as well as providing a flexible mechanism to execute additional actions during the anonymization process. Specifically, we want to update a new status column in the User model to deleted every time a User record gets anonymized.

To align with these enhancements, our models will require the following updates in a Rails migration:

add_column :users, :anonymized_at, :datetime
add_column :accounts, :anonymized_at, :datetime
add_column :users, :status, :integer, null: false, default: 0

And we add the status enum to the User model:

class User < ApplicationRecord
  include Anonymizable

  anonymizable name: '(deactivated)'

  enum :status, %i[active deleted]

  belongs_to :account
  has_many :messages, dependent: :destroy
end

We'll make the following changes to our Anonymizable concern:

module Anonymizable
  extend ActiveSupport::Concern

  included do
    class_attribute :anonymizable_columns

    def anonymize!
      anonymizable_columns.to_h.merge(anonymized_at: Time.zone.now).each do |column, value|
        public_send("#{column}=".to_sym, value)
      end
      yield self if block_given?
      save!
    end

    def anonymized?
      anonymized_at.present?
    end

    private

    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end

  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns = (anonymizable_columns || []).dup # copy so a subclass never mutates its parent's list

      columns.flatten.each do |column|
        case column
        when Symbol, String
          self.anonymizable_columns << [column.to_sym, nil]
        when Hash
          column.each do |key, value|
            self.anonymizable_columns << [key.to_sym, value]
          end
        end
      end

      self.anonymizable_columns
    end
  end
end

The updated version of the Anonymizable concern includes the following key enhancements:

  • Tracking Anonymization Time: We introduce an anonymized_at attribute to our models, which records the exact time a record was anonymized. This attribute is updated within the anonymize! method by merging anonymized_at: Time.zone.now into the anonymizable_columns hash before the anonymization process begins.
  • Yielding Self for Additional Actions: The anonymize! method now accepts a block, yielding self to it. This allows the caller to perform further actions on the record being anonymized, adding a layer of flexibility to the anonymization process. For instance, a model can update additional attributes or perform checks during anonymization.
  • Determining Anonymization Status: A new instance method, anonymized?, checks for the presence of the anonymized_at attribute to determine if a record has been anonymized. This method provides a simple way to query the anonymization status of any record.
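
The ordering of yield and save! matters: the block runs before the record is persisted, so the caller's changes are written in the same save. The sketch below illustrates this in plain Ruby, assuming a hypothetical FakeRecord class whose save! only counts calls instead of touching a database.

```ruby
# Plain-Ruby stand-in; FakeRecord and save_count are hypothetical.
class FakeRecord
  attr_accessor :name, :status, :anonymized_at
  attr_reader :save_count

  def initialize(name:)
    @name = name
    @status = :active
    @save_count = 0
  end

  def anonymize!
    self.name = "(deactivated)"
    self.anonymized_at = Time.now
    yield self if block_given? # caller's changes land before the single save
    save!
  end

  def anonymized?
    !anonymized_at.nil?
  end

  private

  def save!
    @save_count += 1 # a real model would write to the database here
    true
  end
end

record = FakeRecord.new(name: "Jay Dilla")
record.anonymize! { |r| r.status = :deleted }
p record.status      # => :deleted
p record.anonymized? # => true
p record.save_count  # => 1
```

Calling anonymize! without a block still works, because the yield is guarded by block_given?.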

Let's test it in a sandbox console (rails console --sandbox):

# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)
user.status
# => "active"

# Anonymize the user and account
user.anonymize! do |record|
  record.status = :deleted # no need to save as our anonymize! calls save! at the end
end
user.account.anonymize!

# Check the anonymized fields
user.reload
user.name
# => "(deactivated)"
user.status
# => "deleted"
user.anonymized?
# => true
user.account.reload
user.account.email
# => nil
user.account.anonymized?
# => true

Conclusion

In this tutorial, we used a Rails concern to address a real-world requirement: anonymizing soft-deleted records. By building the Anonymizable concern in stages, from nil-only anonymization to custom placeholder values, timestamp tracking, and a block for extra actions, we've shown how modular, reusable code lets several models share one anonymization strategy while each model still controls which columns are affected and how.