Rails and PostgreSQL Arrays

This is something that I recently learned while working on Building Localize Ninja.

I have a Project model, where I need to store two sets of data, the source_locale and the target_locales. Basically, a Project represents a localization project that the translators/collaborators will be working on translating the app. The source_locale will only contain one language which will probably be English for many people and target_locales will be one to many languages.

The default way I approach this would have been creating a locales table where I insert all language that will be available and then creating a join table between the projects and the locales table. However, I wanted to see if there was a way to associates projects with multiple target_locales without joining tables.

At first, I thought about storing comma separated values as a string directly into a column called target_locales in the projects table. And after querying, I could split the string by commas and determine which languages the project is targeting. This is a bit archaic but I’ve seen it being done this way in other projects I have worked on and it seems to work for them. However, it felt a bit too hackity hack hack for me so I looked into it further.

After a few minutes of research, I learned that PostgreSQL supports storing arrays directly in a column. With this, rather than storing a list of languages in a separate table and then joining tables every time I want to figure out which target locales the project has, I could define the list of languages in Ruby as a constant and store the list of target languages in the array column, avoiding having an extra table just for the sake of keeping a list of languages.

To add an array column that will contain strings, migration will Rails will look something like this.

Note that you have to define what data type the array will hold. In my case, it will be strings, thus I used the text data type. If I wanted to store numbers, I would have used the integer data type instead.


Let’s say that we want to allow having projects to have the same name as long as they have different values for the target_locales column. In Ruby, arrays are considered to be not equal if they have the same values in different order. For example,

Let’s say that we want our program to recognize that [1, 2, 3] == [1, 3, 2] are equal and return true since they technically contain the exact same values, albeit in a different order. Within the world of Rails, we could have a before_save callback so that we sort our array before saving it into the database like this.

Something like the above will sort the list of target_locales that we have so that all of the locales saved in the database will be nil and empty string free and be sorted so that we can easily compare the target_locales list of one project to another.

If you wanted to make the above more performant, you could do the above within PostgreSQL land, sorting the array by creating a custom function and then creating a custom rule that will execute within an insert statement. I actually prefer doing things like this within Ruby land because there have been times where I was debugging something and couldn’t figure out why something unexpected was happening and then realizing a few hours later that there was a custom database rule that was firing that I didn’t realize existed. I think when I’m coding Rails apps, I tend to be stuck in Ruby land and tend to concentrate on the code in front of me. Thus, I generally like to have as much logic written in Ruby as possible unless I have a good reason to move operations out into the database. Usually when I consider such options, it’s due to performance bottlenecks.

Since I’m currently at the beginning stages of the project, I am sticking with doing this in Ruby land. If the project ever gets some users and I see performance bottlenecks, I will consider moving the above logic out into the database.