Posts Tagged

postgresql

Common Table Expressions within Rails

We use ActiveAdmin heavily at work.

And I have a love and hate relationship with it. While it’s an easy way to quickly spin up an admin dashboard in Rails, it is so opinionated that any time you want to do anything slightly out of the ordinary, it quickly becomes a nightmare for me. I guess people may say the same thing about Rails, but I digress…

I recently had to modify a CSV export within ActiveAdmin so that it includes two rows in the CSV export for each one row that’s in the database. For the sake of example, let’s say that we have a payments table where we have three columns.

In ActiveAdmin, you define how you want the csv export to look like in a “csv” block like this.

The above block of code will produce a csv file that will have one row per database row in the payments table. However, the requirement that I was given called for two csv rows per one database row with the processing_fee being displayed as amount. Since we can’t force ActiveAdmin’s CSV processor to produce two rows, we need to feed the CSV processor the collection with two records per one record already preprocessed. At first, I decided to attempt this with Common Table Expressions.

What’s a Common Table Expression (CTE)?

I think of Common Table Expressions as a temporary table that you construct within a query that you can query from. I think of it as a more temporary lite version of database views. Let’s say that you need to produce a result set with data coming from multiple tables. You could perform complex joins and then query from the joined result set, or you could create a CTE and simplify your queries. In my specific case, I needed to create a temporary table that contained two rows per one row in the database with the second row displaying the processing_fee as amount. How do we achieve this with Common Table Expression?

The query that I came up with was

The above query will construct a temporary table called cte_payments with the result set from the query contained inside the parenthesis. Then you can query the cte_payments table as you normally would. With the above query, we can do the following within ActiveAdmin to create a custom scope to feed into ActiveAdmin’s CSV processor.

The above in the if request.csv? block work in creating a collection of result sets that you want. However, find_by_sql method returns an Array rather than the ActiveRecord::Relation collection and unfortunately ActiveAdmin requires that you feed ActiveRecord::Relation into its CSV processor. If you are working outside of ActiveAdmin or in a situation where an Array will work fine for you, you can stop reading here. If you need your custom query that you execute within Rails to return an ActiveRecord::Relation, then read on.

If you must need to have your ActiveRecord query return an ActiveRecord::Relation collection, and you would like to stick with using CTEs, you can try using the postgres_ext gem that was created by the folks at DockYard. Unfortunately, the gem is no longer maintained. You can also dig into Arel and play in that world. There’s an excellent blog post on how to work in Arel to utilize Common Table Expressions here. I personally think that working directly with SQL is easier than working with a gem that’s no longer maintained and trying to figure out Arel, so I opted for rewriting my CTE query with a regular subquery that I can use with the ActiveRecord’s from method. If I rewrite the CTE query that I wrote above as a subquery, and use ActiveRecord to return an ActiveRecord::Relation collection, it would look like the following.

The above method will return an ActiveRecord::Relation instead of an Array that you get with using Common Table Expressions.

Rails and PostgreSQL Arrays

This is something that I recently learned while working on Building Localize Ninja.

I have a Project model, where I need to store two sets of data, the source_locale and the target_locales. Basically, a Project represents a localization project that the translators/collaborators will be working on translating the app. The source_locale will only contain one language which will probably be English for many people and target_locales will be one to many languages.

The default way I approach this would have been creating a locales table where I insert all language that will be available and then creating a join table between the projects and the locales table. However, I wanted to see if there was a way to associates projects with multiple target_locales without joining tables.

At first, I thought about storing comma separated values as a string directly into a column called target_locales in the projects table. And after querying, I could split the string by commas and determine which languages the project is targeting. This is a bit archaic but I’ve seen it being done this way in other projects I have worked on and it seems to work for them. However, it felt a bit too hackity hack hack for me so I looked into it further.

After a few minutes of research, I learned that PostgreSQL supports storing arrays directly in a column. With this, rather than storing a list of languages in a separate table and then joining tables every time I want to figure out which target locales the project has, I could define the list of languages in Ruby as a constant and store the list of target languages in the array column, avoiding having an extra table just for the sake of keeping a list of languages.

To add an array column that will contain strings, migration will Rails will look something like this.

Note that you have to define what data type the array will hold. In my case, it will be strings, thus I used the text data type. If I wanted to store numbers, I would have used the integer data type instead.

Validations

Let’s say that we want to allow having projects to have the same name as long as they have different values for the target_locales column. In Ruby, arrays are considered to be not equal if they have the same values in different order. For example,

Let’s say that we want our program to recognize that [1, 2, 3] == [1, 3, 2] are equal and return true since they technically contain the exact same values, albeit in a different order. Within the world of Rails, we could have a before_save callback so that we sort our array before saving it into the database like this.

Something like the above will sort the list of target_locales that we have so that all of the locales saved in the database will be nil and empty string free and be sorted so that we can easily compare the target_locales list of one project to another.

If you wanted to make the above more performant, you could do the above within PostgreSQL land, sorting the array by creating a custom function and then creating a custom rule that will execute within an insert statement. I actually prefer doing things like this within Ruby land because there have been times where I was debugging something and couldn’t figure out why something unexpected was happening and then realizing a few hours later that there was a custom database rule that was firing that I didn’t realize existed. I think when I’m coding Rails apps, I tend to be stuck in Ruby land and tend to concentrate on the code in front of me. Thus, I generally like to have as much logic written in Ruby as possible unless I have a good reason to move operations out into the database. Usually when I consider such options, it’s due to performance bottlenecks.

Since I’m currently at the beginning stages of the project, I am sticking with doing this in Ruby land. If the project ever gets some users and I see performance bottlenecks, I will consider moving the above logic out into the database.

PostgreSQL Copy to/from feature

While completing an interview coding challenge recently, I learned of a feature in PostgreSQL that allows one to load large amounts of data into a PostgreSQL database.

Sometimes, you get a request to seed large amounts of data from a text file or a CSV file into a database table. Or sometimes, you get a request to export large amounts of data from a database table out to a CSV file. Let’s just stick with the seeding example to see how we may do this in ActiveRecord. Below is an example of seeding data from a CSV into a users table.

Importing data from a CSV file

I didn’t test the code above, but it should be something like that. The above would actually work. However, if there are a large amounts of user data, let’s say millions of rows, the above code may take awhile to run. Also, if the User model has a lot of ActiveRecord callbacks, we would take a large performance hit due to the fact that we need to go through the ActiveRecord abstraction layer.

How do we get around this issue and copy data to/from the database faster? Well, turns out PostgreSQL has a neat little feature called COPY that can do this super fast. The link to the PostgreSQL documentation on COPY is below.

https://www.postgresql.org/docs/9.6/static/sql-copy.html

Basically, you can copy to and from a database table. Let’s convert the above script into utilizing the COPY feature.

Boom. And that will execute in seconds instead of potential hours. Much faster. The first “users” refers to the table name and the columns list in the parenthesis refers to the users table’s columns. You can actually omit the columns list and PostgreSQL will just figure it out if you have the “HEADER CSV” parameters set like the example above, but I like a bit more specificity in my COPY statements.

One issue you may run into if you try executing the above in a production environment where you have the csv file in the application directory is that PostgreSQL’s COPY commands are read and written directly by the server. Which means that the CSV file must reside on the server where the database resides. In a typical production environment, this will not be the case since the application server and the database server will live on separate servers. Fortunately, there’s an easy way to get around this issue. Take a look at the modified script below.

This script is longer than the first example, but it allows you to copy CSV data that lives on the application server into the database in a separate server. So, if you ever do data seeding type of work in migration files in production environments, you would want to go with the example script above.

There’s the last “gotcha” that you might run into when using the COPY feature with Ruby on Rails. Database tables that map to ActiveRecord models in typical Rails application generally have the created_at and updated_at columns that cannot be null. If you use the COPY feature above, you may run into a situation where the script fails because the created_at and updated_at columns cannot be null. This won’t be an issue if your CSV files have this data set defined, but it most likely won’t. To get around this issue, generate a migration that will set all new database entries with created_at and updated_at columns to the current datetime.

Exporting data out to a CSV file

Now, what about exporting data out a CSV? This is pretty straightforward.

Yep, that’s literally it. You can write any SQL statement you want within the parameters.

Rails gem to help simplify this process

Some developers are either not comfortable with writing raw SQL statements (I have to admit, I sometimes fall into this category as well if it’s a complex query) or they simply prefer the abstraction that modern ORMS provide. If you prefer performing the above operations in pretty Ruby code, there’s a gem called “postgres-copy” that provides this abstraction layer.

https://github.com/diogob/postgres-copy

I personally haven’t used the gem yet because I feel fine writing raw COPY statements, but for those who prefer writing the above operations in Ruby should try the above gem.