Category

Ruby on Rails

Build RSS feeds with Rails

I started a new position at a company called Modern Message last Monday. I’ll cover how to build RSS feeds in the Rails framework in this post.

One of my first tasks was to build an integration with Contentful (https://www.contentful.com/). A bit of a random side note, but Contentful is a blogging platform like WordPress that focuses on an API-first interface. Thus, for developers who don't want to build their own blogging platform into their applications, platforms like Contentful can be useful, as they provide an interface for writing blog posts and an API for consuming them. I'd never heard of a service like this, so I did some research into it, and it looks like this is a new software niche that's becoming popular these days.

Back to the main topic, one of the feature requests in the Contentful integration project was to expose an RSS feed that the public can consume. I’ll go over how to quickly build one in this post.

Let’s say that we have a model called Book and we want to expose all books that we have. The ActiveRecord model looks something like this.
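
Here's a minimal sketch; I'm assuming the books table has title and description columns along with the standard timestamps.

  class Book < ApplicationRecord
    # assumes a books table with title, description, created_at, updated_at
  end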

And our corresponding standard Rails controller looks something like this.
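
Nothing fancy, just a standard index action:

  class BooksController < ApplicationController
    def index
      @books = Book.all
    end
  end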

And our config/routes.rb file looks something like this.
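
A sketch with just the books resource:

  Rails.application.routes.draw do
    resources :books
  end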

Fairly simple so far right? It gets simpler.

In our BooksController, we need to add a block to declare which HTTP request formats our BooksController#index route will respond to. We want it to respond to both the html and rss formats: html because I'm assuming that you already have a corresponding HTML page for the index route, and rss to return the feed that client applications will consume. Let's modify our BooksController to look like this.
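
  class BooksController < ApplicationController
    def index
      @books = Book.all

      respond_to do |format|
        format.html
        format.rss
      end
    end
  end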

Now, create a new RSS feed file at app/views/books/index.rss.builder. This file builds the RSS feed that your clients will consume.
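
Here's a sketch of the builder file; the channel title and description, and the exact book attributes, are placeholders you'd swap for your own.

  xml.instruct! :xml, version: "1.0"
  xml.rss version: "2.0" do
    xml.channel do
      xml.title "Books"
      xml.description "All of our books"
      xml.link books_url

      @books.each do |book|
        xml.item do
          xml.title book.title
          xml.description book.description
          xml.pubDate book.created_at.to_s(:rfc822)
          xml.link book_url(book)
          xml.guid book_url(book)
        end
      end
    end
  end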

Now, if you make an HTTP request to the BooksController#index endpoint with the rss format, you will get a response that looks something like this.
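
With the sketch above, the response would look roughly like this (the book itself is made up, of course).

  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Books</title>
      <description>All of our books</description>
      <link>http://localhost:3000/books</link>
      <item>
        <title>Some Book Title</title>
        <description>Some book description</description>
        <pubDate>Sat, 01 Jul 2017 00:00:00 +0000</pubDate>
        <link>http://localhost:3000/books/1</link>
        <guid>http://localhost:3000/books/1</guid>
      </item>
    </channel>
  </rss>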

And that just about covers how to build RSS feeds within the context of Ruby on Rails!

PostgreSQL Copy to/from feature

While completing an interview coding challenge recently, I learned of a feature in PostgreSQL that allows one to load large amounts of data into a PostgreSQL database.

Sometimes, you get a request to seed large amounts of data from a text file or a CSV file into a database table. Or sometimes, you get a request to export large amounts of data from a database table out to a CSV file. Let’s just stick with the seeding example to see how we may do this in ActiveRecord. Below is an example of seeding data from a CSV into a users table.

Importing data from a CSV file
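
Something like this, assuming a users.csv in the application root with name and email headers:

  require "csv"

  CSV.foreach(Rails.root.join("users.csv"), headers: true) do |row|
    User.create!(name: row["name"], email: row["email"])
  end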

I didn't test the code above, but it should look something like that, and it would work. However, if there is a large amount of user data, say millions of rows, the above code may take a while to run. Also, if the User model has a lot of ActiveRecord callbacks, we take a large performance hit because every row goes through the ActiveRecord abstraction layer.

How do we get around this issue and copy data to/from the database faster? Well, turns out PostgreSQL has a neat little feature called COPY that can do this super fast. The link to the PostgreSQL documentation on COPY is below.

https://www.postgresql.org/docs/9.6/static/sql-copy.html

Basically, you can copy to and from a database table. Let's convert the above script to use the COPY feature.
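
A sketch, using the same hypothetical users.csv with name and email columns:

  ActiveRecord::Base.connection.execute(<<-SQL)
    COPY users (name, email) FROM '/path/to/users.csv' CSV HEADER;
  SQL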

Boom. And that will execute in seconds instead of potentially hours. Much faster. The first "users" refers to the table name, and the column list in the parentheses refers to the users table's columns. You can actually omit the column list and PostgreSQL will just figure it out if you have the "CSV HEADER" options set like the example above, but I like a bit more specificity in my COPY statements.

One issue you may run into if you try executing the above in a production environment, where the CSV file lives in the application directory, is that files referenced in PostgreSQL's COPY commands are read and written directly by the database server. This means the CSV file must reside on the server where the database resides. In a typical production environment, this will not be the case, since the application server and the database server live on separate machines. Fortunately, there's an easy way to get around this issue. Take a look at the modified script below.
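
Here's a sketch using the pg gem's COPY FROM STDIN support, which streams the file from wherever your Ruby process runs:

  # grab the raw PG::Connection underneath ActiveRecord
  conn = ActiveRecord::Base.connection.raw_connection

  conn.copy_data("COPY users (name, email) FROM STDIN CSV HEADER") do
    File.foreach(Rails.root.join("users.csv")) do |line|
      conn.put_copy_data(line)
    end
  end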

This script is longer than the first example, but it allows you to copy CSV data that lives on the application server into a database on a separate server. So, if you ever do data-seeding work in migration files in production environments, you'll want to go with the example script above.

There's one last "gotcha" that you might run into when using the COPY feature with Ruby on Rails. Database tables that map to ActiveRecord models in a typical Rails application generally have created_at and updated_at columns that cannot be null. If you use the COPY feature above, you may run into a situation where the script fails because those columns cannot be null. This won't be an issue if your CSV files define these values, but they most likely won't. To get around this, generate a migration that sets the created_at and updated_at columns of all new database entries to the current datetime by default.
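
A sketch of such a migration, using raw SQL since the defaults are PostgreSQL specific (adjust the migration version to match your Rails version):

  class AddTimestampDefaultsToUsers < ActiveRecord::Migration[5.1]
    def up
      execute "ALTER TABLE users ALTER COLUMN created_at SET DEFAULT now()"
      execute "ALTER TABLE users ALTER COLUMN updated_at SET DEFAULT now()"
    end

    def down
      execute "ALTER TABLE users ALTER COLUMN created_at DROP DEFAULT"
      execute "ALTER TABLE users ALTER COLUMN updated_at DROP DEFAULT"
    end
  end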

Exporting data out to a CSV file

Now, what about exporting data out to a CSV? This is pretty straightforward.
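
A sketch:

  ActiveRecord::Base.connection.execute(<<-SQL)
    COPY (SELECT * FROM users) TO '/path/to/users_export.csv' CSV HEADER;
  SQL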

Yep, that's literally it. You can put any SQL statement you want within the parentheses.

Rails gem to help simplify this process

Some developers are either not comfortable with writing raw SQL statements (I have to admit, I sometimes fall into this category as well if it's a complex query) or simply prefer the abstraction that modern ORMs provide. If you prefer performing the above operations in pretty Ruby code, there's a gem called "postgres-copy" that provides this abstraction layer.

https://github.com/diogob/postgres-copy

I personally haven't used the gem yet because I feel fine writing raw COPY statements, but those who prefer performing the above operations in Ruby should give the gem a try.

Facebook Authentication for Rails APIs

I've implemented Facebook login plenty of times on the Android side of things, but I had never implemented one on the API side.

Granted, yes, I’ve implemented Facebook OAuth when building Ruby on Rails applications, but have never implemented one when building out an API that’s supposed to take in a Facebook access token from the client side and then retrieve user information with it to sign in or register a new user.

When implementing Facebook OAuth (or Google, Twitter, etc.), I usually just default to OmniAuth and rely on the various OAuth strategy plugins written for the gem. Unfortunately, the OmniAuth gem's default behavior doesn't lend itself well to building OAuth flows within an API. Thus, we need a different and much simpler solution.

Enter the koala gem. I've always known of its existence and have used it at times, but I never thought it would be useful for retrieving user profiles with a Facebook access token. Using the gem to implement Facebook authentication for your Rails API is incredibly simple. Let's say that you have a User model that you need to sign in or create using Facebook access tokens that are passed into your API by mobile devices. Also, let's say that you have a controller called FacebookAuthenticationsController whose sole purpose is to take in a facebook_access_token and retrieve or create a user using the token. First, let's write the controller that will take in the token.
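
Here's a sketch of that controller:

  class FacebookAuthenticationsController < ApplicationController
    def create
      user = User.find_or_create_with_facebook_token(params[:facebook_access_token])

      if user.persisted?
        render json: user
      else
        render json: { errors: user.errors.full_messages }, status: :unprocessable_entity
      end
    end
  end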

The create method takes in a facebook_access_token and tries to find or create a User using the token. If the user is persisted (i.e., exists in the database), we return the user data. If not, we return the user's validation errors. If you're familiar with ActiveRecord, you'll quickly notice that find_or_create_with_facebook_token is not an ActiveRecord method. ActiveRecord methods use the term "by" rather than "with", so ActiveRecord's version of the method would be User.find_or_create_by_facebook_token.

Our users table in our schema.rb looks like the following.
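
Here's an abbreviated version, with the devise columns trimmed down to the essentials:

  create_table "users", force: :cascade do |t|
    t.string   "email",              default: "", null: false
    t.string   "encrypted_password", default: "", null: false
    t.string   "name"
    t.string   "picture_url"
    t.string   "uid"
    t.string   "provider"
    t.string   "oauth_token"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end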

Notice that I'm using devise (although it's completely unimportant to this post). The important columns in the users table for this Facebook login/registration implementation are: name, picture_url, uid, provider, and oauth_token. Now, let's get to writing our find_or_create_with_facebook_token method in our User model.
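
Here's a sketch; the exact Koala calls and Graph API fields may differ slightly depending on your versions of the gem and the API:

  class User < ApplicationRecord
    def self.find_or_create_with_facebook_token(oauth_access_token)
      graph   = Koala::Facebook::API.new(oauth_access_token)
      profile = graph.get_object("me", fields: "id,name,picture")

      attributes = {
        name:        profile["name"],
        picture_url: profile.dig("picture", "data", "url"),
        provider:    "facebook",
        oauth_token: oauth_access_token
      }

      # uid is the Facebook user id; unlike access tokens, it never changes
      user = find_or_initialize_by(uid: profile["id"])
      # if you use devise's :validatable module, you'll also need to set
      # email and password here for new records
      user.update(attributes) # creates the record if it's new, updates it otherwise
      user
    end
  end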

In our method, we use the oauth_access_token along with the Koala gem to retrieve the user graph from Facebook. We retrieve the profile data from the graph and then store necessary data in a hash. We then use the uid (which really is the Facebook user id) to try to find whether the user exists in the database. If the user exists, we update the user attributes and return that user and if the user does not exist, we create a new user using the data that we retrieved from Facebook. The reason uid is important here is because while the oauth_access_tokens will eventually expire and change, the uid from Facebook does not. Thus we use that attribute to check whether we already have the user in our database or not.

Simple, yes? I especially like this method if you're building an API that will rely solely on Facebook Login for authentication, because you don't have to add larger dependencies like Devise or OmniAuth.

How to install Ruby on Rails

I recently had to walk a client through installing Ruby on Rails on his machine. It made me realize that it's not exactly a straightforward process. I'll go over how to install Ruby on Rails step by step on Mac OS.

The instructions found here can be followed on Mac OS (or OS X if you haven't upgraded in awhile). I know there are posts that show you how to install Ruby on Rails on specific versions of the Mac operating system, but I've found the steps to be more or less the same in my experience.

Just FYI, all of the commands will be run in the bash terminal. They should also work without any problems on zsh. If you're on fish, you may have to modify things a bit, especially when you install RVM (Ruby Version Manager).

1. Install Homebrew

If you're a developer, you probably have Homebrew installed already, but go ahead and install it if you don't. For those who don't know what Homebrew is, it's just a package manager for Mac OS, kind of like "apt" in Debian-based Linux distributions.

Follow the directions in https://brew.sh/

At the time of this writing, it says to run this command in your terminal.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2. Install xcode-select

You need to have the command line tools for Xcode installed. Even if you think you already have this, run this command in your terminal:

xcode-select --install

This may take some time to install. Be patient and let the installation finish. Once finished, move onto the next step.

3. Install libxml2

Nokogiri is a Ruby gem that also happens to be a dependency of Ruby on Rails. The thing is, the nokogiri gem's installation will fail if you don't have libxml2 installed. You can install it with Homebrew by running

brew install libxml2

4. Install RVM and a Ruby version

RVM stands for Ruby Version Manager. There are two major Ruby version managers: RVM and rbenv. It's preferable to use a version manager because it separates the Ruby installation that you use for actual development work from the one that's installed directly on your system. This makes it easy to manage gems as well as the various Ruby versions you may encounter when you work on different Rails applications.

Both RVM and rbenv work fine. I'm going to recommend installing RVM here mostly because that's what I'm used to, but it really doesn't matter which Ruby version manager you use. If you strongly prefer rbenv, go ahead and install that one.

As far as installing RVM, you can find the directions on their website https://rvm.io/

If you don’t feel like searching through the website for installation directions, just run this in your terminal.

\curl -sSL https://get.rvm.io | bash -s stable --ruby

If you get some error that says that you don’t have curl, you can install curl via homebrew

brew install curl

Once you have installed RVM, install the Ruby version you need via RVM. For example, if you need to install Ruby version 2.4.1 (current latest version at the time of this writing), you can install this via running this command in your terminal.

rvm install 2.4.1

5. Install bundler and Rails

Once you have installed RVM and a Ruby version via RVM, install the gem bundler via the command

gem install bundler

After you install bundler you can finally install Rails by running

gem install rails

Or you can cd into a Rails project directory and run

bundle

and bundler will install the correct Rails version and all required dependencies for you.

6. Optional – Install MySQL and/or PostgreSQL

You’ll notice that many Rails projects will be using MySQL or PostgreSQL. You can install these two databases with ease via Homebrew. The commands to install them are just

brew install mysql

and

brew install postgresql

AAAND that should be it! If you have any questions about installing Rails on your machine, feel free to leave a comment or email me at chris0374@gmail.com

API key authentication with Rails

I recently completed a code challenge where part of the challenge was to build an API that authenticated a request via an API key. I’ll go over the basics of implementing this in this blog post.

Even when I'm working on an API-specific Rails project, I prefer to namespace my Rails apps with "/api". This gives me the flexibility to build full web applications on top of the same codebase in the future using the top-level namespace if I want to. Also, it just feels "right".

Let’s say that we’re going to be building an API that returns user data. Let’s assume that in our codebase, the database and the ActiveRecord model are already there and that the users table contains a column called api_key. In order to use the API, the client needs to pass a valid api_key in the header.

So, let's start by namespacing our API in config/routes.rb. I'm also going to version the API here, but that part is up to you and what your requirements call for.
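
Here's a sketch of the routes, with a users resource as the example endpoint:

  Rails.application.routes.draw do
    namespace :api, defaults: { format: :json } do
      namespace :v1 do
        resources :users, only: [:index, :show]
      end
    end
  end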

Here, you can see that I'm namespacing the API and setting the default format of all requests to the endpoints under the api namespace to JSON. This makes every controller under the api namespace handle requests as JSON.

Whenever I’m creating a group of controllers under a new namespace, I like to create a new top-level controller. To do this, create a new folder named “api” directly under your “controllers” folder and create a new controller called ApiController. The new controller will look like this.
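
  module Api
    class ApiController < ApplicationController
      before_action :authenticate

      protected

      def authenticate
        @user = User.find_by(api_key: request.headers["X-Api-Key"])

        # rendering here halts the chain before the action runs
        head :unauthorized unless @user
      end
    end
  end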

And just for clarification, we'll be inheriting all controllers under the "api" namespace from this new Api::ApiController. For example, if we have a UsersController under the namespace, it'll look like this. Notice that this UsersController is under the namespace /api/v1/.
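
  module Api
    module V1
      class UsersController < Api::ApiController
        def index
          render json: User.all
        end

        def show
          render json: User.find(params[:id])
        end
      end
    end
  end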

The api_key authentication happens in the ApiController, so let's go through its code piece by piece.

In our ApiController, we have a protected method called authenticate that is called on each API request via a before_action callback. In this method, we take the API key that is passed in via a header called "X-Api-Key" and use it to query our database to see if a user with that key exists. If that user exists, we store the instance in the @user instance variable and allow the request to go through; if not, we return an unauthorized HTTP status code and halt the request.

I found this to be a good and simple way to implement API authentication via API keys in Ruby on Rails. It gives you a good base to build more advanced features, like tracking how many API requests are being made per API key and then limiting (or throttling) them if needed.

Single source of truth in data

This post is about something I learned in one of my client projects a few years ago.

Wikipedia defines single source of truth as:

… the practice of structuring information models and associated schemata such that every data element is stored exactly once. Any possible linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only.

This definition sounds overly complicated to me. So my definition of single source of truth is:

Anytime you store or retrieve data, it should come from one place and one place only. If you retrieve it twice or more, each instance of the data retrieved should be exactly the same.

Adhering to the single source of truth principle when designing your system seems like common sense that isn’t difficult to adhere to. Unfortunately (or fortunately, depending on how one might decide to treat this experience), I had an experience working on a project where this seemingly simple rule was violated.

I won’t mention the name of the project due to confidentiality reasons. I also wasn’t part of the initial design of the project, but instead came in once the project was completed and in the maintenance phase.

System Design of the project

The main application was a mobile app for both iOS and Android, with the API provided by the client. The client actually exposed three different APIs that all returned the same data but with different values. This is where the bulk of the problem began.

For example, let's say that the API returned data on restaurants. You're making a request for a specific restaurant whose primary key you have. So you make a request to all three endpoints requesting information on the same restaurant. But for some odd reason, the type of the restaurant (for example, Brazilian, Italian, or French) that you get back from the three different endpoints is different in each case. How can this be?

It turned out that the client's system was designed more or less like this.

It turned out the reason for this was that the client was using the three different APIs for their current web application and wanted to display different versions of the same data depending on the page that the user was viewing. I'm not sure why this was the case, but that's the way it was working live in production, and it's what the client's stakeholders wanted.

The API also had extremely slow response times, which made it an issue for the mobile development team. So the solution that was proposed and implemented to alleviate the issues of three different endpoints and slow response times was to develop an intermediary Rails API that would make requests to the three existing APIs at different intervals throughout the day, aggregate the data, and figure out which data to store based on specific business rules provided by the client.

So the resulting system design ended up something like this.

This sounds "workable" in theory. You introduce another API to act as the "single source of truth" for the mobile apps, and the response times of this new Rails API were fast. Problem solved, right? Unfortunately, there were a few problems that came along with this.

1 – Complicated business rules

Not only were there way too many business rules to account for, the business rules changed every now and then, which meant the crazy amounts of business logic in the Rails API had to be changed, increasing the chance of introducing bugs. In fact, due to holes in the logic, many duplicate entries of the same data were stored in the Rails API. This resulted in users seeing duplicate data in the mobile apps, and sometimes just seeing things like "null" as the names of items. The only way to alleviate this was to literally console into the production database, pick out the incorrect data, and manually delete it.

2 – Race against time

As mentioned above, this system design produced a lot of duplicate data that had to be manually deleted in production. The worst part of it all was that because the Rails API was fetching data from the three different APIs at specific intervals, the duplicate data would be stored again and again throughout the day. The situation was literally

  1. Damn it, there's duplicate data in the database. Okay, carefully pick out the incorrect one so that I don't accidentally delete the real data (but again, how do I know which one is real or fake?).
  2. Okay, deleted it safe and sound, whew! Ah shit, there’s another one? Okay, delete that too.
  3. Shit, the background scheduler is going to run in 10 minutes, that means more duplicate data. !@#!@#!@#!@#!@#.
  4. Try my best to patch up the business rules (and then have them change which eventually broke more things) and delete duplicate data throughout the day.

Maintaining the system was literally a race against time, like playing a video game that was impossible to win.

What should have been the ideal solution?

There was a solution proposed by the tech teams on both sides (our company and the client) where the client would provide a single endpoint from which the Rails API would retrieve its data. This way, the responsibility for keeping data integrity would rest with the API designed by the client, rather than with data fetched from three different endpoints and aggregated in the Rails API.

Unfortunately, I left the company before the solution was implemented, so I have no idea whether it succeeded or not. Looking back on it now, I don't think keeping the Rails API would have been the correct solution. The ideal solution would have been for the client to provide a single, properly managed API with correct data that the mobile apps could make requests to directly. Having the Rails API in the middle of it all was an additional layer of complexity that was completely unnecessary.

Lessons learned

The lesson learned from this project is an obvious one that I knew beforehand, but the knowledge solidified after the experience. Always always ALWAYS ALWAYS ALWAYS (I can't emphasize this enough) have your data come from a single source. Do not make your source of data a guessing machine (in this project, the Rails API was a giant guessing machine). Also, if you have to manually go into your production database to delete duplicate data because your system stores duplicate data throughout the day, the system needs to be redesigned.

System Tests in Rails 5.1 are awesome

System tests were announced for Rails 5.1 a few months ago. A plethora of blog posts have been written about this topic, but I'm still going to write about it due to how much it improved my testing life.

For people who haven’t seen the RailsConf talk about system tests, below is the video.

What are System Tests in Rails?

Just for those who may not be familiar with the term "system tests": they simulate real user actions in a browser, automatically clicking through various parts of your application, and then assert that certain things should have happened. One example would be: a user clicks a button, a modal pops up, and so forth.

This was always possible in previous versions of Rails by using the capybara gem, but configuring it to work flawlessly, especially when there is a lot of JavaScript code in your application, was a nightmare. Maybe other people were able to get it working well, but I never could, especially when using it with Rails's default testing framework. Most of the documentation on integrating capybara was based on RSpec, and even then, the tests would be flaky, especially when there were a lot of JavaScript interactions on the page being tested. Thus, in many cases, I would just not write capybara tests due to the sheer headache of making them work.

Setting up capybara properly involved installing capybara, configuring it to work with your testing framework, configuring database_cleaner to run properly alongside your other tests, and choosing and configuring your capybara driver (poltergeist, capybara-webkit, etc.). If one setting in your configuration was off, your tests were going to be flaky.

Awesomeness of System Tests baked into Rails

The problems of manually integrating capybara to work with previous versions of Rails are completely solved with system tests. It works… out of the box… with no configuration on your part.
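
For example, here's a minimal system test, assuming a books index page with a "Books" heading. The generated test/application_system_test_case.rb already configures the selenium driver for you.

  require "application_system_test_case"

  class BooksTest < ApplicationSystemTestCase
    test "visiting the books index" do
      visit books_url

      assert_selector "h1", text: "Books"
    end
  end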

The way I used to test Rails applications was this:

  • Write controller tests, assert proper responses
  • Write unit tests for some critical classes
  • Write capybara tests for absolutely critical pages, and then get annoyed when the tests run in a flaky manner

In some cases, I would simply not write capybara tests and rely on the QA process to catch bugs. With system tests baked into Rails, I found myself writing a ton of capybara tests just because it was much easier than before (not to mention that seeing the fast automated interactions in the browser driven by selenium is fun).

This was one of those “Ahh, I feel happy that this exists” posts. Props to the Rails team for finally baking this feature in.

Managing localization in Rails

In large applications, managing localization files can become a behemoth of a task, especially if you're working with external translation teams who will be providing you with proper translations of YAML files (or XML, Ruby hash files, etc.).

Most Rails applications that have an international user base will have localization features built in. This means that in your views, rather than having something like this
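
  <h1>Welcome back!</h1>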

You’ll have something like this instead
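
  <h1><%= t('welcome_back') %></h1>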

The t('key_name') helper will search for the matching key in your localization file and display the matching string based on your app's locale setting. Most of how to set your app-wide locale can easily be found by Googling. This blog post will cover the lessons I've learned about managing localization in an organized way while working on a large Rails application.

Set team wide formatting rules

The first and most important thing when working in a team environment on a large Rails application is to agree upon a formatting rule when adding in new localization texts. There are several reasons for this.

  1. Consistency between naming convention for localization keys
  2. Eliminates potential duplicate strings
  3. Reduces chance for Rails app to fail booting up due to errors in string formatting
  4. Establishes clear set of rules when working with third party vendors

Let’s go over these one by one.

#1 – Consistent Naming Conventions

It’s not uncommon to see thousands of localization keys in a large Rails application. It may be easy to find duplicate strings when your application is small, but when your localization files contain thousands of strings, it can be difficult to search for the correct key you want to use when building out your HTML pages. Also, not having a team wide naming convention set can lead to duplicate strings, which can cause hard to catch bugs in your views.

Let's say, for example, that you have three keys in your YAML file that go like this:
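
  en:
    welcome_back: "Welcome back"
    label_welcome_back: "Welcome back"
    text_welcome_back: "Welcome back!"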

All three keys display the text "Welcome back" with a few differences. The keys "welcome_back" and "label_welcome_back" both display "Welcome back", while the key "text_welcome_back" displays the same text with an exclamation point.

Let’s say you have an application with multiple developers working on it. If you have a localization file like the one above, different developers may start using different keys based on what their “Find” function in their editor will locate first. Depending on what text you’re supposed to display, some developers may end up displaying the incorrect version of the text. For example, what if you’re not supposed to display the exclamation point in the view, but you end up doing so because you accidentally used “text_welcome_back”?

The issue becomes even more apparent when you have to start changing the actual text in the views. To change the text behind the keys, you'll have to go into your localization files, try to figure out which text you're supposed to change, and hope that changing one of the "welcome_back" texts won't negatively affect all the other views that might be incorrectly using that key. For example, if you change the text for "welcome_back" to close a ticket for your specific page, you risk creating a bug on every other page that wasn't supposed to be using that specific "welcome_back" key.

Therefore, it’s important to set team wide rules on how to name localization keys. It really doesn’t matter how you do it. The key is to set a ground rule and have your team be consistent with it when adding new keys. I’ve seen it done in different ways. Some teams have duplicate texts, but have different keys based on where they’re supposed to use that specific version of the key. For example, “text_welcome_back” may just be a simple text while “label_welcome_back” may only be used as a label on a form. I’ve seen teams just stick with “welcome_back” and that’s the only key for that text that they’re allowed to have.

One final important rule that you should set (because I always have trouble keeping this consistent) is whether you should include punctuation in your keys or not, and if you do, how you should name your keys. For example, if you really do need to have “Welcome back!” as the text, would you leave the exclamation point in your localization text or hardcode it into your HTML? If you have it in your localization file, should you name your keys differently like “welcome_text_exclamation”? Personally, I think you should have the exclamation point inside your localization files because certain languages like Spanish will have the upside down exclamation point before the first word. In these cases, it’ll make translating your app into other languages much easier since punctuation will already be accounted for in your localization files for languages that treat them differently.

#2 – Eliminates potential duplicate strings

I mentioned one potential issue that may arise from having inconsistent naming conventions where it may become difficult to figure out which locale string you’re supposed to be using and/or changing as you code. Another benefit of having consistent naming conventions is that you eliminate duplicate keys/value pairs in your localization files. For example, when I search for a text to use in a localization file, I’ll usually just use my text editor’s search feature to look for which key I should be using. Sometimes, I’ll find two key/value pairs with the same exact texts but different keys. Sometimes I’ll find similar texts but with different keys. This always makes me go, “Okay… which one am I supposed to be using?”.  Not having any duplicate keys and texts in your localization files reduces this unnecessary mental road block.

#3 – Reduces chance for Rails app to fail booting due to errors in formatting

If your localization files have improperly formatted strings, your Rails app will fail to boot. Worst of all, the logs will not tell you exactly which of your localization files is causing the issue, nor will they tell you which line. This is incredibly frustrating to debug and always has me performing a manual binary search (temporarily delete the first half of a YAML file, see if the app boots, and if not, repeat) through large localization files to figure out which key/value pair is causing the issue. And yes, sometimes it's literally one key/value pair that's blocking the entire app from booting.

The cause of improper string formatting can vary. Sometimes it can even be caused by inconsistent string formatting between various key/value pairs. Therefore, it's much more sane to keep a consistent rule when it comes to formatting the string values for your texts. I like to follow these rules when it comes to Rails apps.

  • Always wrap your strings in double quotes
    • You'll see that the values for certain locale keys won't have any sort of quotation marks surrounding them. While this may work, if some of your texts require double quotes, you may run into parsing errors in the key/value pairs that follow.
    • If you wrap your strings in double quotes, you can easily add in single quotes inside your double quotes without any issues.
    • Double quotes allow you to easily interpolate variables into your locale calls
  • If your text requires you to display double quotes, then simply escape the double quotes with a backslash.
    • For example, if your text requires you to display Hi "Benjamin", then something like hi_benjamin: "Hi \"Benjamin\"" will display the double quotes and let the parser move on to the next line in your localization files without any parsing issues.

If you follow these two rules consistently, you will have zero (or very few) issues when it comes to experiencing boot up errors in your Rails apps. In addition, having these set of rules will make working with third party translation vendors much easier.

#4 – Establishes clear set of rules when working with vendors

As mentioned in the last point, if you’re working with professional third party translation vendors, having a clear set of formatting rules will make your life 10 times easier. If you don’t set some sort of ground rules with your translation vendor, they may send you huge YAML files with inconsistent formatting rules, causing your Rails app to crash. However, if your team already has a clearly defined set of rules, you’ll have a more smooth experience working with third party vendors.

Autogenerated translations

It is entirely possible to have your app translated into different languages automatically via services like Google Translate and Yandex. Sounds magical? Well, there are advantages and disadvantages to this method.

Advantages:

  • Fast and easy way to have your app translated into multiple languages
  • Cheaper than hiring real human translators
  • You can write a rake task to auto translate any new strings you have added into the languages that are made available in your app

The way this works is that you write a script, run as a rake task, that grabs all of the key/value pairs of your locale texts, loops through them, sends each value to a translation API (Google Translate, Yandex, etc.), and builds up a new YAML file for the language of your choice, something like the sketch below.
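
Here's a rough sketch of such a rake task; translate_text is a hypothetical helper wrapping whichever translation API you choose.

  # lib/tasks/auto_translate.rake
  namespace :locales do
    desc "Auto-translate en.yml into a target language, e.g. rake locales:auto_translate[es]"
    task :auto_translate, [:target] => :environment do |_t, args|
      require "yaml"

      target = args[:target]
      source = YAML.load_file(Rails.root.join("config", "locales", "en.yml"))

      translated = deep_translate(source["en"], target)

      File.write(
        Rails.root.join("config", "locales", "auto_generated", "#{target}.yml"),
        { target => translated }.to_yaml
      )
    end

    def deep_translate(node, target)
      case node
      when Hash
        node.transform_values { |value| deep_translate(value, target) }
      when String
        translate_text(node, to: target) # hypothetical call to Google Translate, Yandex, etc.
      else
        node
      end
    end
  end

This sounds great until you see the results, which are in the disadvantages list below.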

Disadvantages:

  • The translation is almost always terrible

Yep, the resulting translation is usually terrible, especially if it's a language that's difficult to translate into. For example, machine translation from English to Spanish may work "ok", but from English to something like Chinese probably will not. This is where working with a professional translation service comes in, if your app requires proper translations.

Working with external vendors

If you want your app to read well (or even make sense) in other languages, you'll have to bring in a professional translation vendor. Most of the time, what you'll be doing is handing over your main YAML file (or XML, Ruby hash files, etc.) to them, and they'll send you back YAML files in the languages you requested. You'll then take these YAML files and merge them back into your codebase.

I’ve already mentioned that you should set text formatting rules with your vendor so that you don’t have any problems with your app booting up. Assuming you have this part taken care of, I’ll list a few things that can make your life working with vendors much easier.

  1. Setting up a workflow to receive your translated files
  2. Falling back on your autogenerated translation keys if there’s no match

#1 – Setting up a workflow for your translation files

There are multiple ways that you can work with translation vendors, but the way I’ve worked with them involved me providing the English YAML files (usually named “en.yml”) and then having the translation vendor send the translated files back to me (includes something like “ko.yml”, “es.yml”, “fr.yml” and more). I would then take these files and merge them back into the codebase, and then start the local Rails server to check that the app boots up properly.

You can utilize different ways of managing the sending and receiving of the YAML files. I'm sure platforms like Dropbox and Google Drive work fine (heck, I'm sure you could even email them back and forth, although I wouldn't recommend it since managing that would get confusing real quick), but I found that using Github repos works even better. With Github repos, you can easily merge files back and forth between your main code repository and your "locale files" repository, and use git to resolve any differences.

With Github repos, anytime there were changes in the main YAML file (in my case, English), I would sync that file into the locale files repository on Github. The translation vendor would then pull down the latest “en.yml” file, translate the texts, and then push the latest set of YAML files in different languages to the repository. I would then pull down the latest YAML files, and then merge them back into the app’s codebase.

Of course, there are different software solutions out there to manage this process, but I found using git to be the simplest and most economical way to manage localization files when working with translation vendors.

#2 – Falling back on autogenerated translation keys

Sometimes, you might run into issues where you're displaying a text that exists in your default YAML file (en.yml for me) but whose key doesn't exist in the YAML files provided by your translation vendor. This can happen because translation vendors are human at the end of the day; they might miss translating some texts. If this happens, it's possible to "fall back" on your autogenerated locales if you happen to have those lying around.

To do this, simply have two sets of translation files for each language. The first set is the custom ones from the translation vendor, and the second set is the autogenerated ones that you built by making API calls to services like Google Translate.

You then want to organize your translation files in your config/locales folder.

I like to put all of the autogenerated localization files under config/locales/auto_generated and then the custom translated localization files under config/locales/custom folder.
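
So the layout ends up looking like this:

  config/locales/
    auto_generated/
      es.yml
      ko.yml
    custom/
      es.yml
      ko.yml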

And then in your config/application.rb
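
  module YourApp # your application's module name
    class Application < Rails::Application
      # load every YAML file under config/locales, sorted alphabetically, so
      # the custom/ files are loaded after (and override) the auto_generated/ ones
      config.i18n.load_path += Dir[Rails.root.join("config", "locales", "**", "*.yml")].sort
    end
  end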

What this does, especially the config.i18n.load_path += line, is load up all of the localization files in your locales folder, sorted alphabetically. Since "custom" comes after "auto_generated" alphabetically, the custom files are loaded after the autogenerated ones, and translations loaded later override the earlier ones. So when Rails looks up a locale key, the custom translation is used rather than the autogenerated one. And if for some reason the custom translation does not exist, the autogenerated translation is used instead, since it has already been loaded by Rails.

This way, you can always have a fallback option in case there are missing locale keys in your custom localization files.

Writing tests

When working with translation vendors, there is a chance that the localization files you get back from them will have typos. The typo I've seen most often is that the language key at the top of the YAML file is the wrong one.

For example, here’s a typical YAML file that would represent a Korean translation.
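
  ko:
    welcome_back: "다시 오신 것을 환영합니다"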

See that "ko" definition? Sometimes I would get back files with some other language key instead, like "es", which is usually used for Spanish. I've had times where I merged this typo into the codebase, deployed, and then had all the Korean users see their version of the app displayed in Spanish.

This is one of those typos that's easy to miss and can cause all sorts of annoyance for your international users. Thus, I like to write tests that check the integrity of the YAML files before merging and deploying the app. Here's a typical test I like to write to check the integrity of the YAML files.
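
A sketch, checking every locale file under config/locales:

  require "test_helper"

  class LocaleFilesTest < ActiveSupport::TestCase
    test "each locale file is keyed by its own language code" do
      Dir[Rails.root.join("config", "locales", "**", "*.yml")].each do |path|
        expected_key = File.basename(path, ".yml") # e.g. "fr" for fr.yml
        contents     = YAML.load_file(path)

        assert_equal [expected_key], contents.keys, "#{path} has the wrong language key"
      end
    end
  end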

All this test does is load up each individual YAML file and check the integrity of that file's top-level key. For example, it'll load up a French YAML file and make sure that the key definition of that file is set to "fr", so that French users won't end up accidentally seeing Russian (it's happened once…). This is just one example of a test you can write to check the integrity of your localization files. I'm sure there are other tests you could write depending on your situation.

Final Notes

I wrote in the first sentence of this post that managing localization in large apps can be a behemoth of a task. It is, if you don't have a process set up for managing it. However, if you set up an organization-wide process for managing it in a systematic way, it becomes one of those 5-minute tasks that's painless to do. I'm hoping this post can be helpful for those with apps that need to support multiple languages.

Edit (May 15th, 2018)

I’m building an application to solve the problems that I listed in this post. I describe the application that I’ll be building here.

RubyC 2017 – Kiev Thoughts

I’m still in Kiev, Ukraine right now and I had the opportunity to attend RubyC 2017 conference over the weekend.

Despite having written Ruby professionally for a few years now, this was my first Ruby-related conference. The tickets were only $100, so I figured it would be a great opportunity to attend a software development conference and do something productive and career-related over the weekend.

Both Ruby and Ruby on Rails are over 10 years old at this point, and one can make the argument that both the programming language and the web framework are stable and mature. Because of this, the language has only been "evolving" slowly, so to speak, over the last few years, and not many groundbreaking changes or features have been introduced. Also, the reason Ruby became popular at all is the rise (and eventual dominance) of Ruby on Rails. Ruby on Rails is essentially the "elephant" in the room. In fact, this was emphasized in many of the talks presented at the conference.

Pouches for the speakers to sit on during the open talk session

One of the talks that I found most thought provoking was "Ruby 4.0: To Infinity and Beyond" by Bozhidar Batsov. He's the author of RuboCop, which I found impressive. I love that library, as it helps me write cleaner code and adhere to the basic programming and syntax standards that I set for myself.

One thing Bozhidar mentioned was that not much has changed in the last few versions of Ruby. He had a few slides listing the major changes from Ruby 2.0 to 2.4 (I think he started from 2.0), and he did have a point in that each release since Ruby 2.0 has been incremental. In fact, the only change I could think of from Ruby 2.3 to 2.4 (and this was listed in Bozhidar's slides) was the unification of Fixnum and Bignum into the Integer class, which generated a lot of deprecation warnings in some gems. Aside from that, nothing major sticks out in my mind.

Bozhidar also mentioned that Ruby is the only programming language where certain features and changes are driven by the web framework that dominates the language, in our case Ruby on Rails. He mentioned how insane this is, and that a web framework shouldn't influence the design of the programming language it's written in. He had a point there, but it made me notice that there was a lot of "WE NEED AN ALTERNATIVE TO RAILS" mentality, not only among the speakers but also in much of the audience. For example, when a statement about some negative aspects of ActiveRecord was made, there were a few cheers from the audience.

This is something I do not understand: the fact that as soon as something gets popular, we humans have a natural tendency to immediately start hating on it. This doesn't just happen with technology stacks but even with coffee chains like Starbucks. How many times have you heard Americans say that they hate Starbucks and that the coffee served there is terrible? I personally think Starbucks coffee is pretty good, especially if you get it fresh in the morning. It's strong and consistent.

Same goes for Ruby on Rails. As soon as it became the default go-to web framework for Ruby, there have been many talks about the need to replace it, and claims that it's inappropriate for something to be so dominant. During the conference, the Hanami web framework was proposed as something that could give Rails some competition.

Personally, I like Ruby on Rails. I think it’s well designed, easy to work with, and the most productive web framework in the market right now. Is it perfect? No, of course not. But is it a good tool for building web applications? Yes, it’s excellent.

There was an open talk session where the audience had an opportunity to ask the speakers any questions they had. One of the speakers mentioned that despite other speakers demoting Ruby on Rails and promoting alternatives, if he were the one paying for a software project, he would go with Ruby on Rails every time due to its emphasis on productivity and the set of sane default configurations it comes with. I completely agree with this. If I'm paying other developers to write a web application, I'm going to go with whatever is most productive and gets the job done in a timely manner with reasonable code quality. Also, due to its convention-over-configuration philosophy, new developers on the team can get acquainted with the codebase very quickly. Despite being over 10 years old, Ruby on Rails still hits these points better than other technology stacks out there.

The speaker who framed his answer in terms of "If I'm putting up my own money…" made me think of a training curriculum that all software developers should be required to go through. I think all software developers should have to go through a phase in their career where they manage a project from beginning to completion and through maintenance cycles, with team members switched once every few months. Essentially, the training regimen should replicate a professional work environment as much as possible. I think a lot of software developers put too much emphasis on the engineering aspects of software development rather than the pragmatic side of it. Maybe the training regimen should also include a portion where the participants lose money if they don't complete it successfully. I'm willing to bet that a lot of participants would opt for tried and true technologies like Ruby on Rails over Hanami or the cool new shiny programming language/framework of the week if their own money was on the line.

Easy way to speed up your automated tests

Just a note before we start: this will be a Ruby on Rails centric post with code samples written in Ruby. However, the concepts in this post can be applied to any tech stack, whether that's Laravel, Django, Phoenix, or even mobile platforms like iOS and Android.

Most software developers are knowledgeable enough to know that interacting with a database can be expensive. For example, one expensive query on a large dataset can seriously hamper your application's performance. In the case of Ruby on Rails, mitigations may include eager loading associations or even avoiding ActiveRecord altogether and writing raw SQL queries. However, in many of the codebases I work on, I see that many developers do not adhere to this principle when writing automated tests.

For example, let's say that we're working on a Rails project for managing a list of books, their authors, and their publishers. We have three models: Book, Author, and Publisher. Book belongs to Author and Publisher, and both Publisher and Author have many books. Below are the three hypothetical models.

Book
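
  class Book < ApplicationRecord
    belongs_to :author
    belongs_to :publisher

    validates :name, presence: true
  end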

Author
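
  class Author < ApplicationRecord
    has_many :books

    validates :name, presence: true
  end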

Publisher
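
  class Publisher < ApplicationRecord
    has_many :books

    validates :name, presence: true
  end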

And now, for the purposes of writing our tests, let's say that we're using the FactoryGirl gem and have three factories set up, one for each of the models defined above.

Book Factory
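
  FactoryGirl.define do
    factory :book do
      sequence(:name) { |n| "Book #{n}" }
      author
      publisher
    end
  end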

Author Factory
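
  FactoryGirl.define do
    factory :author do
      sequence(:name) { |n| "Author #{n}" }
    end
  end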

Publisher Factory
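
  FactoryGirl.define do
    factory :publisher do
      sequence(:name) { |n| "Publisher #{n}" }
    end
  end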

Okay, all of our models and factories are set up now. We are finally ready to write our tests!

First common mistake: Creating unnecessary records

I’ll give an example of a BooksController with the index method and then provide a “typical” test that I see often. This is the “bad” test.

Simplified BooksController
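
  class BooksController < ApplicationController
    def index
      @books = Book.all
    end
  end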

BooksControllerTest
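
  require "test_helper"

  class BooksControllerTest < ActionDispatch::IntegrationTest
    setup do
      # create 10 books for the index page
      10.times { FactoryGirl.create(:book) }
    end

    test "GET index responds successfully" do
      get books_url
      assert_response :success
    end
  end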

Here, we have a pretty simple BooksController where we load up all of the books in the index method. And in our BooksControllerTest, we are creating 10 books and then testing the index route of the BooksController. This test will pass.

However… this test is not optimally written. Our Book factory has two belongs_to associations: Author and Publisher. In the setup block of our BooksControllerTest, where we loop 10 times to create 10 books:
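
  10.times { FactoryGirl.create(:book) }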

This line of code will create not only 10 books, but also a new author and a new publisher each time the loop executes. This line of code is devious, as it looks like it'll only insert 10 books into your test database. But in reality, it's inserting 30 records (1 author and 1 publisher for each book), which is 3 times the number of records the test needs.

These are unnecessary database writes that may seem innocent when your test suite is small, but it can seriously slow down your overall test suite as you write more tests.

The optimal way to write the BooksControllerTest should be
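
  require "test_helper"

  class BooksControllerTest < ActionDispatch::IntegrationTest
    setup do
      # create the author and publisher once and reuse them for every book:
      # 12 inserts in total instead of 30
      author    = FactoryGirl.create(:author)
      publisher = FactoryGirl.create(:publisher)

      10.times { FactoryGirl.create(:book, author: author, publisher: publisher) }
    end

    test "GET index responds successfully" do
      get books_url
      assert_response :success
    end
  end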

Modifying your tests this way, where you specify the author and the publisher that each individual book will belong to, ensures that you do not create unnecessary records (in this case, new sets of authors and publishers) in your database, significantly improving performance.

In one of the projects I’ve worked on, just implementing this reduced the time needed to run the test suite from something like 13 minutes down to 7 minutes. You can see how this simple change can speed up the test suite by a lot.

Second common mistake: Writing to the database to check validations

In our example, all of our models validate the presence of the name field. Let's write a test to make sure our models are adhering to this validation. I'll show an example with the Author model.
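
  require "test_helper"

  class AuthorTest < ActiveSupport::TestCase
    test "author is invalid without a name" do
      author = Author.create(name: nil) # attempts a database write
      refute author.persisted?
    end

    test "author is valid with a name" do
      author = Author.create(name: "Jane Doe") # writes to the database
      assert author.persisted?
    end
  end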

Above are two simple validation tests that check for the presence of the name field in the Author model. These tests will pass; however, they are not efficient. The reason is the same as in the first common mistake: unnecessary writes to the database.

You don’t have to actually write to the database to check the model validations. You can simply build a new instance of the model and then check the validity from there. Let’s rewrite this test to run faster.
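
  require "test_helper"

  class AuthorTest < ActiveSupport::TestCase
    test "author is invalid without a name" do
      author = FactoryGirl.build(:author, name: nil) # no database write
      refute author.valid?
    end

    test "author is valid with a name" do
      author = FactoryGirl.build(:author)
      assert author.valid?
    end
  end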

Calling the build method on FactoryGirl will instantiate an Author in memory rather than creating a new author and writing it to the database. Also, rather than writing to the database to check the validity of the model instance, you can just call valid? on it to see if the model is valid or not. This approach saves unnecessary writes to the database, which in turn speeds up your test suite.

Closing note…

You can see that neither of the two examples above is a difficult concept to grasp. It's just about avoiding unnecessary database operations in your tests. These simple adjustments can make drastic improvements to the speed of your test suite.

Also as I mentioned at the beginning of this post, while this post was Rails centric, the same concepts can and should be applied to whatever language/framework you’re using.