Using Hstore in Rails 4 with PostgreSQL

Hstore basically gives you a schema-less data store in your PostgreSQL DB. This allows you to store the Hash in a Database column.

If you want to query Model records, it is very difficult if you use Serialize to store dynamic attributes. Hstore acts same as Serialize but we can additionally query the model records using Hstore.

Please follow below steps to enable Hstore in a Ruby on Rails 4 application.

rails generate migration enable_hstore_extension
class EnableHstoreExtension < ActiveRecord::Migration
  def change
    enable_extension 'hstore'

<pre>rake db:migrate</pre>

<pre>rails g migration AddPropertiesToChocolate</pre>

class AddPropertiesToChocolate < ActiveRecord::Migration
  def change
    add_column :chocolates, :properties, :hstore

<pre>rake db:migrate</pre>

<strong>Rails console:</strong>

p = Chocolate.last = {'color' =&gt; 'orange'}['price'] = '5'
Chocolate.where("properties -&gt; 'color' = 'orange' ")
Chocolate.where("properties -&gt; 'price' ='5' ")

Finally, you can define accessors for your Hstore keys in your model. Validations work just like they would for any other column in your model.

class Chocolate &lt; ActiveRecord::Base
  # setup hstore
  store_accessor :properties, :color

  # can even run typical validations on hstore fields
  validates :color,
  inclusion: { in: %w{orange gold red} }


Optimize the DBMS – Whoosh! goes the App

As we all are aware, SQL optimizations are required for speeding up applications. There situations in my career where we felt like MySQL’s MyISAM engine is not fast enough! There are multiple factors that may slow down the SQL. They could be categorized into three categories.

  1. The database Software/Engine
  2. The database design/architecture
  3. The SQL used

We will get to each of this one by one.

1. The database Software/Engine

For normal websites including e-commerce sites, the most executed SQL statement is SELECT. There may be updates when someone buys a product. Admin will be doing INSERT and DELETE too. Still the most used one is SELECT statements. So experts recommend DBMS which are fast in SELECT statements and MySQL with MyISAM engine is always recommended for faster SELECT statements. Problem with MyISAM is the lack of foreign key support. InnoDB engine supports foreign Keys, but is slower compared to MyISAM engine.

Again we can see that MySQL 5.0x.xx is slower than MySQL 5.1x.xx. You can read more about it here.

So we need to choose a database/engine that fits our requirement. If you intend to have only few hundred SELECT statements per minute; it won’t be a big concern.

2. The database design/architecture

The design has a huge impact on performance. For those who have a computer science degree are aware of database normalization. Unfortunately, I have never heard a non CMM level company’s Project Manager or team lead talk about normalization. On the other hand I heard people saying that ‘normalization will slow down the application’. In reality, its both true and false.

When a database is designed in 3rd normal form, the database is considered ‘normalized’ and you can go to BCNF or 4th normal form.

When we normalize a database, we have four goals: 1. Arranging data into logical groupings such that each group describes a small part of the whole. 2. Minimizing the amount of duplicate data stored in a database 3. Organizing the data such that, when you modify it, you make the change in only one place. 4. Building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage.

When it’s mostly SELECT only system like a blog where few people insert or update data and thousands view it, it’s Ok to have denormalized DB design. But for ecommerce systems or similar that have extensive use of inserts/updates, it’s always better to reduce no. of places to update for each transaction. Denormalization may make the system to use less number of SELECT queries and make complex SQLs more simple, but when it’s time to insert or update, the application needs to execute multiple INSERT or UPDATE statements. In effect, normalization may make the system to use complex SELECT statements with joins, but denormalization may make the application slower by needing us to execute more INSERT/UPDATE statements.

We need to analyse the system first to see how normalized our system needs to be. A standard piece of database design guidance is that the designer should first create a fully normalized design; then selective denormalization can be performed for performance reasons.

Another important fact is that, all fields that appear in a WHERE clause should be indexed for better performance of SELECT statements. on the other hand, over indexing will slow down the INSERT/UPDATE statements.

3. SQL Used

We can optimize SQLs used in the application. Simplest tip is: never use * in a select statement. Select the fields you need. Another is to specify the table names if we join tables. For example

Let A and B be two tables in a database; and we need only field1 and field2 from table A and field1 from table B, then instead of



SELECT A.field1. A.field2, B.field1 FROM A, B WHERE = B.t1id`

There are cases where multiple SELECT statements and the language you use to write the application is faster than complex joins in SQLs. Think wisely 🙂

In MySQL, they support specifying which index to use using index hint statements like FORCE INDEX, USE INDEX etc. You have to look into the features provided by the DBMS being used to find out the best method to optimize your SQL.

Database Design
Database Normalizations wiki
Database Normalizations explained
MySQL index hint statements