Refactoring in Ruby in Haskell

Recently, my team at work read the first few chapters of Refactoring: Ruby Edition, a 2009 translation by Jay Fields and Shane Harvie of Martin Fowler’s Refactoring from 1999.

The book’s first chapter takes the reader through a refactoring of a small example program, with incremental code changes and their motivations explained along the way. The chapter is presumably meant to give the reader a taste of what the authors consider a well-managed refactoring session.

Of the chapters we read, except for a handful of points, I don’t have a strong positive or negative opinion on the authors’ arguments and insights (given the book’s context). The dominant frustration I did have while reading the chapters came from the staggering proportion of considerations that would have been obviated by an expressive static type system.

Naturally, I translated the refactoring session from Chapter 1 into Haskell.

There are significant differences between the Ruby examples and this Haskell translation. In translating, I tried to be faithful to (my reading of) the spirit of the Ruby examples by writing code that might have been written by a junior dev, hammered out in haste, or hacked together without intention of ‘making it to production’: it fulfills its stated purpose and nothing more. The point is that this is straightforward Haskell code, and its differences from the Ruby examples are attributable to differences between the respective languages.

This is a working literate Haskell program, which you can download here. You can load it in ghci with :load refactoring_1.lhs, and inspect any value defined here. If you need to install Haskell first, follow this guide.


First, some basic imports.

module Refactoring where
import Data.List (intercalate)

The Starting Point

The stated purpose of the example program is “to calculate and print a statement of a customer’s charges at a video store.”

First a Ruby snippet, followed by its Haskell translation.

class Movie
  REGULAR = 0
  NEW_RELEASE = 1
  CHILDRENS = 2

  attr_reader :title
  attr_accessor :price_code

  def initialize(title, price_code)
    @title, @price_code = title, price_code
  end
end

data MovieType = Regular | NewRelease | Childrens
data Movie = Movie {title :: String, priceCode :: MovieType}

The movie price codes were defined as integer constants in Ruby, but their purpose was to be discriminated in a case statement as though they together formed an enum. Ruby doesn’t have enums, but Haskell does, as a simple use case of an algebraic data type.

Haskell doesn’t have classes, so our Movie is just a data type. This does nothing more than introduce the Movie type along with its only constructor, also named Movie, and the record of the two fields it represents.

class Rental
  attr_reader :movie, :days_rented
  def initialize(movie, days_rented)
    @movie, @days_rented = movie, days_rented
  end
end

type DaysRented = Int
data Rental = Rental {movie :: Movie, daysRented :: DaysRented}

Our Rental type and constructor here work the same as Movie above. We give an alias to the Int type called DaysRented so that we can keep track of what that Int is supposed to represent. To be clear, this gets us no more type safety than using Int directly (the type system treats them as interchangeable), but it helps our type signatures better document programmer intent.
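If we did want the compiler to keep day counts apart from every other Int, a newtype is the usual tool. Here is a minimal standalone sketch of the difference, not part of this literate program; Days, extendBy, fromAnyInt and extended are hypothetical names I made up for illustration.

```haskell
-- An alias: DaysRented and Int are the same type to the compiler.
type DaysRented = Int

-- A newtype: a distinct type at compile time, erased at runtime.
-- `Days` and `extendBy` are hypothetical names, not used in this post.
newtype Days = Days Int
  deriving (Show, Eq, Ord)

extendBy :: Int -> Days -> Days
extendBy extra (Days n) = Days (n + extra)

-- The alias accepts any Int, whatever that Int was meant to represent.
fromAnyInt :: DaysRented
fromAnyInt = length "not a day count"

-- The newtype needs explicit wrapping: `extendBy 2 5` would not type-check.
extended :: Days
extended = extendBy 2 (Days 5)
```

The cost is the wrapping and unwrapping at the edges, which is why the quick-and-dirty version in this post sticks with the alias.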

class Customer
  attr_reader :name
  def initialize(name)
    @name = name
    @rentals = []
  end
  def add_rental(arg)
    @rentals << arg
  end

data Customer = Customer {name :: String, rentals :: [Rental]}

addRental :: Customer -> Rental -> Customer
addRental cust rental = cust {rentals = rental : rentals cust}

This Customer type and constructor should look familiar. The addRental function creates a new Customer with the name and rentals of the given Customer, except with an additional rental pushed onto the front of the list with the : operator (pronounced ‘cons’). The cust {rentals = ...} bit is called record update syntax, and it’s how you create a new record by updating the field of an existing record. Unmentioned fields like name get passed through unchanged.
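To see record update syntax in isolation, here is a small standalone sketch. It assumes a simplified Customer whose rentals are just movie titles (so it would clash with the definitions in this file if pasted in), and the values alice and aliceAfterTwo are made up for illustration.

```haskell
-- Standalone copy of the post's Customer type, simplified so that a
-- rental is just a movie title (an assumption made to keep this short).
data Customer = Customer {name :: String, rentals :: [String]}
  deriving (Show, Eq)

addRental :: Customer -> String -> Customer
addRental cust rental = cust {rentals = rental : rentals cust}

alice :: Customer
alice = Customer {name = "Alice", rentals = []}

-- Each call returns a new Customer; `alice` itself is never modified.
aliceAfterTwo :: Customer
aliceAfterTwo = addRental (addRental alice "Up") "Heat"
```

Because : pushes onto the front, rentals aliceAfterTwo lists the most recent rental first, while rentals alice is still empty.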

# inside Customer class
  def statement
    total_amount, frequent_renter_points = 0, 0
    result = "Rental Record for #{@name}\n"
    @rentals.each do |element|
      this_amount = 0

      # determine amounts for each line
      case element.movie.price_code
      when Movie::REGULAR
        this_amount += 2
        this_amount += (element.days_rented - 2) * 1.5 if element.days_rented > 2
      when Movie::NEW_RELEASE
        this_amount += element.days_rented * 3
      when Movie::CHILDRENS
        this_amount += 1.5
        this_amount += (element.days_rented - 3) * 1.5 if element.days_rented > 3
      end

      # add frequent renter points
      frequent_renter_points += 1
      # add bonus for a two day new release rental
      if element.movie.price_code == Movie::NEW_RELEASE && element.days_rented > 1
          frequent_renter_points += 1
      end

      # show figures for this rental
      result += "\t" + element.movie.title + "\t" + this_amount.to_s + "\n"
      total_amount += this_amount
    end
    # add footer lines
    result += "Amount owed is #{total_amount}\n"
    result += "You earned #{frequent_renter_points} frequent renter points"
    result
  end
end

statement :: Customer -> String
statement c = unlines
    [ "Rental record for " ++ name c
    , intercalate "\n" rentalReportLines
    , "Amount owed is " ++ show totalAmount
    , "You earned " ++ show totalFrequentRenterPoints ++ " frequent renter points"
    ]
  where
  (rentalReportLines, totalAmount, totalFrequentRenterPoints) = foldl f ([], 0, 0) (rentals c)

  f :: ([String], Double, Int) -> Rental -> ([String], Double, Int)
  f (result, accAmount, accFRPts) rental =
    let chrg = charge rental
        pts = frequentRenterPoints rental
        rentalReportLine = "\t" ++ title (movie rental) ++ "\t" ++ show chrg
    in (result ++ [rentalReportLine], accAmount + chrg, accFRPts + pts)

  charge :: Rental -> Double
  charge (Rental m nDays) = case priceCode m of
    Regular    -> 2.0 + (if nDays > 2 then (fromIntegral nDays - 2) * 1.5 else 0)
    NewRelease -> fromIntegral nDays * 3.0
    Childrens  -> 1.5 + (if nDays > 3 then (fromIntegral nDays - 3) * 1.5 else 0)

  frequentRenterPoints :: Rental -> Int
  frequentRenterPoints (Rental (Movie _ NewRelease) nDays) | nDays > 1 = 2
  frequentRenterPoints _ = 1

The Data.List.intercalate function is simply what most other languages call join on a list of strings. The unlines function is like intercalate "\n" except it also adds a newline at the end.
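A quick standalone comparison of the two, using a made-up list called rows:

```haskell
import Data.List (intercalate)

rows :: [String]
rows = ["one", "two", "three"]

-- Separator goes between elements only.
joined :: String
joined = intercalate "\n" rows   -- "one\ntwo\nthree"

-- Every element, including the last, is followed by a newline.
terminated :: String
terminated = unlines rows        -- "one\ntwo\nthree\n"
```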

The statement function is the meat of the original example, and its translation required the biggest departure from its Ruby counterpart. The Ruby version begins by initializing a few local accumulator variables which are added to or appended to throughout the method body. In Haskell, we don’t have assignment operators, because our variables aren’t mutable references; they’re names bound to immutable values (a consequence of Haskell’s purity). Instead, threading accumulator state through a computation is accomplished by folding over a list. The helper function f defines how to add or append to our accumulator values as we fold over the list of the customer’s rentals.
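If the fold idiom is unfamiliar, here is the same pattern stripped down to its skeleton. This is a hypothetical standalone example (sumAndCount is not part of the video store program), and it uses the strict foldl' from Data.List rather than the plain foldl used in statement.

```haskell
import Data.List (foldl')

-- Hypothetical example: total and count of a list in a single pass.
-- The accumulator is a pair; each step builds a new pair from the old one
-- instead of mutating anything.
sumAndCount :: [Double] -> (Double, Int)
sumAndCount = foldl' step (0, 0)
  where
    step (total, count) x = (total + x, count + 1)
```

The step function plays the same role as f in statement: it takes the accumulator so far and the next element, and returns the next accumulator.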

It would have been arguably easier to build up a statement with a few maps instead of a single fold, but that would have left us with even less to refactor.

We also have charge and frequentRenterPoints defined in statement’s local scope inside a where clause. In Haskell it is easy and natural to define local helper functions like this without worrying about polluting any broader namespace. None of these helper functions or intermediate values are in any way visible outside the scope of statement, and we haven’t done any preemptive modularization. We of course could have put the definitions of charge and frequentRenterPoints in the let bindings of the f helper, where they are used, but in either case the pattern matching and guards would have worked and looked almost exactly the same.

Comments on the Starting Program

We anticipate a couple of upcoming changes to which this program will need to adapt. The first is that the plaintext statement will need an HTML-generating counterpart. The second is that the MovieType system for classifying movies will change in an unknown way, and the formula for calculating charges for each MovieType will need to change with it.

The First Step in Refactoring

Here, the authors say (paraphrased) that an efficient refactoring session will need to be supported by solid tests, so that we find out quickly if we introduce regressions. They don’t give examples of the tests, but I will note here that the Haskell translation has barely any testable surface area that isn’t already covered by the type system.

That is to say, expressive, statically checked types get us the same fast regression feedback during a refactoring session that we rely on tests for in Ruby. Moreover, we get the benefit with none of the investment required by hand-written tests, we are guaranteed full coverage, and most often we can display type errors directly in our editor at the error site instead of at the broken test.

Decomposing and Redistributing the Statement Function

The authors’ first goal is to pull out the section of statement which handles calculating the amount to charge for each rental. The session proceeds in several stages. First, a preface is given regarding how to analyze the surrounding code to ensure we don’t affect what remains when we extract a chunk of it. This is followed by the actual extraction into a method in the Customer class, and then by renaming a now-local variable so that it makes more sense in its new context. Finally, the new method is moved to its natural home in the Rental class, with the call site in statement updated to reflect its new location.

Eliding several intermediate steps, we end up with this Ruby.

# in Rental class
  def charge
    result = 0
    case movie.price_code
    when Movie::REGULAR
      result += 2
      result += (days_rented - 2) * 1.5 if days_rented > 2
    when Movie::NEW_RELEASE
      result += days_rented * 3
    when Movie::CHILDRENS
      result += 1.5
      result += (days_rented - 3) * 1.5 if days_rented > 3
    end
    result
  end

# in Customer class
  def statement
    total_amount, frequent_renter_points = 0, 0
    result = "Rental Record for #{@name}\n"
    @rentals.each do |element|
      this_amount = element.charge
      # ^^^^ changed here ^^^^^^^^

      # add frequent renter points
      frequent_renter_points += 1
      # add bonus for a two day new release rental
      if element.movie.price_code == Movie::NEW_RELEASE &&
               element.days_rented > 1
          frequent_renter_points += 1
      end

      # show figures for this rental
      result += "\t" + element.movie.title + "\t" + this_amount.to_s + "\n"
      total_amount += this_amount
    end
    # add footer lines
    result += "Amount owed is #{total_amount}\n"
    result += "You earned #{frequent_renter_points} frequent renter points"
    result
  end

charge :: Rental -> Double
charge (Rental m nDays) = case priceCode m of
  Regular    -> 2.0 + (if nDays > 2 then (fromIntegral nDays - 2) * 1.5 else 0)
  NewRelease -> fromIntegral nDays * 3.0
  Childrens  -> 1.5 + (if nDays > 3 then (fromIntegral nDays - 3) * 1.5 else 0)

In our Haskell example, we can simply unindent the charge helper so that it gets raised from the statement local scope into the Refactoring module scope, and move it to above or below the statement definition. If we wanted, we could rename charge to something else, or move it to a new file. Note that we would see a compiler error for any of these changes if we moved or renamed this function without updating its call sites. Also note that in all cases its definition wouldn’t need to change, since it would have been less convenient to write it any other way in the first place.

The authors discuss a couple of concerns that are simply non-issues in Haskell. We can’t give a different local name to this function’s Rental argument (which was a refactoring step in the Ruby example), since we never gave it a name! We only pattern matched on its contents. We also can’t worry about whether to make it a method on Customer or on Rental, since data types aren’t namespaces like classes are, and Haskell functions don’t have a magic self argument to worry about like OO instance methods. Our charge function is simply a pure function in our module.

Extracting Frequent Renter Points

The authors perform a similar extraction on the section of statement that deals with frequent renter points. The authors elide these steps to a single operation, so we can assume the same steps were taken as in the previous section which dealt with extracting the charge computation out of statement.

# in Customer class
  def statement
    total_amount, frequent_renter_points = 0, 0
    result = "Rental Record for #{@name}\n"
    @rentals.each do |element|
      frequent_renter_points += element.frequent_renter_points
      # ^^^^^ changed here ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

      # show figures for this rental
      result += "\t" + element.movie.title + "\t" + element.charge.to_s + "\n"
      total_amount += element.charge
    end
    # add footer lines
    result += "Amount owed is #{total_amount}\n"
    result += "You earned #{frequent_renter_points} frequent renter points"
    result
  end

# in Rental class
  def frequent_renter_points
    (movie.price_code == Movie::NEW_RELEASE && days_rented > 1) ? 2 : 1
  end

frequentRenterPoints :: Rental -> Int
frequentRenterPoints (Rental (Movie _ NewRelease) nDays) | nDays > 1 = 2
frequentRenterPoints _ = 1

Again, this is a simple change in the Haskell example. We unindent frequentRenterPoints to bring it out into module scope, and the definition of the function needs no modification.

Removing Temps

For the sake of removing the accumulator variables used in computing the aggregate metrics over all rentals for a Customer, the authors pull total_amount and frequent_renter_points into their own Customer methods.

class Customer
  def statement
    result = "Rental Record for #{@name}\n"
    @rentals.each do |element|
      # show figures for this rental
      result += "\t" + element.movie.title + "\t" + element.charge.to_s + "\n"
    end
    # add footer lines
    result += "Amount owed is #{total_charge}\n"
    result += "You earned #{total_frequent_renter_points} frequent renter points"
    result
  end

  private

  def total_charge
    @rentals.inject(0) { |sum, rental| sum + rental.charge }
  end

  def total_frequent_renter_points
    @rentals.inject(0) { |sum, rental| sum + rental.frequent_renter_points }
  end
end

totalCharge :: Customer -> Double
totalCharge cust = sum $ map charge $ rentals cust

totalFrequentRenterPoints :: Customer -> Int
totalFrequentRenterPoints = sum . map frequentRenterPoints . rentals

The analogous definitions in Haskell are just as short as the rather concise Ruby definitions. They get slightly shorter if we define them by composing functions instead of fully applying them, as in totalFrequentRenterPoints.
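For readers new to the two styles, here they are side by side in a standalone sketch. To keep it self-contained I assume a stand-in charge that takes a number of days and bills a flat 3.0 per day; totalApplied and totalComposed are hypothetical names.

```haskell
-- Stand-in for the post's charge function: here a "rental" is just a
-- number of days and the charge is a flat 3.0 per day (an assumption).
charge :: Int -> Double
charge nDays = fromIntegral nDays * 3.0

-- Fully applied: the argument is named and passed along with ($).
totalApplied :: [Int] -> Double
totalApplied days = sum $ map charge days

-- Composed: the same pipeline built with (.), with no named argument.
totalComposed :: [Int] -> Double
totalComposed = sum . map charge
```

Both definitions compute the same result for every input; which one reads better is mostly a matter of taste.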

And with that, we can rewrite our statement function to make use of our refactorings.

statement' :: Customer -> String
statement' c = unlines
    [ "Rental record for " ++ name c
    , intercalate "\n" rentalReportLines
    , "Amount owed is " ++ show (totalCharge c)
    , "You earned " ++ show (totalFrequentRenterPoints c) ++ " frequent renter points"
    ]
  where
  rentalReportLines = flip map (rentals c) $ \rental ->
    "\t" ++ title (movie rental) ++ "\t" ++ show (charge rental)

Finally, we can do what we set out to do and write our HTML statement.

# in Customer class
  def html_statement
    result = "<h1>Rentals for <em>#{@name}</em></h1><p>\n"
    @rentals.each do |element|
      # show figures for this rental
      result += "\t" + element.movie.title + ": " + element.charge.to_s + "<br>\n"
    end
    # add footer lines
    result += "<p>You owe <em>#{total_charge}</em><p>\n"
    result += "On this rental you earned " +
           "<em>#{total_frequent_renter_points}</em> " +
           "frequent renter points<p>"
    result
  end

htmlStatement :: Customer -> String
htmlStatement c = unlines
    [ "<h1>Rentals for <em>" ++ name c ++ "</em></h1><p>"
    , intercalate "\n" rentalReportLines
    , "<p>You owe <em>" ++ show (totalCharge c) ++ "</em><p>"
    , "On this rental you earned <em>" ++ show (totalFrequentRenterPoints c) ++ "</em> frequent renter points<p>"
    ]
  where
  rentalReportLines = flip map (rentals c) $ \rental ->
    "\t" ++ title (movie rental) ++ ": " ++ show (charge rental) ++ "<br>"

Note also that at this point in the book, several UML class diagrams had been given to keep the reader oriented among all the code changes at play. In Haskell, we almost always have type signatures to serve exactly that purpose, and when we don’t, we can still interrogate the compiler for the type of any value it knows about, and often do so without leaving our text editor!

Replacing the Conditional Logic on Price Code with Polymorphism

At this point, the authors decide that treating a group of constants as together forming an enum so that we can switch over them is a Bad Idea™. It is decidedly slightly less bad if that case statement only needs to be defined in one place. This naturally requires a refactor to move both the charge and frequent_renter_points methods from the Rental class to the Movie class, which is where the constants are defined.

Moreover, the prescribed solution to this dilemma is to encode the enum in a class hierarchy instead of in constants. The ensuing refactor is given the most detailed play-by-play of any in the chapter, and involves no fewer than

  • 1 custom setter
  • 3 bespoke classes
  • 1 mixin providing a default method implementation for 2/3 of the classes
  • 2 implicit interfaces
  • 5 methods in total, split across those classes and mixins

to support the charge and frequent_renter_points methods in a way that avoids using constants as enums. This is, in my opinion, a nightmare. Though some Ruby programmers may disagree that the original example is the best approach for the problem given, it’s tough to argue that the approach outlined here isn’t the most object-oriented. I think all too many programmers would thereby consider it to be the most praiseworthy.

After lots of incremental changes, the Ruby comes out looking like this.

module DefaultPrice
  def frequent_renter_points(days_rented)
    1
  end
end

class RegularPrice
  include DefaultPrice
  def charge(days_rented)
    result = 2
    result += (days_rented - 2) * 1.5 if days_rented > 2
    result
  end
end

class NewReleasePrice
  def charge(days_rented)
    days_rented * 3
  end
  def frequent_renter_points(days_rented)
    days_rented > 1 ? 2 : 1
  end
end

class ChildrensPrice
  include DefaultPrice
  def charge(days_rented)
    result = 1.5
    result += (days_rented - 3) * 1.5 if days_rented > 3
    result
  end
end

# then, in Movie class
  def charge(days_rented)
    @price.charge(days_rented)
  end
  def frequent_renter_points(days_rented)
    @price.frequent_renter_points(days_rented)
  end

There’s no Haskell translation for this ultimate refactoring. The Haskell examples already given don’t need to change, at all, for 2 reasons.

First, Haskell has fantastic support for user-defined data types. Defining an enum is dead simple, and defining fancier algebraic data types (not shown here) is not much more difficult. Second, designing and organizing your functions and data types are completely separate concerns from organizing your namespaces. This is in stark contrast with Ruby and most other OO languages, where both concerns are painfully intertwined, as exhibited here.
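To give a rough idea of what a fancier data type could look like, here is a standalone sketch in which constructors carry data. This is purely hypothetical and not something the book or this program uses: Pricing, Flat, Tiered and price are names I invented, and the design is only one of many possible ones.

```haskell
-- Hypothetical richer classification: constructors can carry fields.
data Pricing
  = Flat Double               -- price per day
  | Tiered Double Int Double  -- base price, days included, price per extra day
  deriving (Show, Eq)

-- One function covers every pricing scheme by pattern matching.
price :: Pricing -> Int -> Double
price (Flat perDay) nDays = fromIntegral nDays * perDay
price (Tiered base included perExtra) nDays =
  base + fromIntegral (max 0 (nDays - included)) * perExtra
```

Under this sketch, the three price codes from the example would correspond to Tiered 2.0 2 1.5, Flat 3.0 and Tiered 1.5 3 1.5, and a new scheme would be one more constructor plus one more equation.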

I want to pause and note, again, that the primary motivation for this whole section of changes, which take no fewer than 16 pages in the book, boils down to a lack of enums or sum types in Ruby. In Haskell, we can define what kinds of movies we know about, in a single place, as a data type. We started and finished with this.

data MovieType = Regular | NewRelease | Childrens

With warnings enabled (for example, GHC’s -Wall), the compiler could thereafter detect when we failed to consider one of these possible values. It would also reject the program outright when we tried to treat one as something it’s not, or even when we did something silly like spell one of their names wrong.
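A small standalone sketch of what that checking looks like in practice. The label function is a hypothetical helper, not part of the program above, and the pragma turns on GHC's incomplete-pattern warning for just this file.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data MovieType = Regular | NewRelease | Childrens
  deriving (Show, Eq)

-- Hypothetical helper, not from the post: a display label per movie type.
label :: MovieType -> String
label Regular    = "regular"
label NewRelease = "new release"
label Childrens  = "children's"
-- Deleting any one equation above makes GHC warn that the patterns are
-- non-exhaustive and name the missing constructor. Misspelling a
-- constructor, e.g. `NewRelaese`, is a compile error (not in scope).
```

Adding a fourth constructor to MovieType would trigger the same warning at every function that matches on it, which is exactly the to-do list we want during a refactor.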

In Ruby, the best we can do is give some special names to what are really just a handful of integers, and then hope our tests are good enough to catch when we make one of those mistakes. Unsurprisingly, this can make code feel pretty brittle. If this poor man’s enum of integer constants makes a handful of classes and mixins and a little duck-typing look well-behaved by comparison, maybe that is less a strength of OOP and more an inability of the language to express much outside of a class hierarchy.