Book Store (Code Meditation 1)

posted in: Code Meditation | 0

I’ve recently been doing some coding drills to create some opportunities for code refactors. I was reminded of Exercism.io and, having not checked them out in several years, was delighted to see how much they’ve improved since then.

Feel free to follow / friend / stalk me over there: https://exercism.io/profiles/armahillo

I would like to do a blog series on some of the exercises I work through over there, to discuss my thought process on how I approach the problems. I’m currently working through the Ruby track, so all these problems will be using Ruby code, but software design discussion is largely language agnostic, and you should be able to follow along even if you prefer a different language.

The Problem: Book Store (Checkout)

This exercise, one of the ones designated “Hard”, is called “Book Store“. This problem felt very realistic because, at first glance, it seems almost trivial (I solved the primary use cases in about 5 minutes), but there’s an optimization requirement underneath the surface that is a bit insidious.

To try and encourage more sales of different books from a popular 5 book series, a bookshop has decided to offer discounts on multiple book purchases.

One copy of any of the five books costs $8.

If, however, you buy two different books, you get a 5% discount on those two books.

If you buy 3 different books, you get a 10% discount.

If you buy 4 different books, you get a 20% discount.

If you buy all 5, you get a 25% discount.

Note: that if you buy four books, of which 3 are different titles, you get a 10% discount on the 3 that form part of a set, but the fourth book still costs $8.

Your mission is to write a piece of code to calculate the price of any conceivable shopping basket (containing only books of the same series), giving as big a discount as possible.

https://exercism.io/my/solutions/08d30003d7f347639f31d31930ff4b1e

Pass 1: Naive Interpretatation

Not too complicated, right? My initial reading of it was “oh, look at how many books are in the cart, and calculate the discount from that, easy peasy”. The input you get is an array (this much is given), and looks like: [2,3,5]

The solution looked a bit like this:

class BookStore
  DISCOUNTED_TOTAL = [ 0.0, 1.0, 0.95, 0.90, 0.80, 0.75]
  NOMINAL_PRICE_PER_UNIT = 8.0

  def self.calculate_price(basket)
    NOMINAL_PRICE_PER_UNIT * basket.size * DISCOUNTED_TOTAL[basket.size]
  end
end

A few of the tests passed. My approach here was a trivial (and naive, as I later discovered) approach: multiply the nominal cost times the number of books, and then apply a discount.

I used a class constant array to represent the discount — the discount level is a scalar that I can have serve double-duty as the index in the array. If you have 1 book, you pay 100% (1.0); if you have 3 books, you get a 10% discount (1.00 - 0.10 => 0.90), etc.

I had overlooked one important thing: the discount wasn’t on how many books were purchased overall, it was applied individually to books in a set. A set was 5 possible books, represented as the numerals 1 through 5. The 5-book discount was if you made your set from all 5, the 3-book discount was from if you had 3 of the books, etc.

The test that first failed was given input like this: [1,2,3,3,4,5,5]

Whoops!

Pass 2: Allowing for multiple sets

So the problem then was to calculate the largest possible sets, group the purchase together into those sets, and then calculate the discount and apply it to each group, and total up the result.

This was what I came up with:

class BookStore
  DISCOUNTED_TOTAL = [ 0.0, 1.0, 0.95, 0.90, 0.80, 0.75]
  NOMINAL_PRICE_PER_UNIT = 8.0

  def self.calculate_price(basket)
    grouped = basket.group_by { |i| i }
    all_sets = []
    while (!grouped.values.flatten.compact.empty?) do
      all_sets << grouped.values.map!(&:pop).compact
    end
    all_sets.collect do |set|
      calculate_price_for_set(set.size)
    end.sum
  end

  def self.calculate_price_for_set(size)
    NOMINAL_PRICE_PER_UNIT * size * DISCOUNTED_TOTAL[size]
  end
end

So this got a bit more interesting.

The first thing I did was to extract the price calculation to a separate method, calculate_price_for_set. I applied a dependency injection refactor to it where it only needs to know the size, instead of knowing that the argument it receives is an argument that can receive a :size message.

After that, my thinking for the main algorithm was this: I wanted to produce the largest possible sets possible. I envisioned making multiple passes over the collection, where you make a maximized combination on each pass.

There is no provision that the arguments will be ordered. The example from above could also be written: [3,1,2,5,3,4,5]

Applying a group_by to the collection will create the necessary clustering. This will convert the above example into:

irb> [3,1,2,5,3,4,5].group_by { |i| i }
=> {3=>[3, 3], 1=>[1], 2=>[2], 5=>[5, 5], 4=>[4]}

If we have it clustered, then we can ask each type to give us 1 book, at a time, sort of like peeling off a layer at a time. Asking an empty book type for an element will yield nil, which can be compacted away easily.

The powerhouse is here:

while (!grouped.values.flatten.compact.empty?) do
  all_sets << grouped.values.map!(&:pop).compact
end

The while conditional is verbose (and not technically a Law of Demeter violation because each method returns self), but straightforward: if there’s anything left in the groupings, keep chugging along.

The inside of the block is a bit trickier, so let’s break it down.

grouped.values is referencing the Hash values method.

Applying map! to it means “edit this in-place” by applying the block to it. Typical usage of map! would look like this:

animals = ["dog", "cat", "mouse"]
animals.map! do |element|
  element.upcase
end
puts animals.inspect

The end result would yield: ["DOG", "CAT", "MOUSE"]: The original array with each element being transformed in the same way. The ! does in-place modification, so that we don’t have to do:

animals = ["dog", "cat", "mouse"]
animals = animals.map do |element|
  element.upcase
end
puts animals.inspect

The :pop method, when applied to an array, does a LIFO desertion of an element from the collection. Though this does not have a ! on it, it is destructive and does modify in-place:

animals = ['dog', 'cat', 'mouse']
pet = animals.pop
puts animals.inspect
=> ['dog','cat']

Since each array in the groupings is homogeneous, we don’t care about the order (I think :pop is more fun-sounding than :shift, but they’re effectively the same here.) What we want to happen here is this:

all_sets << grouped.values.map! do |set|
  set.pop
end

Since the same operation is being applied directly to each element, uniformly, with no conditions or additional operations, we can use the Proc shorthand. The above code can be re-written like this, and it will behave identically:

all_sets << grouped.values.map!(&:proc)

With the array input of: [3,1,2,5,3,4,5], the contents of all_sets will be [[1,2,3,4,5], [3,5]]. With this, we pass it on to the sum calculator, which is similar to before:

all_sets.collect do |set|
  calculate_price_for_set(set.size)
end.sum

:collect is similar to :map, in that it’s an iterator; I think they may actually just be aliases for one another. Contextually, I like to use :collect when I’m gathering the results, and :map when I’m applying a transformation, but that’s a personal preference.

This block of code does the :calculate_price_for_set method, discussed earlier, gathering the result into an array result. The end.sum at the end means “take the result and send the :sum message to it”

Narratively, the original code for this pass means:

  1. Cluster the list into like terms
  2. Peel one entry off of each and store it in one set
  3. Repeat 2 if there are any entries left
  4. Take the sets and calculate their price (using the formula from the first pass), then sum up the total for all the sets

I have to admit, I felt pretty clever here. It’s concise and elegant. And almost passed all the tests.

Pass 3: Optmization

Remember the insidious problem underneath the surface I mentioned earlier?

One way of grouping these 8 books is:

1 group of 5 –> 25% discount (1st,2nd,3rd,4th,5th)
+1 group of 3 –> 10% discount (1st,2nd,3rd)

This would give a total of:

5 books at a 25% discount
+3 books at a 10% discount

Resulting in:

5 x (8 – 2.00) == 5 x 6.00 == $30.00
+3 x (8 – 0.80) == 3 x 7.20 == $21.60

For a total of $51.60

However, a different way to group these 8 books is:

1 group of 4 books –> 20% discount (1st,2nd,3rd,4th)
+1 group of 4 books –> 20% discount (1st,2nd,3rd,5th)

This would give a total of:

4 books at a 20% discount
+4 books at a 20% discount

Resulting in:

4 x (8 – 1.60) == 4 x 6.40 == $25.60
+4 x (8 – 1.60) == 4 x 6.40 == $25.60

For a total of $51.20

And $51.20 is the price with the biggest discount.

These were the tests that were failing.

This is also a slightly different, and significantly more complicated, problem: We now have to teach the script how to optimize the groupings. Comparing the grouping costs once they’re assembled is trivial, but establishing the groupings first is not.

Allocation algorithms is a well-documented and explored facet of computer science (and mathematics). There’s the Travelling Salesman problem (optimize the best path given different weights along the edges), the Knapsack problem (optimize inclusion in a final set based on limited space), etc. This problem is a little similar to both of those.

The first thing I did was to look at the original definition of the discount breakdown to understand why there might be a difference:

  • 1 of any of the five books gets 0% discount ($8)
  • 2 different books, gives a 5% discount on those two books ($8 + $8 = $16 * 95%).
  • 3 different books, you get a 10% discount.
  • 4 different books, you get a 20% discount.
  • 5, you get a 25% discount.

The jump from 1 to 2 is a 5% decrease in cost. The difference from 2 to 3 is also a 5% decrease. However, the difference from 3 to 4 is a whole 10% decrease. Going from 4 to 5 is only an additional 5% decrease.

So really, any situation where we can get a 4-book set is potentially a bigger discount. Designing for this scenario specifically will still be somewhat complicated and would likely result in over-fitting (making it harder to modify in the future).

The strategy I was imagining was one where we try to balance out the sets as much as possible. One of the tests is named: “two groups of four is cheaper than a group of five and a group of three”, which further supports this hypothesis.

The question then became, how do I turn this:

{3=>[3, 3], 1=>[1], 2=>[2], 5=>[5, 5], 4=>[4]}

Into this:

[[1,2,3,5],[3,4,5]]

My first pass on this strategy was very brutish. I don’t have the code anymore and it would be a pain to recreate it, but basically it set up two pointers, a “set pointer” and a “book pointer”. The set pointer was responsible for pointing to which set we were currently adding to, and the book pointer pointed to the book that was a candidate for addition. Each iteration of the loop would advance the set pointer if a book had been added, but would always advance the book pointer.

I also realized that the number of sets we would need was (perhaps naively) the size of the largest collection of books. Logically, we would at least need that many; given [1,1,1,1,1,2,3,4,5] even the “largest set” algorithm would yield:

[[1,2,3,4,5],[1],[1],[1],[1]]

This would prove important later.

Overall, though, this was clunky. It was immediately apparent that, in all the ways the “largest set” solution was elegant, this was the opposite. I kept running into “wait, but what if…” cases.

“What if the set already has the book that’s pointed to?”, “What if the set is full already?”, “What if the book that’s pointed to is all gone?”

I put it away for the evening and returned to it in the morning.

Sleep helped! I realized that I didn’t need to pre-group like with the largest set strategy. The approach for “largest set” was to focus on the set first and fill it up with as much as possible; what we needed to do here was focus on each book type and spread them out as much as possible.

If we began with the book types sorted by how many there were, highest quantity first, we would get this:

order_by_quantity([3,1,2,5,3,4,5])
=> [3,3,5,5,1,2,4]

This tells us that we will need 2 sets (we have two 3s), and if we process this list in order, we will ensure that the highest available books will be distributed before the rest of the books are distributed. In pseudocode, this might look like:

  • (1) Order the list by its quantity, highest quantity first
  • (2) Initialize a number of sets equal to the highest number of books of a single type
  • (3) Starting with the highest quantity books in the ordered list:
    • (3a) Add the book to the current set
    • (3b) Move to the next set in the series (wrapping around if necessary)

Step 1 could be approached a few different ways. I did it like this, using the grouping from the “largest set” strategy:

def cluster(pile)
    pile.group_by { |i| i }
end
# ...
grouped = cluster(pile).sort_by { |b| b.size }.to_h

We’re using the clustering approach from “largest set”, and then applying :sort_by to it. We’re sorting based on the size. This puts the largest groups first.

Step 2, initializing the arrays, is trivial, though there is a gotcha here that I learned (continuing from previous sample):

number_of_sets = grouped.values.first.size
balanced_sets = Array.new(number_of_sets) { |_| [] }

Array.new can technically accept two arguments: how many elements to initialize, and a second argument that is an initial value for each.

HOWEVER, look at this:

2.7.1 :001 > cloned_elements = Array.new(3,[])
2.7.1 :002 > cloned_elements.inspect
 => "[[], [], []]"
2.7.1 :003 > cloned_elements[0].object_id
 => 180
2.7.1 :004 > cloned_elements[1].object_id
 => 180
2.7.1 :005 > cloned_elements[2].object_id
 => 180
2.7.1 :006 > cloned_elements[0] << 1
 => [1]
2.7.1 :007 > cloned_elements
 => [[1], [1], [1]]

Whoops! That’s no good. The documentation for Array.new notes:

When sending the second parameter, the same object will be used as the value for all the array elements. Since all the Array elements store the same hash, changes to one of them will affect them all. If multiple copies are what you want, you should use the block version which uses the result of that block each time an element of the array needs to be initialized.

2.7.1 :008 > unique_elements = Array.new(3) { |i| [] }
2.7.1 :009 > unique_elements
 => [[], [], []]
2.7.1 :010 > unique_elements[0].object_id
 => 200
2.7.1 :011 > unique_elements[1].object_id
 => 220
2.7.1 :012 > unique_elements[2].object_id
 => 240
2.7.1 :013 > unique_elements[0] << 1
 => [1]
2.7.1 :014 > unique_elements
 => [[1], [], []]

This looks like what we want. I actually got tripped up by this behavior initially — I was adding a single book to one set and it was adding to all the sets. How confusing!

Step 3 from above is a straightforward approach, instead of using book pointers and set pointers:

set_pointer = 0
ordered.each do |book|
  # stuff coming soon
end

We’re using ordered from the previous example. We are going to use a set pointer here, but we no longer need the book pointer as well.

Steps 3a and 3b will use that set pointer to distribute the book:

ordered.each do |book|
  balanced_sets[set_pointer % balanced_sets.size] << book
  set_pointer += 1
end

The set_pointer modulo math is a trick I learned in C, many years ago. You can continuously increase and ensure that you wrap around with the right cycle by taking the modulus against the size of the array.

> 0 % 3
=> 0
> 1 % 3
=> 1
> 2 % 3
=> 2
> 3 % 3
=> 0
> 4 % 3
=> 1
# ...

Since arrays are 0-indexed, this trick maps well, and balanced_sets[set_pointer % balanced_sets.size]should never land out-of-bounds.

This solution worked. I wrapped the whole business in a method called “balanced” and put it alongside the other method “largest”.

The last thing to do was to apply both strategies to the same basket and see which one produced the lower cost:

[:largest, :balanced].map do |strategy|
  calculate_price_for_combinations(book_pile.apply_strategy(strategy))
end.min

The approach I took here was to do some meta-programming: :apply_strategy accepts a label of some kind, and then invokes the method that matches that name:

def apply_strategy(strategy)
  return [] if @pile.empty?
  send(strategy.to_sym, @pile.dup)
rescue NoMethodError => e
  puts "#{e} isn't a supported strategy"
end

This pre-emptively resolves any situation where the inbound array (@pile) is empty. It uses the :send method, passing the strategy provided, with a duplication of the @pile. Using a duplication ensures we don’t accidentally produce mutations as side effects.

One advantage of this approach is that we can add additional optimization strategies (maybe the store offers additional discounts on odd-numbered volumes) and add them alongside these other two, and minimal changes will be required to the overall structure — add a new method :discount_odd_volumes, add that to the array of methods, and that’s it!

Similarly, it separates the cost calculation from the grouping algorithm, so changes to the former will not affect the latter, and vice versa.

I’m happy with this result, and so was the test suite.

Check out my solution and let me know what you think.

I’ll try to post some more of these in the future.


Header Image Credit: https://www.needpix.com/photo/download/928828/books-book-bookshelf-library-read-literature-shelf-free-pictures-free-photos