Bracket Packing

Ruby Quiz #78 was posted by Ross Bamford. Please follow the link to see what it’s all about.

Specifying the Grammar

We start by defining a grammar for allowable bracket-packing configurations. Here’s the beginnings of a grammar specification:

class BracketGrammar < Dhaka::Grammar
  
  for_symbol(Dhaka::START_SYMBOL_NAME) do
    start                %w| Package |
  end

We’re defining a production for the start symbol of the grammar, represented by the constant Dhaka::START_SYMBOL_NAME. All grammars must define productions for a start symbol. In our case, we’re saying that the start symbol’s only expansion is the grammar symbol 'Package'.

According to the quiz, packages may be boxed in cardboard, soft-wrapped or encased in wood. We can specify three production rules for the symbol 'Package' to codify this:

  for_symbol('Package') do
    soft_wrapped_package %w| ( Contents ) |
    cardboard_package    %w| [ Contents ] |
    wooden_package       %w| { Contents } |
  end

All production rules have a name. In the above, the production named ‘soft_wrapped_package’ expands to the array of grammar symbols ['(', 'Contents', ')']. We’ve introduced a few more symbols here, most importantly 'Contents'. A package may contain a bracket or a set of other packages:

  for_symbol('Contents') do
    bracket              %w| B |
    set_of_packages      %w| SetOfPackages |
  end

A SetOfPackages can, in turn, be just a single package or the concatenation of a set of packages and a single package:

  for_symbol('SetOfPackages') do
    one_package          %w| Package |
    multiple_packages    %w| SetOfPackages Package |
  end

That’s it. The whole grammar specification is as follows:

class BracketGrammar < Dhaka::Grammar
  
  for_symbol(Dhaka::START_SYMBOL_NAME) do
    start                %w| Package |
  end

  for_symbol('Package') do
    soft_wrapped_package %w| ( Contents ) |
    cardboard_package    %w| [ Contents ] |
    wooden_package       %w| { Contents } |
  end

  for_symbol('Contents') do
    bracket              %w| B |
    set_of_packages      %w| SetOfPackages |
  end

  for_symbol('SetOfPackages') do
    one_package          %w| Package |
    multiple_packages    %w| SetOfPackages Package |
  end

end

Generating a Parser

We’re ready to generate a parser from this grammar. This is really simple, assuming that the grammar’s correct:
require 'dhaka'
require 'bracket_grammar'
parser = Dhaka::Parser.new(BracketGrammar)

What does this parser look like? We can export it to dot format:

File.open('bracket_parser.dot', 'w') {|file| file << parser.to_dot}

Opening this file with GraphViz yields the following:

The parser that we’ve just created is fully capable of parsing token streams (we haven’t covered the tokenizer yet). But we don’t want to keep regenerating the states (it isn’t very efficient). So we compile the parser to ruby source:

File.open('bracket_parser.rb', 'w') {|file| file << parser.compile_to_ruby_source_as(:BracketParser)}

The file that results from this contains all the information needed for reloading the states and actions of the automaton without going through the costly calculations again. It looks like the following:

class BracketParser < Dhaka::CompiledParser

  self.grammar = BracketGrammar

  start_with 0

  at_state(0) {
    for_symbols("{") { shift_to 2 }
    for_symbols("[") { shift_to 6 }
    for_symbols("(") { shift_to 1 }
    for_symbols("Package") { shift_to 14 }
  }

  at_state(10) {
    for_symbols("}") { shift_to 11 }
  }

  at_state(8) {
    for_symbols("]") { shift_to 9 }
  }

and so on.

Specifying the Tokenizer

The lexical structure for this grammar is so simple that we can write the tokenizer by hand using the Dhaka::Tokenizer class. Characters in the input string correspond directly to the symbols in the grammar (this need not be the case, since grammar symbols are an abstraction, but it makes life easy for us). This is the tokenizer:

require 'bracket_grammar'

class BracketTokenizer < Dhaka::Tokenizer
  
  all_characters = %w| ( [ { B } ] ) |

  for_state Dhaka::TOKENIZER_IDLE_STATE do
    for_characters(all_characters) do
      create_token(curr_char, nil)
      advance
    end
  end
  
end

The tokenizer is a state machine that you specify directly. This one basically translates to: “for every character that you encounter create a token with a grammar symbol corresponding to that character”. When tokenizers threaten to become complex and unwieldy because the lexical structure of the language is complicated, it’s better to generate a lexer for it using regular expressions to specify patterns for tokens (see the dice roller example).

Detecting Faulty Bracket Packing Configurations

Now we can put it all together in a test. Note that when we require the library in this test, we’re requiring 'dhaka/runtime' instead of just plain 'dhaka'. Requiring the runtime loads a subset of the library that has a much smaller memory footprint and is the preferred method of using the library in a production situation. As shown above, when developing grammars or generating parsers, we require the full library.

require 'test/unit'
require 'dhaka/runtime'
require 'bracket_grammar'
require 'bracket_parser'
require 'bracket_tokenizer'

class TestBracketGrammar < Test::Unit::TestCase
  def test_recognizes_faulty_bracket_configuration_correctly
    assert correct("[{(B)[B]}(B)]")
    assert correct("[[B]{(B)}(B)]")
    assert correct("{(B)[B]}")
    assert !correct("{B[B]}")
    assert !correct("B")
    assert !correct("[[B]{(B)}}(B)]")
  end
  def correct input_string
    result = BracketParser.parse(BracketTokenizer.tokenize(input_string))
    !result.has_error?
  end
end

Defining an Evaluator

The brackets in these examples define a sort of arithmetic. Think of the Bs as 1s and imagine that there are plus signs between the packages. Imagine further that parentheses, square brackets and curly braces group expressions as in ordinary arithmetic. Let’s define an evaluator for these bracket expressions:

require 'bracket_grammar'

class BracketEvaluator < Dhaka::Evaluator
  
  self.grammar = BracketGrammar
  
  define_evaluation_rules do
    
    for_bracket do
      1 
    end

    for_multiple_packages do
      evaluate(child_nodes[0]) + evaluate(child_nodes[1])
    end
    
    for_soft_wrapped_package do
      evaluate(child_nodes[1])
    end
    
    for_wooden_package do
      evaluate(child_nodes[1])
    end
    
    for_cardboard_package do
      evaluate(child_nodes[1])
    end
    
  end
  
end

Each block of code refers to a particular production in the grammar and tells the evaluator how to evaluate a syntax node that corresponds to that production. For example, for_multiple_packages refers to the production named ‘multiple_packages’ that we defined in the grammar. The expansion for that production is ['SetOfPackages', 'Package']. Here we’re telling the evaluator to evaluate the child node that corresponds to 'SetOfPackages' (child_nodes[0]), then evaluate the child node that corresponds to 'Package' (child_nodes[1]) and add the two results.

Note that we aren’t defining evaluation rules for all production rules in the grammar. Dhaka defines a default evaluation rule that returns the evaluated result of the child node in the cases where there is only one. So you don’t have to explicitly take care of these pass-through calculations.

We can now add a new test method to our test:

  def test_evaluator_adds_up_the_number_of_brackets
    tokens = BracketTokenizer.tokenize("[{[B]{([B](B)[B])}}(B)]")
    parse_result = BracketParser.parse(tokens)
    assert_equal 5, BracketEvaluator.new.evaluate(parse_result)
  end
  

At this stage, it looks like all we’ve done is generated a complicated piece of software for counting the number of occurrences of the letter ‘B’ in a given string. However, we know a great deal more about this string now. Here’s a crude, but easy way to demonstrate this – we’ll change the evaluation rules for the various package types a little:

    for_multiple_packages do
      evaluate(child_nodes[0]) + evaluate(child_nodes[1])
    end
    
    for_soft_wrapped_package do
      size = evaluate(child_nodes[1])
      puts "A soft-wrap package that holds #{size} brackets."
      size
    end
    
    for_wooden_package do
      size = evaluate(child_nodes[1])
      puts "A wooden package that holds #{size} brackets."
      size
    end
    
    for_cardboard_package do
      size = evaluate(child_nodes[1])
      puts "A cardboard package that holds #{size} brackets."
      size
    end
    
When we re-run the test we see the following output:
    A cardboard package that holds 1 brackets.
    A cardboard package that holds 1 brackets.
    A soft-wrap package that holds 1 brackets.
    A cardboard package that holds 1 brackets.
    A soft-wrap package that holds 3 brackets.
    A wooden package that holds 3 brackets.
    A wooden package that holds 4 brackets.
    A soft-wrap package that holds 1 brackets.
    A cardboard package that holds 5 brackets.

This is a list of all the packages and how large they have to be, quite useful when you’re packing brackets.