
Bracket Packing
Ruby Quiz #78 was posted by Ross Bamford. Please follow the link to see what it’s all about.
Specifying the Grammar
We start by defining a grammar for allowable bracket-packing configurations. Here’s the beginnings of a grammar specification:
class BracketGrammar < Dhaka::Grammar for_symbol(Dhaka::START_SYMBOL_NAME) do start %w| Package | end
We’re defining a production for the start symbol of the grammar, represented by the constant Dhaka::START_SYMBOL_NAME. All grammars must define productions for a start symbol. In our case, we’re saying that the start symbol’s only expansion is the grammar symbol 'Package'.
According to the quiz, packages may be boxed in cardboard, soft-wrapped or encased in wood. We can specify three production rules for the symbol 'Package' to codify this:
for_symbol('Package') do soft_wrapped_package %w| ( Contents ) | cardboard_package %w| [ Contents ] | wooden_package %w| { Contents } | end
All production rules have a name. In the above, the production named ‘soft_wrapped_package’ expands to the array of grammar symbols ['(', 'Contents', ')']. We’ve introduced a few more symbols here, most importantly 'Contents'. A package may contain a bracket or a set of other packages:
for_symbol('Contents') do bracket %w| B | set_of_packages %w| SetOfPackages | end
A SetOfPackages can, in turn, be just a single package or the concatenation of a set of packages and a single package:
for_symbol('SetOfPackages') do one_package %w| Package | multiple_packages %w| SetOfPackages Package | end
That’s it. The whole grammar specification is as follows:
class BracketGrammar < Dhaka::Grammar for_symbol(Dhaka::START_SYMBOL_NAME) do start %w| Package | end for_symbol('Package') do soft_wrapped_package %w| ( Contents ) | cardboard_package %w| [ Contents ] | wooden_package %w| { Contents } | end for_symbol('Contents') do bracket %w| B | set_of_packages %w| SetOfPackages | end for_symbol('SetOfPackages') do one_package %w| Package | multiple_packages %w| SetOfPackages Package | end end
Generating a Parser
We’re ready to generate a parser from this grammar. This is really simple, assuming that the grammar’s correct:require 'dhaka' require 'bracket_grammar' parser = Dhaka::Parser.new(BracketGrammar)
What does this parser look like? We can export it to dot format:
File.open('bracket_parser.dot', 'w') {|file| file << parser.to_dot}
Opening this file with GraphViz yields the following:
The parser that we’ve just created is fully capable of parsing token streams (we haven’t covered the tokenizer yet). But we don’t want to keep regenerating the states (it isn’t very efficient). So we compile the parser to ruby source:
File.open('bracket_parser.rb', 'w') {|file| file << parser.compile_to_ruby_source_as(:BracketParser)}
The file that results from this contains all the information needed for reloading the states and actions of the automaton without going through the costly calculations again. It looks like the following:
class BracketParser < Dhaka::CompiledParser self.grammar = BracketGrammar start_with 0 at_state(0) { for_symbols("{") { shift_to 2 } for_symbols("[") { shift_to 6 } for_symbols("(") { shift_to 1 } for_symbols("Package") { shift_to 14 } } at_state(10) { for_symbols("}") { shift_to 11 } } at_state(8) { for_symbols("]") { shift_to 9 } }
and so on.
Specifying the Tokenizer
The lexical structure for this grammar is so simple that we can write the tokenizer by hand using the Dhaka::Tokenizer class. Characters in the input string correspond directly to the symbols in the grammar (this need not be the case, since grammar symbols are an abstraction, but it makes life easy for us). This is the tokenizer:
require 'bracket_grammar' class BracketTokenizer < Dhaka::Tokenizer all_characters = %w| ( [ { B } ] ) | for_state Dhaka::TOKENIZER_IDLE_STATE do for_characters(all_characters) do create_token(curr_char, nil) advance end end end
The tokenizer is a state machine that you specify directly. This one basically translates to: “for every character that you encounter create a token with a grammar symbol corresponding to that character”. When tokenizers threaten to become complex and unwieldy because the lexical structure of the language is complicated, it’s better to generate a lexer for it using regular expressions to specify patterns for tokens (see the dice roller example).
Detecting Faulty Bracket Packing Configurations
Now we can put it all together in a test. Note that when we require the library in this test, we’re requiring 'dhaka/runtime' instead of just plain 'dhaka'. Requiring the runtime loads a subset of the library that has a much smaller memory footprint and is the preferred method of using the library in a production situation. As shown above, when developing grammars or generating parsers, we require the full library.
require 'test/unit' require 'dhaka/runtime' require 'bracket_grammar' require 'bracket_parser' require 'bracket_tokenizer' class TestBracketGrammar < Test::Unit::TestCase def test_recognizes_faulty_bracket_configuration_correctly assert correct("[{(B)[B]}(B)]") assert correct("[[B]{(B)}(B)]") assert correct("{(B)[B]}") assert !correct("{B[B]}") assert !correct("B") assert !correct("[[B]{(B)}}(B)]") end def correct input_string result = BracketParser.parse(BracketTokenizer.tokenize(input_string)) !result.has_error? end end
Defining an Evaluator
The brackets in these examples define a sort of arithmetic. Think of the Bs as 1s and imagine that there are plus signs between the packages. Imagine further that parentheses, square brackets and curly braces group expressions as in ordinary arithmetic. Let’s define an evaluator for these bracket expressions:
require 'bracket_grammar' class BracketEvaluator < Dhaka::Evaluator self.grammar = BracketGrammar define_evaluation_rules do for_bracket do 1 end for_multiple_packages do evaluate(child_nodes[0]) + evaluate(child_nodes[1]) end for_soft_wrapped_package do evaluate(child_nodes[1]) end for_wooden_package do evaluate(child_nodes[1]) end for_cardboard_package do evaluate(child_nodes[1]) end end end
Each block of code refers to a particular production in the grammar and tells the evaluator how to evaluate a syntax node that corresponds to that production. For example, for_multiple_packages refers to the production named ‘multiple_packages’ that we defined in the grammar. The expansion for that production is ['SetOfPackages', 'Package']. Here we’re telling the evaluator to evaluate the child node that corresponds to 'SetOfPackages' (child_nodes[0]), then evaluate the child node that corresponds to 'Package' (child_nodes[1]) and add the two results.
Note that we aren’t defining evaluation rules for all production rules in the grammar. Dhaka defines a default evaluation rule that returns the evaluated result of the child node in the cases where there is only one. So you don’t have to explicitly take care of these pass-through calculations.
We can now add a new test method to our test:
def test_evaluator_adds_up_the_number_of_brackets tokens = BracketTokenizer.tokenize("[{[B]{([B](B)[B])}}(B)]") parse_result = BracketParser.parse(tokens) assert_equal 5, BracketEvaluator.new.evaluate(parse_result) end
At this stage, it looks like all we’ve done is generated a complicated piece of software for counting the number of occurrences of the letter ‘B’ in a given string. However, we know a great deal more about this string now. Here’s a crude, but easy way to demonstrate this – we’ll change the evaluation rules for the various package types a little:
for_multiple_packages do evaluate(child_nodes[0]) + evaluate(child_nodes[1]) end for_soft_wrapped_package do size = evaluate(child_nodes[1]) puts "A soft-wrap package that holds #{size} brackets." size end for_wooden_package do size = evaluate(child_nodes[1]) puts "A wooden package that holds #{size} brackets." size end for_cardboard_package do size = evaluate(child_nodes[1]) puts "A cardboard package that holds #{size} brackets." size end
A cardboard package that holds 1 brackets.
A cardboard package that holds 1 brackets.
A soft-wrap package that holds 1 brackets.
A cardboard package that holds 1 brackets.
A soft-wrap package that holds 3 brackets.
A wooden package that holds 3 brackets.
A wooden package that holds 4 brackets.
A soft-wrap package that holds 1 brackets.
A cardboard package that holds 5 brackets.
This is a list of all the packages and how large they have to be, quite useful when you’re packing brackets.
