I am stuck on this algorithmic question :
design an algorithm that parse an expression like this : "((a,b,cy)n,m)" should give : an - bn - cyn - m
The expression can nest, therefore : "((a,b)o(m,n)p,b)" parses to ; aomp - aonp - bomp - bonp - b.
I thought of using stacks, but it is too complicated. thanks.
You can parse it with a Recursive Descent Parser.
Let's say the comma separated strings are components
, so for an expression ((a, b, cy)n, m)
, (a, b, cy)n
and m
are two components. a
, b
and cy
are also components. So this is a recursive definition.
For a component (a, b, cy)n
, let's say (a, b, cy)
and n
are two component parts
of the component. Component parts will later be combined to produce final result (i.e., an - bn - cyn
).
Let's say an expression
is comma separated components, for example, (a, cy)n, m
is an expression. It has two components (a, cy)n
and m
, and the component (a, cy)n
has two component parts (a, cy)
and n
, and component part (a, cy)
is a brace expression containing a nested expression
: a, cy
, which also has two components a
and cy
.
With these definitions (you might use other terms), we can write down the grammar for your expression:
expression = component, component, ...
component = component_part component_part ...
component_part = letters | (expression)
One line is one grammar rule. The first line means an expression
is a list of comma separated components
. The second line means a component
can be constructed with one or more component parts
. The third line means a component part
can be either a continuous sequence of letters or a nested expression inside a pair of braces.
Then you can use a Recursive Descent Parser to solve your problem with the above grammar.
We will define one method/function for each grammar rule. So basically we will have three methods ParseExpression
, ParseComponent
, ParseComponentPart
.
As I stated above, an expression is comma separated components
, so in our ParseExpression
method, it simply calls ParseComponent
, and then check if the next char is comma or not, like this (I'm using C#, I think you can easily convert it to other languages):
private List<string> ParseExpression()
{
var result = new List<string>();
while (!Eof())
{
// Parsing a component will produce a list of strings,
// they are added to the final string list
var items = ParseComponent();
result.AddRange(items);
// If next char is ',' simply skip it and parse next component
if (Peek() == ',')
{
// Skip comma
ReadNextChar();
}
else
{
break;
}
}
return result;
}
You can see that, when we are parsing an expression, we recursively call ParseComponent
(it will then recursively call ParseComponentPart
). It's a top-down approach, that's why it's called Recursive Descent Parsing.
ParseComponent
is similar, like this:
private List<string> ParseComponent()
{
List<string> leftItems = null;
while (!Eof())
{
// Parse a component part will produce a list of strings (rightItems)
// We need to combine already parsed string list (leftItems) in this component
// with the newly parsed 'rightItems'
var rightItems = ParseComponentPart();
if (rightItems == null)
{
// No more parts, return current result (leftItems) to the caller
break;
}
if (leftItems == null)
{
leftItems = rightItems;
}
else
{
leftItems = Combine(leftItems, rightItems);
}
}
return leftItems;
}
The combine method simply combines two string list:
// Combine two lists of strings and return the combined string list
private List<string> Combine(List<string> leftItems, List<string> rightItems)
{
var result = new List<string>();
foreach (var leftItem in leftItems)
{
foreach (var rightItem in rightItems)
{
result.Add(leftItem + rightItem);
}
}
return result;
}
Then is the ParseComponentPart
:
private List<string> ParseComponentPart()
{
var nextChar = Peek();
if (nextChar == '(')
{
// Skip '('
ReadNextChar();
// Recursively parse the inner expression
var items = ParseExpression();
// Skip ')'
ReadNextChar();
return items;
}
else if (char.IsLetter(nextChar))
{
var letters = ReadLetters();
return new List<string> { letters };
}
else
{
// Fail to parse a part, it means a component is ended
return null;
}
}
The other parts are mostly helper methods, full C# source code is listed below:
using System;
using System.Collections.Generic;
using System.Text;
namespace Examples
{
public class BashBraceParser
{
private string _expression;
private int _nextCharIndex;
/// <summary>
/// Parse the specified BASH brace expression and return the result string list.
/// </summary>
public IList<string> Parse(string expression)
{
_expression = expression;
_nextCharIndex = 0;
return ParseExpression();
}
private List<string> ParseExpression()
{
// ** This part is already posted above **
}
private List<string> ParseComponent()
{
// ** This part is already posted above **
}
private List<string> ParseComponentPart()
{
// ** This part is already posted above **
}
// Combine two lists of strings and return the combined string list
private List<string> Combine(List<string> leftItems, List<string> rightItems)
{
// ** This part is already posted above **
}
// Peek next char without moving the cursor
private char Peek()
{
if (Eof())
{
return '\0';
}
return _expression[_nextCharIndex];
}
// Read next char and move the cursor to next char
private char ReadNextChar()
{
return _expression[_nextCharIndex++];
}
private void UnreadChar()
{
_nextCharIndex--;
}
// Check if the whole expression string is scanned.
private bool Eof()
{
return _nextCharIndex == _expression.Length;
}
// Read a continuous sequence of letters.
private string ReadLetters()
{
if (!char.IsLetter(Peek()))
{
return null;
}
var str = new StringBuilder();
while (!Eof())
{
var ch = ReadNextChar();
if (char.IsLetter(ch))
{
str.Append(ch);
}
else
{
UnreadChar();
break;
}
}
return str.ToString();
}
}
}
var parser = new BashBraceParser();
var result = parser.Parse("((a,b)o(m,n)p,b)");
var output = String.Join(" - ", result);
// Result: aomp - aonp - bomp - bonp - b
Console.WriteLine(output);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With