BASH brace expansion algorithm

Question

I am stuck on this algorithmic question :

design an algorithm that parse an expression like this : "((a,b,cy)n,m)" should give : an - bn - cyn - m

The expression can nest, therefore : "((a,b)o(m,n)p,b)" parses to ; aomp - aonp - bomp - bonp - b.

I thought of using stacks, but it is too complicated. thanks.

Mouhong Lin · Accepted Answer

You can parse it with a Recursive Descent Parser.

Let's say the comma separated strings are components, so for an expression ((a, b, cy)n, m), (a, b, cy)n and m are two components. a, b and cy are also components. So this is a recursive definition.

For a component (a, b, cy)n, let's say (a, b, cy) and n are two component parts of the component. Component parts will later be combined to produce final result (i.e., an - bn - cyn).

Let's say an expression is comma separated components, for example, (a, cy)n, m is an expression. It has two components (a, cy)n and m, and the component (a, cy)n has two component parts (a, cy) and n, and component part (a, cy) is a brace expression containing a nested expression: a, cy, which also has two components a and cy.

With these definitions (you might use other terms), we can write down the grammar for your expression:

expression     = component, component, ...
component      = component_part component_part ...
component_part = letters | (expression)

One line is one grammar rule. The first line means an expression is a list of comma separated components. The second line means a component can be constructed with one or more component parts. The third line means a component part can be either a continuous sequence of letters or a nested expression inside a pair of braces.

Then you can use a Recursive Descent Parser to solve your problem with the above grammar.

We will define one method/function for each grammar rule. So basically we will have three methods ParseExpression, ParseComponent, ParseComponentPart.

Algorithm

As I stated above, an expression is comma separated components, so in our ParseExpression method, it simply calls ParseComponent, and then check if the next char is comma or not, like this (I'm using C#, I think you can easily convert it to other languages):

private List<string> ParseExpression()
{
    var result = new List<string>();

    while (!Eof())
    {
        // Parsing a component will produce a list of strings,
        // they are added to the final string list

        var items = ParseComponent();

        result.AddRange(items);

        // If next char is ',' simply skip it and parse next component
        if (Peek() == ',')
        {
            // Skip comma
            ReadNextChar();
        }
        else
        {
            break;
        }
    }

    return result;
}

You can see that, when we are parsing an expression, we recursively call ParseComponent (it will then recursively call ParseComponentPart). It's a top-down approach, that's why it's called Recursive Descent Parsing.

ParseComponent is similar, like this:

private List<string> ParseComponent()
{
    List<string> leftItems = null;

    while (!Eof())
    {
        // Parse a component part will produce a list of strings (rightItems)
        // We need to combine already parsed string list (leftItems) in this component
        // with the newly parsed 'rightItems'
        var rightItems = ParseComponentPart();
        if (rightItems == null)
        {
            // No more parts, return current result (leftItems) to the caller
            break;
        }

        if (leftItems == null)
        {
            leftItems = rightItems;
        }
        else
        {
            leftItems = Combine(leftItems, rightItems);
        }
    }

    return leftItems;
}

The combine method simply combines two string list:

// Combine two lists of strings and return the combined string list
private List<string> Combine(List<string> leftItems, List<string> rightItems)
{
    var result = new List<string>();

    foreach (var leftItem in leftItems)
    {
        foreach (var rightItem in rightItems)
        {
            result.Add(leftItem + rightItem);
        }
    }

    return result;
}

Then is the ParseComponentPart:

private List<string> ParseComponentPart()
{
    var nextChar = Peek();

    if (nextChar == '(')
    {
        // Skip '('
        ReadNextChar();

        // Recursively parse the inner expression
        var items = ParseExpression();

        // Skip ')'
        ReadNextChar();

        return items;
    }
    else if (char.IsLetter(nextChar))
    {
        var letters = ReadLetters();

        return new List<string> { letters };
    }
    else
    {
        // Fail to parse a part, it means a component is ended
        return null;
    }
}

Full Source Code (C#)

The other parts are mostly helper methods, full C# source code is listed below:

using System;
using System.Collections.Generic;
using System.Text;

namespace Examples
{
    public class BashBraceParser
    {
        private string _expression;
        private int _nextCharIndex;

        /// <summary>
        /// Parse the specified BASH brace expression and return the result string list.
        /// </summary>
        public IList<string> Parse(string expression)
        {
            _expression = expression;
            _nextCharIndex = 0;

            return ParseExpression();
        }

        private List<string> ParseExpression()
        {
            // ** This part is already posted above **
        }

        private List<string> ParseComponent()
        {
            // ** This part is already posted above **
        }

        private List<string> ParseComponentPart()
        {
            // ** This part is already posted above **
        }

        // Combine two lists of strings and return the combined string list
        private List<string> Combine(List<string> leftItems, List<string> rightItems)
        {
            // ** This part is already posted above **
        }

        // Peek next char without moving the cursor
        private char Peek()
        {
            if (Eof())
            {
                return '\0';
            }

            return _expression[_nextCharIndex];
        }

        // Read next char and move the cursor to next char
        private char ReadNextChar()
        {
            return _expression[_nextCharIndex++];
        }

        private void UnreadChar()
        {
            _nextCharIndex--;
        }

        // Check if the whole expression string is scanned.
        private bool Eof()
        {
            return _nextCharIndex == _expression.Length;
        }

        // Read a continuous sequence of letters.
        private string ReadLetters()
        {
            if (!char.IsLetter(Peek()))
            {
                return null;
            }

            var str = new StringBuilder();

            while (!Eof())
            {
                var ch = ReadNextChar();
                if (char.IsLetter(ch))
                {
                    str.Append(ch);
                }
                else
                {
                    UnreadChar();
                    break;
                }
            }

            return str.ToString();
        }
    }
}

Use The Code

var parser = new BashBraceParser();
var result = parser.Parse("((a,b)o(m,n)p,b)");

var output = String.Join(" - ", result);

// Result: aomp - aonp - bomp - bonp - b
Console.WriteLine(output);

BASH brace expansion algorithm

Tags:

algorithm

user199

1 Answers

Algorithm

Full Source Code (C#)

Use The Code

Mouhong Lin

Recent Activity

Donate For Us

BASH brace expansion algorithm

Tags:

algorithm

user199

1 Answers

Algorithm

Full Source Code (C#)

Use The Code

Mouhong Lin

Related questions

Recent Activity

Donate For Us