
Controlling shell command line wildcard expansion in C or C++

I'm writing a program, foo, in C++. It's typically invoked on the command line like this:

foo *.txt

My main() receives the arguments in the normal way. On many systems, argv[1] is literally *.txt, and I have to call system routines to do the wildcard expansion. On Unix systems, however, the shell expands the wildcard before invoking my program, and all of the matching filenames will be in argv.

Suppose I wanted to add a switch to foo that causes it to recurse into subdirectories.

foo -a *.txt

would process all text files in the current directory and all of its subdirectories.

I don't see how this is done, since, by the time my program gets a chance to see the -a, the shell has already done the expansion and the user's *.txt input is lost. Yet there are common Unix programs that work this way. How do they do it?

In Unix land, how can I control the wildcard expansion?

(Recursing through subdirectories is just one example. Ideally, I'm trying to understand the general solution to controlling the wildcard expansion.)

Adrian McCarthy asked Nov 23 '25 15:11


2 Answers

Your program has no influence over the shell's command line expansion. Which program will be called is determined after all the expansion is done, so it's already too late to change anything about the expansion programmatically.

The user calling your program, on the other hand, can construct whatever command line they like. Shells allow you to easily prevent wildcard expansion, usually by putting the argument in single quotes:

program -a '*.txt'

If your program is called like that, it will receive two parameters: -a and *.txt.

On Unix, you should just leave it to the user to manually prevent wildcard expansion if it is not desired.
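One way to see the difference between the quoted and unquoted invocations is to have the program report exactly what it received. The following is a minimal sketch; describe_args is a hypothetical helper name, not something from the question or from any library.

```cpp
#include <string>

// Hypothetical helper: wraps each argument after argv[0] in brackets so the
// boundaries between arguments are easy to see.
std::string describe_args(int argc, const char* const* argv)
{
    std::string out;
    for (int i = 1; i < argc; ++i) {
        out += '[';
        out += argv[i];
        out += ']';
    }
    return out;
}
```

Called from main() with its own argc and argv, this would return [-a][*.txt] for the quoted command line above. For the unquoted form it would return -a followed by one bracketed entry per file name the shell matched.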

sth answered Nov 26 '25 03:11


As the other answer said, the shell does the wildcard expansion, and the user stops it from doing so by enclosing arguments in quotes.

Note that the options -R and -r are conventionally used to request recursion; see cp, ls, etc. for examples.

Assuming you organize things appropriately so that wildcards are passed to your program as wildcards and you want to do recursion, then POSIX provides routines to help:

  • nftw - file tree walk (recursive access).
  • fnmatch, glob, wordexp - to do filename matching and expansion

There is also ftw, which is very similar to nftw but it is marked 'obsolescent' so new code should not use it.


Adrian asked:

But I can say ls -R *.txt without single quotes and get a recursive listing. How does that work?

To adapt the question to a convenient location on my computer, let's review:

$ ls -F | grep '^m'
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte/
$ ls -R1 m*
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2

mte:
multithread.ec
multithread.ec.original
multithread2.ec
$

So, I have six files with names that start with 'm', and a sub-directory 'mte' that contains three files.

  • When I type 'ls -R1 m*', the shell notes the metacharacter '*' and uses its equivalent of glob() or wordexp() to expand that into the list of names:

    1. makefile
    2. mapmain.pl
    3. minimac.group
    4. minimac.passwd
    5. minimac_13.terminal
    6. mkmax.sql.bz2
    7. mte
  • Then the shell arranges to run '/bin/ls' with 9 arguments (the program name, the option -R1, and the 7 file names), followed by a terminating null pointer.

  • The ls command notes the options (recursive and single-column output), and gets to work.
    • The first 6 names (as it happens) are simple files, so there is nothing recursive to do.
    • The last name is a directory, so ls prints its name and its contents, invoking its equivalent of nftw() to do the job.
    • At this point, it is done.
  • This uncontrived example doesn't show what happens when there are multiple directories, and so the description above over-simplifies the processing.
  • Specifically, ls processes the non-directory names first, and then processes the directory names in alphabetic order (by default), and does a depth-first scan of each directory.

Jonathan Leffler answered Nov 26 '25 04:11


