I have a shell script
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
which outputs file names not matching the regexp in this way:
file1.java
file2.java
...
As I understand it, it works like this: find locates the matching files and joins their names with \0. Then xargs splits find's output on \0 and feeds the names to grep one by one.
Then I wanted to add one more stage and get only basename of the files. I modified the script:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 basename
but got an error. I started investigating and added some temporary debug output:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 echo basename
and got this:
basename ./file1.java ./file2.java ./subdir/file1.java ./subdir/file2.java
So the filenames were not split by \0. I don't get why they are split when xargs is used with grep but not when it is used with basename.
I found a workaround: adding -n1 to the latter xargs. But I still don't understand why I needed it (given that I didn't use it in the xargs with grep) or what this parameter does.
I hope you can explain what -n1 does, and why I needed it in the latter usage but not in the former one with grep.
-n1 tells xargs to run the given command once per argument.
So if you have something like
echo file1 file2 file2 | xargs basename
That's equivalent to passing all arguments at once to a single call to the basename command, like this:
basename file1 file2 file2
But if you do
echo file1 file2 file2 | xargs -n1 basename
Then xargs passes only one argument to each basename call (because of -n1), so it runs the command once per argument, like this:
basename file1
basename file2
basename file2
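The difference can be observed directly by putting echo in front of a marker word. This is only a small sketch; the file names and the `run:` marker are made up for illustration.

```shell
# Without -n1: a single echo invocation receives all three arguments.
all_at_once=$(printf 'file1 file2 file3\n' | xargs echo run:)

# With -n1: xargs starts a separate echo invocation for each argument.
one_by_one=$(printf 'file1 file2 file3\n' | xargs -n1 echo run:)

printf '%s\n' "$all_at_once"
printf '%s\n' "$one_by_one"
```

The first variable holds one line with all three names; the second holds three lines, one per invocation.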
As for the -0 flag in xargs, it is an alias for the --null option, which tells xargs to separate arguments on the NUL character (\0) instead of the default whitespace. You need it after find because find emits NUL-separated names via the -print0 option. For a second xargs -0 to work after grep, grep has to terminate the file names it prints with NUL as well, which is what its -Z (--null) option does; -z (--null-data) instead changes how grep splits the data it searches into lines.
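To see why the separator matters, here is a minimal sketch with a hypothetical file name that contains a space. printf stands in for find, and `arg:` is just a marker.

```shell
# NUL-separated input with -0: "a b.java" stays a single argument.
nul_split=$(printf 'a b.java\0c.java\0' | xargs -0 -n1 echo arg:)

# Default whitespace splitting: "a b.java" is broken into two arguments.
ws_split=$(printf 'a b.java\nc.java\n' | xargs -n1 echo arg:)

printf '%s\n' "$nul_split"
printf '%s\n' "$ws_split"
```

With -0 there are two invocations, while the default splitting produces three.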
The filenames were split by \0. The difference is in the commands you're using. xargs normally takes its standard input, breaks it into a list (here, by splitting on NUL), and then passes that list as extra arguments to your command. So when you do this:
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
What actually runs is this:
grep -Lz 'regular_expression' file1.java file2.java file3.java...
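Here, the -z has nothing to do with the file names: it tells grep to treat the data it searches as NUL-terminated rather than newline-terminated lines. The file names reach grep as command-line arguments, not on its stdin.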
So, when you add another xargs that runs basename, you get this:
basename file1.java file2.java file3.java...
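But while grep will take any number of filename arguments, basename in its plain form takes just one name (plus an optional suffix to strip), so when it is handed a longer list it fails with an error instead of processing each name.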
That's where -n 1 comes in: it tells xargs to break its list of arguments into chunks (of 1), and run the command multiple times. So what runs now is:
basename file1.java
basename file2.java
basename file3.java
...
And all the output is concatenated together onto stdout.
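Putting the pieces together, the whole pipeline might look like the sketch below. It assumes GNU grep (for -Z) and builds two throwaway files in a temporary directory; the file names and the pattern `foo` are invented for the demonstration.

```shell
# Set up a scratch directory with one matching and one non-matching file.
tmpdir=$(mktemp -d)
printf 'foo\n' > "$tmpdir/match.java"
printf 'bar\n' > "$tmpdir/nomatch.java"

# -L lists files WITHOUT a match; -Z ends each listed name with NUL,
# so the second xargs -0 can split the list safely. -n1 runs basename
# once per file name.
result=$(find "$tmpdir" -name '*.java' -print0 \
  | xargs -0 grep -LZ 'foo' \
  | xargs -0 -n1 basename)

printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Only the file that does not contain the pattern is printed, without its directory part. If your basename supports the -a option (GNU coreutils does), `xargs -0 basename -a` should be an alternative to -n1 that handles all names in one call.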