How can I split a file up by matching lines context?

Tags: shell, unix, sh

I have a file, x, with section delimiters:

The first section

#!

The second section

#!

The third section

And I want to split it up into a sequence of separate files, like:

The first section
#!

The second section
#!

The third section

I thought csplit would be the solution, with a command line something like:

$ csplit -sk x '/#!/' {9999}

But the second file (xx01) ends up containing both delimiters:

#!

The second section

#!

Any ideas for how to accomplish what I want in a POSIX compliant way? (Yes, I could reach for Perl/Python/Ruby and friends; but, the point is to stretch my shell knowledge.)


I worry that I've found a bug in OSX csplit. Can people give the following a go and let me know the results?

#!/bin/sh

test -e

work="$(basename $0).$RANDOM"
mkdir $work

csplit -sk -f "$work/" - '/#/' '{9999}' <<EOF
First
#
Second
#
Third
EOF

if [ $(grep -c '#' $work/01) -eq 2 ]; then
  echo FAIL Repeat
else
  echo PASS Repeat
fi

rm $work/*

csplit -sk -f "$work/" - '/#/' '/#/' <<EOF
First
#
Second
#
Third
EOF

if [ $(grep -c '#' $work/01) -eq 2 ]; then
  echo FAIL Exact
else
  echo PASS Exact
fi

uname -a

When I run it on my Snow Leopard box, I get:

$ ./csplit-test
csplit: #: no match
FAIL Repeat
PASS Exact
Darwin lani.bigpond 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64

And on my Debian box, I get:

$ sh ./csplit-test 
csplit: `/#/': match not found on repetition 2
PASS Repeat
PASS Exact
asked Sep 01 '25 04:09 by Scott Robinson


1 Answer

This seems to work for me on Linux:

csplit -sk filename '/#!/' '{*}'

which gives:

$ more xx00
The first section

$ more xx01
#!

The second section

$ more xx02
#!

The third section
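
If you want each delimiter at the end of the preceding file, as in the question, a csplit offset might do it. This is only a sketch, assuming GNU csplit ('{*}' is a GNU extension, not POSIX). The sample input and the scratch directory name are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU csplit; '{*}' (repeat as often as possible) is not POSIX.
dir="${TMPDIR:-/tmp}/csplit-offset.$$"   # hypothetical scratch directory
mkdir "$dir" && cd "$dir" || exit 1

# Hypothetical sample input, modelled on the file x in the question.
printf '%s\n' 'The first section' '#!' \
              'The second section' '#!' \
              'The third section' > x

# '+1' moves each split point to the line after the matching delimiter,
# so the delimiter stays at the end of the piece it terminates.
csplit -sk x '/#!/+1' '{*}'

# Show what landed in each piece.
for f in xx*; do
  echo "== $f"
  cat "$f"
done
```

Here xx00 and xx01 should each end with the #! line, and xx02 should hold only the last section. I have not tried this on OS X, so the repeat problem from the question may still apply there.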

You could also use Ruby or Perl to do this in a tiny script and get rid of the delimiters altogether.
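
Since the question asks for something POSIX-only, awk may be a way to sidestep csplit's repeat behaviour entirely. The following is a minimal sketch, not a tested drop-in. The xxNN output names just mimic csplit, and the sample input and scratch directory are assumptions for illustration.

```shell
#!/bin/sh
dir="${TMPDIR:-/tmp}/awk-split.$$"   # hypothetical scratch directory
mkdir "$dir" && cd "$dir" || exit 1

# Hypothetical sample input, modelled on the file x in the question.
printf '%s\n' 'The first section' '#!' \
              'The second section' '#!' \
              'The third section' > x

awk '
  BEGIN  { n = 0; out = sprintf("xx%02d", n) }
  # Every line, delimiter included, goes to the current output file.
         { print > out }
  # After writing a delimiter line, close the file and move to the next.
  /^#!$/ { close(out); n++; out = sprintf("xx%02d", n) }
' x

# Show what landed in each piece.
for f in xx*; do
  echo "== $f"
  cat "$f"
done
```

Each output file is only created when its first line is written, so an input that ends with a delimiter should not leave an empty trailing file. To drop the delimiters instead, move the plain print rule below the /^#!$/ rule and end that rule with next.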


On Fedora 13 Linux, your test script gives:

$ ./test.sh 
csplit: `/#/': match not found on repetition 2
PASS Repeat
PASS Exact
Linux localhost.localdomain 2.6.34.8-68.fc13.x86_64 #1 SMP Thu Feb 17 15:03:58 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
answered Sep 03 '25 04:09 by Tilo