Coder Perfect

In Cygwin, how can I crop (clip) text files based on the beginning and ending line numbers?

Problem

I have a couple of log files that are each around 100MB in size, which makes them inconvenient to work with. I know that the part I'm interested in spans only about 200 to 400 lines.

What would be a good way to extract just those lines from these files, i.e. pipe a given range of line numbers to another file?

The following are some examples of inputs:

filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number:   39276

Is there a Cygwin command that will cat out only that range of the file? If I can print the range to stdout, I know I can redirect it to an output file as well.

Note: I've added the Linux tag for visibility, but I need a solution that works in Cygwin. (Linux commands usually work in Cygwin.)

Asked by bits

Solution #1

This appears to be a job for sed:

sed -n '8,12p' yourfile

Lines 8 through 12 of your file will be sent to standard out.

If you also want each line prefixed with its line number, run cat -n first:

cat -n yourfile | sed -n '8,12p'
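Applied to the numbers from the question, the command could look like the minimal sketch below. It assumes GNU sed (the version Cygwin ships); the seq line only builds a stand-in file so the snippet runs anywhere, and excerpt.log is a hypothetical output name. The extra 39276q tells sed to quit right after the last wanted line instead of reading the rest of a 100MB file.

```shell
#!/bin/sh
# Stand-in for the real log: 50,000 numbered lines (assumption, demo only).
seq 1 50000 > MyHugeLogFile.log

# Print lines 38438..39276 (inclusive), then quit at line 39276
# so the remainder of the file is never read.
sed -n '38438,39276p;39276q' MyHugeLogFile.log > excerpt.log
```

Because -n disables automatic printing, the q command does not print line 39276 a second time.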

Answered by Johnsyweb

Solution #2

To determine the total number of lines, use wc -l.

Then you can combine tail and head to get the desired range. Suppose the log has 40,000 lines and you want lines 38438 through 39276: take the last 1563 lines (40000 - 38438 + 1) and, of those, keep the first 839 (39276 - 38438 + 1). So:

tail -n 1563 MyHugeLogFile.log | head -n 839 | ...
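Rather than working out the counts by hand, the shell can compute them. A sketch, assuming a POSIX shell; start, end and total are illustrative variable names, excerpt.log is a hypothetical output file, and the seq line fakes a 40,000-line log for the demo. The last total - start + 1 lines begin exactly at line start, and end - start + 1 of them make up the inclusive range.

```shell
#!/bin/sh
# Demo input: a 40,000-line stand-in for the real log (assumption).
seq 1 40000 > MyHugeLogFile.log

start=38438
end=39276

# Reading from stdin makes wc print only the count, without the file name.
total=$(wc -l < MyHugeLogFile.log)

# Lines start..total are the last (total - start + 1) lines;
# the first (end - start + 1) of those are the wanted range.
tail -n $((total - start + 1)) MyHugeLogFile.log | head -n $((end - start + 1)) > excerpt.log
```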

There's probably a simpler way to do this with sed or awk.
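The awk route mentioned above could look like the sketch below (any POSIX awk should do; the -v variable names s and e and the output file are illustrative). NR is awk's current line number, and the first rule exits as soon as the range has been passed, so awk stops reading early.

```shell
#!/bin/sh
# Demo input standing in for the real log (assumption).
seq 1 40000 > MyHugeLogFile.log

# Rule 1: once past line e, stop reading.
# Rule 2: print every line from s onwards (a pattern with no action prints).
awk -v s=38438 -v e=39276 'NR > e { exit } NR >= s' MyHugeLogFile.log > excerpt.log
```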

Answered by David

Solution #3

I came across this question while trying to split a file into 100,000-line pieces. For that task, there's a better tool than sed:

split -l 100000 database.sql database-

It will produce files such as:

database-aa
database-ab
database-ac
...
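To check the pieces, here is a sketch assuming GNU split (as in Cygwin's coreutils), whose default suffix length is 2. The seq line builds a hypothetical 250,000-line database.sql, which yields two full chunks and one partial chunk. Since the suffixes sort alphabetically, concatenating the chunks in glob order rebuilds the original.

```shell
#!/bin/sh
# Hypothetical input: 250,000 lines standing in for database.sql.
seq 1 250000 > database.sql

# 100,000 lines per piece -> database-aa, database-ab, database-ac.
split -l 100000 database.sql database-

# Line count of each piece (the last one holds the remaining 50,000).
wc -l database-*

# The glob expands in sorted order, so cat restores the original exactly.
cat database-* | cmp - database.sql && echo "chunks match the original"
```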

Answered by Dorian

Solution #4

If you just want to cut out a section of a file, say lines 26 to 142, and append it to a new file:

sed -n '26,142p' file-to-cut.txt >> new-file.txt
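Because >> appends, running the command twice duplicates the excerpt, while several >> runs can collect more than one range into the same file. A sketch with hypothetical file names and an extra range (300 to 310) chosen purely for illustration; the : > line empties the target first so reruns start clean.

```shell
#!/bin/sh
# Hypothetical input file with 1,000 numbered lines.
seq 1 1000 > file-to-cut.txt

# Start from an empty output file so reruns do not pile up old excerpts.
: > new-file.txt

# Each >> adds to the end of new-file.txt.
sed -n '26,142p' file-to-cut.txt >> new-file.txt
sed -n '300,310p' file-to-cut.txt >> new-file.txt

# Use > instead when you want to overwrite rather than append.
sed -n '26,142p' file-to-cut.txt > only-first-range.txt
```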

Answered by Marc Perrin-Pelletier

Solution #5

How about this:

$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009

It uses tail -n +10000 to output everything from line 10,000 onwards, and then head -n 10 to keep only the first 10 of those lines.

With sed you get nearly the same result (the range is inclusive, so 10000,10010 prints 11 lines instead of 10):

$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010

The sed version has the advantage that you give the line range directly, rather than a starting line plus a count.
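To get the tail/head pipeline to accept a range the way sed does, a small wrapper can convert an inclusive start/end pair into a start line and a count. The function name print_range and the file names are hypothetical, chosen only for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print lines $1..$2 (inclusive) of file $3.
print_range() {
    # tail -n +N starts output at line N; head keeps end - start + 1 lines.
    tail -n +"$1" "$3" | head -n $(( $2 - $1 + 1 ))
}

seq 1 100000 > numbers.txt

# Both commands print lines 10000 through 10010 (11 lines).
print_range 10000 10010 numbers.txt > via-tail.txt
sed -n '10000,10010p' numbers.txt > via-sed.txt

cmp via-tail.txt via-sed.txt && echo "identical"
```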

Answered by thkala

Post is based on https://stackoverflow.com/questions/5683367/how-to-cropcut-text-files-based-on-starting-and-ending-line-numbers-in-cygwin