Problem
I have a couple of log files that are each around 100MB in size. Dealing with such large files is inconvenient. I know that the groups of log lines that interest me span only 200 to 400 lines.
What would be a suitable way to extract the relevant log lines from these files, i.e. simply pipe a range of line numbers to another file?
For example, the inputs would be:
filename: MyHugeLogFile.log
Starting line number: 38438
Ending line number: 39276
Is there a Cygwin command that will cat out only that range of the file? I know that if I can display that range on stdout, I can also pipe it to an output file.
Note: I’ve added the Linux tag to increase visibility, but I’m looking for a solution that will work in Cygwin. (In most cases, Linux commands work in Cygwin.)
Asked by bits
Solution #1
This appears to be a job for sed:
sed -n '8,12p' yourfile
Lines 8 through 12 of your file will be sent to standard out.
You might want to run cat -n first if you want to prepend the line number:
cat -n yourfile | sed -n '8,12p'
Answered by Johnsyweb
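Applied to the inputs in the question, the sed approach above could be sketched as follows. The helper name extract_range is a hypothetical one introduced here for illustration. The extra q command is an addition to the command shown above: it tells sed to quit once the last wanted line has been printed, which should avoid reading the remainder of a 100MB file.

```shell
# Hypothetical helper: print lines START..END (inclusive) of FILE.
# Usage: extract_range START END FILE
extract_range() {
    # -n suppresses sed's default output, "START,ENDp" prints the range,
    # and "ENDq" quits at the last wanted line so the rest is not read.
    sed -n "${1},${2}p;${2}q" "$3"
}

# Example with the question's values (assumes the file exists):
# extract_range 38438 39276 MyHugeLogFile.log > excerpt.log
```

Redirecting the output with > writes the excerpt to a new file, as the question asks.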
Solution #2
To determine the total number of lines, use wc -l.
Then you can combine tail and head to get the desired range. Assume the log has 40,000 lines and you want lines 38438 through 39276. That is the last 1563 lines (40000 - 38438 + 1), and of those, the first 839 (39276 - 38438 + 1). So:
tail -n 1563 MyHugeLogFile.log | head -n 839 | ....
There’s surely a simpler method to do it with sed or awk.
Answered by David
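The arithmetic behind the tail and head counts can be sketched as a small shell function. The function name tail_head_range and its argument order are assumptions made here for illustration. Each + 1 is there because both the starting and the ending line are included in the range.

```shell
# Hypothetical helper: print lines START..END (inclusive) of FILE
# by counting back from the end of the file.
# Usage: tail_head_range START END FILE
tail_head_range() {
    total=$(wc -l < "$3")           # total number of lines in FILE
    from_end=$((total - $1 + 1))    # lines from START through the last line
    count=$(($2 - $1 + 1))          # lines from START through END
    tail -n "$from_end" "$3" | head -n "$count"
}

# Example with the question's values (assumes the file exists):
# tail_head_range 38438 39276 MyHugeLogFile.log > excerpt.log
```

Note that this reads the file twice, once for wc and once for tail, so it is likely slower than a single sed pass.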
Solution #3
I came across this topic while trying to split a file into pieces of 100,000 lines each. For that, there’s a better option than sed:
split -l 100000 database.sql database-
It will produce files such as:
database-aa
database-ab
database-ac
...
Answered by Dorian
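To see how split names and sizes its output, here is a small sketch on generated data. The scratch directory, the file name sample.txt and the prefix part- are all illustrative assumptions. The 250-line input stands in for a large file, and the pieces are 100 lines rather than 100,000 so the result is easy to inspect.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Illustrative stand-in for a large file: 250 numbered lines.
seq 1 250 > "$workdir/sample.txt"

# Split into pieces of at most 100 lines each.
# With the default two-letter suffix the pieces are part-aa, part-ab, part-ac.
split -l 100 "$workdir/sample.txt" "$workdir/part-"

# Show the line count of each piece (the last piece holds the remainder).
wc -l "$workdir"/part-*
```

Concatenating the pieces in name order (cat part-*) gives back the original file, since split only cuts at line boundaries.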
Solution #4
If you just want to clip a section of a file, say from line 26 to 142, and append it to a new file:
sed -n '26,142p' file-to-cut.txt >> new-file.txt
Answered by Marc Perrin-Pelletier
Solution #5
How about this:
$ seq 1 100000 | tail -n +10000 | head -n 10
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
It uses tail to output everything from the 10,000th line onwards, and then head to keep only the first 10 of those lines.
With sed, you get nearly the same result:
$ seq 1 100000 | sed -n '10000,10010p'
10000
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010
This one has the benefit of allowing you to directly input the line range.
Answered by thkala
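The tail and head combination above takes a starting line and a line count, whereas the question gives a starting and an ending line number. A minimal sketch that converts between the two, assuming a hypothetical helper named range_from_start:

```shell
# Hypothetical helper: print lines START..END (inclusive) of FILE.
# Usage: range_from_start START END FILE
range_from_start() {
    count=$(($2 - $1 + 1))    # number of lines in the inclusive range
    # tail -n +START begins output at line START; head keeps COUNT lines.
    tail -n "+$1" "$3" | head -n "$count"
}

# Example with the question's values (assumes the file exists):
# range_from_start 38438 39276 MyHugeLogFile.log > excerpt.log
```

Unlike the plain sed -n 'START,ENDp' form, head exits as soon as it has printed its lines, so the pipeline usually stops without reading the rest of the file.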
Post is based on https://stackoverflow.com/questions/5683367/how-to-cropcut-text-files-based-on-starting-and-ending-line-numbers-in-cygwin