Problem
Is there a simple way to determine the number of lines in a text file programmatically?
Asked by TK.
Solution #1
If you’re using .NET 4.0 or later, use the following instead of the original answer below.
Instead of greedily reading all lines into an array as ReadAllLines does, the File class offers a ReadLines method that enumerates lines lazily. As a result, you can have both efficiency and brevity with:
var lineCount = File.ReadLines(@"C:\file.txt").Count();
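A minimal sketch of the one-liner above as a complete helper. It assumes the System.Linq namespace is imported, because the Count() extension method lives there. The names LineCounter and CountLines are illustrative and not part of the original answer.

```csharp
using System.IO;
using System.Linq;

public static class LineCounter
{
    // File.ReadLines streams the file one line at a time, so the whole
    // file is never held in memory at once while Count() enumerates it.
    public static int CountLines(string path)
    {
        return File.ReadLines(path).Count();
    }
}
```

If a file might contain more than int.MaxValue lines, LongCount() could be used in place of Count().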
Original Answer
If efficiency isn’t important to you, you can simply write:
var lineCount = File.ReadAllLines(@"C:\file.txt").Length;
For a more memory-efficient approach, you could do the following:
var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}
Edit: in response to questions about efficiency.
The reason I said the second was more efficient was memory usage, not necessarily speed. The first loads the entire contents of the file into an array, so it must allocate at least as much memory as the size of the file. The second reads one line at a time, so it never needs more than one line’s worth of memory at any given time. This isn’t a big deal for small files, but it can be a problem for bigger files (for example, trying to find the number of lines in a 4GB file on a 32-bit machine, where there isn’t enough user-mode address space to allocate an array this big).
In terms of speed, I wouldn’t expect a large difference between the two. It’s possible that ReadAllLines has some internal optimizations, but on the other hand it may have to allocate a large amount of memory. I’d guess that ReadAllLines is faster for small files but significantly slower for large files; however, the only way to know for sure would be to measure it with a Stopwatch or a code profiler.
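The Stopwatch measurement suggested above could be sketched roughly as follows. The class and method names (LineCountTiming, Measure, CountWithReadAllLines, CountWithReader) are hypothetical and chosen for illustration. A single timed run is easily skewed by file-system caching and JIT warm-up, so treat any one result with caution.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LineCountTiming
{
    // Times one call to the given counting function and returns the elapsed time.
    public static TimeSpan Measure(Func<string, long> counter, string path, out long lineCount)
    {
        var stopwatch = Stopwatch.StartNew();
        lineCount = counter(path);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Greedy approach: loads every line into an array before counting.
    public static long CountWithReadAllLines(string path)
    {
        return File.ReadAllLines(path).Length;
    }

    // Streaming approach: holds only one line in memory at a time.
    public static long CountWithReader(string path)
    {
        long count = 0;
        using (var reader = File.OpenText(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

Calling Measure once with each counting method on the same file, and repeating the runs a few times, gives a rough comparison for that particular file size.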
Answered by Greg Beech
Solution #2
The easiest:
int lines = File.ReadAllLines("myfile").Length;
Answered by leppie
Solution #3
This would use less memory, but it would almost certainly take longer.
int count = 0;
// The using block closes the reader even if an exception is thrown.
using (TextReader reader = new StreamReader("file.txt"))
{
    while (reader.ReadLine() != null)
    {
        count++;
    }
}
Answered by benPearce
Solution #4
If by simple you mean code that is easy to understand, even if it may be inefficient:
string[] lines = System.IO.File.ReadAllLines(filename);
int cnt = lines.Length;
That’s probably the quickest way to find out how many lines there are.
You could also do the following, depending on whether you are buffering the file in:
// for large files
while (...reads into buffer){
    string[] lines = Regex.Split(buffer, System.Environment.NewLine);
}
There are countless other options, but you’ll most likely choose one of the aforementioned.
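The buffered idea for large files could be sketched as below. This is an illustration under stated assumptions, not the answerer's code: it counts '\n' characters directly instead of calling Regex.Split, because a line can straddle two buffers and splitting each buffer separately would miscount. The names BufferedLineCounter and CountLines and the 64 KB default buffer size are hypothetical.

```csharp
using System.IO;

public static class BufferedLineCounter
{
    // Reads the file in fixed-size chunks and counts '\n' characters.
    // Works for "\n" and "\r\n" line endings; a final line without a
    // trailing newline is counted as well.
    public static long CountLines(string path, int bufferSize = 64 * 1024)
    {
        long count = 0;
        char last = '\n'; // so that an empty file yields zero lines
        var buffer = new char[bufferSize];
        using (var reader = new StreamReader(path))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == '\n') { count++; }
                }
                last = buffer[read - 1];
            }
        }
        // Count a trailing line that does not end with a newline.
        if (last != '\n') { count++; }
        return count;
    }
}
```

Only one buffer is allocated for the whole run, so memory use stays constant regardless of file size.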
Answered by user8456
Solution #5
Reading a file takes time in itself, and garbage collecting the result is another problem, because you read the whole file just to count the newline character(s).
Whether it is the framework or your own code, someone has to read the characters in the file at some point. This means you have to open the file and read it into memory; if the file is large, that can become a problem, because the memory then needs to be garbage collected.
Nima Ara provided an excellent analysis that you should consider.
The proposed technique examines four bytes per loop iteration, counts each occurrence of the detected end-of-line character, and reuses the same variable for every comparison.
private const char CR = '\r';
private const char LF = '\n';
private const char NULL = (char)0;
public static long CountLinesMaybe(Stream stream)
{
    // The original calls an external Ensure.NotNull helper here;
    // a plain null check avoids that dependency.
    if (stream == null) { throw new ArgumentNullException(nameof(stream)); }
    var lineCount = 0L;
    var byteBuffer = new byte[1024 * 1024];
    const int BytesAtTheTime = 4;
    var detectedEOL = NULL;
    var currentChar = NULL;
    int bytesRead;
    while ((bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
    {
        var i = 0;
        // Main loop: checks four bytes per iteration once the EOL character is known.
        for (; i <= bytesRead - BytesAtTheTime; i += BytesAtTheTime)
        {
            currentChar = (char)byteBuffer[i];
            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }
                currentChar = (char)byteBuffer[i + 1];
                if (currentChar == detectedEOL) { lineCount++; }
                currentChar = (char)byteBuffer[i + 2];
                if (currentChar == detectedEOL) { lineCount++; }
                currentChar = (char)byteBuffer[i + 3];
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                // EOL not detected yet: inspect one byte, then advance by only one.
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
                i -= BytesAtTheTime - 1;
            }
        }
        // Tail loop: handles the remaining bytes that do not fill a group of four.
        for (; i < bytesRead; i++)
        {
            currentChar = (char)byteBuffer[i];
            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
            }
        }
    }
    // Count a final line that does not end with an EOL character.
    if (currentChar != LF && currentChar != CR && currentChar != NULL)
    {
        lineCount++;
    }
    return lineCount;
}
As the code above shows, lines are found by examining one character at a time; the underlying framework does the same, since every character must be read to find the line feed.
If you profile it as done by Nima, you’ll notice that it’s a fairly quick and efficient method of accomplishing this.
Answered by Walter Verhoeven
Post is based on https://stackoverflow.com/questions/119559/determine-the-number-of-lines-within-a-text-file