
What is the quickest method for reading a text file line by line?

Problem

I’d like to go through a text file line by line. I wanted to check whether I was doing it as efficiently as possible within the confines of the .NET C# framework.

So far, this is what I’ve tried:

var filestream = new System.IO.FileStream(textFilePath,
                                          System.IO.FileMode.Open,
                                          System.IO.FileAccess.Read,
                                          System.IO.FileShare.ReadWrite);
var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128);

string lineOfText;
while ((lineOfText = file.ReadLine()) != null)
{
    //Do something with the lineOfText
}

Asked by Loren C Fortner

Solution #1

You’ll need to do some benchmarking to figure out the fastest technique to read a file line by line. I conducted some short experiments on my PC, but you should not expect my findings to apply to your situation.
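If you want to run such an experiment yourself, a minimal timing sketch along these lines might look as follows (the file name is a placeholder and the loop body is deliberately trivial; this is an illustration, not the benchmark I ran):

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
var lineCount = 0;
foreach (var line in File.ReadLines(fileName)) // swap in the approach under test
{
    lineCount += 1; // minimal per-line work
}
stopwatch.Stop();
Console.WriteLine("{0} lines in {1} ms", lineCount, stopwatch.ElapsedMilliseconds);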

Using StreamReader.ReadLine

This is essentially your strategy. For some reason you set the buffer size to the smallest possible value (128). Increasing it will boost overall performance. The default size is 1,024 bytes, but 512 bytes (the size of a Windows sector) or 4,096 bytes (the cluster size in NTFS) are also reasonable choices. You’ll need to run a benchmark to establish the best buffer size. A larger buffer is, if not faster, at least no slower than a smaller one.

const Int32 BufferSize = 128;
using (var fileStream = File.OpenRead(fileName))
  using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize)) {
    String line;
    while ((line = streamReader.ReadLine()) != null) {
      // Process line
    }
  }

FileOptions can be specified in the FileStream constructor. For example, if you are reading a large file sequentially from beginning to end, you may benefit from FileOptions.SequentialScan. Again, benchmarking is the best thing you can do.
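As a sketch, the relevant constructor overload looks like this (the 4,096-byte buffer here is an illustrative choice, not a measured recommendation):

// FileStream overload that accepts a buffer size and FileOptions.
var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                FileShare.Read, 4096, FileOptions.SequentialScan);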

Using File.ReadLines

This is fairly similar to your own solution, except that it uses a StreamReader with a fixed buffer size of 1,024. On my PC this gives somewhat better performance than your code with a buffer size of 128, though you can get the same boost simply by using a larger buffer. Because this method is implemented with an iterator block, it does not hold all of the lines in memory at once.

var lines = File.ReadLines(fileName);
foreach (var line in lines) {
  // Process line
}

Using File.ReadAllLines

This is very much like the previous method, except that it grows a list of strings used to create the returned array of lines, so the memory requirements are higher. However, it returns String[] rather than an IEnumerable<String>, allowing you to access the lines randomly.

var lines = File.ReadAllLines(fileName);
for (var i = 0; i < lines.Length; i += 1) {
  var line = lines[i];
  // Process line
}

Using String.Split

Due to the way String.Split is implemented, this method is significantly slower, at least on large files (tested on a 511 KB file). Compared to your solution, it also allocates an array for all of the lines, increasing the amount of memory required.

using (var streamReader = File.OpenText(fileName)) {
  var lines = streamReader.ReadToEnd().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
  foreach (var line in lines) {
    // Process line
  }
}

Because it is clean and efficient, I recommend using File.ReadLines. If you require special sharing options (for example, FileShare.ReadWrite), you can use your own code instead, but you should increase the buffer size.
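For example, a sketch of that custom-code path might look like the following (the 4,096-byte buffer and the SequentialScan flag are assumptions to benchmark, not measured results):

// FileShare.ReadWrite lets another process write to the file while it is read;
// the buffer is larger than the 128 bytes used in the question.
const int BufferSize = 4096;
using (var fileStream = new FileStream(textFilePath, FileMode.Open, FileAccess.Read,
                                       FileShare.ReadWrite, BufferSize, FileOptions.SequentialScan))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        // Process line
    }
}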

Answered by Martin Liversage

Solution #2

If you’re using .NET 4, simply use File.ReadLines, which does it all for you. I suspect it is much the same as yours, except that it may also use FileOptions.SequentialScan and a bigger buffer (128 seems very small).
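As a rough, hypothetical sketch of what that could mean (the method name ReadLinesSketch and the 4,096-byte buffer are illustrative assumptions, not the framework’s actual implementation):

// Guess at what File.ReadLines does internally: an iterator over a StreamReader
// opened with FileOptions.SequentialScan and a larger buffer.
static IEnumerable<string> ReadLinesSketch(string path)
{
    using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, FileOptions.SequentialScan))
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line; // lines are streamed one at a time
        }
    }
}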

Answered by Jon Skeet

Solution #3

While File.ReadAllLines() is one of the most straightforward methods for reading a file, it is also one of the slowest.

If you merely want to read the lines of a file without doing anything more, these benchmarks show that the fastest way to read a file is the following:

using (StreamReader sr = File.OpenText(fileName))
{
    string s = String.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        // do minimal amount of work here
    }
}

However, if you need to do a lot with each line, this article finds that the best method is as follows (and pre-allocating a string[] is faster if you know how many lines you’ll read):

string[] AllLines = new string[MAX]; // only allocate memory here (MAX = known line count)

using (StreamReader sr = File.OpenText(fileName))
{
    int x = 0;
    while (!sr.EndOfStream)
    {
        AllLines[x] = sr.ReadLine();
        x += 1;
    }
} // Finished. Close the file.

//Now parallel process each line in the file
Parallel.For(0, AllLines.Length, x =>
{
    DoYourStuff(AllLines[x]); //do your work here
});

Answered by Free Coder 24

Solution #4

Use the following code to get started:

foreach (string line in File.ReadAllLines(fileName))
{
    // Process line
}

This made a big difference in reading performance.

It comes at a cost in terms of memory usage, but it’s well worth it!

Answered by user2671536

Solution #5

If the file is not large, it is faster to read the complete file and then split it.

using (var sr = File.OpenText(fileName))
{
    var lines = sr.ReadToEnd().Split(new[] { Environment.NewLine },
                                     StringSplitOptions.RemoveEmptyEntries);
    // Process lines
}

Answered by Saeed Amiri

Post is based on https://stackoverflow.com/questions/8037070/whats-the-fastest-way-to-read-a-text-file-line-by-line