Coder Perfect

Why is stdout printing so slow? Is it possible to speed things up?


The length of time it takes to just print to the terminal with a print statement has always amazed/frustrated me. I decided to investigate some recent very sluggish logging and was astonished to see that virtually all of the time is spent waiting for the terminal to process the information.

Is there a way to make writing to stdout faster?

I created a script (‘print’ at the bottom of this question) to compare timing when printing 100k lines to stdout, file, and /dev/null. The following is the outcome of the timing:

$ python
this is a test
this is a test
<snipped 99997 lines>
this is a test
timing summary (100k lines each)
print                         :11.950 s
write to file (+ fsync)       : 0.122 s
print with stdout = /dev/null : 0.050 s

Wow. I did the redirection outside the script to make sure python wasn’t doing anything behind the scenes, like recognizing that I redirected stdout to /dev/null or anything…

$ python > /dev/null
timing summary (100k lines each)
print                         : 0.053 s
write to file (+fsync)        : 0.108 s
print with stdout = /dev/null : 0.045 s

So it isn’t a python trick, but rather a terminal trick. I’d always known that dumping output to /dev/null sped things up, but I didn’t realize how much!

The tty’s slowness astounds me. How is it that writing to actual disk is significantly faster than writing to the “screen” (probably an all-RAM op) and is effectively as fast as simply dumping to the garbage with /dev/null?

This link talks about how the terminal will block I/O so it can “parse [the input], update its frame buffer, communicate with the X server in order to scroll the window and so on”… but I don’t fully get it. What can be taking so long?

I’m guessing there’s no way out (except from a quicker tty implementation?). However, I figured I’d inquire anyhow.

UPDATE: after reading some comments, I was curious as to how much my screen size affects the print time, and it turns out that it does. With my Gnome terminal blown out to 1920×1200, the statistics above are extremely slow. I get… if I shrink it to a very little size.

timing summary (100k lines each)
print                         : 2.920 s
write to file (+fsync)        : 0.121 s
print with stdout = /dev/null : 0.048 s

That’s a lot better (4x), but it still doesn’t answer my question. It merely adds to my puzzle, because I’m not sure why the terminal screen displaying should slow down an application outputting to stdout. Why is it necessary for my program to wait for screen rendering to finish?

Aren’t all terminal/tty programs the same? I haven’t tried anything yet. A terminal should, in my opinion, be able to buffer all incoming data, parse/render it invisibly, and only render the most recent piece visible in the current screen setup at a reasonable frame rate. So, if I can write+fsync to disk in 0.1 seconds, a terminal should be able to do the same in around the same time (with maybe a few screen updates while it did it).

I’m still expecting that there’s a tty setting that can be modified from the application side to improve programmer behavior. If this is simply a terminal application issue, is it possible that this shouldn’t even be on StackOverflow?

What am I overlooking?

The python program that was used to generate the time is as follows:

import time, sys, tty
import os

lineCount = 100000
line = "this is a test"
summary = ""

cmd = "print"
startTime_s = time.time()
for x in range(lineCount):
    print line
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

#Add a newline to match line outputs above...
line += "\n"

cmd = "write to file (+fsync)"
fp = file("out.txt", "w")
startTime_s = time.time()
for x in range(lineCount):
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

cmd = "print with stdout = /dev/null"
sys.stdout = file(os.devnull, "w")
startTime_s = time.time()
for x in range(lineCount):
t = time.time() - startTime_s
summary += "%-30s:%6.3f s\n" % (cmd, t)

print >> sys.stderr, "-----"
print >> sys.stderr, "timing summary (100k lines each)"
print >> sys.stderr, "-----"
print >> sys.stderr, summary

Asked by Russ

Solution #1

Congratulations, you’ve now realized how critical I/O buffering is. 🙂

Because the drive is substantially buffered, it looks to be faster: all Python write() methods return before anything is really written to the disc. (This is done later by the operating system, which combines thousands of individual writes into large, efficient chunks.)

The terminal, on the other hand, offers little or no buffering: each print / write(line) must wait for the entire write (i.e. display to output device) to complete before proceeding.

You may make the file test utilize the same output buffering as the terminal to make the comparison fair, which you can do by changing your example to:

fp = file("out.txt", "w", 1)   # line-buffered, like stdout
for x in range(lineCount):
    os.fsync(fp.fileno())      # wait for the write to actually complete

I ran your file writing test on my system, and it took 0.05s for 100,000 lines with buffering.

However, with the above modifications to write unbuffered, writing 1,000 lines to disk takes 40 seconds. I gave up waiting for 100,000 lines to write, but based on recent experience, it would take well over an hour.

Doesn’t that put the 11 seconds spent at the terminal into perspective?

To address your initial question, writing to a terminal is actually quite quick, all things considered, and there isn’t much potential for improvement (though particular terminals vary in how much work they do; see Russ’s comment to this response).

(You could increase write buffering, like with disk I/O, but you wouldn’t be able to see what was written to your terminal until the buffer was flushed.) Interactivity versus bulk efficiency is a trade-off.)

Answered by Pi Delport

Solution #2

Thank you for all of the feedback! With your assistance, I was able to answer it myself. However, it feels unclean to answer your own question.

Question 1: Why is stdout printing so slow?

Printing to stdout is not always sluggish. It’s the terminal you’re using that’s sluggish. And it has almost nothing to do with application-side I/O buffering (eg: python file buffering). See the list below.

Question 2: Is it possible to speed things up?

Yes, it appears that it cannot be done from the program side (the side that is ‘writing’ to stdout). Use a faster terminal emulator to speed things up.


I tried wterm, a self-described “lightweight” terminal application, and got much better results. On the same system where the simple print option took 12s using gnome-terminal, this is the output of my test script (at the bottom of the question) when ran in wterm at 1920×1200:

timing summary (100k lines each)
print                         : 0.261 s
write to file (+fsync)        : 0.110 s
print with stdout = /dev/null : 0.050 s

0.26s is a huge improvement over 12s! I’m not sure if wterm is smarter than gnome-terminal when it comes to rendering to screen, like I suggested (draw the ‘visible’ tail at a decent frame rate), or if it simply “does less.” But, for the purposes of my question, I have the answer. Gnome-terminal is a sluggish program.

So, if you have a long-running script that you think is slow and spews a lot of text to stdout, try a different terminal to see if it helps!

Note that I grabbed wterm from the Ubuntu/Debian repository at random. I’m not sure if this is the same terminal as the last one. Other terminal emulators were not tested.

Update: To satisfy my need, I tried a slew of additional terminal emulators with the same script and full screen mode (1920×1200). Here are the stats I manually gathered:

wterm           0.3s
aterm           0.3s
rxvt            0.3s
mrxvt           0.4s
konsole         0.6s
yakuake         0.7s
lxterminal        7s
xterm             9s
gnome-terminal   12s
xfce4-terminal   12s
vala-terminal    18s
xvt              48s

The times were manually recorded, although they were quite constant. I made a note of the best(ish) value. Obviously, your mileage may vary.

It was also an intriguing tour of some of the many terminal emulators available! I’m surprised that my first ‘alternative’ test was the best of the bunch.

Answered by Russ

Solution #3

Your redirection probably does nothing as programs can determine whether their output FD points to a tty.

When pointing to a terminal, stdout is likely to be line buffered (as is the case with C’s stdout stream).

As an amusing experiment, try piping the output to cat.

I conducted my own hilarious experiment, and the findings are as follows.

$ python 2>foo
$ cat foo
timing summary (100k lines each)
print                         : 6.040 s
write to file                 : 0.122 s
print with stdout = /dev/null : 0.121 s

$ python 2>foo |cat
$ cat foo
timing summary (100k lines each)
print                         : 1.024 s
write to file                 : 0.131 s
print with stdout = /dev/null : 0.122 s

Answered by Hasturkun

Solution #4

I can’t comment on the technical details because I’m not familiar with them, but this doesn’t surprise me: the terminal was not meant for printing large amounts of data. Indeed, you include a link to a slew of graphical interface tasks that it must perform every time you wish to print something! It’s worth noting that if you use pythonw instead of python, the script does not take 15 seconds; this is totally a GUI issue. To circumvent this, redirect stdout to a file:

import contextlib, io
def redirect_stdout(stream):
    import sys
    sys.stdout = stream
    sys.stdout = sys.__stdout__

output = io.StringIO
with redirect_stdout(output):

Answered by Katriel

Solution #5

Printing to the terminal is going to be slow. Unfortunately short of writing a new terminal implementation I can’t really see how you’d speed this up significantly.

Answered by shuttle87

Post is based on