Coder Perfect

What does it mean to open a file?


Before you can read or write to a file in any programming language (at least in the ones I use), you must first open it.

But what exactly does this open procedure accomplish?

The manual pages for common functions don’t say much more than that it ‘opens a file for reading/writing’.

Obviously, through usage of the function you can tell it involves creation of some kind of object which facilitates accessing a file.

Put another way: if I were to implement an open function on Linux, what would it have to do?

Asked by jramm

Solution #1

In practically every high-level language, the function that opens a file is a wrapper around the corresponding kernel system call. It may do other things as well, but on today’s operating systems, opening a file always has to go through the kernel.

This is why the fopen library function, or Python’s open, has arguments that are quite similar to the open(2) system call.
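As a rough illustration of that similarity, the sketch below maps fopen-style mode strings to the open(2) flags that the fopen(3) manual page lists for them. The helper name fopen_mode_to_flags is hypothetical, and a real C library does more than this (it allocates a buffer and accepts extra mode characters), so treat it as an approximation rather than how any particular libc is written.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>

/* Hypothetical helper: translate an fopen-style mode string into open(2)
 * flags, following the table in the fopen(3) manual page.
 * Returns -1 for a mode string it does not recognise. */
int fopen_mode_to_flags(const char *mode)
{
    assert(mode != NULL);

    /* A '+' anywhere in the mode means "open for reading and writing". */
    int plus = strchr(mode, '+') != NULL;
    int access = plus ? O_RDWR : (mode[0] == 'r' ? O_RDONLY : O_WRONLY);

    switch (mode[0]) {
    case 'r': return access;                       /* file must already exist */
    case 'w': return access | O_CREAT | O_TRUNC;   /* create or empty it      */
    case 'a': return access | O_CREAT | O_APPEND;  /* create or append to it  */
    default:  return -1;                           /* unknown mode            */
    }
}
```

For example, fopen_mode_to_flags("a+") yields O_RDWR | O_CREAT | O_APPEND, which is what you would pass to open(2) yourself to get the same behaviour.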

In addition to opening the file, these functions normally set up a buffer that is used by subsequent read/write operations. The purpose of this buffer is to ensure that whenever you ask for N bytes, the library function returns N bytes, even if the underlying system calls return fewer.
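To make the "system calls may return fewer bytes" point concrete, here is a minimal sketch of the kind of loop a library has to run on top of read(2). The name read_full is hypothetical, and real libraries combine this with an internal buffer; the sketch only shows the retry logic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read(2) until n bytes have arrived,
 * end-of-file is reached, or a real error occurs.
 * Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    assert(buf != NULL);

    size_t done = 0;
    while (done < n) {
        ssize_t got = read(fd, (char *)buf + done, n - done);
        if (got == 0)
            break;              /* end of file: return what we have so far */
        if (got == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;          /* genuine error, errno says which one */
        }
        done += (size_t)got;    /* short read: loop again for the rest */
    }
    return (ssize_t)done;
}
```

A caller that needs exactly 10 bytes can then write read_full(fd, buf, 10) and only has to handle a short result at end-of-file.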

A successful call to open on a Unix-like operating system returns a “file descriptor,” which is just an integer in the context of the user process. This descriptor is then passed to every call that interacts with the opened file, and it becomes invalid once close has been called on it.
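The descriptor's lifetime can be observed directly. The sketch below (errno_after_close is a hypothetical name) opens a file, closes it, and then tries to read through the stale descriptor; on a POSIX system the read is expected to fail with EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path, close it again, then try to read through
 * the stale descriptor. Returns the errno reported by that read (EBADF is
 * expected), or 0 if any step behaved differently. */
int errno_after_close(const char *path)
{
    assert(path != NULL);

    char byte;
    int fd = open(path, O_RDONLY);  /* on success: a small non-negative int */
    if (fd == -1)
        return 0;
    if (close(fd) == -1)
        return 0;

    errno = 0;
    if (read(fd, &byte, 1) == -1)   /* fd no longer refers to an open file */
        return errno;
    return 0;
}
```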

It’s important to note that the call to open is also the point at which various checks are made. If any of them fails, the call returns -1 instead of a descriptor, and the kind of error is indicated in errno. Typical checks are whether the file exists and whether the calling process has permission to open it in the requested mode.

There must be some form of mapping in the kernel between the process’s file descriptors and the files that are actually open. The internal data structure that a descriptor maps to may contain yet another buffer for block-based devices, or an internal pointer to the current read/write position.
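One way to see that the read/write position lives in a kernel structure behind the descriptor, rather than in the integer itself, is to compare a duplicated descriptor with a second, independent open of the same file. This is only a sketch: compare_offsets is a hypothetical name, and the scratch file path is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: read one byte through descriptor a, then report the
 * current offset seen through a dup() of a and through a separate open()
 * of the same path. Returns 0 on success, -1 on failure. */
int compare_offsets(const char *path, off_t *via_dup, off_t *via_reopen)
{
    assert(path != NULL && via_dup != NULL && via_reopen != NULL);

    char byte;
    int result = -1;
    int a = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (a == -1)
        return -1;
    int b = dup(a);                 /* new number, same kernel-side entry */
    int c = open(path, O_RDONLY);   /* separate entry with its own offset */

    if (b != -1 && c != -1 &&
        write(a, "abcd", 4) == 4 &&
        lseek(a, 0, SEEK_SET) == 0 &&
        read(a, &byte, 1) == 1) {
        *via_dup = lseek(b, 0, SEEK_CUR);     /* moved along with a */
        *via_reopen = lseek(c, 0, SEEK_CUR);  /* unaffected by a    */
        result = 0;
    }

    if (b != -1) close(b);
    if (c != -1) close(c);
    close(a);
    unlink(path);                   /* remove the scratch file */
    return result;
}
```

After the one-byte read, the duplicated descriptor reports offset 1 while the independently opened one still reports 0.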

Answered by Blagovest Buyukliev

Solution #2

I recommend that you take a look at this guide, which walks you through a simplified version of the open() system call. It uses the code snippet below, which is representative of what happens behind the scenes when you open a file.

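int sys_open(const char *filename, int flags, int mode) {
    /* Copy the file name from user space into kernel memory. */
    char *tmp = getname(filename);
    /* Reserve a descriptor number that is not currently in use. */
    int fd = get_unused_fd();
    /* Look up and open the file, producing the kernel's struct file. */
    struct file *f = filp_open(tmp, flags, mode);
    /* Associate the reserved descriptor with that struct file. */
    fd_install(fd, f);
    /* Release the kernel copy of the file name. */
    putname(tmp);
    /* Hand the descriptor back to the calling process. */
    return fd;
}

Briefly, here’s what that code does, line by line: it copies the file name from user space into kernel memory, reserves an unused file descriptor, opens the file to obtain a struct file, installs that structure in the process’s descriptor table under the reserved descriptor, frees the kernel copy of the name, and returns the descriptor.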

If you’re feeling ambitious, you can compare this simplified example to the implementation of the open() system call in the Linux kernel, a function called do_sys_open(). You should have no trouble spotting the parallels.

Of course, this is merely the “top layer” of what happens when you call open(), or, to put it another way, it’s the highest-level kernel code that’s called during the process of opening a file. A high-level programming language may add more layers on top of that, and there’s a lot going on at the lower levels. (Thank you for explaining, Ruslan and pjc50.) Roughly, from top to bottom, the call passes through the file system code, then the device driver, and finally reaches the hardware itself.

This may also be somewhat inaccurate because of caching. 😛 But seriously, I’ve left out so many details that someone (not me) could write several books about how this entire process works. Still, this should give you a general idea.
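The bookkeeping in the simplified sys_open above can be mimicked in user space to get a feel for it. Everything in the sketch below is hypothetical (the toy_ names, the fixed table size, the struct layout); it is a toy model of "reserve a slot, create a file object, install it in the table", not the kernel's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FDS 16              /* hypothetical per-process limit */

struct toy_file {                   /* stand-in for the kernel's struct file */
    char *name;
    long pos;                       /* current read/write position */
};

static struct toy_file *toy_fd_table[TOY_MAX_FDS];

/* Like get_unused_fd(): find the lowest free slot, or -1 if none is left. */
static int toy_get_unused_fd(void)
{
    for (int fd = 0; fd < TOY_MAX_FDS; fd++)
        if (toy_fd_table[fd] == NULL)
            return fd;
    return -1;
}

/* Toy open: reserve a slot, build the file object, install it in the table. */
int toy_open(const char *filename)
{
    assert(filename != NULL);

    int fd = toy_get_unused_fd();
    if (fd == -1)
        return -1;

    struct toy_file *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->name = malloc(strlen(filename) + 1);   /* private copy of the name */
    if (f->name == NULL) {
        free(f);
        return -1;
    }
    strcpy(f->name, filename);
    f->pos = 0;

    toy_fd_table[fd] = f;           /* the fd_install() step */
    return fd;
}

/* Toy close: free the file object and make the slot reusable. */
int toy_close(int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || toy_fd_table[fd] == NULL)
        return -1;
    free(toy_fd_table[fd]->name);
    free(toy_fd_table[fd]);
    toy_fd_table[fd] = NULL;
    return 0;
}
```

Opening two files in a fresh table yields descriptors 0 and 1; closing the first and opening a third file hands out 0 again, which matches the "lowest unused number" behaviour of real descriptors.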

Answered by David Z

Solution #3

I’m open to discussing any file system or operating system you desire. Nice!

Issuing a LOAD command on a ZX Spectrum puts the system into a tight loop that reads the Audio In line.

A steady tone signals the start of data, followed by a series of long/short pulses, with a short pulse indicating a binary 0 and a longer pulse indicating a binary 1 (see ZX Spectrum software). The tight load loop collects bits until a byte (8 bits) is filled, then saves it in memory, increments the memory address, and loops back to look for more bits.
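The bit-collecting loop can be sketched in C, with several loud simplifications: the pulse lengths arrive as an array instead of being timed from Audio In, each bit is represented by a single length, the threshold between "short" and "long" is a caller-supplied hypothetical value, and bits are assumed to arrive most significant first. The name decode_pulses is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder: turn measured pulse lengths into bytes.
 * A pulse longer than threshold counts as 1, anything else as 0.
 * Bits are assumed to arrive most significant bit first.
 * Returns the number of complete bytes written to out. */
size_t decode_pulses(const unsigned *pulse_len, size_t n_pulses,
                     unsigned threshold,
                     unsigned char *out, size_t out_cap)
{
    assert(pulse_len != NULL && out != NULL);

    size_t n_bytes = 0;
    unsigned char current = 0;      /* byte being assembled */
    int bits = 0;                   /* bits collected so far */

    for (size_t i = 0; i < n_pulses && n_bytes < out_cap; i++) {
        int bit = pulse_len[i] > threshold;
        current = (unsigned char)((current << 1) | bit);
        if (++bits == 8) {          /* byte full: store it, start the next */
            out[n_bytes++] = current;
            current = 0;
            bits = 0;
        }
    }
    return n_bytes;                 /* leftover bits are discarded */
}
```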

A short, fixed-format header, indicating at least the number of bytes to expect and optionally extra information such as file name, file type, and loading address, is usually the first thing a loader reads. After reading this short header, the program can decide whether to continue loading the main bulk of the data, or to exit the loading routine and display an appropriate message to the user.

An end-of-file condition can be recognized by having received as many bytes as expected (either a fixed number of bytes, hardwired in the software, or a variable number such as indicated in a header). If the loading loop does not receive a pulse in the expected frequency range for a certain amount of time, an error is raised.

A little backstory on this response

The procedure described above loads data from a regular audio tape, hence the need to scan Audio In (it connected to tape recorders with a standard plug). A LOAD command is technically the same as opening a file, but it’s physically tied to actually loading the file. This is because the tape recorder is not controlled by the computer, so you cannot (successfully) open a file without also loading it.

The “tight loop” is mentioned because (1) the CPU, a Z80-A (if memory serves), was really slow at 3.5 MHz, and (2) the Spectrum had no internal clock! That meant you had to keep an accurate count of the T-states (instruction timings) for every single instruction inside that loop, just to keep the beep timing precise. Fortunately, the slow CPU speed had the advantage that you could work out the number of cycles on a sheet of paper, and from that the real-world time they would take.
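The paper calculation is simple arithmetic: at the 3.5 MHz quoted above, one T-state lasts 1/3,500,000 of a second, which is 2000/7 nanoseconds. A minimal sketch (the function name is hypothetical, and the result is rounded down to whole nanoseconds):

```c
#include <assert.h>

/* Hypothetical helper: duration of a number of T-states on a 3.5 MHz CPU,
 * in nanoseconds, rounded down. One T-state lasts 2000/7 ns. */
long long t_states_to_ns(long long t_states)
{
    assert(t_states >= 0);
    return t_states * 2000 / 7;
}
```

For instance, a loop body that costs 35 T-states takes 10,000 ns, that is, 10 microseconds per iteration.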

Answered by Jongware

Solution #4

What occurs when you open a file depends on your operating system. I’ve described what happens under Linux below to give you an idea of what happens when you open a file, and you can look at the source code for more information if you’re interested. Permissions are not covered because it would make this response too long.

Every file in Linux is identified by a structure called an inode. Each inode has a unique number, and each file has exactly one inode number. The inode stores the file’s metadata, such as file size, file permissions, time stamps, and pointers to disk blocks, but not the file name itself. Each file (and directory) has a directory entry that holds its name and the inode number used to look it up. When you open a file, assuming you have the appropriate permissions, a file descriptor is created using the unique inode number associated with the file name. Because several names can refer to the same file, the inode contains a link count that tracks the total number of links to it; a file that is present in a directory has a link count of at least one. The file’s data is released only when the link count has dropped to zero and no process still has the file open.
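The inode number and link count are visible from user space through stat(2) and fstat(2). The sketch below is an illustration under assumptions: inspect_inode and struct inode_facts are hypothetical names, both paths are scratch names supplied by the caller on a file system that supports hard links, and the zero link count after removing both names is what Linux file systems typically report.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

struct inode_facts {
    int same_inode;           /* 1 if both names resolve to the same inode */
    long links_with_alias;    /* link count while both names exist */
    long links_after_unlink;  /* link count seen via the open descriptor
                                 once both names have been removed */
};

/* Hypothetical demo: create path, hard-link it as alias, compare the two
 * names, remove both, and look at the still-open file through fstat().
 * Returns 0 on success, -1 on failure. */
int inspect_inode(const char *path, const char *alias,
                  struct inode_facts *out)
{
    assert(path != NULL && alias != NULL && out != NULL);

    struct stat a, b, c;
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    unlink(alias);                        /* clear any leftover name */
    if (link(path, alias) == 0) {         /* second name, same inode */
        if (stat(path, &a) == 0 && stat(alias, &b) == 0) {
            out->same_inode = a.st_ino == b.st_ino && a.st_dev == b.st_dev;
            out->links_with_alias = (long)a.st_nlink;
            result = 0;
        }
        unlink(alias);
    }
    unlink(path);                         /* no names left from here on */

    if (result == 0 && fstat(fd, &c) == 0)
        out->links_after_unlink = (long)c.st_nlink;
    else
        result = -1;

    close(fd);                            /* now the data can be released */
    return result;
}
```

The last step shows that an open descriptor keeps the inode alive even when no directory entry refers to it any more.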

Answered by Alex

Solution #5

Mostly bookkeeping. This includes checks such as “Does the file exist?” and “Do I have the permissions to write to this file?”

But that’s all kernel stuff; unless you’re creating your own toy OS (in which case, have fun – it’s a terrific learning experience), there’s not much else to learn. Of course, you should learn all of the possible error codes that you can encounter when opening a file so that you can deal with them appropriately – but they are typically pleasant little abstractions.
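Handling those error codes usually comes down to inspecting errno after a failed open. A minimal sketch, in which the enum and the function name classify_open are hypothetical and only two of the many possible codes are singled out:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum open_result { OPEN_OK, OPEN_MISSING, OPEN_DENIED, OPEN_OTHER };

/* Hypothetical helper: try to open path read-only and classify the outcome
 * by the errno value that open(2) leaves behind on failure. */
enum open_result classify_open(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd != -1) {
        close(fd);                  /* we only wanted to know if it works */
        return OPEN_OK;
    }

    switch (errno) {
    case ENOENT: return OPEN_MISSING;   /* "Does the file exist?"        */
    case EACCES: return OPEN_DENIED;    /* "Do I have the permissions?"  */
    default:     return OPEN_OTHER;     /* ENOTDIR, EMFILE, and so on    */
    }
}
```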

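On a code level, the most important part is that it gives you a handle to the open file, which you use for all other file operations. Couldn’t you use the file name instead of this arbitrary handle? Sure, but a handle has several advantages: the system doesn’t have to look up the file and re-check permissions on every operation, and the handle gives the kernel a place to keep per-open state such as the current position.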

There are other tricks as well, such as opening a channel that isn’t backed by a physical file (on Unix systems, files are also used for devices and several other virtual channels, so this isn’t strictly necessary), but they aren’t really related to the open operation itself, so I’m not going to get into them.

Answered by Luaan
