Coder Perfect

Is there anything wrong with using varchar(255) for all text-based fields?


I have a contacts table with fields like postcode, first name, last name, town, country, phone number, and so on, all of which are VARCHAR(255), despite the fact that none of these values would ever have 255 characters. (If you’re wondering why, it’s because Ruby on Rails migrations default to mapping String fields to VARCHAR(255), and I never bothered to change it.)

Is there any clear advantage (performance or otherwise) to utilizing VARCHAR(16) over VARCHAR(255) because VARCHAR just stores the number of actual characters in the field (together with the field length)?

In addition, the majority of these fields have indexes. Does a bigger VARCHAR size on the field have any impact on the index’s size or performance?

FYI MySQL 5 is what I’m using.

Asked by Olly

Solution #1

Unlike CHAR(255), which always stores 255 characters, VARCHAR(255) is smart enough to save only the length you need on a particular row.

However, because you asked about MySQL, I’ll give a MySQL-specific tip: when rows are copied from the storage engine layer to the SQL layer, VARCHAR columns are changed to CHAR to enjoy the benefit of working with fixed-width rows. As a result, the strings in memory are padded to the maximum length of the VARCHAR column you declared.

When your query generates a temporary table by default, like as when sorting or using GROUP BY, it can consume a lot of RAM. When you use a lot of VARCHAR(255) fields for data that doesn’t need to be that long, the temporary table can get rather enormous.

You might also be interested to know that a string defined with the utf8 character set pads out to three bytes per character, even if the string contains single-byte information (e.g. ascii or latin1 characters). Similarly, the utf8mb4 character set causes the string in memory to be pad out to four bytes each character.

So, in utf8, a VARCHAR(255) holding a short string like “No opinion” consumes 11 bytes on disk (ten lower-charset characters + one byte for length), but 765 bytes in memory, and hence in temp tables or sorted results.

I’ve assisted MySQL users who had unintentionally built 1.5GB transient tables and were running out of disk space. They had a lot of VARCHAR(255) columns, which were used to store very small strings in practice.

It’s ideal to name the column after the type of data you’ll be storing. As others have stated, it offers advantages in enforcing application-related limitations. However, it offers the physical advantages of avoiding the memory waste I mentioned earlier.

Of course, determining the longest postal address is difficult, which is why many people choose for a long VARCHAR that is at least as long as any address. And 255 is the most common value because it is the maximum length of a VARCHAR that can be encoded in one byte. In MySQL versions prior to 5.0, it was also the maximum VARCHAR length.

Answered by Bill Karwin

Solution #2

The disadvantage of using varchar(255) “just because” is poorer data integrity. This is in addition to the size and performance considerations of setting the size of a varchar (and arguably more relevant, as storage and processing get cheaper by the second).

Setting maximum string lengths prevents larger texts from entering the RDBMS, creating buffer overruns or exceptions/errors later when fetching and parsing values from the database that are longer (more bytes) than expected.

If you have a field that accepts two-character strings for nation abbreviations, for example, there’s no need to expect your users (in this case, programmers) to type in complete country names. You don’t allow them to input “Antigua and Barbuda” (AG) or “Heard Island and McDonald Islands” (HM) at the database layer because you don’t want them to. Also, it’s possible that some programmers haven’t yet RTFMed the design documentation (which must exist) to know that this isn’t a good idea.

Set the field to accept two characters and leave it up to the RDBMS (either gracefully by truncating or ungracefully by rejecting their SQL with an error).

Real-life examples of data that does not need to be longer than a specific length:

And so on…

Take some time to consider your data and its limitations. After all, if you’re an architect, developer, or programmer, it’s your job.

You avoid the problem of people (end-users, programmers, and other programs) entering unexpectedly large data that will come back to bother your code later by utilizing a varchar(n) instead of a varchar(255).

And I didn’t say you couldn’t apply this constraint in your application’s business logic code.

Answered by shufler

Solution #3

I agree with you. Obsessive attention to detail is inconvenient and ineffective.

We used to sweat bullets to optimize disk since it was such a valuable commodity. Because the cost of storage has dropped by a factor of 1,000, the time spent squeezing every bit has become less useful.

You can get fixed-length rows if you exclusively use CHAR fields. If you choose accurate field sizes, this can save some storage space. You might receive more densely packed data (fewer I/Os for table scans) and faster updates if you use this method (easier to locate open spaces in a block for updates and inserts.)

However, if your data sizes are changeable or you overestimate your sizes, you’ll end up wasting space with CHAR fields. Data will be less densely packed, requiring greater I/O for large retrievals.

Attempting to set a size on changeable fields generally yields minimal performance gains. You may easily benchmark by comparing VARCHAR(255) to CHAR(x) to check if the difference can be measured.

However, there are situations when I need to give a “little,” “medium,” or “big” suggestion. As a result, I use sizes of 16, 64, and 255.

Answered by S.Lott

Solution #4

Nowadays, i can’t imagine it really matters any more.

Using variable length fields has a computational expense, but with today’s overabundance of CPUs, it’s not even worth considering. Because the I/O system is so slow, any computational costs associated with handling varchars are virtually non-existent. In reality, the computational cost of a varchar is likely to outweigh the amount of diskspace saved by employing variable length fields instead of fixed length fields. You almost certainly have a higher row density.

The problem with varchar fields is that you can’t simply find a record by its record number. It’s simple to compute the disk block that a row id points to when you have a fixed length row size (with fixed length fields). That kind of goes out the window with a configurable length rowsize.

So, either create a robust row identification that encodes details (such as the block, etc.) in to the identifier, or keep some kind of record number index, just like any other primary key. However, if the row is ever relocated to persistent storage, the id will have to be recalculated. It’s not a big deal; all you have to do now is redo all of the index entries and make sure you either a) never reveal the number to the consumer or b) never state that the number is reliable.

However, since we now have varchar fields, the only advantage of varchar(16) over varchar(255) is that the database will enforce the 16-character restriction (16). Field lengths can be useful if the DB model is designed to be a true representation of the physical data model. If it’s simply “storage” rather than “model AND storage,” though, there’s no requirement.

Then you simply need to distinguish between an indexable text field (such as varchar) and one that isn’t (like a text or CLOB field). To make indexing easier, indexable fields usually have a size limit, whereas CLOB fields don’t (within reason).

Answered by Will Hartung

Solution #5

If you allow a datatype of 255 characters, I’ve found that some inexperienced user (or experienced tester) will actually fill it completely.

Then you’re faced with a slew of issues, including how much room you give those fields in reports and on-screen displays in your app. Not to mention the possibility of your database’s data surpassing the per-row limit (if you had more than a few of these 255 character fields).

It’s far easier to start with a sensible restriction and then enforce it through the application and database.

Answered by BradC

Post is based on