I’m developing a high-volume web application, where part of it is a MySQL database of discussion posts that will need to grow to 20M+ rows, smoothly.
I had intended to use MyISAM for the tables (because of the built-in fulltext search capabilities), but the prospect of an entire table being locked by a single write operation makes me nervous. Row-level locks make a lot more sense (not to mention InnoDB’s other speed advantages when dealing with large tables). As a result, I’m fairly set on using InnoDB.
The issue is that InnoDB lacks built-in fulltext search capabilities.
Is it better to use a third-party search engine, like Sphinx or Lucene (C++)? Do any of you database ninjas have any advice or suggestions? At the moment, LinkedIn’s zoie (based on Lucene) appears to be the best alternative, having been built with realtime capabilities in mind (which is pretty critical for my application). I’m hesitant to commit until I get more information…
(FYI: the frontend will be served by PHP on EC2 with high-memory rigs.)
Asked by brianreavis
InnoDB full-text search (FTS) is available as of MySQL 5.6.4, arriving alongside the general phasing out of MyISAM.
https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html has a lot of tasty details.
While other engines offer a variety of features, this one is built into InnoDB, so it’s native (which means there’s an upgrade path), and that makes it a worthwhile option.
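As a rough illustration of what native InnoDB full-text search looks like from application code, here is a minimal Python sketch that only builds the SQL strings. The table `posts`, the column `body`, and the index name `ft_posts_body` are hypothetical. The statements use MySQL's `CREATE FULLTEXT INDEX` and `MATCH ... AGAINST` syntax, and the `%s` placeholders assume a DB-API driver that uses the `format` parameter style.

```python
def fulltext_index_ddl(table, column, index_name):
    """Return DDL that adds a FULLTEXT index (InnoDB needs MySQL 5.6.4+)."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    return f"CREATE FULLTEXT INDEX {index_name} ON {table} ({column})"


def fulltext_query(table, column, limit=20):
    """Return a parameterized natural-language search query.

    The search terms are bound separately through the two %s placeholders
    and are never interpolated into the SQL string.
    """
    return (
        f"SELECT id, MATCH({column}) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
        f"FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        f"ORDER BY score DESC LIMIT {int(limit)}"
    )


# Hypothetical names, for illustration only.
ddl = fulltext_index_ddl("posts", "body", "ft_posts_body")
sql = fulltext_query("posts", "body", limit=10)
# With a DB-API cursor this would run as: cursor.execute(sql, (terms, terms))
```

The same search terms are bound twice, once to compute the relevance score and once to filter the rows.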
Answered by Jeremy Smyth
I can attest to the fact that MyISAM fulltext is a horrible solution; aside from the other issues with MyISAM tables in general, I’ve seen the fulltext stuff run off the tracks and start corrupting itself and crashing MySQL on a regular basis.
The most flexible approach here is a dedicated search engine: store the post data in MySQL/InnoDB and then export the text to your search engine. You can easily set up a periodic full index build/publish, and you can add real-time index updates if you feel the need and have the time.
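One way to picture the export step is a watermark-based sync: remember the newest modification time already exported, and push only rows changed since then. The sketch below is a minimal in-memory Python version under stated assumptions: the `Post` fields, the `updated_at` column, and the `index_document` callable are all hypothetical stand-ins, not the API of any particular search engine.

```python
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    body: str
    updated_at: int  # assumed modification time, e.g. a Unix timestamp


def export_changed(posts, watermark, index_document):
    """Push posts modified after `watermark` to the search engine.

    `index_document` is a hypothetical callable standing in for whatever
    client your search engine provides. Returns the new watermark.
    """
    newest = watermark
    for post in posts:
        if post.updated_at > watermark:
            index_document(post.id, post.body)
            newest = max(newest, post.updated_at)
    return newest


# In-memory stand-ins for the MySQL table and the search index.
table = [Post(1, "hello world", 100), Post(2, "innodb rocks", 205)]
indexed = {}
watermark = export_changed(table, 0, indexed.__setitem__)  # full build
table.append(Post(3, "fresh post", 300))
watermark = export_changed(table, watermark, indexed.__setitem__)  # incremental
```

Against a real database the filter would more likely live in the query itself (something like `WHERE updated_at > %s`) rather than in a Python loop, and rows sharing the watermark timestamp would need extra care.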
Lucene and Sphinx are good choices, as is Xapian, which is elegant and lightweight. If you go the Lucene route, don’t assume CLucene will be better, even if you’d rather not deal with Java, although I’m not really qualified to debate the pros and cons of either.
Answered by Ian Wilkes
You should set aside an hour to go through installing and test-driving Sphinx and Lucene. See whether either meets your data-update requirements.
One of the things that has frustrated me about Sphinx is how poorly it handles incremental insertion. That is, reindexing after an insert is quite expensive, so their suggested solution is to split your data between older, non-volatile rows and newer, volatile ones. As a result, every search your app performs will require two searches: one on the bigger index for old rows and another on the smaller index for recent rows. If that doesn’t fit your usage patterns, Sphinx isn’t a good option (at least not in its current implementation).
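To make the two-search pattern concrete, here is a small Python sketch of querying a large "main" index and a small "delta" index and merging the hits. It is not Sphinx's own API: `search_main` and `search_delta` are hypothetical callables, and letting the delta hit win on duplicates is an assumption about how you would want updated rows handled.

```python
def search_main_plus_delta(query, search_main, search_delta, limit=10):
    """Query both indexes and merge the hits by score.

    `search_main` and `search_delta` are hypothetical callables returning
    a list of (doc_id, score) pairs. If a document appears in both, the
    delta hit wins because it reflects the more recent version.
    """
    merged = dict(search_main(query))
    merged.update(dict(search_delta(query)))  # delta overrides main
    ranked = sorted(merged.items(), key=lambda hit: hit[1], reverse=True)
    return ranked[:limit]


# Toy stand-ins for the two indexes.
def main_index(query):
    return [(1, 0.9), (2, 0.4)]


def delta_index(query):
    return [(2, 0.7), (3, 0.5)]


results = search_main_plus_delta("innodb", main_index, delta_index)
```

Note that scores from two separately built indexes are not necessarily comparable, so a real merge step may need a different ranking rule, such as sorting by post date.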
I’d like to suggest Google Custom Search as another option for you to consider. If you can apply some SEO to your web application, outsource the indexing and search function to Google and embed a Google search field in your site. It may be the most cost-effective and scalable method of making your website searchable.
Answered by Bill Karwin
Perhaps you shouldn’t dismiss MySQL’s FT so quickly. Craigslist used to use it.
Craigslist appears to have moved to Sphinx sometime in early 2009, as mentioned below.
Answered by bobobobo
As you point out, Sphinx is a great tool for this. All of the work is in the configuration file. Make sure that whatever table you’re using to store the strings has a unique integer id key, and you’ll be fine.
Answered by Gregg Lind
Post is based on https://stackoverflow.com/questions/1381186/fulltext-search-with-innodb