MySQL:Indexes
Queries can take a while before the result is displayed. Indexing helps to speed up the query process although there are other ways of helping to improve the speed of querying the database, such as:
- Care in writing the query so as to best use the indexes
- Proper configuration of the server software so as to efficiently handle queries from multiple clients
- Proper configuration of the servers hardware to improve performance
You will go a long way to improve the query speed if you first consider indexing your tables before trying any other means of increasing speed. Bear in mind that indexes can also hinder the performance of your database.
Advantages of using indexes
Consider a table consisting of 3 columns:
- ID number
- First name
- Last name
If we were to search for a particular person, we would have to scan the whole table until we find a row where all three columns match our required name. This would be slow and inefficient. Suppose we indexed the first name column, this column would contain all the first names (some could be duplicates), sorted alphabetically. These would link to the corresponding record in the table (ID number, first name & last name). Now if we were to search for a particular first name, the scan would immediately jump to the first occurrence of that name (in the index), display the record (the complete record in the table) linked to that occurrence then move on repeatedly, each time displaying the relevant record, until there is no longer a match in the index (remember that the index is sorted alphabetically). This process is much faster as there is no need to scan through the whole table. The scan immediately jumps to the first occurrence of the match and then leaves once it reaches the last occurrence (due to the sorted index).
This index would only work if we were searching the first name column. What if we were searching the ID or last name columns? No problem, it is possible to have many indexes in one table so we could also have an ID index as well as a last name index. Consider the speed differences in searching a database containing a million records where the id numbers are stored randomly as opposed to searching the index of this database where the indexed ID values are stored in numerical order. The indexed value would be found almost instantaneously.
Indexes are also valuable when queries involve table joins. Single table queries are faster as the maximum number of values to be scanned per column is equal to the number of rows in the table. In a 3 column table containing 10 records, this would be a maximum of 30 records. Compare this to a query involving 3 tables, each with 3 columns and each having 10 records. With no index, we would have to scan all possible combinations: 10 x 10 x 10 or 1000 possible combinations! Naturally the larger the tables and the more records they contain, the more possible combinations and the longer the query will take. An index drastically reduces the number of combinations and consequently the processing time.
MySQL uses indexes as follows:
- speeds up searches for rows matching the WHERE clause or matching rows in a join
- queries of the index using MIN( ) and MAX( ) functions to find the largest and smallest values are solved without examining each row
- uses index to sort or group quickly using ORDER BY and GROUP BY clauses
- can read the queried value directly from the index without having to scan the columns
Disadvantages of using indexes
The advantages far outweigh the disadvantages but bear the following in mind when creating indexes:
- Indexes speed up retrieval but slow down inserts, deletes and updates of values in indexed columns. These operations all involve writing, not only the column values but also in the index. Consequently the more indexes there are, the more changes need to be made and naturally more time is needed for these changes.
- Indexes take up disk space. This may be a problem in certain of the MySQL storage engines where table size is an issue (MyISAM and BDB) while the InnoDB storage engine can be configured to use multiple files and so is not limited by the operating systems file-size limit.
Choosing indexes
- Index columns used for searching, sorting and grouping. Don’t index columns used for display. The best columns are those appearing in the WHERE, ORDER BY or GROUP BY clauses as well as columns named in join clauses. So in the following:
SELECT column1 from table1 LEFT JOIN table2 ON table1.column2 = table2.column3 WHERE column4 = searchTerm;
column2, column3 and column4 are candidates for indexing.
- Column cardinality. This is the number of unique values contained in a column. Indexes work best when there is a high cardinality. Put another way, the more unique values there are (fewer duplicates) the better that column will be for indexing. Consider the ID number column of the previous example. Here there are no duplicates, only unique values. This column will be ideal for indexing. On the other end of the scale may be the first names column. Here there will probably be a number of duplicate names (fewer unique values) and a lower cardinality compared to the ID column.
- Index short values. Use smaller data types as these improve indexing:
- shorter values are easier to compare
- results in smaller indexes
- enables more keys to be stored in memory and therefore fewer disk reads
For clustered indexes (data rows are stored with the primary keys) such as is used by the InnoDB and BDB storage engines, it is best to keep the primary key short. In other indexes (secondary indexes), the primary key is stored in each (secondary) index with the result that longer keys take up more space.
- Index prefixes of strings. If possible indicate a prefix length when indexing strings. For example if a column has a data type of CHAR(200) and most of the unique values are within the first 10 characters, then index only these first 10 characters and not all 200. It will save space and improve performance. Data types such as CHAR, VARCHAR, BINARY, VARBINARY, BLOB and TEXT all lend themselves to indexing the prefixes.
- Leftmost prefixes. A 3 column composite index is actually 3 indexes. The leftmost columns in this index are the leftmost prefix. The index can only be used if the leftmost prefix is involved. So in a 3-columned composite index, ID, first name and last name the leftmost prefix would be:
- ID
- ID, first_name
- ID, first_name, last_name
- you cannot use first_name or last_name as an index as they do not contain the leftmost prefix
- Too many indexes. Remember that each additional index uses space and affects performance. Also, MySQL may not choose the correct index when there are too many.
- Match index types to types of comparisons performed.
- Hash indexes are good for exact-match comparisons (= or <=>) and poor for range values (BETWEEN 100 AND 150). Hash index is the default for MEMORY tables.
- B-tree indexes are good for exact or range-based comparisons and LIKE pattern matches where the pattern begins with a literal string and not a wildcard.
- The slow query log identifies badly performing queries. The slow query log can identify queries that may benefit from indexing. View the log either in text form or by using mysqldumpslow. Duplicate queries here will indicate a need for a query rewrite. The only problem with the slow query log is that a busy server causes more queries to end up here than a not so busy server.
Remember that indexing is just one way of speeding up the query process. Others involve the choice of storage engine, data types, etc.
See also the article on configuring the MyISAM storage engine as well as the article on the MyISAM storage engine.
Sorry, there are no related posts but check these out
Comments
Pingback from Independent Digital » MySQL: query optimization
Time: June 5, 2007, 3:20 pm
[...] speed and accuracy firstly, and most importantly, through indexing. (Have a look at the article MySQL: Indexes). Further nanoseconds can be sliced off through correct formulation of the query. Somewhere deep [...]
Comment from Mattias Skovh?j
Time: March 18, 2007, 7:23 am
Uh, nice design on this site
(check my own blog, I just translated it into danish…..)
I am very happy for reading this article, because I have some websites, that is too huge.. One of them have 6000 users, and they are in 1 table. In that table I have one index – in the id column. But I always do queries to the username-column! I see the problem by this, and maybe I should think about only using the ID, which I do on all of my new websites.. The problem is, that this website has more than 250 PHP-files containing these SQL-queries (and a lot of them), so I started out by indexing the username-column..
The website is on its own on a server with following specs:
P4 2.8 GHz
2 GB ram
4 x 36 gb SCASI hd 15000 rpm
20/20 mbit internet connection.
And with 80-120 online at the same time, there usually is a CPU load of 100%, caused by the mysql-nt.exe! Hope this changes
Thanks in advance..