Independent Digital

Creating and managing dynamic websites using Php, MySql, HTML and Wordpress with a bit of PEAR on the side

Entries Comments



Email This Post    Print This Post   

MySQL: query optimization

1 December, 2006 (10:43) | MySQL | By: clive

When you query any database, you expect the result to be displayed immediately. Any delay is frustrating and in some cases can cause you to toss your laptop out of the window. You?d probably follow, either voluntarily or assisted, if the information delivered by the query was erroneous. It is therefore safe to say that accuracy and speed are high priority items when it comes to using databases.

The following serves to briefly highlight some of the options available when trying to squeeze that extra bit of speed out of your queries. It is not exhaustive and I recommend that you consult the manual for more detailed usage of the options mentioned here.

MySQL achieves speed and accuracy firstly, and most importantly, through indexing. (Have a look at the article MySQL: Indexes). Further nanoseconds can be sliced off through correct formulation of the query. Somewhere deep inside the MySQL black box lurks the query optimizer, which tirelessly analyses each and every query to determine the fastest route to the result that you are seeking. The optimizer examines the query then looks for an index. There are a number of things that you can do to speed things up:

? Try to compare columns of the same data type. For example, CHAR (10) is the same as CHAR (10) but not the same as CHAR (32).
? Compare columns that don?t have any calculations that first have to be resolved before the comparison can be made. For example:
?WHERE column_name < 12 as opposed to
?WHERE column_name * 22 < 12.

The optimizer can make use of the index in the first option but not in the second. Here each row of the column has to first be multiplied by 22 before the comparison can be made.

? Don?t use wildcards at the beginning of a string comparison. The optimizer uses the beginning of the string to search the index.
? Help the optimizer choose the order in which it scans the tables. (It decides on the order but like any machine, it can be wrong). You can also help it choose a particular index (see below).
? Test alternative forms of the query. Try running these a few times and see which ones are faster. (Remember that data is cached so the second time a query is run, it may read directly from the cache. To remedy this, run the alternatives a few times).
? Do not abuse the automatic type conversion ability of MySQL. If possible, compare numbers to numbers and strings to strings.
? Use EXPLAIN to see the effects of writing the query differently.

Overriding optimization

You can override the optimizer?s table-join order and force it to use a particular order by using STRAIGHT_JOIN. Similarly you can force the optimizer to use a particular index by using FORCE INDEX and USE INDEX.

When choosing data types for your columns:

? Try and use the shortest possible data types (have a look at the articles: MySQL: choosing data types and MySQL data types: numeric) for the columns. This will result in smaller tables and shorter indexes that will process faster. Also try and choose an optimal row storage format for your storage engine. The fixed-length format is preferred for the MyISAM (have a look at this article MySQL storage engines: MyISAM) engine as it is faster. The downfall is that fixed-length values take up more storage space. Generally speaking, the variable formats such as VARCHAR, require periodic de-fragmentation of the files because of the manner in which the data blocks are stored (non-contiguously), especially the more updates and deletes are run.
? Sometimes you may not have a choice and will be forced to use variable length data types. For example, any strings larger than 255 characters will have to be stored as a variable length data type. The storage engine also determines which data type to use. The MEMORY storage engine only stores tables as fixed data types.

Try testing the performance of queries using either of the formats to see which is faster.

? Define columns to be NOT NULL as this uses less space and is faster.
? Consider converting string columns with low cardinality to ENUM columns, which are processed faster.
? Use PROCEDURE ANALYSE ( ) to display the characteristics of the data in the selection clause. It may reveal a more efficient data type for this column.
? Use OPTIMIZE TABLE to improve the performance of tables subject to fragmentation. Fragmentation slows down performance and is particularly common in columns containing variable length data types such as BLOB and TEXT. OPTIMIZE TABLE only works for MyISAM tables. With other engines you can dump the table, drop it then recreate it using the dumped file.
? Use client applications to pack data values not easily represented in a standard table structure or data that change over time, into a BLOB or TEXT data type. The benefit of this is that most of the processing is handled by the client application and MySQL only has to deal with a single retrieval or insert.
? Use synthetic indexes for exact matches. You can create your own or use MD( ), SHA( ) or CRC32( ). This is particularly useful for BLOB and TEXT types where it is much faster to search the index of these data types than to search the columns themselves.
? Put large TEXT and BLOB columns into a separate table. This allows the other tables to contain fixed format columns, which reduces fragmentation and increases performance. You are then able to do SELECT * queries without having to search the large BLOB and TEXT tables. It also reduces network traffic (low data throughput).

Loading data efficiently

Try the following to speed up inserting and updating records:

By using bulk or batch loading, you reduce the number of times the key cache is flushed (once per batch as opposed to once per row). The result is faster data loading.
? Use LOAD DATA rather than INSERT as it loads the data in bulk, resulting in the server only having to parse and interpret one statement. Also the key cache is only flushed once. Using LOAD DATA without LOCAL is also faster as the server reads the file directly from the server disk rather than with LOCAL where the client sends the file across the network which is a lot slower.
? When using INSERT, try and insert multiple rows at once by putting the values in parenthesis, for example:

INSERT?.values (?..), (?..), ?..;

This results in fewer statements and less key cache flushing. For transactional storage engines (InnoDB and BDB) use INSERT within a single transaction:

START TRANSACTION;
INSERT INTO table_name?..;
INSERT INTO table_name?..;
INSERT INTO table_name?..;
COMMIT;

For non-transactional engines (MyISAM and MEMORY) write lock the table, issue your INSERT statements then unlock the tables. Both these options only flush the key cache once.
? To reduce the number of key cache flushes when using MyISAM tables, you can use the DELAYED_KEY_WRITE option where data rows are immediately written to the file but the cache is only flushed occasionally.
? Not having an index also speeds things up as an index has to be modified after each record is loaded.
? In MyISAM tables you can create the tables without indexes, load the data then create the indexes. It is faster to create all the indexes at once. In the same vain, you can drop or deactivate existing indexes, load the data then reactivate or rebuild them.
? Shorter SQL statements improve performance as there is less parsing for the server to do and less data throughput across the network.
? You can also use compressed protocols to reduce data throughput but bear in mind than any compression protocol is processor hungry.

Scheduling and locking issues

You can control the way MySQL handles multiple queries. The default order of priority is:

? Writes have a higher priority than reads.
? Writes to tables occur one at a time in the order they arrive.
? Multiple simultaneous reads from a table are possible.
The MyISAM and MEMORY engines use table locks to control multiple queries. This is done automatically by the server and can be overridden by issuing explicit LOCK TABLES and UNLOCK TABLES statements. The write lock allows exclusive access to one writer. No other reads or writes are permitted. The read lock is not exclusive as it allows other readers but not writers.

The LOW_PRIORITY keyword allows you to change the scheduling for DELETE, INSERT, LOAD DATA, REPLACE and UPDATE statements in MyISAM and MEMORY tables. It lowers the priority of these statements. You can set the default at server start-up for all statements to be LOW_PRIORITY. This can be overridden for individual statements by using the INSERT HIGH_PRORITY option. HIGH_PRIORITY allows you to change the scheduling for SELECT and INSERT statements to allow them to take priority.

The DELAYED keyword can be used with INSERT and REPLACE statements when working with MyISAM and MEMORY tables. It puts all INSERT and REPLACE statements in a queue, which is then executed once the server is free. The only problem is that the queue resides in memory and will be lost in the event of a crash.

Concurrent inserts

The MyISAM and MEMORY engines use table-level-locks, the BDB engine uses page-level-locks and InnoDB uses row-level-locks to manage concurrent inserts. Locking at the row level allows more concurrent users as long as they do not access the same row. MyIsam tables are very fast for retrieval but slow down considerably when there are mixed retrievals and inserts because of the table locking strategy. The whole table is locked for each insert or update effectively denying access to all other clients. The BDB and InnoDB engines allow greater concurrency and speed when handling these mixed statements because of their lower level locking strategy. Only the page or row is locked which still allows other clients access to the rest of the table. The downfall of these two locking strategies is that deadlock (one client has a lock that another needs before they can continue) can occur. If it does, the server has to abort one of the transactions to allow the other to continue. This problem does not happen with the table-lock strategy as it allows only one client at a time.

Hopefully this article has touched on a few of the options available to tweak your MySQL queries so as to get that extra bit of speed out of the beast.

Bookmark and Share

Sorry, there are no related posts but check these out