Independent Digital

Creating and managing dynamic websites using Php, MySql, HTML and Wordpress with a bit of PEAR on the side

Entries Comments



Email This Post    Print This Post   

Category: MySQL

MySQL data types: numeric

7 November, 2006 (11:01) | MySQL | By: clive

MySQL supports 2 types of numbers:

  1. Integers with no fractional parts
  2. Floating-point or fixed-point numbers which may have fractional parts

Integers can be specified in either hexadecimal or decimal format. Hexadecimal values are treated as strings by default and can’t be used in scientific notation. Scientific notation is indicated by an integer or floating-point number followed by an ‘e’ or ‘E’ and then either an ‘-’ or a ‘+’, for example:

1.34E+12 or 43.27e-1

Any number can be expressed as a positive or negative value by having a ‘-’ or ‘+’ sign in front of it. Floating-point numbers can be put into integer columns but will be rounded to the nearest integer. Similarly, integers can be put into in floating or fixed-point columns and will be given a fractional part of zero.

Choosing numeric data types

Decide on the range of values needed to represent your data then select the smallest type that will do the job. This will save storage space and improve processing efficiency.

If strict mode is enabled, an error will occur when entering values out of the range. Out of range values will be truncated if strict mode is not enabled. Truncation is determined by the range of the data type and not its display width. Floating-point and fixed-point columns are rounded to the number of decimals specified.

Numeric data types

Type name

Meaning

TINYINT

Very small integer

SMALLINT

Small integer

MEDIUMINT

Medium-sized integer

INT

Standard integer

BIGINT

Large integer

FLOAT

Single-precision floating-point number

DOUBLE

Double-precision floating-point number

DECIMAL

Fixed-point number

BIT

Bit field

Integers

Include:

  • TINYINT
  • SMALLINT
  • MEDIUMINT
  • INT
  • BIGINT

These vary in the values they represent as well as the storage space needed. Integers can be UNSIGNED (only positive) which shifts the range upwards and begins at 0. When defining the integer column you can optionally include a display size (from 1 to 255). This is the number of characters needed to display the integer and does not affect the storage in any way. If no size is given, a default (size of longest integer for that type) size is assigned.

Floating-point and fixed-point

Floating-point types include:

  • Float
  • Double – also know as DOUBLE PRECISION

Fixed-point includes:

  • Decimal – also known as NUMERIC and FIXED

There is a positive maximum and minimum range as well as a negative maximum and minimum range for floating-point types. Both floating-point and fixed-point types can be defined as UNSIGNED (only positive). The range is not shifted upward, only positive values are now allowed. They both also allow you to indicate the number of significant digits (M – a value from 1 to 255) and decimal places (D - a value from 0 to 30) to display. These are optional:

  • FLOAT and DOUBLE – Your hardware then determines the precision of the display
  • DECIMAL – If D omitted it defaults to 0, if M omitted it defaults to 10

The DECIMAL values are stored as strings and have a fixed number of decimals which means that they are not rounded off (good for currency values). This applies only to storage and retrieval as calculations can use floating-point operations.

BIT data type

Bit data types hold bit-field values. You can include an optional column width when defining a BIT column.

Numeric data type attributes

  • UNSIGNED – Disallows negative values. Can be used with all types except BIT. In INTEGERS it shifts the range up, starting at zero (does not increase the range). This effectively doubles your range. With FLOATING-POINT and FIXED-POINT the range is not shifted up. The upper limit remains the same but the negative range is eliminated.
  • SIGNED – Can apply to all numeric types but has no effect because it is the default anyway.
  • ZEROFILL – For all numeric types except BIT. It pads the displayed value with zeros to the make up the display width. Specifying ZEROFILL automatically makes the column UNSIGNED.
  • AUTO_INCREMENT – Only for integers and only one per column per table. The column must be defined as NOT NULL and must be indexed and also be UNSIGNED. Used to generate unique identifiers beginning at 1 (unless you insert NULL in the column) and increases by 1 for each new row. The storage engine determines whether the deleted rows AUTO_INCREMENTED values will be used or not.
  • NULL/NOT NULL – The default is NULL.

Useful information

Type specification

Range

TINYINT[(M)]

Signed:-128 to 127; Unsigned: 0 – 255

SMALLINT[(M)]

Signed: -32768 to 32767; Unsigned: 0 to 65535

MEDIUMINT[(M)]

Signed: -8388608 to 8388607; Unsigned: 0 to 16777215

INT[(M)]

Signed: -2147683648 to 2147483647: Unsigned: 0 to 4294967295

BIGINT[(M)]

Signed: -9223372036854775808 to 9223372036854775807; Unsigned 0 to 18446744073709551615

FLOAT[(M,D)]

Min non-zero: ? 1.175494351E-38; Max non-zero: ? 3.402823466E+38

DOUBLE[(M,D)]

Min non-zero: ? 2.2250738585072014E-308; Max non-zero: ? 1.7976931348623157E+308

DECIMAL([M[,D]])

Varies -depends on values of M and D

BIT[(M)]

Type specification

Storage required

TINYINT[(M)]

1 Byte

SMALLINT[(M)]

2 Bytes

MEDIUMINT[(M)]

3 Bytes

INT[(M)]

4 Bytes

BIGINT[(M)]

8 Bytes

FLOAT[(M,D)]

4 Bytes

DOUBLE[(M,D)]

8 Bytes

DECIMAL([M[,D]])

M + 2 Bytes

BIT[(M)]

Varies – depending on M

Other data types

  • String data types
  • Date and time data types
  • Spatial data types

These will be covered in later articles.

MySQL storage engines: MyISAM

2 November, 2006 (14:49) | MySQL | By: clive

MyISAM storage engine

MyISAM is the default storage engine since MySQL version 3.23.0. It allows you to store your data in a machine-independent format, enabling you to use the same files on any machine, be it a mac, linux, windows, unix, etc. The table contents are stored in 3 separate files:

  1. table_name.frm which contains the table definition
  2. table_name.MYD which contains the data
  3. table_name.MYI which contains the index

The storage engine can be set on table creation (usually unnecessary as it is the default unless this has been changed on server startup) as follows:

CREATE TABLE table_name (I INT) ENGINE=MYISAM;

(Use TYPE instead of ENGINE for versions prior to 5)

Characteristics of the MyISAM storage engine

  • Maximum indexes per table is 64 *
  • Maximum columns per index is 16
  • Maximum key length is 1000 bytes *
  • Can index BLOB and TEXT columns
  • NULL values are allowed in indexed columns
  • One auto-increment column per table $
  • Dynamic sized rows are less fragmented ?
  • Allows concurrent inserts
  • Can increase speed by placing index and data files in different directories
  • Character columns (CHAR, VARCHAR, TEXT) can have different character sets
  • UNIQUE can be on for any number of columns
  • VARCHAR tables can have fixed or dynamic length records
  • VARCHAR and CHAR columns can be up to 64KB in length

* Can change by recompiling

? As it uses adjacent deleted blocks of storage space

$ The values are not reused unless they are in the last column of a multi-column index. Values can be reset with ALTER TABLE or myisamchk

Table storage formats

There are 3 storage formats for the MyISAM storage engine:

  1. Fixed length (static)
  2. Dynamic length
  3. Compressed

Both fixed and dynamic storage is automatically chosen, depending on the contents of the columns. If the column contains variable length data then the dynamic storage option is automatically selected.

Static (fixed-length) table characteristics

This is the default and is used if none of the columns are of variable length (VARCHAR, CHAR, BLOB). The rows are of fixed length. It is the simplest and most secure format (least prone to corruption and generally quite accurate in rebuilding a crashed table with minimum data loss) and also the fastest option. Static tables are:

  • Fast
  • Easy to cache
  • Easy to recover from a crash
  • Space hungry – they use more disk space than the dynamic format

Dynamic table characteristics

This format is automatically chosen if the columns contain variable length data (VARCHAR, CHAR, BLOB) or if the table is created with the:

ROW_FORMAT=DYNAMIC

option. One can also use OPTIMIZE TABLE or myisamchk to defragment the table. Dynamic storage:

  • Allows variable length columns
  • Uses less disk space
  • Is difficult to rebuild after a crash

Compressed format characteristics

It is a read-only format generated by the myisampack tool and can be unpacked with myisamchk. The compressed format:

  • Uses little disk space
  • Compresses records separately (low access overhead)
  • Can handle fixed and dynamic length records

Problems with MyISAM tables

Although the MyISAM storage engine is reliable, corruption can occur for the following reasons:

  • The Mysqld process was killed during a write
  • The computer was turned off
  • Hardware failure
  • Using an external program while the server was accessing the table
  • Software bugs

These are some of the symptoms of a corrupted table:

  • An error message is displayed during a query ? ?Incorrect key file for table: ??
  • Can?t find rows in table
  • Returns incomplete data

You can check the health of a table with CHECK TABLE and repair it with REPAIR TABLE. If mysqld is not running, you can do this with the utility program, myisamchk.

If you get this warning when running either of the above:

Clients are using or haven?t closed the table properly

it may not mean that the table is corrupted but it does warrant that you examine the table. The table index (table_name.MYI) has a counter flag which indicates whether the table was closed properly. This flag could go out of sync for many reasons, some of which are:

  • The tables were copied without first issuing FLUSH TABLE or LOCK TABLE
  • MySQL crashed between an update and close
  • The table was modified while being used by mysqld
  • Multiple mysqld servers were using the table and one issued a REPAIR TABLE or CHECK TABLE

Other storage engines

There are a number of other storage engines. Some are listed below and will be covered in future articles:

  • MERGE
  • MEMORY
  • BDB
  • InnoDB
  • EXAMPLE
  • ARCHIVE
  • CSV
  • FEDERATED
  • ISAM

If you are hosting your site on someone elses server then it would be a good idea to check with them first to see which storage engine they support before you create your tables.

MySQL character sets and collation

31 October, 2006 (14:23) | MySQL | By: clive





The following document serves to introduce character sets and collation which play an inportant part in MySQL databases. It would be to your advantage to know about these before creating your database so as to ensure that the data stored and retrieved is according to your expectations.

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. For example, each letter in the alphabet is a symbol which is then encoded. The letter A could be represented by the number 0. A is the symbol and 0 is its encoding. Together the symbols and their encodings are known as the character set.

We can compare the values of symbols in a string by collating their encodings. This is known as collation which is a set of rules. The simplest of all collations is the binary collation. In the above example where one encoding is collated (compared) to another using one collation rule namely, comparing the values of the symbols. This, the simplest of all possible collations is a binary collation. A case-insensitive collation is a little more complex in that it uses two collation rules, by first treating the upper and lower case symbols as equal and then by comparing the values of the symbols.

In reality most character sets have many characters, special symbols and punctuation marks, requiring the collations to have many rules. These include case insensitivity and accent insensitivity (an ?accent? is a mark attached to a character as in the German ?) and multiple-character mappings (such as the rule that ? = OE in one of the two German collations)

Western European character sets cover most West European languages and have latin1 as the default character set with latin1_swedish_ci as the default collation. This is the most common setting for MySQL users.

The single character set model (pre MySQL 4.1)

Here character set values are interpreted with respect to the server?s character set. This is the default character set selected when the server was built and is usually latin1. This can be overridden at runtime by using the default-character-set option.

The problem with this is that it limits the database to one character set at a time and can lead to index-related problems. The index values for the columns are stored in the sorted order according to the collating sequence of the character set in force at the time that the indexes were created. If the server?s character set is changed after having created the tables, the indexed-based queries might not work correctly as the character sets have different collating sequences. This is especially apparent with the accents (such as ?) which are ordered differently in the various character sets.

The solution is to rebuild the indexes for each existing table that has a character-based index, using the collating order of the new character set. There are 2 ways that you can do this:

  • First dump the table using mysqldump, then drop the table and finally reload it. The indexes will be rebuilt as the file is loaded.
  • Drop the indexes then add them again. You will need to know the exact index definitions so that you can re-create them.

Although there are methods to reorder the index, it is still a bother to have to do so. Fortunately this is all taken care of as of MySQL 4.1

The multi character-set model (MySQL 4.1 and above)

  • The MySQL server supports the use of multiple character sets simultaneously (more than 70 collations for more than 30 character sets). However, you cannot mix character sets within a string or use different character sets for different rows in a given column.
  • Character sets can have more than one collation.
  • Unicode support is provided giving the multilingual support that you may need.
  • Functions are available for converting and determining the character set.
  • The COLLATE operator alters the collation of a string and the COLLATION ( ) function returns the collation of a string.
  • The SHOW CHARACTER SET and SHOW COLLATION statements list the available character sets and collations.
  • The server automatically reorders indexes when you change the collation of an indexed character column.

 

There are default settings for character sets and collations at four levels:

  • server
  • database
  • table
  • column

1. Server Character Set and Collation

The server character set and collation are determined:

  • According to the option settings in effect when the server starts. You can use –default-character-set for the character set and –default-collation for the collation. If you don?t specify a character set, then the default is set to latin1. Similarly by not specifying a collation, the default is set to latin1_swedish_ci because this is the default collation for latin1.
  • According to the values set at runtime. The current character set and collation variables can be changed at runtime.

You can override the default character set and collation at server start-up by using character-set-server and collation -server options. If you specify only the character set, its default collation becomes the server?s default collation. If you specify a collation, it must be compatible with the character set. (A collation is compatible with a character set if its name begins with the character set name. For example, latin1_swedish_ci is compatible with latin1.

2. Database Character Set and Collation

Each database has a database character set and a database collation. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation. For example:

CREATE DATABASE db_name DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

To change the database?s default character set and collation , use:
ALTER DATABASE db_name CHARACTER SET charset COLLATE collation;

Bear in mind that if both character set and collation were specified then these will be the current settings. If only the character set is defined, then the default collation for that character set will be allocated. If neither is specified then the server character set and collation are chosen.

It is possible to create databases with different character sets and collations on the same MySQL server. The database character set and collation are used as default values for tables if these are not specified when the table is created. The character set and collation for the default database are available as the values of the character_set_database and collation _database system variables. The server sets these variables whenever the default database changes. If there is no default database then the variables have the same value as the corresponding server-level variables, character_set_server and collation _server.

3. Table Character Set and Collation

Every table has a table character set and table collation. The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation. For example:

CREATE TABLE abc ( … ) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Bear in mind that if both character set and collation were specified then these will be the current settings. If only the character set is defined, then the default collation for that character set will be allocated. If neither is specified then the database character set and collation are used.

4. Column Character Set and Collation

Every ?character? column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation. Column definition syntax has optional clauses for specifying the column character set and collation. For example:

CREATE TABLE abc (column1 VARCHAR (5) CHARACTER SET latin1 COLLATE latin1_german1_ci);

Bear in mind that if both character set and collation were specified then these will be the current settings. If only the character set is defined, then the default collation for that character set will be allocated. If neither is specified then the table character set and collation are chosen.

5. Examples of Character Set and Collation Assignment

The following examples show how MySQL determines default character set and collation values.

1. Table and Column Definition

CREATE TABLE abc (column1 CHAR (10) CHARACTER SET latin1 COLLATE latin1_german1_ci) DEFAULT CHARACTER SET latin2 COLLATE latin2_bin;

Here the column has a latin1 character set and a latin1_german1_ci collation while the table has a latin2 character set and a latin2_bin collation.

2. Table and Column Definition

CREATE TABLE abc (column1 CHAR (10) CHARACTER SET latin1) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Here we have a column with a latin1 character set and a default collation for this character set (which is always latin1_swedish_ci - note that it is not taken from the table level - latin1_danish_ci).

3. Table and Column Definition

CREATE TABLE abc (column1 CHAR (10)) DEFAULT CHARACTER SET latin1 COLLATE latin1_danish_ci;

Here we have a column with a default character set and a default collation (taken from the table level). In this case the character set for column abc is latin1 and its collation is latin1_danish_ci.

4. Database, Table, and Column Definition

CREATE DATABASE dbase1 DEFAULT CHARACTER SET latin2 COLLATE latin2_czech_ci;

USE dbase1;
CREATE TABLE abc
(
column1 CHAR (10)
);

Here we create a column (no character set and collation is specified at column or table level). The database’s settings become the table’s settings, and thereafter become the column’s setting. So, the character set for column1 is latin2 and its collation is latin2_czech_ci.

6. Connection Character Sets and Collations

Some character set and collation variables play a role in the connection between a client and the server. With queries and result sets being passed between client and server, it is important to know:

  • That the server takes the character_set_client variable to be the character set in which queries are sent by the client.
  • That the server converts queries sent by the client from character_set_client to character_set_connection, similarly with the collation (where collation _connection is important for the comparisons of literal strings (char, varchar and text)).
  • That the character_set_results variable determines the character set in which the server returns query results to the client

There are two statements that affect the connection character sets:

  1. SET NAMES ‘charset_name‘ ? tells the server that all incoming messages from the client are in ‘charset_name’ character set. This is also the character set that the server needs to return results back to the client in.
  2. SET CHARACTER SET ‘charset_name‘ – sets the connection character set and collation to be those of the default database.

On connection, the client indicates to the server the name of the character set that it wants to use. The server sets the character_set_client, character_set_results, and character_set_connection variables to that character set. Conversion may be lossy if there are characters that are not in both character sets. If you do not want the server to perform any conversion, set character_set_results to NULL:

mysql> SET character_set_results = NULL;

7. Character String Literal Character Set and Collation

Every character string literal (char, varchar and text) has a character set and a collation and may have an optional character set introducer and COLLATE clause:
[_charset_name]‘string’ [COLLATE collation _name]

For example:

SELECT _latin1 ‘string’ COLLATE latin1_danish_ci;

For the simple statement, SELECT ‘string’, the string has the character set and collation defined by the character_set_connection and collation _connection system variables. The _charset_name expression is called an introducer. It notifies the parser that, ?the following string uses character set abc? An introducer is simply an indicator and does not do any conversion.

Bear in mind that if both character set and collation were specified then these will be the current settings. If only the character set is defined, then the default collation for that character set will be allocated. If neither is specified then the character set and collation given by the character_set_connection and collation _connection system variables are used.

8. Using COLLATE in SQL Statements

You can override whatever the default collation is by using the COLLATE clause. For example:

SELECT abc FROM table ORDER BY abc COLLATE latin1_german2_ci;

SELECT abc COLLATE latin1_german2_ci AS abc1 FROM table1 ORDER BY abc1;

SELECT abc FROM table1 GROUP BY abc COLLATE latin1_german2_ci;

SELECT MAX(abc COLLATE latin1_german2_ci) FROM table1;

SELECT DISTINCT abc COLLATE latin1_german2_ci FROM table1;

SELECT * FROM table1 WHERE _latin1 ‘M?ller’ COLLATE latin1_german2_ci = abc;

SELECT * FROM table1 WHERE abc LIKE _latin1 ‘M?ller’ COLLATE latin1_german2_ci;

SELECT abc FROM table1 GROUP BY abc HAVING abc = _latin1 ‘M?ller’ COLLATE latin1_german2_ci;

9. The BINARY Operator

Each character set has a binary collation. The BINARY operator is a shorter form of the COLLATE clause. For example, the binary collation for the latin1 character set is latin1_bin, so if column1 is of character set latin1, the following have the same effect:

SELECT * FROM table1 ORDER BY BINARY column1;
SELECT * FROM table1 ORDER BY column1 COLLATE latin1_bin;

10. Coercibility rules

When collating, it is sometimes difficult to see which collation takes precedence. Hence the following coercibility rules:

An explicit COLLATE clause = coercibility of 0. (Not coercible at all.)
Concatenating two strings with different collations = coercibility of 1.
Column’s collation = coercibility of 2.
A ?system constant? (the string returned by functions) = coercibility of 3.
A literal’s (char, varchar and text) collation = coercibility of 4.
NULL or an expression that is derived from NULL = coercibility of 5.

Use the collation with the lowest coercibility value. If both sides have the same coercibility and the collations are not the same, then there is an error.

An Example of the Effect of Collation

Suppose that column1 in table1 has these latin1 column values:

Muffler
M?ller
MX Systems
MySQL

And suppose that the column values are retrieved using the following statement:

SELECT column1 FROM table1 ORDER BY column1 COLLATE collation _name;

The resulting order of the values for different collations is shown in this table:

latin1_swedish_ci

latin1_german1_ci

latin1_german2_ci

Muffler

Muffler

M?ller

MX Systems

M?ller

Muffler

M?ller

MX Systems

MX Systems

MySQL

MySQL

MySQL

 

Here the U with two dots over it (?) is collated differently depending on the collation used. The first column shows the result of using the Swedish/Finnish collating rule (U-umlaut sorts with Y). The second column shows the result of using the German DIN-1 rule (U-umlaut sorts with U). The third column shows the result of using the German DIN-2 rule (U-umlaut sorts with UE).

Unicode support

There are many different character sets because of the different encodings required by the various languages. Different languages also require different numbers of bytes to represent a character. All characters in the latin1 character set can be represented by a single byte. Other languages may need more than one byte per character.

Unicode provides a single character-encoding system wherein character sets from all languages can be represented in a consistent manner. Within MySQL there are two Unicode sets:

  1. ucs2 Which corresponds to the Unicode UCS-2 encoding. Each character is represented by two bytes.
  2. utf8 Which has a variable length format, representing characters with from 1 to 3 bytes.

 

You can store text in about 650 languages using these character sets.

Please refer to the MySQL documentation for more detailed information.

Have a look at www.idig.za.net for interesting articles and tutorials on PEAR, WordPress, Search Engine Optimization and many more.