Database Performance Tuning MUST Read

Database Performance Tuning

Query Run Time

A database is also a software and is subject to the same limitations as all software, it can only process as much information as its hardware is capable of handling. The way to make a query run faster is to reduce the number of calculations that the software (and therefore hardware) must perform.  To do this, you will need some understanding of how SQL actually makes calculations.

    Table size: If your query hits one or more tables with millions of rows or more, it could affect performance.
    Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow.
    Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
    Other users running queries: This is something you probably can’t control. The more queries running concurrently on a database, the more the database must process at a given time and the slower everything will run.
    Database software and optimization: This is something you probably can’t control too, but if you know the system you’re using, you can work within its bounds to make your queries more efficient.

For now, let’s ignore the things you can’t control and work on the things you can.


Only Retrieve the Data You Really Need

A common way of retrieving the desired columns is to use the * symbol :

SELECT * FROM users WHERE age > 20;

However, specifying the columns might save a lot of computation time:

SELECT id, last_name, sex, age FROM users WHERE age > 20;

Similarly, if you only need a limited number of rows you should use the LIMIT clause (or TOP):

SELECT name, price FROM products LIMIT 10;

You should always think whether you need all the rows returned by an SQL statement. If you don’t, there is always some room for improvement.
Use EXISTS() instead of COUNT()

If your only purpose is to know if a record exists or not and you don’t really care about how many matched records are there, the better way to do it is using EXISTS() function.

SELECT COUNT(uid) FROM users WHERE ulname like ‘wills%’;

Instead, will use EXISTS() function like below:

SELECT EXISTS (SELECT uid FROM users WHERE ulname like ‘wills%’ LIMIT 1);


Avoid Functions on the Left Hand-Side of the Operator

SELECT encounterId FROM enc LIMIT 10 WHERE DATEDIFF(MONTH, date, ‘2017-04-15’) < 0;

In this case, even if there is an index on the date column in the table users, the query will still need to perform a full table scan. This is because we use the DATEDIFF function on the column date. The output of the function is evaluated at run time, so the server has to visit all the rows in the table to retrieve the necessary data.

To enhance performance, the following change can be made:

SELECT encounterId FROM enc LIMIT 10 WHERE date > ‘2017-04-15’;

This time, we aren’t using any functions in the WHERE clause, so the system can utilize an index to seek the data more efficiently.


Avoid Wildcard Characters at the Beginning of a LIKE Pattern

Common way of using LIKE keyword:

SELECT value FROM itemkeys WHERE name LIKE ‘%inventory%‘;  #Full wildcard

The use of the % wildcard at the beginning of the LIKE pattern will prevent the database from using a suitable index if such exists. Since the system doesn’t know what the beginning of the name column is, it will have to perform a full table scan anyway. In many cases, this may slow the query execution.

So you can rewrite this in following way:

SELECT value FROM itemkeys WHERE name LIKE ‘inventory%‘; #Postfix wildcard

Please avoid this too:

SELECT value FROM itemkeys WHERE name LIKE ‘%inventory’;  #Prefix wildcard

You should always consider whether a wildcard character at the beginning is really essential. Doing a full wildcard in a few million records table is equivalence to killing the database.


Symbol Operator

Symbol operator such as >,<,=,!=, etc. are very helpful in our query. We can optimize some of our query with symbol operator provided the column is indexed.

  SELECT uid, ulname FROM users WHERE uid < 122

Now, the above query is not optimized due to the fact that the DBMS will have to look for the value 122 THEN scan forward to value 122 and below. On the other hand, a optimized value will be

  SELECT uid, ulname FROM users WHERE uid <= 121

This way the DBMS might jump straight away to value 121 instead. It’s pretty much the same way how we find a value 121 (we scan through and target ONLY 121) compare to a value smaller than 122 (we have to determine whether the value is smaller than 122; additional operation).


“NOT” Operator

Try to avoid NOT operator in SQL. It is much faster to search for an exact match (positive operator) such as using the

  LIKE, IN, EXIST or = symbol operator

instead of a negative operator such as

  NOT LIKE, NOT IN, NOT EXIST or != symbol.

Using a negative operator will cause the search to find every single row to identify that they are ALL not belong or exist within the table.

On the other hand, using a positive operator just stop immediately once the result has been found.


Avoid queries in loops

It can be tempting to use SQL queries inside a loop. But doing so is like hammering your database with queries.

for (int I = 0; i < 10; ++i) {

    sql = “UPDATE categories SET display_order = ? WHERE id = ?”;

     – – – – – –  – –

     – – – – – – –  – –

}

Here is what you should do instead:

UPDATE categories

   SET display_order = CASE id

       WHEN 1 THEN 3

       WHEN 2 THEN 4

       WHEN 3 THEN 5

    END

WHERE id IN (1, 2, 3)


Use join instead of subqueries

As a programmer, subqueries are something that you can be tempted to use. Subqueries, as show below, can be very useful:

SELECT a.id,

    (SELECT MAX(created) FROM posts WHERE author_id = a.id) AS latest_post

FROM authors a

Although subqueries are useful, they often can be replaced by a join, which is definitely faster to execute.

SELECT a.id, MAX(p.created) AS latest_post

FROM authors a INNER JOIN posts p ON (a.id = p.author_id)

GROUP BY a.id


Use UNION instead of “OR”

The following example use the OR statement to get the result:

  SELECT * FROM a, b WHERE a.p = b.q OR a.x = b.y;

The UNION statement allows you to combine the result sets of 2 or more select queries. The following example will return the same result that the above query gets, but it will be faster:

  SELECT * FROM a, b WHERE a.p = b.q

  UNION

  SELECT * FROM a, b WHERE a.x = b.y


INNER join

Use inner join, instead of outer join if possible. The outer join should only be used if it is necessary.

    The inner join processes exact matches, and results in a smaller result set.
    However, in outer join, a row will result from every non-matching row from the inner table, and this generally implies more I/O.


DISTINCT and UNION

DISTINCT and UNION should be used only if it is necessary. DISTINCT and UNION operators cause sorting, which slows down the SQL execution.

Use UNION ALL instead of UNION, if possible, as it is much more efficient.


ORDER BY clause

The ORDER BY clause is mandatory in SQL if the sorted result set is expected. But be aware of the performance impact of adding the ORDER BY clause, as the database needs to sort the result set, resulting in one of the most expensive operations in SQL execution.


Avoid foreign key constraints

Foreign keys constraints ensure data integrity at the cost of performance. Therefore, if performance is your primary goal you can push the data integrity rules to your application layer.

A good example of a database design that avoids foreign key constraints is the System tables. Although there are relationships among these tables, there is no foreign key relationship.


Indexing

The bottleneck in most database systems (this may change with large memory and SSDs) is disk I/O. This usually means decreasing disk I/O is necessary to make queries perform better.

Ex-1:

       SELECT first_name, last_name FROM contacts WHERE city = “Los Angeles”;

       ALTER TABLE contacts ADD KEY (city);

Ex-2:

  SELECT first_name, last_name FROM contacts WHERE status = “active” AND delivery_method = “mail” AND city = “Los Angeles”;

  ALTER TABLE contacts ADD KEY (status, delivery_method, city);

Note:- Sequence of keys are matter always.



MySQL & SQL Server offer to help with indexing:

MySQL comes with a few commands that help when analyzing potential indexing problems. One of the most important is “EXPLAIN”.

MySQL:

MSSQL:

Some guidelines for index:

    Create Clustered Index for primary keys (most of the RDBMS do this when table is created).
    Index ALL foreign keys columns.
    Create more indexes ONLY if:
    Queries are slow.
    You know the data volume are going to increase significantly.
    Run statistics when populating a lot of data on tables.

If a query is slow, look for the execution plan and:

    If the query for a table only uses few columns put all that columns into an index, then you can help the RDBMS to use only the index.
    Don’t waste resources indexing tiny tables (hundreds of records).
    Index multiple columns in order from high cardinality to less. It means, first the columns with more distinct values followed by columns with fewer distinct values.
    If a query needs to access more than 10% of the data, normally a full scan is better than an index.

Note: More Indexes on table, cause performance issue on DML operations.


Slowness with “IN” query clauseSlowness with “IN” query clause






Some Other Tips

    Use well defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID …).
    Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names.
    Don’t use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable etc.).
    Keep passwords as encrypted for security. Decrypt them in application when required.
    Use integer id fields for all tables. If id is not required for the time being, it may be required in the future (for association tables, indexing …).
    Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems.
    Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also start those column names with “Is”.
    Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables and their pointer can be used in queried tables.


Useful Links

–https://dev.mysql.com/doc/refman/5.7/en/optimization.html

–https://www.mssqltips.com

–https://kesavan.info/files/MySQL-performance-tuning-step-by-step.pdf

–http://www.monitis.com/blog/101-tips-to-mysql-tuning-and-optimization/

Comments

Popular posts from this blog

How to resolve session timeout issue in JAVA

Blog gives you enthusiasm,zeal

How to encode HTML in JAVA (Remove HTML from your text)