How to tune an “order by rand() limit 1” SQL statement?

Discussion in 'MySQL' started by Richard To, Aug 6, 2020.

  1. Richard To

    Richard To Member

    Some business requirements call for retrieving a few rows at random from a table or from a join of several tables. SQL of this kind is normally hard to tune.

    For example, the following SQL retrieves one random row from the join of the Employee and Department tables where the employee’s department code is ‘AAA’. Both the Emp_dept and Dpt_id columns are indexed.

    select *
    from employee, department
    where emp_dept=dpt_id
    and emp_dept = 'AAA'
    order by rand()
    limit 1


    The query plan of this SQL is shown below; the statement takes 3.126 seconds to finish. The plan shows a nested loop from the Department table to the Employee table that extracts all records with the ‘AAA’ department code, followed by a sort operation on the join result.
    [Image: query plan of the original SQL]
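
    To look at the plan yourself, one option is to prefix the statement with EXPLAIN. This is only a sketch, assuming MySQL 8.0.16 or later, where the tree output format is available; the plan and cost you see will depend on your own data and indexes.

    ```sql
    -- Show the optimizer's plan for the original statement as a tree.
    -- Assumes MySQL 8.0.16+ (FORMAT=TREE); table and column names are
    -- the ones used in the post.
    explain format=tree
    select *
    from employee, department
    where emp_dept = dpt_id
      and emp_dept = 'AAA'
    order by rand()
    limit 1;
    ```
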
    The most expensive step is extracting all the Employee data for department code ‘AAA’. If there is a unique key, such as the employee ID (EMP_ID), that uniquely identifies a row of the query result, you can use a common table expression (the WITH clause, available in MySQL 8) to randomly select from the key column only, under the same conditions as the original query. In the rewrite below, the WITH clause randomly selects one EMP_ID from the join. This significantly reduces the amount of data retrieved from the Employee table and also shrinks the input to the Order By operation. The main query then uses the selected EMP_ID to extract the specific row from the original query, so the whole statement runs faster.

    with DT1
    as (select EMP_ID
        from employee,
             department
        where emp_dept = dpt_id
          and emp_dept = 'AAA'
        order by rand() limit 1
       )
    select *
    from employee,
         department
    where emp_dept = dpt_id
      and emp_dept = 'AAA'
      and EMP_ID in (select EMP_ID
                     from DT1)
    order by rand() limit 1


    /* Remark: the outer “order by rand() limit 1” makes sure that only one row is returned even if EMP_ID does not uniquely identify a row of the join result. */
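
    Whether EMP_ID is unique within the join result can be checked with a query along these lines. It is a sketch that reuses the table and column names above; the alias cnt is only illustrative. If it returns no rows, each EMP_ID appears once in the join result and the outer “order by rand() limit 1” acts purely as a safeguard.

    ```sql
    -- List any EMP_ID that appears more than once in the join result
    -- for department 'AAA'. An empty result means EMP_ID identifies
    -- exactly one row of the query.
    select EMP_ID, count(*) as cnt
    from employee, department
    where emp_dept = dpt_id
      and emp_dept = 'AAA'
    group by EMP_ID
    having count(*) > 1;
    ```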

    Here is the query plan of the rewritten SQL, which has a lower cost and runs much faster.
    [Image: query plan of the rewritten SQL]
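
    To compare measured run times rather than estimated cost, EXPLAIN ANALYZE executes the statement and reports the actual time spent in each plan step. The following is a sketch for the rewritten SQL, assuming MySQL 8.0.18 or later; the same prefix can be put in front of the original statement for a side-by-side comparison. Because of rand(), the selected row differs from run to run.

    ```sql
    -- Execute the rewritten statement and report actual per-step timings.
    -- Assumes MySQL 8.0.18+ (EXPLAIN ANALYZE).
    explain analyze
    with DT1
    as (select EMP_ID
        from employee, department
        where emp_dept = dpt_id
          and emp_dept = 'AAA'
        order by rand() limit 1
       )
    select *
    from employee, department
    where emp_dept = dpt_id
      and emp_dept = 'AAA'
      and EMP_ID in (select EMP_ID from DT1)
    order by rand() limit 1;
    ```
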
    Tosska SQL Tuning Expert for MySQL can produce this kind of rewrite automatically; it shows that the rewrite is almost 100 times faster than the original SQL.

    https://tosska.com/tosska-sql-tuning-expert-tse-for-mysql-2/
