php-development1

Why not to loop on SQL queries?

Many developers tend to write unstable code. One of the wrong practices in development is to miss-use database queries by looping on them in the server side of client-server applications. In this article we will explain why it is wrong, and how to work around it.

1. The problem
Suppose you need to count, like the following Stackoverflow question, if three database rows have values given in a sequence of values $SearchNames.
So we look at $SearchNames = [“Paul”, “John”, “Mary”,….] and we want to check if a database table User, which has 3 rows for the user name – User_Firstname, User_Givenname, User_Nickname – has any of those names that we search for.

Looping through SQL query would then look like that:

$con=mysql_connect(“localhost”,”my_user”,”my_password”,”my_db”);
// Check connection
if (!$con)
{
echo “Failed to connect to MySQL: ” . mysqli_connect_error();
}
$numOfNames = count($S);
$nameExists=”0″;
for ($i; $i < $numOfNames; $i++) {
$query = “SELECT User_ID FROM User WHERE User_Firstname ='”.$SearchNames[$i].”‘ OR User_Givenname = ‘”.$SearchNames[$i].”‘ OR User_Nickname = ‘”.$SearchNames[$i].”‘”;
mysql_query($con,$query);
while ($row = mysql_fetch_assoc($query_results))
{
$all_rows[$j]=$row;
$j++;
}
$nameExists=$nameExists+count($row);
}

mysql_close();

The above code will create performance issues, and also might not work in specific circumstances. We will explain why:

2. Why looping through SQL queries is wrong

There are several reasons why the above code – which loops through many SQL queries, is wrong:

a. Performance issues
Client-Server applications have a session time-out. It is not a standalone computer program that runs on a machine and then outputs the result, but rather an application that initiated by a browser request, and waiting for a result. While the user is waiting, the browser is on stall, you know that from your own user experience when the browser tells you “waiting for www.something.com to reply”. That time is the time that your application is running, the webserver is calling the PHP program which is calling the SQL database which is then sending the result back to PHP and vice versa. Now imagine that you have a database of 100,000 names… the user may wait minutes, and sometimes may even experience a time out: this is because usually web systems serve more than one client – so now imagine 10,000 users hitting the same website, which has the above code: the site would crash as the load on the server is of 100,000 times 10,000 = 1 billion times more than if it was developed properly!

b. Memory issues
The bigger the database, such a code will consume more and more memory. Reason is, that the code stores all the results in a memory cell ($row) and then counts it. But actually, MySQL can do all those things in an in-built query.

c. Database lock
Some database types (note we haven’t mentioned which type of database we use) may lock when querying it. It means that when one user is using it, others won’t be able to.

d. Invalidated data
When you query through a database, you query through lots of records. Once you loop those queries, in some circumstances data which you looped on in the beginning, may have been changed in the mean time by other users. It does not apply to the above example, however consider a program that loops through 1,000,000 names by one user, but when reaches user #543,889 – the first user has updated his/her name in the database. It means that the result won’t be accurate.

3. Resolution
There are lots of ways to construct a SQL query, so it is very hard to tell exactly how to optimize each system. But as a thumb rule, you should loop in order to create a query string, and then query the database once (ideally), instead of looping through many queries. It should look something like this:

Instead of:

Loop ——————-|
|
|
build query         |
query database |
End of loop ———-|

Do this:

Loop ——————-|
|
|
build query         |
add to query      |
End of loop ———-|
Query database

Our code would then look like that:

$con=mysql_connect(“localhost”,”my_user”,”my_password”,”my_db”);
// Check connection
if (!$con)
{
echo “Failed to connect to MySQL: ” . mysqli_connect_error();
}
$nameExists=”0″;
$query = “SELECT User_ID FROM User WHERE User_Firstname IN (‘”.$SearchNames.”‘) OR User_Givenname IN (‘”.$SearchNames.”‘) OR User_Nickname IN (‘”.$SearchNames.”‘)”;
mysql_query($con,$query);
while ($row = mysql_fetch_assoc($query_results))
{
$all_rows[$j]=$row;
$j++;
}
mysql_close();
$nameExists=count($row);

In this code, you can see, we DON’T loop on many queries, but perform just one query. We then loop on the results, to get them all, and count them.

You could improve the performance even more, but just in the case of counting the results, if you use the MySQL build in COUNT query:

$con=mysql_connect(“localhost”,”my_user”,”my_password”,”my_db”);
// Check connection
if (!$con)
{
echo “Failed to connect to MySQL: ” . mysqli_connect_error();
}
$nameExists=”0″;
$query = “SELECT COUNT(User_ID) FROM User WHERE User_Firstname IN (‘”.$SearchNames.”‘) OR User_Givenname IN (‘”.$SearchNames.”‘) OR User_Nickname IN (‘”.$SearchNames.”‘)”;
mysql_query($con,$query);
$row = mysql_fetch_assoc($query_results);
mysql_close();
$nameExists=$row[‘0’][‘COUNT(User_ID)’];

But in that case you won’t be able to manipulate the results, if required.

Good luck 🙂

Comments

comments

Leave a Reply

Your email address will not be published. Required fields are marked *