Is there any difference between GROUP BY and DISTINCT?

I learned something simple about SQL the other day:

SELECT c FROM myTbl GROUP BY c

Has the same result as:

SELECT DISTINCT c FROM myTbl

What I am curious about: is there any difference in the way an SQL engine processes these two commands, or are they truly the same thing?

I personally prefer the DISTINCT syntax, but I am sure that is more out of habit than anything else.

3 Answers

rahul07
GROUP BY lets you use aggregate functions such as AVG, MAX, MIN, SUM, and COUNT. DISTINCT, on the other hand, just removes duplicates.

For example, if you have a bunch of purchase records, and you want to know how much was spent by each department, you might do something like:

SELECT department, SUM(amount) FROM purchases GROUP BY department

This will give you one row per department, containing the department name and the sum of all of the amount values in all rows for that department.
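
To make the contrast concrete, here is a minimal sketch that runs the query above next to a DISTINCT query. It assumes SQLite through Python's built-in sqlite3 module, and the sample rows are made up for illustration; only the purchases table and its department and amount columns come from the query above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("IT", 100), ("IT", 250), ("HR", 40), ("HR", 60), ("Sales", 75)],
)

# GROUP BY collapses each department to one row and lets SUM run per group.
totals = conn.execute(
    "SELECT department, SUM(amount) FROM purchases "
    "GROUP BY department ORDER BY department"
).fetchall()

# DISTINCT only removes duplicate department values; there are no totals.
departments = conn.execute(
    "SELECT DISTINCT department FROM purchases ORDER BY department"
).fetchall()

print(totals)       # [('HR', 100), ('IT', 350), ('Sales', 75)]
print(departments)  # [('HR',), ('IT',), ('Sales',)]
```

Both queries return one row per department, but only the GROUP BY version can carry the per-department sum.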
sandhya6gczb
Use DISTINCT if you just want to remove duplicates. Use GROUP BY if you want to apply aggregate functions (MAX, SUM, GROUP_CONCAT, ...) or a HAVING clause.
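
As a rough illustration of the HAVING case, the sketch below filters groups by their aggregate value, which DISTINCT has no way to express. It assumes SQLite through Python's sqlite3 module; the purchases table, its rows, and the threshold of 90 are hypothetical and chosen only for the example.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate HAVING.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("IT", 100), ("IT", 250), ("HR", 40), ("HR", 60), ("Sales", 75)],
)

# HAVING filters whole groups after aggregation: keep only departments
# whose total spend exceeds 90, and report their largest single purchase.
big_spenders = conn.execute(
    "SELECT department, MAX(amount), SUM(amount) FROM purchases "
    "GROUP BY department HAVING SUM(amount) > 90 ORDER BY department"
).fetchall()

print(big_spenders)  # [('HR', 60, 100), ('IT', 250, 350)]
```

Sales is dropped because its total (75) does not pass the HAVING condition.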
pankajshivnani123
I expect there is the possibility of subtle differences in their execution. I checked the execution plans for two functionally equivalent queries along these lines in Oracle 10g:

core> select sta from zip group by sta;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

core> select distinct sta from zip;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH UNIQUE       |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

The middle operation is slightly different: "HASH GROUP BY" vs. "HASH UNIQUE", but the estimated costs etc. are identical. I then executed these with tracing on and the actual operation counts were the same for both (except that the second one didn't have to do any physical reads due to caching).

But I think that because the operation names are different, the execution would follow somewhat different code paths and that opens the possibility of more significant differences.
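
If you want to repeat this kind of comparison on your own engine, something like the sketch below may help. It is not the Oracle 10g setup described above: it assumes SQLite through Python's sqlite3 module, uses a tiny made-up zip table with a sta column, and relies on SQLite's EXPLAIN QUERY PLAN statement. Plan text differs between engines and versions, so the sketch only prints the plans and checks that the two result sets match.

```python
import sqlite3

# Hypothetical stand-in for the zip table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zip (sta TEXT)")
conn.executemany(
    "INSERT INTO zip VALUES (?)",
    [("CA",), ("NY",), ("CA",), ("TX",), ("NY",)],
)

group_by_sql = "SELECT sta FROM zip GROUP BY sta"
distinct_sql = "SELECT DISTINCT sta FROM zip"

# Neither query has an ORDER BY, so sort in Python before comparing.
group_rows = sorted(conn.execute(group_by_sql).fetchall())
distinct_rows = sorted(conn.execute(distinct_sql).fetchall())
print(group_rows == distinct_rows)  # True

# Print each engine-reported plan; the last column holds the description.
for sql in (group_by_sql, distinct_sql):
    print(sql)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[-1])
```

Identical results do not prove identical execution, which is why looking at the plans on your actual engine and data is still worthwhile.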

I think you should prefer the DISTINCT syntax for this purpose. It's not just habit; it more clearly indicates the purpose of the query.
