Aggregate Queries and Views
Aggregate Queries and Views
Aggregate Functions
It provide consolidated result it means work on records and then will provide summary of it or the consolidated result of it.
It is basically used to find the facts and figures of table. Means max and other similar type of functions which are consolidate in nature.
- count - it take count of only not null values and will ignore the null values.
- max
- min
- avg
- sum
notes : Distinct is not an aggregate function as it don't combine the records it just help to remove the duplicate records.
All aggregate functions by default work on integers or anything which can be converted into countable information.
count:
Pseudo code:
table = []
count = 0
for row in table:
if row[b_id]:
count++
print(count)
Note: You cannot use aggregate functions in nested loops.
Group by:
Trying to make some groups in a data by using some criteria or conditions.
It use the column to form the group and take action on that group.
Difference between where and having clause:
- Where: On rows(row by row filtering) Can't filter group
- Where: Can't be applied once groups are formed.
- Having: Will be used to filter groups.
Pseudo code for below query might be:
select avg(psp), b.name
from student_scaler s
join batches_detail b
on s.b_id = b.b_id
group by s.b_id
having avg(psp) > 95;
table1 = []
table2 = []
avg= []
for row1 in table1:
result
for row2 in table2:
if row1[b_id] == row2[b_id]:
result = avg(result, avg(psp))
avg.append(result)
result = []
for row in avg:
if(row>95)
result.append(row)
print(result)
-- Aggregate functions
-- count function
use school;
-- Find the total number of students who has batches assigned.
-- select * from student_scaler;
select count(b_id) from student_scaler;
-- it will return count of all not null values under the column named b_id
-- Aggregates can't be nested with another aggregate:
-- Find total number of unique batches assigned to students.
select count(distinct b_id) from student_scaler;
-- wrong query
select distinct (count(b_id)) from student_scaler;
-- it will give the count of all b_id and that result is already distinct so this query is not right.
-- Find total number of entries in a table.
-- Best way to calculate it is to use count on primary key as that will be unique and will not contain null.
-- But if you don't know about the primary key then use the * operator as it will help you to select the records
-- of a table and will help to count the entries in a table.
-- Now their might be a doubt won't we are going to have a duplicate record but that is impossible
-- as table rule don't permit to have duplicate records you can have duplicate information
-- in columns but entire record to be duplicate is not permitted. So this doubt is not useless.
select count(*) from student_scaler;
-- Print a constant value number of rows we have?
select 1 from student_scaler;
-- It will also help to calculate the record count as 1 will be iterated to the total number of table records.
select count(1) from student_scaler;
-- it is more efficient then count(*) as you don't need to consider all the values of the table.
-- So no value is involved only the size of array is used for for loop to count the count of records logically.
-- So time is saved in fetching the record values which is taking place in case of count(*)
-- other aggregate functions
-- Find maximum PSP in student table.
-- select * from student_scaler;
select max(psp) from student_scaler;
select max(psp), min(psp), avg(psp), sum(psp) from student_scaler;
-- Can we print the non aggregate values along with aggregate in select clause of a table.
-- It is not possible as it don't make sense as we are already comparing the entire table record for aggregation.
-- Hence if we need individual information then aggregate will not work and vice-versa.
select max(psp), id from student_scaler;
-- it will give you error that you need to use group by clause to make it possible as both are not making sense.
-- Ask some question to yourself max of what with which comparison if you are asking for individual values as well.
-- As individuals can't have max or min or anyother thing can we compare our progress without checking other for
-- reference.
-- are below query result is for both column result. If yes then why and if not then why?
select avg(psp), sum(psp)/count(*), sum(psp)/count(psp) from student_scaler;
-- as average is a function which is equivalent to sum(psp)/count(psp) and count(*) is not same as count(psp).
-- As count(*) means unique records, count(psp) means non null values present in psp column. so a record can have
-- null in psp but might have information in other column and hence it will be considered.
-- Hence count(*) is not equal to count(col);
-- Some rules of aggregate function.
-- While using aggregate functions don't print any opther column non-aggrate in nature.
-- Don't use nested aggregation to find the result. like aggregate(aggregate) as
-- it will not work as already inner one will provide only single value
-- and aggregation need comparison so as you are not having any thing to compare
-- hence there will be no change in result.
-- example:
select sum(sum(psp)), sum(psp) from student_scaler;
select count(sum(psp)) from student_scaler;
-- It will give error as Invalid use of group function as already sum(psp) has converted information into single
-- value.
-- single value can't be grouped.
-- Group by clause
-- select * from student_scaler;
-- Here we have provided b_id in select clause as it is unique and is the parameter to form the group.alter
-- so b_id is having only one value and result associated with the value.
-- find average psp of every batch.
select avg(psp)
from student_scaler
group by b_id;
-- you can get the average of null valued b_id as well as group by is not an aggregate function it is the mechanism
-- keyword to form the group which can have multiple values under same group category.
-- so group by can form a group of null value as well if column which is used for group by clause have null value in
-- it.
-- Get avegare psp of each batch along with batch name.
-- select * from batches_detail;
select avg(psp), b.name
from student_scaler s
join batches_detail b
using(b_id)
group by b.name;
-- Find batch name with highest avg(psp)
select avg(psp), b.name
from student_scaler s
join batches_detail b
using (b_id)
group by s.b_id
order by avg(psp) desc
limit 1;
-- Having clause.
-- How to filter the data after using the group by clause and aggregate functions.
-- How to get the batches name and avg psp which is greater than 95.
-- As you can see in this question you cannot use the where clause as where clause is not allowed for groups.
-- Hence need to use having clause for filtering the data.
-- Reasons:
-- As where clause works on row by row fashion and group is not a row by row value hence can't use where clause.
-- We have having clause specially designed for this activity of filtering.
select avg(psp), b.name
from student_scaler s
join batches_detail b
on s.b_id = b.b_id
-- where avg(psp) > 95;
group by s.b_id;
-- where avg(psp) > 95;
-- It will raise the syntectical error as where clause is not permitted by using group by clause.
-- If we use the where clause before group by then group by clause is not permitted as you have already filtered
-- the data and that intrupted the group by clause acitivity to group the column values.
-- Right query will be
select avg(psp), b.name
from student_scaler s
join batches_detail b
on s.b_id = b.b_id
group by s.b_id
having avg(psp) > 95;
-- Get avg(psp) of batch id 1
-- select * from student_scaler;
select avg(psp), b_id
from student_scaler
where b_id = 1;
-- Rules for sequence of query execution.
-- 1. Get data using from and joins from table.
-- 2. Then we can group our data. or filter the data by using the where clause.
-- 3. Filter data based on "Having clause" condition. If data is not filtered by using where clause.
-- 4. Order by.
-- 5. Print out data.
-- from/join -> where -> group by -> having -> order -> select.
-- views
-- Get name of movies along with the actors who performed in it ?
-- We need to use three tables to do it.
-- 1. film table. 2. actor table. 3. film_actor table.
use sakila;
select f.title, concat(a.first_name, ' ', a.last_name) full_name
from film f
join film_actor fa
on f.film_id = fa.film_id
join actor a
on fa.actor_id = a.actor_id;
-- Find the actor list who performed in a film and film has earned a lot of money.alter
-- so I want to use the same star cast to another movie in hope of earning once again a lot from this movie.
select f.title, concat(a.first_name, ' ', a.last_name) full_name
from film f
join film_actor fa
on f.film_id = fa.film_id
join actor a
on fa.actor_id = a.actor_id
where f.title = 'AGENT TRUMAN';
-- As you can see this query can be used multiple time where on filter condition is going to change every time.
-- So it wise to make the common query a view and then generate result as per requirement and input for filteration.
-- Creating a view:
create or replace view film_actor_info
as
select f.title, concat(a.first_name, ' ', a.last_name) full_name
from film f
join film_actor fa
on f.film_id = fa.film_id
join actor a
on fa.actor_id = a.actor_id;
-- view will only execute the code and don't store the result. Hence it is not consuming any data.
-- so in short it save us from writing same query again and again and by using the view you can get the result.
-- you can even do manupulation on the result of the view and it will not effect the base table.
-- As a result data is safe and you are free to perform any type data analysis without being scared of data alteration.
-- or data loss.
select * from film_actor_info where title = 'ALABAMA DEVIL';
-- Materialized views
-- MySql don't have this concept.
-- It will store the data and will not execute the code and will fetch the already stored data.
-- Biggest issue is that it will have stale data and will provide old data and will consume space.
-- Hence view is better than materialized views as it will give latest information and don't consume space.
Comments
Post a Comment