Joins

 Joins

It help to stiches rows of two or more table together to provide meaningful information for business usecase. It is the best way to unite distributed information in database responsibly and help to safe space.

To join to tables we must use right condition and criteria else cartition product will be created and will make the information inconsistent and will even leads to incorrect information generation.

create table student_scaler(
id int not null primary key auto_increment,
name varchar(500) not null,
b_id int,
psp int);


select * from student_scaler;

insert into student_scaler (name, b_id, psp)
values('Puneet Kumar Singh', 1, 90),
('Anubha Panwar', 1, 100),
('Devansh Singh', 2, 99),
('Sachin Panwar', 3, 96);



create table batches_detail(
b_id int not null primary key auto_increment,
name varchar(500)
);

select * from batches_detail;
insert into batches_detail (name) values ('A'),
('B'),
('C');

-- joining student_scaler and batches_detail table without any condition and criteria.

select * from student_scaler join batches_detail; -- It will generate the cartition product and will make all records of one table to join with other table and that will not make any sense.

Sample data after this cartition query.
# id, name, b_id, psp, b_id, name
'1', 'Puneet Kumar Singh', '1', '90', '3', 'C'
'1', 'Puneet Kumar Singh', '1', '90', '2', 'B'
'1', 'Puneet Kumar Singh', '1', '90', '1', 'A'
'2', 'Anubha Panwar', '1', '100', '3', 'C'
'2', 'Anubha Panwar', '1', '100', '2', 'B'
'2', 'Anubha Panwar', '1', '100', '1', 'A'
'3', 'Devansh Singh', '2', '99', '3', 'C'
'3', 'Devansh Singh', '2', '99', '2', 'B'
'3', 'Devansh Singh', '2', '99', '1', 'A'
'4', 'Sachin Panwar', '3', '96', '3', 'C'
'4', 'Sachin Panwar', '3', '96', '2', 'B'
'4', 'Sachin Panwar', '3', '96', '1', 'A'


Now joining only the desired record from one table to another to have right information which can lead us to go for relevent information and for desired output.

-- joining student_scaler and batches_detail table by using b_id column to get desired records with relevent information for analysis.

select * from student_scaler as a join batches_detail as b on a.b_id = b.b_id; 

# id, name, b_id, psp, b_id, name
'1', 'Puneet Kumar Singh', '1', '90', '1', 'A'
'2', 'Anubha Panwar', '1', '100', '1', 'A'
'3', 'Devansh Singh', '2', '99', '2', 'B'
'4', 'Sachin Panwar', '3', '96', '3', 'C'

Join query generates intermediate table and to access these tables we need to use select clause.


Pseudo code:
ans = []
for row1 in students:
    for row2 in batches:
        if row1[b_id] = row2[b_id]
                ans.add(row1+row2)

for rows in ans:
    print [rows[name], rows[batch_name]]

If you want only the matched records as output it means you need to use inner join. As inner join provide only intersaction as result.


-- get students record with batch_name not the ids.

select a.id, a.name, b.name, a.psp
from student_scaler as a inner join batches_detail as b
on a.b_id = b.b_id;


# id, name, name, psp
'1', 'Puneet Kumar Singh', 'A', '90'
'2', 'Anubha Panwar', 'A', '100'
'3', 'Devansh Singh', 'B', '99'
'4', 'Sachin Panwar', 'C', '96'


We get the columns with respect to the table which is used first and then the one which is used after it.
If we have not specified the order of columns in select clause. Now if we need to use any table in any order we need to get the columns in select clause as per requirement then we need to provide desired column positions in select clause.

Joining the mutliple tables:

We need to understand this concept as in real world we always need to join more than two tables and if we use it efficiently then we can save lots of time and can solve the real life problems and can create new business usecase and can solve them as well.

Rules to join multiple tables:
Think is as cumulative sum property where you need to first join two tables and then with it's result you need to join the other tables.

So join one table with result of other two tables joints.

This is the golden rule and if followed then decrease the changes of complicating the join queries and will even help in getting desired output in one go logically.

Example to understand it:
select * from student2  s join batches_detial b on s.b_id = b.b_id;
It will generate the intermediate table as shown in above article.

Now join another table to this intermediate table.
So now syntax extension will be.


select *
from student2 s
join batches_detail b
on s.b_id = b.b_id
join instrutor i
on s.instructor_id = i.id;


create table instructor (
id int not null primary key auto_increment,
name varchar(500) not null);

select * from instructor;

insert into instructor (name) values ('Puneet'),
('Rahul'), ('Karthik');

create table student2(
id int not null primary key auto_increment,
name varchar(500) not null,
instructor_id int not null,
b_id int not null);

use school;
select * from student2;
insert into student2 (name, instructor_id, b_id) values('Puneet Kumar Singh', 1, 1),
('Anubha Singh', 2, 1),
('Devansh Singh', 1, 1),
('Pushpendra', 1, 2),
('Rupesh', 1, 1);

-- join student with instructor and batches_detial table to have only student name with instructor name and batch name and nothing else in detial

select s.name, i.name, b.name
from student2 s
join instructor i
on s.instructor_id = i.id
join batches_detail b
on s.b_id = b.b_id;

# name, name, name
'Puneet Kumar Singh', 'Puneet', 'A'
'Anubha Singh', 'Rahul', 'A'
'Devansh Singh', 'Puneet', 'A'
'Pushpendra', 'Puneet', 'B'
'Rupesh', 'Puneet', 'A'

Some practices for joins:
  1. for every film get name/list of actors who performed in it.
  2. Display a list of customers who rented a film in 'Horror' category.
    1. include name, last name, email and film they rented
  3. Get all movies where Penelope Guiness worked.

At the time of join we must first analyze the tables and then need to understand what all tables will be need to perform join and what will be the criteria for joining the tables. As that will help us to get the desired result.

Example:
For film title find the names of actors who worked in it.

Q1. Ans:
use sakila;

-- join the film table and actor table to get name of all the actors who worked in a film. Order result with respect to films not with respect to actors.

-- step 1: analyze film table.
describe film; -- as we don't have actor id in this table hence we cannot join film table with actor table directly.

desc actor; -- as we don't have film id so can't join it as well with film table directly.

-- now check database do we have any common table which can act as linkage between both the tables to join them.

desc film_actor; -- we can use this table as it has both film_id and actor_id for joining film and actor table.

select f.title, concat(a.first_name, ' ', a.last_name) as actor_full_name
from film f
join film_actor fa 
on f.film_id = fa.film_id
join actor a
on fa.actor_id = a.actor_id
order by f.title;

I hope it might help.

Q3. Ans:
select * from actor where first_name = 'Penelope' and last_name = 'Guiness'; -- this is the query which will filter the result as per our requirement.

-- find all movies in which Penelope Guiness worked.

-- Step 1: check film and actor table and then identfy if we can directly join them or we need another table to join them as in this case need film_actor help to join them.
-- step 2: Join actor table with film_actor table to get the list of film_ids in which Penelop Guiness worked.
-- step 3: diplay the actor name and films name

select concat(a.first_name,' ', last_name) as actor_full_name, f.title as film_name
from actor a
join film_actor fa
on a.actor_id =  fa.actor_id
join film f
on f.film_id = fa.film_id
where a.first_name = 'Penelope' and a.last_name = 'Guiness';

Q2. Ans:
-- Get list of customers who rented movies in "Horror" category and do provide customer full name (combination of first_name and last_name) email and film they rented.
-- Step 1 understand what all tables are required for this query.
-- film table to get the title of film, 

desc film; -- it has title, film_id, rental_duration, rental_rate, description
desc customer;  -- it has customer_id, first_name, last_name, email, store_id
desc rental; -- it has customer_id, rental_id, staff_id, inventory_id can join rental table and customer table by using customer_id.
desc film_category; -- it has film_id and category_id
desc store; -- store_id, manager_staff_id
desc inventory; -- film_id and store_id
desc staff; -- staff_id and store_id
desc category; -- category_id and name of category
-- we can get the list of customers who has rented a movie by using rental and customer table.
-- we can use inventory table to know which film is present in which store and from that store which customer has purchased the film. Inventory table will also provide the film id to get the film name from film table.
-- use plam is to filter out all the customers who rented a movie by joining rental and customer table.

select distinct concat(c.first_name, ' ', c.last_name) as customer_full_name, f.title
from customer c
join rental r 
on c.customer_id = r.customer_id -- filtered out all the customers who has rented a movie.
join inventory i
on i.store_id = c.store_id -- filtering the store_id to get the film id.
join film_category fc
on i.film_id = fc.film_id
join category ca 
on fc.category_id = ca.category_id -- to identify the category of movie which is rented
join film f
on fc.film_id = f.film_id
where ca.name like '%horror%'; -- filtering only horror category films.

Compound joints:

  • Compound joins are similar to regular joins.
  • We just apply multiple conditions to multiple columns.
Example:
If you have an array and you need to find greater values present in an array for each element then you need to run a nested loop to do it.

arr = [1,2,3,4,5,6,7]
need to find greater elements for all elements present in the array.

so 1 - 2, 3, 4, 5, 6, 7
2 - 3, 4, 5, 6, 7
3 - 4, 5, 6, 7
4 - 5, 6, 7
and so on.


self join is like a nested loop running on same table to get the desired result.

Query to make it feasible.
select f1.*, f2.*
from film f1 
join film f2
on f2.release_year between f1.release_year-2 and f1.release_year+2;

Adding more complexity to the question.
You need to get the movie detail only if you have the rental rate is greater then the movie which is being compared.

select f1.*, f2.*
from film f1 
join film f2
on f2.release_year between f1.release_year-2 and f1.release_year+2
and f2.rate>f1.rate;

Difference between joining the table by using on and by using where.
on will asap check the condition and according to it will produce the result.
Where clause will first have the join result and then will filter out the result.

Hence on is optimized with respect to where clause.


Types of joints:

  • inner join - all the join where condition is need to be satisfied to get the result.
  • self join - is always on same table when you need to find some value which is already present in table.
  • outer join - It can give you results where even condition is not matching.
    • Left join - All data is from left table and only matching from right table.
    • Right join - All data from right table and only matching from left table.
    • Full join - full join is not present in MySQL you can get it by using union or any other mechanism. It means get all records from both the tables even it match or not.
  • cross join - It is used when you want result to have all combinations of records.
    • select * from colors cross join sizes;
We can use any join left or right as they can be used for both of them just need to use them to make the thing simple for you and to provide desired result.


-- left join:
 -- Problem statement: Get all the students along with their batch names, include students
 -- with unassigned batches as well.
 
 select *
 from student_scaler s
 left join batches_detail b
 on s.b_id = b.b_id; 
 
-- right join:
-- problem statement: Get all students  with their assigned batches.
-- Get all batch names even though no students are assigned to them.

select *
from student_scaler s
right join batches_detail b
on s.b_id = b.b_id;

-- full outer join
-- problem statement: Get list of all colors and size as collection.

create table color(
id int not null primary key auto_increment,
name varchar(500));

create table sizes(
id int not null primary key auto_increment,
name varchar(2));

insert into color(name) value('Red'), ('Purple');

insert into sizes(name) value('M'),('L');


select * from color;

select * from sizes;

select *
from color
cross join
sizes
order by color.name;


select s.name, b.name
from student_scaler s
cross join batches_detail b;


-- syntactical sugers for sql 
-- using clause
--  It is used to join two tables when both have column name.

select * 
from student_scaler s
join batches_detail b
using (b_id);

-- natural join
-- When we have two tables with same column name then we can use natural 
-- join to get the result.
-- we need to have foreign defined in table to use natural join.

select *
from student_scaler s
natural join batches_detail b; 

-- another way of writing cross join - implicit cross join.

select * from color, sizes;

Difference between ON and Where clause:

  • Where - it will perform first cross join and then will filter the data and will make query inefficient and will take time to provide desired result.
  • ON - It will use conditions to perform join and will provide result without wasting space and time.
-- Where:
select * 
from student_scaler s
join batches_detail b
where s.b_id = b.b_id;

Pseudo code for ON clause:

ans = []
for row1 in table1:
    for row2 in table2:
        if('ON' clause condition) matches: 
            ans.add(row1+row2)

for i in ans:
    print(i)


Pseudo code for where clause:

ans=[]
for row1 in table1:
    for row2 in table2:
        ans.add(row1+row2)

for i in ans:
     if('where' clause condition) matches:
        print(i)

As you can see the Pseudo code On clause is efficient as it will generate less data for display and will automatically will make the filteration at the time of joining.


Union:

When we need to club result of different SQL queries then we use union but do remember output column count and name must be same. It will give you unique records as you are using the union and it will de-duplicate the data but if you want duplicate information then use union all.

  • union - no duplicate
  • union all - include duplicate provide all information.

example:
-- need to see the name of students, instructor and invertors as list combined.

It will remove the duplicate entries.

select name from students
union
select name from instructors
union 
select name from investors;


Result with duplicates:

select name from students
union all
select name from instructors
union all
select name from investors;






Comments

Popular posts from this blog

Aggregate Queries and Views

Schema Design

Transactions in SQL