DBMS | Basic concepts

kartikkhk
8 min readMay 22, 2020

--

Disadvantages of a file System :

1. Data Redundancy : Repetitive data

2. Data inconsistency : Data Changed somewhere should also be changed in other places it is used.

3. Difficulty in Accessing Data

4. Data Isolation

5. Security Problem

6. Atomicity

7. Integrity Problem

Entity : Entity is an Object.

  • Entity Set : Similar types of entities grouped together is called Entity Set.

Types of Attributes :

  • Simple — Composite
  • Single — MultiValued
  • Stored — Derived

Relation : Association b/w two or more entities of same or different entity Set.

  • Degree Of A Relationship Set : Means Number of entity Sets participating in the relation.
  • UNARY
  • BINARY
  • N-ARY

Mapping Cardinalities / Cardinality Ratio :

  • Express the number of Entities to which other entity can be related via a relationship.
  • 1 : 1 ( One to One )
  • 1 : N ( One to Many )
  • N : 1 (Many to One )
  • N : M (Many to Many )

Functional Dependency :

  • f : (A → B) This Means that given the value of A There is only one value of B Corresponding to A .
  • Closure Set represents the set of attributes from which you can find the values of all the other attributes. eg : If : A->B && B->C then A->BC hence (A)+

EQUIVALENCE SET OF FUNCTIONAL DEPENDENCIES

KEYS :

  • Consider A table :- | A | B | C | And the Dependency A-> BC here We can say A is the key i.e (A)+ = R
  • (Key)+ = R (Closure of a Key is R)
  • SUPER KEY — (ALL KEYS) Satisfies the relation mentioned above. (Any Key i.e). — (SO IT BASICALLY IS THE SET OF ALL POSSIBLE KEYS)
  • CANDIDATE KEY — It is the most efficient SUPER KEY . This means that the subset of the candidate key isn’t any super key. SO BASICALLY IN A Venn diagram CANDIDATE KEY WOULD BE INSIDE SUPER KEY.

Why Normalization is done :

  • Used to reduce redundancy.
  • Insertion Anomaly : When certain data (attribute) cannot be inserted into a Database without the presence of other data.
  • Deletion Anomaly : If we want to delete some unwanted data and it leads to the deletion of Some other important data then it is called deletion Anomaly.
  • Updating Anomaly : When We want to update a piece of information and it has to be updated in all of its Copies.

First Normal Form : ( 1NF )

  • 1 NF states that there should be Atomicity in your DB. ( i.e There Should be only one corresponding value to each tuple )

Second Normal Form : ( 2NF )

  • Attributes of the candidate key are called prime Attributes.
  • Attributes not a part of the candidate key are called non prime Attributes.
  • There Shouldn’t be ANY Partial Dependency :
  • Partial Dependency : When A Non — Prime Attribute instead of depending on all the prime Attributes of a candidate key depends only on some of the attributes.Then it is called Partial Dependency.

Third Normal Form : (3NF)

  • Relation is in 2NF and there is No Transitive Dependency.
  • Transitive Dependency : if ( A->B && B->C ) : (When a non prime attribute derives another non prime attribute)
  • Transitive dependency : If A is a prime attribute and B and C are non — prime Attributes then it is called Transitive Dependency.

BCNF — When a prime attribute is derived by a prime / non-prime attribute

  • A relation R is in BCNF if R is in Third Normal Form and for every Functional Dependency, LHS is super key.
  • A relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.

Indexing : Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done

  • Indexing is basically a method to improve access time to get data from a database.
  • Data maybe sorted or unsorted but the amount of time taken to reach the block of records is high so indexing is used.
  • Index File is Created.
  • Index file stores the Search Attribute and the Block number (Block Pointer) associated with it.
  • Index file is always sorted.

Types of Indexing:

  • Primary Indexing:
  • Main File sorted
  • Primary Key used as search key
  • It’s an example of sparse indexing
  • No. of entries = no. of blocks acquired in index file by the main file.
  • No. Of access required = log2n + 1 (binary search)
  • Clustered Indexing:
  • Main File sorted
  • Non key attribute used as search key
  • There will be one entry for each unique value of the non key attribute
  • If the number of blocks acquired by index file is n then block access time required = log2n + 1
  • Secondary Indexing:
  • Main file unsorted
  • Key or Non-key attribute can be used as search key
  • Called secondary because indexing is generally done more than or equal to 2 times to improve access time.
  • Multi-Level Indexing:
  • An another index file is created for the index file present. (Or probably more) to optimise memory and search speed.

Atomicity : (All or Nothing) : Ensures that a transaction will run to completion as an indivisible unit, at the end of which either no changes have occurred to the Database or the database has been changed in a consistent manner.

Consistency : (A1+B1=A2+B2)Correctness or ensures that if the database was in consistent state before the transaction then it will be in a consistent state after the transaction.

Isolation : Indicates that the actions performed by a transaction will be isolated or hidden from outside the transaction until it terminates.

Durability : All updates done by a transaction must be permanent.

Concurrency : When more than one transactions run together.

  • Advantages of Concurrency :
  • Waiting time decreases
  • Response time decreases
  • Resource utilisation increases
  • Efficiency increases

Dirty Read Problem : A dirty read occurs when one transaction is permitted to read data that is being modified by another transaction which is running concurrently but which has not yet committed itself.

Lossless Join : This property guarantees that the extra or less tuple generation problem does not occur after decomposition.

  • So basically after normalisation if your table gets divided into two or more tables and you are not able to get the same (previous) table after joining the new tables then your decomposition is lossy.
  • It is a mandatory property and must always hold good.
  • If a relation R is decomposed into two relations R1 and R2, then it will be loss-less if these three conditions are satisfied :
  • attr(R1) U attr(R2) = R
  • attr(R1) INTERSECTION attr(R2) != (Empty Set)
  • attr(R1) INTERSECTION attr(R2) → attr(R2) || attr(R1) INTERSECTION attr(R2) → attr(R1)
  • What this basically means is that the intersection of R1 and R2 should be a candidate key.

SQL : -

Data Definition Language : (Defines Relational Schema) DDL statements create and modify database objects such as tables, indexes, and users. Common DDL statements are CREATE, ALTER, and DROP.

Data Manipulation Language : A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database.

Select Clause:

  • Select (Column Name)
  • From (Relation Name)
  • Where (Condition)
  • eg.
  • Select * From Branch;
  • Select C_name from Depositor;
  • Select distinct C_name from Depositor;
  • Select loan_no. , amount from Loan;
  • Select A_no. , Balance * 1.06 from Account;
  • Select A_no. From Account where balance <1000;
  • Select b_name from Branch where B_city = ‘delhi’ and assets < 1,00,000;
  • Select b_name , A_no. From Account where balance >= 100 and balance <=1000; or Select b_name , A_no. From Account where balance between 100 and 1000

Set operators — (Union / Intersection / minus)

  • Set operators are only applicable between two relations if both of them are compatible i.e. (Same number of Columns with same domain of attributes)

( Select C_name from Depositor ) Union ( Select C_name from Borrower)

( Select C_name from Depositor) Intersect ( Select C_name from Borrower)

( Select C_name from Depositor) Minus ( Select C_name from Borrower)

Cartesian product in SQL -

  • Cartesian product combines two relations into one by concatenating each tuple of first relation with all the tuples of second relation.
  • If the same attribute appears in both the relations then the relation name is added as prefix to the attributes

Select C_name from Account,Depositor where b_name = ‘Delhi’ and Account.A_no. = Depositor.A_no.

Select Loan_no. , amount from Loan,Branch where b_city = ‘Delhi’ and Branch.B_name = Loan.B_name

Select C_name From Borrower,Branch,Loan where assets < 1,00,000 and Branch.B_name = Laon.B_name and Loan.loan_no. = Borrower.loan_no.

Queries on Multiple Relations (Natural Join) : -

  • Basically natural join works the same as Cartesian product but considers only those pairs of tuples with the same value

on those attribute that appear in the schemas of both relations.

  • May Lead to data loss when the attributes to be matched have different names which isn’t an issue while using Cartesian product.
  • May Lead to data loss if some values only occur in one table and not in the another table.

Select C_name from Account Natural Join Depositor where B_name. = ‘Delhi’

Select Loan_No. , Amount from Loan Natural Join Branch where b_city = ‘Delhi’

Select C_name From Borrower Natural Join Branch Natural Join Loan Where assets < 1,00,000

Join/(INNER JOIN) : -

  • Is an operation which works same as Cartesian product if used alone.
  • But if JOIN is used with the keyword USING it provides additional functionality.
  • Provides ability to explicitly share columns which must be used by join for comparison and removal of

redundant tuples.

  • If there are more than one tuples common between two tables but we do not want each of them to be considered.
  • If JOIN is used with the keyword ON :
  • In SQL JOIN with ON keyword allows us to use a general predicate over the relations being joined.
  • SO ON basically acts (kinda) like a “WHERE” and helps us to solve the disadvantages faced in NATURAL JOIN where

you cannot join the attributes which have the same type but don’t have the same name. For example :

Select C_name

from Loan JOIN Borrower ON L.L_no = B.Loan_No. AND A<1000

  • In Case from Outer Join
  • Eg:
  • SELECT A from R1 JOIN R2 USING (B) WHERE D = 10
  • OUTER JOIN : -
  • One problem with natural join is that only those values that appear in both the relations will manage to reach the final Table.

but if some data is explicitly in table one or table two then that data is lost.

  • TYPES : -
  • Left Outer Join (Left Join)
  • Right Outer Join (Right Join)
  • Full Outer Join (Full Join / Complete Join)
  • But Some data loss happens in left and right joins.
  • There is no data loss in Full Outer Join.

Interview Questions :

Q. What is the difference between delete, truncate and drop statements?

  • Delete Command:
  • Used to Delete a row in the Table
  • It’s a DML command.
  • You can rollback data after Delete command
  • Slower than truncate command
  • Truncate Command:
  • Used to Delete all the rows from a table
  • You cannot rollback data
  • It’s a DDL command
  • It is faster than Delete command
  • Drop Command:
  • Used to remove a table and rollback isn’t allowed.

Q. Name subsets of SQL :

  • DML : Data manipulation language deals with manipulation of data present in the database.
  • DDL : Data definition language consists of commands to define the relational schema.

Q. What is the difference between Char and Varchar data types:

  • Char : Used for Strings of fixed length
  • Varchar : Used for Strings of variable length

Q. Define Primary Key.

  • A set of attributes that can be used to identify all the other attributes of the relation

Q. Difference between SQL and MySQL :

  • SQL is a language.
  • MySQL is an open source Database Management System.

Q. Define Foreign Key:

  • Foreign Key is a set of attributes that act as a link between two tables

Q. Write a SQL query to get date?

  • Select GETDATE();

Q. What is indexing?

  • Used for faster retrieval of records

--

--

No responses yet