View PDF File

Data Cleaning in SQL

Transform Raw Data into Actionable Insights Using SQL

Created by: Muhammad Umar Hanif

What is Data Cleaning?

Data cleaning in SQL involves identifying and correcting errors or inconsistencies in data to improve its quality and accuracy.

01. Handle Missing Values

COALESCE()

Returns the first non-NULL value from a list.

SELECT name, base_salary, COALESCE(bonus, 0) AS bonus FROM employees;

IFNULL()

Replaces NULL with a specified value.

SELECT name, base_salary, IFNULL(bonus, 0) AS bonus FROM employees;

ISNULL()

Checks if a value is NULL and replaces it.

SELECT name, base_salary, ISNULL(bonus, 0) AS bonus FROM employees;

02. Remove Duplicates

DISTINCT()

Removes duplicate rows.

SELECT DISTINCT name, department FROM employees;

ROW_NUMBER()

Assigns a unique row number and helps remove duplicates.

SELECT name, department, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS row_num FROM employees;

03. Standardize Text

LOWER()

SELECT LOWER(name) FROM employees;

UPPER()

SELECT UPPER(name) FROM employees;

TRIM()

SELECT TRIM(name) FROM employees;

04. Correct Inconsistent Data

UPDATE products SET product_code = CONCAT('PROD-', SUBSTR(TRIM(product_code), INSTR(product_code, '-') + 1));

05. Change Data Types

SELECT CAST(sale_id AS VARCHAR(10)) FROM sales; SELECT CONVERT(VARCHAR(10), sale_date, 103) FROM sales;

06. Handle Date Format Issues

SELECT STR_TO_DATE(order_date, '%d/%m/%Y') FROM orders; SELECT EXTRACT(YEAR FROM order_date) FROM orders;

07. Enforce Data Integrity

CREATE TABLE Customers (CustomerID INT PRIMARY KEY, Age INT, CHECK (Age >= 18 AND Age <= 100));

08. Handle Numeric Values

SELECT ROUND(sale_amount, 2) FROM sales_data; SELECT CEIL(sale_amount) FROM sales_data; SELECT FLOOR(sale_amount) FROM sales_data; SELECT ABS(sale_amount) FROM sales_data;