Getting Started with Postgres
How to connect to Postgres
- Connect via VS-Code SQLTools Extension
- Connect via DBeaver
- Connect via pgAdmin
- Connect via CLI
- Connect via Python
Connect via Python and create a new database named mydb
Install the psycopg2 driver with pip install psycopg2-binary in the terminal; the snippets below also use the pandas and SQLAlchemy packages.
import os
import pandas as pd
from sqlalchemy import create_engine
POSTGRES_USERNAME = ''
POSTGRES_PASSWORD = ''
POSTGRES_ENDPOINT = ''
POSTGRES_DATABASE = ''
CONN = f"postgresql://{POSTGRES_USERNAME}:{POSTGRES_PASSWORD}@{POSTGRES_ENDPOINT}:5432/{POSTGRES_DATABASE}"
engine = create_engine(CONN)
conn = engine.connect()
try:
    # CREATE DATABASE cannot run inside a transaction block, hence AUTOCOMMIT
    conn.execution_options(isolation_level="AUTOCOMMIT").execute("CREATE DATABASE mydb")
except Exception as e:
    print(e)
finally:
    conn.close()
Create the following table schema in the database
field name | data type |
---|---|
student_id | integer |
name | varchar |
age | integer |
gender | varchar |
subject | varchar |
marks | integer |
conn = engine.connect()
try:
    conn.execute("CREATE TABLE IF NOT EXISTS students (student_id int, name varchar, \
age int, gender varchar, subject varchar, marks int);")
except Exception as e:
    print("Error: Issue creating table")
    print(e)
finally:
    conn.close()
Insert the following two rows into the table
- First Row: 1, "Raj", 23, "Male", "Python", 85
- Second Row: 2, "Priya", 22, "Female", "Python", 86
conn = engine.connect()
try:
    conn.execute("INSERT INTO students (student_id, name, age, gender, subject, marks) \
VALUES (%s, %s, %s, %s, %s, %s)",
                 (1, "Raj", 23, "Male", "Python", 85))
    conn.execute("INSERT INTO students (student_id, name, age, gender, subject, marks) \
VALUES (%s, %s, %s, %s, %s, %s)",
                 (2, "Priya", 22, "Female", "Python", 86))
except Exception as e:
    print("Error: Inserting Rows")
    print(e)
finally:
    conn.close()
Retrieve the results and print them
Method 1: Using connection fetch method
conn = engine.connect()
try:
    result = conn.execute("SELECT * FROM students;")
    row = result.fetchone()
    while row:
        print(row)
        row = result.fetchone()
except Exception as e:
    print("Error: select *")
    print(e)
finally:
    conn.close()
Method 2: Using Pandas
conn = engine.connect()
pd.read_sql("SELECT * FROM students", con=conn)
Perform CRUD operations on sample music data
query = """
CREATE TABLE Music (
Artist VARCHAR(20) NOT NULL,
SongTitle VARCHAR(30) NOT NULL,
AlbumTitle VARCHAR(25),
Year INT,
Price FLOAT,
Genre VARCHAR(10),
CriticRating FLOAT,
Tags TEXT,
PRIMARY KEY(Artist, SongTitle)
);
"""
conn = engine.connect()
try:
    result = conn.execute(query)
except Exception as e:
    print(e)
query = """
INSERT INTO Music
(Artist, SongTitle, AlbumTitle,
Year, Price, Genre, CriticRating,
Tags)
VALUES(
'No One You Know', 'Call Me Today', 'Somewhat Famous',
2015, 2.14, 'Country', 7.8,
'{"Composers": ["Smith", "Jones", "Davis"],"LengthInSeconds": 214}'
);
INSERT INTO Music
(Artist, SongTitle, AlbumTitle,
Price, Genre, CriticRating)
VALUES(
'No One You Know', 'My Dog Spot', 'Hey Now',
1.98, 'Country', 8.4
);
INSERT INTO Music
(Artist, SongTitle, AlbumTitle,
Price, Genre)
VALUES(
'The Acme Band', 'Look Out, World', 'The Buck Starts Here',
0.99, 'Rock'
);
INSERT INTO Music
(Artist, SongTitle, AlbumTitle,
Price, Genre,
Tags)
VALUES(
'The Acme Band', 'Still In Love', 'The Buck Starts Here',
2.47, 'Rock',
'{"radioStationsPlaying": ["KHCR", "KBQX", "WTNR", "WJJH"], "tourDates": { "Seattle": "20150625", "Cleveland": "20150630"}, "rotation": "Heavy"}'
);
"""
conn = engine.connect()
try:
    result = conn.execute(query)
except Exception as e:
    print(e)
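The Tags column holds JSON serialized as plain text, so it can be parsed on the client side. A minimal sketch using Python's built-in json module; the tag string below is copied from the first insert above:

```python
import json

# Tags value stored for 'Call Me Today' in the inserts above
tags = '{"Composers": ["Smith", "Jones", "Davis"], "LengthInSeconds": 214}'

parsed = json.loads(tags)
print(parsed["Composers"])        # list of composer names
print(parsed["LengthInSeconds"])  # track length
```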
query = """
SELECT * FROM Music;
"""
conn = engine.connect()
pd.read_sql(query, conn)
query = """
SELECT * FROM Music
WHERE Artist='No One You Know';
"""
conn = engine.connect()
pd.read_sql(query, conn)
query = """
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE 'Call%%';
"""
conn = engine.connect()
pd.read_sql(query, conn)
query = """
SELECT * FROM Music
WHERE Artist='No One You Know' AND SongTitle LIKE '%%Today%%'
AND Price > 1.00;
"""
conn = engine.connect()
pd.read_sql(query, conn)
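The doubled %% in the LIKE patterns above escapes the literal % wildcard, since the DBAPI layer would otherwise read a bare % as a parameter marker. Passing the pattern as a bound parameter avoids manual escaping entirely. A minimal sketch of the idea using Python's built-in sqlite3 module (Postgres via psycopg2 uses %s placeholders instead of ?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (artist TEXT, songtitle TEXT, price REAL)")
conn.executemany(
    "INSERT INTO music VALUES (?, ?, ?)",
    [("No One You Know", "Call Me Today", 2.14),
     ("No One You Know", "My Dog Spot", 1.98)],
)

# The wildcard lives in the bound value, so no escaping is needed
rows = conn.execute(
    "SELECT songtitle FROM music WHERE artist = ? AND songtitle LIKE ?",
    ("No One You Know", "Call%"),
).fetchall()
print(rows)
```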
query = """
UPDATE Music
SET Genre = 'Disco'
WHERE Artist = 'The Acme Band' AND SongTitle = 'Still In Love';
"""
conn = engine.connect()
try:
    conn.execute(query)
except Exception as e:
    print(e)
query = """
SELECT * FROM Music;
"""
conn = engine.connect()
pd.read_sql(query, conn)
query = """
DELETE FROM Music
WHERE Artist = 'The Acme Band' AND SongTitle = 'Look Out, World';
"""
conn = engine.connect()
try:
    conn.execute(query)
except Exception as e:
    print(e)
query = """
SELECT * FROM Music;
"""
conn = engine.connect()
pd.read_sql(query, conn)
query = """
DROP TABLE Music;
"""
conn = engine.connect()
try:
    conn.execute(query)
except Exception as e:
    print(e)
Download and insert data from a CSV file into the database
!wget -q --show-progress https://github.com/RecoHut-Datasets/retail_general/raw/main/sample_superstore.csv
!head sample_superstore.csv
data = pd.read_csv('sample_superstore.csv')
data.to_sql('superstore',
con=engine,
if_exists='replace',
index=False,
method='multi')
conn = engine.connect()
pd.read_sql("SELECT * FROM superstore LIMIT 10", con=conn)
Create Fact and Dimension Tables from Denormalized Raw Data
In the data warehousing world, there are occasions where developers have to reverse-engineer a model from flat CSV files. We will walk through this with a simple example.
Login to RDS Postgres using psql
psql --host=database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com --port=5432 --username=postgres --password --dbname=sparsh
Create Schema & Rawdata table
This step dumps the entire CSV into a PostgreSQL table, so it is easier to clean the data and build dimension tables from it.
Here we will create three schemas: elt_landing, elt_dim, and elt_fact. Schemas are very useful for grouping tables logically.
create schema elt_landing;
create schema elt_dim;
create schema elt_fact;
create table elt_landing.rawdata (
name varchar(100)
,gender varchar(20)
,profession varchar(50)
,state varchar(2)
,asofdate date
,temperature float
,pulse int
);
Verify the table & schema creation:
\dt elt_*.*
Import CSV data into the Postgres table
Save the following sample rows as data.csv:
Name,Gender,Profession,state,asOfDate,temperature,pulse
Rachel Green,Female,Designer,NY,2020-11-01,98.4,60
Sheldon Cooper,Male,Physicist,CA,2020-11-05,98.5,70
Penny,Female,Waitress,CA,2020-11-15,99.2,75
George Costanza,Male,Agent,NJ,2020-05-05,98.7,90
Jerry Seinfeld,Male,Comedian,NY,2020-01-01,98.6,65
\COPY elt_landing.rawdata FROM 'data.csv' DELIMITER ',' CSV HEADER;
Verify the data:
select count(*) from elt_landing.rawdata;
select * from elt_landing.rawdata limit 10;
Add a surrogate ID column
A surrogate column is a column holding a sequence of numbers, generally auto-generated.
alter table elt_landing.rawdata add id serial;
select * from elt_landing.rawdata limit 10;
Identify the possible Dimensions
In this sample we can choose Gender, Name, State, Profession as possible dimensions.
Use SELECT statements to generate dimension tables based on distinct values.
First, create the gender dimension. The subquery returns the distinct genders, and the window function row_number() generates a unique ID for each gender.
create table elt_dim.gender as
select
row_number() Over(order by gender) as genderid
,gender
from
(select distinct gender from elt_landing.rawdata) t;
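The same dimension-building pattern can be exercised locally with Python's built-in sqlite3 module (SQLite 3.25+ supports row_number()); the sample rows are the ones loaded above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawdata (name TEXT, gender TEXT)")
conn.executemany("INSERT INTO rawdata VALUES (?, ?)", [
    ("Rachel Green", "Female"), ("Sheldon Cooper", "Male"),
    ("Penny", "Female"), ("George Costanza", "Male"),
    ("Jerry Seinfeld", "Male"),
])

# Same shape as the elt_dim.gender query: distinct values + row_number()
conn.execute("""
    CREATE TABLE gender AS
    SELECT ROW_NUMBER() OVER (ORDER BY gender) AS genderid, gender
    FROM (SELECT DISTINCT gender FROM rawdata) t
""")
print(conn.execute("SELECT * FROM gender ORDER BY genderid").fetchall())
```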
Similarly, create the other dimension tables:
-- Second Query
create table elt_dim.person as
select
row_number() Over(order by name) as personid
,name
from
(select distinct name from elt_landing.rawdata) t;
-- Third Query
create table elt_dim.profession as
select
row_number() Over(order by profession) as professionid
,profession
from
(select distinct profession from elt_landing.rawdata) t;
-- Fourth Query
create table elt_dim.state as
select
row_number() Over(order by state) as stateid
,state
from
(select distinct state from elt_landing.rawdata) t;
Verify the Dimension tables:
select * from elt_dim.person;
select * from elt_dim.profession;
select * from elt_dim.state;
select * from elt_dim.gender;
Build Fact table based on IDs from Dimension Table
This is the key step, which generates the fact table. Because the dimensions were derived from the landing data, JOINs on the natural keys are used to look up each dimension ID.
create table elt_fact.user
as
select
r.id
,p.personid
,g.genderid
,pr.professionID
,s.stateID
,r.asofdate
,r.temperature
,r.pulse
from
elt_landing.rawdata r
JOIN elt_dim.person as p on r.name = p.name
JOIN elt_dim.gender as g on r.gender = g.gender
JOIN elt_dim.profession as pr on r.profession = pr.profession
JOIN elt_dim.state as s on r.state = s.state;
In the above query, r.id is the original surrogate key from elt_landing.rawdata.
Compare and verify the data between elt_landing.rawdata and elt_fact.user table:
select * from elt_landing.rawdata where id = 1;
select * from elt_fact.user where id = 1;
select * from elt_dim.person where personid = 4;
These are the basics; if needed, the data can be normalized and modeled further.
Example: asofdate is kept in the fact table here, but it could be normalized into year, month, and day attributes for a snowflake schema.
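As a sketch of that date breakdown, using only Python's standard library (in Postgres itself you would use extract(year from asofdate) and friends):

```python
from datetime import date

# asofdate values taken from the sample rows above
asofdates = [date(2020, 11, 1), date(2020, 11, 5), date(2020, 1, 1)]

# Normalize each date into the year/month/day attributes
# a snowflake-style date dimension would carry
date_dim = [
    {"asofdate": d, "year": d.year, "month": d.month, "day": d.day}
    for d in asofdates
]
print(date_dim[0])
```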
Building a Database for Crime Reports
This is assignment-oriented.
Using PostgreSQL for storing data related to crimes that occurred in Boston. Dataquest provided the dataset boston.csv for input. Your goal is to:
- Create the table boston_crimes with the appropriate data types for storing the information from boston.csv
- Store the table inside the schema crimes
- Create the user groups readonly and readwrite with appropriate privileges
- Create the users data_analyst and data_scientist and assign them to the readonly and readwrite groups, respectively
- Verify that the privileges of the user groups are set accordingly
To accomplish your goals in this project, you would have to perform the following:
- Create the required database and schema after installing PostgreSQL and the psycopg2 module
- Explore the column headings and content of boston.csv to determine the appropriate data types
- Create the required table using the appropriate data types
- Load the data from boston.csv into the table
- Create the user group readonly, which has the following privileges: database connection, schema usage, and data selection from all tables in the schema
- Create the user group readwrite, which has the same privileges as readonly plus the ability to insert, delete, and update the data in all tables in the schema
- Create the requested users and assign them to their respective user groups
- Test the database to verify that the correct objects were created and that the users/groups have the right privileges
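The privilege setup in the steps above follows a standard Postgres pattern; a sketch is below. The database name crime_db and the passwords are assumptions for illustration, not given by the assignment:

```sql
-- Group roles (no login)
CREATE ROLE readonly NOLOGIN;
CREATE ROLE readwrite NOLOGIN;

-- Lock down defaults, then grant back the minimum
REVOKE ALL ON SCHEMA public FROM public;
REVOKE ALL ON DATABASE crime_db FROM public;

GRANT CONNECT ON DATABASE crime_db TO readonly, readwrite;
GRANT USAGE ON SCHEMA crimes TO readonly, readwrite;
GRANT SELECT ON ALL TABLES IN SCHEMA crimes TO readonly;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA crimes TO readwrite;

-- Users, assigned to their groups
CREATE USER data_analyst WITH PASSWORD 'secret1';
CREATE USER data_scientist WITH PASSWORD 'secret2';
GRANT readonly TO data_analyst;
GRANT readwrite TO data_scientist;
```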
Follow this link for more information.