SQL Assignment 3
Setup the environment
!mkdir -p ~/.aws
!pip install -qq psycopg2-binary awscli boto3 s3fs
%%writefile ~/.aws/credentials
[default]
aws_access_key_id=<enter access key>
aws_secret_access_key=<enter secret access key>
region=us-east-1
output=json
import boto3
import json
import pandas as pd
%reload_ext sql
def get_secret(secret_name):
region_name = "us-east-1"
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name)
get_secret_value_response = client.get_secret_value(SecretId=secret_name)
get_secret_value_response = json.loads(get_secret_value_response['SecretString'])
return get_secret_value_response
db_credentials = get_secret(secret_name='wysde')
USERNAME = db_credentials["RDS_POSTGRES_USERNAME"]
PASSWORD = db_credentials["RDS_POSTGRES_PASSWORD"]
HOST = db_credentials["RDS_POSTGRES_HOST"]
PORT = 5432
DBNAME = "postgres"
CONN = f"postgresql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DBNAME}"
%sql {CONN}
Data Schema Analysis
%sql select * from city limit 5;
index | id | name | countrycode | district | population |
---|
0 | 6 | Rotterdam | NLD | Zuid-Holland | 593321 |
1 | 3878 | Scottsdale | USA | Arizona | 202705 |
2 | 3965 | Corona | USA | California | 124966 |
3 | 3973 | Concord | USA | California | 121780 |
4 | 3977 | Cedar-Rapids | USA | Iowa | 120758 |
%sql select * from country limit 5;
index | code | name | continent | region | surfacearea | indepyear | population | lifeexpectancy | gnp | gnpold | localname | governmentform | headofstate | capital | code2 |
---|
0 | ABW | Aruba | North America | Caribbean | 193 | None | 103000 | 78.4 | 828 | 793.0 | Aruba | Nonmetropolitan Territory of The Netherlands | Beatrix | 129.0 | AW |
1 | AFG | Afghanistan | Asia | Southern and Central Asia | 652090 | 1919.0 | 22720000 | 45.9 | 5976 | None | Afganistan/Afqanestan | Islamic Emirate | Mohammad Omar | 1.0 | AF |
2 | AIA | Anguilla | North America | Caribbean | 96 | None | 8000 | 76.1 | 63 | None | Anguilla | Dependent Territory of the UK | Elisabeth II | 62.0 | AI |
3 | AND | Andorra | Europe | Southern Europe | 468 | 1278.0 | 78000 | 83.5 | 1630 | None | Andorra | Parliamentary Coprincipality | None | 55.0 | AD |
4 | ANT | Netherlands Antilles | North America | Caribbean | 800 | None | 217000 | 74.7 | 1941 | None | Nederlandse Antillen | Nonmetropolitan Territory of The Netherlands | Beatrix | 33.0 | AN |
%sql select * from station limit 5;
index | id | city | state | lat_n | long_w |
---|
0 | 794 | Kissee Mills | MO | 140 | 73 |
1 | 824 | Loma Mar | CA | 49 | 131 |
2 | 603 | Sandy Hook | CT | 72 | 148 |
3 | 478 | Tipton | IN | 34 | 98 |
4 | 619 | Arlington | CO | 75 | 93 |
%sql select * from students limit 5;
index | id | name | marks |
---|
0 | 19 | Samantha | 87 |
1 | 21 | Julia | 96 |
2 | 11 | Britney | 95 |
3 | 32 | Kristeen | 100 |
4 | 12 | Dyana | 55 |
Questions
Query all columns for all American cities in the CITY table with populations larger than 100000. The CountryCode for America is USA
index | id | name | countrycode | district | population |
---|
1 | 3878 | Scottsdale | USA | Arizona | 202705 |
2 | 3965 | Corona | USA | California | 124966 |
3 | 3973 | Concord | USA | California | 121780 |
4 | 3977 | Cedar-Rapids | USA | Iowa | 120758 |
5 | 3982 | Coral-Springs | USA | Florida | 117549 |
Query the NAME field for all American cities in the CITY table with populations larger than 120000. The CountryCode for America is USA
name |
---|
Scottsdale |
Corona |
Concord |
Cedar-Rapids |
Query the average population for all cities in CITY, rounded down to the nearest integer
Query a count of the number of cities in CITY having a Population larger than 100,000
Query the total population of all cities in CITY where District is California
Query the difference between the maximum and minimum populations in CITY
Query all attributes of every Japanese city in the CITY table. The COUNTRYCODE for Japan is JPN
index | id | name | countrycode | district | population |
---|
35 | 1613 | Neyagawa | JPN | Osaka | 257315 |
36 | 1630 | Ageo | JPN | Saitama | 209442 |
37 | 1661 | Sayama | JPN | Saitama | 162472 |
38 | 1681 | Omuta | JPN | Fukuoka | 142889 |
39 | 1739 | Tokuyama | JPN | Yamaguchi | 107078 |
Given the CITY and COUNTRY tables, query the names of all cities where the CONTINENT is 'Africa'
Note: CITY.CountryCode and COUNTRY.Code are matching key columns.
name |
---|
Bujumbura |
al-Minya |
Banjul |
Monrovia |
Taza |
Malabo |
Given the CITY and COUNTRY tables, query the sum of the populations of all cities where the CONTINENT is 'Asia'
Note: CITY.CountryCode and COUNTRY.Code are matching key columns.
Given the CITY and COUNTRY tables, query the names of all the continents (COUNTRY.Continent) and their respective average city populations (CITY.Population) rounded down to the nearest integer
Note: CITY.CountryCode and COUNTRY.Code are matching key columns.
continent | floor |
---|
Africa | 254397 |
Asia | 262486 |
South America | 2042407 |
North America | 95368 |
Europe | 221749 |
Oceania | 8600 |
Query a list of CITY names from STATION for cities that have an even ID number. Print the results in any order, but exclude duplicates from the answer
city |
---|
Aguanga |
Alba |
Albany |
Amo |
Andersonville |
Find the difference between the total number of CITY entries in the table and the number of distinct CITY entries in the table
For example, if there are three records in the table with CITY values 'New York', 'New York', 'Bengalaru', there are 2 different city names: 'New York' and 'Bengalaru'. The query returns 1, because
total number of records - number of unique city names = 3 - 2 = 1
Query the two cities in STATION with the shortest and longest CITY names, as well as their respective lengths (i.e.: number of characters in the name).
If there is more than one smallest or largest city, choose the one that comes first when ordered alphabetically. For example, CITY has four entries: DEF, ABC, PQRS and WXY, the output would be:
city | length |
---|
Marine On Saint Croix | 21 |
Query the list of CITY names starting with vowels (i.e., a, e, i, o, or u) from STATION. Your result cannot contain duplicates
city |
---|
Arlington |
Albany |
Upperco |
Aguanga |
Odin |
Query the list of CITY names from STATION which have vowels (i.e., a, e, i, o, and u) as both their first and last characters. Your result cannot contain duplicates
city |
---|
Acme |
Aguanga |
Alba |
Aliso Viejo |
Alpine |
Query the sum of Northern Latitudes (LAT_N) from STATION having values greater than 38.7880 and less than 137.2345. Truncate your answer to 4 decimal places
Query the smallest Northern Latitude (LAT_N) from STATION that is greater than 38.7780. Round your answer to 4 decimal places
Query the Manhattan Distance between points P1 and P2 and round it to a scale of 4 decimal places
Consider P1(a, b) and P2(c, d) to be two points on a 2D plane.
- happens to equal the minimum value in Northern Latitude (LAT_N in STATION).
- happens to equal the minimum value in Western Longitude (LONG_W in STATION).
- happens to equal the maximum value in Northern Latitude (LAT_N in STATION).
- happens to equal the maximum value in Western Longitude (LONG_W in STATION).
Query the Name of any student in STUDENTS who scored higher than Marks. Order your output by the last three characters of each name. If two or more students both have names ending in the same last three characters (i.e.: Bobby, Robby, etc.), secondary sort them by ascending ID
name |
---|
Stuart |
Kristeen |
Christene |
Amina |
Aamina |
Priya |
Heraldo |
Scarlet |
Julia |
Salma |
Britney |
Priyanka |
Samantha |
Vivek |
Belvet |
Devil |
Evil |