Exploring SQL Databases with DuckDB¶
Database¶
- A database is a structured collection of data.
- It allows for the storage, retrieval, modification, and deletion of data in an organized manner.
- Databases come in various types, including relational and NoSQL. Relational databases organize data into tables, which are interconnected.
- Each table represents a related data collection and is organized into rows and columns.
SQL stands for Structured Query Language.¶
It's a programming language that stores and processes information in relational databases.
Types of SQL Statements¶
- DDL (Data Definition Language): DDL statements are used to define, modify, and remove database structures, but not the data within them. Common DDL statements include CREATE, ALTER, and DROP.
- DML (Data Manipulation Language): DML statements manage data within schema objects. They include SELECT, INSERT, UPDATE, and DELETE.
DuckDB¶
- An in-memory, columnar database management system optimized for analytical queries."
- Open-source and designed for simplicity, speed, and efficiency in processing analytical workloads."
- Supports SQL standards for easy integration with existing tools and workflows.
Install DuckDB¶
With Python set up, you can now install DuckDB using pip. In your terminal or command line, execute:¶
pip install duckdb
To ensure DuckDB is installed correctly, launch your jupyter Notebook and try importing the DuckDB module:¶
pip install duckdb
If no errors occur, DuckDB is successfully installed and ready to use.¶
In your Jupyter Notebook, import duckdb by running¶
import duckdb
For this tutorial, we will be using a pre-configured Duckdb Database.¶
Let's download the database file:¶
!wget --content-disposition https://arizona.box.com/shared/static/uozg0z86rtdjupwpc7i971xwzuzhp42o.duckdb
Connect to the database using:¶
conn = duckdb.connect(database='/content/my_database.duckdb', read_only=True)
Viewing which tables are available inside the database¶
conn.sql("SHOW TABLES;")
┌───────────────────┐
│ name │
│ varchar │
├───────────────────┤
│ allergies │
│ careplans │
│ conditions │
│ devices │
│ encounters │
│ imaging_studies │
│ immunizations │
│ medications │
│ observations │
│ organizations │
│ patients │
│ payer_transitions │
│ payers │
│ procedures │
│ providers │
│ supplies │
├───────────────────┤
│ 16 rows │
└───────────────────┘
Before running any query, we need to know the columns inside particular tables¶
conn.sql("DESCRIBE patients;")
┌─────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├─────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ Id │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ BIRTHDATE │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ DEATHDATE │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ SSN │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ DRIVERS │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ PASSPORT │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ PREFIX │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ FIRST │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ LAST │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ SUFFIX │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │ · │
│ BIRTHPLACE │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ ADDRESS │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ CITY │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ STATE │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ COUNTY │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ ZIP │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ LAT │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ LON │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ HEALTHCARE_EXPENSES │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ HEALTHCARE_COVERAGE │ DOUBLE │ YES │ NULL │ NULL │ NULL │
├─────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤
│ 25 rows (20 shown) 6 columns │
└─────────────────────────────────────────────────────────────────────────┘
conn.sql("DESCRIBE medications;")
conn.sql("DESCRIBE medications;")
┌───────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├───────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ START │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ STOP │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ PATIENT │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ PAYER │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ ENCOUNTER │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ CODE │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ DESCRIPTION │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ BASE_COST │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ PAYER_COVERAGE │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ DISPENSES │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ TOTALCOST │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ REASONCODE │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ REASONDESCRIPTION │ VARCHAR │ YES │ NULL │ NULL │ NULL │
├───────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤
│ 13 rows 6 columns │
└───────────────────────────────────────────────────────────────────────┘
conn.sql("DESCRIBE immunizations;")
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ DATE │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ PATIENT │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ ENCOUNTER │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ CODE │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ DESCRIPTION │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ BASE_COST │ DOUBLE │ YES │ NULL │ NULL │ NULL │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘
Querying Data¶
SELECT - The SELECT statement selects data from a database.¶
E.g.: Query the patients table and display patient first name, last name, gender, and the city they live in.
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients;")
┌─────────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├─────────────────┼────────────────┼─────────┼─────────────┤
│ Jacinto644 │ Kris249 │ M │ Springfield │
│ Alva958 │ Krajcik437 │ F │ Walpole │
│ Jimmie93 │ Harris789 │ F │ Pembroke │
│ Gregorio366 │ Auer97 │ M │ Boston │
│ Karyn217 │ Mueller846 │ F │ Colrain │
│ Jayson808 │ Fadel536 │ M │ Chicopee │
│ José Eduardo181 │ Gómez206 │ M │ Chicopee │
│ Milo271 │ Feil794 │ M │ Somerville │
│ Karyn217 │ Metz686 │ F │ Medfield │
│ Jeffrey461 │ Greenfelder433 │ M │ Springfield │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Raymond398 │ Kuvalis369 │ M │ Hingham │
│ Gearldine455 │ Boyer713 │ F │ Boston │
│ Nichol11 │ Gleichner915 │ F │ Lynn │
│ Louvenia131 │ Marks830 │ F │ Springfield │
│ Raymon366 │ Beer512 │ M │ Springfield │
│ Camelia346 │ Stamm704 │ F │ Dedham │
│ William805 │ Pacocha935 │ M │ Brockton │
│ Guillermo498 │ Téllez750 │ M │ Somerville │
│ Milton509 │ Bailey598 │ M │ Hull │
│ Cecilia788 │ Wisozk929 │ F │ Worcester │
├─────────────────┴────────────────┴─────────┴─────────────┤
│ ? rows (>9999 rows, 20 shown) 4 columns │
└──────────────────────────────────────────────────────────┘
Count the number of rows in patients table¶
This query will count the total number of rows in the patients table, effectively giving you the total number of patients.¶
conn.sql("SELECT COUNT(*) FROM patients;")
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 124150 │
└──────────────┘
COUNT and DISTINCT¶
If you want to count distinct values of a specific column, for example, distinct cities, you can modify the query as follows:
conn.sql("SELECT COUNT(DISTINCT CITY) FROM patients;")
┌──────────────────────┐
│ count(DISTINCT CITY) │
│ int64 │
├──────────────────────┤
│ 351 │
└──────────────────────┘
Filtering data based on a condition¶
The WHERE clause filters records that fulfill a specified condition. e.g: Patients in City = 'Springfield';
conn.sql("SELECT FIRST, LAST, GENDER, CITY, PASSPORT FROM patients WHERE city = 'Springfield';")
┌──────────────┬────────────────┬─────────┬─────────────┬────────────┐
│ FIRST │ LAST │ GENDER │ CITY │ PASSPORT │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┼────────────┤
│ Jacinto644 │ Kris249 │ M │ Springfield │ NULL │
│ Jeffrey461 │ Greenfelder433 │ M │ Springfield │ NULL │
│ Sabina296 │ Flatley871 │ F │ Springfield │ X85058581X │
│ Theodora872 │ Johnson679 │ F │ Springfield │ X21164602X │
│ Lavera253 │ Anderson154 │ F │ Springfield │ X83686992X │
│ Golden321 │ Pollich983 │ F │ Springfield │ NULL │
│ Georgiann138 │ Greenfelder433 │ F │ Springfield │ X58134116X │
│ Fausto876 │ Bechtelar572 │ M │ Springfield │ NULL │
│ Talisha682 │ Brakus656 │ F │ Springfield │ X53645004X │
│ Golden321 │ Durgan499 │ F │ Springfield │ X49016634X │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ Carla633 │ Terán294 │ F │ Springfield │ X50212869X │
│ Ángela136 │ Muñiz642 │ F │ Springfield │ NULL │
│ Marceline716 │ Kuhlman484 │ F │ Springfield │ X70432308X │
│ Ana María762 │ Cotto891 │ F │ Springfield │ X4604857X │
│ Josefina523 │ Vanegas191 │ F │ Springfield │ X85396222X │
│ Ester635 │ Sevilla788 │ F │ Springfield │ X50601786X │
│ Tomás404 │ Galarza986 │ M │ Springfield │ X76790663X │
│ Arturo47 │ Delafuente833 │ M │ Springfield │ X45087804X │
│ Gilberto712 │ Martínez540 │ M │ Springfield │ X89729075X │
│ Juanita470 │ Connelly992 │ F │ Springfield │ X71156217X │
├──────────────┴────────────────┴─────────┴─────────────┴────────────┤
│ 2814 rows (20 shown) 5 columns │
└────────────────────────────────────────────────────────────────────┘
Multiple conditions in Selection¶
e.g 1: Female Patients in City = 'Springfield';
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients WHERE city = 'Springfield' AND gender = 'F';")
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients WHERE city = 'Springfield' AND gender = 'F';")
┌──────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┤
│ Sabina296 │ Flatley871 │ F │ Springfield │
│ Theodora872 │ Johnson679 │ F │ Springfield │
│ Lavera253 │ Anderson154 │ F │ Springfield │
│ Golden321 │ Pollich983 │ F │ Springfield │
│ Georgiann138 │ Greenfelder433 │ F │ Springfield │
│ Talisha682 │ Brakus656 │ F │ Springfield │
│ Golden321 │ Durgan499 │ F │ Springfield │
│ Jerrie417 │ Gislason620 │ F │ Springfield │
│ Venus149 │ Hodkiewicz467 │ F │ Springfield │
│ Refugia211 │ Wintheiser220 │ F │ Springfield │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Tisha655 │ Renner328 │ F │ Springfield │
│ Mona85 │ Senger904 │ F │ Springfield │
│ Isabel214 │ Camacho176 │ F │ Springfield │
│ Carla633 │ Terán294 │ F │ Springfield │
│ Ángela136 │ Muñiz642 │ F │ Springfield │
│ Marceline716 │ Kuhlman484 │ F │ Springfield │
│ Ana María762 │ Cotto891 │ F │ Springfield │
│ Josefina523 │ Vanegas191 │ F │ Springfield │
│ Ester635 │ Sevilla788 │ F │ Springfield │
│ Juanita470 │ Connelly992 │ F │ Springfield │
├──────────────┴────────────────┴─────────┴─────────────┤
│ 1351 rows (20 shown) 4 columns │
└───────────────────────────────────────────────────────┘
Nested selection¶
e.g. Female patients from City Springfield or Boston
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients WHERE gender = 'F' AND (city = 'Springfield' OR city = 'Boston');")
┌──────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┤
│ Sabina296 │ Flatley871 │ F │ Springfield │
│ Theodora872 │ Johnson679 │ F │ Springfield │
│ Lavera253 │ Anderson154 │ F │ Springfield │
│ Golden321 │ Pollich983 │ F │ Springfield │
│ Georgiann138 │ Greenfelder433 │ F │ Springfield │
│ Talisha682 │ Brakus656 │ F │ Springfield │
│ Golden321 │ Durgan499 │ F │ Springfield │
│ Jerrie417 │ Gislason620 │ F │ Springfield │
│ Venus149 │ Hodkiewicz467 │ F │ Springfield │
│ Refugia211 │ Wintheiser220 │ F │ Springfield │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Freida957 │ Hand679 │ F │ Boston │
│ Clora637 │ Rempel203 │ F │ Boston │
│ Bonny428 │ Turner526 │ F │ Boston │
│ Tien590 │ Gaylord332 │ F │ Boston │
│ Arlyne429 │ Deckow585 │ F │ Boston │
│ Doreen575 │ Johnson679 │ F │ Boston │
│ Kym935 │ Hayes766 │ F │ Boston │
│ Julia241 │ Nevárez403 │ F │ Boston │
│ Ja391 │ Murray856 │ F │ Boston │
│ Creola518 │ Spinka232 │ F │ Boston │
├──────────────┴────────────────┴─────────┴─────────────┤
│ 7191 rows (20 shown) 4 columns │
└───────────────────────────────────────────────────────┘
Alternative¶
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients WHERE gender = 'F' AND (city IN ('Springfield','Boston'));")
┌──────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┤
│ Sabina296 │ Flatley871 │ F │ Springfield │
│ Theodora872 │ Johnson679 │ F │ Springfield │
│ Lavera253 │ Anderson154 │ F │ Springfield │
│ Golden321 │ Pollich983 │ F │ Springfield │
│ Georgiann138 │ Greenfelder433 │ F │ Springfield │
│ Talisha682 │ Brakus656 │ F │ Springfield │
│ Golden321 │ Durgan499 │ F │ Springfield │
│ Jerrie417 │ Gislason620 │ F │ Springfield │
│ Venus149 │ Hodkiewicz467 │ F │ Springfield │
│ Refugia211 │ Wintheiser220 │ F │ Springfield │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Freida957 │ Hand679 │ F │ Boston │
│ Clora637 │ Rempel203 │ F │ Boston │
│ Bonny428 │ Turner526 │ F │ Boston │
│ Tien590 │ Gaylord332 │ F │ Boston │
│ Arlyne429 │ Deckow585 │ F │ Boston │
│ Doreen575 │ Johnson679 │ F │ Boston │
│ Kym935 │ Hayes766 │ F │ Boston │
│ Julia241 │ Nevárez403 │ F │ Boston │
│ Ja391 │ Murray856 │ F │ Boston │
│ Creola518 │ Spinka232 │ F │ Boston │
├──────────────┴────────────────┴─────────┴─────────────┤
│ 7191 rows (20 shown) 4 columns │
└───────────────────────────────────────────────────────┘
Filter on missing data¶
- Filtering on missing data is crucial for maintaining data integrity and ensuring accurate analysis by identifying and handling incomplete records effectively.
- Retrieve the first name, last name, gender, and city for all patients who are from Springfield and have a passport number recorded (i.e., the passport field is not empty).
conn.sql("SELECT FIRST, LAST, GENDER, CITY, PASSPORT FROM patients WHERE city = 'Springfield' AND PASSPORT IS NOT NULL;")
┌──────────────┬────────────────┬─────────┬─────────────┬────────────┐
│ FIRST │ LAST │ GENDER │ CITY │ PASSPORT │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┼────────────┤
│ Sabina296 │ Flatley871 │ F │ Springfield │ X85058581X │
│ Theodora872 │ Johnson679 │ F │ Springfield │ X21164602X │
│ Lavera253 │ Anderson154 │ F │ Springfield │ X83686992X │
│ Georgiann138 │ Greenfelder433 │ F │ Springfield │ X58134116X │
│ Talisha682 │ Brakus656 │ F │ Springfield │ X53645004X │
│ Golden321 │ Durgan499 │ F │ Springfield │ X49016634X │
│ Ty725 │ Schmeler639 │ M │ Springfield │ X22960735X │
│ Pilar644 │ Pouros728 │ F │ Springfield │ X21434326X │
│ Georgette866 │ Stark857 │ F │ Springfield │ X84034866X │
│ Annika454 │ Gutmann970 │ F │ Springfield │ X3275916X │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ Isabel214 │ Camacho176 │ F │ Springfield │ X52173808X │
│ Carla633 │ Terán294 │ F │ Springfield │ X50212869X │
│ Marceline716 │ Kuhlman484 │ F │ Springfield │ X70432308X │
│ Ana María762 │ Cotto891 │ F │ Springfield │ X4604857X │
│ Josefina523 │ Vanegas191 │ F │ Springfield │ X85396222X │
│ Ester635 │ Sevilla788 │ F │ Springfield │ X50601786X │
│ Tomás404 │ Galarza986 │ M │ Springfield │ X76790663X │
│ Arturo47 │ Delafuente833 │ M │ Springfield │ X45087804X │
│ Gilberto712 │ Martínez540 │ M │ Springfield │ X89729075X │
│ Juanita470 │ Connelly992 │ F │ Springfield │ X71156217X │
├──────────────┴────────────────┴─────────┴─────────────┴────────────┤
│ 2221 rows (20 shown) 5 columns │
└────────────────────────────────────────────────────────────────────┘
Filter and select in numeric range¶
Patients in City = 'Springfield' where the HEALTHCARE_EXPENSES between 1.5M and 2M
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients WHERE city = 'Springfield' AND HEALTHCARE_EXPENSES BETWEEN 1500000 AND 2000000;")
┌──────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├──────────────┼────────────────┼─────────┼─────────────┤
│ Talisha682 │ Brakus656 │ F │ Springfield │
│ Jerold208 │ Harber290 │ M │ Springfield │
│ Dean966 │ Tillman293 │ M │ Springfield │
│ Orval846 │ Cartwright189 │ M │ Springfield │
│ Jacinto644 │ Abernathy524 │ M │ Springfield │
│ Bethel526 │ Satterfield305 │ F │ Springfield │
│ Dorene845 │ Botsford977 │ F │ Springfield │
│ Lula998 │ Langosh790 │ F │ Springfield │
│ Deeanna316 │ Koss676 │ F │ Springfield │
│ Loriann967 │ Torp761 │ F │ Springfield │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Ashley34 │ Murazik203 │ M │ Springfield │
│ Ulysses632 │ Donnelly343 │ M │ Springfield │
│ Horace32 │ Hammes673 │ M │ Springfield │
│ Roman389 │ Lubowitz58 │ M │ Springfield │
│ Jerald662 │ Grady603 │ M │ Springfield │
│ Damien170 │ Hoppe518 │ M │ Springfield │
│ Charley358 │ Vandervort697 │ M │ Springfield │
│ Carla633 │ Terán294 │ F │ Springfield │
│ Ana María762 │ Cotto891 │ F │ Springfield │
│ Juanita470 │ Connelly992 │ F │ Springfield │
├──────────────┴────────────────┴─────────┴─────────────┤
│ 241 rows (20 shown) 4 columns │
└───────────────────────────────────────────────────────┘
LIMIT¶
- LIMIT specifies the maximum number of records the query will return.
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients LIMIT 20;")
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients LIMIT 20;")
┌─────────────────┬────────────────┬─────────┬─────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├─────────────────┼────────────────┼─────────┼─────────────┤
│ Jacinto644 │ Kris249 │ M │ Springfield │
│ Alva958 │ Krajcik437 │ F │ Walpole │
│ Jimmie93 │ Harris789 │ F │ Pembroke │
│ Gregorio366 │ Auer97 │ M │ Boston │
│ Karyn217 │ Mueller846 │ F │ Colrain │
│ Jayson808 │ Fadel536 │ M │ Chicopee │
│ José Eduardo181 │ Gómez206 │ M │ Chicopee │
│ Milo271 │ Feil794 │ M │ Somerville │
│ Karyn217 │ Metz686 │ F │ Medfield │
│ Jeffrey461 │ Greenfelder433 │ M │ Springfield │
│ Mariana775 │ Gulgowski816 │ F │ Lowell │
│ Leann224 │ Deckow585 │ F │ Needham │
│ Isabel214 │ Hinojosa147 │ F │ Fall River │
│ Christal240 │ Brown30 │ F │ Boston │
│ Carmelia328 │ Konopelski743 │ F │ Ashland │
│ Raye931 │ Wyman904 │ F │ Quincy │
│ Lisbeth69 │ Rowe323 │ F │ Malden │
│ Amada498 │ Spinka232 │ F │ Foxborough │
│ Cythia210 │ Reichel38 │ F │ Peabody │
│ María Soledad68 │ Aparicio848 │ F │ Boston │
├─────────────────┴────────────────┴─────────┴─────────────┤
│ 20 rows 4 columns │
└──────────────────────────────────────────────────────────┘
ORDER BY and ASC¶
This query returns the first 20 patients from the patients table, ordered alphabetically by their last name.
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients ORDER BY LAST ASC LIMIT 20;")
┌────────────────┬───────────┬─────────┬──────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├────────────────┼───────────┼─────────┼──────────────┤
│ Jc393 │ Abbott774 │ M │ Westfield │
│ Lloyd546 │ Abbott774 │ M │ Northampton │
│ Charles364 │ Abbott774 │ M │ Brockton │
│ Lelia627 │ Abbott774 │ F │ Gloucester │
│ Devorah937 │ Abbott774 │ F │ Cambridge │
│ Warren653 │ Abbott774 │ M │ Tyngsborough │
│ Rhona164 │ Abbott774 │ F │ Stoneham │
│ Jackqueline794 │ Abbott774 │ F │ Randolph │
│ Willa615 │ Abbott774 │ F │ Dartmouth │
│ Cyril535 │ Abbott774 │ M │ Millis │
│ Lorette239 │ Abbott774 │ F │ Dennis │
│ Jimmy858 │ Abbott774 │ M │ Lowell │
│ Laine739 │ Abbott774 │ F │ Agawam │
│ Bernetta267 │ Abbott774 │ F │ Ware │
│ Lauri399 │ Abbott774 │ F │ Springfield │
│ Miesha237 │ Abbott774 │ F │ Stoneham │
│ Darrin898 │ Abbott774 │ M │ Newton │
│ Grant908 │ Abbott774 │ M │ Arlington │
│ Arden380 │ Abbott774 │ M │ Worcester │
│ German382 │ Abbott774 │ M │ Taunton │
├────────────────┴───────────┴─────────┴──────────────┤
│ 20 rows 4 columns │
└─────────────────────────────────────────────────────┘
ORDER BY and DESC¶
This query returns the first 20 patients from the patients table, ordered reverse alphabetically by their first name.
conn.sql("SELECT FIRST, LAST, GENDER, CITY FROM patients ORDER BY FIRST DESC LIMIT 20;")
┌──────────┬────────────────┬─────────┬────────────┐
│ FIRST │ LAST │ GENDER │ CITY │
│ varchar │ varchar │ varchar │ varchar │
├──────────┼────────────────┼─────────┼────────────┤
│ Óscar156 │ Ballesteros368 │ M │ Taunton │
│ Óscar156 │ Curiel392 │ M │ Swampscott │
│ Óscar156 │ Zelaya592 │ M │ Lynn │
│ Óscar156 │ Puente961 │ M │ Greenfield │
│ Óscar156 │ Olivas524 │ M │ Norwell │
│ Óscar156 │ Rivero165 │ M │ Lancaster │
│ Óscar156 │ Canales95 │ M │ Hamilton │
│ Óscar156 │ Romero158 │ M │ Boston │
│ Óscar156 │ Garza151 │ M │ Longmeadow │
│ Óscar156 │ Delgado712 │ M │ Haverhill │
│ Óscar156 │ Ureña88 │ M │ Bedford │
│ Óscar156 │ Henríquez109 │ M │ Boxborough │
│ Óscar156 │ Guerrero997 │ M │ Boston │
│ Óscar156 │ Ojeda263 │ M │ Boston │
│ Óscar156 │ Meléndez48 │ M │ Newton │
│ Óscar156 │ Muro989 │ M │ Boston │
│ Óscar156 │ Rendón540 │ M │ Wilbraham │
│ Óscar156 │ Santacruz647 │ M │ Winthrop │
│ Óscar156 │ Santacruz647 │ M │ Boston │
│ Óscar156 │ Otero621 │ M │ Boston │
├──────────┴────────────────┴─────────┴────────────┤
│ 20 rows 4 columns │
└──────────────────────────────────────────────────┘
Aggregating Data using GROUP BY¶
Counting Patients by City¶
conn.sql("SELECT CITY, COUNT(*) AS patient_count FROM patients GROUP BY CITY;") ## each run gives random order
┌─────────────┬───────────────┐
│ CITY │ patient_count │
│ varchar │ int64 │
├─────────────┼───────────────┤
│ Dartmouth │ 656 │
│ Fitchburg │ 743 │
│ Plymouth │ 972 │
│ Worcester │ 3263 │
│ Beverly │ 778 │
│ Westfield │ 781 │
│ Hingham │ 431 │
│ Rochester │ 103 │
│ Sandwich │ 387 │
│ Watertown │ 534 │
│ · │ · │
│ · │ · │
│ · │ · │
│ Ashfield │ 29 │
│ Pelham │ 26 │
│ Hancock │ 22 │
│ Monterey │ 13 │
│ Richmond │ 23 │
│ Hinsdale │ 30 │
│ Middlefield │ 11 │
│ Westhampton │ 14 │
│ Sandisfield │ 12 │
│ Monroe │ 2 │
├─────────────┴───────────────┤
│ 351 rows (20 shown) │
└─────────────────────────────┘
Counting Patients by City and sorting them highest to lowest¶
conn.sql("SELECT CITY, COUNT(*) AS patient_count FROM patients GROUP BY CITY ORDER BY patient_count DESC;")
┌─────────────┬───────────────┐
│ CITY │ patient_count │
│ varchar │ int64 │
├─────────────┼───────────────┤
│ Boston │ 11496 │
│ Worcester │ 3263 │
│ Springfield │ 2814 │
│ Cambridge │ 2044 │
│ Lowell │ 2027 │
│ Brockton │ 1833 │
│ Lynn │ 1714 │
│ New Bedford │ 1706 │
│ Quincy │ 1698 │
│ Newton │ 1695 │
│ · │ · │
│ · │ · │
│ · │ · │
│ Blandford │ 10 │
│ Heath │ 8 │
│ Tolland │ 7 │
│ Hawley │ 6 │
│ Alford │ 5 │
│ New Ashford │ 4 │
│ Aquinnah │ 4 │
│ Tyringham │ 3 │
│ Monroe │ 2 │
│ Gosnold │ 1 │
├─────────────┴───────────────┤
│ 351 rows (20 shown) │
└─────────────────────────────┘
Total Number of Patients by Gender in Each City¶
conn.sql("SELECT CITY, GENDER, COUNT(*) AS total_patients FROM patients GROUP BY CITY, GENDER ORDER BY total_patients DESC LIMIT 10;")
┌─────────────┬─────────┬────────────────┐
│ CITY │ GENDER │ total_patients │
│ varchar │ varchar │ int64 │
├─────────────┼─────────┼────────────────┤
│ Boston │ F │ 5840 │
│ Boston │ M │ 5656 │
│ Worcester │ M │ 1694 │
│ Worcester │ F │ 1569 │
│ Springfield │ M │ 1463 │
│ Springfield │ F │ 1351 │
│ Cambridge │ M │ 1023 │
│ Cambridge │ F │ 1021 │
│ Lowell │ F │ 1015 │
│ Lowell │ M │ 1012 │
├─────────────┴─────────┴────────────────┤
│ 10 rows 3 columns │
└────────────────────────────────────────┘
conn.sql("SELECT PATIENT,ENCOUNTER,CODE, DESCRIPTION, VALUE from observations;")
┌──────────────────────┬──────────────────────┬─────────┬───────────────────────────────────────────────┬──────────────┐
│ PATIENT │ ENCOUNTER │ CODE │ DESCRIPTION │ VALUE │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼──────────────────────┼─────────┼───────────────────────────────────────────────┼──────────────┤
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 8302-2 │ Body Height │ 82.7 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 72514-3 │ Pain severity - 0-10 verbal numeric rating … │ 2.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 29463-7 │ Body Weight │ 11.5 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 77606-2 │ Weight-for-length Per age and sex │ 47.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 9843-4 │ Head Occipital-frontal circumference │ 46.9 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 8462-4 │ Diastolic Blood Pressure │ 76.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 8480-6 │ Systolic Blood Pressure │ 107.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 8867-4 │ Heart rate │ 68.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 9279-1 │ Respiratory rate │ 13.0 │
│ 1ff7f10f-a204-4bb1… │ 52051c30-c6c3-45fe… │ 72166-2 │ Tobacco smoking status NHIS │ Never smoker │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2160-0 │ Creatinine [Mass/volume] in Serum or Plasma │ 2.9 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 17861-6 │ Calcium [Mass/volume] in Serum or Plasma │ 9.2 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2951-2 │ Sodium [Moles/volume] in Serum or Plasma │ 143.8 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2823-3 │ Potassium [Moles/volume] in Serum or Plasma │ 4.3 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2075-0 │ Chloride [Moles/volume] in Serum or Plasma │ 106.4 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2028-9 │ Carbon dioxide total [Moles/volume] in Ser… │ 22.9 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 33914-3 │ Glomerular filtration rate/1.73 sq M.predic… │ 9.9 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 2885-2 │ Protein [Mass/volume] in Serum or Plasma │ 5.7 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 1751-7 │ Albumin [Mass/volume] in Serum or Plasma │ 5.2 │
│ 87537cb1-92e1-4a11… │ 69e8eaf0-7714-4c20… │ 1975-2 │ Bilirubin.total [Mass/volume] in Serum or P… │ 14.2 │
├──────────────────────┴──────────────────────┴─────────┴───────────────────────────────────────────────┴──────────────┤
│ ? rows (>9999 rows, 20 shown) 5 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Average Heart Rate by Patient¶
conn.sql("SELECT PATIENT, ROUND(AVG(CAST(VALUE AS DECIMAL(10,6))), 2) AS average_heart_rate FROM observations WHERE DESCRIPTION LIKE '%Heart rate%' GROUP BY PATIENT ORDER BY average_heart_rate DESC LIMIT 20;")
┌──────────────────────────────────────┬────────────────────┐
│ PATIENT │ average_heart_rate │
│ varchar │ double │
├──────────────────────────────────────┼────────────────────┤
│ ff2209f2-a4c9-4737-9aa7-adc8d71b6961 │ 200.0 │
│ 8cea2d7e-6227-4924-a340-18eafb8564ac │ 200.0 │
│ 6ec823e7-839e-4491-8b6f-6b275951b456 │ 200.0 │
│ 0191f8aa-96dc-41e6-b718-312686a7a867 │ 200.0 │
│ c10db4e1-de51-495f-98ab-8322c6d9550a │ 200.0 │
│ 46ea7226-63f6-404f-bbed-904ea39f706d │ 199.9 │
│ ec1ce32c-57a3-44c4-8f64-e050f2f94e03 │ 199.9 │
│ a05e3bd1-1c51-453a-9969-646cb0168d23 │ 199.9 │
│ 9cc52681-4446-4e4a-80be-63418dd06e66 │ 199.9 │
│ b0112890-a24a-4564-b2b3-9b77879e5e79 │ 199.9 │
│ 18f53179-2449-47af-b4da-a9e51095076f │ 199.9 │
│ c3d165d2-597e-45a8-8331-f3338ce19fdf │ 199.9 │
│ 04dd3439-2116-4f9a-b40e-a84395cf6f21 │ 199.9 │
│ 371e14a1-31bd-4407-bdbb-4078781fde05 │ 199.8 │
│ 3e5c63bf-65fe-48bb-8361-6b21a909888a │ 199.8 │
│ 423f79cf-f7b8-41cd-bb3c-8459084f88eb │ 199.8 │
│ 2d4f57ee-4001-44f8-9a94-c1885f26b408 │ 199.8 │
│ bd6784c0-3529-4a12-87bd-746affb17739 │ 199.8 │
│ ef43848b-a4f1-4c45-a106-5928e29b72f2 │ 199.7 │
│ 1d050245-097c-44a5-b29b-6a18a22731e9 │ 199.7 │
├──────────────────────────────────────┴────────────────────┤
│ 20 rows 2 columns │
└───────────────────────────────────────────────────────────┘
Count of Heart Rate Records by Patient¶
conn.sql("SELECT PATIENT, COUNT(*) AS heart_rate_records FROM observations WHERE DESCRIPTION LIKE '%Heart rate%' GROUP BY PATIENT ORDER BY heart_rate_records DESC LIMIT 20;")
┌──────────────────────────────────────┬────────────────────┐
│ PATIENT │ heart_rate_records │
│ varchar │ int64 │
├──────────────────────────────────────┼────────────────────┤
│ 3e21a156-da54-4fb7-815e-550fdf4afbbd │ 33 │
│ ffbf0392-1643-4b05-819f-489072c8c4d4 │ 30 │
│ b9ef6005-438e-4b47-afaf-9dba32184adf │ 30 │
│ f8080701-2b5c-4128-9a7f-9a098b737b27 │ 30 │
│ a2172279-3d63-4d14-867d-1a39d9280690 │ 30 │
│ 01a5a5a6-ef7b-42ba-899c-de66e1b1e27e │ 30 │
│ 62572c44-a802-40d1-8d60-d02bc287d548 │ 30 │
│ 445953fd-15fa-424e-9926-f93ebf1bca7a │ 30 │
│ c3ea3c46-f8d1-4abe-8fb2-8fc8f86b4ef4 │ 29 │
│ bd18ba0d-2e65-4427-98f1-cbfc09dbaa33 │ 29 │
│ 5736b489-0e15-4693-b9d6-05192a702a2d │ 29 │
│ b5a0b060-96ae-44d5-bd10-f532a445009b │ 29 │
│ ee9b44a1-36c6-456e-b77c-cd200611ca0f │ 29 │
│ 0d5290dd-a3e3-4a81-91d5-6e722e58ec3e │ 29 │
│ 54ec8b8e-fe2a-4cf5-a087-1a65df1ab1b8 │ 29 │
│ e658626d-0f72-4570-8071-2dd63827a24c │ 29 │
│ ec3f3da2-09a1-43c7-8cae-ed34847d2534 │ 28 │
│ c7bd54b1-e036-4702-9062-acfa3677a6e8 │ 28 │
│ 8b6ee161-eb5f-413a-8e86-f5b3adb3f5ff │ 28 │
│ 03c1210e-d145-4aed-9522-14cff41b3c28 │ 28 │
├──────────────────────────────────────┴────────────────────┤
│ 20 rows 2 columns │
└───────────────────────────────────────────────────────────┘
Total Cost of Vaccinations per Patient¶
conn.sql("SELECT PATIENT, SUM(BASE_COST) AS total_cost FROM immunizations GROUP BY PATIENT ORDER BY total_cost DESC;")
┌──────────────────────────────────────┬────────────┐
│ PATIENT │ total_cost │
│ varchar │ double │
├──────────────────────────────────────┼────────────┤
│ 8eacdc17-de2f-4024-88bd-976401b30979 │ 2810.4 │
│ bb14de6a-c77d-44bb-a3a8-338a6a246520 │ 2810.4 │
│ a5b60d43-9776-4811-badf-67d49edbc175 │ 2810.4 │
│ 1c8a4026-5dbe-478a-b647-ecb605775977 │ 2810.4 │
│ 3e38e20b-6dd8-44fe-9ed3-81e1e3d8e2ab │ 2810.4 │
│ e23c7c68-0319-41f5-90d9-1db160321a94 │ 2810.4 │
│ 49e7d878-93a6-4e71-971b-213abacf2235 │ 2810.4 │
│ 7ded4116-1783-4a93-9bb2-7a65370f55f5 │ 2810.4 │
│ 594660e4-6a02-4e3c-a8b9-2c61d4d4af60 │ 2810.4 │
│ fee1edbd-277d-4b12-901e-60b6135cf877 │ 2810.4 │
│ · │ · │
│ · │ · │
│ · │ · │
│ 594d1ecb-b1a4-4bad-9ccd-5cebceebd6f5 │ 281.04 │
│ 456fa9f9-7eb7-4316-a74b-5a9ff87f8fc5 │ 281.04 │
│ 52568520-7163-429d-bc90-fc245debb697 │ 281.04 │
│ 30a16a19-ee93-4bfd-b1ed-bba90de36266 │ 281.04 │
│ 0ef633b0-2996-44c5-936c-600820310223 │ 281.04 │
│ 613319c8-2b76-494b-86b8-e776a2ee5e0b │ 281.04 │
│ 4dceeb57-4fe3-4c32-a748-621df5ce3c30 │ 281.04 │
│ 93d590e8-9f3e-4741-bee2-9b8e9e8b21e7 │ 281.04 │
│ bad20ffc-48f9-4a7d-a2be-39a50fa4e0ac │ 281.04 │
│ 3cf75ec3-5e1c-4783-bb20-282a8cf66f30 │ 281.04 │
├──────────────────────────────────────┴────────────┤
│ ? rows (>9999 rows, 20 shown) 2 columns │
└───────────────────────────────────────────────────┘
Top 10 Most Common Vaccine Administered¶
conn.sql("SELECT DESCRIPTION, COUNT(*) AS count FROM immunizations GROUP BY DESCRIPTION ORDER BY count DESC LIMIT 10;")
┌────────────────────────────────────────────────────┬────────┐
│ DESCRIPTION │ count │
│ varchar │ int64 │
├────────────────────────────────────────────────────┼────────┤
│ Influenza seasonal injectable preservative free │ 106564 │
│ Td (adult) preservative free │ 9815 │
│ Pneumococcal conjugate PCV 13 │ 5747 │
│ DTaP │ 5735 │
│ IPV │ 4962 │
│ meningococcal MCV4P │ 4010 │
│ Hib (PRP-OMP) │ 3615 │
│ HPV quadrivalent │ 3494 │
│ Hep B adolescent or pediatric │ 3490 │
│ zoster │ 3469 │
├────────────────────────────────────────────────────┴────────┤
│ 10 rows 2 columns │
└─────────────────────────────────────────────────────────────┘
JOINS¶
SQL joins are used to combine rows from two or more tables, based on a related column between them. There are several types of joins in SQL, each serving different purposes depending on how you want to combine your data.¶
- credit: educba.com
For the following examples: Left table: Patients and Right Table: Immunizations
__
INNER JOIN¶
-
The Inner join returns rows when there is at least one match in both tables. If there is no match, the rows are not returned.
-
Find Matching Records: Question: Which medical treatments have been administered to patients, including the patient's name and the cost of each treatment?
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION, immunizations.BASE_COST FROM immunizations INNER JOIN patients ON immunizations.PATIENT = patients.Id ORDER BY patients.FIRST ASC;")
┌──────────┬───────────────┬────────────────────────────────────────────────────┬───────────┐ │ FIRST │ LAST │ DESCRIPTION │ BASE_COST │ │ varchar │ varchar │ varchar │ double │ ├──────────┼───────────────┼────────────────────────────────────────────────────┼───────────┤ │ Aaron697 │ Cummings51 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Crooks415 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Anderson154 │ Pneumococcal conjugate PCV 13 │ 140.52 │ │ Aaron697 │ Hartmann983 │ Hep B adolescent or pediatric │ 140.52 │ │ Aaron697 │ Thompson596 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Rodriguez71 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Deckow585 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Swaniawski813 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Beer512 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Ullrich385 │ Influenza seasonal injectable preservative free │ 140.52 │ │ · │ · │ · │ · │ │ · │ · │ · │ · │ │ · │ · │ · │ · │ │ Ariel183 │ Schuster709 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Mohr916 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Waters156 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Will178 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Leffler128 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Bode78 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Grady603 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ Funk324 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Ariel183 │ McGlynn426 │ Td (adult) preservative free │ 140.52 │ │ Ariel183 │ Boehm581 │ Influenza seasonal injectable preservative free │ 140.52 │ ├──────────┴───────────────┴────────────────────────────────────────────────────┴───────────┤ │ ? rows (>9999 rows, 20 shown) 4 columns │ └───────────────────────────────────────────────────────────────────────────────────────────┘
LEFT JOIN¶
- It returns all rows from the left table, and the matched rows from the right table. The result is NULL from the right side, if there is no match.
Question: For all patients, what treatments have they received, if any?
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION, immunizations.BASE_COST FROM patients LEFT JOIN immunizations ON patients.Id = immunizations.PATIENT ORDER BY patients.FIRST ASC;")
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION, immunizations.BASE_COST FROM patients LEFT JOIN immunizations ON patients.Id = immunizations.PATIENT ORDER BY patients.FIRST ASC;") ┌───────────┬─────────────┬────────────────────────────────────────────────────┬───────────┐ │ FIRST │ LAST │ DESCRIPTION │ BASE_COST │ │ varchar │ varchar │ varchar │ double │ ├───────────┼─────────────┼────────────────────────────────────────────────────┼───────────┤ │ Aaron697 │ Legros616 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Deckow585 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Thompson596 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Auer97 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Crooks415 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Crooks415 │ Td (adult) preservative free │ 140.52 │ │ Aaron697 │ Volkman526 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Hartmann983 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Aaron697 │ Orn563 │ NULL │ NULL │ │ Aaron697 │ Hartmann983 │ IPV │ 140.52 │ │ · │ · │ · │ · │ │ · │ · │ · │ · │ │ · │ · │ · │ · │ │ Antonia30 │ Goodwin327 │ HPV quadrivalent │ 140.52 │ │ Antonia30 │ Funk324 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Antonia30 │ West559 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Antonia30 │ Lowe577 │ NULL │ NULL │ │ Antonia30 │ D'Amore443 │ Pneumococcal conjugate PCV 13 │ 140.52 │ │ Antonia30 │ Montes106 │ meningococcal MCV4P │ 140.52 │ │ Antonia30 │ González124 │ Influenza seasonal injectable preservative free │ 140.52 │ │ Antonia30 │ Franecki195 │ zoster │ 140.52 │ │ Antonia30 │ Rojo930 │ Pneumococcal conjugate PCV 13 │ 140.52 │ │ Antonia30 │ Puente961 │ HPV quadrivalent │ 140.52 │ ├───────────┴─────────────┴────────────────────────────────────────────────────┴───────────┤ │ ? rows (>9999 rows, 20 shown) 4 columns │ └──────────────────────────────────────────────────────────────────────────────────────────┘
RIGHT JOIN¶
- It returns all rows from the right table, and the matched rows from the left table. The result is NULL from the left side, if there is no match. Question: For all Immunizations recorded, which patients received them, if identifiable?
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION, immunizations.BASE_COST FROM patients RIGHT JOIN immunizations ON patients.Id = immunizations.PATIENT ORDER BY immunizations.DESCRIPTION;")
┌────────────────┬──────────────┬──────────────┬───────────┐
│ FIRST │ LAST │ DESCRIPTION │ BASE_COST │
│ varchar │ varchar │ varchar │ double │
├────────────────┼──────────────┼──────────────┼───────────┤
│ Shantay950 │ Quitzon246 │ DTaP │ 140.52 │
│ Cleveland582 │ VonRueden376 │ DTaP │ 140.52 │
│ Flo729 │ Quigley282 │ DTaP │ 140.52 │
│ Flo729 │ Quigley282 │ DTaP │ 140.52 │
│ Werner409 │ Schaden604 │ DTaP │ 140.52 │
│ Pasty639 │ Ortiz186 │ DTaP │ 140.52 │
│ Pasty639 │ Ortiz186 │ DTaP │ 140.52 │
│ Angelica442 │ Kovacek682 │ DTaP │ 140.52 │
│ Cathie710 │ Hegmann834 │ DTaP │ 140.52 │
│ Pasty639 │ Ortiz186 │ DTaP │ 140.52 │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Art115 │ Roberts511 │ Hep A adult │ 140.52 │
│ Christopher407 │ Davis923 │ Hep A adult │ 140.52 │
│ Michiko564 │ Dooley940 │ Hep A adult │ 140.52 │
│ Celine582 │ Sipes176 │ Hep A adult │ 140.52 │
│ Darci883 │ Miller503 │ Hep A adult │ 140.52 │
│ Ethan766 │ Morar593 │ Hep A adult │ 140.52 │
│ Georgiann138 │ Heathcote539 │ Hep A adult │ 140.52 │
│ Herbert830 │ Wolff180 │ Hep A adult │ 140.52 │
│ Ismael683 │ King743 │ Hep A adult │ 140.52 │
│ Ellsworth48 │ Mertz280 │ Hep A adult │ 140.52 │
├────────────────┴──────────────┴──────────────┴───────────┤
│ ? rows (>9999 rows, 20 shown) 4 columns │
└──────────────────────────────────────────────────────────┘
FULL OUTER JOIN¶
- It returns rows when there is a match in one of the tables. It effectively combines the results of both LEFT JOIN and RIGHT JOIN. Question: What is the complete list of patients and their Immunizations, including those without recorded treatments or identifiable patients?
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION, immunizations.BASE_COST FROM patients FULL OUTER JOIN immunizations ON patients.Id = immunizations.PATIENT ORDER BY patients.FIRST ASC;")
┌───────────┬─────────────┬────────────────────────────────────────────────────┬───────────┐
│ FIRST │ LAST │ DESCRIPTION │ BASE_COST │
│ varchar │ varchar │ varchar │ double │
├───────────┼─────────────┼────────────────────────────────────────────────────┼───────────┤
│ Aaron697 │ Legros616 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Deckow585 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Thompson596 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Auer97 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Crooks415 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Crooks415 │ Td (adult) preservative free │ 140.52 │
│ Aaron697 │ Volkman526 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Hartmann983 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Aaron697 │ Orn563 │ NULL │ NULL │
│ Aaron697 │ Hartmann983 │ IPV │ 140.52 │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Antonia30 │ Goodwin327 │ HPV quadrivalent │ 140.52 │
│ Antonia30 │ Funk324 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Antonia30 │ West559 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Antonia30 │ Lowe577 │ NULL │ NULL │
│ Antonia30 │ D'Amore443 │ Pneumococcal conjugate PCV 13 │ 140.52 │
│ Antonia30 │ Montes106 │ meningococcal MCV4P │ 140.52 │
│ Antonia30 │ González124 │ Influenza seasonal injectable preservative free │ 140.52 │
│ Antonia30 │ Franecki195 │ zoster │ 140.52 │
│ Antonia30 │ Rojo930 │ Pneumococcal conjugate PCV 13 │ 140.52 │
│ Antonia30 │ Puente961 │ HPV quadrivalent │ 140.52 │
├───────────┴─────────────┴────────────────────────────────────────────────────┴───────────┤
│ ? rows (>9999 rows, 20 shown) 4 columns │
└──────────────────────────────────────────────────────────────────────────────────────────┘
CROSS JOIN¶
- It returns a Cartesian product of the two tables, i.e., it returns rows combining each row from the first table with each row from the second table.
Question: What are all possible combinations of patients and immunizations?
conn.sql("SELECT patients.FIRST, patients.LAST, immunizations.DESCRIPTION FROM patients CROSS JOIN immunizations;")
┌───────────┬─────────────┬────────────────────────────────────────────────────┐ │ FIRST │ LAST │ DESCRIPTION │ │ varchar │ varchar │ varchar │ ├───────────┼─────────────┼────────────────────────────────────────────────────┤ │ Tammy740 │ Ernser583 │ Influenza seasonal injectable preservative free │ │ Tammy740 │ Ernser583 │ Hep A ped/adol 2 dose │ │ Tammy740 │ Ernser583 │ Influenza seasonal injectable preservative free │ │ Tammy740 │ Ernser583 │ Influenza seasonal injectable preservative free │ │ Tammy740 │ Ernser583 │ meningococcal MCV4P │ │ Tammy740 │ Ernser583 │ Hep B adolescent or pediatric │ │ Tammy740 │ Ernser583 │ Hep B adolescent or pediatric │ │ Tammy740 │ Ernser583 │ Hib (PRP-OMP) │ │ Tammy740 │ Ernser583 │ rotavirus monovalent │ │ Tammy740 │ Ernser583 │ IPV │ │ · │ · │ · │ │ · │ · │ · │ │ · │ · │ · │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ zoster │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ │ Iliana226 │ Schmeler639 │ Influenza seasonal injectable preservative free │ ├───────────┴─────────────┴────────────────────────────────────────────────────┤ │ ? rows (>9999 rows, 20 shown) 3 columns │ └──────────────────────────────────────────────────────────────────────────────┘
SELF JOIN¶
- It is not a different type of join, but it's a regular join used to join a table to itself. It's useful for queries where you need to compare rows within the same table.
Question: Which pair of patients are from the same city?
conn.sql("SELECT A.FIRST AS FirstPatient, B.FIRST AS SecondPatient, A.CITY FROM patients A, patients B WHERE A.CITY = B.CITY AND A.Id != B.Id ORDER BY A.CITY ASC;")
┌──────────────┬───────────────┬──────────┐ │ FirstPatient │ SecondPatient │ CITY │ │ varchar │ varchar │ varchar │ ├──────────────┼───────────────┼──────────┤ │ Susan422 │ Stefani254 │ Abington │ │ Jennell254 │ Lionel365 │ Abington │ │ Robbie31 │ Antonio44 │ Abington │ │ Joe656 │ Antonio44 │ Abington │ │ Susan422 │ Stevie682 │ Abington │ │ Shaun461 │ Stevie682 │ Abington │ │ Shaun461 │ Weston546 │ Abington │ │ Susan422 │ Elwood28 │ Abington │ │ Susan422 │ Daria61 │ Abington │ │ Shaun461 │ Lita714 │ Abington │ │ · │ · │ · │ │ · │ · │ · │ │ · │ · │ · │ │ Elane105 │ Conrad619 │ Abington │ │ Logan497 │ Conrad619 │ Abington │ │ Sherlene302 │ Conrad619 │ Abington │ │ Milan77 │ Conrad619 │ Abington │ │ Frankie174 │ Seema671 │ Abington │ │ Elane105 │ Josefina523 │ Abington │ │ Lisha487 │ Shawanna357 │ Abington │ │ Marleen824 │ Marcelino726 │ Abington │ │ Sandi885 │ Lionel365 │ Abington │ │ Morton637 │ Viva686 │ Abington │ ├──────────────┴───────────────┴──────────┤ │ ? rows (>9999 rows, 20 shown) │ └─────────────────────────────────────────┘
Introduction to Subqueries and Nested Selects¶
- Subqueries, also known as inner queries or nested queries, are SQL queries nested inside a larger query. They allow you to perform operations that usually require multiple steps in a single query, making your data retrieval process more efficient and concise. Nested selects are a type of subquery used specifically within the SELECT, FROM, or WHERE clauses to provide a dataset for the outer query to process.
- Subqueries can return a single value, a single row, multiple rows, or a table. They are used for comparison, as a condition, or to provide a list of values for the outer query. The main distinction between correlated and non-correlated subqueries is that correlated subqueries reference column(s) from the outer query, thus running once for each row selected by the outer query, while non-correlated subqueries run independently of the outer query and can be run as standalone queries.
Types of subqueries¶
- Single-Row Subqueries: Single-row subqueries return only one row and are used with single row comparison operators like =, >, <, >=, <=. They are often used in the WHERE clause to compare a column value against the result of the subquery.
- Multi-Row Subqueries: Multi-row subqueries return more than one row and are used with operators like IN, ANY, ALL, which allow comparison against multiple values. They're useful for filtering based on a set of criteria returned by the subquery.
- Correlated Subqueries: Correlated subqueries reference column(s) from the outer query, making them dependent on the outer query. They are executed once for each row processed by the outer query, often leading to performance considerations.
Use subqueries to find patients based on specific criteria.¶
Find all patients who have been prescribed medication with a base cost higher than the average base cost of all medications.¶
query = """SELECT p.FIRST, p.LAST
FROM patients p
WHERE p.Id IN ( SELECT m.PATIENT FROM medications m WHERE m.BASE_COST > (SELECT AVG(BASE_COST) FROM medications));
"""
conn.sql(query)
┌─────────────┬─────────────┐
│ FIRST │ LAST │
│ varchar │ varchar │
├─────────────┼─────────────┤
│ Dewitt635 │ Reichel38 │
│ Anisa442 │ Purdy2 │
│ Ross213 │ Mayert710 │
│ Jim478 │ Mueller846 │
│ Maria750 │ Schimmel440 │
│ Ignacio928 │ Gorczany269 │
│ Kip442 │ Zboncak558 │
│ Jean712 │ Kuhlman484 │
│ Mac103 │ Moen819 │
│ Dianna917 │ Goldner995 │
│ · │ · │
│ · │ · │
│ · │ · │
│ Cameron381 │ Bogan287 │
│ Andrew29 │ Donnelly343 │
│ Charis952 │ Littel644 │
│ Lyndon118 │ Swift555 │
│ Edward499 │ Zieme486 │
│ Myron933 │ Ritchie586 │
│ Yolonda722 │ Champlin946 │
│ Alayna598 │ Kozey370 │
│ Kristian973 │ Ledner144 │
│ Sydney660 │ Zulauf375 │
├─────────────┴─────────────┤
│ ? rows 2 columns │
└───────────────────────────┘
Correlated Subqueries¶
Learn how to use correlated subqueries to perform row-specific comparisons. Find patients whose healthcare expenses are higher than the average expenses in their county.
query = """SELECT
p.FIRST,
p.LAST,
p.COUNTY,
p.HEALTHCARE_EXPENSES,
ROUND(AVG_EXPENSES.COUNTY_AVG_EXPENSES, 2) AS ROUNDED_AVG_HEALTHCARE_EXPENSES
FROM
patients p
INNER JOIN (
SELECT
COUNTY,
AVG(CAST(HEALTHCARE_EXPENSES AS DECIMAL(20,6))) AS COUNTY_AVG_EXPENSES
FROM
patients
GROUP BY
COUNTY
) AS AVG_EXPENSES ON p.COUNTY = AVG_EXPENSES.COUNTY
WHERE
p.HEALTHCARE_EXPENSES > (
SELECT AVG(p2.HEALTHCARE_EXPENSES)
FROM patients p2
WHERE p2.COUNTY = p.COUNTY
);
"""
conn.sql(query)
query = """SELECT
p.FIRST,
p.LAST,
p.COUNTY,
p.HEALTHCARE_EXPENSES,
ROUND(AVG_EXPENSES.COUNTY_AVG_EXPENSES, 2) AS ROUNDED_AVG_HEALTHCARE_EXPENSES
FROM
patients p
INNER JOIN (
SELECT
… FROM patients p2
WHERE p2.COUNTY = p.COUNTY
);
"""
conn.sql(query)
┌────────────┬───────────────┬───────────────────┬─────────────────────┬─────────────────────────────────┐
│ FIRST │ LAST │ COUNTY │ HEALTHCARE_EXPENSES │ ROUNDED_AVG_HEALTHCARE_EXPENSES │
│ varchar │ varchar │ varchar │ double │ double │
├────────────┼───────────────┼───────────────────┼─────────────────────┼─────────────────────────────────┤
│ Tammy740 │ Ernser583 │ Bristol County │ 1546025.67 │ 793296.73 │
│ Iliana226 │ Schmeler639 │ Barnstable County │ 1407960.93 │ 949155.23 │
│ Anthony633 │ Yundt842 │ Essex County │ 1575731.48 │ 814437.7 │
│ Jim478 │ Mueller846 │ Bristol County │ 1112473.78 │ 793296.73 │
│ Sina65 │ Howell947 │ Middlesex County │ 1479425.58 │ 841491.73 │
│ Maria750 │ Schimmel440 │ Plymouth County │ 952814.55 │ 856778.1 │
│ Lorenzo669 │ Vandervort697 │ Middlesex County │ 873001.47 │ 841491.73 │
│ David908 │ Carter549 │ Worcester County │ 834576.02 │ 804752.74 │
│ Noah480 │ O'Reilly797 │ Essex County │ 915110.6 │ 814437.7 │
│ Kip442 │ Zboncak558 │ Middlesex County │ 1704103.7200000002 │ 841491.73 │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ Tinisha932 │ Zulauf375 │ Bristol County │ 1035244.36 │ 793296.73 │
│ Mario764 │ Estrada938 │ Suffolk County │ 1380401.5 │ 664340.79 │
│ Glenn0 │ Schulist381 │ Barnstable County │ 1357123.8 │ 949155.23 │
│ Porter490 │ Schulist381 │ Bristol County │ 1534216.57 │ 793296.73 │
│ Darius626 │ Hackett68 │ Essex County │ 1715140.52 │ 814437.7 │
│ Leonel449 │ Kunze215 │ Essex County │ 1492352.86 │ 814437.7 │
│ Tyrell880 │ Kassulke119 │ Hampden County │ 1053520.21 │ 725636.71 │
│ Arlen68 │ Gusikowski974 │ Worcester County │ 1363289.9 │ 804752.74 │
│ Ashli227 │ Huel628 │ Worcester County │ 813730.02 │ 804752.74 │
│ Loriann967 │ Mertz280 │ Norfolk County │ 1618397.57 │ 874351.83 │
├────────────┴───────────────┴───────────────────┴─────────────────────┴─────────────────────────────────┤
│ ? rows (>9999 rows, 20 shown) 5 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────┘
query = """
SELECT p.FIRST, p.LAST
FROM patients p
WHERE NOT EXISTS (
SELECT 1 FROM medications m WHERE m.PATIENT = p.Id
);
"""
conn.sql(query)
query = """
SELECT p.FIRST, p.LAST
FROM patients p
WHERE NOT EXISTS (
SELECT 1 FROM medications m WHERE m.PATIENT = p.Id
);
"""
conn.sql(query)
┌────────────┬───────────────┐
│ FIRST │ LAST │
│ varchar │ varchar │
├────────────┼───────────────┤
│ Iliana226 │ Schmeler639 │
│ Toshiko149 │ Swaniawski813 │
│ Heath320 │ Streich926 │
│ Rana586 │ Langworth352 │
│ Dorathy429 │ Yost751 │
│ Deon400 │ Littel644 │
│ Emerson869 │ D'Amore443 │
│ Alec433 │ Sanford861 │
│ Roselyn270 │ McLaughlin530 │
│ Claudia969 │ Smith67 │
│ · │ · │
│ · │ · │
│ · │ · │
│ Lou594 │ Gleason633 │
│ Hank686 │ Russel238 │
│ Earl438 │ Block661 │
│ Mariano761 │ Heller342 │
│ Sandra485 │ Fisher429 │
│ Claire652 │ Murray856 │
│ Doreen575 │ Graham902 │
│ Fausto876 │ Kuhn96 │
│ Vernon254 │ Gutmann970 │
│ Ernest565 │ Block661 │
├────────────┴───────────────┤
│ ? rows 2 columns │
└────────────────────────────┘
query = """
SELECT
p.FIRST,
p.LAST,
p.CITY,
ROUND(p.HEALTHCARE_COVERAGE,2) AS INDIVIDUAL_COVERAGE,
ROUND(c.AVG_COVERAGE,2) AS CITY_AVERAGE_COVERAGE
FROM
patients p
INNER JOIN (
SELECT
CITY,
AVG(HEALTHCARE_COVERAGE) AS AVG_COVERAGE
FROM
patients
GROUP BY
CITY
) AS c ON p.CITY = c.CITY
WHERE
p.HEALTHCARE_COVERAGE > (
SELECT AVG(HEALTHCARE_COVERAGE)
FROM patients
WHERE CITY = p.CITY
);
"""
conn.sql(query)
query = """
SELECT
p.FIRST,
p.LAST,
p.CITY,
ROUND(p.HEALTHCARE_COVERAGE,2) AS INDIVIDUAL_COVERAGE,
ROUND(c.AVG_COVERAGE,2) AS CITY_AVERAGE_COVERAGE
FROM
patients p
INNER JOIN (
… SELECT AVG(HEALTHCARE_COVERAGE)
FROM patients
WHERE CITY = p.CITY
);
"""
conn.sql(query)
┌──────────────┬────────────────┬───────────────┬─────────────────────┬───────────────────────┐
│ FIRST │ LAST │ CITY │ INDIVIDUAL_COVERAGE │ CITY_AVERAGE_COVERAGE │
│ varchar │ varchar │ varchar │ double │ double │
├──────────────┼────────────────┼───────────────┼─────────────────────┼───────────────────────┤
│ Anisa442 │ Purdy2 │ Methuen │ 34393.57 │ 12949.31 │
│ Sina65 │ Howell947 │ Sudbury │ 15481.43 │ 15385.85 │
│ Maria750 │ Schimmel440 │ Plymouth │ 14231.22 │ 10826.53 │
│ Kip442 │ Zboncak558 │ Hopkinton │ 155478.44 │ 15439.17 │
│ Hans694 │ Wilkinson796 │ Framingham │ 13729.73 │ 11988.99 │
│ Jean712 │ Kuhlman484 │ Boxford │ 11421.48 │ 8660.53 │
│ Lorette239 │ Abbott774 │ Dennis │ 78740.23 │ 14918.28 │
│ Tobi258 │ Bernier607 │ Wilmington │ 13889.6 │ 9411.52 │
│ Rosia390 │ Reichel38 │ Rochester │ 16234.29 │ 9736.74 │
│ Florencia449 │ Mercado213 │ Wilbraham │ 11427.25 │ 10468.26 │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ · │ · │ · │ · │ · │
│ Jc393 │ Marks830 │ Brockton │ 16284.35 │ 12287.77 │
│ Cedric746 │ Marvin195 │ Dedham │ 19806.72 │ 13840.99 │
│ Milford485 │ Schmitt836 │ Hubbardston │ 10592.34 │ 8172.48 │
│ Tegan755 │ Ruecker817 │ Waltham │ 30493.65 │ 14495.07 │
│ Lindsay928 │ Lang846 │ Boston │ 11837.05 │ 10725.8 │
│ Claudie965 │ Hermann103 │ Medford │ 15420.54 │ 12610.05 │
│ Julie245 │ Nolan344 │ Essex │ 31903.53 │ 31787.13 │
│ Mui729 │ Stoltenberg489 │ Wakefield │ 21528.91 │ 13699.32 │
│ Oliver401 │ Lesch175 │ Yarmouth │ 27442.87 │ 19907.83 │
│ Mi492 │ Gusikowski974 │ North Andover │ 18481.09 │ 17677.13 │
├──────────────┴────────────────┴───────────────┴─────────────────────┴───────────────────────┤
│ ? rows (>9999 rows, 20 shown) 5 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
FUNCTIONS IN SQL¶
String Functions¶
CONCAT: Concatenates two or more strings.¶
query = """SELECT CONCAT(FIRST, ' ', LAST) AS FullName FROM patients LIMIT 10;
;"""
conn.sql(query)
┌───────────────────────────┐
│ FullName │
│ varchar │
├───────────────────────────┤
│ Jacinto644 Kris249 │
│ Alva958 Krajcik437 │
│ Jimmie93 Harris789 │
│ Gregorio366 Auer97 │
│ Karyn217 Mueller846 │
│ Jayson808 Fadel536 │
│ José Eduardo181 Gómez206 │
│ Milo271 Feil794 │
│ Karyn217 Metz686 │
│ Jeffrey461 Greenfelder433 │
├───────────────────────────┤
│ 10 rows │
└───────────────────────────┘
DATALENGTH: Returns the number of bytes used to represent any expre¶
query = """
SELECT FIRST, LENGTH(FIRST) AS LengthInBytes FROM patients LIMIT 10;
"""
conn.sql(query)
query = """
SELECT FIRST, LENGTH(FIRST) AS LengthInBytes FROM patients LIMIT 10;
"""
conn.sql(query)
┌─────────────────┬───────────────┐
│ FIRST │ LengthInBytes │
│ varchar │ int64 │
├─────────────────┼───────────────┤
│ Jacinto644 │ 10 │
│ Alva958 │ 7 │
│ Jimmie93 │ 8 │
│ Gregorio366 │ 11 │
│ Karyn217 │ 8 │
│ Jayson808 │ 9 │
│ José Eduardo181 │ 15 │
│ Milo271 │ 7 │
│ Karyn217 │ 8 │
│ Jeffrey461 │ 10 │
├─────────────────┴───────────────┤
│ 10 rows 2 columns │
└─────────────────────────────────┘
LEFT: Returns the left part of a character string with the specified number of characters.¶
query = """SELECT LEFT(FIRST, 5) AS Initial FROM patients LIMIT 10;"""
conn.sql(query)
query = """SELECT LEFT(FIRST, 5) AS Initial FROM patients LIMIT 10;"""
conn.sql(query)
┌─────────┐
│ Initial │
│ varchar │
├─────────┤
│ Jacin │
│ Alva9 │
│ Jimmi │
│ Grego │
│ Karyn │
│ Jayso │
│ José │
│ Milo2 │
│ Karyn │
│ Jeffr │
├─────────┤
│ 10 rows │
└─────────┘
LOWER: Converts all characters in the specified string to lowercase.¶
UPPER: Convert to uppercase¶
query = """
SELECT LOWER(FIRST) AS LowercaseFirstName, UPPER(LAST) AS UppercaseLastName FROM patients LIMIT 10;
"""
conn.sql(query)
query = """
SELECT LOWER(FIRST) AS LowercaseFirstName, UPPER(LAST) AS UppercaseLastName FROM patients LIMIT 10;
"""
conn.sql(query)
┌────────────────────┬───────────────────┐
│ LowercaseFirstName │ UppercaseLastName │
│ varchar │ varchar │
├────────────────────┼───────────────────┤
│ jacinto644 │ KRIS249 │
│ alva958 │ KRAJCIK437 │
│ jimmie93 │ HARRIS789 │
│ gregorio366 │ AUER97 │
│ karyn217 │ MUELLER846 │
│ jayson808 │ FADEL536 │
│ josé eduardo181 │ GÓMEZ206 │
│ milo271 │ FEIL794 │
│ karyn217 │ METZ686 │
│ jeffrey461 │ GREENFELDER433 │
├────────────────────┴───────────────────┤
│ 10 rows 2 columns │
└────────────────────────────────────────┘
query = """SELECT LTRIM(ADDRESS) AS TrimmedAddress FROM patients LIMIT 10;"""
conn.sql(query)
query = """SELECT LTRIM(ADDRESS) AS TrimmedAddress FROM patients LIMIT 10;"""
conn.sql(query)
┌────────────────────────────────┐
│ TrimmedAddress │
│ varchar │
├────────────────────────────────┤
│ 888 Hickle Ferry Suite 38 │
│ 1048 Skiles Trailer │
│ 201 Mitchell Lodge Unit 67 │
│ 1050 Lindgren Extension Apt 38 │
│ 570 Abshire Forge Suite 32 │
│ 1056 Harris Lane Suite 70 │
│ 427 Balistreri Way Unit 19 │
│ 422 Farrell Path Unit 69 │
│ 181 Feest Passage Suite 64 │
│ 428 Wiza Glen Unit 91 │
├────────────────────────────────┤
│ 10 rows │
└────────────────────────────────┘
query = """SELECT FIRST, LAST,
ADDRESS AS LongAddress,
REPLACE(ADDRESS, 'Street', 'St') AS ShortAddress
FROM
patients
WHERE
ADDRESS LIKE '%Street%' LIMIT 10;
"""
conn.sql(query)
┌────────────┬───────────────┬───────────────────────────┬───────────────────────┐
│ FIRST │ LAST │ LongAddress │ ShortAddress │
│ varchar │ varchar │ varchar │ varchar │
├────────────┼───────────────┼───────────────────────────┼───────────────────────┤
│ Lorrie905 │ Leannon79 │ 813 Casper Street │ 813 Casper St │
│ Asa127 │ Block661 │ 140 Rohan Street Suite 50 │ 140 Rohan St Suite 50 │
│ Logan497 │ Brekke496 │ 1081 Orn Street │ 1081 Orn St │
│ Cletus494 │ Strosin214 │ 1019 Haley Street │ 1019 Haley St │
│ Cortney940 │ Stehr398 │ 458 Streich Street │ 458 Streich St │
│ Latoyia537 │ Gaylord332 │ 200 Heaney Street │ 200 Heaney St │
│ Clark193 │ Hilll811 │ 202 Tromp Street Suite 0 │ 202 Tromp St Suite 0 │
│ Jc393 │ Bosco882 │ 792 Walsh Street │ 792 Walsh St │
│ Noel608 │ Swaniawski813 │ 155 Walter Street │ 155 Walter St │
│ Sari509 │ Hoppe518 │ 550 Lang Street Suite 65 │ 550 Lang St Suite 65 │
├────────────┴───────────────┴───────────────────────────┴───────────────────────┤
│ 10 rows 4 columns │
└────────────────────────────────────────────────────────────────────────────────┘
query = """SELECT FIRST, RIGHT(FIRST, 6) AS LastSixChars FROM patients LIMIT 10;
"""
conn.sql(query)
┌─────────────────┬──────────────┐
│ FIRST │ LastSixChars │
│ varchar │ varchar │
├─────────────────┼──────────────┤
│ Jacinto644 │ nto644 │
│ Alva958 │ lva958 │
│ Jimmie93 │ mmie93 │
│ Gregorio366 │ rio366 │
│ Karyn217 │ ryn217 │
│ Jayson808 │ son808 │
│ José Eduardo181 │ rdo181 │
│ Milo271 │ ilo271 │
│ Karyn217 │ ryn217 │
│ Jeffrey461 │ rey461 │
├─────────────────┴──────────────┤
│ 10 rows 2 columns │
└────────────────────────────────┘
query = """SELECT FIRST, SUBSTRING(FIRST, 1, 4) AS FirstFourChars FROM patients LIMIT 5;
"""
conn.sql(query)
┌─────────────┬────────────────┐
│ FIRST │ FirstFourChars │
│ varchar │ varchar │
├─────────────┼────────────────┤
│ Jacinto644 │ Jaci │
│ Alva958 │ Alva │
│ Jimmie93 │ Jimm │
│ Gregorio366 │ Greg │
│ Karyn217 │ Kary │
└─────────────┴────────────────┘
query = """SELECT
FIRST, REGEXP_REPLACE(FIRST, '[0-9]', '', 'g') AS FirstNameStripped, LAST, REGEXP_REPLACE(LAST, '[0-9]', '', 'g') AS LastNameStripped
FROM patients;
"""
conn.sql(query)
query = """SELECT
FIRST, REGEXP_REPLACE(FIRST, '[0-9]', '', 'g') AS FirstNameStripped, LAST, REGEXP_REPLACE(LAST, '[0-9]', '', 'g') AS LastNameStripped
FROM patients;
"""
conn.sql(query)
┌─────────────────┬───────────────────┬────────────────┬──────────────────┐
│ FIRST │ FirstNameStripped │ LAST │ LastNameStripped │
│ varchar │ varchar │ varchar │ varchar │
├─────────────────┼───────────────────┼────────────────┼──────────────────┤
│ Jacinto644 │ Jacinto │ Kris249 │ Kris │
│ Alva958 │ Alva │ Krajcik437 │ Krajcik │
│ Jimmie93 │ Jimmie │ Harris789 │ Harris │
│ Gregorio366 │ Gregorio │ Auer97 │ Auer │
│ Karyn217 │ Karyn │ Mueller846 │ Mueller │
│ Jayson808 │ Jayson │ Fadel536 │ Fadel │
│ José Eduardo181 │ José Eduardo │ Gómez206 │ Gómez │
│ Milo271 │ Milo │ Feil794 │ Feil │
│ Karyn217 │ Karyn │ Metz686 │ Metz │
│ Jeffrey461 │ Jeffrey │ Greenfelder433 │ Greenfelder │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ Raymond398 │ Raymond │ Kuvalis369 │ Kuvalis │
│ Gearldine455 │ Gearldine │ Boyer713 │ Boyer │
│ Nichol11 │ Nichol │ Gleichner915 │ Gleichner │
│ Louvenia131 │ Louvenia │ Marks830 │ Marks │
│ Raymon366 │ Raymon │ Beer512 │ Beer │
│ Camelia346 │ Camelia │ Stamm704 │ Stamm │
│ William805 │ William │ Pacocha935 │ Pacocha │
│ Guillermo498 │ Guillermo │ Téllez750 │ Téllez │
│ Milton509 │ Milton │ Bailey598 │ Bailey │
│ Cecilia788 │ Cecilia │ Wisozk929 │ Wisozk │
├─────────────────┴───────────────────┴────────────────┴──────────────────┤
│ ? rows (>9999 rows, 20 shown) 4 columns │
└─────────────────────────────────────────────────────────────────────────┘
Creating a database and adding data to tables¶
CREATE, ALTER, DROP, INSERT commands¶
First lets have a in-memory database
newdb = duckdb.connect()
query = """CREATE TABLE books (
book_id INTEGER PRIMARY KEY,
title VARCHAR,
author VARCHAR,
published_year INTEGER
);
"""
newdb.sql(query)
check
newdb.sql("SHOW TABLES;")
┌─────────┐
│ name │
│ varchar │
├─────────┤
│ books │
└─────────┘
newdb.sql("DESCRIBE books;")
newdb.sql("DESCRIBE books;")
┌────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ book_id │ INTEGER │ NO │ PRI │ NULL │ NULL │
│ title │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ author │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ published_year │ INTEGER │ YES │ NULL │ NULL │ NULL │
└────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘
query = """ALTER TABLE books ADD COLUMN genre VARCHAR;
"""
newdb.sql(query)
# This adds a genre column to the books table, allowing you to categorize books.
newdb.sql("DESCRIBE books;")
newdb.sql("DESCRIBE books;")
┌────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ book_id │ INTEGER │ NO │ PRI │ NULL │ NULL │
│ title │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ author │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ published_year │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ genre │ VARCHAR │ YES │ NULL │ NULL │ NULL │
└────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┘
query = """CREATE TABLE authors (
author_id INTEGER PRIMARY KEY,
firstname VARCHAR,
lastname VARCHAR,
birth_year INTEGER
);
"""
newdb.sql(query)
newdb.sql("SHOW TABLES;")
query = """DROP TABLE IF EXISTS authors;
"""
newdb.sql(query)
newdb.sql("SHOW TABLES;")
Let's add some data now¶
INSERT To populate the books table with data. This inserts 10 records into the books table, providing a diverse set of examples.¶
query = """INSERT INTO books (book_id, title, author, published_year, genre) VALUES
(1, '1984', 'George Orwell', 1949, 'Dystopian'),
(2, 'The Great Gatsby', 'F. Scott Fitzgerald', 1925, 'Classic'),
(3, 'To Kill a Mockingbird', 'Harper Lee', 1960, 'Classic'),
(4, 'Pride and Prejudice', 'Jane Austen', 1813, 'Romance'),
(5, 'The Hobbit', 'J.R.R. Tolkien', 1937, 'Fantasy'),
(6, 'The Catcher in the Rye', 'J.D. Salinger', 1951, 'Literary Fiction'),
(7, 'Harry Potter and the Sorcerer’s Stone', 'J.K. Rowling', 1997, 'Fantasy'),
(8, 'The Da Vinci Code', 'Dan Brown', 2003, 'Thriller'),
(9, 'Sapiens: A Brief History of Humankind', 'Yuval Noah Harari', 2011, 'Non-fiction'),
(10, 'Beloved', 'Toni Morrison', 1987, 'Historical Fiction');"""
newdb.sql(query)
query = """SELECT title AS "Book Title", author AS "Author Name", Genre FROM books;
"""
newdb.sql(query)
┌───────────────────────────────────────┬─────────────────────┬────────────────────┐
│ Book Title │ Author Name │ genre │
│ varchar │ varchar │ varchar │
├───────────────────────────────────────┼─────────────────────┼────────────────────┤
│ 1984 │ George Orwell │ Dystopian │
│ The Great Gatsby │ F. Scott Fitzgerald │ Classic │
│ To Kill a Mockingbird │ Harper Lee │ Classic │
│ Pride and Prejudice │ Jane Austen │ Romance │
│ The Hobbit │ J.R.R. Tolkien │ Fantasy │
│ The Catcher in the Rye │ J.D. Salinger │ Literary Fiction │
│ Harry Potter and the Sorcerer’s Stone │ J.K. Rowling │ Fantasy │
│ The Da Vinci Code │ Dan Brown │ Thriller │
│ Sapiens: A Brief History of Humankind │ Yuval Noah Harari │ Non-fiction │
│ Beloved │ Toni Morrison │ Historical Fiction │
├───────────────────────────────────────┴─────────────────────┴────────────────────┤
│ 10 rows 3 columns │
└──────────────────────────────────────────────────────────────────────────────────┘
Updating Data: UPDATE¶
This changes the genre of the book titled '1984' to 'Science Fiction'.
query = """UPDATE books SET genre = 'Science Fiction' WHERE title = '1984';
"""
newdb.sql(query)
query = """SELECT title AS "Book Title", author AS "Author Name", Genre FROM books;
"""
newdb.sql(query)
┌───────────────────────────────────────┬─────────────────────┬────────────────────┐
│ Book Title │ Author Name │ genre │
│ varchar │ varchar │ varchar │
├───────────────────────────────────────┼─────────────────────┼────────────────────┤
│ 1984 │ George Orwell │ Science Fiction │
│ The Great Gatsby │ F. Scott Fitzgerald │ Classic │
│ To Kill a Mockingbird │ Harper Lee │ Classic │
│ Pride and Prejudice │ Jane Austen │ Romance │
│ The Hobbit │ J.R.R. Tolkien │ Fantasy │
│ The Catcher in the Rye │ J.D. Salinger │ Literary Fiction │
│ Harry Potter and the Sorcerer’s Stone │ J.K. Rowling │ Fantasy │
│ The Da Vinci Code │ Dan Brown │ Thriller │
│ Sapiens: A Brief History of Humankind │ Yuval Noah Harari │ Non-fiction │
│ Beloved │ Toni Morrison │ Historical Fiction │
├───────────────────────────────────────┴─────────────────────┴────────────────────┤
│ 10 rows 3 columns │
└──────────────────────────────────────────────────────────────────────────────────┘
DELETE¶
e.g. To remove a book from the catalog
query = """DELETE FROM books WHERE title = 'The Great Gatsby';"""
newdb.sql(query)
query = """SELECT title AS "Book Title", author AS "Author Name", Genre FROM books;
"""
newdb.sql(query)
┌───────────────────────────────────────┬───────────────────┬────────────────────┐
│ Book Title │ Author Name │ genre │
│ varchar │ varchar │ varchar │
├───────────────────────────────────────┼───────────────────┼────────────────────┤
│ 1984 │ George Orwell │ Science Fiction │
│ To Kill a Mockingbird │ Harper Lee │ Classic │
│ Pride and Prejudice │ Jane Austen │ Romance │
│ The Hobbit │ J.R.R. Tolkien │ Fantasy │
│ The Catcher in the Rye │ J.D. Salinger │ Literary Fiction │
│ Harry Potter and the Sorcerer’s Stone │ J.K. Rowling │ Fantasy │
│ The Da Vinci Code │ Dan Brown │ Thriller │
│ Sapiens: A Brief History of Humankind │ Yuval Noah Harari │ Non-fiction │
│ Beloved │ Toni Morrison │ Historical Fiction │
└───────────────────────────────────────┴───────────────────┴────────────────────┘
TRUNCATE¶
- The TRUNCATE TABLE statement is used to delete all rows in a table without logging the individual row deletions.
- This is faster than using DELETE without a WHERE clause because TRUNCATE immediately deallocates data pages used by the table.
- Use TRUNCATE when you want to quickly remove all records from a table but keep the table structure for future use.
- Unlike DELETE, TRUNCATE does not generate a large number of transaction logs, making it more efficient for completely clearing a table.
query = """TRUNCATE TABLE books;"""
newdb.sql(query)
query = """SELECT * FROM books;
"""
newdb.sql(query)
Adding one more table; named 'authors'¶
query = """CREATE TABLE authors (
author_id INTEGER PRIMARY KEY,
name VARCHAR,
birth_year INTEGER,
nationality VARCHAR
);
"""
newdb.sql(query)
query = """INSERT INTO authors (author_id, name, birth_year, nationality) VALUES
(1, 'George Orwell', 1903, 'British'),
(2, 'F. Scott Fitzgerald', 1896, 'American'),
(3, 'Harper Lee', 1926, 'American'),
(4, 'Jane Austen', 1775, 'British'),
(5, 'J.R.R. Tolkien', 1892, 'British'),
(6, 'J.D. Salinger', 1919, 'American'),
(7, 'J.K. Rowling', 1965, 'British'),
(8, 'Dan Brown', 1964, 'American'),
(9, 'Yuval Noah Harari', 1976, 'Israeli'),
(10, 'Toni Morrison', 1931, 'American');
"""
newdb.sql(query)
query = """SELECT * FROM books;
"""
newdb.sql(query)
┌─────────┬───────────────────────────────────────┬─────────────────────┬────────────────┬────────────────────┐
│ book_id │ title │ author │ published_year │ genre │
│ int64 │ varchar │ varchar │ int64 │ varchar │
├─────────┼───────────────────────────────────────┼─────────────────────┼────────────────┼────────────────────┤
│ 1 │ 1984 │ George Orwell │ 1949 │ Dystopian │
│ 2 │ The Great Gatsby │ F. Scott Fitzgerald │ 1925 │ Classic │
│ 3 │ To Kill a Mockingbird │ Harper Lee │ 1960 │ Classic │
│ 4 │ Pride and Prejudice │ Jane Austen │ 1813 │ Romance │
│ 5 │ The Hobbit │ J.R.R. Tolkien │ 1937 │ Fantasy │
│ 6 │ The Catcher in the Rye │ J.D. Salinger │ 1951 │ Literary Fiction │
│ 7 │ Harry Potter and the Sorcerer’s Stone │ J.K. Rowling │ 1997 │ Fantasy │
│ 8 │ The Da Vinci Code │ Dan Brown │ 2003 │ Thriller │
│ 9 │ Sapiens: A Brief History of Humankind │ Yuval Noah Harari │ 2011 │ Non-fiction │
│ 10 │ Beloved │ Toni Morrison │ 1987 │ Historical Fiction │
├─────────┴───────────────────────────────────────┴─────────────────────┴────────────────┴────────────────────┤
│ 10 rows 5 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
query = """SELECT * FROM authors;
"""
newdb.sql(query)
newdb.sql(query)
┌───────────┬─────────────────────┬────────────┬─────────────┐
│ author_id │ name │ birth_year │ nationality │
│ int32 │ varchar │ int32 │ varchar │
├───────────┼─────────────────────┼────────────┼─────────────┤
│ 1 │ George Orwell │ 1903 │ British │
│ 2 │ F. Scott Fitzgerald │ 1896 │ American │
│ 3 │ Harper Lee │ 1926 │ American │
│ 4 │ Jane Austen │ 1775 │ British │
│ 5 │ J.R.R. Tolkien │ 1892 │ British │
│ 6 │ J.D. Salinger │ 1919 │ American │
│ 7 │ J.K. Rowling │ 1965 │ British │
│ 8 │ Dan Brown │ 1964 │ American │
│ 9 │ Yuval Noah Harari │ 1976 │ Israeli │
│ 10 │ Toni Morrison │ 1931 │ American │
├───────────┴─────────────────────┴────────────┴─────────────┤
│ 10 rows 4 columns │
└────────────────────────────────────────────────────────────┘
EXPORTING Data to csv from SQL¶
query = """EXPORT DATABASE './' (FORMAT CSV, DELIMITER ',');"""
newdb.sql(query)