Intro to SQL
Intro to SQL: turning a data design into reality
SQL (Structured Query Language) is a method for writing queries for tabular data that is commonly used across a wide array of databases. While each database software (from MySQL to PostgreSQL to SQLite to Oracle) has its own extensions to the language, we will be looking at the core functions that you will generally be able to apply to almost any database.
Likewise if you query data in programming languages like Python’s pandas
or R’s dplyr
, you will see very similar concepts to the ones we’ll learn here.
SQLite
SQLite is an extremely widely-used database software and format. It’s special because unlike MySQL, PostgreSQL, or other databases that run as a separate ongoing process on your machine or on a remote server, SQLite sits as a single self-contianed file in your filesystem. It is a great way to explore SQL and relational database creation because there is virtually no software setup involved.
A wide nubmer of programs - desktop GUIs, command line, ad browser-based, can interact with SQLite files.
This tutorial will focus first on using sqlite for data quering and analysis, and only secondarily on using it for data entry and editing. We will not be going over key database concepts such as indices and transactions. These are very important features, but because they relate more towards programming database-backed applications and issues of performance optimization, they are out of scope for this introductory lesson.
Quick references
- SQL cheatsheet
- SQL core functions - used inside
SELECT
statements to modify the data on the fly, such as modifying strings. - SQLite aggregation functions - used with
SELECT ... GROUP BY
statements to aggregate multiple values into one based on grouping. - SQLite date and time functions
Getting started
Navigate to https://sqliteonline.com. This website uses an implementation of SQLite that runs entirely in your internet browser (like Palladio), so your data does not leave your computer. It also provides a query editor with affordances like autocompletion and formatting, and a query result dispay that provides a nicely-formatted table.
Download the lesson data to your computer: normalized_knoedler.sqlite3
In the sqliteonline menubar, select File -> Open DB, and select the normalized_knoedler.sqlite3
file you just downloaded. (N.B. sqlite files have a bunch of possible file extensions, but you’ll often see .db
, .sqlite
, and .sqlite3
, the latter noting that the file was made with the most recent major version of SQLite available from 2004)
This file contains the exact same information as we used in the Palladio lesson, but arranged according to a different database schema:
sales
artists
- join table between
sales
andartists
- join table between
collectors
- join table between
sales.sellers
andcollectors
- join table between
sales.buyers
andcollectors
- join table between
SELECT, INSERT/UPDATE, JOINS
Together, we will walk through:
- Getting all data from a table
- Getting select columns from a table
- Getting select rows from a table
- Limiting/skipping results
- Computing new columns based on existing ones
- Arithmetic
- String concatenation with
||
CASE WHEN
COALESCE
- Dealing with dates -
date()
andjulianday()
- Wildcard searching with
LIKE
UNION
andINTERSECT
queries- Aggregation with
GROUP BY
COUNT
SUM
,AVG
,MIN
,MAX
group_concat(x, sep)
Creating tables and constraints
CREATE TABLE
- Specifying columns / attributes
PRIMARY KEY
TEXT / INT / FLOAT
UNIQUE
NOT NULL
DEFAULT
INSERT
data withVALUES
FOREIGN KEY
(runPRAGMA foreign_keys = ON;
first!)CHECK
constraints on columns and tables- Checking for valid dates with
CHECK (datecol IS date(datecol))
- Checking for valid dates with
- Compound
UNIQUE
- Running bulk
UPDATE
Compound queries
- Storing queries with
VIEW
s - Common Table Expressions (
WITH inter_tbl AS (SELECT ...) SELECT ... FROM inter_tbl...
)
More software
Create tables and then import from CSVs: