Intro to SQL
Intro to SQL: turning a data design into reality
SQL (Structured Query Language) is a method for writing queries for tabular data that is commonly used across a wide array of databases. While each database software (from MySQL to PostgreSQL to SQLite to Oracle) has its own extensions to the language, we will be looking at the core functions that you will generally be able to apply to almost any database.
Likewise if you query data in programming languages like Python’s pandas or R’s dplyr, you will see very similar concepts to the ones we’ll learn here.
SQLite
SQLite is an extremely widely-used database software and format. It’s special because unlike MySQL, PostgreSQL, or other databases that run as a separate ongoing process on your machine or on a remote server, SQLite sits as a single self-contianed file in your filesystem. It is a great way to explore SQL and relational database creation because there is virtually no software setup involved.
A wide nubmer of programs - desktop GUIs, command line, ad browser-based, can interact with SQLite files.
This tutorial will focus first on using sqlite for data quering and analysis, and only secondarily on using it for data entry and editing. We will not be going over key database concepts such as indices and transactions. These are very important features, but because they relate more towards programming database-backed applications and issues of performance optimization, they are out of scope for this introductory lesson.
Quick references
- SQL cheatsheet
- SQL core functions - used inside
SELECTstatements to modify the data on the fly, such as modifying strings. - SQLite aggregation functions - used with
SELECT ... GROUP BYstatements to aggregate multiple values into one based on grouping. - SQLite date and time functions
Getting started
Navigate to https://sqliteonline.com. This website uses an implementation of SQLite that runs entirely in your internet browser (like Palladio), so your data does not leave your computer. It also provides a query editor with affordances like autocompletion and formatting, and a query result dispay that provides a nicely-formatted table.
Download the lesson data to your computer: normalized_knoedler.sqlite3
In the sqliteonline menubar, select File -> Open DB, and select the normalized_knoedler.sqlite3 file you just downloaded. (N.B. sqlite files have a bunch of possible file extensions, but you’ll often see .db, .sqlite, and .sqlite3, the latter noting that the file was made with the most recent major version of SQLite available from 2004)
This file contains the exact same information as we used in the Palladio lesson, but arranged according to a different database schema:
salesartists- join table between
salesandartists
- join table between
collectors- join table between
sales.sellersandcollectors - join table between
sales.buyersandcollectors
- join table between
SELECT, INSERT/UPDATE, JOINS
Together, we will walk through:
- Getting all data from a table
- Getting select columns from a table
- Getting select rows from a table
- Limiting/skipping results
- Computing new columns based on existing ones
- Arithmetic
- String concatenation with
|| CASE WHENCOALESCE- Dealing with dates -
date()andjulianday()
- Wildcard searching with
LIKE UNIONandINTERSECTqueries- Aggregation with
GROUP BYCOUNTSUM,AVG,MIN,MAXgroup_concat(x, sep)
Creating tables and constraints
CREATE TABLE- Specifying columns / attributes
PRIMARY KEYTEXT / INT / FLOATUNIQUENOT NULLDEFAULT
INSERTdata withVALUESFOREIGN KEY(runPRAGMA foreign_keys = ON;first!)CHECKconstraints on columns and tables- Checking for valid dates with
CHECK (datecol IS date(datecol))
- Checking for valid dates with
- Compound
UNIQUE - Running bulk
UPDATE
Compound queries
- Storing queries with
VIEWs - Common Table Expressions (
WITH inter_tbl AS (SELECT ...) SELECT ... FROM inter_tbl...)
More software
Create tables and then import from CSVs: