How Did They Make That? - Printmaking Networks

The following are links to various software, services, and resources that I used during my dissertation research.

Data sources

The Rijksmuseum: JSON-based API (Example output)
The British Museum: Linked Open Data, available as a bulk download as well as through a SPARQL endpoint.
Printed books (I know, old school!)
- De Vries, Jan. European Urbanization, 1500-1800. Cambridge, MA: Harvard University Press, 1984.
- van der Waals, Jan. Prenten in de gouden eeuw: van kunst tot kastpapier. Rotterdam: Museum Boijmans Van Beuningen, 2006.

curl: download JSON from the Rijksmuseum API
parallel: run lots of curl calls at once, to download from the Rijksmuseum more efficiently
jq: Parse JSON into CSV files
fuseki: SPARQL server and triple store (a kind of graph database) for hosting a local copy of the British Museum LOD
rsync: move data and scripts on and off Digital Ocean servers
pandoc: Turn text written in Markdown into PDF
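
To give a sense of the JSON-to-CSV step that jq handles, here is a minimal sketch of the same idea in Python, using only the standard library. It is an illustration rather than the jq filter actually used, and the record structure and field names (`records`, `id`, `title`, `makers`) are hypothetical placeholders, not the Rijksmuseum API's real schema.

```python
import csv
import io
import json

# Hypothetical sample; the field names are placeholders, not a real API schema.
SAMPLE_JSON = """
{"records": [
  {"id": "obj-1", "title": "Landscape",
   "makers": [{"name": "A. Engraver"}, {"name": "B. Publisher"}]},
  {"id": "obj-2", "title": "Portrait", "makers": []}
]}
"""


def records_to_rows(payload):
    """Flatten nested records into one row per (object, maker) pair."""
    rows = []
    for record in payload["records"]:
        # Keep objects with no listed maker by emitting one row with an empty name.
        makers = record.get("makers") or [{"name": ""}]
        for maker in makers:
            rows.append(
                {"id": record["id"], "title": record["title"], "maker": maker["name"]}
            )
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "title", "maker"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(json.loads(SAMPLE_JSON))
csv_text = rows_to_csv(rows)
print(csv_text)
```

The nested list of makers is the part that needs thought: a print with two makers becomes two rows, which is the shape that later makes it easy to build edge lists.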

RStudio: an integrated development environment for R
Tabula: extracts tabular data from PDFs (scanned PDFs need to be OCRed first, since Tabula reads a PDF's text layer)
Adobe Acrobat: OCRing PDFs (though this can also be done with open-source Tesseract)
briss: Brilliant free little tool for cropping scanned PDFs, and way more intuitive than Acrobat’s cropping tools.
Excel: Yes, I use it. Easiest option to hand enter a table with just a few dozen rows
Zotero: Not technically used for data analysis, but this is my go-to citation manager. I use it in combination with Better BibTeX for formatting all my citations via Markdown/LaTeX.

SPARQL: A query language for Linked Open Data. Similar to SQL… but different.
jq: Not really a language, but you do need to learn its filter syntax to tell jq how to turn JSON into the type of table you want
LaTeX: A rich language for formatting long documents. It is a beast, but still easier than using Word when you have hundreds of pages with sections, citations, images, and a persnickety style guide to follow.
Markdown: Also not really a language, but an easy-to-use text markup system for writing documents.
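
For a flavor of what a SPARQL query looks like and how it travels to an endpoint, here is a small Python sketch that builds (but does not send) a query request. The endpoint URL is an assumption: it supposes a local Fuseki server on port 3030 with a dataset hypothetically named `bm`. The query itself is deliberately generic and does not use any real British Museum predicates.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Fuseki server with a dataset named "bm"; adjust to your setup.
ENDPOINT = "http://localhost:3030/bm/sparql"

# A deliberately generic query: list ten arbitrary triples from the store.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request that passes the query in the `query` URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it, assuming a server is actually running at that address. The triple pattern in the `WHERE` clause is where SPARQL parts ways with SQL: instead of naming tables and columns, you describe the shape of the graph you want matched.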

Last, and most important:

R: An open-source language designed for working with tabular data and statistical calculations. Vanilla R can be kind of weird, but the following packages make it shine:

readr: reads in massive CSV/TSV files very quickly, and with the correct variable types (e.g. character, numeric, boolean)
dplyr: filter, group, aggregate, join, and run operations on tabular data with easy-to-use syntax and impressive speed. Without exaggeration, this may be the most important extension ever written for R.
tidyr: Transform between wide and narrow data tables (don’t worry, it’s a thing that starts to make sense once you begin to work with tabular data a lot)
lubridate: seamlessly parses many different ways of writing dates as strings
stringr: string manipulation functions, including matching and replacing with regular expressions.
igraph: Network analysis package (also available in Python and C). It constructs graphs from edge lists and offers a wide range of functions for measurement, simulation, and plotting.
doParallel: Helps set up parallel R sessions so you can run multiple jobs at the same time, and collect all their results in one place.
clipr: A little utility package I wrote for quickly sending R results to my clipboard for pasting elsewhere, such as Palladio.
ggplot2: creates beautiful 2D plots for both screen and page.
animation: Makes animated GIFs from ggplot2 plots.
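
To make the "graphs from edge lists" idea concrete, here is a minimal Python sketch that turns an edge list into a graph and takes two simple measurements: each node's degree and the connected components. It uses only the standard library as a stand-in and is not igraph's API. The names in the edge list are invented placeholders, not data from the research.

```python
from collections import defaultdict, deque

# Hypothetical edge list: each pair links two people who worked on the same print.
EDGES = [
    ("Artist A", "Publisher B"),
    ("Artist A", "Engraver C"),
    ("Engraver C", "Publisher B"),
    ("Engraver D", "Publisher E"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degrees(adjacency):
    """Count the distinct neighbors of every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def connected_components(adjacency):
    """Group nodes into connected components with a breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


graph = build_adjacency(EDGES)
node_degrees = degrees(graph)
components = connected_components(graph)
print(node_degrees)
print(components)
```

A dedicated package earns its keep once the measurements get more involved (centrality, community detection, simulation), but the input stays this simple: a two-column table of who is connected to whom.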

Digital Ocean: cloud hosting service for quickly spinning up a lot of processors to run R jobs in parallel for relatively low $$$. This was the only software I actually had to directly pay for.
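
The pattern behind running jobs in parallel is to split the work into independent jobs, hand them to a pool of workers, and gather the results in one place. Here is a minimal Python sketch of that pattern. The `run_job` function is a hypothetical stand-in (a small seeded random simulation), not the actual analysis. It uses a thread pool to stay short and portable; for CPU-heavy work, `concurrent.futures.ProcessPoolExecutor` offers the same interface.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_job(seed, draws=1000):
    """Stand-in for one independent job: a seeded simulation returning a summary."""
    rng = random.Random(seed)  # a private generator keeps each job reproducible
    mean = sum(rng.random() for _ in range(draws)) / draws
    return {"seed": seed, "mean": mean}


def run_all(seeds, workers=4):
    """Run every job on a pool of workers and gather the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, seeds))


results = run_all(range(8))
print(results)
```

Giving each job its own seed means a parallel run can be repeated exactly, which matters when the jobs are simulations.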

Hanneman, Robert A., and Mark Riddle. Introduction to Social Network Methods. Riverside: University of California, Riverside, 2005. http://faculty.ucr.edu/~hanneman/nettext/.
Prell, Christina. Social Network Analysis: History, Theory and Methodology. Los Angeles: Sage, 2011.
Arnold, Taylor, and Lauren Tilton. Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text. Cham: Springer, 2015.