The following lesson was made for Yael Rice’s Fall 2020 digital art history course at Amherst College, September 18th, 2020.
It is the most recent iteration of a workshop that has been presented at venues including:
You can see the full change history of this walkthrough at https://github.com/mdlincoln/mapping-knoedler-palladio
Palladio is a simple but powerful exploratory data visualization tool built at Stanford University. It runs entirely inside your internet browser. You don’t need a user account, and none of the data you visualize ever leaves your computer. Although it is simple and has some limitations, especially with large datasets, it has some very useful features for exploring a new dataset and finding its oddities and eccentricities. It is a good introductory way to look through a dataset and figure out what might be missing before moving on to more comprehensive tools like Tableau, Python, or R.
This walkthrough will take you step by step through several tasks in Palladio that you can do on your own before the class meeting. Each step includes a series of reflection questions, both about what we think the data could be telling us and about what might be missing or misrepresented in the data. This exercise is more about learning how to use Palladio to critically investigate data generated from historical documents than it is about finding new historical insights. You will also likely come up with questions about the data that don’t seem to be answered by the data documentation in this lesson. This is a real-life situation you’ll find yourself in when working with data produced by others. Figuring out what questions you would like to ask of the original data producers is a crucial part of critical data investigation.
Make sure to record answers to these questions in a word document or notebook to have available for our discussion during class.
Some practical tips from Miriam Posner before we begin:
A reminder that Palladio is still under development, so it can be buggy and slow!
Work slowly. Wait for an option to finish loading before you click it again or click something else.
Do not refresh the page or click the back button. You’ll lose your work.
On a related note: To start over, refresh the page.
Some additional tips from me:
You need to run this on a full desktop or laptop computer - it doesn’t work well with a touch screen!
Unfortunately, Palladio doesn’t entirely work with Firefox, especially the dynamic map visualizations. The Chrome internet browser seems to have the best performance for Palladio, though Safari works as well.
Download the CSV file that we’ll work with.
These data describe a little over 4,100 sales by the fine art dealer M. Knoedler & Co. between roughly 1870 and 1970, as documented in data encoded from the handwritten stockbooks by staff at the Getty Provenance Index. These stockbooks were where Knoedler recorded details about the artworks that entered their inventory, when and where they bought them from and for how much, and (if sold) who the eventual buyer was.
Before forging ahead with the rest of the walkthrough, read the following essays on the Knoedler archive at the Getty, and the data project to encode the stockbooks, which are a small part of the full archive of Knoedler’s business documents:
In order to keep this introductory lesson straightforward and within the performance limits of Palladio, this is only a tiny slice of the full Knoedler data, which cover over 40,000 entries.
The data in nyc_knoedler.csv cover only those cases in which:
Filtering the original 40,000+ entries based on these criteria results in the ~4,100 entries in the dataset we will use for this exercise.
I’ve also enhanced these data with some important modifications that aren’t included in the original Knoedler data:
For the sake of simplicity, I’ve drastically trimmed the number of variables for each of these transactions. This means some complexities that are represented in the Knoedler data, like transactions of multiple objects, as well as joint purchases by Knoedler and another dealer, are flattened here.
field | description |
---|---|
title | Title of the work (if recorded) |
artists | Creator(s) of the artwork, if recorded. GPI editors recorded the original spelling as written by Knoedler, but also recorded a standardized version if they could identify the artist (e.g. turning “J. Sargent” into “SARGENT, JOHN SINGER”). This field holds the standardized versions. |
artist_nationality | Creator(s) nationalities (input by modern editors) |
genre | Genre of the work (input by modern editors) |
object_type | e.g. Painting, Drawing, Sculpture (input by modern editors) |
height | Height in inches (if recorded) |
width | Width in inches (if recorded) |
area | Area in square inches (if recorded) |
seller | Name of seller (standardized in a similar manner to the artists; numeric ID if anonymous/unknown) |
seller_type | e.g. Dealer, Museum, Artist, Collector |
buyer | Name of buyer (standardized in a similar manner to the artists; numeric ID if anonymous/unknown) |
buyer_type | e.g. Dealer, Museum, Artist, Collector |
buyer_address | Buyer address |
coordinates | Coordinates in the format lat,lon |
purchase_date | Date object brought into Knoedler stock in the format YYYY-MM-DD |
sale_date | Date object sold out of Knoedler stock in the format YYYY-MM-DD |
purchase_price | Price Knoedler paid to buy the object (normalized to 1900 USD) |
sale_price | Price Knoedler received for selling the object (normalized to 1900 USD) |
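If you later move from Palladio to a scripting tool, the field list above maps directly onto a CSV reader. The following is only a minimal sketch using Python's standard library: the two rows are invented placeholders (not records from nyc_knoedler.csv), only a handful of the columns are included, and the helper name to_float is hypothetical. It shows how "if recorded" fields turn into empty cells that have to be handled before a value like area can be derived from height and width.

```python
import csv
import io

# Two invented rows for illustration only; they are NOT real Knoedler records.
SAMPLE_CSV = """title,artists,object_type,height,width,purchase_date,sale_date
Example Landscape,"DOE, JANE",Painting,20,30,1900-01-15,1900-06-01
Example Portrait,"ROE, RICHARD",Painting,,,1901-03-02,1902-03-02
"""


def to_float(value):
    """Convert a CSV cell to a float, returning None when the cell is empty."""
    return float(value) if value.strip() else None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    height, width = to_float(row["height"]), to_float(row["width"])
    # Area (square inches) can only be derived when both dimensions were recorded.
    row["area"] = height * width if height is not None and width is not None else None

for row in rows:
    print(row["title"], row["artists"], row["area"])
```

To try this on the real file, you could replace io.StringIO(SAMPLE_CSV) with open("nyc_knoedler.csv", newline="", encoding="utf-8"), assuming the file sits in your working directory.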
The dataset we are using is several degrees removed from the original historical events of Knoedler’s sales and purchases of artworks. Based on all the description here, reflect on what information might be lost at each of these stages.
(I suggest you make some initial notes based on what you’ve read so far, then return to these questions after you have finished the rest of the walkthrough in order to add more examples that jump out once you start to visualize the data):
Navigate to https://hdlab.stanford.edu/palladio/ and click on the “Start” button.
Find the nyc_knoedler.csv file that you downloaded to your computer, and drag it into the window where it says “Load .csv or spreadsheet”. You should see text fill the box. Click the “Load” button below.
You should now see the data loaded into Palladio, represented as a list of all the field names in the original CSV, with 4193 rows. Where it says “Provide a title to this project”, let’s name it “Knoedler”; where the table name says “Untitled”, write in “New York Sales”.
The data view shows the fields (i.e. columns) from our spreadsheet, and also shows what type of variable Palladio has guessed our data are supposed to be. We’ve got a few text fields here, as well as date fields, and a coordinates field.
Palladio tries to check for some simple irregularities in our data, like odd characters, and it’s highlighted those fields with a red dot. We can ignore most of these for now, as almost all those characters (like commas or dashes in the title field) are there on purpose.
That said, there are some exceptions. Click on the artists label to inspect the data in that field. Palladio will show a list of all the unique values in that field, and the number of times that value shows up in the records. Across 4193 total records, there are only a little over 1,000 unique values in the artists field.
You’ll notice an option to insert a delimiter - a character that Palladio can use to split the artists field if it contains multiple artists. If you click on the special characters above the multiple values box, you can filter the values at left to look at the ones containing commas, semicolons, and hyphens. Figure out which one of those characters the GPI data uses to denote multiple different artists, and enter it into the multiple values box, then click “Close”.
You’ll also need to specify a delimiter for the genre field. Repeat the same steps as you took for the artists field.
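To make the idea of a multiple-values delimiter concrete, here is a small Python sketch of what splitting does to the count of unique values. The values are invented, and the "|" delimiter is a deliberate stand-in: it is not the character used in the GPI data, which you still need to work out yourself in Palladio. The function name split_values is hypothetical.

```python
from collections import Counter

# Stand-in delimiter chosen for illustration; it is NOT the one used in the GPI data.
DELIMITER = "|"

# Invented cell values, as they might appear in a multi-valued column.
artists_column = ["DOE, JANE", "DOE, JANE|ROE, RICHARD", "ROE, RICHARD", ""]


def split_values(cell, delimiter):
    """Split one cell into its individual values, dropping empty pieces."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


# Without a delimiter, a cell naming two artists counts as its own unique value.
unsplit_counts = Counter(cell for cell in artists_column if cell)

# With a delimiter, each artist is counted separately.
split_counts = Counter(
    name for cell in artists_column for name in split_values(cell, DELIMITER)
)

print(len(unsplit_counts), "unique values before splitting")
print(len(split_counts), "unique values after splitting")
print(split_counts.most_common())
```

Note that the commas inside each name are part of the standardized "SURNAME, GIVEN NAME" form, which is one reason a comma would be a poor choice of delimiter in this sketch.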
Look through the artist values for text contained inside brackets []. These aren’t individual names, but categories instead. What kinds of categories do you see and what might explain their presence in the original data? How would we need to think carefully about counting or measuring these categories compared to individual artist names?
Usually the first thing to do in Palladio is to go to the “Table” tab, where you can view the underlying records themselves and start to experiment with the different filters.
Once you click on the “Table” tab, go to the “Settings” in the upper right and select the “row” dimension - in our case, pick the field called generated - this is a field that Palladio added to our original data and represents the row number in the original CSV.
Under “Dimensions” you can select several or all of the original CSV values to view in the table. To start, let’s add title, artists, artist_nationality, genre, object_type, purchase_price, and sale_price.
You should now see a spreadsheet of the selected fields.
Now we can start “faceting”, or filtering the data based on different categorical variables, like buyer name or artist nationality. Click on the “Facet” button on the lower left corner, and in the lower right corner, use the “Dimensions” menu to select which variable we want to facet by. Let’s start with artist_nationality. The facet bar will count up all the occurrences of each artist nationality and rank them. If you click on a single facet value, it will filter down the displayed table to only show that facet. Try looking through the artists and titles where the artist_nationality is American.
If you click again on the “Dimensions” box at right, you can add additional fields to do compound faceting, such as listing all the works with the genre of landscape by American artists.
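Faceting amounts to two operations: counting how often each value of a field occurs, and keeping only the rows that match the selected values. The Python sketch below illustrates both with invented records; the function names facet_counts and apply_facets are hypothetical and are not part of Palladio.

```python
from collections import Counter

# Invented records for illustration only.
records = [
    {"title": "A", "artist_nationality": "American", "genre": "Landscape"},
    {"title": "B", "artist_nationality": "American", "genre": "Portrait"},
    {"title": "C", "artist_nationality": "French", "genre": "Landscape"},
    {"title": "D", "artist_nationality": "American", "genre": "Landscape"},
]


def facet_counts(rows, field):
    """Count occurrences of each value of `field`, most common first."""
    return Counter(row[field] for row in rows).most_common()


def apply_facets(rows, **selected):
    """Keep only the rows that match every selected field=value pair."""
    return [r for r in rows if all(r[f] == v for f, v in selected.items())]


print(facet_counts(records, "artist_nationality"))

# Compound faceting: two fields must match at the same time.
american_landscapes = apply_facets(
    records, artist_nationality="American", genre="Landscape"
)
print([r["title"] for r in american_landscapes])
```

Adding a second facet can only narrow the selection, which is why compound facets in the table view never show more rows than a single facet does.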
We can also click on the “Timeline” filter to visualize and filter based on date. Palladio should already have recognized the purchase_date column and created a timeline histogram for us. This is the date that the Knoedler gallery recorded purchasing the painting and adding it to their stock. You can drag and select a particular range if you like, and then drag that range around to dynamically update the table to show only the sales within the selected time span.
By default, the “Height” of the bars is based on the number of sale records in the table at that date. But we can change it to instead reflect the sum of some numeric variable, like purchase price, sale price, or even the size of the artworks that we calculated based on their recorded height and width.
Experiment with this and see how these trends compare to what we see in the plain numbers of works that Knoedler sold.
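The difference between the two height settings can be sketched in a few lines of Python. The records are invented, the function name timeline_heights is hypothetical, and binning by calendar year is an assumption made for simplicity (Palladio's own bin width may differ).

```python
from collections import defaultdict

# Invented records for illustration only.
records = [
    {"purchase_date": "1900-01-15", "purchase_price": 100.0},
    {"purchase_date": "1900-07-02", "purchase_price": 50.0},
    {"purchase_date": "1901-03-09", "purchase_price": 1000.0},
]


def timeline_heights(rows, date_field, value_field=None):
    """Return {year: height}.

    The height is a row count by default, or the sum of `value_field` if given.
    """
    heights = defaultdict(float)
    for row in rows:
        year = int(row[date_field][:4])  # dates are formatted YYYY-MM-DD
        heights[year] += 1 if value_field is None else row[value_field]
    return dict(heights)


by_count = timeline_heights(records, "purchase_date")
by_price = timeline_heights(records, "purchase_date", "purchase_price")
print(by_count)
print(by_price)
```

In this toy example the year with fewer records has the larger summed price, which is the kind of divergence between counts and sums worth looking for in the real data.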
Leaving the current timeline open, click on the “Timeline” button again to add a second one and try a different height metric to compare.
The “group by” field will tint the bars so you can further subdivide the timeline. Try this on categorical variables like artist_nationality or object_type.
Palladio also understands timespans, or activities that have a start and end date. After dismissing the Timeline filter with the red trash basket button, click on “Timespan”. We’ll need to tell it the correct start date (purchase_date) and end date (sale_date) columns. Now in addition to getting a map visualization, we’ll also be able to visualize the tempo at which Knoedler purchased and then sold their stock. You can experiment with different timespan visualization techniques by changing the “Layout” menu.
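A timespan is simply the interval between a start date and an end date, here purchase_date and sale_date. As a rough sketch of the "tempo" idea, the following Python computes how many days each invented record spent in stock; the function name days_in_stock is hypothetical, and the records are not real Knoedler entries.

```python
from datetime import date

# Invented records for illustration only.
records = [
    {"title": "A", "purchase_date": "1900-01-15", "sale_date": "1900-01-25"},
    {"title": "B", "purchase_date": "1901-03-01", "sale_date": "1902-03-01"},
]


def days_in_stock(row):
    """Number of days between purchase and sale (dates are YYYY-MM-DD)."""
    start = date.fromisoformat(row["purchase_date"])
    end = date.fromisoformat(row["sale_date"])
    return (end - start).days


durations = {row["title"]: days_in_stock(row) for row in records}
print(durations)
```

A negative or implausibly long duration in real data would be worth checking against the original record, since it may reflect a transcription or encoding problem rather than an actual event.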
Look at the genre or object_type that Knoedler sold over this time period - in terms of number of sales, summed prices, or even size of artworks. Try some combinations - what patterns do you notice?
Given the descriptions of genre and object_type in the data dictionary above, what caveats should we keep in mind when looking at visualizations of those categories?
Now click on the “Map” button. Palladio starts you out with a plain coastline base map. Before adding our own data, we can enhance this base map by adding more “Tiles”. Click on “New Layer”, then click on the “Tiles” tab. You can see the different tile types to add to the base map - let’s add “Streets” by clicking on the “Streets” button. In the “Name” field above, type “Streets”, then click “Add Layer”. You should now see borders and cities showing up on your base map.
Now it’s time to add our own data. Click on “New Layer” again, and click the “Data” tab. We’ll be adding “Points” (the default option). Under the name, type “Sales Locations”. Click on “Places” and select coordinates (the only option available, since this is the only data in our table that is formatted as a pair of coordinates). For the tooltip label (what we see when we roll over the points), let’s start with buyer_address.
Check the box to size points, and do so according to number of New York Sales, which will make points bigger when they match more rows in our table. Finish by clicking “Add Layer”, and then click on the hamburger button (the three small lines in the upper right corner) to minimize the layer configuration box.
You should see a few dots appearing over New York City. In the upper left corner, underneath the zoom in/out buttons, there’s a small button with a few nested squares, called “Zoom to data” - click on this to automatically zoom in so that our data points fill the screen.
Roll over the points to inspect the addresses matched to each one. Look for the particularly huge points - what do you notice about the list of addresses there? What could this indicate about the geocoding process? It is likely that the geocoder didn’t recognize the exact address in its current-day database, and instead assigned it a generic midtown or uptown address based only on the street name.
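The same check can be expressed outside Palladio: group the rows by their coordinates, then look for points that carry several different addresses. The Python below is a sketch with invented addresses and coordinates; the function name parse_coordinates and the variable names are hypothetical.

```python
from collections import defaultdict

# Invented records for illustration only.
records = [
    {"buyer_address": "1 Example Ave", "coordinates": "40.75,-73.98"},
    {"buyer_address": "25 Example Ave", "coordinates": "40.75,-73.98"},
    {"buyer_address": "300 Example Ave", "coordinates": "40.75,-73.98"},
    {"buyer_address": "7 Sample St", "coordinates": "40.71,-74.00"},
]


def parse_coordinates(text):
    """Parse a 'lat,lon' string into a (lat, lon) tuple of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


addresses_by_point = defaultdict(set)
sales_by_point = defaultdict(int)
for row in records:
    point = parse_coordinates(row["coordinates"])
    addresses_by_point[point].add(row["buyer_address"])
    sales_by_point[point] += 1  # this count is what drives the point size

# Points shared by several distinct addresses may have been geocoded only
# to the street level rather than to the exact building.
suspicious = {
    point: sorted(addresses)
    for point, addresses in addresses_by_point.items()
    if len(addresses) > 1
}
print(suspicious)
```

A point with many sales but a single address is probably a genuinely frequent buyer, whereas a point with many different house numbers on one street is a candidate geocoding fallback.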
Click on the hamburger icon at upper left to re-open the settings box, and then click on the pencil/edit icon next to the “Sales Locations” layer. Experiment with changing the tooltip to display the buyers instead, or one of the other variables in our dataset.
Following the same process we used to facet the Table view, let’s filter our map by facets. Try buyer first. Palladio will count up how many sales for each buyer are in the data set. Click on a single entry (like “Hilton, Henry”) to filter the gallery to only display those. You’ll notice some buyers show up with multiple addresses.
You also may notice that some addresses haven’t been properly geocoded by the automated service. Visualizing data like this can be a great way to catch data errors that are hard to spot in a spreadsheet.
Click on the red trash basket icon on the lower right to dismiss the facet filter. (If the data visualization doesn’t update automatically, try zooming in and out a bit and that will usually force it to update.)
Using the same timeline filter that we did with the Table view, try selecting one decade of sales on the timeline to filter the map down to that decade. You can drag the selection box along the timeline to dynamically update the map as you move from 1870 to 1970. Take note of the changing geographic distribution of buyers as you do this.
One of Knoedler’s biggest impacts on the history of fine arts in the U.S. was how they funneled old master paintings from European sellers into American collections. We can use network visualizations to get a sense of what buyers were connected to what sellers, and how the shape of the “Knoedler network” changed over time. (As always, though, we need to bear in mind that this slice of the data set only contains their New York buyers!)
Click on the “Graph” option. We need to specify the variables for the source and target dimensions - use seller and buyer, respectively.
Don’t be surprised if your computer suddenly slows down for a few seconds: this is a very large network! It won’t be helpful for us to try and look at the whole thing at once, so before continuing, click on the “Timeline” filter at the bottom of the screen and drag-select only a few years of relationships to show at one time. Back in the settings menu at the upper right, check the “Highlight” box for “source” (so our sellers get highlighted) and also check “Size nodes”, which will adjust the size of each buyer/seller circle based on the total number of connections they have.
The resulting network from just a few years of sales should be much more manageable. Take a close look at the particularly prolific sellers - the entities that Knoedler purchased their stock from. What do you notice about the names? You’ll find quite a few of them on Wikipedia - notice that they generally aren’t single individuals, but are other art dealers!
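Under the hood, a network like this is a list of source-target pairs, and sizing nodes by connections is a degree count. The Python sketch below uses invented names and makes one explicit assumption: repeated sales between the same seller and buyer are collapsed into a single link before counting. Whether Palladio counts repeated links the same way is not something this sketch can confirm.

```python
from collections import Counter

# Invented records for illustration only.
records = [
    {"seller": "Dealer A", "buyer": "Collector X"},
    {"seller": "Dealer A", "buyer": "Collector Y"},
    {"seller": "Dealer B", "buyer": "Collector X"},
    {"seller": "Dealer A", "buyer": "Collector X"},  # a repeat of the first link
]

# Assumption: each distinct seller-buyer pair counts as one connection.
edges = {(row["seller"], row["buyer"]) for row in records}

degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

sources = {source for source, _ in edges}  # the nodes that "Highlight" would mark
print(sorted(edges))
print(degree.most_common())
```

Swapping "seller" for another field such as artists or artist_nationality changes what the source nodes represent without changing the logic.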
Like we did with the map, see if you can find any patterns when filtering by time. You can also try graphing other types of entity relationships. For example, try setting the “Source” field to artists or artist_nationality while keeping the “Target” field set to buyer. This can give an impression of how different buyers may have targeted their purchases - or how Knoedler may have steered their assets to different parts of the market.
Finally, you may notice “Knoedler’s” is in this network… but if this is only a network of people/institutions Knoedler bought from and then sold to, why would they appear in the network? The answer can only be found by understanding the archival context:
Because of their many branches, Knoedler often makes entries that appear as though it’s buying from itself! Depending on what kinds of questions we are asking, we might end up filtering out these sales from our data… or choose to inspect those records even more closely. Use a “Facet” by seller to try and show all sales except the ones where “Knoedler’s” is the seller.
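Excluding one facet value is the mirror image of selecting it. As a minimal Python sketch with invented records, the filter below separates the rows where the seller is the gallery itself from the rest. The exact label stored in the seller field (including the style of apostrophe) is an assumption here; check the value shown in Palladio's facet list before relying on it.

```python
# Assumed label for the gallery's own entries; verify it against the real data.
SELF_LABEL = "Knoedler's"

# Invented records for illustration only.
records = [
    {"seller": "Knoedler's", "buyer": "Collector X"},
    {"seller": "Dealer A", "buyer": "Collector Y"},
    {"seller": "Dealer B", "buyer": "Collector X"},
]

# Keep both subsets: the internal transfers may deserve a closer look of their own.
internal_transfers = [row for row in records if row["seller"] == SELF_LABEL]
external_purchases = [row for row in records if row["seller"] != SELF_LABEL]

print(len(internal_transfers), "internal,", len(external_purchases), "external")
```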
This dataset includes seller_type and buyer_type to distinguish between individual collectors and artists vs. entities like dealers or museums. Try creating a network with the source being seller_type and the target being buyer_type. Try visualizing the changing network over time using the timeline facet. What do you notice about the changing preponderance of different buyer or seller types over time?
Try graphing buyer against artist, artist_nationality, or genre - can you find collectors who seemed to buy lots of one particular nationality or genre? Conversely, can you find collectors who bought a very diverse set of works?
Although you cannot export interactive visualizations from Palladio, you can save static images based on your representations. In the Settings menus for any of the visualizations, click the “Download” button to generate a .svg image of your visualization.
Make sure to go back over the first set of reflection questions and add any further observations you had during this exercise about the dataset and what it shows vs. what it omits.
With the right data set, Palladio can also display images, and do some fancy things like overlaying networks onto maps and joining multiple data tables together. Optionally, you can experiment with these possibilities by using the Amsterdam depictions dataset from an earlier version of this workshop. This dataset includes links to images, which will let you try out Palladio’s “Gallery” view, which can be very useful for examining subsets of your data and bringing your analyses back to the artwork.
Finally, remember that Palladio is a tool expressly designed for initial data explorations. Stanford’s Humanities+Design lab specifically intends it to be the starting point of a project, after which you move into more specialized tools. Palladio has some pretty strict limits:
Dr. Sandra van Ginhoven led this work, relying predominantly on Markus A. Denzel’s Handbook of World Exchange Rates, 1590-1914 (Burlington: Ashgate, 2010) for currency conversion, and Samuel H. Williamson’s resource “Annual Consumer Price Index for the United States, 1774-2014,” (2016) https://www.measuringworth.com/uscpi/result.php to adjust prices based on US CPI.