Matthew Lincoln, PhD Cultural Heritage Data & Info Architecture


Jehna has come up with the CoffeeScript of regular expressions: VerbalExpressions, a JavaScript library (already implemented in a host of other languages, including Ruby) that makes regex almost human-writeable. (tip: The Changelog via Dave)

This should make it much easier to parse datasets with idiosyncratic conventions. Here is how the Ruby implementation handles the location headings in the ULAN, which annoyingly concatenate unique ID numbers with preferred terms:

require 'verbal_expressions'

location = "5600392409/New York City (New York state, United States) (inhabited place)"

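# Match everything from the start of the line up to the first slash (the ID number)
num_query = VerEx.new do
	start_of_line
	anything_but "/"
end

puts num_query.source # => ^(?:[^/]*)

# Match the slash, then everything up to the first opening parenthesis (the preferred term)
content_query = VerEx.new do
	find "/"
	anything_but "("
end

puts content_query.source # => (?:/)(?:[^\(]*)

puts location.slice(num_query) # => 5600392409
# The raw match keeps the leading slash and a trailing space, so trim both
puts location.slice(content_query)[1..-1].strip # => New York City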
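
For comparison, here is a rough sketch of the same split written as a single plain Ruby regular expression, with no gem involved. The helper name parse_location_heading and the constant HEADING_PATTERN are made up for illustration; the pattern only assumes that a heading looks like the example above, with an ID, a slash, and then a term that runs up to the first opening parenthesis.

```ruby
# Plain-Ruby sketch of the same split; no gem required.
# HEADING_PATTERN and parse_location_heading are hypothetical names for this example.
HEADING_PATTERN = %r{\A(?<id>[^/]*)/(?<term>[^(]*)}

def parse_location_heading(heading)
  match = HEADING_PATTERN.match(heading)
  return nil unless match # heading has no slash, so it does not fit the assumed shape

  # The term capture runs up to the first "(", so strip the trailing space
  { id: match[:id], term: match[:term].strip }
end

location = "5600392409/New York City (New York state, United States) (inhabited place)"
parsed = parse_location_heading(location)

puts parsed[:id]   # => 5600392409
puts parsed[:term] # => New York City
```

The raw pattern is shorter, but it is also the kind of thing that is hard to reread a month later, which is the gap VerbalExpressions is aiming at.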

Cite this post:

Lincoln, Matthew D. "VerbalExpressions." Matthew Lincoln, PhD (blog), 06 Aug 2013,

Tagged in: Code