Matthew Lincoln, PhD Cultural Heritage Data & Info Architecture

VerbalExpressions

Jehna has come up with the CoffeeScript of regular expressions: VerbalExpressions, a JavaScript library (already ported to a host of other languages, including Ruby) that makes regular expressions almost human-writable. (tip: The Changelog via Dave)
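
To make the idea concrete: judging from the patterns the Ruby port prints, each readable method call appears to append an escaped, non-capturing fragment to a growing pattern. The toy class below mimics three of those methods in plain Ruby. TinyVerEx is a hypothetical name, and this is a sketch of the concept, not the gem's actual implementation.

```ruby
# Hypothetical toy builder illustrating the VerbalExpressions idea.
# This is NOT the verbal_expressions gem's code.
class TinyVerEx
  def initialize
    @parts = []
  end

  # Anchor the match at the start of a line.
  def start_of_line
    @parts << "^"
    self
  end

  # Match the given literal text exactly, with regex metacharacters escaped.
  def find(text)
    @parts << "(?:#{Regexp.escape(text)})"
    self
  end

  # Match zero or more characters that are not in the given set.
  def anything_but(chars)
    @parts << "(?:[^#{Regexp.escape(chars)}]*)"
    self
  end

  # The assembled pattern string.
  def source
    @parts.join
  end

  # Compile the assembled pattern into a real Regexp.
  def to_regexp
    Regexp.new(source)
  end
end

num = TinyVerEx.new.start_of_line.anything_but("/")
puts num.source                                      # => ^(?:[^/]*)
puts "5600392409/New York City".slice(num.to_regexp) # => 5600392409

content = TinyVerEx.new.find("/").anything_but("(")
puts content.source                                  # => (?:/)(?:[^\(]*)
```

Returning self from every method is what allows the calls to be chained into something that reads like a sentence.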

VerbalExpressions should make it much easier to parse datasets with idiosyncratic conventions. Here is how the Ruby implementation handles the location headings in the ULAN (the Getty's Union List of Artist Names), which annoyingly concatenate unique ID numbers with preferred terms:

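require 'verbal_expressions'

# A ULAN location heading: an ID number, a slash, then the preferred term and qualifiers
location = "5600392409/New York City (New York state, United States) (inhabited place)"

# Everything from the start of the line up to the first "/": the ID number
num_query = VerEx.new do
	start_of_line
	anything_but "/"
end

puts num_query.source # => ^(?:[^/]*)

# A literal "/" followed by everything up to the first "(": the preferred term
content_query = VerEx.new do
	find "/"
	anything_but "("
end

puts content_query.source # => (?:/)(?:[^\(]*)

puts location.slice(num_query) # => 5600392409
# The raw match is "/New York City " (leading slash, trailing space), so trim both
puts location.slice(content_query).delete_prefix("/").strip # => New York City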
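
If you would rather not add a dependency, the same two fields can also be pulled out with a single hand-written Ruby regex using named captures. This is a sketch, not part of VerbalExpressions: the constant name HEADING is hypothetical, and it assumes the ID never contains a "/". The lazy quantifier on the name, followed by \s*, keeps the space before the first parenthesis out of the captured term.

```ruby
# Hypothetical alternative to the VerEx queries: one regex with named
# captures that returns both the ID and the preferred term in a single match.
HEADING = %r{
  \A
  (?<id>[^/]*)     # everything before the first slash: the ID number
  /                # the separating slash itself, left out of both captures
  (?<name>[^(]*?)  # the preferred term, matched lazily
  \s*              # any whitespace before the qualifiers, kept out of name
  (?:\(|\z)        # stop at the first open paren or at the end of the string
}x

location = "5600392409/New York City (New York state, United States) (inhabited place)"

match = HEADING.match(location)
puts match[:id]   # => 5600392409
puts match[:name] # => New York City
```

This gives up some of VerbalExpressions' readability, though the x flag's inline comments win a little of it back, and both fields come out of one pass without any post-match trimming.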



Cite this post:

Lincoln, Matthew D. "VerbalExpressions." Matthew Lincoln, PhD (blog), 06 Aug 2013, https://matthewlincoln.net/2013/08/06/verbalexpressions.html.


Tagged in: Code