VerbalExpressions
06 Aug 2013
Jehna has come up with the CoffeeScript of regular expressions: VerbalExpressions, a JavaScript library (already implemented in a host of other languages, including Ruby) that makes regex almost human-writeable. (Tip: The Changelog, via Dave.)
This should make it much easier to parse datasets with idiosyncratic conventions. Here is how the Ruby implementation handles the location headings in the ULAN (Union List of Artist Names), which annoyingly concatenate unique ID numbers with preferred terms:
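require 'verbal_expressions'

location = "5600392409/New York City (New York state, United States) (inhabited place)"

# Match everything from the start of the line up to the first slash: the ID number.
num_query = VerEx.new do
  start_of_line
  anything_but "/"
end
puts num_query.source # => ^(?:[^/]*)

# Match the slash and everything after it up to the first opening parenthesis.
content_query = VerEx.new do
  find "/"
  anything_but "("
end
puts content_query.source # => (?:/)(?:[^\(]*)

puts location.slice(num_query) # => 5600392409
# The match includes the leading "/" and the space before "(", so strip both.
puts location.slice(content_query).delete_prefix("/").strip # => New York City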
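For comparison, here is a rough sketch of the same split written as a single hand-rolled regular expression in plain Ruby, with no gem required. It is not part of VerbalExpressions; the names `HEADING_PATTERN` and `split_heading` are made up for this illustration, and it assumes every heading follows the "ID/term (qualifiers)" shape shown above.

```ruby
# Hypothetical helper for illustration only; not part of the verbal_expressions gem.
# Named groups: everything before the first slash is the ID,
# everything after it up to the first "(" is the preferred term.
HEADING_PATTERN = %r{\A(?<id>[^/]*)/(?<term>[^(]*)}

# Returns [id, term], or nil when the heading contains no slash.
def split_heading(heading)
  match = HEADING_PATTERN.match(heading)
  return nil unless match

  # strip removes the space that sits between the term and "(".
  [match[:id], match[:term].strip]
end

id, term = split_heading("5600392409/New York City (New York state, United States) (inhabited place)")
puts id   # => 5600392409
puts term # => New York City
```

The hand-written pattern is shorter, but it is also exactly the kind of punctuation soup that the VerbalExpressions blocks spell out step by step.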
Lincoln, Matthew D. "VerbalExpressions." Matthew Lincoln, PhD (blog), 06 Aug 2013, https://matthewlincoln.net/2013/08/06/verbalexpressions.html.