the special problems of creating a set of high-quality encodings of Joyce's Ulysses

2015-10-17

timfinnegan
12:32:48 PM

for Ulysses, we need: newspaper-headlines, dialog-speaker, stage-directions, song-lyrics, quote-dash (so css can restyle it), various other odd indents…


timfinnegan
12:34:35 PM

we want parallel versions in HTML, CSS, ascii, ASCIIDOC, epub, pdf?


literature_geek
07:34:39 PM

@timfinnegan: Are these specifically just things that get special typographical treatment, or are there other types of things (e.g. interior monologue?) you’d like encoded?


timfinnegan
08:16:54 PM

the real priority is typos and italics, everything else is gravy… but eventually we can try to systematize eg styles of indenting


2015-10-19

timfinnegan
04:12:00 AM

i just redownloaded Sigil to edit zeigermann’s variorum: http://www.mobileread.com/forums/showthread.php?t=248272


timfinnegan
04:15:11 AM

i’m thinking we should be able to export all five variants in some set of formats: plain ascii, ASCIIDOC, plain html, css-html? and post them somewhere like github(?) where they can be collectively tweaked


timfinnegan
04:16:36 AM

zeigermann sounds like he’s the best proofreader in town


timfinnegan
04:19:34 AM

i’m not seeing where he marks the typos? do we have to do some kind of file-compare command in sigil?


timfinnegan
04:34:49 AM

i just invited zeigermann to join us here


timfinnegan
05:13:22 AM

(downloaded Calibre too, still looking for ‘diff’ command) once we have five versions in some convenient format, we should be able to generate ‘repairkits’ that specify how they all differ
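
One possible shape for such a 'repairkit', sketched in Python with the standard-library difflib module. This is only an illustration: the function name `repairkit` and the word-level granularity are assumptions, not anything Sigil or Calibre provides.

```python
import difflib


def repairkit(old_text, new_text):
    """List the word-level changes that turn old_text into new_text.

    Each entry is (word_index_in_old, old_words, new_words).
    Hypothetical helper: the name and output format are illustrative only.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    kit = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record only the stretches where the two editions disagree.
            kit.append((i1, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return kit
```

Comparing episode by episode rather than whole book by whole book should keep each kit small enough to read by eye.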


timfinnegan
05:17:16 AM

z5 embeds 1922 pagenumbers with links to the other four, i hope we can strip these, and use regular expressions to massage the epub/html
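
A rough sketch of the regex idea in Python. The actual z5 markup for the page numbers has not been checked here, so the `class="pagenum"` anchor below is purely a hypothetical stand-in; the pattern would need adjusting to whatever the epub really contains.

```python
import re

# ASSUMPTION: page numbers are wrapped in <a class="pagenum" ...>...</a>.
# This class name is hypothetical, not taken from the z5 files.
PAGENUM = re.compile(r'<a\b[^>]*\bclass="pagenum"[^>]*>.*?</a>', re.DOTALL)


def strip_pagenums(html):
    """Remove every page-number anchor, keeping the surrounding text."""
    return PAGENUM.sub("", html)
```

Regular expressions are fragile on nested markup, so this only suits flat, predictable anchors like the assumed one.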


timfinnegan
05:21:41 AM

i see css classes like: noindd, stage, spk, centerh


timfinnegan
05:29:14 AM

for the simplest .txt variant, if we impose a line length we can use spaces to indent blockquotes
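
A small sketch of the fixed-line-length idea, using Python's standard textwrap module. The 72-column width and the helper name `wrap_paragraph` are assumptions chosen for illustration.

```python
import textwrap


def wrap_paragraph(text, width=72, indent=0):
    """Re-wrap one paragraph to a fixed width, indenting every line with spaces.

    The width includes the indent, so a blockquote never runs past the margin.
    """
    pad = " " * indent
    return textwrap.fill(
        " ".join(text.split()),  # collapse existing line breaks and runs of spaces
        width=width,
        initial_indent=pad,
        subsequent_indent=pad,
    )
```

Ordinary paragraphs would use `indent=0` and blockquotes something like `indent=4`.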


timfinnegan
05:31:46 AM

there’s only one online concordance i know of, that seems like it’s written in COBOL


timfinnegan
05:32:48 AM

every joycean needs a clean single file on their desktop that they can do string searches in


timfinnegan
06:29:03 AM

in circe p412-413 there’s a sneaky problem with a 3-paragraph stage direction. z5 chooses not to italicise the parens for some reason, so it gets especially messy here


timfinnegan
06:38:33 AM

z5 classes include: ‘noindd’ = no indent dialog (always emdash w/o any spaces?); ‘stage’ = stage directions; ‘spk’ = speaker for Circe; ‘centerh’ = centered headline for Aeolus; ‘stanza’; ‘center’


timfinnegan
06:44:22 AM

i just noticed that italic phrases seem to require closing a doublequote in a paragraph, listing the italic phrase, then reopening the doublequote? …hmmm, the doublequotes are added whenever the <p> and </p> are on separate lines… is this an epub thing?


timfinnegan
07:01:19 AM

txt and asciidoc can both use underline-style_italics. i don’t understand asciidoc’s blockquote command yet http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/


timfinnegan
07:01:50 AM

oops, slack converted my underlines


timfinnegan
07:17:05 AM

did i read somewhere that http://archive.org doesn’t like multiple editions of the same text? eg only 1922 not 1984 also? and couldn’t they handle the format-conversions if they did accept the z5 editions from us?


timfinnegan
09:21:12 PM

fumbling in Calibre i converted z5 to 7MB plain txt… maybe there’s a setting that preserves italics?


2015-10-20

timfinnegan
03:26:28 AM

…indeed, exporting txt as ‘markdown’ renders italics as asterisks (it also adds backslashes before lots of punctuation marks: *(Thickveiled, a crimson halter round her neck, a copy of the* Irish Times *in her hand, in tone of reproach, pointing.)* Henry! Leopold! Lionel, thou lost one! Clear my name.)


timfinnegan
03:29:24 AM

(urk, i added my own backslashes before the asterisks so slack wouldn’t render them as bold, but now slack prints my backslashes)


timfinnegan
03:33:14 AM

anyway, the regexps to convert markdown to txt-with-underline-style-italics don’t look too hard… in fact, they’re trivial if we don’t replace every ‘italic-space’ with an underline
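
A minimal sketch of those regexps in Python. It assumes the export uses single asterisks for italics and a backslash before escaped punctuation, as in the sample above; the function name is made up for illustration, and internal spaces are left alone rather than turned into underscores.

```python
import re

# A single asterisk that is neither escaped nor part of a ** bold marker.
ITALIC = re.compile(r'(?<![\\*])\*(?!\*)(.+?)(?<![\\*])\*(?!\*)', re.DOTALL)
# A backslash placed in front of a punctuation mark.
ESCAPE = re.compile(r'\\([^\w\s])')


def markdown_to_underscores(text):
    """Turn *italic phrase* into _italic phrase_ and drop backslash escapes."""
    text = ITALIC.sub(r'_\1_', text)
    return ESCAPE.sub(r'\1', text)
```

The italics are converted first so that an escaped asterisk is never mistaken for an italic marker.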


timfinnegan
03:36:00 AM

so i’m gonna dive in and start massaging this 7MB txt file with all five editions: removing pagenumbers, exploring spacebar-style-indent-and-center…


timfinnegan
04:01:34 AM

the speakers in Circe export as all-lowercase, grrr


timfinnegan
05:59:45 AM

Calibre will export ‘htmlz’, which unzips to a 10MB html file if you rename the ‘.htmlz’ extension to ‘.zip’. The html has 23 special classes (like ‘calibre23’), but search-and-replace leaves pretty promising html markup.
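
Since an htmlz file is an ordinary zip archive, a script can read it without renaming anything. A sketch with Python's standard zipfile module; the helper name and the list of extensions are assumptions, and the same approach should work for the html files inside an epub.

```python
import zipfile


def read_html_members(path_or_file):
    """Return {member_name: text} for every HTML file inside a zip archive.

    Works on a filename or an open binary file object; the extension of the
    archive itself (.htmlz, .epub, .zip) does not matter to zipfile.
    Assumes the HTML members are UTF-8 encoded.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.lower().endswith((".html", ".htm", ".xhtml"))
        }
```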


timfinnegan
07:10:13 AM

Windows Notepad++ freezes with the 10MB html, but seems okay once I split it in two.


timfinnegan
04:02:57 PM

here’s a survey of the encodings available now: http://ulyssespages.blogspot.com/2015/10/ulysses-etexts.html


timfinnegan
04:07:26 PM

if we can do any single edition with all typos corrected, using a minimal html that will allow others to restyle it simply… and then offer ‘repairkits’ so people can examine the debatable variants and customize their own however they want…?


timfinnegan
04:08:54 PM

and then export whatever other formats people seem to want (txt, asciidoc, pdf)


timfinnegan
02:54:53 AM

questions for css experts: 1) can a stylesheet set the width of the spacing after a dialog-dash? (some editions have a space, some not)


timfinnegan
02:56:16 AM

2) can a stylesheet consistently omit the dialog dashes before spoken stanzas? (the html file would have the dashes, but would be specially tagged so the designer could ideally choose not to display them)


timfinnegan
02:58:10 AM

3) stanzas are left-aligned but indented different amounts depending on the length of the longest line, which seems approximately centered– is this do-able?


2015-10-21

timfinnegan
03:52:33 AM

z5 doesn’t italicize the parens around stage directions, just the stage directions themselves. i’m always conflicted about similar punctuation quandaries– eg quotes and exclamation marks. is there a consensus of designers? #textencoding


timfinnegan
04:44:21 AM

joyce obviously knew he’d be putting his typesetters thru their paces, with embedded correspondence, addresses, and ads (eg Plumtree’s). he might even have foreseen our aristotelian efforts to formally classify these.


timfinnegan
02:32:12 PM

css worst case? some people think the opening line of finnegans wake should be indented to where the last line of the last chapter stops (because they form a continuous sentence)


2015-10-22

timfinnegan
05:10:58 AM

i’ve finished a firstdraft overview of the formatting differences between four editions currently available to me: http://ulyssespages.blogspot.com/2015/10/ulysses-etexts.html


timfinnegan
05:15:42 AM

mulligan’s play in ch9, and bloom’s budget in ch17 are special problems; the rest is pretty much indents and aligns and whitespace


timfinnegan
05:20:46 AM

it’s pretty clear my dream is impractical, of a Perfect Markup that allows designers to pick and choose design variants. but some simplification of this must be practical.


timfinnegan
02:44:39 AM

i’m looking at ‘tei lite’ and find: l (verse line), lg (line group), sp (speech), speaker, and stage which make sense


timfinnegan
02:51:36 AM

can these all be implemented as classes of ‘p’s? if each ‘l’ is a ‘p’ can an ‘lg’ be a ‘p’ too? (i’m guessing not)


timfinnegan
02:55:51 AM

the alternative is to use br’s within lg’s… but once you open that door the whole book becomes an lg and every linebreak a br


2015-10-23

timfinnegan
03:14:39 AM

circe is one of the easiest episodes to mark up, i think. sp, speaker, and stage can all be classes of p with the line spacing adjustable by the stylesheet.


timfinnegan
03:19:20 AM

one stage direction (p412) is split into 3 paragraphs, which don’t work as 3 separate stage directions, so are we left with br’s again? what if we want to indent the 1st line of each???


timfinnegan
03:28:22 AM

(is it possible joyce specifically intended to subvert genre by systematically embedding every class of paragraph within every other?)


timfinnegan
03:30:37 AM

there’s something esthetically self-defeating about trying to fit art into hierarchies like tei (surely this has been argued to death somewhere?)


timfinnegan
03:42:28 AM

if joyce had written ulysses as hypermedia, with an infinite budget to recreate visuals… but no, he knew his strength was the sound of words, not their look… it’s impossible to imagine any human ever with a mastery of visual media comparable to joyce’s godlike mastery of language, precisely assigning the best color to every individual pixel…


timfinnegan
03:46:39 AM

so the #1 priority needs to be getting the sound right, not introducing any novelties that modify joyce’s expression of the sound…?


timfinnegan
04:09:38 AM

bloom’s budget p664 must be read linearly but typeset as columns. mulligan’s play at p208 needs to read right, but trying to approximate that extralarge curly bracket in media that don’t support it is misguided if the approximation disrupts the sound, eg: https://en.wikisource.org/wiki/Page:Ulysses,_1922.djvu/211


timfinnegan
04:16:50 AM

i think the best solution for p208 is to drop the bracket and move the two names onto one line. similarly for p664, putting column two after/below column one is the best you can do in some media.


timfinnegan
04:41:05 AM

other places where the original alignment may not be possible/convenient: milly’s letter p64, martha’s envelope p69 and letter p75, the leading ellipses on p133, the chant at p189, and the elizabethan indents at p194-195… but that may be all– the rest is simple css


timfinnegan
10:14:41 AM

this helps: <style> .right {position: absolute; right: 10%; width: 300px; }</style>


timfinnegan
11:19:19 AM

‘dot leaders’ are being added to css, but there’s a workaround: http://www.w3.org/Style/Examples/007/leaders.en.html


timfinnegan
11:21:49 AM

…a messy workaround: “The pseudo-element fills the whole width of the list item with dots and the SPANs are put on top. A white background on the SPANs hides the dots behind them and an ‘overflow: hidden’ on the UL ensures the dots do not extend outside the list”


timfinnegan
11:30:14 AM

are html tables okay for bloom’s budget? http://www.w3schools.com/html/html_tables.asp


timfinnegan
11:38:28 AM

if the verse on p133 is done as a 3-line, one-column table, and right-justified, will extra dots to the left just politely vanish?


timfinnegan
06:30:48 PM

this looks promising:


timfinnegan
06:32:33 PM

.outer-div {width:60%; margin-left:auto; margin-right:auto; text-align:right; overflow:hidden; white-space: nowrap;} .inner-div {float:right;}

<div class="outer-div"> <div class="inner-div">
. . . . . . . . . . . . . . . . . . . . . . . . . . la tua pace<br /> . . . . . . . . . . . . . . . . . . . . che parlar ti piace<br /> . . . . . . . . . . . mentreche il vento, come fa, si tace.<br /> </div></div>


timfinnegan
06:34:45 PM

(haven’t a clue how it works though)


2015-10-26

timfinnegan
06:04:42 AM

i always feel like i’m taking my life in my hands when i mess with blogger’s css, but here’s a live version of the above: http://ulyssespages.blogspot.com/2014/08/page-133-7706-736-psha-thereof.html


timfinnegan
06:05:52 AM

i’m almost afraid to ask for feedback whether it looks wrong on other platforms


literature_geek
08:47:20 AM

the dots cut off nicely at the same place on my mac/latest firefox


2015-10-28

literature_geek
10:41:55 AM

@timfinnegan: Just noticed you brought Infinite Ulysses’ annotations over the one-thousand mark. Thank you! (I have all of next week dedicated to work on the site, so I should be able to finish some longstanding to-do tasks there…)


timfinnegan
12:32:42 PM

i’ve gotten bogged down trying to sort out the very messy clues to the sequence of the blooms’ residences


2015-10-31

timfinnegan
06:48:52 PM

here’s markdown for screenplays, for the txt editions? http://fountain.io/syntax


2015-11-01

timfinnegan
04:19:14 AM

ulysses’ toplevel has 3-12-3 episodes, but several episodes require distinct markup: ep10 has 19 subsections, with crossreferenced insertions, including an insertion from ep11


timfinnegan
04:25:02 AM

ep15 is a script, and ep9 also includes some script. ep7 has headlines. ep17 has q&a. there are several embedded documents, eg letters and ads. most eps have dialog and lyrics and interior monolog and points of view and geographical locations…


timfinnegan
04:26:42 AM

joyce almost seems intentionally to have made the text encoding as difficult as possible


timfinnegan
04:27:03 AM

…but rewarding, for all that


2015-11-07

timfinnegan
03:42:02 AM

it sounds like http://juxtasoftware.org could make it easier to study the differences in the z5 texts…. i’m still floundering on getting html out of the z5 epub, but beautifulsoup may help. comparing episode-by-episode is bound to be easier than whole-book by whole-book. but if juxta needs xml how hard is that conversion?
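
For getting comparable plain text out of the html, here is a sketch that uses Python's built-in html.parser as a stand-in for beautifulsoup (a deliberate swap, so that nothing needs installing). It assumes every paragraph is a non-nested `<p>` element, which has not been verified against the z5 files; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element, ignoring inline tags like <i>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None means we are outside any <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the pieces and normalise whitespace within the paragraph.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def html_to_paragraphs(html):
    """Return the document's paragraphs as a list of plain-text strings."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Writing one paragraph per line for each episode would give text files that a comparison tool can take directly, though italics and class names are lost along the way.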


literature_geek
09:29:32 AM

I’ve used Juxta on two versions of Ulysses (plain text, not xml) and liked it.


2015-11-09

timfinnegan
05:46:18 AM

here’s a 60pg 2004 survey of the ulysses-editions problem: https://www.academia.edu/2531525/Ulysses_in_the_Plural_The_Variable_Editions_of_Joyces_Novel


literature_geek
08:29:53 AM

@timfinnegan: Cool! This is very useful.