ulysses-textencoding
the special problems of creating a set of high-quality encodings of Joyce's Ulysses
2015-10-17
for Ulysses, we need: newspaper-headlines, dialog-speaker, stage-directions, song-lyrics, quote-dash (so css can restyle it), various other odd indents…
we want parallel versions in HTML, CSS, ascii, ASCIIDOC, epub, pdf?
@timfinnegan: Are these specifically just things that get special typographical treatment, or are there other types of things (e.g. interior monologue?) you’d like encoded?
the real priority is typos and italics, everything else is gravy… but eventually we can try to systematize eg styles of indenting
2015-10-19
i just redownloaded Sigil to edit zeigermann’s variorum: http://www.mobileread.com/forums/showthread.php?t=248272
i’m thinking we should be able to export all five variants in some set of formats: plain ascii, ASCIIDOC, plain html, css-html? and post them somewhere like github(?) where they can be collectively tweaked
zeigermann sounds like he’s the best proofreader in town
i’m not seeing where he marks the typos? do we have to do some kind of file-compare command in sigil?
i just invited zeigermann to join us here
(downloaded Calibre too, still looking for ‘diff’ command) once we have five versions in some convenient format, we should be able to generate ‘repairkits’ that specify how they all differ
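a rough python sketch of the 'repairkit' idea, assuming each edition is already in memory as plain text with matching line breaks. the function name and the edition labels are made up for illustration, and this uses python's stdlib difflib rather than anything in sigil or calibre:

```python
import difflib

def repairkit(text_a, text_b, name_a="edition_a", name_b="edition_b"):
    """Return unified-diff lines showing how text_b differs from text_a."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # n=0 drops the context lines, so only the differing lines are listed
    return list(difflib.unified_diff(
        lines_a, lines_b,
        fromfile=name_a, tofile=name_b,
        lineterm="", n=0,
    ))

if __name__ == "__main__":
    # invented sample lines, not real variants
    a = "a line that agrees\na line with a tpyo"
    b = "a line that agrees\na line with a typo"
    print("\n".join(repairkit(a, b)))
```

lines starting with '-' come from the first edition and '+' from the second, so one kit per pair of editions would cover all five.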
z5 embeds 1922 pagenumbers with links to the other four, i hope we can strip these, and use regular expressions to massage the epub/html
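a minimal sketch of stripping the pagenumber links with a regexp. big assumption: i haven't confirmed how z5 actually marks them up, so the `pagenum` class and the sample anchor below are hypothetical and the pattern will need adjusting to the real epub:

```python
import re

# ASSUMPTION: each page marker is an <a> tag carrying class="pagenum", eg
# <a id="p412" class="pagenum" href="...">[412]</a>
# the real z5 markup may differ; inspect the epub and adjust the pattern.
PAGENUM = re.compile(r'<a\b[^>]*\bclass="pagenum"[^>]*>.*?</a>', re.DOTALL)

def strip_pagenums(html):
    """Delete every assumed page-number anchor, leaving all other text as is."""
    return PAGENUM.sub("", html)

if __name__ == "__main__":
    sample = '<p>foo <a id="p412" class="pagenum" href="other.html#p412">[412]</a>bar</p>'
    print(strip_pagenums(sample))
```

ordinary links without that class are left alone, since the pattern can't reach past the end of the opening tag.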
i see css classes like: noindd, stage, spk, centerh
for the simplest .txt variant, if we impose a line length we can use spaces to indent blockquotes
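here's roughly what that could look like with python's textwrap, as a sketch only. the 72-column width and 4-space indent are arbitrary placeholders, not anything we've agreed on:

```python
import textwrap

def indent_blockquote(paragraph, width=72, indent=4):
    """Rewrap one paragraph to `width` columns, every line indented by spaces."""
    pad = " " * indent
    # width counts the indent too, so no line ends up longer than `width`
    return textwrap.fill(
        paragraph, width=width,
        initial_indent=pad, subsequent_indent=pad,
    )

if __name__ == "__main__":
    print(indent_blockquote("word " * 40, width=40))
```

the same call with indent=0 would rewrap ordinary paragraphs, so the whole file shares one line length.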
there’s only one online concordance i know of, that seems like it’s written in COBOL
every joycean needs a clean single file on their desktop that they can do string searches in
in circe p412-413 there’s a sneaky problem with a 3-paragraph stage direction. z5 chooses not to italicise the parens for some reason, so it gets especially messy here
z5 classes include: ‘noindd’ = no indent dialog (always emdash w/o any spaces?); ‘stage’ = stage directions; ‘spk’ = speaker for Circe; ‘centerh’ = centered headline for Aeolus; ‘stanza’; ‘center’
i just noticed that italic phrases seem to require closing a doublequote in a paragraph, listing the italic phrase, then reopening the doublequote? …hmmm, the doublequotes are added whenever the <p> and </p> are on separate lines… is this an epub thing?
txt and asciidoc can both use underline-style_italics. i don’t understand asciidoc’s blockquote command yet http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/
oops, slack converted my underlines
did i read somewhere that http://archive.org doesn’t like multiple editions of the same text? eg only 1922 not 1984 also? and couldn’t they handle the format-conversions if they did accept the z5 editions from us?
fumbling in Calibre i converted z5 to 7Mb plain txt… maybe there’s a setting that preserves italics?
2015-10-20
…indeed, exporting txt as ‘markdown’ renders italics as asterisks (it also adds backslashes before lots of punctuation marks: *(Thickveiled, a crimson halter round her neck, a copy of the* Irish Times *in her hand, in tone of reproach, pointing.)* Henry! Leopold! Lionel, thou lost one! Clear my name.)
(urk, i added my own backslashes before the asterisks so slack wouldn’t render them as bold, but now slack prints my backslashes)
anyway, the regexps to convert markdown to txt-with-underline-style-italics don’t look too hard… in fact, they’re trivial if we don’t replace every ‘italic-space’ with an underline
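a sketch of the two regexps, assuming calibre's markdown export marks italics with single asterisks and puts a backslash before escaped punctuation (which punctuation it escapes is a guess). spaces inside an italic run are left alone, which is the trivial version:

```python
import re
import string

# an asterisk NOT preceded by a backslash opens or closes an italic run;
# '.' does not match newlines, so a run must open and close on one line
ITALIC = re.compile(r"(?<!\\)\*(.+?)(?<!\\)\*")
# a backslash sitting in front of any ascii punctuation mark
ESCAPED = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")

def markdown_to_underline(text):
    """Turn *italic* into _italic_, then drop the backslash escapes."""
    text = ITALIC.sub(r"_\1_", text)
    return ESCAPED.sub(r"\1", text)

if __name__ == "__main__":
    sample = r"*(Thickveiled\, a copy of the* Irish Times *in her hand\.)* Henry\!"
    print(markdown_to_underline(sample))
```

the order matters: italics are converted first, so an escaped asterisk is never mistaken for the start of an italic run.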
so i’m gonna dive in and start massaging this 7Mb txt file with all five editions: removing pagenumbers, exploring spacebar-style-indent-and-center…
the speakers in Circe export as all-lowercase, grrr
Calibre will export ‘htmlz’ which you can unzip to a 10Mb html file if you change ‘htmlz’ to ‘zip’. The html has 23 special classes (like ‘calibre23’) but search-and-replace leaves a pretty promising html markup.
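a python sketch of the same two steps, in case it saves anyone the manual search-and-replace. zipfile can open the htmlz directly without renaming it. the mapping from calibre's numbered classes to readable names is hypothetical (i haven't worked out which number means what), and utf-8 is assumed:

```python
import re
import zipfile

def read_htmlz(path_or_file):
    """Return the text of the first .html file found inside an htmlz archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = [n for n in archive.namelist()
                 if n.lower().endswith((".html", ".htm"))]
        return archive.read(names[0]).decode("utf-8")  # assumes utf-8

def rename_classes(html, mapping):
    """Swap class names per `mapping`, leaving unlisted names untouched."""
    def swap(match):
        names = [mapping.get(n, n) for n in match.group(1).split()]
        return 'class="%s"' % " ".join(names)
    return re.sub(r'class="([^"]*)"', swap, html)

if __name__ == "__main__":
    # hypothetical mapping: which calibre number means 'stage' is a guess
    print(rename_classes('<p class="calibre23">x</p>', {"calibre23": "stage"}))
```

reading straight from the archive also sidesteps opening the 10Mb file in an editor.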
Windows Notepad++ freezes with the 10Mb html, but seems okay once I split it once.
here’s a survey of the encodings available now: http://ulyssespages.blogspot.com/2015/10/ulysses-etexts.html
if we can do any single edition with all typos corrected, using a minimal html that will allow others to restyle it simply… and then offer ‘repairkits’ so people can examine the debatable variants and customize their own however they want…?
and then export whatever other formats people seem to want (txt, asciidoc, pdf)
questions for css experts: 1) can a stylesheet set the width of the spacing after a dialog-dash? (some editions have a space, some not)
2) can a stylesheet consistently omit the dialog dashes before spoken stanzas? (the html file would have the dashes, but would be specially tagged so the designer could ideally choose not to display them)
3) stanzas are left-aligned but indented different amounts depending on the length of the longest line, which seems approximately centered– is this do-able?
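not an answer to the css questions, but for the plain txt variant the arithmetic behind 3) is simple: every line of a stanza gets the same left pad, chosen so the longest line is centered. a sketch, with the 72-column width as an arbitrary assumption:

```python
def center_stanza(lines, width=72):
    """Indent all lines equally so the longest one sits centered in `width`."""
    longest = max(len(line) for line in lines)
    # one shared pad keeps the stanza left-aligned within itself
    pad = " " * max(0, (width - longest) // 2)
    return [pad + line for line in lines]

if __name__ == "__main__":
    for line in center_stanza(["short", "a much longer line"], width=40):
        print(line)
```

stanzas wider than the page get no indent at all rather than a negative one.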
2015-10-21
z5 doesn’t italicize the parens around stage directions, just the stage directions themselves. i’m always conflicted about similar punctuation quandaries– eg quotes and exclamation marks. is there a consensus of designers? #textencoding
joyce obviously knew he’d be putting his typesetters thru their paces, with embedded correspondence, addresses, and ads (eg Plumtree’s). he might even have foreseen our aristotelian efforts to formally classify these.
css worst case? some people think the opening line of finnegans wake should be indented to where the last line of the last chapter stops (because they form a continuous sentence)
2015-10-22
i’ve finished a firstdraft overview of the formatting differences between four editions currently available to me: http://ulyssespages.blogspot.com/2015/10/ulysses-etexts.html
mulligan’s play in ch9, and bloom’s budget in ch17 are special problems; the rest is pretty much indents and aligns and whitespace
it’s pretty clear that my dream of a Perfect Markup, one that lets designers pick and choose design variants, is impractical. but some simplification of this must be practical.
i’m looking at ‘tei lite’ and find: l (verse line), lg (line group), sp (speech), speaker, and stage which make sense
can these all be implemented as classes of ‘p’s? if each ‘l’ is a ‘p’ can an ‘lg’ be a ‘p’ too? (i’m guessing not)
the alternative is to use br’s within lg’s… but once you open that door the whole book becomes an lg and every linebreak a br
2015-10-23
circe is one of the easiest episodes to mark up, i think. sp, speaker, and stage can all be classes of p with the line spacing adjustable by the stylesheet.
one stage direction (p412) is split into 3 paragraphs, which don’t work as 3 separate stage directions, so are we left with br’s again? what if we want to indent the 1st line of each???
(is it possible joyce specifically intended to subvert genre by systematically embedding every class of paragraph within every other?)
there’s something esthetically self-defeating about trying to fit art into hierarchies like tei (surely this has been argued to death somewhere?)
if joyce had written ulysses as hypermedia, with an infinite budget to recreate visuals… but no, he knew his strength was the sound of words, not their look… it’s impossible to imagine any human ever with a mastery of visual media comparable to joyce’s godlike mastery of language, precisely assigning the best color to every individual pixel…
so the #1 priority needs to be getting the sound right, not introducing any novelties that modify joyce’s expression of the sound…?
bloom’s budget p664 must be read linearly but typeset as columns. mulligan’s play at p208 needs to read right, but trying to approximate that extralarge curly bracket in media that don’t support it is misguided if the approximation disrupts the sound, eg: https://en.wikisource.org/wiki/Page:Ulysses,_1922.djvu/211
i think the best solution for p208 is to drop the bracket and move the two names onto one line. similarly for p664, putting column two after/below column one is the best you can do in some media.
other places where the original alignment may not be possible/convenient: milly’s letter p64, martha’s envelope p69 and letter p75, the leading ellipses on p133, the chant at p189, and the elizabethan indents at p194-195… but that may be all– the rest is simple css
this helps: <style> .right {position: absolute; right: 10%; width: 300px; }</style>
‘dot leaders’ are being added to css, but there’s a workaround: http://www.w3.org/Style/Examples/007/leaders.en.html
…a messy workaround: “The pseudo-element fills the whole width of the list item with dots and the SPANs are put on top. A white background on the SPANs hides the dots behind them and an ‘overflow: hidden’ on the UL ensures the dots do not extend outside the list”
are html tables okay for bloom’s budget? http://www.w3schools.com/html/html_tables.asp
(there’s a suggestion that tables can fake dot-leaders) https://www.quora.com/What-is-the-best-way-to-fill-a-span-with-a-character-in-valid-HTML/answer/Garrick-Saito
if the verse on p133 is done as a 3-line, one-column table, and right-justified, will extra dots to the left just politely vanish?
this looks promising:
<style> .outer-div {width:60%; margin-left:auto; margin-right:auto; text-align:right; overflow:hidden; white-space: nowrap;} .inner-div {float:right;} </style>
<div class="outer-div">
<div class="inner-div">
. . . . . . . . . . . . . . . . . . . . . . . . . . la tua pace<br />
. . . . . . . . . . . . . . . . . . . . che parlar ti piace<br />
. . . . . . . . . . . mentreche il vento, come fa, si tace.<br />
</div></div>
(haven’t a clue how it works though)
2015-10-26
i always feel like i’m taking my life in my hands when i mess with blogger’s css, but here’s a live version of the above: http://ulyssespages.blogspot.com/2014/08/page-133-7706-736-psha-thereof.html
i’m almost afraid to ask for feedback whether it looks wrong on other platforms
the dots cut off nicely at the same place on my mac/latest firefox
2015-10-28
@timfinnegan: Just noticed you brought Infinite Ulysses’ annotations over the one-thousand mark. Thank you! (I have all of next week dedicated to work on the site, so I should be able to finish some longstanding to-do tasks there…)
i’ve gotten bogged down trying to sort out the very messy clues to the sequence of the blooms’ residences
2015-10-31
here’s markdown for screenplays, for the txt editions? http://fountain.io/syntax
2015-11-01
ulysses’ toplevel has 3-12-3 episodes, but several episodes require distinct markup: ep10 has 19 subsections, with crossreferenced insertions, including an insertion from ep11
ep15 is a script, and ep9 also includes some script. ep7 has headlines. ep17 has q&a. there are several embedded documents, eg letters and ads. most eps have dialog and lyrics and interior monolog and points of view and geographical locations…
joyce almost seems intentionally to have made the text encoding as difficult as possible
…but rewarding, for all that
2015-11-07
it sounds like http://juxtasoftware.org could make it easier to study the differences in the z5 texts…. i’m still floundering on getting html out of the z5 epub, but beautifulsoup may help. comparing episode-by-episode is bound to be easier than whole-book by whole-book. but if juxta needs xml how hard is that conversion?
I’ve used Juxta on two versions of Ulysses (plain text, not xml) and liked it.
2015-11-09
here’s a 60pg 2004 survey of the ulysses-editions problem: https://www.academia.edu/2531525/Ulysses_in_the_Plural_The_Variable_Editions_of_Joyces_Novel
@timfinnegan: Cool! This is very useful.