textencoding

paregorios

01:44:18 PM

I guess this is the place

raffazizzi

01:47:50 PM

yup. If we’re not careful it may soon turn into venting about overlapping hierarchies

paregorios

01:48:10 PM

eek

mdlincoln

01:48:34 PM

This popped up in my feed today: sentiment analysis being no better than a coin flip: https://relativeinsight.com/zurich-university-sentiment-analysis-is-a-statistical-flip-of-a-coin/ (@edsu this might be useful to show ppl in response to questions about Twitter analytics)

raffazizzi

01:51:03 PM

@mdlincoln: I’d post that again in #announcements :simple_smile:

mdlincoln

01:53:17 PM

blerg - guess who the slack newbie is :sweat:

paregorios

01:13:54 PM

http://www.tei-c.org/release/doc/tei-p5-doc/readme-2.9.1.html

paregorios

01:14:26 PM

now that TEI is on github, is there a way for us to autotrack/announce releases here in slack?

raffazizzi

02:07:55 PM

That’s a good idea! Let me know how you get along - happy to jump in if needed

paregorios

02:09:38 PM

they assume you’re a member of the dev team on a given project at github

paregorios

02:14:14 PM

@raffazizzi: I’ve subscribed the channel to the TEI Guidelines releases Atom feed at https://github.com/TEIC/TEI/releases.atom

raffazizzi

02:14:23 PM

well I am :stuck_out_tongue: so maybe I can set it up

raffazizzi

02:14:52 PM

oh, maybe that’s actually better - otherwise we’d be spammed with every push and comments on issues

paregorios

02:14:52 PM

I suppose it depends a bit on how much granularity we want

paregorios

02:14:55 PM

yeah

paregorios

02:14:58 PM

zactly

raffazizzi

02:15:02 PM

:+1:

paregorios

02:15:11 PM

we can see how it goes
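[Editor's sketch] For anyone who wants to read a releases feed like the one above outside Slack, here is a minimal Python sketch that pulls the title, link, and timestamp out of each Atom entry. The sample feed text is invented for illustration and is not the real content of the TEI releases feed; fetching the live URL is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample standing in for real feed content (assumption, not TEI data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Release notes (sample)</title>
  <entry>
    <title>Example release 1.0.0</title>
    <updated>2015-01-01T00:00:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.org/releases/1.0.0"/>
  </entry>
</feed>"""


def list_releases(atom_xml):
    """Return a (title, link, updated) tuple for each entry in an Atom feed."""
    root = ET.fromstring(atom_xml)
    releases = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        releases.append((title.strip(), link, updated.strip()))
    return releases


for title, link, updated in list_releases(SAMPLE_FEED):
    print(updated, title, link)
```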

timfinnegan

03:45:56 AM

james joyce’s works just came out of copyright in 2011, and there seems to be some residual timidity/inertia that’s left us with a dismal selection of etexts that mostly use CAPS for italics… which I’m looking to fix. I expect we’ll be generating dozens of new offerings in the near future, but I’m not sure what archives are serious about supporting multiple editions of classic texts, in multiple formats, with maybe a little crowdsourced proofreading if that proves necessary. (I believe http://archive.org balks at the multiple-editions principle?)

timfinnegan

03:54:55 AM

is there a consensus on when to italicize the punctuation around italic passages– eg parens and quotes and exclamation marks?

paregorios

08:07:30 AM

@timfinnegan: iirc Chicago Manual of Style et al address those sorts of issues; I think the answer generally is that adjacent punctuation is attracted to the text style of neighboring characters, but none of the neurons to which I presently have access remembers details.

timfinnegan

10:20:25 AM

hmm, that leads to the question: does the CMoS still apply to web css? eg if web rendering has different issues to address?

paregorios

02:22:19 PM

I’m not sure I really understand the issue. Italics are a convention of formatting of text for visual display. CSS is a mechanism for effecting formatting of visual display online. Neither constitutes semantic markup. If the issue is what to italicize in a web page, one decides what one wants to see. If it’s about setting up for post-processing on the basis of what the italics are perceived to mean, one either needs to intervene earlier (with appropriate rules) to mark up the text with something unambiguous (i.e. other than italics) or one has to embed italics-inference rules in the post-processing context.
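[Editor's sketch] One way to picture "intervening earlier with appropriate rules" for etexts that use CAPS for italics: a deliberately naive Python rule that wraps runs of all-caps words in explicit markup. The regular expression, the choice of `<hi rend="italic">`, and the decision to lowercase the wrapped text are all assumptions for illustration. A real rule set would need to handle acronyms, headings, possessives, and proper nouns.

```python
import re
from xml.sax.saxutils import escape

# Naive rule (assumption): two or more consecutive capital letters form a
# "caps word"; caps words joined by single spaces or hyphens form one run.
CAPS_RUN = re.compile(r"\b[A-Z]{2,}(?:[ \-][A-Z]{2,})*\b")


def caps_to_hi(line):
    """Wrap all-caps runs in <hi rend="italic">, escaping XML specials."""
    parts = []
    last = 0
    for match in CAPS_RUN.finditer(line):
        parts.append(escape(line[last:match.start()]))
        # Lowercasing is lossy: the original capitalisation is not recoverable.
        parts.append('<hi rend="italic">' + escape(match.group(0).lower()) + "</hi>")
        last = match.end()
    parts.append(escape(line[last:]))
    return "".join(parts)


print(caps_to_hi("She said it was VERY IMPORTANT & urgent."))
```

Single capital letters such as "I" are left alone by this rule, which is one of many places where a human proofreading pass would still be needed.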

timfinnegan

02:25:36 PM

oh, i just wonder if cmos recommendations sometimes make css look bad, so web designers take exception

paregorios

02:27:30 PM

ah, dunno

paregorios

05:27:10 PM

xml catalogs as used in OxygenXML (the only place I’ve ever used them) seem backwards to me: https://www.oxygenxml.com/doc/versions/17.0/ug-editor/index.html#topics/using-XML-Catalogs.html

raffazizzi

10:47:23 AM

Hi people - the Text Encoding Initiative Members’ Meeting conference has started today with pre-conference workshops

raffazizzi

10:47:36 AM

Follow it on twitter at #TEI2015

raffazizzi

10:48:15 AM

Anyone here at the conference? We could use this slack channel as a backchannel during the conference

paregorios

03:45:30 PM

@wsalesky!

wsalesky

04:21:23 PM

@paregorios! :simple_smile:

benwbrum

05:34:43 AM

Last week was the first MEDEA workshop on encoding account books. I did a rotten job of tweeting, but would be happy to chat with anyone else working with account books about the issues that were raised there.

paregorios

08:32:46 AM

@benwbrum: I got to meet and talk with Kathryn Tomasek in DC a few weeks ago. Cool project!

benwbrum

09:56:49 AM

It is really neat, @paregorios. There was a broad spectrum of perspectives at the workshop, including linguists who hated the normalization the economic historians were doing (since it made the data unusable for historic linguistic and orthographic change) and of course economic historians who hated the verbatim-et-literatim encoding the linguists were doing in return (since the lack of normalization kills any quantitative economic analysis). That contrast was really valuable for those of us in the middle, who’d been foolishly toying with utopian encoding schemes.

paregorios

09:57:28 AM

sounds like a true humanities project!

martindholmes

02:55:18 AM

Hi all.

paregorios

08:28:16 AM

@benwbrum: for fun some ancient accounts (just the ones for which the http://papyri.info site has texts and images): http://tinyurl.com/np32pp2 marked up in TEIXML

benwbrum

08:39:58 AM

Thanks! I felt like there were two major gaps in representation at MEDEA – nobody was working with texts that pre-dated the 13th century, so we missed anything from the ancient world as well as any cuneiform accounts.

paregorios

08:41:23 AM

yeah, I’m not sure what the CDLI has in the way of the latter

benwbrum

08:41:26 AM

Hey – thanks to the link you sent, I found an account with currency as well as goods! http://papyri.info/ddbdp/p.cair.zen;4;59799/?q=PHRASE:%28abrechnung+OR+account%29&rows=3&start=19&fl=id%2Ctitle&fq=has_transcription%3Atrue&fq=%28images-int%3Atrue%29&fq=metadata%3A%28abrechnung+OR+account%29&sort=series+asc%2Cvolume+asc%2Citem+asc&p=20&t=113

benwbrum

08:43:20 AM

Interestingly, only Kathryn Tomasek was working with a non-obsolete currency. Even those of us dealing with the 19th-c US were working with “Virginia money” and such.

paregorios

08:43:21 AM

that’s an interesting record for a number of technical reasons …

paregorios

08:44:05 AM

if the aggregation there is working right it means that separate “papyri” held at separate institutions (Columbia and Harvard) are thought to be part of the same document.

benwbrum

08:46:01 AM

That is interesting. Has http://papyri.info done any work with IIIF? It’s designed to support such cases.

paregorios

08:46:24 AM

and that is indeed what one of the HGV records says: “Gehört zu PSI VI 625 und P.Cair. Zen. IV 59799”

paregorios

08:46:42 AM

that would be a question for @hcayless or @ryanfb

benwbrum

08:47:40 AM

OK.

paregorios

08:48:10 AM

this happens, as I understand it, not uncommonly with the papyri: an artifact of the 19th and (especially) 20th century antiquities trade in and out of Egypt

paregorios

08:48:29 AM

documents were subdivided and sold separately in order to increase unit returns

benwbrum

08:50:41 AM

Apparently it’s common enough in medieval manuscripts to serve as a major motivation for IIIF.

paregorios

08:51:26 AM

actually there’s 3 fragments: one in the Egyptian museum in Cairo, one at Columbia, and one in Florence (per http://www.trismegistos.org/text/1424)

benwbrum

08:51:28 AM

They allow you to bring in several images from different sources to create a “page” on a canvas, or to pull different pages from different repositories into the same text.

paregorios

08:51:48 AM

that’s cool; it looks to me like we only have an image here of the Columbia piece

paregorios

08:52:51 AM

one would have to run down the CE 1978 citations to see how they’re thought to fit together

paregorios

08:54:24 AM

I suspect you’ll find that the XML encoding of all of these doesn’t privilege the “account” document type very much. Numbers should be marked up as such, but otherwise, not.

paregorios

08:54:49 AM

e.g. http://papyri.info/ddbdp/p.cair.zen;4;59799/source

paregorios

08:56:38 AM

currencies aren’t glossed/called out, for example
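[Editor's sketch] To illustrate what "numbers should be marked up as such" allows downstream, here is a small Python sketch that collects integer values from TEI `<num value="...">` elements. The sample fragment is invented and is not taken from the P.Cair.Zen. record linked above; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NUM = "{http://www.tei-c.org/ns/1.0}num"

# Invented fragment in the style of a transcription with marked-up numerals.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <lb n="1"/>item one <num value="3">γ</num>
  <lb n="2"/>item two <num value="12">ιβ</num>
</ab>"""


def numeric_values(tei_xml):
    """Return the integer @value of every <num> element, in document order."""
    root = ET.fromstring(tei_xml)
    values = []
    for num in root.iter(TEI_NUM):
        value = num.get("value")
        # Skip missing or non-integer values (e.g. fractions) in this sketch.
        if value is not None and value.isdigit():
            values.append(int(value))
    return values


print(numeric_values(SAMPLE))
```

Because currencies and commodities are not called out, a script like this can recover the quantities but not what they are quantities of.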

hcayless

09:25:26 AM

I got a message that my name was being spoken. We don’t do IIIF yet, but plan to whenever we have the spare time

paregorios

09:28:53 AM

@hcayless: thanks

ryanfb

09:29:06 AM

Though we’re likely to do it first for images we host. External images is a whole other bag of worms

hcayless

09:30:03 AM

and by worms, he means dragons

hcayless

09:32:06 AM

they bring out the barbecue sauce, you better run

paregorios

09:41:05 AM

mmmmmm barbecue

paregorios

09:42:21 AM

which brings up an interesting encoding question …. how would one mark up the pseudo-acronym BBQ in TEI?

benwbrum

09:43:12 AM

I’ve been adding IIIF support to FromThePage, and have struggled a bit with how to handle self-hosted pages vs. pages hosted elsewhere.

benwbrum

09:44:05 AM

One option is to generate manifests for both, with image services (i.e. URLs) that point to the FromThePage server, then use a shim-like approach to proxy image calls to the actual image hosts.

benwbrum

09:44:24 AM

That seems like the only option for hosts that don’t actually support the IIIF Image API.

benwbrum

09:45:11 AM

However, the hosts I’m integrating with have either recently added support for IIIF (the Internet Archive) or have plug-ins or shims available to support it (Omeka).

benwbrum

09:46:09 AM

In that case it seems like the thing to do is to ingest an IIIF manifest from the host, then produce a modified manifest based on it which directs images to the original host, but provides transcripts via annotations hosted by my server.

edsu

09:46:17 AM

i thought if the CORS headers were set up correctly the images could be anywhere?

benwbrum

09:46:24 AM

They can, yes.

edsu

09:46:36 AM

ok, i guess i don’t understand the problem then :simple_smile:

benwbrum

09:47:02 AM

The problem isn’t ‘is this possible?’, it’s “what’s the best way to do this?”

benwbrum

09:47:41 AM

Generating manifests for someone else’s images is straightforward when adding additional content (transcripts & translations, in my case).

edsu

09:47:48 AM

i would try to avoid shims, personally

edsu

09:48:08 AM

i’m not sure if that helps :simple_smile:

benwbrum

09:48:36 AM

But I’d like to avoid losing information contained in the original manifests when I generate derivative manifests – like repository-specific metadata or non-transcript annotations.
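[Editor's sketch] The "modified manifest" idea can be sketched in Python as a deep copy of the host's manifest with one extra annotation list per canvas. This assumes a IIIF Presentation 2.x-style structure (`sequences`, `canvases`, `images`, `otherContent`); all URLs and the `transcript_list_url` helper are hypothetical, and this is not a description of how FromThePage actually does it.

```python
import copy
from urllib.parse import quote

# Invented manifest standing in for one ingested from an image host.
ORIGINAL = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://images.example.org/iiif/book1/manifest",
    "@type": "sc:Manifest",
    "label": "Sample account book",
    "metadata": [{"label": "Repository", "value": "Example Library"}],
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@id": "https://images.example.org/iiif/book1/canvas/p1",
            "@type": "sc:Canvas",
            "label": "p. 1",
            "height": 1000,
            "width": 800,
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": "https://images.example.org/iiif/book1/canvas/p1",
                "resource": {
                    "@id": "https://images.example.org/iiif/book1-p1/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "service": {"@id": "https://images.example.org/iiif/book1-p1"},
                },
            }],
            "otherContent": [{
                "@id": "https://images.example.org/iiif/book1/list/p1-notes",
                "@type": "sc:AnnotationList",
            }],
        }],
    }],
}


def transcript_list_url(canvas_id):
    """Hypothetical URL scheme for annotation lists on the transcription host."""
    return "https://transcripts.example.org/lists?canvas=" + quote(canvas_id, safe="")


def derive_manifest(original, new_manifest_id, list_url_for):
    """Copy a manifest, keep everything in it, and add a transcript list per canvas."""
    derived = copy.deepcopy(original)  # metadata, images, existing lists survive
    derived["@id"] = new_manifest_id
    for sequence in derived.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            canvas.setdefault("otherContent", []).append({
                "@id": list_url_for(canvas["@id"]),
                "@type": "sc:AnnotationList",
            })
    return derived


derived = derive_manifest(
    ORIGINAL, "https://transcripts.example.org/iiif/book1/manifest", transcript_list_url
)
```

Copying first and appending afterwards means repository metadata and non-transcript annotation lists carry over untouched, and the image services still point at the original host.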

benwbrum

09:49:09 AM

That may be unnecessary in the linked data world – I’m certainly new to LODLAM, and don’t really have my head around it.

edsu

09:49:33 AM

sounds like you have your head around it pretty well to me

benwbrum

09:50:13 AM

So if http://papyri.info starts presenting transcripts along with IIIF, I’ll be watching closely.

edsu

09:50:30 AM

i would probably only aggregate what information you actually need for FromThePage

edsu

09:51:04 AM

and not worry too much about other data, that in theory would be good to have, but you have no use for at the moment

edsu

09:51:43 AM

let your application drive your decisions, rather than what seems like the right thing to do

edsu

09:52:11 AM

here ends my Pontification for the Day

benwbrum

09:52:35 AM

Thanks, @edsu. You sounded positively guru-like.

edsu

09:53:05 AM

i’m a total sham

edsu

09:53:12 AM

i also am LATE TO A MEETING ARGH!

edsu

09:53:15 AM

seeya

paregorios

12:06:44 PM

Anybody here have experience with handling <saxon:collation> in XSL in OxygenXML with Saxon? I have previously working code that now fails after an OxygenXML upgrade.

paregorios

12:33:47 PM

nm. with moral support on twitter from @wsalesky I read the fine manual and got with the program: http://www.saxonica.com/documentation/index.html#!extensibility/config-extend/collation/implementing-collation

suttonkoeser

05:20:00 PM

Anybody working with TEI + annotation, TEI facsimile, or markdown to TEI?

andersoncliffb

09:25:45 AM

@suttonkoeser: I’m not working with markdown to TEI, but I’m certainly very interested in the topic since I write frequently in both.

hcayless

09:36:00 AM

@suttonkoeser: there’s a markdown to TEI conversion in the TEI Stylesheets. Haven’t used it though. Not at this moment working on TEI annotation, but have and will again soonish.

suttonkoeser

09:42:33 AM

I presume this stylesheet is the one you mean, I found it in my initial searches: https://github.com/TEIC/Stylesheets/blob/master/markdown/markdown-to-tei.xsl
But since it’s based on regular expressions and has no comments, I didn’t think it would be very easy to modify or work with

suttonkoeser

09:45:07 AM

I’m working in python, so currently I’m using the mistune markdown parser (https://github.com/lepture/mistune) and creating a custom tei renderer.

hcayless

09:46:27 AM

I’ve absolutely no idea how it works, but find myself sharing responsibility for it :simple_smile:

hcayless

09:47:26 AM

I vaguely assume the sanest approach would be to convert to HTML and then transform to TEI

hcayless

09:47:52 AM

that way you’d stand a chance of flattening some of the variance in markdown flavors
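[Editor's sketch] The second half of a markdown -> HTML -> TEI pipeline might look like the Python sketch below, which walks a well-formed XHTML fragment (as some markdown renderer would produce) and maps a handful of tags to TEI elements. The mapping table is an assumption chosen for illustration, the output elements are left un-namespaced for brevity, and unknown tags fall back to a `<seg>` so nothing is silently dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: HTML tag -> (TEI element name, fixed attributes).
HTML_TO_TEI = {
    "p": ("p", {}),
    "em": ("hi", {"rend": "italic"}),
    "strong": ("hi", {"rend": "bold"}),
    "ul": ("list", {"rend": "bulleted"}),
    "ol": ("list", {"rend": "numbered"}),
    "li": ("item", {}),
    "blockquote": ("quote", {}),
    "a": ("ref", {}),
}


def html_to_tei(html_el):
    """Recursively rebuild an HTML element tree using TEI element names."""
    name, attrs = HTML_TO_TEI.get(html_el.tag, ("seg", {"type": html_el.tag}))
    tei_el = ET.Element(name, dict(attrs))
    if html_el.tag == "a" and html_el.get("href"):
        tei_el.set("target", html_el.get("href"))  # href becomes @target
    tei_el.text = html_el.text
    tei_el.tail = html_el.tail
    for child in html_el:
        tei_el.append(html_to_tei(child))
    return tei_el


def convert(xhtml_fragment):
    """Convert a single-rooted, well-formed XHTML fragment to a TEI string."""
    tei_root = html_to_tei(ET.fromstring(xhtml_fragment))
    tei_root.tail = None
    return ET.tostring(tei_root, encoding="unicode")


print(convert('<p>An <em>italic</em> word and a <a href="http://example.org/">link</a>.</p>'))
```

Since every markdown flavor has to end up as HTML anyway, only this one mapping needs maintaining, whichever parser produced the HTML.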

hcayless

09:48:18 AM

but my ignorance here is vast and embarrassing :simple_smile:

suttonkoeser

09:50:04 AM

That’s interesting, hadn’t thought of going from markdown -> html -> tei. I suppose it would be ideal if the markdown was converted to html using the same library that users see when they enter and preview their content, but the markdown is getting entered as annotation content using annotator.js and the meltdown javascript library.

hcayless

09:51:11 AM

it depends what you’re doing really, whether it’s a generic markdown -> TEI converter or whether you’re targeting a specific flavor

hcayless

09:52:46 AM

sounds like you have a specific flavor in mind, so you might not need that

suttonkoeser

09:54:10 AM

Right. I think the markdown is pretty generic at the moment, only non-standard addition I’m aware of is footnotes (which is a fairly standard non-standard from what I can tell). But I guess we can decide what TEI output we want for our use case, and it doesn’t have to be a general solution for everyone. Although I think most of the tags I’m outputting are fairly standard

fmcc

09:55:44 AM

There seems to be some discussion of this kind of conversion with pandoc: https://github.com/jgm/pandoc/issues/2047

suttonkoeser

09:56:30 AM

@fmcc: interesting, good to know

fmcc

09:58:00 AM

doesn’t seem to have been much activity since then though

hcayless

11:28:25 AM

Just looking at the TEI Stylesheets implementation and I fear I share @suttonkoeser ’s suspicion of it (despite knowing it was built by super-smart people).

raffazizzi

09:17:58 AM

@suttonkoeser: Hi! Re: TEI+annotation, I’m very interested and worked on some aspects / have some plans. What are you working on?

suttonkoeser

12:48:15 PM

Hey @raffazizzi. The TEI + annotation work is in relation to readux, which is for our digitized books and an annotation/critical edition platform - http://readux.library.emory.edu/ for the site, https://github.com/emory-libraries/readux for code. I have a new release almost out the door that supports annotation with annotator.js, and we’re generating TEI facsimile from a couple of different OCR xml formats to support that.

suttonkoeser

12:49:40 PM

The next step, which I’ve started working on, is to generate a TEI export of a single volume packaged with all of a user’s annotations for that volume, so that we have it as an artifact and as an interim step to generating a more user-friendly annotated edition

suttonkoeser

12:57:16 PM

We’ve decided for our purposes that we don’t need all of the annotation data (in particular the annotation created/updated timestamps don’t seem relevant in the TEI export), but I think we’ve figured out a reasonable way to map the annotation data to TEI notes and insert annotation reference markers into the TEI facsimile data.
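[Editor's sketch] As a rough picture of mapping an annotation to a TEI note, the Python sketch below builds a `<note>` from an annotator.js-style dictionary and points it at an element in the facsimile by id. The attribute choices (`type="annotation"`, the `annotation-` id prefix) and the `target_id` parameter are assumptions for illustration, not the mapping readux settled on.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def annotation_to_note(annotation, target_id):
    """Build a TEI <note> for one annotation, pointing at a facsimile element.

    `annotation` is a dict with at least "id" and "text"; timestamps such as
    "created" and "updated" are deliberately not carried over.
    """
    note = ET.Element("note", {
        # Prefix keeps the id from starting with a digit; ids containing
        # characters that are invalid in xml:id would need extra cleaning.
        XML_ID: "annotation-" + annotation["id"],
        "type": "annotation",
        "target": "#" + target_id,
    })
    note.text = annotation.get("text", "")
    return note


sample = {
    "id": "a1",
    "text": "Check this reading.",
    "quote": "teh",
    "created": "2015-11-10T12:00:00Z",
    "updated": "2015-11-10T12:05:00Z",
}
note = annotation_to_note(sample, "line-12")
print(ET.tostring(note, encoding="unicode"))
```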

suttonkoeser

12:58:04 PM

If anyone is interested, I could share some examples of the TEI we’re generating. We did have some detail questions (like how to reference the annotated content), but I think we’ve come up with something workable.

literature_geek

03:18:00 PM

@suttonkoeser: I’m interested in examples of the TEI!

suttonkoeser

05:09:41 PM

@literature_geek: cool, I’ll share some soon - I think it will be a little easier to discuss when you can look at the annotation features in readux and see the tei facsimile that it’s using

hcayless

07:06:48 PM

@suttonkoeser: I’d be interested to see it too!

raffazizzi

10:38:35 AM

@suttonkoeser: sounds great! I’d also be interested in seeing some examples :simple_smile:

timfinnegan

11:10:56 AM

visualisation-god edward tufte’s views (once removed?) on css: https://edwardtufte.github.io/tufte-css/
