Simon Will, Victor Zimmermann & Christoph Schaller
We present a tool for scanning Latin verse, as well as a web application that highlights the results and provides helpful annotation of the phenomena that lead to them.
One defining aspect of most poetry, in opposition to prose, is that poetry is “bound speech”, i.e. speech bound by constraints on the form instead of the content. These constraints can take various forms: for example, they can concern rhyme patterns or the way a text is laid out on the page.
This piece of work, however, focuses on metric constraints, i.e. constraints that concern the rhythmic structure of the text in question: each line of verse has to follow a certain sequence of marked and unmarked syllables. Most poetry in the Germanic languages is bound by accentuating (or qualitative) metrics, which means that accented syllables (as determined by either loudness or pitch) are considered marked and non-accented syllables are considered unmarked.
In contrast, Ancient Greek and Latin metrics were governed by a quantitative principle: long syllables are considered marked and short syllables unmarked. There are about a dozen different meters that are frequently used in Latin verse and some more that occur less frequently. The hexameter is by far the most frequent one and is used primarily in epic poetry such as Virgil’s Aeneid and in didactic poetry such as Lucretius’s On the Nature of Things. For this reason, when automatic processing of Latin metrics is attempted, other meters are often overlooked in favor of the hexameter.
The aim of this work was to devise a system that automatically determines the quantities of a line’s syllables (a process called scanning) without focusing overly on one specific meter. In addition to a library that does this, a web interface presents the result in an easily digestible way, also detailing how the system arrived at its result, in order to help learners of Latin better understand the process.
As mentioned above, scanning a line means determining its syllables’ quantities. There are two common ways a syllable can be counted as long:
- The syllable contains a long vowel or a diphthong. In this case, the syllable is said to be long “by nature.”
- The syllable’s vowel is followed by two or more consonants (that may well be part of the next syllable or even the next word). In this case, the syllable is considered to be long “by position.”
All other syllables are considered short. However, as is often the case with language, a number of phenomena can occur that change the way the text is read, which also impacts the scanning of the line:
If a syllable can be considered long by position but the causing consonant cluster consists only of a muta (b, d, g, p, t, k) followed by a liquida (l, r), it may indeed be considered long by position, but more often the lengthening does not occur. For example, the second syllable of “volucris” (“bird”) can be considered long or short.
If one word ends with a vowel or an “m” (which is only nasalized) and the succeeding word begins with a vowel or an “h” (which is only an aspiration mark), the last syllable of the first word is elided. This phenomenon is called elision. For example, in “quare habe” the second syllable of “quare” is elided, resulting in the reading “quar(h)abe”. If the elision is not actually carried out, this is called a hiatus.
If the above situation occurs, but the second word is a certain form of the auxiliary “esse” (“to be”), e.g. “est” or “estis”, the first syllable of the form of “esse” is elided instead of the last syllable of the first word. For example, “pressa est” is read as “pressast”. This is called apheresis.
If, inside a word, one syllable ends with a vowel and the next begins with a vowel, the first vowel is usually short (in Latin: vocalis ante vocalem corripitur). However, sometimes the two syllables are blended into one, which is then long by nature because it contains a diphthong. This is called synizesis. For example, “eorum”, while usually having three syllables (e-o-rum), can be read as “eo-rum”.
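The elision rule described above can be sketched in a few lines of Python. This is a deliberately simplified illustration we wrote for this description (the function name and the reduction to single final/initial letters are ours), not our actual implementation:

```python
VOWELS = set("aeiouy")

def elision_occurs(word1: str, word2: str) -> bool:
    """Check whether word1's final syllable would be elided before word2.

    Simplified rule: a word ending in a vowel or in vowel + 'm' elides
    before a word starting with a vowel or with 'h'.
    """
    w1, w2 = word1.lower(), word2.lower()
    ends_open = w1[-1] in VOWELS or (
        w1.endswith("m") and len(w1) > 1 and w1[-2] in VOWELS
    )
    starts_open = w2[0] in VOWELS or w2[0] == "h"
    return ends_open and starts_open

print(elision_occurs("quare", "habe"))    # True: read as "quar(h)abe"
print(elision_occurs("dono", "lepidum"))  # False: "lepidum" starts with a consonant
```

A real implementation has to be more careful, e.g. about diphthongs and orthographic variants, but the core condition is this simple check on word boundaries.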
There are various Latin meters (the Hypotactic corpus counts 273 meters, of which only 40 occur in more than 50 lines and only 13 in more than 500 lines), and this is not the place for a detailed summary of them. But for one simple and popular meter, the Phalaecian hendecasyllable, the schema is shown for illustration purposes:
x x – ⏑ ⏑ – ⏑ – ⏑ – –
– marks a long syllable, ⏑ marks a short syllable and x marks a syllaba anceps, meaning that it can be either long or short. The last element of any meter is always marked as long in schemas. However, there is the license of putting a short syllable in this long element (brevis in longo).
An example of a line fitting the above schema is the first (and any other) line of Catullus 1:
Cui dono lepidum novum libellum?
(To whom do I give this charming new little book?)
In contrast, this is the hexameter schema:
– ⏕ – ⏕ – ⏕ – ⏕ – ⏕ – –
⏕ means that either one long or two short syllables are possible. This grants the poet significantly more leeway to fit a line to the meter than in the case of the hendecasyllable, but it also makes scanning harder. The meters used in Latin comedy, such as the iambic senarius, contain even more such uncertainties than the hexameter.
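To illustrate how a quantity schema constrains a line, a reading can be written as a string of 'L' (long) and 'S' (short), and a schema compiled into a regular expression. This is an illustrative sketch with our own ASCII stand-ins for the schema symbols ('-' for the longum, 'u' for the breve, 'x' for the anceps, 'D' for ⏕), not our production code:

```python
import re

# ASCII stand-ins for the schema symbols, mapped to regex fragments
# over a quantity string ('L' = long syllable, 'S' = short syllable).
ELEMENT_PATTERNS = {
    "-": "L",          # longum
    "u": "S",          # breve
    "x": "[LS]",       # syllaba anceps
    "D": "(?:L|SS)",   # one long or two shorts (the hexameter's bicipitia)
}

def schema_to_regex(schema: str):
    body = "".join(ELEMENT_PATTERNS[e] for e in schema.split())
    # brevis in longo: the final longum also admits a short syllable
    if body.endswith("L"):
        body = body[:-1] + "[LS]"
    return re.compile(body + "$")

hexameter = schema_to_regex("- D - D - D - D - D - -")
hendecasyllable = schema_to_regex("x x - u u - u - u - -")

# "Arma virumque cano, Troiae qui primus ab oris"
print(bool(hexameter.match("LSSLSSLLLLLSSLL")))        # True
# "Cui dono lepidum novum libellum?"
print(bool(hendecasyllable.match("LLLSSLSLSLL")))      # True
```

The regex alternation `(?:L|SS)` is what makes the hexameter's ⏕ elements cheap to express; the backtracking engine explores the long/double-short choices automatically.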
Latin prosody has of course been studied for more than 2000 years, and extensively and systematically so since the 19th century. We omit all the fundamental research in this area that brought to light the principles described above and concentrate instead on attempts to treat Latin metrics using computational methods.
As hinted at above, there is quite a lot of previous work on the hexameter. The online tool Arma by Dylan Holmes is a scanner limited to hexameters. It uses a purely rule-based approach, explained on the site itself, without any knowledge about vowel lengths. This approach is feasible here because the tool is limited to one meter, and the hexameter does not have many uncertainties in it.
There is also the practice website hexameter.co, which has the user scan hexameters and tells them whether their scansion is correct.
Winge (2015) introduced a tool called latin-macronizer, which can be used on any Latin text (i.e. also on prose) to mark which vowels are long. The tool is based on the Latin morphology tool Morpheus that provides analyses of Latin word forms including lemmatization and vowel lengths as well as on a parser; combining these tools, latin-macronizer determines what form is present in the text and what quantity the vowels in this form have.
There are also two closed-source tools for scanning verse: Pede certo can scan hexameters and pentameters, but frequently enters an “error” state, yielding no scansion at all. The same website also provides a way to search for word forms in Latin verse; this is very useful for finding out how a word is used in verse and proved a valuable tool for us.
Besides that, there is the Metronom tool, the result of a Master’s thesis by Jacek Tomaszewski. It is a Polish-language interface for scanning Latin and Greek verse and is by far the most sophisticated attempt at verse scanning known to us. It supports a large number of meters and works more reliably than the other tools. It has two shortcomings, however:
- It does not know about vowel lengths and often assumes that vowels can be short or long, resulting in frequent false positives, i.e. it says a line scans as a hexameter when it actually does not because some vowel quantity contradicts the analysis.
- It does not show how it arrived at its results, i.e. it does not mark elision, positional lengthening, etc.
After a review of the existing tools, we concluded that we wanted to build a tool that more closely resembles the human scanning process: when a human reads a line of verse, they know from experience which vowels are long, rendering the syllables long by nature, and can, with practice, also spot syllables that are long by position. The vowel lengths that are ignored by the other scanning systems can be determined using a dictionary, similar to the way it is done by Winge (2015), and we consider using these vowel lengths a more natural way of beginning the scansion of a verse.
Moreover, we wanted to build a tool that annotates its result instead of only providing the syllable lengths, which would make it more comprehensible and more meaningful for students of Latin.
At a high level, our approach consists of three basic steps:
- Generate all possible ways a line can be read, called the readings of a line, while storing information about how they came about.
- For each reading, combine it with every possible meter, judging how well it fits the meter.
- Rank the reading-meter combinations using an SVM or a decision tree.
The first step consists of several sub-steps described in the next section.
Our system begins by splitting a line into tokens and looking them up in the Morpheus morphology dictionary. If there is more than one configuration of vowel lengths for a form (e.g. puellā vs. puellă), all of them are considered.
Afterwards, the tokens are split into syllables and positional lengths are determined. In case of muta cum liquida, both the lengthening and the non-lengthening variant are considered.
Elisions and aphereses are applied where applicable. Synizeses are considered where applicable.
After this process, a list of readings has been generated. For example, for a line that contains one form that has three analyses, one muta cum liquida and one synizesis possibility, there exist 3 * 2 * 2 = 12 readings.
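The combinatorics of this example can be sketched with itertools.product; the analysis names and choice points here are invented placeholders, not real Morpheus output:

```python
from itertools import product

# Each independent choice point in a line contributes one axis of variation.
# Hypothetical example: a form with three morphological analyses, one
# muta-cum-liquida position (lengthened or not), one optional synizesis.
analyses = ["analysis_1", "analysis_2", "analysis_3"]
muta_cum_liquida_long = [True, False]
synizesis_applied = [True, False]

readings = list(product(analyses, muta_cum_liquida_long, synizesis_applied))
print(len(readings))  # 3 * 2 * 2 = 12
```

Since each choice point multiplies the number of readings, lines with many ambiguous forms can produce large reading sets, which is why the subsequent ranking step is needed.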
After the readings have been generated, they are paired with the meters and for every possible reading-meter-combination, the following five features are extracted:
- Number of muta cum liquida appearances that trigger a length by position (X in the tree below)
- Number of synizeses applied (X in the tree below)
- Whether (1) or not (0) the reading matches the meter in question (X in the tree below)
- Whether (0) or not (1) a usual configuration of breaks is present in the verse (X in the tree below)
- Number of other meter rules that are violated (X in the tree below)
Each of these features can be interpreted as a penalty because a completely usual reading will have 0 for all or most of them.
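A possible shape for such a feature vector, with field names of our own choosing (the paper's actual feature symbols are abbreviated as X above), is:

```python
from dataclasses import dataclass, astuple

# Hypothetical container for the five features of one reading-meter
# combination; the field names are ours, chosen for illustration.
@dataclass
class Features:
    n_muta_cum_liquida: int  # lengths by position triggered by muta cum liquida
    n_synizesis: int         # synizeses applied
    meter_match: int         # 1 if the reading matches the meter, else 0
    breaks_unusual: int      # 0 if a usual break configuration is present, else 1
    n_rule_violations: int   # other meter rules violated

# A completely usual reading scores 0 on all penalty-like features
# (and 1 on the meter-match indicator).
usual = Features(0, 0, 1, 0, 0)
print(astuple(usual))  # (0, 0, 1, 0, 0)
```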
We consider a decision tree, a random forest and an SVM for ranking the reading-meter combinations. The top-ranked combination is supposed to be the one that is most probably correct for the given line.
For training, we use parts of the Hypotactic dataset created by David Chamberlain. For reasons of limited time, we chose to include only four meters in our sub-dataset: the hexameter, the pentameter, the hendecasyllable and the scazon. We created a dataset with 10,000 instances in the train set (it is actually larger, but we only used the first 10,000), 10,000 in the development set and 15,000 instances in the test set.
Chamberlain does not guarantee the correctness of the analyses in the Hypotactic dataset, but from our qualitative assessment, virtually all of the verses are entered correctly.
To evaluate our ranking algorithms, we use the above-mentioned splits of our Hypotactic verses. We annotate each reading that matches the scansion and meter of the Hypotactic verse as gold and train a number of machine-learning classifiers on the resulting dataset. The list of readings is then ranked by the predicted probability of a gold classification.
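The ranking step can be sketched with scikit-learn as follows; the training data here is a tiny invented toy set, not our Hypotactic-derived dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data (invented): each row is a feature vector for one
# reading-meter combination, the label says whether it is the gold one.
X_train = np.array([
    [0, 0, 1, 0, 0],  # matches meter, no penalties -> gold
    [1, 0, 1, 0, 0],  # one muta-cum-liquida lengthening, still gold
    [0, 0, 0, 0, 0],  # does not match the meter -> not gold
    [0, 2, 1, 1, 3],  # many penalties -> not gold
])
y_train = np.array([1, 1, 0, 0])

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate reading-meter combinations for one line.
candidates = np.array([
    [0, 2, 1, 1, 3],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
# Rank candidates by the predicted probability of the gold class.
gold_probs = clf.predict_proba(candidates)[:, 1]
ranking = np.argsort(-gold_probs)
print(ranking[0])  # 1: the clean, meter-matching candidate ranks first
```

Note that the classifier is only used as a scorer; the final output is the ordering of candidates, not a hard gold/non-gold decision.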
The tables below show the probability that the correct reading-meter combination is among the top n+1 ranked combinations. E.g. for the decision tree (dev), in 8433 out of 10005 instances the correct reading-meter combination was one of the first two combinations.
DecisionTree (dev)
n   correct/total   recall
0   7338/10005      0.7334
1   8433/10005      0.8429
2   8801/10005      0.8797
3   8869/10005      0.8865
4   8976/10005      0.8972
5   8988/10005      0.8984
6   9008/10005      0.9003
7   9012/10005      0.9007
8   9038/10005      0.9033
9   9042/10005      0.9037
10  9044/10005      0.9039
11  9044/10005      0.9039
12  9048/10005      0.9043
13  9050/10005      0.9045
14  9050/10005      0.9045
15  9050/10005      0.9045
16  9054/10005      0.9049
SupportVectorMachine (dev)
n   correct/total   recall
0   7340/10005      0.7336
1   8407/10005      0.8403
2   8797/10005      0.8793
3   8865/10005      0.8861
4   8981/10005      0.8977
5   8994/10005      0.8990
6   9011/10005      0.9006
7   9015/10005      0.9010
8   9039/10005      0.9034
9   9043/10005      0.9038
10  9045/10005      0.9040
11  9045/10005      0.9040
12  9048/10005      0.9043
13  9050/10005      0.9045
14  9050/10005      0.9045
15  9050/10005      0.9045
16  9054/10005      0.9049
RandomForest (dev)
n   correct/total   recall
0   7201/10005      0.7197
1   8257/10005      0.8253
2   8736/10005      0.8732
3   8804/10005      0.8800
4   8921/10005      0.8917
5   8934/10005      0.8930
6   8973/10005      0.8969
7   8977/10005      0.8973
8   9003/10005      0.8999
9   9007/10005      0.9002
10  9022/10005      0.9017
11  9022/10005      0.9017
12  9025/10005      0.9020
13  9027/10005      0.9022
14  9028/10005      0.9023
15  9028/10005      0.9023
16  9033/10005      0.9028
17  9034/10005      0.9029
18  9039/10005      0.9034
19  9039/10005      0.9034
20  9040/10005      0.9035
21  9040/10005      0.9035
22  9046/10005      0.9041
23  9046/10005      0.9041
24  9047/10005      0.9042
25  9047/10005      0.9042
DecisionTree (test)
n   correct/total   recall
0   10934/14945     0.7316
1   12602/14945     0.8432
2   13180/14945     0.8819
3   13273/14945     0.8881
4   13418/14945     0.8978
5   13439/14945     0.8992
6   13471/14945     0.9014
7   13480/14945     0.9020
8   13508/14945     0.9038
9   13511/14945     0.9040
10  13512/14945     0.9041
11  13513/14945     0.9042
12  13525/14945     0.9050
13  13525/14945     0.9050
14  13527/14945     0.9051
15  13527/14945     0.9051
16  13534/14945     0.9056
17  13535/14945     0.9057
18  13535/14945     0.9057
19  13535/14945     0.9057
SupportVectorMachine (test)
n   correct/total   recall
0   10934/14945     0.7316
1   12587/14945     0.8422
2   13183/14945     0.8821
3   13276/14945     0.8883
4   13417/14945     0.8978
5   13437/14945     0.8991
6   13470/14945     0.9013
7   13479/14945     0.9019
8   13506/14945     0.9037
9   13510/14945     0.9040
10  13512/14945     0.9041
11  13512/14945     0.9041
12  13524/14945     0.9049
13  13525/14945     0.9050
14  13526/14945     0.9051
15  13526/14945     0.9051
16  13534/14945     0.9056
17  13535/14945     0.9057
18  13535/14945     0.9057
19  13535/14945     0.9057
RandomForest (test)
n   correct/total   recall
0   10818/14945     0.7239
1   12464/14945     0.8340
2   13120/14945     0.8779
3   13216/14945     0.8843
4   13359/14945     0.8939
5   13380/14945     0.8953
6   13433/14945     0.8988
7   13443/14945     0.8995
8   13472/14945     0.9014
9   13475/14945     0.9016
10  13487/14945     0.9024
11  13488/14945     0.9025
12  13500/14945     0.9033
13  13501/14945     0.9034
14  13504/14945     0.9036
15  13504/14945     0.9036
16  13511/14945     0.9040
17  13513/14945     0.9042
18  13519/14945     0.9046
19  13519/14945     0.9046
20  13520/14945     0.9047
21  13520/14945     0.9047
22  13522/14945     0.9048
23  13522/14945     0.9048
24  13524/14945     0.9049
25  13524/14945     0.9049
26  13527/14945     0.9051
27  13527/14945     0.9051
28  13527/14945     0.9051
29  13527/14945     0.9051
30  13529/14945     0.9053
We observe similar behaviour across the three classifiers: each reaches about 72 % recall for the top-ranked combination and converges to about 90 % recall by the fifth rank. The missing 10 % of recall is due to no suitable reading being generated in the first place.
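The recall-at-k numbers in the tables can be computed from the rank of the gold combination per line; lines for which no correct reading was generated never contribute a hit, which is what caps recall below 100 %. A small sketch with invented example ranks:

```python
def recall_at_k(gold_ranks, k):
    """Fraction of lines whose gold combination appears within the top k.

    gold_ranks: for each line, the 0-based rank of the gold reading-meter
    combination, or None if no generated reading matched at all.
    """
    hits = sum(1 for r in gold_ranks if r is not None and r < k)
    return hits / len(gold_ranks)

# Toy example: four lines; one line has no correct reading generated.
ranks = [0, 1, 4, None]
print(recall_at_k(ranks, 1))  # 0.25
print(recall_at_k(ranks, 5))  # 0.75
```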
The decision tree highlights the importance of the correct scansion of a given verse (X): if the lengths do not match, the reading is definitely false. The other features seem to be more ambiguous, with X and X not even being considered for classification. This shows that additional features may be needed for better performance, regardless of the machine-learning approach.
We wrote a web application as a frontend for our tool, hosted at checkmyprosody.com. It provides the three top-ranked reading-meter combinations as well as an easily digestible way of displaying them.
Shortcomings and Future Work
We demonstrated that our approach is a feasible way to build a system that jointly scans a line and predicts its meter. However, neither the reading generation nor the ranking yields results that are wholly satisfactory to us.
The generated readings contain the correct reading in 90 % of cases. The errors are mostly due to uncommon forms like proper names, especially Greek ones, which the morphology tool Morpheus does not handle easily. We also noticed some errors in the lengths entered in Morpheus. One way to handle this is to manually allow more plausible forms whenever a proper name is detected (e.g. via capitalization).
Moreover, there are some phenomena that we have not considered yet. These include hiatus and iambic shortening, where, in special configurations, a long syllable before a short one can become short as well.
As for the ranking, all the machine-learning approaches worked similarly well, but in order to improve them, more and better features need to be incorporated. For example, there are rules that have been discovered about double breves and other quantity sequences that occur only rarely, such as Ritschl’s rule and the Hermann-Lachmann rule.
Another area for improvement is the number of meters the tool is able to analyze. For our tests, we used only four meters, but there are many more; they can fairly easily be incorporated into the system by adding them to our list of meters.
To bridge the gap caused by imperfect reading generation, one could identify readings that almost match a meter and adjust their quantities to make them match. This way, situations where some peculiarity (like an unknown Greek proper name) prevents generating a correct reading could be remedied.
We presented a system that jointly scans a line of Latin verse and predicts the meter it satisfies. Our approach was special in that it is fundamentally not limited to any specific meters and we incorporated knowledge about vowel lengths using an external morphology tool to make up for the added complexity of the task.
In order to make the system more useful for learners of Latin, we built a web frontend for our scansion system that annotates special phenomena in the verse and explains their effects.
While we could show that our approach works in principle, there are several rough edges in our system, and more work needs to be done to make it less reliant on the correctness of the morphology tool and to enhance the feature extraction process in order to improve the ranking.
We want to thank Jonathan Geiger, Johan Winge, David Chamberlain and Jacek Tomaszewski for their precious advice and their willingness to answer any questions we had about their tools.
- Boldrini, Sandro: Prosodie und Metrik der Römer. 1999.
- Crusius, Friedrich: Römische Metrik. 1986.
- Drexler, Hans: Einführung in die römische Metrik. 1967.
- Winge, Johan: Automatic annotation of Latin vowel length. Bachelor’s Thesis at Uppsala University. 2015.