perseusdl / lexica
Repo for the text files of lexica
License: Creative Commons Attribution Share Alike 4.0 International
user report: 3/19/19
Hello,
While converting lat.ls.perseus-eng1.xml to Unicode I detected some errors in
the Greek Beta Code. For each entry below, the first <foreign> element is the source text and the second is the
output from the converter; problems are marked with XXX='[problem]':
<entryFree id="n4408 Auchetae">``<foreign lang="greek">
Au)xa/tai</foreign>
<foreign lang="greek">
XXX='A'ὐχάται</foreign>``</entryFree>
<entryFree id="n4991 barritus2">``<foreign lang="greek">
'*ele/fas</foreign>
<foreign lang="greek">
XXX='''Ελέφας</foreign>``</entryFree>
<entryFree id="n5138 Bellona">``<foreign lang="greek">
'*enuw/,
'*erinnu/s, qea\ polemikh/</foreign>
<foreign lang="greek">
XXX='''Ενυώ, XXX='''Εριννύς, θεὰ πολεμική</foreign>``</entryFree>
<entryFree id="n6076 caeruleus1">``<foreign lang="greek">
kat'
.e)coxh/n</foreign>
<foreign lang="greek">
κατ'
XXX='.'ἐξοχήν</foreign>``</entryFree>
<entryFree id="n6163 Calauria">``<foreign lang="greek">
'i/a</foreign>
<foreign lang="greek">
XXX='''ία</foreign>``</entryFree>
<entryFree id="n6913 Carystos">``<foreign lang="greek">
Ka/rustos</foreign>
<foreign lang="greek">
XXX='K'άρυστος</foreign>``</entryFree>
<entryFree id="n6941 Caspium">``<foreign lang="greek">
to\ Ka/spion
pe/lagos</foreign>
<foreign lang="greek">
τὸ XXX='K'άσπιον
πέλαγος</foreign>``</entryFree>
<entryFree id="n9175 coluber">``<foreign lang="greek">
w(s de\ dra/kwn,
k.t.l.</foreign>
<foreign lang="greek">
ὡς δὲ δράκων,
κXXX='.'τXXX='.'λ.</foreign>``</entryFree>
<entryFree id="n11572 Creta1">``<foreign lang="greek">
*krh:sios</foreign>
<foreign lang="greek">
ΚρηXXX=':'σιος</foreign>``</entryFree>
<entryFree id="n11620 Crisa">``<foreign lang="greek">
ko/lpos
Krisai=os</foreign>
<foreign lang="greek">
κόλπος
XXX=''XXX='K'ρισαῖος</foreign>``</entryFree>
<entryFree id="n21065 humus">``<foreign lang="greek">
xămai/, xămo/qen,
xăma^lo/s</foreign>
<foreign lang="greek">
χXXX='ă'μαί, χXXX='ă'μόθεν,
χXXX='ă'μᾰλός</foreign>``</entryFree>
<entryFree id="n26736 Linus">``<foreign lang="greek">
*li:nos</foreign>
<foreign lang="greek">
ΛιXXX=':'νος</foreign>``</entryFree>
<entryFree id="n26737 linyphus">``<foreign lang="greek">
li:nufos</foreign>
<foreign lang="greek">
λιXXX=':'νυφος</foreign>``</entryFree>
<entryFree id="n27352 lycophos">``<foreign lang="greek">
luko:fws</foreign>
<foreign lang="greek">
λυκοXXX=':'φως</foreign>``</entryFree>
<entryFree id="n27395 lyricus">``<foreign lang="greek">
luriko:s</foreign>
<foreign lang="greek">
λυρικοXXX=':'ς</foreign>``</entryFree>
<entryFree id="n27655 magulum">``<foreign lang="greek">
gna/qos, to\
ma:goulon</foreign>
<foreign lang="greek">
γνάθος, τὸ
μαXXX=':'γουλον</foreign>``</entryFree>
<entryFree id="n27757 Mallos">``<foreign lang="greek">
mallw:ths</foreign>
<foreign lang="greek">
ΜαλλωXXX=':'της</foreign>``</entryFree>
<entryFree id="n27791 mamma">``<foreign lang="greek">
ma:mma</foreign>
<foreign lang="greek">
μαXXX=':'μμα</foreign>``</entryFree>
<entryFree id="n27868 mania2">``<foreign lang="greek">
mani:a</foreign>
<foreign lang="greek">
μανιXXX=':'α</foreign>``</entryFree>
<entryFree id="n27878 manicon">``<foreign lang="greek">
maniko:n</foreign>
<foreign lang="greek">
μανικοXXX=':'ν</foreign>``</entryFree>
<entryFree id="n28002 marcidat">``<foreign lang="greek">
th/kei,
th:ketai</foreign>
<foreign lang="greek">
τήκει,
τηXXX=':'κεται</foreign>``</entryFree>
<entryFree id="n28056 Marium">``<foreign lang="greek">
ma:rion</foreign>
<foreign lang="greek">
ΜαXXX=':'ριον</foreign>``</entryFree>
<entryFree id="n28135 maspetum">``<foreign lang="greek">
ma:speton</foreign>
<foreign lang="greek">
μαXXX=':'σπετον</foreign>``</entryFree>
<entryFree id="n28200 mathematicus">``<foreign lang="greek">
maqhmatiko:s</foreign>
<foreign lang="greek">
μαθηματικοXXX=':'ς</foreign>``</entryFree>
<entryFree id="n34477 Penates">``<foreign lang="greek">
dENAS</foreign>
<foreign lang="greek">
ΔXXX='E'XXX='N'XXX='A'XXX='S'</foreign>``</entryFree>
<entryFree id="n34477 Penates">``<foreign lang="greek">
pENAS</foreign>
<foreign lang="greek">
ΠXXX='E'XXX='N'XXX='A'XXX='S'</foreign>``</entryFree>
<entryFree id="n39482 pulcher1">``<foreign lang="greek">
Kriti/a| tw=|
kalw=|</foreign>
<foreign lang="greek">
XXX=''XXX='K'ριτίᾳ τῷ
καλῷ</foreign>``</entryFree>
<entryFree id="n41901 Roxane">``<foreign lang="greek">
(Rwca/nh,</foreign>
<foreign lang="greek">
XXX=''XXX='('XXX='R'ωξάνη,</foreign>``</entryFree>
<entryFree id="n43335 Segesta1">``<foreign lang="greek">
)/Egesta</foreign>
<foreign lang="greek">
XXX=''XXX=')'XXX='/'XXX='E'γεστα</foreign>``</entryFree>
<entryFree id="n49607 U">``<foreign lang="greek">
U</foreign>
<foreign lang="greek">
XXX=''XXX='U'</foreign>``</entryFree>
If you need further information, don't hesitate to contact me.
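A minimal sketch of how spans like the ones above could be pre-screened before conversion. It assumes Perseus-style Beta Code written in lowercase ASCII with capitals marked by a leading `*` (as in `*krh:sios` above), so an uppercase ASCII letter or a non-ASCII character such as `ă` is treated as suspect. The function name `find_suspect_chars` is hypothetical, and the check does not cover the `:` and `.` cases from the report.

```python
import string


def find_suspect_chars(beta):
    """Return (index, character) pairs that look out of place in Beta Code.

    Assumption: capitals are written as "*" plus a lowercase letter, so a
    bare uppercase ASCII letter or any non-ASCII character is flagged.
    """
    suspects = []
    for index, char in enumerate(beta):
        if char in string.ascii_uppercase or ord(char) > 127:
            suspects.append((index, char))
    return suspects


if __name__ == "__main__":
    # Strings taken from the report above.
    for sample in ["Ka/rustos", "*krh:sios", "x\u0103mai/"]:
        print(sample, find_suspect_chars(sample))
```

Flagged positions would still need a manual decision, for example whether `Ka/rustos` should become `*ka/rustos` in the source.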
user report
Nearly all the words starting with dii... seem unavailable in LSJ (the Liddell-Scott-Jones Greek dictionary), as indicated by the zeroes here: http://www.perseus.tufts.edu/hopper/resolveform?type=start&lookup=dii&lang=greek. I'm especially trying to access the LSJ entry for diisthmi listed as the first lemma here: http://www.perseus.tufts.edu/hopper/resolveform?type=start&lookup=diisthmi&lang=greek.
user report
I have come across something that seemed faulty.
Looking up, in Lane's Arabic-English Lexicon, the term فَشَرِّدْ
I was referred to page
The entry there began with the numeral "2"
I examined the previous and following pages in order to get more information on the term.
Page http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2002.02.0027%3Aentry%3D%24arada
began with the numeral "1".
Page http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2002.02.0027%3Aentry%3D652
began with the numeral "4".
Have I missed the numeral "3" - or is there no numeral "3" in the Lexicon? Or has the entry inadvertently been deleted or not entered?
Thank you.
There seem to be four dictionary entries merged into one in the LSJ dictionary at
http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0057%3Aentry%3De(to%2Fs
user report 2/5/15
In the Lewis and Short Latin dictionary: http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0059:entry=sicco&highlight=dry we find the definition "to make dry, to day, to dry up (etc)". That should be "to dry", not "to day".
I was trying to search θαρσέω with Accordance and tried clicking on the first link 4.184 - but the link appears to be broken, both in Accordance and in LSJ online. The page I searched in:
The broken link: http://www.perseus.tufts.edu/hopper/invalidquery.jsp?doc=Perseus:abo:tlg,0059,006:4:184
User report: I noticed an error in your LSJ. Under the headword δίγλωσσος it says the Attic form is διττος. This is because in the original book they only give the Attic ending, -ττος for the non-Attic -σσος, and this is then combined to δι- (instead of δίγλω-). The correct Attic form is, of course, δίγλωττος.
I’m curious if there is a standardized way to resolve the URNs found in the lexicon, e.g. “urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35”, to something human-readable (showing author, work, etc., in a less abbreviated form than it appears in the LSJ), short of writing something to parse the data over at https://github.com/PerseusDL/canonical-greekLit myself.
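As a starting point, the URN can at least be split into its components without any external data. The sketch below only parses the string; it does not resolve identifiers such as `tlg0033` to an author or title, which would still require metadata from a source like canonical-greekLit. The names `LexiconUrn` and `parse_lexicon_urn` are hypothetical, and the colon-separated passage (`6:35`) is handled as it appears in the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LexiconUrn:
    namespace: str      # e.g. "greekLit"
    textgroup: str      # e.g. "tlg0033"
    work: str           # e.g. "tlg001"
    version: str        # e.g. "perseus-grc1"
    passage: List[str]  # e.g. ["6", "35"]


def parse_lexicon_urn(urn: str) -> LexiconUrn:
    """Split a URN of the form urn:cts:NAMESPACE:GROUP.WORK.VERSION:PASSAGE."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    work_parts = parts[3].split(".")
    if len(work_parts) != 3:
        raise ValueError(f"expected group.work.version in {urn!r}")
    textgroup, work, version = work_parts
    # Everything after the work component is treated as the passage reference.
    return LexiconUrn(parts[2], textgroup, work, version, parts[4:])


if __name__ == "__main__":
    print(parse_lexicon_urn("urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35"))
```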
Adding the two lexica to the repo.
user report 12/1/14
Dear Perseus Webmaster,
This is an error that might not trip up someone reading the text, but the Lewis and Short entry for "promereo" includes a link to Cicero, For Lucius Murena, the link being labeled "Cic. Mur. 34, 70" which is intended to lead to the page, "Cic. Mur. 34." Instead it leads to Cic. Mur. 16, where there is a different kind of "section 34" that is simultaneous with the bold-labelled kind, section 16.
It leads here: http://www.perseus.tufts.edu/hopper/text?doc=Cic. Mur. 34.70&lang=original It should link to here instead: http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.02.0019%3Atext%3DMur.%3Achapter%3D34
The reason is that the text has two section divisions, one of which is faster than the other and overtakes it after a while.
The headers for the lexica aren't the usual download file headers (they are the internal Perseus headers) so some critical info on reuse is missing.
I suggest merging these headers with the ones used for public text download or providing instructions for an attribution statement since the CC BY-SA allows for whatever attribution we request.
Current files also include a funder statement (something other than &NEH; for instance). This may be something we need to rethink for all texts.
user report 12/9/15
A couple of suggested edits to the Lewis and Short entry for “apud” 5:
EDIT 1: Socraiem is a typo; read Socratem
EDIT 2: And the source of the cited passage, “Socraiem illum, qui est in Phaedro Platonis,” is Cic. de Or. (1.28; maybe that is Cic. de Or. 1, 7, 28 in some text numbering systems). The punctuation in the excerpted entry (and elsewhere in this dictionary entry) seems, to me, to make the attribution of passage to source unclear, on first reading. Maybe replace the colons with semicolons? Like this:
confirmed. fixed typo; stet on formatting as this runs throughout this work
Re #24.
Also resolve other user questions about text status.
this text has not been moved as yet, but leaving this here
"When I click on "aff-' in the list of lemmata in the left edge of the window (when I am looking at a definition from Lewis and Short), this error page appears immediately"
Hi @lcerrato
I'm working on a series of automation scripts to resolve some frequent tagging errors (for example, n. in verbs that are v. a. and n. sometimes being tagged as <gen> when in most entries it's in <pos>, abbreviations that are split into a different tag from their final ., etc...)
In order to make this task easier, I would like to send a PR doing the following:
<entryFree> is split into multiple lines. Most of the time this is actually a mistake, and the second line of the <entryFree> is actually its own entry in LS. There are 2 occurrences where this is legitimate, and in these cases I want to move those all in the same line. In both cases, the amount of text in the second line is very small, and wouldn't cause the first line to become prohibitively long.
Would this be accepted?
I've been making some corrections in a local repository for my personal use.
For a few entries I noticed that what was intended to be an individual "sense" entry was accidentally included as part of another, i.e. I see
<sense ... >[sense X text] (α) [sense Y text]</sense>
I would like to add the missing sense tag, so that it's: <sense ... >[sense X text]</sense><sense>[sense Y text]</sense>, but I'm unsure of which ID I should assign to the new sense. Should the new sense be given:
sense items that come after)?
e.g. should
<sense id="n123.6" >[sense X text] (α) [sense Y text]</sense>
<sense id="n123.7">[sense Z text]</sense>
become
<sense id="n123.6" >[sense X text]</sense>
<sense id="n123.8">[sense Y text]</sense>
<sense id="n123.7">[sense Z text]</sense>
or
<sense id="n123.6" >[sense X text]</sense>
<sense id="n123.7">[sense Y text]</sense>
<sense id="n123.8">[sense Z text]</sense>
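The two numbering schemes shown above can be sketched as follows. This is only an illustration of the two options, not a statement of which one the maintainers prefer; the function name `insert_sense_id` and its `renumber` flag are hypothetical.

```python
def insert_sense_id(ids, after_index, renumber=True):
    """Return a new list of sense ids with one id inserted after after_index.

    renumber=False: the new sense gets the next unused number, so the
                    existing ids stay unchanged but are no longer in order.
    renumber=True:  the new sense gets the next number in sequence and
                    every later id is shifted up by one.
    """
    prefix = ids[0].rsplit(".", 1)[0]
    numbers = [int(sense_id.rsplit(".", 1)[1]) for sense_id in ids]
    head = numbers[: after_index + 1]
    tail = numbers[after_index + 1:]
    if renumber:
        new_numbers = head + [numbers[after_index] + 1] + [n + 1 for n in tail]
    else:
        new_numbers = head + [max(numbers) + 1] + tail
    return [f"{prefix}.{n}" for n in new_numbers]


if __name__ == "__main__":
    ids = ["n123.6", "n123.7"]
    print(insert_sense_id(ids, 0, renumber=False))
    print(insert_sense_id(ids, 0, renumber=True))
```

Shifting later ids keeps document order and id order aligned, but it changes existing ids, which matters if anything outside the file refers to them.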
According to my sources the accentuation of PSUCHOS in the following entry
should be PSU/CHOS:
Hi all, I wanted to let you know about a project I've started here (scripts) and here (output) to convert the Lewis and Short XML to JSON. There's still a long way to go yet until the JSON starts being useful, but I wanted to open an issue now to make sure that what I'm doing will be the most helpful it can to you upstream. In particular, I want to know how I should respond to data errors. Obviously typo fixes should be sent upstream, but what about possible issues with the markup? For example, in some entries the <sense> level attribute drops by more than one step. I want to change this for the JSON, but I'm not sure if it's a quirk of the actual Lewis and Short text that you would like to preserve. Likewise, I'm not sure what distinguishes type="main" from type="greek" in the <entryFree> tag, as there are many words that seem straightforward Greek adoptions to me, and yet have the type="main". Example:
<entryFree id="n51482" type="main" key="xeromyrrha">
<orth extent="full" lang="la">xērŏmyrrha</orth>, <itype>ae</itype>, <gen>f.</gen> (<foreign lang="greek">chro/s-mu/rra</foreign>), <sense id="n51482.0" n="I" level="1">dry myrrh, <bibl><author>Sedul.</author> Hymn. 2, 81</bibl>.</sense>
</entryFree>
I could go on with more questions about the markup, and will likely have yet more as I process the XML further. So I would like to know, in general, would you like me to edit the XML and submit a pull request for all the changes that I make, and you will sort out which ones to accept and which to discard? Or are there certain types of edits that you would be uninterested in and so I shouldn't bother trying to put them into the XML? I want to be as helpful as I can without spamming your pull requests, so let me know what you would like from me.
Gratias vobis, Iohannes
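One way to locate the level irregularities mentioned above is to compare the level of each sense with that of the preceding one. The sketch below assumes that "drops by more than one step" means a sense nested more than one level deeper than the sense before it (for example level 2 followed by level 4); that reading and the name `find_level_jumps` are assumptions, not something confirmed by the report.

```python
def find_level_jumps(levels):
    """Return indexes of senses nested more than one level deeper than the previous sense.

    `levels` is the sequence of integer `level` attribute values of the
    <sense> elements of one entry, in document order.
    """
    return [
        index
        for index in range(1, len(levels))
        if levels[index] - levels[index - 1] > 1
    ]


if __name__ == "__main__":
    # Hypothetical level sequences, not taken from the actual file.
    print(find_level_jumps([1, 2, 3, 1, 2]))
    print(find_level_jumps([1, 2, 4, 1, 3]))
```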
Entities are an aspect of XML I am not very familiar with, so I may be doing something wrong.
Anyway, when I try to run lxml.etree.parse("lat.ls.perseus-eng1.xml") in Python to parse the XML file for Lewis and Short, I get this error:
lxml.etree.XMLSyntaxError: Entity 'dagger' not defined, line 288, column 27
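The error indicates that the parser meets a named entity (`&dagger;`) with no declaration available. One possible workaround, sketched below with the standard library only, is to rewrite named entities as numeric character references before parsing. It assumes the entity names used in the file coincide with HTML5 entity names, which is true for `dagger` but may not hold for every entity in the lexicon; unknown names are left untouched and would still fail to parse. The function name `replace_named_entities` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import html5

# The five entities every XML parser already understands.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(xml_text):
    """Rewrite known named entities as numeric character references."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        # Assumption: the name means the same thing as the HTML5 entity.
        char = html5.get(name + ";")
        if char is None:
            return match.group(0)  # unknown name: leave it for manual review
        return "".join(f"&#{ord(c)};" for c in char)
    return ENTITY_RE.sub(substitute, xml_text)


if __name__ == "__main__":
    sample = "<orth>&dagger;a&abreve; &amp; b</orth>"
    root = ET.fromstring(replace_named_entities(sample))
    print(root.text)
```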
<foreign lang="greek" TEIform="foreign">w(/raka</foreign> Baillet <tr opt="n" TEIform="tr">Inscr. destombeaux des rois</tr>
The book title should not be listed as a translation
A user sent in a bunch of corrections to the Middle Liddell. 1999.04.0058
Most have to do with numerical encoding, so they would be best dealt with when Unicode conversion has taken place
user report RM20201122: I have noticed two macrons that I think the venerable Charlton Lewis and Charles Short probably intended (correctly) to be breves:
- mās (long ā), măris (short ă) has a sub-entry mās māris (both with ā with macron) for the noun sense ("as subst."). The poetic examples given in the sub-entry itself, by Lucretius and Horace, show măr- (short ă), namely fēmina/que ut mărĭ/bus (Lucretius), (laudibus) nātā/lemque marēs// (lesser Asclepiadean, Hor.).
- Īōnĭus has a macron on the capital ī, but all the surrounding entries have the correct ĭ with breve.
see issue PerseusDL/canonical#25
user report 8/30/15
the definition for εἷς , μία^, ἕν doesn't display correctly: it spans over the next entry (ἔστω)
There is a new version submitted by users with corrections, etc.
per @gcelano a request to remove the entities from the file (ă etc.)
user report LG06132022
CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml b/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
index c7b9eeb..4132c44 100644
--- a/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
+++ b/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
@@ -77741,11 +77741,11 @@ Preamble and TEI header for loose SGML LSJ.
</sense>
</entryFree>
<entryFree id="n25038"
- key="hte"
+ key="diakrathte/on"
type="main"
opt="n"
TEIform="entryFree">
- <orth extent="full" lang="greek" opt="n" TEIform="orth">-hte<*>on,</orth>
+ <orth extent="full" lang="greek" opt="n" TEIform="orth">diakrat-hte/on,</orth>
<sense id="n25038.0" n="A" level="1" opt="n" TEIform="sense">
<tr opt="n" TEIform="tr">one must hold fast,</tr>
<bibl default="NO" TEIform="bibl">
Is the entry for χάω complete?
LSJ needs to be converted from beta code to Unicode.
In those cases where a "sense" has not been properly marked in the XML, how should it be added, with regard to the "id" attribute? For example, in the very first article there should be senses inserted between "n0.9" and "n0.10", as well as between "n0.13" and "n0.14". If I correct this and insert sense tags, what value should I give to their id attributes?
grouped with prior entry
user report V20201207
user report 8/30/15
I'd like to report another typo.
NOICE
http://www.perseus.tufts.edu/hopper/morph?l=boa&la=greek#lexicon
5. c. inf., cry aloud or command in aloud noice to do a thing
see also
http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.04.0057%3Aentry%3Dboa%2Fw
Checked against the source: should be "command in a loud voice".
Hi @lcerrato -
I've recently finished (with the help of some friends experienced in these languages) transcribing some of the Hebrew and Syriac words in Lewis and Short. In the current version, these are just listed as empty <foreign></foreign>. I plan to mark Hebrew words with lang="heb" and Syriac with lang="syr".
Since there are already Unicode characters in the lat.ls.perseus-eng2.xml (for Greek), I was assuming that there will be no concern with adding Hebrew Unicode for these missing words, but thought I should double check first.
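For reference, empty <foreign> elements of the kind described above can be listed with a short script. This is a sketch under the assumption that "empty" means no child elements and no non-whitespace text; the sample XML and the function name `empty_foreign_elements` are hypothetical.

```python
import xml.etree.ElementTree as ET


def empty_foreign_elements(root):
    """Return <foreign> elements that have no children and no visible text."""
    return [
        element
        for element in root.iter("foreign")
        if len(element) == 0 and not (element.text or "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical fragment, not copied from the lexicon file.
    sample = (
        '<entryFree id="n1">'
        '<foreign lang="greek">lo/gos</foreign>'
        "<foreign></foreign>"
        "</entryFree>"
    )
    root = ET.fromstring(sample)
    print(len(empty_foreign_elements(root)))
```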
user report: 9/22/14 12:38 PM
cor-rumpo (conr- ), rāpi, ruptum (rumptum), 3, v. a.
s/b
rūpī
there is a blur on this spot in the book
some front matter missing for various lexica; need more detail on this
need to assign to correct repos.
From the readme: https://github.com/PerseusDL/lexica/tree/master/CTS_XML_TEI/perseus/pdllex/lat/ls#readme, this attribution is required:
Text provided under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu, with funding from The National Endowment for the Humanities.
Data accessed from https://github.com/PerseusDL/lexica/ [date of access].
I maintain a site that has an interface for Lewis and Short with some convenient reading features (handling of abbreviations, syntax highlighting, mobile friendly, etc.; you can see an example article here: https://www.morcus.net/dicts?q=habeo ). I am of course crediting Perseus Digital Library for the content, but I am seeking assistance on the second line:
Data accessed from https://github.com/PerseusDL/lexica/ [date of access].
I have a fork of the repo (at https://github.com/nkprasad12/lexica - this is because sometimes I have changes that are not yet ready for a Pull Request to your repo, and sometimes your repo has changes that I have not yet had time to validate for my abbreviation handling system) from which I am pulling the latest changes to deploy to my website on an automated basis several times a week.
Due to this, it is hard to maintain an accurate date of access from the PerseusDL/lexica repo. For clarity, I was wondering whether it would be acceptable to display this slightly modified version of the attribution instead:
Text provided under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu, with funding from The National Endowment for the Humanities.
Data originally from https://github.com/PerseusDL/lexica/ .
Data accessed from https://github.com/nkprasad12/lexica/ [date of access].
lat.ls.perseus-eng1.xml
user report
Do you make a note of errors on the Perseus site? I noticed the following error in the online Lewis & Short dictionary. Under the word 'amictus' there is an incorrect reference to Tibullus. The quotation is from Tib. 1.8.13, not Tib. 1.9.13
"II. Meton., abstr. pro concr., the garment itself that is thrown about or on, any clothing, a mantle, cloak, etc.: “quam (statuam) esse ejusdem, status, amictus, anulus, imago ipsa declarat,” Cic. Att. 6, 1, 17: “frustra jam vestes, frustra mutatur amictus,” Tib. 1, 9, 13"
I noticed that one of my recent PRs had a missing tag that wasn't caught until after it was merged, and needed to be fixed manually by lcerrato (#71).
GitHub offers (for free, to public repos such as this one) the ability to run a suite of tests based on certain trigger criteria, reducing the burden on repo maintainers.
If this seems useful, I can send a PR to automatically check that the lat.ls.perseus-eng2.xml file can be parsed as valid XML before any pull request can be merged. This check is non-binding; repo admins would still have the discretion to merge a PR that failed the checks if desired.
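The check itself could be as small as the sketch below, which tests well-formedness with the standard library parser. It is not the script from any actual PR; the function name `check_well_formed` is hypothetical, and a CI job would still need to read the file and exit with a non-zero status when the check fails.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


if __name__ == "__main__":
    ok, message = check_well_formed("<entryFree><sense>dry</sense></entryFree>")
    print(ok, message)
    # A missing closing tag, similar to the problem described above.
    ok, message = check_well_formed("<entryFree><sense>dry</entryFree>")
    print(ok, message)
```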