lexica's People

Contributors

adiel-mittmann, alatius, annakrohn, ids1024, lcerrato, nkprasad12

lexica's Issues

(lat.ls.perseus-eng1.xml) unicode conversion errors

user report: 3/19/19
Hello,

Converting lat.ls.perseus-eng1.xml to Unicode, I detected some errors in
the Greek Beta Code. First column = source text, second column = output
from converter; problems are marked there with XXX='[problem]':

<entryFree id="n4408  Auchetae">``<foreign lang="greek">Au)xa/tai</foreign>    <foreign lang="greek">XXX='A'ὐχάται</foreign>``</entryFree>
<entryFree id="n4991  barritus2">``<foreign lang="greek">'*ele/fas</foreign>    <foreign lang="greek">XXX='''Ελέφας</foreign>``</entryFree>
<entryFree id="n5138  Bellona">``<foreign lang="greek">'*enuw/,
'*erinnu/s, qea\ polemikh/</foreign>    <foreign lang="greek">XXX='''Ενυώ, XXX='''Εριννύς, θεὰ πολεμική</foreign>``</entryFree>
<entryFree id="n6076  caeruleus1">``<foreign lang="greek">kat'
.e)coxh/n</foreign>    <foreign lang="greek">κατ'
XXX='.'ἐξοχήν</foreign>``</entryFree>
<entryFree id="n6163  Calauria">``<foreign lang="greek">'i/a</foreign>   
<foreign lang="greek">XXX='''ία</foreign>``</entryFree>
<entryFree id="n6913  Carystos">``<foreign lang="greek">Ka/rustos</foreign>    <foreign lang="greek">XXX='K'άρυστος</foreign>``</entryFree>
<entryFree id="n6941  Caspium">``<foreign lang="greek">to\ Ka/spion
pe/lagos</foreign>    <foreign lang="greek">τὸ XXX='K'άσπιον
πέλαγος</foreign>``</entryFree>
<entryFree id="n9175  coluber">``<foreign lang="greek">w(s de\ dra/kwn,
k.t.l.</foreign>    <foreign lang="greek">ὡς δὲ δράκων,
κXXX='.'τXXX='.'λ.</foreign>``</entryFree>
<entryFree id="n11572  Creta1">``<foreign lang="greek">*krh:sios</foreign>    <foreign lang="greek">ΚρηXXX=':'σιος</foreign>``</entryFree>
<entryFree id="n11620  Crisa">``<foreign lang="greek">ko/lpos
Krisai=os</foreign>    <foreign lang="greek">κόλπος
XXX='
'XXX='K'ρισαῖος</foreign>``</entryFree>
<entryFree id="n21065  humus">``<foreign lang="greek">xămai/, xămo/qen,
xăma^lo/s</foreign> <foreign lang="greek">χXXX='ă'μαί, χXXX='ă'μόθεν,
χXXX='ă'μᾰλός</foreign>``</entryFree>
<entryFree id="n26736  Linus">``<foreign lang="greek">*li:nos</foreign>   
<foreign lang="greek">ΛιXXX=':'νος</foreign>``</entryFree>
<entryFree id="n26737  linyphus">``<foreign lang="greek">li:nufos</foreign>    <foreign lang="greek">λιXXX=':'νυφος</foreign>``</entryFree>
<entryFree id="n27352  lycophos">``<foreign lang="greek">luko:fws</foreign>    <foreign lang="greek">λυκοXXX=':'φως</foreign>``</entryFree>
<entryFree id="n27395  lyricus">``<foreign lang="greek">luriko:s</foreign>    <foreign lang="greek">λυρικοXXX=':'ς</foreign>``</entryFree>
<entryFree id="n27655  magulum">``<foreign lang="greek">gna/qos, to\
ma:goulon</foreign> <foreign lang="greek">γνάθος, τὸ
μαXXX=':'γουλον</foreign>``</entryFree>
<entryFree id="n27757  Mallos">``<foreign lang="greek">mallw:ths</foreign>    <foreign lang="greek">ΜαλλωXXX=':'της</foreign>``</entryFree>
<entryFree id="n27791  mamma">``<foreign lang="greek">ma:mma</foreign>   
<foreign lang="greek">μαXXX=':'μμα</foreign>``</entryFree>
<entryFree id="n27868  mania2">``<foreign lang="greek">mani:a</foreign>   
<foreign lang="greek">μανιXXX=':'α</foreign>``</entryFree>
<entryFree id="n27878  manicon">``<foreign lang="greek">maniko:n</foreign>    <foreign lang="greek">μανικοXXX=':'ν</foreign>``</entryFree>
<entryFree id="n28002  marcidat">``<foreign lang="greek">th/kei,
th:ketai</foreign>    <foreign lang="greek">τήκει,
τηXXX=':'κεται</foreign>``</entryFree>
<entryFree id="n28056  Marium">``<foreign lang="greek">ma:rion</foreign>    <foreign lang="greek">ΜαXXX=':'ριον</foreign>``</entryFree>
<entryFree id="n28135  maspetum">``<foreign lang="greek">ma:speton</foreign>    <foreign lang="greek">μαXXX=':'σπετον</foreign>``</entryFree>
<entryFree id="n28200  mathematicus">``<foreign lang="greek">maqhmatiko:s</foreign>    <foreign lang="greek">μαθηματικοXXX=':'ς</foreign>``</entryFree>
<entryFree id="n34477  Penates">``<foreign lang="greek">dENAS</foreign>    <foreign lang="greek">ΔXXX='E'XXX='N'XXX='A'XXX='S'</foreign>``</entryFree>
<entryFree id="n34477  Penates">``<foreign lang="greek">pENAS</foreign>    <foreign lang="greek">ΠXXX='E'XXX='N'XXX='A'XXX='S'</foreign>``</entryFree>
<entryFree id="n39482  pulcher1">``<foreign lang="greek">Kriti/a| tw=|
kalw=|</foreign>    <foreign lang="greek">XXX='
'XXX='K'ριτίᾳ τῷ
καλῷ</foreign>``</entryFree>
<entryFree id="n41901  Roxane">``<foreign lang="greek">
(Rwca/nh,</foreign>    <foreign lang="greek">XXX='
'XXX='('XXX='R'ωξάνη,</foreign>``</entryFree>
<entryFree id="n43335  Segesta1">``<foreign lang="greek">
)/Egesta</foreign>    <foreign lang="greek">XXX='
'XXX=')'XXX='/'XXX='E'γεστα</foreign>``</entryFree>
<entryFree id="n49607  U">``<foreign lang="greek">U</foreign>    <foreign lang="greek">XXX=''XXX='U'</foreign>``</entryFree>

If you need further information, don't hesitate to contact me.
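The failures listed above cluster around a few input characters: Latin capitals written without the * marker, apostrophes, periods, colons used as length marks, precomposed ă, and embedded newlines. A minimal sketch of a pre-check that lists such characters before conversion might look like this. The `STRICT` alphabet and the function name `suspect_chars` are assumptions made for illustration; they are not part of any Perseus tool, and a real converter may accept a wider alphabet.

```python
import re

# Assumption: a deliberately strict Beta Code alphabet made of lowercase
# letters, the capital marker *, the diacritic signs ) ( / \ = | +,
# the length marks ^ and _, plus space and comma.
STRICT = re.compile(r"[a-z*)(/\\=|+^_ ,]")


def suspect_chars(beta):
    """Return (index, character) pairs that fall outside the strict alphabet."""
    return [(i, ch) for i, ch in enumerate(beta) if not STRICT.fullmatch(ch)]


# Strings taken from the report above.
print(suspect_chars("Au)xa/tai"))   # capital without the * marker
print(suspect_chars("*krh:sios"))   # colon used as a length mark
print(suspect_chars("qea\\ polemikh/"))
```

Running the check over every `<foreign lang="greek">` element before conversion would give a list of entries to review by hand rather than a silent mis-conversion.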

Lane's dictionary

user report

I have come across something that seemed faulty.

Looking up, in Lane's Arabic-English Lexicon, the term فَشَرِّدْ

I was referred to page

http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:2002.02.0027:entry=$r~dhu&highlight=terrify

The entry there began with the numeral "2"

I examined the previous and following pages in order to get more information on the term.

Page http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2002.02.0027%3Aentry%3D%24arada

began with the numeral "1".

Page http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A2002.02.0027%3Aentry%3D652

began with the numeral "4".

Have I missed the numeral "3" - or is there no numeral "3" in the Lexicon? Or has the entry inadvertently been deleted or not entered?

Thank you.

(grc.lsj.perseus-eng4.xml) typo - entry issue

User report: I noticed an error in your LSJ. Under the headword δίγλωσσος it says the Attic form is διττος. This is because in the original book they only give the Attic ending, -ττος for the non-Attic -σσος, and this is then combined to δι- (instead of δίγλω-). The correct Attic form is, of course, δίγλωττος.
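The expansion described in this report can be sketched as follows: the abbreviated variant ending should replace only the matching ending of the headword, so the full stem is kept. The function name `apply_variant_ending` is hypothetical, and this is only an illustration of the reasoning, not how the Perseus data was actually generated.

```python
def apply_variant_ending(headword, common_ending, variant_ending):
    """Swap only the shared ending, keeping the full stem of the headword."""
    if not headword.endswith(common_ending):
        raise ValueError(f"{headword!r} does not end in {common_ending!r}")
    stem = headword[: -len(common_ending)]
    return stem + variant_ending


# The headword ends in -σσος and the book gives only the Attic ending -ττος.
print(apply_variant_ending("δίγλωσσος", "σσος", "ττος"))
```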

How to use URNs?

I’m curious if there is a standardized way to resolve the URNs found in the lexicon, e.g. “urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35”, to something human-readable (showing author, work, etc., in a less abbreviated form than it appears in the LSJ), short of writing something to parse the data over at https://github.com/PerseusDL/canonical-greekLit myself.
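As a starting point, a URN of the shape quoted above can be split into its parts with plain string handling. This is a minimal sketch, assuming the layout `urn:cts:<namespace>:<textgroup>.<work>.<version>:<passage>`; the function name `parse_cts_urn` is hypothetical. It does not resolve identifiers to author or work names, which would still require metadata such as that in the canonical-greekLit repository.

```python
def parse_cts_urn(urn):
    """Split a CTS URN into namespace, work components, and passage."""
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    result = {
        "namespace": parts[2],
        # Everything after the work component is kept verbatim as the passage.
        "passage": parts[4] if len(parts) > 4 else None,
    }
    # The work component may carry up to three dot-separated levels.
    keys = ("textgroup", "work", "version")
    result.update(zip(keys, parts[3].split(".")))
    return result


print(parse_cts_urn("urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35"))
```

The `textgroup` and `work` values are the keys one would then look up in whatever catalog supplies the human-readable names.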

cross reference error Lewis and Short

user report 12/1/14

Dear Perseus Webmaster,



This is an error that might not trip up someone reading the text, but the Lewis and Short entry for "promereo" includes a link to Cicero, For Lucius Murena. The link is labeled "Cic. Mur. 34, 70" and is intended to lead to the page "Cic. Mur. 34." Instead it leads to Cic. Mur. 16, where there is a different kind of "section 34" that runs alongside the bold-labelled kind, section 16.

It leads here:

http://www.perseus.tufts.edu/hopper/text?doc=Cic. Mur. 34.70&lang=original

It should link to here instead:

http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.02.0019%3Atext%3DMur.%3Achapter%3D34



The reason is that the text has two section divisions, one of which is faster than the other and overtakes it after a while.
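For reference, the corrected link in this report is the percent-encoded form of a Perseus document identifier. A minimal sketch of building such a link follows; the identifier string was obtained by decoding the URL quoted in the report, and the helper name `hopper_url` is hypothetical.

```python
from urllib.parse import quote

HOPPER = "http://www.perseus.tufts.edu/hopper/text?doc="


def hopper_url(doc_id):
    """Percent-encode a document identifier and append it to the hopper URL."""
    # safe="" makes quote() encode ':' and '=' as well.
    return HOPPER + quote(doc_id, safe="")


print(hopper_url("Perseus:text:1999.02.0019:text=Mur.:chapter=34"))
```

Addressing the chapter explicitly, as here, avoids the ambiguity between the two section numbering schemes.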

headers for these files

The headers for the lexica aren't the usual download file headers (they are the internal Perseus headers) so some critical info on reuse is missing.

I suggest merging these headers with the ones used for public text download or providing instructions for an attribution statement since the CC BY-SA allows for whatever attribution we request.

Current files also include a funder statement (something other than &NEH; for instance). This may be something we need to rethink for all texts.

(lat.ls.perseus-eng1.xml) small typo fixes

user report 12/9/15

A couple of suggested edits to the Lewis and Short entry for “apud” 5:

  1. In designating the author of a work or of an assertion, apud aliquem, in, by, in the writings of, any one (the work itself being designated by in with abl.; as, de quā in Catone majore satis multa diximus, Cic. Off. 1, 42, 151: “Socraiem illum, qui est in Phaedro Platonis,” id. de Or. 1, 7, 28: “quo in libro,” id. ib. 1, 11, 47)

EDIT 1: Socraiem is a typo; read Socratem

EDIT 2: And the source of the cited passage, “Socraiem illum, qui est in Phaedro Platonis,” is Cic. de Or. (1.28; maybe that is Cic. de Or. 1, 7, 28 in some text numbering systems). The punctuation in the excerpted entry (and elsewhere in this dictionary entry) seems, to me, to make the attribution of passage to source unclear, on first reading. Maybe replace the colons with semicolons? Like this:

  1. In designating the author of a work or of an assertion, apud aliquem, in, by, in the writings of, any one (the work itself being designated by in with abl.; as, de quā in Catone majore satis multa diximus, Cic. Off. 1, 42, 151; “Socraiem illum, qui est in Phaedro Platonis,” id. de Or. 1, 7, 28; “quo in libro,” id. ib. 1, 11, 47)

confirmed. fixed typo; stet on formatting as this runs throughout this work

[Lewis and Short] removing insignificant whitespace

Hi @lcerrato

I'm working on a series of automation scripts to resolve some frequent tagging errors (for example, n. in verbs that are v. a. and n. sometimes being tagged as <gen> when in most entries it's in <pos>, abbreviations that are split into a different tag from their final ., etc...)

In order to make this task easier, I would like to send a PR doing the following:

  1. Fix all instances where an <entryFree> is split into multiple lines. Most of the time this is actually a mistake, and the second line of the <entryFree> is actually its own entry in LS. There are 2 occurrences where this is legitimate, and in these cases I want to move everything onto the same line. In both cases, the amount of text in the second line is very small, and wouldn't cause the first line to be prohibitively long.
  2. Remove indentation before some entries. Since step (1) makes it so that each entry is on one line, this should have no significant effect since it's outside the entries.
  3. Remove empty lines between entries. Again, this should have no significant effect since it's happening outside entry boundaries.

Would this be accepted?
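The mechanical part of the three steps above can be sketched as a line filter. This is a minimal sketch, assuming each entry begins on a line starting with `<entryFree` and ends on a line ending with `</entryFree>`; the function name `normalize_entries` is hypothetical. It only covers the legitimate case of one entry spread over several lines. Lines that are really separate entries merged by mistake would still need manual review, and lines outside entries also lose their indentation here.

```python
def normalize_entries(lines):
    """Join multi-line entries, strip indentation, and drop blank lines."""
    out = []
    buf = None  # text of the entry currently being assembled, if any
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # step 3: drop empty lines
        if buf is None:
            if not line.startswith("<entryFree"):
                out.append(line)  # header or other non-entry line
                continue
            buf = line  # step 2: indentation already stripped
        else:
            buf += " " + line  # step 1: continuation of a split entry
        if buf.endswith("</entryFree>"):
            out.append(buf)
            buf = None
    if buf is not None:
        out.append(buf)  # unterminated entry, kept so nothing is lost
    return out


sample = [
    '  <entryFree id="a">first part\n',
    '     second part</entryFree>\n',
    '\n',
    '<entryFree id="b">whole</entryFree>\n',
]
print(normalize_entries(sample))
```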

How to handle missing <sense> tags

I've been making some corrections in a local repository for my personal use.

For a few entries I noticed that what was intended to be an individual "sense" entry was accidentally included as part of another, i.e. I see
<sense ... >[sense X text] (α) [sense Y text]</sense>

I would like to add the missing sense tag, so that it's: <sense ... >[sense X text]</sense><sense>[sense Y text]</sense>, but I'm unsure of which ID I should assign to the new sense. Should the new sense be given:

  • the lowest free ID (which preserves backwards compatibility, but means the IDs aren't in order anymore), or
  • the ID in "order" (which requires changing all sense items that come after)?

e.g. should

<sense id="n123.6" >[sense X text] (α) [sense Y text]</sense>
<sense id="n123.7">[sense Z text]</sense>

become

<sense id="n123.6" >[sense X text]</sense>
<sense id="n123.8">[sense Y text]</sense>
<sense id="n123.7">[sense Z text]</sense>

or

<sense id="n123.6" >[sense X text]</sense>
<sense id="n123.7">[sense Y text]</sense>
<sense id="n123.8">[sense Z text]</sense>
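The second option (IDs kept in order) can be sketched as a renumbering pass that also records which existing IDs changed, so that anything depending on the old IDs can be updated. This is a minimal sketch with hypothetical names (`renumber`, `start`); it does not reflect a decision by the maintainers about which scheme is preferred.

```python
def renumber(entry_id, senses, start=0):
    """Assign sequential sense IDs and report which existing IDs changed.

    `senses` is a list of (old_id, text) pairs in document order;
    a newly inserted sense has old_id set to None.
    """
    renamed, mapping = [], {}
    for offset, (old_id, text) in enumerate(senses):
        new_id = f"{entry_id}.{start + offset}"
        if old_id is not None and old_id != new_id:
            mapping[old_id] = new_id
        renamed.append((new_id, text))
    return renamed, mapping


senses = [("n123.6", "[sense X text]"), (None, "[sense Y text]"), ("n123.7", "[sense Z text]")]
renamed, mapping = renumber("n123", senses, start=6)
print(renamed)
print(mapping)
```

The returned mapping makes the cost of this option visible: every entry in it is an ID that external references would have to follow.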

FYI: Lewis and Short to JSON project

Hi all, I wanted to let you know about a project I've started here (scripts) and here (output) to convert the Lewis and Short XML to JSON. There's still a long way to go yet until the JSON starts being useful, but I wanted to open an issue now to make sure that what I'm doing will be the most helpful it can to you upstream. In particular, I want to know how I should respond to data errors. Obviously typo fixes should be sent upstream, but what about possible issues with the markup? For example, in some entries the <sense> level attribute drops by more than one step. I want to change this for the JSON, but I'm not sure if it's a quirk of the actual Lewis and Short text that you would like to preserve. Likewise, I'm not sure what distinguishes type="main" from type="greek" in the <entryFree> tag, as there are many words that seem straightforward Greek adoptions to me, and yet have the type="main". Example:

<entryFree id="n51482" type="main" key="xeromyrrha">
  <orth extent="full" lang="la">xērŏmyrrha</orth>, <itype>ae</itype>, <gen>f.</gen> (<foreign lang="greek">chro/s-mu/rra</foreign>), <sense id="n51482.0" n="I" level="1">dry myrrh, <bibl><author>Sedul.</author> Hymn. 2, 81</bibl>.</sense>
</entryFree>

I could go on with more questions about the markup, and will likely have yet more as I process the XML further. So I would like to know, in general, would you like me to edit the XML and submit a pull request for all the changes that I make, and you will sort out which ones to accept and which to discard? Or are there certain types of edits that you would be uninterested in and so I shouldn't bother trying to put them into the XML? I want to be as helpful as I can without spamming your pull requests, so let me know what you would like from me.

Gratias vobis, Iohannes
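To make the kind of conversion discussed here concrete, the following is a minimal sketch that turns the example entry above into a JSON-ready dictionary using only the standard library. The output shape and the function name `entry_to_dict` are assumptions for illustration; they are not the format used by the project mentioned in this issue.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """<entryFree id="n51482" type="main" key="xeromyrrha">
  <orth extent="full" lang="la">xērŏmyrrha</orth>, <itype>ae</itype>, <gen>f.</gen> (<foreign lang="greek">chro/s-mu/rra</foreign>), <sense id="n51482.0" n="I" level="1">dry myrrh, <bibl><author>Sedul.</author> Hymn. 2, 81</bibl>.</sense>
</entryFree>"""


def entry_to_dict(xml_text):
    """Convert one <entryFree> element into a plain dictionary."""
    entry = ET.fromstring(xml_text)
    orth = entry.find("orth")
    return {
        "id": entry.get("id"),
        "key": entry.get("key"),
        "type": entry.get("type"),
        "orth": orth.text if orth is not None else None,
        "senses": [
            {
                "id": sense.get("id"),
                "n": sense.get("n"),
                "level": int(sense.get("level")),
                # Flatten nested markup such as <bibl> into running text.
                "text": "".join(sense.itertext()).strip(),
            }
            for sense in entry.iter("sense")
        ],
    }


print(json.dumps(entry_to_dict(SAMPLE), ensure_ascii=False, indent=2))
```

Flattening with `itertext()` discards the citation markup, and if senses were nested inside one another their text would be repeated; a fuller converter would need to decide how to represent both.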

Unable to parse due to "entity not defined" errors

Entities are an aspect of XML I am not very familiar with, so I may be doing something wrong.

Anyway, when I try to run lxml.etree.parse("lat.ls.perseus-eng1.xml") in Python to parse the XML file for Lewis and Short, I get this error:

lxml.etree.XMLSyntaxError: Entity 'dagger' not defined, line 288, column 27
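One common way around this kind of error is to declare the missing entity in an internal DTD subset before parsing. The sketch below shows the idea on a small fragment with the standard library parser rather than lxml. The mapping of `dagger` to U+2020 follows the usual ISO entity name and is an assumption about what the file intends; the helper name `parse_with_entities` is hypothetical, and the real file may need more entities declared or may already carry its own DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Assumption: &dagger; stands for the dagger sign, U+2020.
ENTITY_DECLS = {"dagger": "&#8224;"}


def parse_with_entities(fragment, root_name, decls=ENTITY_DECLS):
    """Parse an XML fragment after prepending entity declarations."""
    subset = "".join(f'<!ENTITY {name} "{value}">' for name, value in decls.items())
    return ET.fromstring(f"<!DOCTYPE {root_name} [{subset}]>{fragment}")


root = parse_with_entities("<orth>&dagger;abc</orth>", "orth")
print(root.text)
```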

(grc.lsj.perseus-eng16.xml) typo

<foreign lang="greek" TEIform="foreign">w(/raka</foreign> Baillet <tr opt="n" TEIform="tr">Inscr. destombeaux des rois</tr>

The book title should not be listed as a translation

(L&S) accent misprints (?)

user report RM20201122: I have noticed two macrons that I think the venerable Charlton Lewis and Charles Short probably even intended (correctly) to be breves:

  • mās (long ā) măris (short ă) has a sub-entry mās māris (both with ā with macron) for the noun sense ("as subst."). The poetic examples given in the subentry itself, by Lucretius and Horace, show măr- (short ă), namely fēmina/que ut mărĭ/bus (Lucretius), (laudibus) nātā/lemque marēs// (lesser Asclepiadean, Hor.).
  • Īōnĭus has a macron on the capital ī, but all the surrounding entries have the correct ĭ with breve.

(grc.lsj.perseus-eng4.xml) entry n25038 incomplete

user report LG06132022

 CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml b/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
index c7b9eeb..4132c44 100644
--- a/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
+++ b/CTS_XML_TEI/perseus/pdllex/grc/lsj/grc.lsj.perseus-eng4.xml
@@ -77741,11 +77741,11 @@ Preamble and TEI header for loose SGML LSJ.
                </sense>
             </entryFree>
             <entryFree id="n25038"
-                       key="hte"
+                       key="diakrathte/on"
                        type="main"
                        opt="n"
                        TEIform="entryFree">
-               <orth extent="full" lang="greek" opt="n" TEIform="orth">-hte&lt;*&gt;on,</orth>  
+               <orth extent="full" lang="greek" opt="n" TEIform="orth">diakrat-hte/on,</orth>  
                <sense id="n25038.0" n="A" level="1" opt="n" TEIform="sense">
                   <tr opt="n" TEIform="tr">one must hold fast,</tr> 
                   <bibl default="NO" TEIform="bibl">

How to insert new senses?

In those cases where a "sense" has not been properly marked in the XML, how should it be added, with regard to the "id" attribute? For example, in the very first article there should be senses inserted between "n0.9" and "n0.10", as well as between "n0.13" and "n0.14". If I correct this and insert sense tags, what value should I give to their id attributes?

Non-Greek/Roman characters in Lewis and Short

Hi @lcerrato -
I've recently finished (with the help of some friends experienced in these languages) transcribing some of the Hebrew and Syriac words in Lewis and Short. In the current version, these are just listed as empty <foreign></foreign>. I plan to mark Hebrew words with lang="heb" and Syriac with lang="syr".

Since there are already unicode characters in the lat.ls.perseus-eng2.xml (for Greek), I was assuming that there will be no concern with adding Hebrew unicode for these missing words, but thought I should double check first.

LS: typo

user report: 9/22/14 12:38 PM
cor-rumpo (conr- ), rāpi, ruptum (rumptum), 3, v. a.
s/b
rūpī

there is a blur on this spot in the book

Question on credit

From the readme: https://github.com/PerseusDL/lexica/tree/master/CTS_XML_TEI/perseus/pdllex/lat/ls#readme, this attribution is required:

Text provided under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu, with funding from The National Endowment for the Humanities.
Data accessed from https://github.com/PerseusDL/lexica/ [date of access].

I maintain a site that has an interface to Lewis and Short with some convenient reading features (handling of abbreviations, syntax highlighting, mobile friendly, etc... - you can see an example article here: https://www.morcus.net/dicts?q=habeo ). I am of course crediting Perseus Digital Library for the content, but I am seeking assistance on the second line:

Data accessed from https://github.com/PerseusDL/lexica/ [date of access].

I have a fork of the repo (at https://github.com/nkprasad12/lexica - this is because sometimes I have changes that are not yet ready for a Pull Request to your repo, and sometimes your repo has changes that I have not yet had time to validate for my abbreviation handling system) from which I am pulling the latest changes to deploy to my website on an automated basis several times a week.

Due to this, it is hard to maintain an accurate date of access from the PerseusDL/lexica repo. For clarity, I was wondering whether it would be acceptable to display this slightly modified version of the attribution instead:

Text provided under a CC BY-SA license by Perseus Digital Library, http://www.perseus.tufts.edu, with funding from The National Endowment for the Humanities.
Data originally from https://github.com/PerseusDL/lexica/ .
Data accessed from https://github.com/nkprasad12/lexica/ [date of access].

small fix typos

lat.ls.perseus-eng1.xml

user report
Do you make a note of errors on the Perseus site? I noticed the following error in the online Lewis & Short dictionary. Under the word 'amictus' there is an incorrect reference to Tibullus. The quotation is from Tib. 1.8.13, not Tib. 1.9.13


"II. Meton., abstr. pro concr., the garment itself that is thrown about or on, any clothing, a mantle, cloak, etc.: “quam (statuam) esse ejusdem, status, amictus, anulus, imago ipsa declarat,” Cic. Att. 6, 1, 17: “frustra jam vestes, frustra mutatur amictus,” Tib. 1, 9, 13"

Proposal: Automatic XML Validation

I noticed that one of my recent PRs had a missing tag that wasn't caught until after it was merged, and needed to be fixed manually by lcerrato (#71).

GitHub offers (for free, to public repos such as this one) the ability to run a suite of tests based on certain trigger criteria, reducing the burden on repo maintainers.

If this seems useful, I can send a PR to automatically check that the lat.ls.perseus-eng2.xml file can be parsed as valid XML before any pull request can be merged. This check is non-binding: repo admins would still have the discretion to merge a PR that failed the checks if desired.
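The check itself could be as small as the sketch below, which tests well-formedness with the standard library parser. The function names `well_formedness_error` and `check_files` are hypothetical, and how the script would be wired into an automated workflow is left open. Note that references to undeclared entities would also be reported as errors by this parser.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(data):
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


def check_files(paths):
    """Map each file that fails to parse to its error message."""
    failures = {}
    for path in paths:
        with open(path, "rb") as handle:
            error = well_formedness_error(handle.read())
        if error is not None:
            failures[path] = error
    return failures


print(well_formedness_error("<sense><bibl></bibl></sense>"))
print(well_formedness_error("<sense><bibl></sense>") is not None)
```

A test runner would call `check_files` with the XML paths and fail the run when the returned dictionary is not empty.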
