dhlab-basel / alephmarcreader
Python Library to read Marc obtained from Aleph
License: GNU Affero General Public License v3.0
Priority A (closed by #14):
Priority B:
Priority C:
Authors and recipients can be persons or institutions.
our get_author() method (alephmarcreader/alephmarcreader/abstractalephmarcreader.py, lines 254 to 271 in 364839f)
get_recipient() respectively)
So basically, wherever we look for functions in field 600 or 700, we also need to look in 610 and 710.
This should be a fairly easy fix, and won't require any changes to the API or anything. So that could become the hotfix v1.0.1...
@tobiasschweizer Shall I take care of it?
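A minimal sketch of the proposed fix, not the library's actual code. It assumes that "function" means a relator value stored in a subfield (hypothetically subfield 4 here), and it models fields as plain (tag, subfields) tuples rather than the reader's real objects. The names PERSON_TAGS, INSTITUTION_TAGS and find_by_function, as well as the sample values, are made up for illustration.

```python
# Person entries (600, 700) and their institution counterparts (610, 710).
PERSON_TAGS = ("600", "700")
INSTITUTION_TAGS = ("610", "710")


def find_by_function(fields, function, function_code="4"):
    """Return (tag, name) pairs whose function subfield equals `function`.

    `fields` is an iterable of (tag, subfields) tuples, where `subfields`
    is a dict mapping a subfield code to its text. This data model is an
    assumption made for the sketch.
    """
    wanted_tags = PERSON_TAGS + INSTITUTION_TAGS
    return [
        (tag, subfields.get("a"))
        for tag, subfields in fields
        if tag in wanted_tags and subfields.get(function_code) == function
    ]


# Made-up sample data; "aut" and "rcp" stand in for author and recipient.
sample = [
    ("700", {"a": "Example Person A", "4": "aut"}),
    ("710", {"a": "Example Institution", "4": "rcp"}),
    ("700", {"a": "Example Person B", "4": "rcp"}),
]

recipients = find_by_function(sample, "rcp")
```

Because persons and institutions share one lookup, a recipient stored in 710 is found the same way as one stored in 700, and the public API would not need to change.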
We could provide an implementation for MarcXML, derived from an abstract base class:
import abc

# compatible with Python 2 *and* 3:
ABC = abc.ABCMeta('ABC', (object,), {'__slots__': ()})


class AbstractMarcReader(ABC):

    @abc.abstractmethod
    def __init__(self, file_path):
        pass

    @abc.abstractmethod
    def do_something(self):
        # shared behaviour that subclasses can reuse via super()
        print("Some implementation!")


class Marc21Reader(AbstractMarcReader):

    def __init__(self, file_path):
        super(Marc21Reader, self).__init__(file_path)

    def do_something(self):
        super(Marc21Reader, self).do_something()
        print("The enrichment from Marc21Reader")


class MarcXMLReader(AbstractMarcReader):

    def __init__(self, file_path):
        super(MarcXMLReader, self).__init__(file_path)

    def do_something(self):
        super(MarcXMLReader, self).do_something()
        print("The enrichment from MarcXMLReader")


x = Marc21Reader('test')
x.do_something()

y = MarcXMLReader('test')
y.do_something()
see https://stackoverflow.com/questions/35673474/using-abc-abcmeta-in-a-way-it-is-compatible-both-with-python-2-7-and-python-3-5 and https://www.python-course.eu/python3_abstract_classes.php
When an unexpected cardinality occurs (see issue #15), this should best be logged right away (as I attempt in pull request #21). The problem here is that it's hard to print a reasonable warning message within _handle_subfields_cardinality_max_one(), because there we only have an array of Subfields (which we expect to be max. 1 long), but no information on what field we're in or what the subfield identifier is.
I see two options:
_handle_subfields_cardinality_max_one() was called.
_handle_subfields_cardinality_max_one(), so we have all necessary information within this method.
@tobiasschweizer What say you? Which option would you prefer?
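One way to make the warning meaningful is to give the handler the context it lacks. The sketch below assumes the field tag and subfield identifier are passed in as extra arguments. The function handle_max_one, its parameters, its return value and the logger name are hypothetical stand-ins, not the library's real _handle_subfields_cardinality_max_one().

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("alephmarcreader_sketch")


def handle_max_one(subfields, field_tag, subfield_code):
    """Return the first subfield, or None if there is none.

    Logs a warning that names the field and subfield whenever more than
    one value is present.
    """
    if len(subfields) > 1:
        logger.warning(
            "Field %s, subfield %s: expected at most 1 value, found %d",
            field_tag, subfield_code, len(subfields),
        )
    return subfields[0] if subfields else None
```

With the tag and code available inside the handler, every caller produces a consistent message without repeating the logging logic.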
get_subfields in Marc21
get_subfield_texts
So far, get_date() returns field 046, subfield c, which can either be
Subfield e, which is the end of a time span, is never looked at.
See my excel sheet for more details.
(There I'm also suggesting to remodel non-standard dates to trivial time spans, e.g. 1739 as 01.01.1739-01.01.1740 etc.)
Generally, if we have time spans, we'd probably also need a method get_single_date(), for example to generate letter titles. Or am I mistaken here?
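The remodelling of incomplete dates could look roughly like the following sketch. Only the year case (1739 as 01.01.1739-01.01.1740) comes from the suggestion above. The handling of YYYY.MM and YYYY.MM.DD values, the exclusive end date, and the names to_time_span and format_span are assumptions made for illustration.

```python
from datetime import date, timedelta


def to_time_span(raw):
    """Turn 'YYYY', 'YYYY.MM' or 'YYYY.MM.DD' into a (start, end) pair.

    The end date is exclusive, following the example
    1739 -> 01.01.1739-01.01.1740.
    """
    parts = [int(p) for p in raw.strip().split(".")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year + 1, 1, 1)
    if len(parts) == 2:
        year, month = parts
        start = date(year, month, 1)
        # December rolls over into January of the following year.
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return start, end
    if len(parts) == 3:
        start = date(*parts)
        return start, start + timedelta(days=1)
    raise ValueError("unsupported date format: %r" % raw)


def format_span(span):
    """Format a (start, end) pair as DD.MM.YYYY-DD.MM.YYYY."""
    return "-".join(
        "{:02d}.{:02d}.{:04d}".format(d.day, d.month, d.year) for d in span
    )
```

A get_single_date() style helper could then simply return the start of the span, which would be enough for generating letter titles.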
The whole point of this field is giving the place/date information as it is written on the letter.
In many cases, though, we have things like $a[Basel],$c[1743.06.22].
Having this editorial information in this field makes the information a bit futile...
@tobiasschweizer We could ignore all fields that start with [ to solve that. Should I?
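Ignoring bracketed values could be as small as the sketch below. It assumes the subfield texts are already available as plain strings; is_editorial and drop_editorial are hypothetical helper names.

```python
def is_editorial(value):
    """True if the value starts with '[', i.e. was added by an editor."""
    return value.strip().startswith("[")


def drop_editorial(values):
    """Keep only values that appear as written on the letter."""
    return [v for v in values if not is_editorial(v)]


kept = drop_editorial(["[Basel]", "[1743.06.22]", "ce 12 mars 1734"])
```

Note that this only checks the first character, so a value that merely contains a bracketed insertion later in the text would still be kept.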
If the reader encounters an unexpected cardinality of a field or subfield, print out a message.
This is the MARCXML representation of https://github.com/dhlab-basel/alephmarcreader/blob/master/alephmarcreader/tests/sample_data/000055275.marc:
<?xml version="1.0" ?>
<!DOCTYPE dublin-core-simple [
<!-- Dublin Core Version 1.1 -->
<!-- Based on CIMI Guide to Best Practice 1999-08-12 -->
<!ELEMENT record-list (dc-record*)>
<!ELEMENT dc-record (title | creator | subject | description | publisher | contributor
| date | type | format | identifier | source | language | relation | coverage | rights)*>
<!ELEMENT title (#PCDATA) >
<!ELEMENT creator (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT publisher (#PCDATA) >
<!ELEMENT contributor (#PCDATA) >
<!ELEMENT date (#PCDATA) >
<!ELEMENT type (#PCDATA) >
<!ELEMENT format (#PCDATA) >
<!ELEMENT identifier (#PCDATA) >
<!ELEMENT source (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT relation (#PCDATA) >
<!ELEMENT coverage (#PCDATA) >
<!ELEMENT rights (#PCDATA) >
]>
<record-list>
  <dc-record>
    <type>text</type>
    <language>fre</language>
    <subject>536 LIDOS Bernoulli-Edition</subject>
    <contributor>Bernoulli, Daniel, 1700-1782</contributor>
    <title>Brief an [Johannes] Scheuchzer /</title>
    <date>ce 12 mars 1734</date>
    <format>3 S. ;</format>
    <description>Die Briefhandschrift findet sich in einem Züricher Briefband mit der Aufschrift "Epistolae Helvetorum ad J. J. Scheuchzer". Johann Jakob Scheuchzer starb jedoch am 23.6.1733. Daniel Bernoulli hatte von dessen Tod spätestens im August 1733 auf der Rückreise von St. Petersburg erfahren (s. Brief von Johann II Bernoulli an Leonhard Euler von 1733.08.21). Der in Zürich lebende Adressat dieses Briefes kann also nicht Johann Jakob Scheuchzer sein. Der Adressat ist daher höchst wahrscheinlich dessen Bruder Johannes Scheuchzer.</description>
    <language>Französisch</language>
    <subject>Bernoulli, Johann 1667-1748</subject>
    <subject>Boerhaave, Herman 1668-1738</subject>
    <subject>L'Isle, Joseph Nicolas de 1688-1768</subject>
    <subject>Fahrenheit, Daniel Gabriel 1686-1736</subject>
    <subject>Respinger, Johann Heinrich 1709-1782</subject>
    <subject>Réaumur, René Antoine Ferchault de 1683-1757</subject>
    <contributor>Scheuchzer, Johann, 1684-1738</contributor>
    <relation>Bernoulli, Daniel I (1700-1782)</relation>
  </dc-record>
</record-list>
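For experimenting with a record like the one above, the standard library's xml.etree.ElementTree is enough. The sketch below works on a shortened copy of the record and uses the element names as they appear in the sample (record-list, dc-record). The function read_records and the dict-of-lists result shape are assumptions for illustration, not part of alephmarcreader.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the record shown above.
SAMPLE = """<record-list>
<dc-record>
<type>text</type>
<language>fre</language>
<contributor>Bernoulli, Daniel, 1700-1782</contributor>
<title>Brief an [Johannes] Scheuchzer /</title>
<date>ce 12 mars 1734</date>
<contributor>Scheuchzer, Johann, 1684-1738</contributor>
</dc-record>
</record-list>"""


def read_records(xml_text):
    """Return one dict per dc-record, mapping element name to a list of texts."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("dc-record"):
        values = defaultdict(list)
        for child in record:
            # Elements such as contributor can repeat, so collect lists.
            values[child.tag].append(child.text)
        records.append(dict(values))
    return records


records = read_records(SAMPLE)
```

Collecting repeated elements into lists keeps both contributors, which matters here because author and recipient are not distinguished by element name in this representation.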
Calling this "original date" (as it is called in BIBB) is somewhat misleading, as it can be place only.
@tobiasschweizer Should we rename this?
Is it intentional that in get_shelfmark() (Field 852) we only use two of the four expected Subfields? (See also my excel import list.)