alephmarcreader's People

Contributors

balduinlandolt, tobiasschweizer

alephmarcreader's Issues

New fields

Priority A (closed by #14):

  • 024: e-manuscripta (instead of 856)
  • 610 / 710: Corporate bodies
  • 300: Physical description
  • 500: Note
  • 520: Summary
  • 041: Language (standardized)

Priority B:

  • 250: Stage of creation
  • 264: original date (as string)
  • 525: Accompanying material

Priority C:

  • 581: Literature

Missing Institutions in Author and Recipient

Authors and recipients can be persons or institutions.

Our get_author() method

def get_author(self):
    """
    Returns information about the author.
    :return: [Person]
    """
    author = []
    for field in self.__get_field('100'):
        author.append(self._get_person_info(field, '100'))
    # check for recipients (700) that are actually authors
    for field in self.__get_field('700'):
        person = self._get_person_info(field, '700')
        if "aut" in person.roles:
            author.append(person)
    return author
get_author.__annotations__ = {'return': [Person]}

(and likewise get_recipient()) only check persons, not institutions.

So wherever we look for roles in fields 600 or 700, we also need to look in 610 and 710.

This should be a fairly easy fix and won't require any changes to the API, so it could become hotfix v1.0.1.
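A minimal, self-contained sketch of the idea follows. It assumes a simplified field model: `Field`, `Agent`, `agent_info()` and the module-level `get_author()` are hypothetical stand-ins and do not correspond to the reader's real `_get_person_info()` or `Person` class. It also assumes the name lives in subfield $a and the relator code (e.g. "aut", "rcp") in $4. The key change is that added entries are read from 710 as well as 700.

```python
from collections import namedtuple

# Hypothetical, simplified stand-ins for the reader's own classes.
Field = namedtuple("Field", ["tag", "subfields"])        # subfields: list of (code, value)
Agent = namedtuple("Agent", ["name", "roles", "kind"])  # kind: "person" or "institution"

INSTITUTION_TAGS = ("610", "710")


def agent_info(field):
    """Build an Agent from $a (name) and $4 (relator codes); assumed subfield layout."""
    name = " ".join(value for code, value in field.subfields if code == "a")
    roles = [value for code, value in field.subfields if code == "4"]
    kind = "institution" if field.tag in INSTITUTION_TAGS else "person"
    return Agent(name, roles, kind)


def get_author(fields):
    """Authors from 100, plus added entries for persons (700) *and* institutions (710) with role 'aut'."""
    authors = [agent_info(f) for f in fields if f.tag == "100"]
    for f in fields:
        if f.tag in ("700", "710"):
            agent = agent_info(f)
            if "aut" in agent.roles:
                authors.append(agent)
    return authors


record = [
    Field("100", [("a", "Bernoulli, Daniel"), ("4", "aut")]),
    Field("700", [("a", "Scheuchzer, Johann"), ("4", "rcp")]),
    Field("710", [("a", "Example Society"), ("4", "aut")]),  # hypothetical institution
]
print([(a.name, a.kind) for a in get_author(record)])
# [('Bernoulli, Daniel', 'person'), ('Example Society', 'institution')]
```

get_recipient() would get the same treatment, checking 710 alongside 700.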

@tobiasschweizer Shall I take care of it?

Provide implementation for MarcXML

We could provide an implementation for MarcXML, derived from an abstract base class:

import abc

# compatible with Python 2 *and* 3:
# build the abstract base class by calling the metaclass directly
ABC = abc.ABCMeta('ABC', (object,), {'__slots__': ()})

class AbstractMarcReader(ABC):
    """Common interface shared by all MARC readers (MARC21, MarcXML, ...)."""

    @abc.abstractmethod
    def __init__(self, file_path):
        pass

    @abc.abstractmethod
    def do_something(self):
        # an abstract method may still provide a shared default implementation,
        # which subclasses can reuse via super()
        print("Some implementation!")

class Marc21Reader(AbstractMarcReader):
    def __init__(self, file_path):
        super(Marc21Reader, self).__init__(file_path)

    def do_something(self):
        super(Marc21Reader, self).do_something()
        print("The enrichment from Marc21Reader")

class MarcXMLReader(AbstractMarcReader):
    def __init__(self, file_path):
        super(MarcXMLReader, self).__init__(file_path)

    def do_something(self):
        super(MarcXMLReader, self).do_something()
        print("The enrichment from MarcXMLReader")

# both subclasses override every abstract method, so they can be instantiated;
# AbstractMarcReader('test') itself would raise a TypeError
x = Marc21Reader('test')
x.do_something()

y = MarcXMLReader('test')
y.do_something()

see https://stackoverflow.com/questions/35673474/using-abc-abcmeta-in-a-way-it-is-compatible-both-with-python-2-7-and-python-3-5 and https://www.python-course.eu/python3_abstract_classes.php
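Below is a sketch of the field lookup a MarcXMLReader might be built around. It uses only the standard library's `xml.etree.ElementTree` and the MARCXML namespace `http://www.loc.gov/MARC21/slim`. The function name `read_datafields()` and its return shape ({subfield code: [values]} per datafield) are assumptions chosen for illustration. The sample record is hand-written and is not taken from the actual test data.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def read_datafields(xml_string, tag):
    """Return one {subfield code: [values]} dict per datafield with the given tag."""
    root = ET.fromstring(xml_string)
    fields = []
    for datafield in root.iter("{%s}datafield" % MARC_NS):
        if datafield.get("tag") != tag:
            continue
        subfields = {}
        # subfields may repeat, so collect every value per code
        for subfield in datafield.findall("{%s}subfield" % MARC_NS):
            subfields.setdefault(subfield.get("code"), []).append(subfield.text or "")
        fields.append(subfields)
    return fields


# Hand-written sample, not the real record.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Bernoulli, Daniel</subfield>
    <subfield code="d">1700-1782</subfield>
  </datafield>
</record>"""

print(read_datafields(SAMPLE, "100"))
# [{'a': ['Bernoulli, Daniel'], 'd': ['1700-1782']}]
```

If both readers expose the same "fields by tag" lookup, the getters such as get_author() could presumably live once in the abstract base class.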

Logging unexpected cardinality: no field information in subfield handling

When an unexpected cardinality occurs (see issue #15), it should ideally be logged right away (as I attempt in pull request #21). The problem is that it is hard to print a meaningful warning message within _handle_subfields_cardinality_max_one(). There we only have an array of subfields, which we expect to be at most one element long, but no information about which field we are in or what the subfield identifier is.

I see two options:

  • Throwing an exception, which we handle wherever _handle_subfields_cardinality_max_one() was called.
  • Passing the field number and subfield identifier to _handle_subfields_cardinality_max_one(), so that all the necessary information is available within the method.
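Both options can be sketched side by side. The function names below are hypothetical variants, not the reader's actual _handle_subfields_cardinality_max_one(). The subfields are modelled as a plain list of values. On a cardinality violation, both variants fall back to the first value, which is an assumption about the desired behaviour.

```python
import logging

logger = logging.getLogger(__name__)


class UnexpectedCardinalityError(ValueError):
    """Hypothetical exception type for option 1."""


# Option 1: the helper raises; the caller, which knows the field, logs.
def handle_max_one_raising(subfields):
    if len(subfields) > 1:
        raise UnexpectedCardinalityError(
            "expected at most one subfield, got %d" % len(subfields))
    return subfields[0] if subfields else None


def get_subfield_value(subfields, field_tag, code):
    """Hypothetical caller for option 1."""
    try:
        return handle_max_one_raising(subfields)
    except UnexpectedCardinalityError as err:
        logger.warning("field %s, subfield %s: %s", field_tag, code, err)
        return subfields[0]  # assumption: fall back to the first value


# Option 2: the helper receives tag and code and logs by itself.
def handle_max_one_logging(subfields, field_tag, code):
    if len(subfields) > 1:
        logger.warning(
            "field %s, subfield %s: expected at most one value, got %d; using the first",
            field_tag, code, len(subfields))
    return subfields[0] if subfields else None
```

Option 1 keeps the helper free of logging concerns but needs a try/except at every call site. Option 2 keeps the callers simple at the cost of a longer signature.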

@tobiasschweizer What do you think? Which option would you prefer?

Date: End of time span missing

So far, get_date() returns field 046, subfield c, which can be either a

  • single date
  • start of time span

Subfield e, which holds the end of the time span, is never read.

See my Excel sheet for more details.
(There I also suggest remodelling non-standard dates as trivial time spans, e.g. 1739 as 01.01.1739-01.01.1740.)

Generally, if we have time spans, we'd probably also need a method get_single_date(), for example to generate letter titles. Or am I mistaken here?
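Here is a sketch of reading both 046 subfields into a span, including the suggested remodelling of imprecise dates as trivial spans. It assumes the values come as dotted strings ('YYYY', 'YYYY.MM' or 'YYYY.MM.DD'), like the [1743.06.22] example in 264. The real 046 format in the data may differ. `get_date_span()` and `get_single_date()` are hypothetical names.

```python
from datetime import date


def _parse_span(value):
    """Turn 'YYYY', 'YYYY.MM' or 'YYYY.MM.DD' into a (start, end) pair of dates.

    Imprecise dates become trivial spans, e.g. '1739' -> 1739-01-01 .. 1740-01-01;
    a full date becomes a zero-length span.
    """
    parts = [int(p) for p in value.split(".")]
    if len(parts) == 1:
        year = parts[0]
        return date(year, 1, 1), date(year + 1, 1, 1)
    if len(parts) == 2:
        year, month = parts
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        return date(year, month, 1), end
    day = date(*parts)
    return day, day


def get_date_span(subfield_c, subfield_e=None):
    """046 $c is a single date or the start of a span; $e, if present, is its end."""
    start, end = _parse_span(subfield_c)
    if subfield_e:
        end = _parse_span(subfield_e)[1]
    return start, end


def get_single_date(subfield_c, subfield_e=None):
    """Hypothetical helper for letter titles: the start date of the span."""
    return get_date_span(subfield_c, subfield_e)[0]
```

Using the start of the span as the single date is only one possible policy. For letter titles, a span might instead need to be rendered as text.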

Remove conjectured information from get_original_date()

The whole point of this field is to give the place/date information as it is written on the letter.
In many cases, though, we have things like $a[Basel],$c[1743.06.22].

Having such editorial additions in this field makes the information rather pointless.

@tobiasschweizer We could ignore all subfield values that start with [ to solve that. Should I?
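The proposed filter is small enough to sketch directly. This assumes the subfields arrive as (code, value) pairs. `strip_conjectures()` is a hypothetical helper name, not part of the current API.

```python
def strip_conjectures(subfields):
    """Drop subfield values that start with '[' (editorial additions).

    `subfields` is assumed to be a list of (code, value) pairs.
    """
    return [(code, value) for code, value in subfields
            if not value.lstrip().startswith("[")]


print(strip_conjectures([("a", "[Basel]"), ("c", "[1743.06.22]")]))  # []
print(strip_conjectures([("a", "Basel"), ("c", "ce 12 mars 1734")]))
```

A value like [Basel] is dropped entirely. A value with a bracketed part later in the string is kept unchanged, which may or may not be what we want.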

MARCXML

This is an XML representation of https://github.com/dhlab-basel/alephmarcreader/blob/master/alephmarcreader/tests/sample_data/000055275.marc (note that it is a simple Dublin Core export, not MARCXML proper):

<?xml version="1.0" ?>
<!DOCTYPE dublin-core-simple [
<!-- Dublin Core Version 1.1 -->
<!-- Based on CIMI Guide to Best Practice 1999-08-12 -->
<!ELEMENT record-list (dc-record*)>
<!ELEMENT dc-record (title | creator | subject | description | publisher | contributor
| date | type | format | identifier | source | language | relation | coverage | rights)*>
<!ELEMENT title (#PCDATA) >
<!ELEMENT creator (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT publisher (#PCDATA) >
<!ELEMENT contributor (#PCDATA) >
<!ELEMENT date (#PCDATA) >
<!ELEMENT type (#PCDATA) >
<!ELEMENT format (#PCDATA) >
<!ELEMENT identifier (#PCDATA) >
<!ELEMENT source (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT relation (#PCDATA) >
<!ELEMENT coverage (#PCDATA) >
<!ELEMENT rights (#PCDATA) >
]>
<record-list>
<dc-record>
<type>text</type>
<language>fre</language>
<subject>536 LIDOS Bernoulli-Edition</subject>
<contributor>Bernoulli, Daniel, 1700-1782</contributor>
<title>Brief an [Johannes] Scheuchzer /</title>
<date>ce 12 mars 1734</date>
<format>3 S. ;</format>
<description>Die Briefhandschrift findet sich in einem Züricher Briefband mit der Aufschrift &quot;Epistolae Helvetorum ad J. J. Scheuchzer&quot;. Johann Jakob Scheuchzer starb jedoch am 23.6.1733. Daniel Bernoulli hatte von dessen Tod spätestens im August 1733 auf der Rückreise von St. Petersburg erfahren (s. Brief von Johann II Bernoulli an Leonhard Euler von 1733.08.21). Der in Zürich lebende Adressat dieses Briefes kann also nicht Johann Jakob Scheuchzer sein. Der Adressat ist daher höchst wahrscheinlich dessen Bruder Johannes Scheuchzer.</description>
<language>Französisch</language>
<subject>Bernoulli, Johann 1667-1748</subject>
<subject>Boerhaave, Herman 1668-1738</subject>
<subject>L&apos;Isle, Joseph Nicolas de 1688-1768</subject>
<subject>Fahrenheit, Daniel Gabriel 1686-1736</subject>
<subject>Respinger, Johann Heinrich 1709-1782</subject>
<subject>Réaumur, René Antoine Ferchault de 1683-1757</subject>
<contributor>Scheuchzer, Johann, 1684-1738</contributor>
<relation>Bernoulli, Daniel I (1700-1782)</relation>
</dc-record>
</record-list>
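Here is a sketch of reading such a Dublin Core record with the standard library. Several elements repeat (language, subject, contributor), so they have to be collected as lists. Also, the letter's author and its addressee both appear as plain <contributor> elements without any role. `read_dc_record()` is a hypothetical helper. The sample is an abridged, hand-copied excerpt of the record above.

```python
import xml.etree.ElementTree as ET


def read_dc_record(xml_string):
    """Collect the children of the first dc-record into {element name: [texts]}."""
    root = ET.fromstring(xml_string)
    record = root.find("dc-record")
    values = {}
    for child in record:
        values.setdefault(child.tag, []).append((child.text or "").strip())
    return values


# Abridged excerpt of the record above (DOCTYPE omitted).
SAMPLE = """<record-list><dc-record>
<language>fre</language>
<contributor>Bernoulli, Daniel, 1700-1782</contributor>
<language>Französisch</language>
<contributor>Scheuchzer, Johann, 1684-1738</contributor>
</dc-record></record-list>"""

record = read_dc_record(SAMPLE)
print(record["contributor"])
# ['Bernoulli, Daniel, 1700-1782', 'Scheuchzer, Johann, 1684-1738']
```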

Missing Subfields in Shelfmark

Is it intentional that get_shelfmark() (field 852) only uses two of the four expected subfields? (See also my Excel import list.)