ld4l-labs / bib2lod
Converts bibliographic records to Linked Open Data
Validation currently returns a boolean for valid/invalid. It should instead return an error message describing the specific error in the record, so that it can be logged, or an empty string if the record is valid.
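A minimal sketch of the proposed contract, written in Python to match the test code quoted elsewhere on this page (bib2lod itself is Java). The function name `validate_record`, the dict-based record, and the two checks are assumptions for illustration, not the project's actual API:

```python
def validate_record(record):
    """Return an error message for an invalid record, or '' if it is valid.

    `record` is a hypothetical dict mapping MARC field tags to values.
    """
    # Assumed rule: every record needs an identifier in field 001.
    if not record.get("001"):
        return "Record is missing an identifier (001)."
    # Assumed rule: at least one title field must be present.
    if not any(record.get(tag) for tag in ("130", "240", "245")):
        return "Record %s has no title field (130, 240, or 245)." % record["001"]
    return ""
```

A caller could then log the message when it is non-empty and skip the record, rather than only knowing that validation failed.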
Check out revision 293b0ec. This version does not build a MainTitleElement, but the functional tests pass. Expected and actual output files attached (with txt extensions added).
actual.nt.txt
expected.ttl.txt
Remove getters and setters and store everything in a map. Individual objects will use the parts of it they need and do the relevant validation, throwing exceptions. Move all validation out of the Configuration object.
Create from minimal record: Instance and Title, Work and Title, Local Identifier, Item and Title (Is there always an item associated with a record?).
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal MARC record. Cornell ILS requires only that a record have an
identifier (001), a LDR (which can have everything marked ‘no attempt to code’),
a 008 (ditto), and a title (130, 240, or 245).
-->
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01050cam a22003011 4500</leader>
    <controlfield tag="001">102063</controlfield>
    <controlfield tag="008">860506s1957 nyua b 000 0 eng </controlfield>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Clinical cardiopulmonary physiology.</subfield>
      <subfield code="c">Sponsored by the American College of Chest Physicians. Editorial board: Burgess L. Gordon, chairman, editor-in-chief, Albert H. Andrews [and others]</subfield>
    </datafield>
  </record>
</collection>
Currently outputs only ntriples.
See if this facilitates conversion.
Also try inferencing vs non-inferencing model.
Reduces confusion and complexity
Second test fails:
def test02_cornell_ld4l_conversion(self):
    """Test Cornell LD4L conversions based on sample configuration file."""
    indirs = 'sample-data/marcxml-to-ld4l/cornell'
    outdir = self.tmpdir
    for indir in glob.glob(os.path.join(indirs, '*')):
        # FIXME - should look for *.xml in each dir and then build tests on that
        src = os.path.join(indir, '102063.min.xml')
        ref = os.path.join(indir, '102063.min.ttl')
        dst = os.path.join(outdir, '102063.min.ttl')
        config = example_config()
        config['InputService']['source'] = src
        config['OutputService']['destination'] = outdir
        config_filepath = self.write_config(config)
        out = run_bib2lod([config_filepath])
        self.assertTrue(os.path.exists(dst))
        self.assertEqual(RDiffB(['data.ld4l.org/cornell']).compare_files([ref, dst]), 0)
E AssertionError: 20 != 0
tests_functional/test_bib2lod.py:64: AssertionError
Note that I modified the code in my directory to expect turtle output, since I found input/output comparisons easier. Change output format in the config to TURTLE.
I've examined the input and output and can't find any differences.
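The FIXME in the test above could be addressed along these lines. This is only a sketch: the helper name `find_conversion_cases` is hypothetical, and it assumes each reference file sits next to its source with the same base name and a `.ttl` extension, as `102063.min.xml` and `102063.min.ttl` do.

```python
import glob
import os


def find_conversion_cases(indirs):
    """Yield (src, ref) pairs: each *.xml file and the *.ttl file beside it."""
    for indir in sorted(glob.glob(os.path.join(indirs, '*'))):
        if not os.path.isdir(indir):
            continue
        for src in sorted(glob.glob(os.path.join(indir, '*.xml'))):
            # '102063.min.xml' -> '102063.min.ttl'
            ref = os.path.splitext(src)[0] + '.ttl'
            # Skip sources that have no reference output to compare against.
            if os.path.exists(ref):
                yield src, ref
```

The test loop could then iterate over `find_conversion_cases(indirs)` instead of hard-coding the `102063.min` file names.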
Use MapOfLists to define children, attributes, and externals; then use MapOfLists methods. Define any new methods in MapOfLists as required by Entity.
Use FileNameExtensionFilter, and write another filter for readability. Or write a custom filter that applies both.
See BaseConfiguration.buildInputReadersFromSource: source.listFiles()
Create hierarchy of interface/abstract class/concrete class.
It seems that the current output for the minimal record has both a Work and an Instance but they are not connected. I assume there should be a bf:instanceOf or bf:hasInstance triple.
This allows extension of Record classes to accommodate local fields, etc.
When no file logger is defined, we get logging to console. When it is defined, there is no logging to console.
Add an inferencing module to generate inferences from the model before outputting. Start with inverse inferencing.
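As a rough illustration of inverse inferencing, here is a sketch in Python (bib2lod itself is Java). Triples are modeled as plain tuples of prefixed names rather than a real RDF model, and the `INVERSES` table and function name are assumptions; the one property pair shown is the Work/Instance link discussed in another issue on this page.

```python
# Hypothetical table of inverse property pairs.
INVERSES = {
    "bf:instanceOf": "bf:hasInstance",
    "bf:hasInstance": "bf:instanceOf",
}


def add_inverse_triples(triples, inverses=INVERSES):
    """Return the input triples plus the inverse of every triple whose
    predicate has a declared inverse.

    Assumes the object of such a triple is a resource, not a literal.
    """
    result = set(triples)
    for subject, predicate, obj in triples:
        inverse = inverses.get(predicate)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result
```

Running this step just before output would leave the converters free to assert only one direction of each relationship.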
Can throw an existing exception or a custom one
The configuration needs to test for validity, since the configuration can come from a source other than a json config file. So the tests in JsonUtils are redundant. We can probably then eliminate the class altogether, since all we need to do is get the values from the config.
Useful for testing, perhaps other uses.
Status, Origin, etc.
Then remove test files. See http://junit.org/junit4/javadoc/latest/org/junit/rules/TemporaryFolder.html#TemporaryFolder()
Not sure about this, but worth considering. Converter, Cleaner, Parser, InputService, OutputService, EntityBuilders, and EntityBuilder are all instantiated only once.
This has mostly been implemented, but there are a few exceptions.
Don't hard-code in manager
Controller controls entire process - read, clean, parse, convert, write - whereas a Converter performs one specific conversion (e.g., marcxml to ld4l)
Conversion data doesn't have to reside on a file system but can be streamed to the converter.
Useful for testing, perhaps other uses.
Currently the element "tag" attribute can only accommodate integer values. There are some institution-specific custom values that are alphanumeric and also need to be handled.
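One possible approach is to keep the tag as a string and validate its shape instead of parsing it as an integer. The sketch below is in Python (bib2lod itself is Java); the name `parse_tag` and the rule that a tag is exactly three ASCII letters or digits are assumptions that should be checked against the tags actually in use.

```python
import re

# Assumption: a tag is exactly three ASCII letters or digits.
TAG_PATTERN = re.compile(r"[0-9A-Za-z]{3}")


def parse_tag(value):
    """Return the tag as a string if it is well formed, else raise ValueError."""
    if not isinstance(value, str) or not TAG_PATTERN.fullmatch(value):
        raise ValueError("Invalid field tag: %r" % (value,))
    return value
```

Keeping the tag as a string also preserves leading zeros, so `"001"` stays distinct from `"1"`.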
The three-letter codes are not identical. Using LC URIs for now, but since Lexvo has been adopted in the ontology recommendation, will use a mapping file to replace with the Lexvo URIs.
Mappings available here: http://www.lexvo.org/linkeddata/resources.html
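The replacement step might look roughly like the following Python sketch (bib2lod itself is Java). The two URI prefixes reflect my understanding of the LC and Lexvo URI patterns and should be verified; the in-code table stands in for the mapping file and holds only a few illustrative entries.

```python
# Assumed URI patterns; verify against the LC and Lexvo documentation.
LC_PREFIX = "http://id.loc.gov/vocabulary/languages/"
LEXVO_PREFIX = "http://lexvo.org/id/iso639-3/"

# Stand-in for the mapping file: LC three-letter code -> Lexvo code.
LC_TO_LEXVO_CODE = {
    "eng": "eng",
    "fre": "fra",
    "ger": "deu",
}


def lexvo_uri(lc_uri, code_map=LC_TO_LEXVO_CODE):
    """Return the Lexvo URI for an LC language URI.

    URIs that are not LC language URIs, or whose code has no mapping,
    are returned unchanged.
    """
    if not lc_uri.startswith(LC_PREFIX):
        return lc_uri
    code = lc_uri[len(LC_PREFIX):]
    if code not in code_map:
        return lc_uri
    return LEXVO_PREFIX + code_map[code]
```

Returning unmapped URIs unchanged is a design choice for the sketch: because the code sets differ, guessing a Lexvo URI from an unmapped LC code could produce a wrong identifier.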