
xtf's Introduction

xtf

The eXtensible Text Framework (XTF) is a powerful open source platform for providing access to digital content. Developed and maintained by the California Digital Library (CDL), XTF functions as the primary access technology for the CDL's digital collections and other digital projects worldwide.

Much more information is available here: http://xtf.cdlib.org

We love pull requests!

xtf's People

Contributors

billthegoat, conal-tuohy, dericed, helrond, martinhaye, tingletech

xtf's Issues

xtf namespace prefix missing in ;raw=1 display

There's been a longstanding and annoying issue that viewing XML files with ";raw=1" yields an error in the browser due to missing xtf namespace prefix. I recently noticed this:

/** Determines whether this document is using namespaces. Not sure why
  *  this works when false, but it does.
  */
protected boolean usesNamespaces = false;

https://github.com/cdlib/xtf/blob/master/WEB-INF/src/org/cdlib/xtf/lazyTree/LazyDocument.java#L61-L64

It turns out that changing the above to true causes raw output to include those namespace prefixes and work properly.

(I'm doing further testing before submitting a pull request, to see whether this change has any other side effects.)
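As a rough illustration of why a browser would reject such output, here is a minimal sketch using only the JDK's built-in parser. It assumes the browser error is the usual "prefix is not bound" complaint, which the issue does not state explicitly. The class name UnboundPrefixDemo, the sample documents, and the namespace URI shown are illustrative and not taken from the XTF source.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class UnboundPrefixDemo {

    /** Returns true if the XML text parses with the given namespace-awareness setting. */
    public static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Throw on errors instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not acceptable to this parser configuration
        } catch (Exception e) {
            throw new RuntimeException(e); // configuration or I/O problem, not an XML problem
        }
    }

    public static void main(String[] args) {
        // Hypothetical raw output: the xtf prefix is used but never declared.
        String unbound = "<doc xtf:meta=\"yes\"/>";
        // The same element with the prefix declared (URI shown for illustration only).
        String bound = "<doc xmlns:xtf=\"http://cdlib.org/xtf\" xtf:meta=\"yes\"/>";

        System.out.println("unbound, namespace-aware:   " + parses(unbound, true));
        System.out.println("unbound, namespace-unaware: " + parses(unbound, false));
        System.out.println("bound, namespace-aware:     " + parses(bound, true));
    }
}
```

A namespace-aware consumer such as a browser rejects the first document, while a namespace-unaware parser accepts it. That may explain how the problem can go unnoticed on the server side.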

&s in xtf:store attributes get garbled

I'm trying to store two URLs that must always be used together (an image URL and a rights statement) in one xtf metadata field. An attribute value saved with &amp; comes back from XTF as a bare &, which is not well-formed XML.

<facet-wikithumb xtf:meta="yes" xtf:store="yes" id="Philip_Henry_Delamotte"
     thumb="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg/150px-Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg"
     rights="http://en.wikipedia.org/wiki/File:Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg">true</facet-wikithumb>
BT-macBookPro:xtf-cpf tingle$ ./bin/indexDump -index default -field facet-wikithumb | grep south_tower_from_Water_Temple
<$ id="Philip_Henry_Delamotte" thumb="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Crystal_Palace_South_transept_&_south_tower_from_Water_Temple.jpg/150px-Crystal_Palace_South_transept_&_south_tower_from_Water_Temple.jpg" rights="http://en.wikipedia.org/wiki/File:Crystal_Palace_South_transept_&_south_tower_from_Water_Temple.jpg">true|

As a workaround, I guess I could put each URL in its own field as element text:

<facet-wikithumb-src>http://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg/150px-Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg</facet-wikithumb-src>
<facet-wikithumb-rights>http://en.wikipedia.org/wiki/File:Crystal_Palace_South_transept_&amp;_south_tower_from_Water_Temple.jpg</facet-wikithumb-rights>
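The garbling reported in this issue is consistent with attribute values being written back out without re-escaping. The following is a minimal sketch of that failure mode and of the escaping step that avoids it. It is not XTF's actual storage code; the class AttrEscapeDemo, its methods, and the shortened sample value are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttrEscapeDemo {

    /** Escapes characters that may not appear literally in a double-quoted attribute value. */
    public static String escapeAttr(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses a one-element document and returns the in-memory value of the named attribute. */
    public static String parsedAttr(String xml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the thumb URL in the issue.
        String source = "<f thumb=\"South_transept_&amp;_south_tower.jpg\"/>";

        // After parsing, the entity reference has been resolved to a bare ampersand.
        String inMemory = parsedAttr(source, "thumb");
        System.out.println("in memory: " + inMemory);

        // Naive string concatenation reproduces the garbled form from the issue.
        System.out.println("naive:     <f thumb=\"" + inMemory + "\"/>");

        // Escaping on output restores well-formed XML.
        System.out.println("escaped:   <f thumb=\"" + escapeAttr(inMemory) + "\"/>");
    }
}
```

If this is indeed the cause, the fix belongs wherever stored attributes are serialized, and the element-text workaround would only help if that code path already escapes text content.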

use-when attribute on external function calls

I've only done this on two files so far. If I get a chance to apply and test it more thoroughly, I will submit a pull request for this feature.

I have been keeping a version of Oxygen 10 around for testing and debugging stylesheet changes, but even that is awkward at times.

I've discovered that if I use the use-when attribute with function-available(), I can run the stylesheets in the current Oxygen 21.

diff --git a/style/textIndexer/common/preFilterCommon.xsl b/style/textIndexer/common/preFilterCommon.xsl
index 18aa772f..8a88c40f 100644
--- a/style/textIndexer/common/preFilterCommon.xsl
+++ b/style/textIndexer/common/preFilterCommon.xsl
@@ -48,7 +48,7 @@
       <xsl:variable name="docpath" select="saxon:system-id()"/>
       <xsl:variable name="base" select="replace($docpath, '(.*)\.[^\.]+$', '$1')"/>
       <xsl:variable name="dcpath" select="concat($base, '.dc.xml')"/>
-      <xsl:if test="FileUtils:exists($dcpath)">
+      <xsl:if use-when="function-available('FileUtils:exists')" test="FileUtils:exists($dcpath)">
          <xsl:apply-templates select="document($dcpath)" mode="inmeta"/>
          <xsl:if test="not(document($dcpath)//*:identifier)">
             <identifier xtf:meta="true" xtf:tokenize="no">
@@ -253,7 +253,7 @@
       
       <!-- Remove accent marks and other diacritics -->
       <xsl:variable name="no-accents-name">
-         <xsl:value-of select="CharUtils:applyAccentMap('../../../conf/accentFolding/accentMap.txt', $creator)"/>
+         <xsl:value-of use-when="function-available('CharUtils:applyAccentMap')" select="CharUtils:applyAccentMap('../../../conf/accentFolding/accentMap.txt', $creator)"/>
       </xsl:variable>
       
       <!-- Normalize Spaces & Case-->
@@ -461,7 +461,7 @@
       
       <!-- Remove accent marks and other diacritics -->
       <xsl:variable name="no-accents-name">
-         <xsl:value-of select="CharUtils:applyAccentMap('../../../conf/accentFolding/accentMap.txt', $string)"/>
+         <xsl:value-of use-when="function-available('CharUtils:applyAccentMap')" select="CharUtils:applyAccentMap('../../../conf/accentFolding/accentMap.txt', $string)"/>
       </xsl:variable>

Indexer crash on PDFs with Arabic text

Barbara Hui reports that the XTF text indexer crashes when indexing a PDF, possibly due to Arabic text. The error message is:

  Indexing New/Updated Documents:
    Index: "default"
      Scanning Data Directories.... Done.
      (0%)   Indexing [9780824740832_NetBaseRaw/9780824740832.xml] ... Done.
      (0%)   Indexing [9780824740832_NetBaseRaw/9780824740832.ch1/9780824740832.ch1.pdf] ...
      *** PDFToXML.convert() Exception: class java.lang.NoSuchMethodError
                          With message: org.apache.fontbox.cmap.CMap.getName()Ljava/lang/String;
      Saxon Error on line 4 column 24 of file:/usr/local/licensed/xtf_crc/data/9780824740832_NetBaseRaw/9780824740832.ch1/9780824740832.ch1.pdf:: SXXP0003: Error reported by XML parser: XML document structures must start and end within
  the same entity.
      Skipping Due to Errors
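A java.lang.NoSuchMethodError at run time generally means the calling code was compiled against a different version of a class than the one found on the classpath. Whether that is the cause here is an assumption, since the report does not say which jars are installed. The sketch below is a small diagnostic that checks whether a class exposes a public no-argument method returning String (the shape given by the descriptor getName()Ljava/lang/String; in the log) and reports where the class was loaded from. The class MethodCheck and its method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class MethodCheck {

    /** True if the class loads and has a public no-arg method of this name returning String. */
    public static boolean hasStringGetter(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            return m.getReturnType() == String.class;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Location the class was loaded from, or a short note when that is unavailable. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, e.g. a JDK class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(class not found)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class and method named in the error message above.
        String className = args.length > 0 ? args[0] : "org.apache.fontbox.cmap.CMap";
        String methodName = args.length > 1 ? args[1] : "getName";

        System.out.println(className + " loaded from: " + loadedFrom(className));
        System.out.println("has String " + methodName + "(): " + hasStringGetter(className, methodName));
    }
}
```

Run with the same classpath as the indexer, this would show which jar supplies the class and whether that copy has the expected method. Given the NoSuchMethodError, the Arabic text may be incidental, and the subsequent Saxon parse error is probably a consequence of the failed PDF conversion rather than a separate problem.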
