kalamar's Introduction

Kalamar

Kalamar is a Mojolicious-based user interface frontend for the KorAP Corpus Analysis Platform.

Setup

The easiest way to install and run Kalamar is using Docker.

docker pull korap/kalamar

Then start Kalamar listening on port 64543.

docker run --network host --name kalamar korap/kalamar

Kalamar will be available at http://localhost:64543.

See the description on Docker Hub for further information.

Setup for Development

To install the latest version of Kalamar, first clone the repository ...

git clone https://github.com/KorAP/Kalamar

... and follow the steps below.

If you have any problems with installing Kalamar, see the Troubleshooting section.

Generate Static Asset Files

To generate the static asset files (scripts, styles, images ...), you need NodeJS >= 6.0.0. Installing it will probably require administration rights, depending on your installation path. These tools may also be available through a package manager.

You can check your NodeJS version using

node -v

Afterwards you can install the dependencies and run grunt to create the assets.

cd Kalamar
npm install -g grunt-cli
npm install
grunt

Whenever the assets change, just rerun npm install and grunt.

Start Server

Kalamar uses the Mojolicious framework, which expects Perl version 5.16 or later. On Windows, Strawberry Perl is recommended. An environment based on Perlbrew is recommended, if available. The installation guide also requires App::cpanminus.

Some Perl modules are not yet on CPAN, so you need to install them from GitHub. The easiest way to do this is with App::cpanminus. This will probably require administration rights.

cpanm https://github.com/Akron/Mojolicious-Plugin-Localize.git

Then install the dependencies using App::cpanminus (there is no need to install Kalamar) and run the test suite.

cd Kalamar
cpanm --installdeps .
perl Makefile.PL
make test

Kalamar can be deployed like all Mojolicious apps. The easiest way is to start the built-in server:

perl script/kalamar daemon

Kalamar will then be available at localhost:3000 in your browser.

By default, Kalamar tries to connect to https://korap.ids-mannheim.de/api/, followed by the most recent version of the API. You may point it to a different KorAP API provider in the configuration (see Kustvakt for further information) or by setting the environment variable KALAMAR_API.

If the endpoint is remote and requires SSL support, like the default endpoint, you have to install SSL support in addition:

cpanm IO::Socket::SSL

Updates

To update Kalamar, just run

git pull origin master
cpanm --installdeps .
npm install
grunt

Afterwards, both the server and client dependencies should be up to date.

Configuration

The basic configuration file is kalamar.conf. To define derived configurations, create a configuration file following the pattern kalamar.myconf.conf and follow the descriptions in kalamar.conf.

To start Kalamar with a derivative configuration, set the MOJO_MODE environment variable.

MOJO_MODE=myconf perl script/kalamar daemon

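Or in the Windows command line with:

> cmd /C "set MOJO_MODE=myconf&& perl .\script\kalamar daemon"

(There is no space before &&, because cmd would otherwise include the trailing space in the value of MOJO_MODE.)

Or in Windows PowerShell with: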

> $env:MOJO_MODE='myconf'; perl .\script\kalamar daemon; Remove-Item Env:\MOJO_MODE

For client-side configurations, you can introduce a file kalamar.conf.js. It is consulted during the build process and loads optional components using a require(...) directive (see the example below).

Secret file

Kalamar uses automatically rotating secrets. Allow access to a file called kalamar.secret.json in Kalamar's home directory. The file will be created automatically if it doesn't exist. (The older kalamar.secret file is deprecated.)

Localization

To create a localized version of Kalamar, start the localize command with the target locale as its argument, e.g. pl for Polish.

perl script/kalamar localize pl

The newly defined dictionary file can then be modified and added to the resources definition of the Localize plugin in the configuration:

Localize => {
  resources => ['kalamar.pl.dict']
}

To adapt the example queries to a specific corpus environment, define the name of the example corpus in the configuration.

Kalamar => {
  examplecorpus => 'mycorpus'
}

Then create a translation file based on kalamar.queries.dict as a blueprint and add it to the Localize resource list.

Templates can be localized and customized by overriding the Template dictionary entries.

Currently, the JavaScript translations are stored separately in dev/js/src/loc. To generate assets for different locales, add the locale to Gruntfile.js.

To adapt the annotation helper to a specific corpus environment, different annotation foundries can be loaded in kalamar.conf.js. For example, to support marmot and malt, the configuration may look like this:

require([
  "hint/foundries/marmot",
  "hint/foundries/malt"
]);

See dev/js/src/hint/foundries for more optional foundries.

Customization

The landing page can be customized by overriding the entry for Template_intro in the dictionary.

Some sections of the user interface can be customized by adding new content blocks. Currently, the following sections are documented:

  • footer: the bottom line of the user interface
  • sidebar: the left part of the user interface, if present
  • headerButtonGroup: the top right part of the user interface
  • loginInfo: below the login form, if present

Plugins

Some plugins are bundled as part of Kalamar. Plugins can be loaded by listing them in an array in the configuration file:

{
  Kalamar => {
    plugins => ['Auth']
  }
}

Currently bundled plugins are

Maintaining

Caching

Kalamar supports CHI for caching, which allows various cache drivers to be configured.

To see options for cache maintenance (e.g. to clear search results after index update), run the command

perl script/kalamar chi

Troubleshooting

make not available under Windows

Instead of running

perl Makefile.PL
make test

it is also possible to run the test suite using prove.

prove -lr t

Problem installing Crypt::Random::Source on Windows

Crypt::Random::Source recently removed support for C as a random source, which may lead to missing sources in tests under certain operating systems. You should be able to force install Crypt::Random::Source, though this environment is not recommended for production:

cpanm -f Crypt::Random::Source

Problem installing Mojolicious::Plugin::MailException on Windows

Some versions of Mojolicious::Plugin::MailException have a minor bug in the test suite, so a force install may be necessary.

cpanm -f Mojolicious::Plugin::MailException

Problem running scripts on Windows with Powershell

In case you are having issues with running scripts under Windows, you can set the execution policy with Set-ExecutionPolicy. If using the RemoteSigned execution policy, you can use Unblock-File to allow specific scripts to run.

COPYRIGHT AND LICENSE

Original Software

Copyright (C) 2015-2024, IDS Mannheim
Author: Nils Diewald, Helge Stallkamp
Contributor: Eliza Margaretha (Documentation), Susanne Feix (Translation), Leo Repp

Kalamar is developed as part of the KorAP Corpus Analysis Platform at the Leibniz Institute for the German Language (IDS), member of the Leibniz Association and supported by the KobRA project, funded by the Federal Ministry of Education and Research (BMBF).

Kalamar is free software published under the BSD-2 License.

To cite this work, please refer to:
Diewald, Nils, Barbu Mititelu, Verginica and Kupietz, Marc (2019): The KorAP user interface. Accessing CoRoLa via KorAP. In: Cosma, Ruxandra/Kupietz, Marc (eds.), On design, creation and use of the Reference Corpus of Contemporary Romanian and its analysis tools. CoRoLa, KorAP, DRuKoLA and EuReCo, Revue Roumaine de Linguistique, 64(3). Editura Academiei Române, Bucharest, Romania.

Bundled Assets

The KorAP logo was designed by Norbert Cußler-Volz and is released under the terms of the Creative Commons License BY-NC-ND 4.0. ALERTIFY.js is released under the terms of the MIT License. Almond is released under the terms of the BSD License. dagre is released under the terms of the MIT License. Highlight.js is released under the terms of the BSD License. Jasmine is released under the terms of the MIT License. RequireJS is released under the terms of the BSD License. Font Awesome by Dave Gandy is released under the terms of the SIL OFL 1.1. Benchmark.js is released under the terms of the MIT License. lodash is released under the terms of the MIT License. Platform.js is released under the terms of the MIT License. INTRO.JS is released under the terms of the GNU AFFERO GENERAL PUBLIC LICENSE (GNU AGPLv3).

kalamar's Issues

Formatting date

Fields with date values, such as creationDate and pubDate, should be formatted as more readable text.
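A minimal sketch of such formatting follows. It assumes that date fields arrive as strings of the form YYYY, YYYY-MM or YYYY-MM-DD (e.g. the pubDate value 1990-02-01). The function name formatDate is hypothetical. The locale-aware wording comes from the standard Intl.DateTimeFormat API, not from any existing Kalamar code.

```javascript
// Hypothetical helper: turns date strings assumed to look like "YYYY",
// "YYYY-MM" or "YYYY-MM-DD" into readable, locale-aware text.
function formatDate(value, locale = "en") {
  const match = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(value);
  if (match === null) {
    return value; // unknown format: keep the raw value
  }
  const [, year, month, day] = match;
  if (month === undefined) {
    return year; // a bare year needs no formatting
  }
  // Build and format the date in UTC, so the displayed day does not
  // shift with the client's time zone.
  const date = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day ?? "1")));
  const options = { year: "numeric", month: "long", timeZone: "UTC" };
  if (day !== undefined) {
    options.day = "numeric";
  }
  return new Intl.DateTimeFormat(locale, options).format(date);
}

console.log(formatDate("1990-02-01"));
console.log(formatDate("1990-02-01", "de"));
```

In an English locale, the first call should produce something like "February 1, 1990". The exact wording depends on the locale data shipped with the browser or runtime. Partial dates (year only, or year and month) are shown without the missing parts.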

corpusByMatch assistant

Similar to the queryByMatch assistant (or "query by example"), which lets the user create queries by choosing annotations in the annotation view, a corpusByMatch assistant can help the user modify a virtual corpus by choosing metadata values in the metadata view.

For example, when the match contains the metadata field corpusSigle with the value BZK and the user clicks on BZK, a line below the metadata view appears saying Limit corpus to: corpusSigle=BZK. When choosing another field pubDate with a value 1990-02-01, the line is expanded to Limit corpus to: corpusSigle=BZK & pubDate=1990-02-01. By clicking on the same value again, the constraint will be removed.

When the user clicks on the line, the chosen constraints will be added to the virtual corpus (combined with AND), and the window scrolls to the VC builder.

This feature won't be available for stored-only fields. At the moment, it would be possible for string and date/number fields. To work for text and keyword fields, KorAP/Krill#32 needs to be resolved.
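The toggle behaviour described above can be sketched as a small state object. The sketch is hypothetical: the class CorpusByMatch and its methods are not part of Kalamar. It only models the "Limit corpus to:" line, not the actual rewriting of the virtual corpus.

```javascript
// Hypothetical model of the corpusByMatch line: an ordered list of
// key=value constraints collected from clicks in the metadata view.
class CorpusByMatch {
  constructor() {
    this.constraints = [];
  }

  // Clicking a value adds the constraint; clicking it again removes it.
  toggle(key, value) {
    const index = this.constraints.findIndex(
      (c) => c.key === key && c.value === value
    );
    if (index >= 0) {
      this.constraints.splice(index, 1);
    } else {
      this.constraints.push({ key, value });
    }
  }

  // Text shown below the metadata view; empty when nothing is chosen.
  label() {
    if (this.constraints.length === 0) {
      return "";
    }
    const parts = this.constraints.map((c) => `${c.key}=${c.value}`);
    return "Limit corpus to: " + parts.join(" & ");
  }
}

const assistant = new CorpusByMatch();
assistant.toggle("corpusSigle", "BZK");
assistant.toggle("pubDate", "1990-02-01");
console.log(assistant.label());
// Limit corpus to: corpusSigle=BZK & pubDate=1990-02-01
```

The constraints are kept in click order, so the line grows in the same order the user chose the values.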

Improvement datepicker

To choose only a year instead of an exact date, the button at the top has to be pressed. It should also be possible to type the year in manually.
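Manual input could be validated before it is handed to the datepicker. The following is a hypothetical sketch, not existing Kalamar code. It assumes the accepted forms are YYYY, YYYY-MM and YYYY-MM-DD, and it rejects impossible months and days.

```javascript
// Hypothetical parser for a manually typed date in the datepicker.
// Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD"; returns null otherwise.
function parseTypedDate(input) {
  const match = /^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$/.exec(input.trim());
  if (match === null) {
    return null;
  }
  const year = Number(match[1]);
  const month = match[2] === undefined ? undefined : Number(match[2]);
  const day = match[3] === undefined ? undefined : Number(match[3]);
  if (month !== undefined && (month < 1 || month > 12)) {
    return null;
  }
  if (day !== undefined) {
    // Day 0 of the following month is the last day of this month.
    const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
    if (day < 1 || day > daysInMonth) {
      return null;
    }
  }
  return { year, month, day };
}

console.log(parseTypedDate("1990"));
console.log(parseTypedDate("2023-02-29")); // null: 2023 is not a leap year
```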

Show sigles in query results

The sigles give a quick hint on the provenance of the query hits and their distribution with respect to source and time. Without them, result lists can be pretty meaningless.

groupByMatch assistant

In combination with the corpusByMatch assistant (#27), and once the grouping mechanism is available in the backend (cf. #26), it would be helpful to establish a groupByMatch assistant in the metadata view.
When the user opens the metadata view of a match and creates a VC using the metadata values, it would be beneficial to also support grouping by fields, letting the user click on metadata keys to create group criteria.

For example, a match has the metadata fields author and corpusSigle. By clicking on author, a line appears below the metadata view saying Group by: field=author. Once the user clicks on the corpusSigle key, the line is expanded to Group by: field=author, field=corpusSigle.
When the user clicks on the line, the user interface switches from search to group, keeping the query, the VC and the QL as before, but adding the group information field=author, field=corpusSigle. After the grouping has run, all matches are listed, grouped by the values of the metadata fields author and corpusSigle.
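To illustrate the intended result, here is a hypothetical client-side sketch of grouping matches by field values. In KorAP, the grouping itself is expected to run in the backend (#26). The function names are invented for illustration, and matches are modeled as plain objects mapping metadata fields to values.

```javascript
// Illustration only: groups matches (plain metadata objects) by the
// values of the chosen fields, e.g. field=author, field=corpusSigle.
function groupMatchesByFields(matches, fields) {
  const groups = new Map();
  for (const match of matches) {
    // Build a group key such as "author=A, corpusSigle=X".
    const key = fields.map((field) => `${field}=${match[field]}`).join(", ");
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(match);
  }
  return groups;
}

// The "Group by" line built from the clicked metadata keys.
function groupByLabel(fields) {
  return "Group by: " + fields.map((field) => `field=${field}`).join(", ");
}

console.log(groupByLabel(["author", "corpusSigle"]));
// Group by: field=author, field=corpusSigle
```

Because a Map preserves insertion order, groups appear in the order their first match was seen. A backend implementation would more likely sort them, e.g. by frequency.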

UI for virtual corpus management

Login-View on small screens

The login view on small screens (like mobile devices) is positioned at the top of the screen, while the button (shown after an unauthorized search is executed) is on the lower left side. On small screens, the button should also be positioned at the top, and possibly the inactive sidebar as well (to prevent weird animations).

Height of relational view

The height of the relational view as SVG is often too large, especially when there is no arc at all. It should be limited to the highest arc label in the tree visualisation.

User-settings

The user should be able to choose certain settings to make searching in KorAP more comfortable:

  • choose the metadata fields the user is interested in, and their order (see also #40); all metadata fields are displayed after clicking a "more" button

  • ...
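A setting like the first one could be applied with a simple split of the available fields. This is a hypothetical sketch; the function name and the representation of fields as arrays of names are assumptions.

```javascript
// Hypothetical helper: splits the metadata fields of a match into those
// the user chose to see (in the user's preferred order) and those that
// are only shown after clicking "more".
function splitMetadataFields(availableFields, preferredFields) {
  // Keep the user's order, but skip preferred fields this text lacks.
  const shown = preferredFields.filter((field) => availableFields.includes(field));
  // Everything else stays available behind the "more" button.
  const hidden = availableFields.filter((field) => !shown.includes(field));
  return { shown, hidden };
}

console.log(
  splitMetadataFields(["author", "title", "pubDate", "corpusSigle"], ["pubDate", "author"])
);
```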

Remember application state on login

Currently, when the user logs in to KorAP, all search information is lost. This is undesirable, for example when the user logs in to avoid a certain query rewrite. Therefore, the complete application state should be remembered during login.

Mouseover problem in dependency visualization

The arrow heads of green-highlighted relation arcs (which appear when hovering over relation arcs) are missing. It is sometimes hard to determine the relation direction when there are arrow heads at both the start and the end of a relation arc.

Also reported by Marc Kupietz.

Crawl Wiki data to embed in API descriptions

Currently, all API descriptions are stored in Wiki pages on GitHub. It would be nice to crawl these pages and embed them in the Kalamar documentation. I would suggest using a Mojolicious::Command to update the documentation templates after setup.

$ perl script/kalamar fetch_docs

Missing space in VC between key and operator

When choosing a key like author in the VC creator, the initial VC is author eq .... After clicking on the menu without selecting an item, and then deactivating the menu by clicking anywhere else, the VC is missing a space delimiter between the key and the operator, like authoreq ....

Annotation Cheatsheet

The annotation cheatsheet lists all tags for each foundry and layer. It contains the same information as the annotation helper tool, but in the form of a table. While the annotation helper tool is useful for writing queries, the cheatsheet is useful when users need to look up the meaning or description of some annotations, e.g. while analysing the results.

CoreNLP

Layers        Tags   Description
Lemma         AA     Superlative phrase with "am"
              AP     Adjective phrase
Named entity  I-LOC  Location

It should be a separate page (not part of the query results) in the help/tutorial.
Since it contains the same content as the annotation helper, it should be possible to update it automatically, shouldn't it?

An alternative to the cheatsheet tables could be tooltips activated by mouse hovering over the annotation tags in a match info table view and a visualization/tree view.
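Automatic updates would be possible if the cheatsheet were rendered from the same data as the annotation helper. The sketch below assumes a hypothetical data shape (foundry, layers, and [tag, description] pairs), which may differ from the actual files in dev/js/src/hint/foundries. The sample rows repeat the CoreNLP example above.

```javascript
// Hypothetical data shape; the real annotation helper files may differ.
const corenlp = {
  name: "CoreNLP",
  layers: [
    { name: "Lemma", tags: [["AA", 'Superlative phrase with "am"'], ["AP", "Adjective phrase"]] },
    { name: "Named entity", tags: [["I-LOC", "Location"]] },
  ],
};

// Escape text before embedding it in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one foundry as a Layer/Tag/Description table.
function cheatsheetTable(foundry) {
  const rows = [];
  for (const layer of foundry.layers) {
    for (const [tag, description] of layer.tags) {
      rows.push(
        `<tr><td>${escapeHtml(layer.name)}</td><td>${escapeHtml(tag)}</td>` +
          `<td>${escapeHtml(description)}</td></tr>`
      );
    }
  }
  return (
    `<table><caption>${escapeHtml(foundry.name)}</caption>` +
    "<thead><tr><th>Layer</th><th>Tag</th><th>Description</th></tr></thead>" +
    `<tbody>${rows.join("")}</tbody></table>`
  );
}

console.log(cheatsheetTable(corenlp));
```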

Show all metadata by one click on the reference

Currently not all metadata fields are shown, although they may be part of a fields request.
For all texts, all metadata fields should be retrievable in a table view.
In addition, fields may be shown as a KWIC column, see #13.

Add Shibboleth authorization endpoint

An authorization endpoint for Shibboleth SSO needs to be implemented, probably as a plugin.
The implementation will be as follows:

  • Provide an endpoint that is secured by Shibboleth Auth in Apache
  • After the IdentityProvider round trip, the Shibboleth session information is collected and forwarded to Kustvakt (if the login is successful)
  • Kustvakt returns a session token, as with LDAP auth

Recommend the VC limitation to documents annotated by a certain foundry/layer, when requested in the query

Whenever a query contains specific annotations possibly not available for all documents (e.g. [mate/l=Baum] in Poliqarp), the user should get a notification that it may be useful to limit the virtual corpus to only contain documents annotated with the requested foundry/layer.
A simple button for this limitation should be attached to the virtual corpus helper.
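Detecting such annotations could start with finding the foundry/layer prefixes used in the query. The following hypothetical sketch uses a simplified regular expression rather than a real Poliqarp parser, so it may miss or misread unusual queries. It returns each foundry/layer pair once.

```javascript
// Hypothetical helper: finds foundry/layer prefixes such as "mate/l" in a
// Poliqarp query, so the UI could suggest limiting the virtual corpus.
// The regular expression is a simplification, not a Poliqarp parser.
function requestedFoundryLayers(query) {
  // foundry "/" layer, optionally followed by "!" (negation), then "="
  const pattern = /([A-Za-z][\w-]*)\/([A-Za-z][\w-]*)\s*!?=/g;
  const seen = new Set();
  const result = [];
  for (const [, foundry, layer] of query.matchAll(pattern)) {
    const key = `${foundry}/${layer}`;
    if (!seen.has(key)) {
      seen.add(key);
      result.push({ foundry, layer });
    }
  }
  return result;
}

console.log(requestedFoundryLayers("[mate/l=Baum]"));
// [ { foundry: 'mate', layer: 'l' } ]
```

A non-empty result could trigger the notification and the limitation button. An empty result (e.g. for a plain word query) would show nothing.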

The user may also be able to set this limitation as a default, meaning that whenever the user runs a query, the VC is automatically rewritten in that way.

This feature would require modifications in Koral and KoralQuery (see KorAP/Koral#27).

The feature was requested by franck.

Improve UX with disabled Javascript

Kalamar should be usable without JavaScript enabled, at least to a certain degree. Currently there are two important limitations:

  • The login tab doesn't stay open when the user wants to log in
    • possible solution: The login screen is prepended at the top, as in the handheld view
  • KWIC matches can't be enlarged to the snippet view
    • This worked before, so I guess it may be possible to re-enable this function

Integrate Piwik

This can be trivial, but it can also become cleverly complicated (e.g. for statistics on query constructs).

(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and Trac Ticket #221)

Formatting keywords in metadata view

The values of the foundries field should be listed in a more readable way.

Example of a foundries value:
corenlp corenlp/constituency corenlp/morpho corenlp/sentences dereko dereko/structure dereko/structure/base-sentences-paragraphs-pagebreaks malt malt/dependency opennlp opennlp/morpho opennlp/sentences
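One possible readable form groups the space-separated entries by their top-level foundry. The following hypothetical sketch does exactly that; the function names and the one-line-per-foundry output are assumptions, not an agreed design. The example value is the one above.

```javascript
// Hypothetical helper: groups a space-separated foundries value by its
// top-level foundry, e.g. "corenlp corenlp/morpho" -> corenlp: morpho.
function groupFoundries(value) {
  const groups = new Map();
  for (const entry of value.trim().split(/\s+/).filter(Boolean)) {
    const [foundry, ...rest] = entry.split("/");
    if (!groups.has(foundry)) {
      groups.set(foundry, []);
    }
    if (rest.length > 0) {
      groups.get(foundry).push(rest.join("/"));
    }
  }
  return groups;
}

// Render one line per foundry for the metadata view.
function formatFoundries(value) {
  return [...groupFoundries(value)]
    .map(([foundry, layers]) =>
      layers.length > 0 ? `${foundry}: ${layers.join(", ")}` : foundry
    )
    .join("\n");
}

const example =
  "corenlp corenlp/constituency corenlp/morpho corenlp/sentences dereko " +
  "dereko/structure dereko/structure/base-sentences-paragraphs-pagebreaks " +
  "malt malt/dependency opennlp opennlp/morpho opennlp/sentences";
console.log(formatFoundries(example));
// corenlp: constituency, morpho, sentences
// dereko: structure, structure/base-sentences-paragraphs-pagebreaks
// malt: dependency
// opennlp: morpho, sentences
```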

"no permission to access /kalamar on this server"

Reposting Pavel Stranak's message from the blog:

Hi guys,
I would really like to try Kalamar, but I am getting “Forbidden. You don’t have permission to access
/kalamar on this server.”
Pavel

@pavel: Thanks for the note. Would you please say how you attempted the connection?

It makes a bit of a difference whether you install with admin privileges or not. In the latter case, you have to follow the installation messages a bit more closely to make sure that all the dependencies are met, so my .bashrc now has the following lines:

# added at the request of perlbrew
source ~/perl5/perlbrew/etc/bashrc

# added at the request of cpanm
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)

I run it with morbo script/kalamar from within the Kalamar directory, and it is available at http://localhost:3000/.

Rewrite KwiC code to use flexbox

The current table/flow mechanism for KwiC view is quite inflexible, when multiple left columns are shown or the sigle column is optional (see #13). I expect a more flexible implementation using CSS flexbox.

Last characters of a query may be hidden behind the query button

From @EleFri

When I enter a very long query, I can no longer see the last few characters. Even when I use the right arrow key, the last characters remain hidden behind the search button. I only have this problem on my small laptop screen, because on larger screens the text field for the query is long enough that the problem doesn't occur at all. A two-line text field for queries would probably be helpful for small screens like mine.

I can confirm the problem in Firefox for every view except the mobile view.

Migrate from Grunt to WebPack

Grunt, as currently set up, is a monolithic module builder, meaning all scripts are bundled in one large file. This is problematic with large dependencies that are not needed from the start, e.g. libraries for tree visualizations or statistical visualizations like d3.

WebPack is able to load dependencies asynchronously in chunks (using the AMD/CommonJS module format already in use). This should make the initial page load faster and reduce memory usage on the client side.

Non-hierarchical relations do not zoom properly

When a non-hierarchical relation is open and the viewport is on a scale other than 100%, the arrows miss their targets. This happens not only when the user zooms interactively, but also when the zoom level already differs before the relation view is opened.

Integrate Collocation Search

Currently there is no option for collocation search (using the Neo4j backend in KorAP); this should be implemented. It doesn’t sound too complicated. However, the response formats should be more coherent.

(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and the Trac Ticket #219.)

Virtual Corpus info

The virtual corpus size should be shown both for private virtual corpora on a user page and when a VC is specified together with a search query. Other information about the VC, such as document titles, dates, etc., should be available as well. In Cosmas II, such information can be exported.

Has there been an open issue regarding corpus browsing already?

Corpus browsing

I chose the term corpus browsing for the "Korpuspräsentationen" (corpus presentations) in Cosmas II, because I think it best fits the display of corpus information like document titles, dates, country of origin, etc.

Fix Prefix-Cleaning in annotation assistant

Currently, if a prefix search is used to filter entries in the annotation assistant, the browser won't clear this prefix when the annotation helper is reopened at another position. This bug was probably introduced by the hint test suite fix in #46.

Fix Annotation Assistant in WebKit

While Firefox allows clicking through the annotation assistant stepwise (foundry -> layer -> key -> value), WebKit stops at each level: a click only adds the annotation to the search bar, and the next menu stays closed.

Using brat as the annotation visualisation library

WebAnno seems to use the client part of brat to show visualizations of annotations.
If we choose to use brat for dependency trees, we should also adopt it for constituency trees.
But as we don't need interaction, writing our own tree visualization may not be too hard.
Currently Kalamar uses Dagre.

Integrate User Management

It’s not yet entirely clear what this will include.
Login, registration, changing the password, email, etc. will probably be served by the middleware. Integration in the 2nd frontend may not be necessary.
Now it seems that the middleware will even serve VCs, user-defined queries, etc. So the user management for the second frontend needs excellent caching mechanisms to make this work without a database close by. Search properties, however, will be served by the frontend.

(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and the Trac Ticket #216)

Relation menu keeps focus after click

When the relation menu is opened in the match view, and a relation is chosen (like malt/d), the menu keeps the focus, so when the user wants to interact with the browser window (e.g. Ctrl-+ for zoom), the interaction is captured.

Integrate Feedback Form

Users should have an easy way to report problems, like the contact system in the first frontend. For the moment, we could simply force users to use GitHub issues, but I don't know if this will work for everyone (especially if they already have KorAP accounts and are identifiable).

(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and Trac Ticket #220)

Improve Pagination

The current pagination helper makes it complicated to jump to a specific page by number. An additional button should therefore allow typing in a page number directly, probably as a replacement for an ellipsis symbol.
(Reported by @EleFri)
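The idea of replacing ellipses with a page-number input can be sketched as a list of page items with "jump" markers where pages are skipped. This is a hypothetical model; the names paginationItems and parsePageInput, and the radius of one page around the current page, are assumptions for illustration.

```javascript
// Hypothetical pagination model: visible page numbers around the current
// page, plus "jump" markers where an ellipsis used to be. A "jump" marker
// would be rendered as a small input for typing a page number.
function paginationItems(current, total, radius = 1) {
  const items = [];
  let last = 0;
  for (let page = 1; page <= total; page++) {
    const visible = page === 1 || page === total || Math.abs(page - current) <= radius;
    if (!visible) {
      continue;
    }
    if (page - last > 1) {
      items.push("jump"); // some pages were skipped here
    }
    items.push(page);
    last = page;
  }
  return items;
}

// Turn typed text into a valid page number, or null if it is not a number.
function parsePageInput(text, total) {
  const page = Number.parseInt(text, 10);
  if (Number.isNaN(page)) {
    return null;
  }
  return Math.min(Math.max(page, 1), total);
}

console.log(paginationItems(5, 20)); // [ 1, 'jump', 4, 5, 6, 'jump', 20 ]
```

Out-of-range input is clamped to the first or last page instead of being rejected, which is one possible design choice.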

Reintroduce layer sorting in table view

In older versions of Kalamar, token annotations in table views were sortable by foundry and by layer. This should be possible in newer versions of Kalamar, that include tree views, as well.

Fix hint tests in Microsoft Edge

Currently, the hint test suite does not pass all tests in Microsoft Edge. Because there are quite a lot of Windows users of KorAP, I consider a fix urgent.

Webkit-Bug in Calendar-Widget

Reported by @michaelhanl:
When choosing a year or a month in the calendar widget, the built query disappears.

It would also be nice to have the ability to type in years directly.

Fix ordering of tree items in constituency trees

Sometimes, when a match is sentence-expanded and the second sentence starts at the beginning of a paragraph, the paragraph is ranked below the sentence, although it is closer to the root of the tree.

Example text:

<p>
  <s>a</s>
  <s>b</s>
</p>
<p>
  <s>x</s>
  <s>y</s>
</p>

If the match is b x and the match is sentence-expanded, the tree looks like this:

 /^\
|   s
|   |
s   p
|   |
b   x

A better solution would be to ignore the paragraph if it ends after the match, resulting in:

 /^\
s   s
|   |
b   x
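One way to express this rule is sketched below. It assumes spans are represented as half-open token ranges [start, end). It keeps only the spans that lie completely inside the sentence-expanded match, so a paragraph ending after the match is dropped. This is a hypothetical sketch, not Kalamar's tree code. Tokens in the example text above are numbered a=0, b=1, x=2, y=3.

```javascript
// Hypothetical sketch: spans are half-open token ranges [start, end).
// Only spans lying completely inside the (sentence-expanded) match are
// kept, so a paragraph that ends after the match is ignored.
function spansInsideMatch(spans, match) {
  return spans.filter((span) => span.start >= match.start && span.end <= match.end);
}

// The example text, with tokens a=0, b=1, x=2, y=3.
const spans = [
  { name: "p", start: 0, end: 2 },
  { name: "s", start: 0, end: 1 },
  { name: "s", start: 1, end: 2 },
  { name: "p", start: 2, end: 4 },
  { name: "s", start: 2, end: 3 },
  { name: "s", start: 3, end: 4 },
];

// Match "b x" expanded to the sentences containing b and x: tokens 1 to 2.
const kept = spansInsideMatch(spans, { start: 1, end: 3 });
console.log(kept.map((span) => span.name).join(" ")); // s s
```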

Making metadata field names more understandable

The metadata field names in the metadata view and the VC builder are numerous and hard for users to understand, for example „pubDate“ or „corpusEditor“.

There are several possible solutions:

  • a mapping to more intelligible names

  • a link to a cheat table in the metadata view and the VC builder, which displays the metadata field names in a more understandable way

  • descriptions to all metadata fields, that would be listed in the VC builder (see also descriptions in the annotation helper) and could be used for mouse overs in the metadata view

  • User-settings: The user can define the metadata fields they are interested in for metadata fetching. To see all metadata fields, they click a „more“ button. They can also choose the order of the metadata fields.

  • More ideas are welcome ...

There are different points to consider when discussing the mapping/cheat table solution versus descriptions of all metadata fields:
For all solutions, localization has to be kept in mind.
KorAP should be user-friendly and easy to use. The number of metadata fields and their names are confusing for some users. KorAP should not only be designed for expert users, but serve the needs of different kinds of users. Using mouse overs in the metadata view is laborious.

On the other hand, descriptions can be as elaborate as necessary. Users relying on the cheat sheet couldn't talk about metadata with other users, as they would use different names. Specific metadata names may be used not only by annotators, but also by other corpus analysis tools, and these names are used project-wide. KorAP also has users who are organized in groups that compile their own corpora. It is therefore a straightforward solution to display the internal field names.
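A possible compromise between a mapping and the internal names is to show a localized label together with the internal field name. The following hypothetical sketch does this; the mapping, its German labels and the function name fieldLabel are invented for illustration. Fields without a mapping fall back to their internal name.

```javascript
// Hypothetical mapping from internal field names to readable, localized
// labels. Fields without an entry fall back to their internal name.
const FIELD_LABELS = {
  en: { pubDate: "Publication date", corpusEditor: "Corpus editor" },
  de: { pubDate: "Erscheinungsdatum", corpusEditor: "Korpusherausgeber" },
};

// Show the readable label but keep the internal name visible, so users
// can still refer to the project-wide field names.
function fieldLabel(field, locale) {
  const label = FIELD_LABELS[locale]?.[field];
  return label === undefined ? field : `${label} (${field})`;
}

console.log(fieldLabel("pubDate", "en")); // Publication date (pubDate)
```

Keeping the internal name in parentheses addresses the concern that users of a mapping would otherwise talk about different names than other users and tools.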
