korap / kalamar
Mojolicious-based Frontend for KorAP
License: BSD 2-Clause "Simplified" License
I chose the term corpus browsing for the "Korpuspräsentationen" in Cosmas II, because I think it best fits the display of corpus information like document titles, dates, country of origin, etc.
Sometimes, identical errors are listed in KoralQuery multiple times. These errors or notifications should be merged to only notify the user once.
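A minimal sketch of such a merge, assuming each notification arrives as an array like `[code, message, ...details]` (the shape and the sample entries below are assumptions for illustration, not taken from the KoralQuery specification):

```javascript
// Assumption: each notification is an array such as [code, message, ...details].
function mergeNotifications(notifications) {
  const seen = new Set();
  const merged = [];
  for (const entry of notifications) {
    // Serialize the whole entry so that only fully identical ones collapse.
    const key = JSON.stringify(entry);
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(entry);
    }
  }
  return merged;
}

// Made-up sample data: two identical entries and one distinct entry.
const sample = [
  [1, "Example error A"],
  [1, "Example error A"],
  [2, "Example error B"]
];
console.log(mergeNotifications(sample).length);
```

The first occurrence is kept, so the original order of the notifications is preserved.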
Kalamar should be usable without JavaScript enabled - at least to a certain degree. Currently there are two important limitations:
In the token annotation view, the foundry/layer definitions shouldn't be sticky for small screens, as otherwise the annotations are not visible.
Reported by @michaelhanl:
When choosing a year or a month in the calendar widget, the built query disappears.
It would also be nice to be able to type in years directly.
The current table/flow mechanism for the KWIC view is quite inflexible when multiple left columns are shown or the sigle column is optional (see #13). I expect a more flexible implementation using CSS flexbox.
The user should be able to choose certain settings to make searching in KorAP more comfortable:
choose the metadata field names the user is interested in and their order (see also #40). (All metadata fields are displayed after clicking on a "more" button.)
...
From @EleFri
When I enter a very long query, I can no longer see the last few characters. Even when I use the right arrow key, the last characters remain hidden behind the search button. I only have this problem on my small laptop screen, because on other, larger screens the text field for the query is so long that the problem does not occur at all. A two-line text field for queries would probably help on small screens like mine.
I can confirm the problem in Firefox for every view but the mobile view.
Fields with date value such as creationDate and pubDate should be formatted to a more readable text.
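One possible formatter, as a sketch only. It assumes the values arrive as `YYYY`, `YYYY-MM` or `YYYY-MM-DD` strings (the `pubDate` example elsewhere in these issues has the last form); the function name and the output format are choices made for this illustration, and localization is left out:

```javascript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
];

// Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD"; returns the input unchanged otherwise.
function formatDateField(value) {
  const m = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(value);
  if (!m) return value;
  const [, year, month, day] = m;
  if (!month) return year;
  const monthName = MONTHS[Number(month) - 1];
  if (!monthName) return value; // month out of range, leave the raw value visible
  if (!day) return `${monthName} ${year}`;
  return `${Number(day)} ${monthName} ${year}`;
}

console.log(formatDateField("1990-02-01")); // 1 February 1990
```

Returning unparseable values unchanged means nothing is hidden from the user if a field uses an unexpected format.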
Users should have an easy way to report problems - like with the contact system in the first frontend. For the moment, we could simply force users to use GitHub issues, but I don't know if this will work for all (especially, if they already have KorAP-accounts and are identifiable).
(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and Trac Ticket #220)
Add user interfaces for user group management:
See https://github.com/KorAP/Kustvakt/wiki for information about supported web-services.
The current pagination helper makes it complicated to jump to a specific page by number. An additional button should therefore allow typing in a page number directly, probably as a replacement for an ellipsis symbol.
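The input handling for such a button could look roughly like this. The function name is hypothetical, and clamping overly large numbers to the last page (rather than rejecting them) is a design assumption:

```javascript
// Returns a valid 1-based page number, or null when the input is not usable.
function parsePageInput(input, pageCount) {
  const text = String(input).trim();
  if (!/^\d+$/.test(text)) return null; // digits only
  const page = Number(text);
  if (page < 1) return null;
  // Assumption: numbers beyond the last page jump to the last page.
  return Math.min(page, pageCount);
}

console.log(parsePageInput(" 12 ", 40)); // 12
console.log(parsePageInput("999", 40));  // 40
```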
(Reported by @EleFri )
The login view on small screens (like mobile devices) is positioned at the top of the screen, while the button (after an unauthorized search is executed) is on the lower left side. On small screens, the button should also be positioned at the top, and possibly the inactive sidebar as well (to prevent weird animations).
Currently not all metadata fields are shown, although they may be part of a `fields` request.
For all texts, all metadata fields should be retrievable in a table view.
In addition, fields may be shown as a KWIC column, see #13.
Currently the hint test suite does not pass all tests in Microsoft Edge. Because there are quite a lot of Windows users of KorAP, I expect a fix is urgent.
Currently, when the user logs in to KorAP, all search information is lost. This is undesirable when the user logs in precisely to avoid a certain query rewrite. Therefore, all application state should be remembered during login.
The annotation cheatsheet lists all tags for each foundry and layer. It is identical to the annotation helper tool, but in the form of a table. While the annotation helper tool is useful for writing queries, the cheatsheet is useful when users need to look up the meaning or description of some annotations, e.g. while analysing the results.
CoreNLP
| Layers | Tags | Description |
|---|---|---|
| Lemma | AA | Superlative phrase with "am" |
| | AP | Adjective phrase |
| Named entity | I-LOC | Location |
It should be separate pages (not in the query results) in the help/tutorial.
Since it contains the same content as the annotation helper, it should be able to be updated automatically, shouldn't it?
An alternative to the cheatsheet tables could be tooltips activated by mouse hovering over the annotation tags in a match info table view and a visualization/tree view.
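On the question of automatic updates: if the annotation helper data can be read as a nested structure of layers and tags, the table could be generated from it. The sketch below assumes a hypothetical shape `{ layer: [[tag, description], ...] }` per foundry; the actual format of the helper data may differ.

```javascript
// Hypothetical input shape: { "Layer name": [["TAG", "Description"], ...], ... }
function cheatsheetRows(foundryData) {
  const rows = [["Layers", "Tags", "Description"]];
  for (const [layer, tags] of Object.entries(foundryData)) {
    tags.forEach(([tag, description], i) => {
      // Show the layer name only on its first row, as in the example table.
      rows.push([i === 0 ? layer : "", tag, description]);
    });
  }
  return rows;
}

// Renders the rows as a pipe table with a separator line after the header.
function toMarkdownTable(rows) {
  const [header, ...body] = rows;
  const line = cells => `| ${cells.join(" | ")} |`;
  return [line(header), line(header.map(() => "---")), ...body.map(line)].join("\n");
}

const coreNLP = {
  "Named entity": [["I-LOC", "Location"]]
};
console.log(toMarkdownTable(cheatsheetRows(coreNLP)));
```

The same rows could also feed the tooltip alternative, since each row pairs a tag with its description.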
Currently all API descriptions are stored in Wiki pages on GitHub. It would be nice to crawl these pages and embed them in the Kalamar documentation. I would suggest using a `Mojolicious::Command` to update the documentation templates after setup:
$ perl script/kalamar fetch_docs
In case regular expression matching is tested for fields that do not support it, the builder crashes after searching and isn't able to recover.
When choosing a key in the VC creator like `author`, the initial VC is `author eq ...`. After clicking on the menu without selecting an item, followed by deactivating the menu by clicking anywhere else, the VC is missing a space delimiter between the key and the operator, like `authoreq ...`.
When the relation menu is opened in the match view, and a relation is chosen (like `malt/d`), the menu keeps the focus, so when the user wants to interact with the browser window (e.g. `Ctrl-+` for zoom), the interaction is captured.
Whenever a query contains specific annotations possibly not available for all documents (e.g. `[mate/l=Baum]` in Poliqarp), the user should get a notification that it may be useful to limit the virtual corpus to only contain documents annotated with the requested foundry/layer.
A simple button for this limitation should be attached to the virtual corpus helper.
This requirement may potentially be set by default by the user, meaning that whenever a user does a query, the VC is automatically rewritten in that way.
This feature would need modifications in Koral and KoralQuery (see KorAP/Koral#27).
The feature was requested by franck.
Add user interfaces for virtual corpus (vc) related functions, including:
VC admin functions:
When a non-hierarchical relation is open and the viewport is on a scale other than 100%, the arrows miss their targets. This is not only true when the user zooms interactively, but also when the zoom is different before the relation view is opened.
The value of the `foundries` field should be listed in a more readable way.
Example of a `foundries` value:
corenlp corenlp/constituency corenlp/morpho corenlp/sentences dereko dereko/structure dereko/structure/base-sentences-paragraphs-pagebreaks malt malt/dependency opennlp opennlp/morpho opennlp/sentences
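One way to make such a value more readable is to group the entries by foundry. This is a sketch under the assumption that the value is a whitespace-separated list in which everything before the first slash names the foundry; the function names and the output format are illustrative:

```javascript
// Groups "foundry" and "foundry/layer" entries by foundry name.
function groupFoundries(value) {
  const grouped = new Map();
  for (const entry of value.trim().split(/\s+/)) {
    if (entry === "") continue; // empty input yields an empty map
    const slash = entry.indexOf("/");
    const foundry = slash === -1 ? entry : entry.slice(0, slash);
    if (!grouped.has(foundry)) grouped.set(foundry, []);
    if (slash !== -1) grouped.get(foundry).push(entry.slice(slash + 1));
  }
  return grouped;
}

// Renders e.g. "malt (dependency); opennlp (morpho, sentences)".
function renderFoundries(value) {
  return [...groupFoundries(value)]
    .map(([foundry, layers]) =>
      layers.length ? `${foundry} (${layers.join(", ")})` : foundry)
    .join("; ");
}

console.log(renderFoundries("malt malt/dependency opennlp opennlp/morpho opennlp/sentences"));
// malt (dependency); opennlp (morpho, sentences)
```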
Currently there is no option for collocation search (using the Neo4j backend in KorAP); this should be implemented. It doesn't sound too complicated. However, the response formats should be more coherent.
(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and the Trac Ticket #219.)
While Firefox allows clicking through the annotation assistant stepwise (foundry -> layer -> key -> value), WebKit stops on each level: a click only adds the annotation to the search bar, and the next menu stays closed.
Probably as a plugin, it is required to implement an authorization endpoint for Shibboleth SSO.
The implementation will be as follows:
In older versions of Kalamar, token annotations in table views were sortable by foundry and by layer. This should also be possible in newer versions of Kalamar that include tree views.
Reposting Pavel Stranak's message from the blog:
Hi guys,
I would really like to try Kalamar, but I am getting “Forbidden. You don’t have permission to access /kalamar on this server.”
Pavel
@pavel: Thanks for the note. Would you please say how you attempted the connection?
It makes a bit of a difference whether you install with admin privileges or not; in the latter case, you have to follow the installation messages a bit more closely to make sure that all the dependencies are met, so my .bashrc has the following two lines now:
# added at the request of perlbrew
source ~/perl5/perlbrew/etc/bashrc
# added at the request of cpanm
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
I run it by doing Kalamar$ morbo script/kalamar and it's there, on http://localhost:3000/.
Currently, in case a prefix search is used to filter entries in the annotation assistant, the browser won't clean this information when the annotation helper is reopened at another position. This bug was probably introduced by the hint test suite fix in #46 .
The metadata field names in the metadata view and the VC builder are numerous and hard for users to understand, for example „pubDate“ or „corpusEditor“.
There could be different solutions for that:
a mapping to more intelligible names
a link to a cheat table in the metadata view and the VC builder, which displays the metadata field names in a more understandable way
descriptions to all metadata fields, that would be listed in the VC builder (see also descriptions in the annotation helper) and could be used for mouse overs in the metadata view
User settings: The user can define the metadata fields they are interested in for metadata fetching. To see all metadata fields, they click on a „more“ button. They can also choose the order of the metadata fields.
More ideas are welcome ...
There are several points to consider when discussing the mapping/cheat table solution versus the descriptions of all metadata fields:
For all solutions, localization has to be kept in mind.
KorAP should be user-friendly and easy to use. The metadata field names and their abundance confuse some users. KorAP should not only be designed for expert users, but serve the needs of different kinds of users. Using mouse-overs in the metadata view is laborious.
On the other hand, descriptions can be as elaborate as necessary. Users relying on the cheat sheet couldn't talk about metadata with other users, as they would be using different names. Specific metadata names may not only be used by annotators, but also by other corpus analysis tools, and these names are used project-wide. KorAP also has users who are organized in groups that compile their own corpora. Displaying the internal field names is therefore a straightforward solution.
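A mapping could accommodate both positions by showing a localized label together with the internal name. The following is only a sketch: the label table, its wording and the function name are hypothetical, and English is assumed as the fallback locale.

```javascript
// Hypothetical label table; the wording of the labels is illustrative only.
const FIELD_LABELS = {
  en: { pubDate: "Publication date", corpusEditor: "Corpus editor" },
  de: { pubDate: "Erscheinungsdatum" }
};

// Returns "Label (internalName)", falling back to English and then to the bare name.
function fieldLabel(field, locale) {
  const localized = FIELD_LABELS[locale] && FIELD_LABELS[locale][field];
  const label = localized || FIELD_LABELS.en[field];
  // Keep the internal name visible so users can still refer to it.
  return label ? `${label} (${field})` : field;
}

console.log(fieldLabel("pubDate", "de"));      // Erscheinungsdatum (pubDate)
console.log(fieldLabel("corpusEditor", "de")); // Corpus editor (corpusEditor)
```

Fields without any label, such as those from user-compiled corpora, simply appear under their internal name.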
Currently both specifications for hint (the annotation assistant) and querycreator (the QueryByMatch assistant) fail in Chrome.
Display of statistical information of virtual corpora (number of documents, texts, tokens, sentences)
Sometimes queries result in backend failures but are reported as "no matches found" instead of passing the error notification to the user.
Example queries, see KorAP/Krill#38 .
The sigles give a quick hint on the provenance of the query hits and their distribution with respect to source and time. Without them, result lists can be pretty meaningless.
Add `title` attributes to field names in the meta view and probably widen the field name columns a bit.
Some features in Kalamar are rather hidden, like the annotation helper. An introduction tutorial like IntroJS may help new users to understand the basic functionality of KorAP.
The arrow heads of green-highlighted relation arcs (which appear when hovering over relation arcs) are missing. It is sometimes hard to determine the relation direction when there are arrow heads at both the start and the end of a relation arc.
Also reported by Marc Kupietz.
To choose only a year rather than an exact date, the button at the top has to be pressed. It should also be possible to type the year in manually.
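Typed input could be accepted with a small parser like the one below. It is a sketch that assumes the accepted forms are `YYYY`, `YYYY-MM` and `YYYY-MM-DD`; the function name and the returned object shape are hypothetical.

```javascript
// Parses "YYYY", "YYYY-MM" or "YYYY-MM-DD"; returns null for anything else.
function parseTypedDate(input) {
  const m = /^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?$/.exec(input.trim());
  if (!m) return null;
  const year = Number(m[1]);
  const month = m[2] === undefined ? null : Number(m[2]);
  const day = m[3] === undefined ? null : Number(m[3]);
  if (month !== null && (month < 1 || month > 12)) return null;
  if (day !== null) {
    // Day 0 of the following month is the last day of this month.
    const daysInMonth = new Date(Date.UTC(year, month, 0)).getUTCDate();
    if (day < 1 || day > daysInMonth) return null;
  }
  return { year, month, day };
}

console.log(parseTypedDate("1990"));       // { year: 1990, month: null, day: null }
console.log(parseTypedDate("1990-02-30")); // null
```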
Similar to the queryByMatch assistant (or "query by example"), which lets the user create queries by choosing annotations in the annotation view, a corpusByMatch assistant can help the user to modify a virtual corpus by choosing metadata values in the metadata view.
For example, when the match contains the metadata field `corpusSigle` with the value `BZK` and the user clicks on `BZK`, a line below the metadata view appears saying `Limit corpus to: corpusSigle=BZK`. When choosing another field `pubDate` with a value `1990-02-01`, the line is expanded to `Limit corpus to: corpusSigle=BZK & pubDate=1990-02-01`. By clicking on the same value again, the constraint will be removed.
When the user clicks on the line, the chosen constraints will be combined with the virtual corpus using AND, and the window scrolls to the VC builder.
This feature won't be available for stored-only fields. At the moment, it would be possible for string and date/number fields. To work for text and keyword fields, KorAP/Krill#32 needs to be resolved.
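The toggling behaviour described for the corpusByMatch line can be sketched as a small state holder. The class and method names are hypothetical, and the sketch only builds the display string; it does not touch the VC builder.

```javascript
// Collects field/value constraints; clicking the same value twice removes it.
class CorpusByMatch {
  constructor() {
    this.constraints = new Map(); // "field=value" -> { field, value }
  }

  // Adds the constraint, or removes it if it is already present.
  toggle(field, value) {
    const key = `${field}=${value}`;
    if (this.constraints.has(key)) {
      this.constraints.delete(key);
    } else {
      this.constraints.set(key, { field, value });
    }
    return this;
  }

  // Text shown after "Limit corpus to:"; constraints are joined with "&".
  toString() {
    return [...this.constraints.keys()].join(" & ");
  }
}

const cbm = new CorpusByMatch();
cbm.toggle("corpusSigle", "BZK").toggle("pubDate", "1990-02-01");
console.log("Limit corpus to: " + cbm.toString());
// Limit corpus to: corpusSigle=BZK & pubDate=1990-02-01
```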
Virtual corpus size should be shown both for private Virtual corpora in a user page and when specified together with a search query. Other info regarding the VC such as the document titles, dates, etc should be available as well. In Cosmas 2, such info can be exported.
Has there been an open issue regarding corpus browsing already?
It’s not absolutely clear yet, what this will include.
Login, registration, changing the password, email etc. will probably be served by the middleware. Integration in the 2nd frontend may not be necessary.
Now it seems that the middleware will even serve VCs, user-defined queries etc. So the user management for the second frontend needs excellent caching mechanisms to make this work without a database close by. Search properties, however, will be served by the frontend.
(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and the Trac Ticket #216)
The height of the relational view as SVG is often too large, especially when there is no arc at all. It should be limited to the highest arc label in the tree visualisation.
Can be trivial - but can be cleverly complicated (e.g. for Query construct statistics).
(This was copied from the GDoc "[KorAP] Necessary Tasks for the 2nd Frontend to become potentially production ready" and Trac Ticket #221)
Sometimes, when a match is sentence-expanded and the second sentence starts at the beginning of a paragraph, the paragraph is ranked below the sentence, although its tree depth is closer to the root.
Example text:
<p>
<s>a</s>
<s>b</s>
</p>
<p>
<s>x</s>
<s>y</s>
</p>
In case the match is `bx` and the match is sentence-expanded, the tree will look like this:
/^\
| s
| |
s p
| |
b x
A better solution would be to ignore the paragraph, in case it ends after the match, resulting in
/^\
s s
| |
b x
Grunt is a monolithic module builder, meaning all scripts are bundled in one large file. This is problematic with large dependencies that are not needed from the start, e.g. libraries for tree visualization or statistical visualizations like d3.
WebPack is able to load dependencies asynchronously in chunks (using the already used AMD/commonJS formalism). So in the beginning this should load the page faster and use less memory on the client side.
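The idea of deferring a large dependency until first use can be sketched independently of the bundler. The helper below is hypothetical; it wraps any loader function (for example one that calls a dynamic `import()`) and makes sure the load is started only once.

```javascript
// Wraps a loader so that the dependency is requested on first use only.
function lazy(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      // The executor runs the loader once; a thrown error becomes a rejection.
      pending = new Promise(resolve => resolve(loader()));
    }
    return pending;
  };
}

// Usage sketch (module name as an example; a bundler would split it into a chunk):
//   const loadD3 = lazy(() => import("d3"));
//   loadD3().then(d3 => { /* draw the visualization */ });
const loadAnswer = lazy(() => 42);
loadAnswer().then(value => console.log(value));
```

Note that a failed load stays cached in this sketch; a production version would probably reset `pending` on rejection to allow a retry.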
In the case the query exceeds the input box, the annotation assistant hint (the orange box) won't follow the cursor position anymore.
Fixing this is probably hard, because the position line for the query is fixed width at the moment.
In combination with the corpusByMatch assistant (#27), and once the grouping mechanism is available in the backend (cf. #26), it's helpful to establish a groupByMatch assistant in the metadata view.
When the user opens the metadata view of a match and creates a VC by using the metadata values, it's beneficial to support the creation of grouping by fields by letting the user click on the metadata keys to create group criteria.
For example, a match has the metadata fields `author` and `corpusSigle`. By clicking on `author`, a line below the metadata view appears saying `Group by: field=author`. Once the user clicks on the `corpusSigle` key, the line is expanded to `Group by: field=author, field=corpusSigle`.
When the user clicks on the line, the user interface switches from `search` to `group`, keeping the query, the VC and the QL as before, but adding the group information by `field=author, field=corpusSigle`. After running the grouping, a list of all matches is grouped by the metadata values of the fields `author` and `corpusSigle`.
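The grouping itself is expected to happen in the backend, but its effect can be illustrated on the client with a few lines. This is a sketch only; the function name, the match objects and the sample values are made up for the example.

```javascript
// Groups match objects by the values of the given metadata fields.
function groupMatches(matches, fields) {
  const groups = new Map();
  for (const match of matches) {
    const values = fields.map(field => match[field]);
    // The serialized value list identifies the group.
    const key = JSON.stringify(values);
    if (!groups.has(key)) groups.set(key, { values, matches: [] });
    groups.get(key).matches.push(match);
  }
  return [...groups.values()];
}

// Made-up sample matches.
const matches = [
  { author: "A", corpusSigle: "BZK", snippet: "one" },
  { author: "B", corpusSigle: "BZK", snippet: "two" },
  { author: "A", corpusSigle: "BZK", snippet: "three" }
];
for (const group of groupMatches(matches, ["author", "corpusSigle"])) {
  console.log(group.values.join(", "), group.matches.length);
}
```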