providedh / collaborative-platform

Collaboration made easy

License: GNU Affero General Public License v3.0

Python 19.91% CSS 0.76% HTML 2.48% JavaScript 64.62% XSLT 0.50% Dockerfile 0.02% Shell 0.05% Jupyter Notebook 10.77% Sass 0.33% SCSS 0.56%

collaborative-platform's Issues

Make setup instructions IDE-agnostic

As of now, the README contains instructions for PyCharm IDE.
These instructions should work the same for users who do not work with PyCharm.
I think the best thing to do is just to use shell commands.

Simplified registration system for the workshops

Prepare a branch with a simplified registration system just for the workshops, as the Austrian party requested.

Changes in annotator needed for Ingredient integration

We have to make some changes in the annotator to incorporate the Recipes use case.

The plan is to use <listObject> elements as lists of ingredients, utensils etc., and to annotate the text of the recipe only with the <objectName> tag. Here is an example:

<body>
            <div type="ingredient">
                <listObject>
                    <object type="ingredient" xml:id="egg">Egg</object>
                    <object type="ingredient" xml:id="sugar">Sugar</object>
                </listObject>
            </div>
            <div type="utensil">
                <listObject>
                    <object type="utensil" xml:id="spoon">Spoon</object>
                </listObject>
            </div>
            <div type="recipe">
                <p>
                    Take three <objectName ref="egg">eggs</objectName>, mix with three <objectName ref="spoon">spoons</objectName> of <objectName ref="sugar">sugar</objectName>. Bon Appetite!
               </p>
            </div>
 </body>

The desired behaviour is that when a user annotates a new ingredient, utensil, etc., the backend automatically adds it to the appropriate list, so annotated recipes always start with proper lists of requisites.
To let users annotate in this manner, we need changes in the annotator.
@Janchorizo, we need changes in the annotator frontend. In the "Add TEI annotations" tab, when Ingredient, Utensil or ProductionMethod is chosen, we want a list to appear with the ingredients, utensils or production methods already available in the lists in the file. We can leave parsing those lists into options to you, or we can provide you with an API endpoint returning them. Another option in the list should be "Add new". If the user chooses it, a text field should appear, allowing the user to enter a name for the new ingredient, utensil, etc.
The request created by the annotator should consist of positions as before, ref set in the attribute_name parameter, and asserted_value containing the id of the object from the list if the user chose an object already available in the file, or the new name entered by the user otherwise.
@bug-rancher is already changing the annotator backend accordingly.
Also, we'd like @Janchorizo to render those lists in a more readable manner, possibly adding some headers and splitting them into separate lines.
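
The backend behaviour described in this issue could look roughly like the sketch below. It uses the standard library's xml.etree.ElementTree rather than the platform's own code; the function name resolve_object_ref and the id scheme for new entries are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_object_ref(body, object_type, asserted_value):
    """Return the xml:id to use in <objectName ref="...">.

    If asserted_value is the id of an object already in the list, reuse it;
    otherwise treat it as a new name and append it to the matching list.
    """
    list_object = body.find(f"./div[@type='{object_type}']/listObject")
    if list_object is None:
        # Assumption: create the list on first use, at the top of the body.
        div = ET.Element("div", {"type": object_type})
        list_object = ET.SubElement(div, "listObject")
        body.insert(0, div)

    existing_ids = {obj.get(XML_ID) for obj in list_object.findall("object")}
    if asserted_value in existing_ids:
        return asserted_value

    # Assumed id scheme "<type>-<n>"; the real backend may use another one.
    number = len(existing_ids) + 1
    while f"{object_type}-{number}" in existing_ids:
        number += 1
    new_id = f"{object_type}-{number}"
    new_object = ET.SubElement(
        list_object, "object", {"type": object_type, XML_ID: new_id}
    )
    new_object.text = asserted_value
    return new_id


body = ET.fromstring(
    '<body><div type="ingredient"><listObject>'
    '<object type="ingredient" xml:id="egg">Egg</object>'
    "</listObject></div></body>"
)
existing_ref = resolve_object_ref(body, "ingredient", "egg")
new_ref = resolve_object_ref(body, "ingredient", "Flour")
```

Here existing_ref reuses the id already present in the file, while new_ref is the id of a freshly appended <object> entry.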

Allow to convert plain text to TEI format

Provide the user with the ability to convert plain-text documents to the TEI format.

Models not being created for api_vis

While migrating to Docker for development, I have had some issues regarding the database.
For some reason migrations were not being applied. When creating the migrations individually, Django raised the following exception for the api_vis model:

  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/loader.py", line 49, in __init__
    self.build_graph()
  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/loader.py", line 275, in build_graph
    self.graph.ensure_not_cyclic()
  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/graph.py", line 274, in ensure_not_cyclic
    raise CircularDependencyError(", ".join("%s.%s" % n for n in cycle))
django.db.migrations.exceptions.CircularDependencyError: projects.0001_initial, api_vis.0001_initial

Seems to me like there is some kind of circular dependency. I would like to know if you have had any similar problems.

Style the documents in the annotator.

At least divide the text into divisions (div elements), headers (head elements) and paragraphs (p elements).

Add visual feedback for the file upload process

Add visual hints for the drag & drop interaction.
Add progress bar for the uploading process.

Create a landing page with a brief explanation on certainty annotation in the platform

Add a descriptive landing page for newcomers to get an understanding of the
available functionality.

Extract pre-annotated TEI files using the food recipes relational database

The relational database already holds the definition of ingredients in one of its tables. This can be used to have ingredients (and other entities) already annotated in the TEI documents.

Ask for the database dump.

Allowing a/b testing in the platform.

Implement a decorator for Django views that enables A/B testing by providing a set of static files to be sorted.
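
A minimal, framework-agnostic sketch of such a decorator follows. The issue does not say how the static files are to be distributed, so this assumes each user is deterministically assigned one file by hashing a user key. The names ab_test, FakeRequest, user_id and the CSS paths are hypothetical; in Django the key would more likely come from request.user or the session.

```python
import functools
import hashlib


def ab_test(variants):
    """Decorator sketch: pick one static file per user and pass it to the view."""
    if not variants:
        raise ValueError("at least one variant is required")

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            # Hash a stable user key so the same user always gets the same variant.
            key = str(getattr(request, "user_id", "anonymous")).encode("utf-8")
            index = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
            return view(request, *args, variant=variants[index], **kwargs)

        return wrapper

    return decorator


class FakeRequest:
    """Stand-in for a Django HttpRequest, used only in this example."""

    def __init__(self, user_id):
        self.user_id = user_id


@ab_test(["css/annotator_a.css", "css/annotator_b.css"])
def annotator_view(request, variant):
    return f"render with {variant}"
```

Calling annotator_view(FakeRequest(1)) repeatedly yields the same variant, which keeps the experience stable for a given user across requests.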

Missing "locus": "value" in request when adding a reference to an element of the list

As the title

User should define names, icons and colours for the TEI entities.

Related to #32

Given that there is already a model for storing the preferences for displaying uncertainty,
we could also provide the ability to define what entities are to be displayed, along with the
colors and icons that should be used.

This could have other implications, such as the fact that newer entities would need to
be stored and indexed.

This issue could cover just the styling, with the indexing done later.

Do continuous integration and deployment

Close reading history endpoint not retrieving annotations

As the title.

Create visualization dashboard app to support corpus analysis

Created and integrated the dashboard framework
Designed and implemented the views

Create variations for the date-related distant-reading visualization

Dependent on #8

Create a pixel-oriented visualization for distant-reading of the corpus

Implement and integrate the proposed pixel-oriented visualization.
Store the processed results and process documents upon upload and update.

Replace the color scheme for uncertainty entities

One of the needed changes related to #14

Add the ability to modify the color scheme for entities.

Add the option to modify the current list of entity colours and icons. This would require making the choices persistent.

Close reading history API call no longer retrieves the contributor.

In branch api-vis.

Calls to the API retrieve file versions with the contributor field empty.

{"url": "https://providedh.ehum.psnc.pl/files/6/version/1/", "version": 1, "ignorance": 0, 
"timestamp": "2019-12-17 15:38:46.520574+00:00", "variation": 0, "contributor": "", 
"credibility": 1, "imprecision": 1, "incompleteness": 2}, 

{"url": "https://providedh.ehum.psnc.pl/files/6/version/2/", "version": 2, "ignorance": 0, 
"timestamp": "2019-12-17 15:38:46.599874+00:00", "variation": 0, "contributor": "", 
"credibility": 1, "imprecision": 1, "incompleteness": 2}, 

{"url": "https://providedh.ehum.psnc.pl/files/6/version/3/", "version": 3, "ignorance": 0,
 "timestamp": "2019-12-17 15:57:55.359992+00:00", "variation": 0, "contributor": "", 
"credibility": 1, "imprecision": 1, "incompleteness": 2}, 

{"url": "https://providedh.ehum.psnc.pl/files/6/version/4/", "version": 4, "ignorance": 0, 
"timestamp": "2019-12-17 16:07:45.151583+00:00", "variation": 0, "contributor": "", 
"credibility": 1, "imprecision": 1, "incompleteness": 2}]}

Missing field in Annotator `create` request

During unification in the Annotator (with Attribute name set to sameAs), there is no attribute_name field in the request sent to the WebSocket.

In Annotator `References` field didn't show up

If the user first writes "sameAs" in the Attribute name field and then changes Tag name to Person, the References field does not show up.

Annotation creation fails for files containing non-valid xml:id values.

collaborative-platform/src/collaborative_platform/apps/close_reading/annotator.py

Lines 584 to 592 in 1e9f8e8

    def __create_certainty_description(self, json, annotation_ids, user_uuid):
        target = " ".join(annotation_ids)
        xml_id = f"certainty_{self.__file.name}-{self.__certainty_xml_id_number}"

        categories = " ".join([get_ana_link(self.__file.project_id, cat) for cat in json["categories"]])
        certainty = f'<certainty ana="{categories}" locus="{json["locus"]}" cert="{json["certainty"]}" ' \
                    f'resp="#{user_uuid}" target="{target}" xml:id="{xml_id}"/>'

        new_element = etree.fromstring(certainty)

collaborative-platform/src/collaborative_platform/apps/close_reading/annotator.py

Lines 605 to 614 in 1e9f8e8

    def __create_certainty_description_for_attribute(self, json, annotation_ids, user_uuid):
        target = " ".join(annotation_ids)
        xml_id = f"certainty_{self.__file.name}-{self.__certainty_xml_id_number}"

        categories = " ".join([get_ana_link(self.__file.project_id, cat) for cat in json["categories"]])
        certainty = f'<certainty ana="{categories}" locus="{json["locus"]}" cert="{json["certainty"]}" ' \
                    f'resp="#{user_uuid}" target="{target}" match="@{json["attribute_name"]}"' \
                    f'assertedValue="{json["asserted_value"]}" xml:id="{xml_id}"/>'

        new_element = etree.fromstring(certainty)

Annotation fails when an XML element has to be created for a file whose name contains
characters that are not valid in xml:id values, such as spaces, semicolons, or others.

This error does not raise an exception that I could see, and no response is sent. However,
subsequent calls fail because the websocket closes after a Timer Exception is raised.
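
One possible mitigation is to sanitize the file name before it is interpolated into the xml:id. The sketch below is an assumption about how this could be done, not the fix adopted in the repository; the helper names to_xml_id_fragment and certainty_xml_id are hypothetical, and the rule is deliberately more conservative (ASCII only) than what XML actually permits.

```python
import re


def to_xml_id_fragment(file_name):
    """Turn an arbitrary file name into a string that is safe inside an xml:id.

    Conservative ASCII-only rule: keep letters, digits, '.', '-' and '_',
    and replace every other character with '_'.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", file_name)
    # An xml:id must not start with a digit, '.' or '-'.
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def certainty_xml_id(file_name, number):
    """Build an id in the same shape as the excerpt: certainty_<file>-<n>."""
    return f"certainty_{to_xml_id_fragment(file_name)}-{number}"
```

With this rule, a file named "my file;v2.xml" would contribute "my_file_v2.xml" to the generated id. Distinct file names can collapse to the same fragment, so uniqueness would still have to be checked elsewhere.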

Wrong message after file save in Annotator

After saving a file, the Annotator should display the message returned in the message field of the response. Currently, even when the response has a 304 code, the Annotator displays the message "Changes successfully saved.".

In Annotator "create" button doesn't work if "certainty list" and "legend" are hidden.

As in title.

IDs shown in annotator are prefixed with "xxxx"

In the annotator, when an annotation starts at the beginning of a line, the icon is placed at the end of the line above.

Incorrect fragment positions in Annotator "create" request

When a user tries to add a second tag to an already tagged fragment, the fragment positions are incorrect.
This bug occurs in Firefox, but not in Chromium.

How to replicate the bug:
On branch "reference_to_element_of_the_list"
Tested on "recipe_0.xml" file: https://raw.githubusercontent.com/providedh/ACDH_Salzburg_recipes/master/outputs/recipe_0.xml

Add first annotation to "Guetten" fragment:

"tag_name": "Ingredient"
"attribute_name": "ref"
"asserted_value": "object_recipe_0_xml-1"

The positions in this request point to the "Guetten" fragment, so this is OK.

Add second annotation to annotated "Guetten" fragment:

"tag_name": "Ingredient"
"attribute_name": "ref"
"asserted_value": "object_recipe_0_xml-2"

The positions in this request point to "<objectName ref="#object_recipe_0_xml-1">Guetten". A selected fragment must not start or end in the middle of a tag.

Allow the category attribute to have none or multiple values

The platform should handle multiple values for the category attribute. This includes
indexing, visual hints and form options in the annotator, and back-end support for
such annotation requests.

Change TEI specification if needed.
Front-end support for the annotator.
Back-end support for the annotator.
Front-end support for the TEI stats app.
Back-end support for the TEI stats app.

Add an algorithm analyzing entities in project for likelihood of being the same

Better color schemes and glyphs for representing uncertainty in annotator

As of now, we are using the scheme of underlining + icons in the annotator to indicate the different kinds/degrees of uncertainty in the text. However, now that we are supporting 5 levels + 2 categories, I've come to think this approach we have been using might be no longer valid.

I wonder if there's a better combination of colors/glyphs/other techniques to convey the 4 (with their respective 5 levels) + 1 categories in the annotator. For example, we could map categories to different text underline styles as illustrated here. For selecting color maps we could fall back to VSUPs: explanation and d3 code (or better, a modification of it).

Annotator integration

Needed changes to finish migration of previous annotation app to the new platform.

Create stream-graph on timeline to show uncertainty evolution.
Apply previous changes to API.
Fix save and history AJAX calls (dependent on previous point).
Add alerts for document load, reload, save, annotation, etc.
Solve branch for automatic merge.

Creating uncertainty annotations creates ingredient tags.

If the asserted value for locus=name or the tag name is either ingredient, utensil,
or productionMethod, then a tag with that name is created. This is not the desired
behavior if we are to use lists of objects and the corresponding objectName tag in
the body.

Update markups and side view in Annotator with unifications from database

Because we now keep all entity unifications, and the certainties added to these unifications, in the database instead of the XML file, we need to render these elements in the Annotator as well. The message sent over WebSockets was extended with one more field, certainties_from_db, containing a list of unifications and certainties. Every unification/certainty has an extra boolean field, committed. Committed unifications and their certainties are sent to all connected users, while uncommitted unifications and their certainties are sent only to their author.

Because there are many approaches to converting XML to JSON, I used this convention: https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html (the same as in the "Retrieve metadata from header of a TEI file" point of the file: https://docs.google.com/document/d/102pYWR1t7Ve2mjuFC5juithyVGAPVqFAPa54qq_yn-M/edit#heading=h.uqg41mc3u89)
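
As a rough illustration of that convention, the sketch below converts a single certainty element into the shape used by certainties_from_db: attributes become keys prefixed with "@" and simple child elements become plain keys. It is not the platform's converter; the function name element_to_dict is an assumption, and repeated children, mixed content and namespaces other than xml: are not handled.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def element_to_dict(element):
    """Convert a flat element into {"tag": {"@attr": value, "child": text}}."""
    content = {}
    for name, value in element.attrib.items():
        # ElementTree expands the xml: prefix to a namespace URI; restore it.
        if name.startswith(XML_NS):
            name = "xml:" + name[len(XML_NS):]
        content["@" + name] = value
    for child in element:
        content[child.tag] = child.text or ""
    return {element.tag: content}


certainty = ET.fromstring(
    '<certainty locus="value" cert="medium" resp="#person2" xml:id="certainty-64">'
    "<desc>awesome description</desc></certainty>"
)
converted = element_to_dict(certainty)
```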

Example content of certainties_from_db field:

"certainties_from_db":
[
  {
    "certainty": 
    {
      "@ana": "", 
      "@locus": "value", 
      "@cert": "high", 
      "@resp": "#person2", 
      "@target": "#person_dep_833105r082_tei_depositions_plus_original-1", 
      "@match": "@sameAs", 
      "@assertedValue": "#person_dep_833105r082_tei_depositions_plus_original-2 #person_dep_833105r082_tei_depositions_plus_original-3 Project_1/dep_834165r133_tei_depositions_plus_original.xml#person_dep_834165r133_tei_depositions_plus_original-1 Project_1/dep_834165r133_tei_depositions_plus_original.xml#person_dep_834165r133_tei_depositions_plus_original-2 Project_1/dep_834165r133_tei_depositions_plus_original.xml#person_dep_834165r133_tei_depositions_plus_original-3", 
      "@xml:id": "certainty_dep_833105r082_tei_depositions_plus_original.xml-1"
    }, 
    "committed": true
  }, 
  {
    "certainty": 
    {
      "@ana": "https://providedh.ehum.psnc.pl/api/projects/1/taxonomy/#ignorance", 
      "@locus": "value", 
      "@cert": "medium", 
      "@resp": "#person2", 
      "@target": "#certainty_dep_833105r082_tei_depositions_plus_original.xml-1", 
      "@match": "@sameAs", 
      "@xml:id": "certainty_dep_833105r082_tei_depositions_plus_original.xml-64", 
      "@assertedValue": "some asserted value", 
      "desc": "awesome description"
    }, 
    "committed": true
  }, 
  ...
]
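
The delivery rule for committed and uncommitted entries could be expressed as in the sketch below. It assumes the author of an entry can be identified by its "@resp" attribute, which the issue does not state explicitly; the function name certainties_for_user and the shortened ids are hypothetical.

```python
def certainties_for_user(entries, user_ref):
    """Return the entries a given user should receive.

    Committed entries go to everyone; uncommitted ones only to their author,
    identified here by the "@resp" attribute (an assumption of this sketch).
    """
    return [
        entry
        for entry in entries
        if entry["committed"] or entry["certainty"].get("@resp") == user_ref
    ]


entries = [
    {"certainty": {"@resp": "#person2", "@xml:id": "c-1"}, "committed": True},
    {"certainty": {"@resp": "#person2", "@xml:id": "c-2"}, "committed": False},
    {"certainty": {"@resp": "#person1", "@xml:id": "c-3"}, "committed": False},
]
visible_to_person1 = [
    entry["certainty"]["@xml:id"]
    for entry in certainties_for_user(entries, "#person1")
]
```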

Target in new xml:id-based annotations doesn't start with #.

The following excerpt is the result of creating a text selection annotation, and then a second one
added by id using the sidebar.

<classCode scheme="http://providedh.eu/uncertainty/ns/1.0">
          <certainty 
                  ana="https://example.com/api/projects/1/taxonomy/#ignorance https://example.com/api/projects/1/taxonomy/#credibility https://example.com/api/projects/1/taxonomy/#imprecision https://example.com/api/projects/1/taxonomy/#incompleteness" 
                  locus="value" 
                  cert="unknown"                   
                  resp="#person1" 
                  target="#date_dep_821026r012_tei__1__xml-1" 
                  xml:id="certainty_dep_821026r012_tei__1__xml-2" 
                  assertedValue="asd"/>
          <certainty 
                  ana="https://example.com/api/projects/1/taxonomy/#credibility https://example.com/api/projects/1/taxonomy/#imprecision" 
                  locus="name" 
                  cert="very high" 
                  resp="#person1" 
                  target="date_dep_821026r012_tei__1__xml-1" 
                  xml:id="certainty_dep_821026r012_tei__1__xml-4"/>
</classCode>
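
A small normalization step would make both ways of creating an annotation produce the same target format. The following is a sketch only; the function name normalize_target is hypothetical, and it assumes that any pointer already containing "#" (including cross-file references such as file.xml#id) should be left untouched.

```python
def normalize_target(target):
    """Prefix bare ids with '#', leaving existing pointers untouched."""
    pointers = []
    for pointer in target.split():
        # "#id" and "path/file.xml#id" already contain a fragment marker.
        pointers.append(pointer if "#" in pointer else "#" + pointer)
    return " ".join(pointers)
```

Applied to the second annotation in the excerpt, this would turn date_dep_821026r012_tei__1__xml-1 into #date_dep_821026r012_tei__1__xml-1.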

Support 5 different levels of uncertainty in user/machine annotations

Decide on the way to approach and implement the necessary changes to support
specifying a floating-point number for the certainty level of an annotation. This includes
changing the used attribute (possibly to deg), changing the annotator to process
each specific tag and assign styles based on the numerical value, and adding
back-end support for annotations made using this new approach.
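
One way the annotator could assign styles from a numerical value is to bucket it into five levels, as sketched below. The level labels, the 0 to 1 range and the equal-width buckets are assumptions for illustration; the issue leaves these decisions open.

```python
# Assumed labels for the five levels, from least to most certain.
LEVELS = ["very low", "low", "medium", "high", "very high"]


def level_for_degree(deg):
    """Map a certainty degree in [0, 1] to one of five equal-width levels."""
    if not 0.0 <= deg <= 1.0:
        raise ValueError("deg must be between 0 and 1")
    # The top edge (deg == 1.0) falls into the last bucket.
    index = min(int(deg * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]
```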

Fix self-closing tags in the annotator

Integrate and optimize the TEI statistics view

Make the stats processing persistent to reduce response times.
Process the data-sets upon upload.
Make documents be processed upon updates.

Create variants for the pixel-oriented visualization

Dependent on #9

Breadcrumb navigation doesn't work on high resolutions

When the browser window is big enough (I haven't checked exactly, but I use a 4K screen), the breadcrumb navigation is no longer clickable, and double-clicking on "file" results in the first word of the recipe being selected.

Plan supporting persistent filters in the collaborative

Design and discuss how persistent filters can be integrated in the platform.
The design should take into account that a common use case to be supported is:

A user is asking for an entity search response but had previously filtered out some
documents in another app using the locations occurring in the text.

User should define names, descriptions and colours of uncertainty categories

When a project is created, the user should be able to change the names, descriptions and colors of user-recognized uncertainties. The default values for names and descriptions are in the following template TEI file with the default taxonomy:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">	
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Uncertainty Taxonomy for project @ProjectName</title>
      </titleStmt>
    </fileDesc>
    <encodingDesc>
      <classDecl>
        <taxonomy>
          <category>
            <catDesc>User recognized uncertainty</catDesc>
            <category xml:id="ignorance">
              <catDesc>Ignorance</catDesc>
              <desc>Ignorance is related to the fact that information could have been incorrectly assessed by the person gathering or organizing the data. It is also possible that people, not fully sure about how to deal with data, ignore some information and generate uncertainty during the evaluation and decision processes.</desc>
            </category>
            <category xml:id="credibility">
              <catDesc>Credibility</catDesc>
              <desc>Credibility concerns the weight that an agent can attach to its judgment. This concept can be linked to that of biased opinions, which are related to personal visions of the landscape, which can make for significant variations between different groups and individuals, given their backgrounds.</desc>
            </category>
            <category xml:id="imprecision">
              <catDesc>Imprecision</catDesc>
              <desc>Imprecision corresponds to the inability to express the true value because the absence of experimental values does not allow the definition of a probability distribution or because it is difficult to obtain the exact value of a measure.</desc>
            </category>
            <category xml:id="incompleteness">
              <catDesc>Incompleteness</catDesc>
              <desc>Incompleteness corresponds to the fact that not all situations are covered. Often it is impossible to know every possible option available.</desc>
            </category>
          </category>
          <category>
            <catDesc>Machine generated uncertainty</catDesc>
            <category xml:id="algorithmic">
              <catDesc>Algorithmic</catDesc>
            </category>
          </category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
</TEI>

When the user defines these values for a project, the colors should be stored in the database, and the names and descriptions in the generated TEI file with the taxonomy for this project. Identifiers for categories should be generated from names by lowercasing them and replacing whitespace with a dash.
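
The identifier rule can be sketched in a few lines. The function name category_id is hypothetical, and collapsing runs of whitespace into a single dash (as well as trimming the ends) is an assumption beyond what the rule states.

```python
import re


def category_id(name):
    """Derive a category identifier: lowercase, whitespace replaced by a dash."""
    return re.sub(r"\s+", "-", name.strip().lower())
```

For the default taxonomy this reproduces the existing identifiers, for example "Ignorance" becomes "ignorance". Names containing characters that are not valid in an xml:id would need additional handling.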

After these values are defined, only colors should be changeable in the Project Settings View.

The generated TEI file should be exposed at some URL within the scope of the project so that the ana attribute can refer to the nodes of this XML. This is connected with task #11.

There is a need for an API that returns the names and colors of all uncertainty categories, for instance so that the annotator can handle them properly.

Create distant-reading visualization for date-related content

Implement the presented distant-reading vis for date entities.

Provide annotation statistics/summaries for result sets.

Design and create the entity normalization prototype

Design the entity normalization user interface
Design the entity normalization process
Implement entity normalization
Integrate entity normalization in the collaborative platform

Add the ability to toggle certain categories in the annotator

Add the ability to toggle the visibility for specific certainty levels and categories, and
specific entity types.

Add the xml:id attribute to certainty annotations.

Add back-end code for adding ids to certainty tags. This is needed for
creating annotations based on previously made ones.

Add navbar to the annotator

Ease backward navigating in the annotator with a nav-bar and possibly breadcrumbs.

Remove name from the list of entities managed in the platform

To be done at all levels (including annotator, indexing, etc).

Support “machine-generated” uncertainty category

Decide on the way to approach and implement the changes needed to
support a new machine-generated category. This would include handling this
new value in the annotator.

A possible approach would be to rely on the resp attribute to associate
the annotation with a specific algorithm. This would allow specifying further information regarding the algorithm. This approach would probably require the category
attribute to be optional.
