vu-rdm-tech / adminyoda

A simple implementation of a "shadow database" to store administrative information and generate usage reports.

adminyoda's Introduction

Yoda administration database

A simple implementation of a "shadow database" to store administrative information and generate usage reports using Django.

The database combines automatically gathered Yoda statistics with manually entered administrative information (owner and budget codes).

Gathering Yoda statistics

The irods_tasks.py script at https://github.com/vu-rdm-tech/yoda_report should be run as a cron job. It outputs weekly statistics in JSON format.

Sample:

{
    "collections": { 
    // statistics of all Yoda collections (research-, vault-, dataset collections in a Vault)
        "research-staff-surfsram": {
            "size": 42428781421,
            "count": 122,
            "newest": "2023-08-03T12:31:47"
        },
        "research-staff-ubvu-geoplaza": {
            "size": 286556835009,
            "count": 8638,
            "newest": "2024-07-30T14:30:14"
        },
        "research-ub-test-environment": {
            "size": 23828739385,
            "count": 4225,
            "newest": "2024-07-31T14:24:32"
        },
        "vault-staff-surfsram": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00",
            "datasets": {}
        },
        "vault-staff-ubvu-geoplaza": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00",
            "datasets": {}
        },
        "vault-ub-test-environment": {
            "size": 23843991440,
            "count": 4262,
            "newest": "2024-07-04T13:25:27",
            "datasets": {
                "dataset1[1712235541]": {
                    "size": 1690087,
                    "count": 3,
                    "original_size": 1687300,
                    "original_count": 2,
                    "create_date": "2024-04-04T14:59:01",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "dataset2[1712235552]": {
                    "size": 4426033,
                    "count": 4,
                    "original_size": 4402753,
                    "original_count": 2,
                    "create_date": "2024-04-04T14:59:12",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Public",
                    "data_access_rights": ""
                },
                "DataSet3[1677085185]": {
                    "size": 25406,
                    "count": 5,
                    "original_size": 1626,
                    "original_count": 1,
                    "create_date": "2023-02-22T17:59:45",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1687852556]": {
                    "size": 2281310,
                    "count": 6,
                    "original_size": 2259278,
                    "original_count": 3,
                    "create_date": "2023-06-27T09:55:56",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1688559726]": {
                    "size": 2281335,
                    "count": 6,
                    "original_size": 2259268,
                    "original_count": 3,
                    "create_date": "2023-07-05T14:22:06",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1716902500]": {
                    "size": 2281589,
                    "count": 6,
                    "original_size": 2261084,
                    "original_count": 4,
                    "create_date": "2024-05-28T15:21:40",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1716903090]": {
                    "size": 2264757,
                    "count": 6,
                    "original_size": 2261084,
                    "original_count": 4,
                    "create_date": "2024-05-28T15:31:30",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "research-ub-test-environment[1720085722]": {
                    "size": 23828740923,
                    "count": 4226,
                    "original_size": 23828739184,
                    "original_count": 4225,
                    "create_date": "2024-07-04T11:35:22",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                }
            }
        }
    },
    "groups": {
        // All Yoda groups with (read-only) members, parent category and classification
        "datamanager-staff": {
            "category": "staff",
            "data_classification": "NA",
            "members": [
                "******@vu.nl",
                "******@gmail.com",
                "******@vu.nl"
            ],
            "read_members": []
        },
        "datamanager-ub-test": {
            "category": "ub-test",
            "data_classification": "NA",
            "members": [
                "******@vu.nl",
                "******@vu.nl"
            ],
            "read_members": []
        },
        "research-staff-surfsram": {
            "category": "staff",
            "data_classification": "basic",
            "members": [
                "******@vu.nl",
                "******@surf.nl",
                "******@tue.nl"
            ],
            "read_members": []
        },
        "research-staff-ubvu-geoplaza": {
            "category": "staff",
            "data_classification": "public",
            "members": [
                "******@vu.nl",
                "******@gmail.com"
            ],
            "read_members": []
        },
        "research-ub-test-environment": {
            "category": "ub-test",
            "data_classification": "basic",
            "members": [
                "******@vu.nl",
                "******@vu.nl",
                "******@vu.nl"
            ],
            "read_members": []
        }
    },
    "revision_collections": {
    // Collections containing revisions /<zone>/yoda/revisions
        "research-staff-surfsram": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00"
        },
        "research-staff-ubvu-geoplaza": {
            "size": 136927715048,
            "count": 6412,
            "newest": "2024-07-30T14:16:18"
        },
        "research-ub-test-environment": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00"
        }
    },
    "misc": {
    // Miscellaneous stats: total sizes and user count
        "size_total": 23369202815701,
        "internal_public_users_total": 410,
        "external_public_users_total": 100,
        "public_users_total": 510,
        "revision_size": 2483090983883,
        "trash_size": 1786259874460,
        "internal_users_total": 352,
        "external_users_total": 145,
        "users_total": 497
    },
    "collected": "20240801"
}

In the Django admin (https://adminyoda.labs.vu.nl/admin/django_q/schedule/), create a "process irods stats" job that runs projects.tasks.process_irods_stats. The task looks for data files in the DATASRC folder, processes them and moves them to DATASRC/archived when finished. If the job runs hourly, it does not matter when a new data file is created.

Make sure to set the correct data folder DATASRC in .env.
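
The scan, process and archive cycle described above can be sketched as follows. This is a minimal illustration, not the actual projects.tasks.process_irods_stats: the function name process_datafiles and the handle callback are hypothetical, and it assumes the data files are plain JSON with a .json extension (without the explanatory // comments shown in the sample).

```python
import json
import shutil
from pathlib import Path
from typing import Callable


def process_datafiles(datasrc: Path, handle: Callable[[dict], None]) -> list[str]:
    """Process every JSON file in datasrc, then move it to datasrc/archived.

    Hypothetical helper: `handle` stands in for whatever stores the stats.
    Returns the names of the files that were processed.
    """
    archive = datasrc / "archived"
    archive.mkdir(exist_ok=True)
    processed = []
    # The glob is not recursive, so files already in archived/ are skipped.
    for datafile in sorted(datasrc.glob("*.json")):
        with datafile.open() as fh:
            stats = json.load(fh)
        handle(stats)
        # Only archive once the handler has finished without raising.
        shutil.move(str(datafile), str(archive / datafile.name))
        processed.append(datafile.name)
    return processed
```

Because processed files leave the folder, running such a function repeatedly is harmless: a second run simply finds nothing to do.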

Dealing with deleted collections

This process cannot detect when a collection is deleted; the collection will simply be missing from the data file. To mark collections as deleted, schedule projects.tasks.cleanup. It checks whether the collections were updated in the latest stats and marks them as deleted if they were not. Projects with no associated folders are also marked as deleted.

Note that the data is not deleted from the database, because we want to keep all historical data.

Adding administrative data

Editing a project

Using the buttons you can also open the forms to edit or add Persons, Departments and Budget codes.

Note that Research folders, Vault folders and Vault datasets cannot be edited via the admin interface because these tables are filled automatically.

Deleting a project

Since we want to keep historical data, you cannot delete a project record. Instead, set the Delete date to today. Only do this when the project has no active Research Folders attached!

Adding a research folder/group to a project

You cannot do this in the project form. Instead, go to Research folders and use the dropdown list to select the project this Research folder should be added to. You can use the + button to open the form for adding a new project.

Automatically creating Projects for new Groups

Most of the statistics are Project-based (because a research project could use more than one Yoda Group). For this reason the system expects new Projects to be entered manually, the associated "research folders" can then be added to the project.

Since manual administration takes time and might be delayed, you can also use Django Q to schedule projects.tasks.create_projects. This will automatically create a Project for each orphan Research Folder based on the group/folder name.

Group names are usually formatted as research-<faculty>-<department>-<projectname>. A new project will be created with Name projectname, department and faculty. Owner and Cost Center are set to dummy entries; the real values can be added in the admin interface later.
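
As an illustration of that naming convention, the sketch below splits a group name into its parts. It is not the code used by projects.tasks.create_projects; parse_group_name and GroupParts are hypothetical names, and the sketch assumes that anything after the third hyphen belongs to the project name.

```python
from typing import NamedTuple, Optional


class GroupParts(NamedTuple):
    faculty: str
    department: str
    projectname: str


def parse_group_name(group: str) -> Optional[GroupParts]:
    """Split research-<faculty>-<department>-<projectname> into its parts.

    Returns None when the name does not follow that pattern.
    """
    # At most 4 pieces, so the project name itself may contain hyphens.
    parts = group.split("-", 3)
    if len(parts) != 4 or parts[0] != "research" or not all(parts):
        return None
    return GroupParts(*parts[1:])
```

With this rule research-staff-ubvu-geoplaza yields faculty staff, department ubvu and project name geoplaza. Names that deviate from the convention are ambiguous: research-ub-test-environment from the sample would be split into ub, test and environment, although its Yoda category is ub-test.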

projects.tasks.create_projects does not automatically add departments and faculties. If they cannot be found, the research folder stays unconnected to a project. Add the Department to the database manually and the create_projects job will create the project the next time it runs.

Note that with this process N projects will be created even if all N Yoda groups/folders belong to the same research project. This can easily be rectified by adding all the folders to a single project in the database and setting the extra projects to Deleted.

Customizing the Projects forms and lists

These use the standard Django admin forms and can easily be edited via projects/admin.py. Consult the Django documentation for details.

adminyoda's Issues

Do we need sizes of datasets in reports?

Dataset size is collected from iRODS but currently not stored in the database; only the total vault size is stored.

It's potentially interesting information, but not necessary for cost accounting.

Keeping the history is not needed (only changes would be license or metadata), so I could just add a size column to VaultDataset

Project stats size charts should be monthly

Display project stats from creation date

Need to work with the creation date.

Vault size and delta chart error in project 7

https://adminyoda.labs.vu.nl/projects/7

Possibly because of a failed submission. Fix and make sure errors are caught in the future.

Don't show deleted projects

In project counts
On the project list

Add datasets to admin panel

To have an overview of recently submitted packages include created date and status

Only set deleted date once

Update filter here, or the system keeps updating the deleted date. Also note #17

def cleanup():
    days = 2
    last_update = MiscStats.objects.order_by('collected').last().collected
    cutoff = make_aware(datetime.combine(last_update, datetime.min.time())) - timedelta(days=days)
    logger.info(f'Mark folders and datasets last updated before {cutoff} as deleted.')
    ResearchFolder.objects.filter(updated__lte=cutoff).update(deleted=now())
    VaultFolder.objects.filter(updated__lte=cutoff).update(deleted=now())
    VaultDataset.objects.filter(updated__lte=cutoff).update(deleted=now())
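
One possible way to stop the deleted date from being overwritten is to restrict each update to records that have no deleted date yet. In the Django ORM that would presumably mean adding a condition such as deleted__isnull=True to the three filters, assuming the deleted field is nullable (the snippet above does not confirm this). The plain-Python sketch below models the same rule on dictionaries; mark_deleted_once and the record layout are hypothetical.

```python
from datetime import datetime


def mark_deleted_once(records: list[dict], cutoff: datetime, now: datetime) -> int:
    """Set 'deleted' on stale records that are not yet marked as deleted.

    Each record is a dict with 'updated' (datetime) and 'deleted'
    (datetime or None). Returns the number of records changed.
    """
    changed = 0
    for record in records:
        is_stale = record["updated"] <= cutoff
        # Skip records that already carry a deleted date, so it is set once.
        if is_stale and record["deleted"] is None:
            record["deleted"] = now
            changed += 1
    return changed
```

Running the function a second time with a later now leaves the earlier deleted dates untouched.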

Email reports

Generate mail monthly:

"Please notify us if this information is no longer correct.":

Project owner
Cost Center
Budget owner
Department

simple table like on https://adminyoda.labs.vu.nl/projects/6 :

Current (at reference date) space usage on research
Current space usage on datasets in Vault
Datasets and status
link to the adminyoda page for graphs
Estimated cost

Register data classification

imeta ls -u research-staff-ubvu-geoplaza

...
----
attribute: data_classification
value: public
units:
----
...
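
If the classification were read from command output like the above, a small parser could pick out the attribute/value pairs. This is only a sketch under that assumption: parse_imeta is a hypothetical helper, it relies solely on the "attribute:" and "value:" lines shown here, and the harvest may well read the metadata through an iRODS client library instead.

```python
def parse_imeta(output: str) -> dict[str, str]:
    """Collect attribute/value pairs from `imeta ls`-style output."""
    metadata = {}
    attribute = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as '----' or '...'
        key, value = key.strip(), value.strip()
        if key == "attribute":
            attribute = value
        elif key == "value" and attribute is not None:
            metadata[attribute] = value
            attribute = None
    return metadata
```

If an attribute occurs more than once, the last value wins in this sketch.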

Mark projects with deleted research groups as deleted

Add to cleanup task.

Vault and delta stats not shown for project 7

https://adminyoda.labs.vu.nl/projects/7

ERROR 2024-01-29 12:53:35,793 django.request log_response | Internal Server Error: /projects/project_delta_chart_json/7
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/projects/views.py", line 414, in project_delta_chart_json
    vault.append(round(vault_stats[label]['delta'] / div, 2))
                       ~~~~~~~~~~~^^^^^^^
KeyError: '2023-10'

I think this might be caused by a botched Vault submission.
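
Whatever the root cause, the chart view could be made tolerant of months without vault stats. The sketch below assumes, based only on the traceback, that vault_stats is a dict keyed by month label whose values hold a 'delta' entry; delta_series is a hypothetical helper and filling gaps with 0 is a judgment call.

```python
def delta_series(labels: list[str], stats: dict[str, dict], div: float) -> list[float]:
    """Return one scaled delta per label, using 0.0 for months without stats."""
    series = []
    for label in labels:
        # .get() avoids the KeyError when a month is missing from the stats.
        delta = stats.get(label, {}).get("delta", 0)
        series.append(round(delta / div, 2))
    return series
```

Logging the missing labels as well would help catch failed submissions in the future.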

Create filters for project list

Make it easier to find a particular project.

This should help:
https://django-tables2.readthedocs.io/en/latest/pages/filtering.html

Sort budget codes

Possible Vaults under datamanager- folder

While trying to remove a group with an existing Vault it looks like the vault folder and datasets are moved to the datamanager folder??? Or this happens when updating metadata???
/tempZone/home/datamanager-peter-test/vault-accept172/mapje[1627374525]

Yearly cost report

Generate Excel document? Dimitri
Email
Always on December 1st

Add groups per faculty diagram

So we have a usable graph even if projects have not been entered yet

Check for Vault folders under Datamanager

irods_tasks errors on orphaned vault

Group was created and deleted within a few days and the vault folder was not deleted (should it have been?)

When the harvest runs, only the vault folder is present in the data. Normally a VaultFolder record is always created together with the ResearchFolder record, but in this case there is no Research Group.

This seems to be a failure state, normally when a group is deleted the Vault folder is also deleted if it's empty.

Let's just skip the vault folder and log the error for now.

Add pagination to project list

https://docs.djangoproject.com/en/4.1/topics/pagination/

Read dataset retention time

Info is in the json metadata file in the vault, but I think it's also in the irods metadata of the vault folder, that's probably easier to access.

Add "Data Classification" group metadata

Will make it possible to use it in report and to find errors.

Create a list of projects with completely archived data

For cleanup reasons.

Safest selection is:
Groups:
Has 1 dataset
And Vault size == Research Size
And Vault file count (in folder original) == Research file count
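
The selection rule above could be expressed as a simple predicate. The GroupSummary class and its field names are hypothetical; they only stand in for whatever per-group figures the database provides.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Hypothetical per-group summary; field names are illustrative only."""
    dataset_count: int
    research_size: int
    research_count: int
    vault_size: int
    vault_original_count: int  # file count in the dataset's 'original' folder


def is_completely_archived(group: GroupSummary) -> bool:
    """Apply the three conditions: one dataset, equal size, equal file count."""
    return (
        group.dataset_count == 1
        and group.vault_size == group.research_size
        and group.vault_original_count == group.research_count
    )
```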

Store information on "Data package access" value of Vault datasets

Store in database.
Add a diagram showing the number of open/closed/restricted datasets.
Add number of open/closed/restricted datasets to statistics report.

Add archived and published datasets graph to main page

Revisions should be added to the total project size

Yoda statistics page also does this.

Retrieve from irods, done: 9d80cc6
Add revision_size column in ResearchStats
Show on project page. (add to total or show stacked?)

{"labels": ["Q1-2022", "Q2-2021", "Q3-2021", "Q4-2021"], "datasets": [{"label": "Research", "backgroundColor": "rgba(253,192,134, 0.4)", "borderColor": "rgba(253,192,134)", "borderWidth": 1, "data": [38.94, 38.94, 38.94, 38.94]}, {"label": "Vault", "backgroundColor": "rgba(127,201,127, 0.4)", "borderColor": "rgba(127,201,127)", "borderWidth": 1, "data": [0.0, 0.0, 0.0, 0.0]}]}

Create new groups report for data managers

A data steward mentioned it would be nice to get a heads-up when a new project is created.

Sending them a full report with all the groups might be overkill, but maybe a monthly report showing which projects were added to their categories in the past month is a good idea.

Deleted folders are not set to size 0
