adminyoda's Introduction

Yoda administration database

A simple implementation of a "shadow database" to store administrative information and generate usage reports using Django.

The database combines automatically gathered Yoda statistics with manually entered administrative information (owner and budget codes).

Gathering Yoda statistics

Run irods_tasks.py from https://github.com/vu-rdm-tech/yoda_report as a cron job; it outputs weekly statistics in JSON format.

Sample:

{
    "collections": { 
    // statistics of all Yoda collections (research-, vault-, dataset collections in a Vault)
        "research-staff-surfsram": {
            "size": 42428781421,
            "count": 122,
            "newest": "2023-08-03T12:31:47"
        },
        "research-staff-ubvu-geoplaza": {
            "size": 286556835009,
            "count": 8638,
            "newest": "2024-07-30T14:30:14"
        },
        "research-ub-test-environment": {
            "size": 23828739385,
            "count": 4225,
            "newest": "2024-07-31T14:24:32"
        },
        "vault-staff-surfsram": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00",
            "datasets": {}
        },
        "vault-staff-ubvu-geoplaza": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00",
            "datasets": {}
        },
        "vault-ub-test-environment": {
            "size": 23843991440,
            "count": 4262,
            "newest": "2024-07-04T13:25:27",
            "datasets": {
                "dataset1[1712235541]": {
                    "size": 1690087,
                    "count": 3,
                    "original_size": 1687300,
                    "original_count": 2,
                    "create_date": "2024-04-04T14:59:01",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "dataset2[1712235552]": {
                    "size": 4426033,
                    "count": 4,
                    "original_size": 4402753,
                    "original_count": 2,
                    "create_date": "2024-04-04T14:59:12",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Public",
                    "data_access_rights": ""
                },
                "DataSet3[1677085185]": {
                    "size": 25406,
                    "count": 5,
                    "original_size": 1626,
                    "original_count": 1,
                    "create_date": "2023-02-22T17:59:45",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1687852556]": {
                    "size": 2281310,
                    "count": 6,
                    "original_size": 2259278,
                    "original_count": 3,
                    "create_date": "2023-06-27T09:55:56",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1688559726]": {
                    "size": 2281335,
                    "count": 6,
                    "original_size": 2259268,
                    "original_count": 3,
                    "create_date": "2023-07-05T14:22:06",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1716902500]": {
                    "size": 2281589,
                    "count": 6,
                    "original_size": 2261084,
                    "original_count": 4,
                    "create_date": "2024-05-28T15:21:40",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "DataSet3[1716903090]": {
                    "size": 2264757,
                    "count": 6,
                    "original_size": 2261084,
                    "original_count": 4,
                    "create_date": "2024-05-28T15:31:30",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                },
                "research-ub-test-environment[1720085722]": {
                    "size": 23828740923,
                    "count": 4226,
                    "original_size": 23828739184,
                    "original_count": 4225,
                    "create_date": "2024-07-04T11:35:22",
                    "status": "UNPUBLISHED",
                    "retention_period": "10",
                    "data_classification": "Sensitive",
                    "data_access_rights": ""
                }
            }
        }
    },
    "groups": {
        // All Yoda groups with (read-only) members, parent category and classification
        "datamanager-staff": {
            "category": "staff",
            "data_classification": "NA",
            "members": [
                "******@vu.nl",
                "******@gmail.com",
                "******@vu.nl"
            ],
            "read_members": []
        },
        "datamanager-ub-test": {
            "category": "ub-test",
            "data_classification": "NA",
            "members": [
                "******@vu.nl",
                "******@vu.nl"
            ],
            "read_members": []
        },
        "research-staff-surfsram": {
            "category": "staff",
            "data_classification": "basic",
            "members": [
                "******@vu.nl",
                "******@surf.nl",
                "******@tue.nl"
            ],
            "read_members": []
        },
        "research-staff-ubvu-geoplaza": {
            "category": "staff",
            "data_classification": "public",
            "members": [
                "******@vu.nl",
                "******@gmail.com"
            ],
            "read_members": []
        },
        "research-ub-test-environment": {
            "category": "ub-test",
            "data_classification": "basic",
            "members": [
                "******@vu.nl",
                "******@vu.nl",
                "******@vu.nl"
            ],
            "read_members": []
        }
    },
    "revision_collections": {
    // Collections containing revisions /<zone>/yoda/revisions
        "research-staff-surfsram": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00"
        },
        "research-staff-ubvu-geoplaza": {
            "size": 136927715048,
            "count": 6412,
            "newest": "2024-07-30T14:16:18"
        },
        "research-ub-test-environment": {
            "size": 0,
            "count": 0,
            "newest": "1970-01-01T00:00:00"
        }
    },
    "misc": {
    // Miscellaneous stats: total sizes and user count
        "size_total": 23369202815701,
        "internal_public_users_total": 410,
        "external_public_users_total": 100,
        "public_users_total": 510,
        "revision_size": 2483090983883,
        "trash_size": 1786259874460,
        "internal_users_total": 352,
        "external_users_total": 145,
        "users_total": 497
    },
    "collected": "20240801"
}
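As an illustration of how a file like the sample above can be consumed, here is a minimal sketch that totals size and object count per collection type. It assumes the real file is plain JSON (without the // annotations shown above) and that the collection type is the prefix before the first hyphen; load_stats and summarize_collections are hypothetical names, not functions from this repository.

```python
import json
from collections import defaultdict


def load_stats(path: str) -> dict:
    """Read one weekly stats file (assumed plain JSON, no // annotations)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_collections(stats: dict) -> dict:
    """Total size and object count per collection type.

    Sizes are summed as reported (presumably bytes).
    """
    totals = defaultdict(lambda: {"size": 0, "count": 0})
    for name, info in stats.get("collections", {}).items():
        # The prefix before the first '-' is the type, e.g. 'research' or 'vault'
        kind = name.split("-", 1)[0]
        totals[kind]["size"] += info["size"]
        totals[kind]["count"] += info["count"]
    return dict(totals)
```

For example, summarize_collections(load_stats("stats.json")) would return a dict keyed by "research" and "vault".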

In the Django admin (https://adminyoda.labs.vu.nl/admin/django_q/schedule/), create a "process irods stats" job that runs projects.tasks.process_irods_stats. This job looks for data files in the folder DATASRC, processes them and moves them to DATASRC/archived when finished. If the job runs hourly, it does not matter when a new data file is created.

Make sure to set the correct data folder DATASRC in .env.
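The pick-up-and-archive behaviour described above can be sketched as follows. This is not the actual process_irods_stats code: the *.json file pattern, the name-sorted processing order and the handle_stats callback (standing in for the code that stores one week of statistics) are all assumptions.

```python
import json
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def process_pending_files(datasrc: str, handle_stats) -> list:
    """Process every *.json file in datasrc, then move it to datasrc/archived.

    handle_stats is a hypothetical callback that stores one stats dict.
    """
    src = Path(datasrc)
    archive = src / "archived"
    archive.mkdir(exist_ok=True)
    processed = []
    # Name order equals date order only if file names contain the date (assumption)
    for path in sorted(src.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            stats = json.load(f)
        handle_stats(stats)
        # Archive only after successful processing: if handle_stats raises,
        # the file stays in place and is picked up again on the next run
        shutil.move(str(path), str(archive / path.name))
        processed.append(path.name)
        logger.info("Processed and archived %s", path.name)
    return processed
```

Because processed files leave DATASRC, running this hourly is harmless: most runs simply find nothing to do.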

Dealing with deleted collections

This process cannot detect when a collection is deleted; the collection simply goes missing from the data file. To mark collections as deleted, schedule projects.tasks.cleanup. It checks whether each collection was updated by the latest stats and, if not, marks it as deleted. Projects with no associated folders are also marked as deleted.

Note that the data is not deleted from the database; we want to keep all historical data.
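The detection logic can be illustrated with a plain set comparison. This is a swapped-in simplification: the cleanup() code quoted under Issues compares an updated timestamp against a cutoff instead. The function names and the project-to-folders mapping are hypothetical; in keeping with the note above, the result would only be used to set a deleted date, never to remove rows.

```python
def find_deleted(known: set, latest_stats: dict) -> set:
    """Collections we know about that are missing from the latest stats file."""
    present = set(latest_stats.get("collections", {}))
    return known - present


def find_empty_projects(project_folders: dict, deleted: set) -> set:
    """Projects whose folders are all deleted, or that never had folders.

    project_folders maps a project name to a list of folder names
    (a hypothetical stand-in for the database relation).
    """
    return {
        project
        for project, folders in project_folders.items()
        if not (set(folders) - deleted)
    }
```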

Adding administrative data

Editing a project

Using the buttons you can also open the forms to edit or add Persons, Departments and Budget codes.

Note that Research folders, Vault folders and Vault datasets cannot be edited via the admin interface because these tables are filled automatically.

Deleting a project

Since we want to keep historical data, you cannot delete a project record. Instead, set the Delete date to today. Only do this when the project has no active Research Folders attached!

Adding research folder/group to a project

You cannot do this in the project form; instead, go to Research folders. Use the dropdown list to select the project this Research folder should be added to. You can use the + button to open the add-new-project form.

Automatically creating Projects for new Groups

Most of the statistics are project-based (because a research project can use more than one Yoda group). For this reason the system expects new Projects to be entered manually; the associated "research folders" can then be added to the project.

Since manual administration takes time and may lag behind, you can also use Django Q to schedule projects.tasks.create_projects. This automatically creates a Project for each orphan Research Folder based on the group/folder name.

Group names are usually formatted as research-<faculty>-<department>-<projectname>. A new project is created with the name <projectname>, linked to that department and faculty. Owner and Cost Center are set to dummy entries; they can be filled in later in the admin interface.

  • projects.tasks.create_projects does not automatically add departments and faculties. If they cannot be found, the research folder stays unconnected to a project. Add the Department to the database manually and the create_projects job will create the project the next time it runs.

Note that this process creates N projects even if all N Yoda groups/folders belong to the same research project. This is easily rectified by adding all the folders to a single project in the database and setting the extra projects to Deleted.
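The naming rule and the missing-department behaviour can be sketched like this. It is not the actual create_projects code: parse_group_name, project_for_folder, the (faculty, department) lookup set and the dict of project fields are hypothetical. Splitting with maxsplit=3 keeps any further hyphens inside the project name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedGroup:
    faculty: str
    department: str
    project: str


def parse_group_name(group: str) -> Optional[ParsedGroup]:
    """Split 'research-<faculty>-<department>-<projectname>'.

    Returns None for names that do not follow the convention.
    """
    parts = group.split("-", 3)
    if len(parts) != 4 or parts[0] != "research" or not all(parts[1:]):
        return None
    return ParsedGroup(*parts[1:])


def project_for_folder(group: str, known_departments: set) -> Optional[dict]:
    """Hypothetical create step; None means the folder stays unconnected."""
    parsed = parse_group_name(group)
    if parsed is None or (parsed.faculty, parsed.department) not in known_departments:
        return None
    return {
        "name": parsed.project,
        "faculty": parsed.faculty,
        "department": parsed.department,
        "owner": "dummy",        # placeholder, filled in later in the admin
        "cost_center": "dummy",  # placeholder, filled in later in the admin
    }
```

Since the format is only "usually" followed, a name such as research-ub-test-environment would be read as faculty "ub", department "test" and project "environment".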

Customizing the Projects forms and lists

These use the standard Django admin forms and can easily be edited in projects/admin.py; consult the Django documentation.

adminyoda's People

Contributors

peer35

Watchers

James Cloos, Brett Olivier

Forkers

bgoli

adminyoda's Issues

Do we need sizes of datasets in reports?

Dataset size is collected from iRODS but currently not stored in the database; only the total vault size is.

It's potentially interesting information, but not necessary for cost accounting.

Keeping the history is not needed (the only changes would be to the license or metadata), so I could just add a size column to VaultDataset.

Only set deleted date once

Update the filter here, or the system keeps updating the deleted date. Also note #17.

def cleanup():
    days = 2
    last_update = MiscStats.objects.order_by('collected').last().collected
    cutoff = make_aware(datetime.combine(last_update, datetime.min.time())) - timedelta(days=days)
    logger.info(f'Mark folders and datasets last updated before {cutoff} as deleted.')
    ResearchFolder.objects.filter(updated__lte=cutoff).update(deleted=now())
    VaultFolder.objects.filter(updated__lte=cutoff).update(deleted=now())
    VaultDataset.objects.filter(updated__lte=cutoff).update(deleted=now())
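One possible fix is to restrict the filter to records that are not yet marked, which in the Django ORM would mean adding deleted__isnull=True, e.g. ResearchFolder.objects.filter(updated__lte=cutoff, deleted__isnull=True). The plain-Python sketch below mirrors that rule on hypothetical in-memory records so the behaviour can be checked without a database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Folder:
    # Hypothetical stand-in for a ResearchFolder/VaultFolder row
    name: str
    updated: datetime
    deleted: Optional[datetime] = None


def mark_deleted(folders: List[Folder], cutoff: datetime, now: datetime) -> int:
    """Set `deleted` only on stale folders that are not already marked.

    Mirrors .filter(updated__lte=cutoff, deleted__isnull=True).update(deleted=now)
    and, like QuerySet.update(), returns the number of records changed.
    """
    count = 0
    for folder in folders:
        if folder.updated <= cutoff and folder.deleted is None:
            folder.deleted = now
            count += 1
    return count
```

Running the job a second time then leaves the first deleted date untouched.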

Email reports

Generate a monthly email:

"Please notify us if this information is no longer correct.":

  • Project owner
  • Cost Center
  • Budget owner
  • Department

A simple table, like the one on https://adminyoda.labs.vu.nl/projects/6, with:

  • Current (at the reference date) space usage on research
  • Current space usage on datasets in Vault
  • Datasets and status
  • Link to the adminyoda page for graphs
  • Estimated cost
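A minimal sketch of composing such a mail with the standard library's EmailMessage. The project dict keys, the subject line, the GB units and the sender address are hypothetical; sending (e.g. via smtplib or Django's mail backend) is left out.

```python
from email.message import EmailMessage


def build_report_mail(project: dict, sender: str) -> EmailMessage:
    """Compose the monthly check-your-details mail for one project.

    All dict keys and the wording are hypothetical.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Yoda usage report: {project['name']}"
    msg["From"] = sender
    msg["To"] = project["owner_email"]
    lines = [
        "Please notify us if this information is no longer correct.",
        "",
        f"Project owner: {project['owner']}",
        f"Cost Center:   {project['cost_center']}",
        f"Budget owner:  {project['budget_owner']}",
        f"Department:    {project['department']}",
        "",
        f"Research space in use: {project['research_gb']:.2f} GB",
        f"Vault datasets in use: {project['vault_gb']:.2f} GB",
    ]
    # One line per dataset with its status
    for name, status in project["datasets"]:
        lines.append(f"  {name}: {status}")
    lines += [
        f"Estimated cost: {project['estimated_cost']:.2f}",
        f"Graphs: {project['url']}",
    ]
    msg.set_content("\n".join(lines))
    return msg
```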

Register data classification

imeta ls -u research-staff-ubvu-geoplaza
...
----
attribute: data_classification
value: public
units:
----
...
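A sketch of reading the classification from that output. It assumes the attribute/value/units layout shown above and runs the same imeta ls -u command via subprocess; the function names are hypothetical, and repeated attributes keep their last value.

```python
import subprocess
from typing import Optional


def parse_imeta_ls(output: str) -> dict:
    """Turn `imeta ls` text output into {attribute: value}."""
    avus = {}
    attribute = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("attribute:"):
            attribute = line[len("attribute:"):].strip()
        elif line.startswith("value:") and attribute is not None:
            avus[attribute] = line[len("value:"):].strip()
            attribute = None
    return avus


def read_classification(group: str) -> Optional[str]:
    """Run `imeta ls -u <group>` and return its data_classification, if any."""
    out = subprocess.run(
        ["imeta", "ls", "-u", group], capture_output=True, text=True, check=True
    ).stdout
    return parse_imeta_ls(out).get("data_classification")
```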

Vault and delta stats not shown for project 7

https://adminyoda.labs.vu.nl/projects/7

ERROR 2024-01-29 12:53:35,793 django.request log_response | Internal Server Error: /projects/project_delta_chart_json/7
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/projects/views.py", line 414, in project_delta_chart_json
    vault.append(round(vault_stats[label]['delta'] / div, 2))
                       ~~~~~~~~~~~^^^^^^^
KeyError: '2023-10'

I think this might be caused by a botched Vault submission.
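Whatever the root cause, the chart view could tolerate a month that is missing from vault_stats instead of raising KeyError. The sketch below reuses the names from the traceback line (vault_stats, label, div) but the surrounding function is a hypothetical reconstruction; plotting a missing month as 0 is a design choice, not necessarily the right accounting answer.

```python
def vault_deltas(labels, vault_stats, div):
    """Vault series for the delta chart, plotting missing months as 0."""
    # .get() avoids the KeyError seen for '2023-10' in the traceback above
    return [round(vault_stats.get(label, {}).get("delta", 0) / div, 2) for label in labels]
```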

Possible Vaults under datamanager- folder

While trying to remove a group with an existing vault, it looks like the vault folder and datasets are moved to the datamanager folder. Or does this happen when metadata is updated?
/tempZone/home/datamanager-peter-test/vault-accept172/mapje[1627374525]

irods_tasks errors on orphaned vault

The group was created and deleted within a few days, and the vault folder was not deleted (should it have been?).

When the harvest runs, only the vault folder is present in the data. Normally a VaultFolder record is always created together with the ResearchFolder record; in this case there is no research group.

This seems to be a failure state: normally, when a group is deleted, its vault folder is also deleted if it is empty.

Let's just skip the vault folder and log the error for now.
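The skip-and-log approach could look like this when reading the collections section of a stats file. It assumes the vault-<name>/research-<name> pairing seen in the sample data; paired_vaults is a hypothetical name.

```python
import logging

logger = logging.getLogger(__name__)


def paired_vaults(collections: dict) -> dict:
    """Return vault collections that have a matching research collection.

    Orphaned vault-<name> folders (no research-<name>) are skipped and logged.
    """
    result = {}
    for name, info in collections.items():
        if not name.startswith("vault-"):
            continue
        research = "research-" + name[len("vault-"):]
        if research not in collections:
            logger.error("Skipping orphaned vault folder %s (no %s)", name, research)
            continue
        result[name] = info
    return result
```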

Read dataset retention time

The info is in the JSON metadata file in the vault, but I think it is also in the iRODS metadata of the vault folder, which is probably easier to access.
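The weekly stats sample at the top of this page already carries a retention_period (as a string) for each vault dataset, so another option is to read it from there. A sketch, assuming that structure; the units of retention_period are not stated, so the value is returned as a bare integer, and non-numeric or empty values become None.

```python
def dataset_retention(stats: dict) -> list:
    """List (vault, dataset, retention_period, data_classification) tuples."""
    rows = []
    for vault, info in stats.get("collections", {}).items():
        # Research collections have no "datasets" key, so they yield nothing
        for dataset, meta in info.get("datasets", {}).items():
            period = meta.get("retention_period", "")
            rows.append((
                vault,
                dataset,
                int(period) if period.isdigit() else None,
                meta.get("data_classification"),
            ))
    return rows
```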

Project stats in wrong chronological order

{"labels": ["Q1-2022", "Q2-2021", "Q3-2021", "Q4-2021"], "datasets": [{"label": "Research", "backgroundColor": "rgba(253,192,134, 0.4)", "borderColor": "rgba(253,192,134)", "borderWidth": 1, "data": [38.94, 38.94, 38.94, 38.94]}, {"label": "Vault", "backgroundColor": "rgba(127,201,127, 0.4)", "borderColor": "rgba(127,201,127)", "borderWidth": 1, "data": [0.0, 0.0, 0.0, 0.0]}]}
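The labels above look as if they were sorted as plain strings ("Q1-2022" sorts before "Q2-2021"). A sketch of sorting by (year, quarter) instead, reordering every dataset's data list in step with the labels; it assumes labels always have the form Q<n>-<year>.

```python
def quarter_key(label: str) -> tuple:
    """'Q2-2021' -> (2021, 2), so labels sort by year first, then quarter."""
    quarter, year = label.split("-")
    return int(year), int(quarter[1:])


def sort_chart(chart: dict) -> dict:
    """Reorder labels and every dataset's data list chronologically."""
    order = sorted(range(len(chart["labels"])), key=lambda i: quarter_key(chart["labels"][i]))
    chart["labels"] = [chart["labels"][i] for i in order]
    for ds in chart["datasets"]:
        ds["data"] = [ds["data"][i] for i in order]
    return chart
```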

Create new groups report for data managers

A data steward mentioned it would be nice to get a heads-up when a new project is created.

Sending them a full report with all the groups might be overkill, but maybe a monthly report showing which projects were added to their categories in the past month is a good idea.
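The selection for such a report could be sketched as follows: group the projects created in the past month by Yoda category, so each data manager only receives their own categories. The (name, category, created) tuples are a hypothetical stand-in for the Project table, and "past month" is approximated as 31 days.

```python
from collections import defaultdict
from datetime import date, timedelta


def new_projects_by_category(projects, today: date, days: int = 31) -> dict:
    """Group projects created in the last `days` days by category."""
    since = today - timedelta(days=days)
    report = defaultdict(list)
    for name, category, created in projects:
        if since < created <= today:
            report[category].append(name)
    return dict(report)
```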