oea-assessment's Introduction

OEA Hack Assessment

Problem Statement

Confirming the scenario and problem statement to be addressed:

Green Valley Schools is an education system with 35,000 students in South State. School leaders want to see a report that shows them students’ attendance and learning outcomes (e.g., assessment scores, grades, or marks). School leaders would like to use this data to draw meaningful insights on patterns of student engagement and learning outcomes. Attendance is a key indicator of student engagement. They approached you to build an end-to-end solution for school leaders to use that addresses data ingestion, cleansing, preparation, analysis, and data visualization.

Datasets

Test datasets used for extraction, ingestion, and visualisation.

Batch 1

Dataset 1, provided by the OpenEduAnalytics GitHub repository:

https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch1/studentattendance.csv
https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch1/studentdemographics.csv
https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch1/studentsectionmark.csv

Batch 2

Dataset 2, provided by the OpenEduAnalytics GitHub repository:

https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch2/studentattendance.csv
https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch2/studentdemographics.csv
https://raw.githubusercontent.com/microsoft/OpenEduAnalytics/main/modules/Student_and_School_Data_Systems/test_data/batch2/studentsectionmark.csv

Batch 3

Empired custom dataset:

https://raw.githubusercontent.com/andreas-empired/oea-assessment/main/batch3/studentattendance.csv
https://raw.githubusercontent.com/andreas-empired/oea-assessment/main/batch3/studentdemographics.csv
https://raw.githubusercontent.com/andreas-empired/oea-assessment/main/batch3/studentsectionmark.csv

Landing and ingesting data

Our response includes a master pipeline that runs end to end: it extracts data from an array of HTTP endpoints and then ingests the extracted data into Stage 2 persistence. The default batch category is delta. The pipeline can also process the snapshot and incremental batch categories when the category is passed as a parameter. For each batch category, we have used the existing implementation provided by the OEA framework library. In addition to extracting all three datasets and ingesting them into Stage 2, the master pipeline creates both the Lake and SQL databases.

Landing data

In response to the requirement:

1. Land csv data into stage1 data lake in OEA reference architecture [3 points]

Also referenced from the master pipeline, the solution contains a dedicated pipeline that isolates extraction of data from the source. The pipeline consumes datasets from the source system and sinks the data to the Stage 1 zone.
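The landing step can be sketched as follows. This is a minimal illustration, not the pipeline's actual implementation: the `stage1np/student_data` root, the folder layout, and the `landing_path` helper are assumptions chosen for the example. Only the source URL comes from the dataset list.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def landing_path(source_url: str, landed_on: date,
                 root: str = "stage1np/student_data") -> str:
    """Build a date-partitioned Stage 1 path for one source CSV.

    The root folder and layout are hypothetical, for illustration only.
    """
    # Take the file name (e.g. studentattendance.csv) from the URL path.
    filename = PurePosixPath(urlparse(source_url).path).name
    # Use the file name without its extension as the table folder.
    table = PurePosixPath(filename).stem
    # Partition by landing date so each extraction run is kept separate.
    return f"{root}/{table}/{landed_on:%Y-%m-%d}/{filename}"


url = ("https://raw.githubusercontent.com/andreas-empired/"
       "oea-assessment/main/batch3/studentattendance.csv")
print(landing_path(url, date(2022, 3, 1)))
```

Keeping the landing date in the path means a later extraction of the same file lands beside the earlier one instead of overwriting it.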

Partitioned by date:

Ingesting Data

In response to the requirement:

2. Process that data into delta format into stage2 using the scripts included in the modules for the test data sets. All personal identifiable information must be pseudonymized [5 points]

Also referenced from the master pipeline, the solution contains a dedicated pipeline that isolates ingestion from Stage 1. A key activity of the pipeline is its dependency on the OEA_connector notebook and the initialisation of the OEA framework, which supports base implementations such as the batch category scope documented by OEA:

Delta data is data that contains only changes that need to be incorporated. Processing this incoming data requires existing data to be updated and new data to be inserted (also referred to as upsert).
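The upsert behaviour described above can be illustrated with plain Python dictionaries. This is a sketch of the idea only; the OEA framework performs the merge on Delta tables, and the `upsert` helper and the `student_id` and `mark` column names here are hypothetical.

```python
def upsert(existing, changes, key):
    """Apply a delta batch: update rows whose key exists, insert the rest."""
    # Index the current rows by key; dicts preserve insertion order.
    merged = {row[key]: dict(row) for row in existing}
    for change in changes:
        if change[key] in merged:
            merged[change[key]].update(change)  # existing row: update
        else:
            merged[change[key]] = dict(change)  # new row: insert
    return list(merged.values())


stage2 = [{"student_id": "s1", "mark": 50}, {"student_id": "s2", "mark": 60}]
delta_batch = [{"student_id": "s2", "mark": 65}, {"student_id": "s3", "mark": 70}]
stage2 = upsert(stage2, delta_batch, key="student_id")
print(stage2)
```

After the call, `s1` is unchanged, `s2` carries the updated mark, and `s3` has been added.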

Identifiable information has been pseudonymized through the extension of BaseOEAModule:

  • hash of student identifiers
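Hashing an identifier can be sketched with Python's standard library. This is an assumption-laden illustration rather than the method used by BaseOEAModule: the keyed SHA-256 (HMAC) scheme, the `pseudonymize` helper, and the secret value are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(student_id: str, secret: bytes) -> str:
    """Return a keyed SHA-256 digest of a student identifier."""
    # A keyed hash keeps the mapping stable across tables (so joins still
    # work) while making it hard to recompute without the secret.
    digest = hmac.new(secret, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


secret = b"example-secret"  # hypothetical; a real key would come from a vault
row = {"student_id": "12345", "mark": 72}
row["student_id"] = pseudonymize(row["student_id"], secret)
print(row)
```

Because the same identifier always produces the same digest, attendance, demographics, and mark records can still be joined after pseudonymization.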

Stage 1 data successfully extracted for batch 1, batch 2 and batch 3:

...and ingested to stage 2 np (non-pseudonymized):

...and to stage 2 p (pseudonymized):

Lake Database

In response to the requirement:

3. Create a lake db in Synapse studio that creates a joint view of the data [3 points]

Based on batch 1, batch 2 and batch 3, the solution contains a Lake database. This is based on parameter flags passed to any of the ingest pipelines:

Resulting in a Lake database containing all data:
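A joint view over the three datasets might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for the Synapse Lake database so that it runs anywhere. The table names follow the CSV file names, but every column name and the `student_overview` view are assumptions, not the actual schema.

```python
import sqlite3


def create_joint_view(conn: sqlite3.Connection) -> None:
    """Create three hypothetical tables and one view joining them per student."""
    conn.executescript("""
        CREATE TABLE studentdemographics (student_id TEXT PRIMARY KEY, school_id TEXT);
        CREATE TABLE studentattendance (student_id TEXT, attendance_date TEXT, present INTEGER);
        CREATE TABLE studentsectionmark (student_id TEXT, section_id TEXT, mark REAL);

        -- One row per student with aggregated attendance and marks.
        CREATE VIEW student_overview AS
        SELECT d.student_id,
               d.school_id,
               (SELECT AVG(a.present) FROM studentattendance a
                 WHERE a.student_id = d.student_id) AS attendance_rate,
               (SELECT AVG(m.mark) FROM studentsectionmark m
                 WHERE m.student_id = d.student_id) AS average_mark
        FROM studentdemographics d;
    """)


conn = sqlite3.connect(":memory:")
create_joint_view(conn)
conn.execute("INSERT INTO studentdemographics VALUES ('s1', 'school_a')")
conn.executemany("INSERT INTO studentattendance VALUES (?, ?, ?)",
                 [("s1", "2022-03-01", 1), ("s1", "2022-03-02", 0)])
conn.executemany("INSERT INTO studentsectionmark VALUES (?, ?, ?)",
                 [("s1", "math", 70.0), ("s1", "science", 80.0)])
print(conn.execute("SELECT * FROM student_overview").fetchall())
```

Aggregating attendance and marks in separate subqueries avoids the row multiplication that a direct three-way join on the student identifier would produce.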

Visualisation

In response to the requirement:

4. Provide PowerBI visualizations that give insights to education system leaders for decision making and meaningful insights [5 points]

School Performance Summary Dashboard

Attendance Summary Dashboard

Attendance Summary Dashboard for Individual School

Assessment Score Summary Dashboard

Assessment Score Summary Dashboard for Individual School

Attendance vs Assessment Score Dashboard

Attendance vs Assessment Score Dashboard for Individual School

Semantic Model

In response to the requirement:

5. Create a basic semantic model defining the relationships that exist across the data sets [2 points]

The semantic data model and the key relationships within the datasets are referenced as below:

Key relationships in the dataset:

Data Dictionary

In response to the requirement:

6. Provide a data dictionary [2 points]

We have used an instance of Azure Purview to scan the data and create a data dictionary. The collection has been set up to point to the OEA Serverless Endpoint.

The scan performed on the data source resulted in listing of the assets:

The asset overview, details, and schema information are referenced in the next section. The data dictionary was enriched in Azure Purview by adding asset descriptions, data element definitions, and similar metadata.

Student Attendance

Student Demographics

Student Section Mark

Contributors

andreas-empired, piyushlalwani456
