Giter Club home page Giter Club logo

clover-health-exercise's Introduction

Engineering Exercise โ€“ Clover Health

Instructions

Part of the onsite includes a portion where we want to evaluate your code. Please complete this exercise and get it back to us within 48 hours of your visit.

Problem definition

You have directories containing data files and specification files. The specification files describe the structure of the data files. Write an application in the language of your choice that reads format definitions from specification files. Use these definitions to load the data files into a database.

Problem details

Data files exist in a data/ directory relative to your application and specification files exist in a specs/ directory relative to your application.

Specification files will have filenames equal to the file type they specify and extension of .csv. So fileformat1.csv would be the specification for files of type fileformat1.

Data files will have filenames based on their specification file name, followed by an underscore, followed by the drop date and an extension of .txt. For example, fileformat1_2007-10-15.txt would be a data file to be parsed using specs/fileformat1.csv, which arrived on 10/15/2007.

Format files will be csv formated with columns "column name", "width", and "datatype".

  • "column name" will be the name of that column in the database table
  • "width" is the number of characters taken up by the column in the data file
  • "datatype" is the SQL data type that should be used to store the value in the database table.

Data files will be flat text files with lines matching single records for the database. Lines are formatted as specified by their associated specification file.

Example

This is an example file pair; other files may vary in structure while still fitting the structure of the problem details (above):

specs/testformat1.csv:

"column name",width,datatype

name,10,TEXT

valid,1,BOOLEAN

count,3,INTEGER

Here we have a specification that describes 3 columns:

  • The first 10 characters labeled "name" of type TEXT
  • The next 1 character labeled "valid" of type BOOLEAN ('1' = True, '0' = False)
  • The last 3 characters labeled "count" of type INTEGER

data/testformat1_2015-06-28.txt:

Foonyor 1 1

Barzane 0-12

Quuxitude 1103

Processing this data file results in the following table:

name valid count
1 Foonyor TRUE 1
2 Barzane FALSE -12
3 Quuxitude TRUE 103

Expectations

Expectations

  • Clover primarily uses Python, but your application can be written with language/libraries of your choosing. Take this opportunity to best demonstrate your talents!
  • Database type and connection mechanism is left to your discretion.
  • You are expected to write unit tests for different cases of the core logics. You do not have to use any specific unit test frameworks but should include instructions to run your tests should you use any. You may even write a simple driver program for us to run your unit test cases with mockup data.
  • You should implement the conversions for the SQL data types: TEXT, BOOLEAN, and INTEGER
  • You should be able to handle any specification file that matches the problem description (not just the given example)
  • Files can be assumed to use UTF-8 encoding
  • You should be prepared to discuss implementation decisions and possible extensions to your application.

Solution

The solution is using a Spring Boot powered application to run a REST service with H2 in-memory database for demo purpose. The application supports calling the REST service to import a data file by name. The application then processes the data file and inserts the data into the H2 database. The data files can then subsequently be queried using the REST service by ID.

Database schema design

The database is designed so support multiple data files with different type of columns and many rows.

DB Schema

Build

# build using maven
> mvn clean install

Run

# run, this will start the server running on port 8080
> java -jar target/clover-health-demo.jar

Import a data file

# POST filename to import the datafile. The application knows how to access the file
curl -X POST http://localhost:8080/data-file/testformat1_2015-06-28.txt 

Get all imported files

# Call GET all to retrieve all imported data files
curl  http://localhost:8080/data-files | json_pp

Response

[
   {
      "id" : 1,
      "name" : "testformat1_2015-06-28.txt",
      "importedAt" : "2021-03-21T21:18:54.552+0000",
      "columns" : [
         {
            "id" : 1,
            "width" : 10,
            "type" : "TEXT",
            "rows" : [
               {
                  "valueTxt" : "Foonyor",
                  "id" : 1
               },
               {
                  "id" : 4,
                  "valueTxt" : "Barzane"
               },
               {
                  "id" : 7,
                  "valueTxt" : "Quuxitude"
               }
            ],
            "name" : "name"
         },
         {
            "rows" : [
               {
                  "valueBool" : true,
                  "id" : 2
               },
               {
                  "id" : 5,
                  "valueBool" : false
               },
               {
                  "valueBool" : true,
                  "id" : 8
               }
            ],
            "type" : "BOOLEAN",
            "name" : "valid",
            "id" : 2,
            "width" : 1
         },
         {
            "name" : "count",
            "type" : "INTEGER",
            "rows" : [
               {
                  "id" : 3,
                  "valueInt" : 1
               },
               {
                  "valueInt" : -12,
                  "id" : 6
               },
               {
                  "valueInt" : 103,
                  "id" : 9
               }
            ],
            "width" : 3,
            "id" : 3
         }
      ]
   }
]

Get one file by ID

# Call GET by ID to retrieve one imported data file
curl http://localhost:8080/data-file/1 | json_pp

Response

{
   "id" : 1,
   "columns" : [
      {
         "width" : 10,
         "rows" : [
            {
               "valueTxt" : "Foonyor",
               "id" : 1
            },
            {
               "id" : 4,
               "valueTxt" : "Barzane"
            },
            {
               "id" : 7,
               "valueTxt" : "Quuxitude"
            }
         ],
         "name" : "name",
         "type" : "TEXT",
         "id" : 1
      },
      {
         "type" : "BOOLEAN",
         "id" : 2,
         "width" : 1,
         "rows" : [
            {
               "id" : 2,
               "valueBool" : true
            },
            {
               "id" : 5,
               "valueBool" : false
            },
            {
               "valueBool" : true,
               "id" : 8
            }
         ],
         "name" : "valid"
      },
      {
         "id" : 3,
         "type" : "INTEGER",
         "rows" : [
            {
               "valueInt" : 1,
               "id" : 3
            },
            {
               "id" : 6,
               "valueInt" : -12
            },
            {
               "valueInt" : 103,
               "id" : 9
            }
         ],
         "width" : 3,
         "name" : "count"
      }
   ],
   "name" : "testformat1_2015-06-28.txt",
   "importedAt" : "2021-03-22T17:23:08.415+0000"
}

Delete one file by ID

# Call DELETE by ID to delete one imported data file from DB
curl -X DELETE http://localhost:8080/data-file?id=1

Response

{"Status":"File deleted successfully"}

Delete all files

# Call DELETE to delete all files from DB
curl -X DELETE http://localhost:8080/data-files

Response

SUCCESS

Query the DB by going to H2 Console

Navigate in your browser to http://localhost:8080/h2-console

H2 Console

Press Connect

H2 Console1

clover-health-exercise's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.