Part of the onsite includes a portion where we want to evaluate your code. Please complete this exercise and get it back to us within 48 hours of your visit.
You have directories containing data files and specification files. The specification files describe the structure of the data files. Write an application in the language of your choice that reads format definitions from specification files. Use these definitions to load the data files into a database.
Data files exist in a data/ directory relative to your application and specification files exist in a specs/ directory relative to your application.
Specification files will have filenames equal to the file type they specify and extension of .csv. So fileformat1.csv would be the specification for files of type fileformat1.
Data files will have filenames based on their specification file name, followed by an underscore, followed by the drop date and an extension of .txt. For example, fileformat1_2007-10-15.txt would be a data file to be parsed using specs/fileformat1.csv, which arrived on 10/15/2007.
Format files will be csv formated with columns "column name", "width", and "datatype".
- "column name" will be the name of that column in the database table
- "width" is the number of characters taken up by the column in the data file
- "datatype" is the SQL data type that should be used to store the value in the database table.
Data files will be flat text files with lines matching single records for the database. Lines are formatted as specified by their associated specification file.
This is an example file pair; other files may vary in structure while still fitting the structure of the problem details (above):
specs/testformat1.csv:
"column name",width,datatype
name,10,TEXT
valid,1,BOOLEAN
count,3,INTEGER
Here we have a specification that describes 3 columns:
- The first 10 characters labeled "name" of type TEXT
- The next 1 character labeled "valid" of type BOOLEAN ('1' = True, '0' = False)
- The last 3 characters labeled "count" of type INTEGER
data/testformat1_2015-06-28.txt:
Foonyor 1 1
Barzane 0-12
Quuxitude 1103
Processing this data file results in the following table:
name | valid | count | |
---|---|---|---|
1 | Foonyor | TRUE | 1 |
2 | Barzane | FALSE | -12 |
3 | Quuxitude | TRUE | 103 |
Expectations
- Clover primarily uses Python, but your application can be written with language/libraries of your choosing. Take this opportunity to best demonstrate your talents!
- Database type and connection mechanism is left to your discretion.
- You are expected to write unit tests for different cases of the core logics. You do not have to use any specific unit test frameworks but should include instructions to run your tests should you use any. You may even write a simple driver program for us to run your unit test cases with mockup data.
- You should implement the conversions for the SQL data types: TEXT, BOOLEAN, and INTEGER
- You should be able to handle any specification file that matches the problem description (not just the given example)
- Files can be assumed to use UTF-8 encoding
- You should be prepared to discuss implementation decisions and possible extensions to your application.
The solution is using a Spring Boot powered application to run a REST service with H2 in-memory database for demo purpose. The application supports calling the REST service to import a data file by name. The application then processes the data file and inserts the data into the H2 database. The data files can then subsequently be queried using the REST service by ID.
The database is designed so support multiple data files with different type of columns and many rows.
# build using maven
> mvn clean install
# run, this will start the server running on port 8080
> java -jar target/clover-health-demo.jar
# POST filename to import the datafile. The application knows how to access the file
curl -X POST http://localhost:8080/data-file/testformat1_2015-06-28.txt
# Call GET all to retrieve all imported data files
curl http://localhost:8080/data-files | json_pp
Response
[
{
"id" : 1,
"name" : "testformat1_2015-06-28.txt",
"importedAt" : "2021-03-21T21:18:54.552+0000",
"columns" : [
{
"id" : 1,
"width" : 10,
"type" : "TEXT",
"rows" : [
{
"valueTxt" : "Foonyor",
"id" : 1
},
{
"id" : 4,
"valueTxt" : "Barzane"
},
{
"id" : 7,
"valueTxt" : "Quuxitude"
}
],
"name" : "name"
},
{
"rows" : [
{
"valueBool" : true,
"id" : 2
},
{
"id" : 5,
"valueBool" : false
},
{
"valueBool" : true,
"id" : 8
}
],
"type" : "BOOLEAN",
"name" : "valid",
"id" : 2,
"width" : 1
},
{
"name" : "count",
"type" : "INTEGER",
"rows" : [
{
"id" : 3,
"valueInt" : 1
},
{
"valueInt" : -12,
"id" : 6
},
{
"valueInt" : 103,
"id" : 9
}
],
"width" : 3,
"id" : 3
}
]
}
]
# Call GET by ID to retrieve one imported data file
curl http://localhost:8080/data-file/1 | json_pp
Response
{
"id" : 1,
"columns" : [
{
"width" : 10,
"rows" : [
{
"valueTxt" : "Foonyor",
"id" : 1
},
{
"id" : 4,
"valueTxt" : "Barzane"
},
{
"id" : 7,
"valueTxt" : "Quuxitude"
}
],
"name" : "name",
"type" : "TEXT",
"id" : 1
},
{
"type" : "BOOLEAN",
"id" : 2,
"width" : 1,
"rows" : [
{
"id" : 2,
"valueBool" : true
},
{
"id" : 5,
"valueBool" : false
},
{
"valueBool" : true,
"id" : 8
}
],
"name" : "valid"
},
{
"id" : 3,
"type" : "INTEGER",
"rows" : [
{
"valueInt" : 1,
"id" : 3
},
{
"id" : 6,
"valueInt" : -12
},
{
"valueInt" : 103,
"id" : 9
}
],
"width" : 3,
"name" : "count"
}
],
"name" : "testformat1_2015-06-28.txt",
"importedAt" : "2021-03-22T17:23:08.415+0000"
}
# Call DELETE by ID to delete one imported data file from DB
curl -X DELETE http://localhost:8080/data-file?id=1
Response
{"Status":"File deleted successfully"}
# Call DELETE to delete all files from DB
curl -X DELETE http://localhost:8080/data-files
Response
SUCCESS
Navigate in your browser to http://localhost:8080/h2-console
Press Connect