2015 San Diego Data Analysis Contest
What is the demographic profile of the typical ELA student?
- machine learning/regression analysis
- correlations
- machine learning (predictive model)
- We need to answer the previous question first
- We need to answer the previous question first
- Is there a way to track specific students?
- Do native speakers of some languages have an easier or harder time learning English than native speakers of other languages (e.g., do Spanish speakers learn English more quickly than, say, Kurdish speakers)?
- We can compare CELDT results across primary languages
- Can we see if families utilize other social services available (tutoring, food assistance, community groups, etc.), and if so, does this type of community involvement have an effect on English learning ability?
- We can compare the number of students getting school lunches with the EL test results
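As a rough illustration of the correlation idea, here is a minimal sketch that computes a Pearson correlation between two per-school measures. The variable names and all numbers are made up for the example; they are not taken from the contest files.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-school values (illustrative only, not real data):
# share of students getting school lunches, and a mean EL test score.
lunch_share = [0.2, 0.4, 0.6, 0.8]
el_score = [520, 500, 490, 470]

r = pearson(lunch_share, el_score)
print(f"correlation: {r:.3f}")
```

A value near -1 or 1 would suggest a strong linear relationship between the two measures; it would not by itself show that one causes the other.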
How to marry the files
All files have a cds_code column, with each school being associated with the same code across all files.
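A minimal sketch of the join on cds_code, assuming each file has already been reduced to one row per school and loaded as a dictionary keyed by cds_code. The function name, column names, and values below are hypothetical.

```python
def merge_on_cds_code(*tables):
    """Combine per-school tables (dicts keyed by cds_code) into one table.

    Schools missing from a file simply lack that file's columns.
    If two files share a column name, the later file overwrites the earlier.
    """
    merged = {}
    for table in tables:
        for cds_code, columns in table.items():
            merged.setdefault(cds_code, {}).update(columns)
    return merged

# Hypothetical data: codes are kept as strings so leading zeros survive.
api = {
    "37000000000001": {"api_score": 812},
    "37000000000002": {"api_score": 774},
}
enroll = {
    "37000000000001": {"enr_total": 235},
}

schools = merge_on_cds_code(api, enroll)
for cds_code, columns in schools.items():
    print(cds_code, columns)
```

Keeping cds_code as text rather than a number is a deliberate choice in this sketch, since converting it to an integer would drop any leading zeros and break the match between files.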
Each row needs to represent one school, with all of the data associated with that school. To get only one row per school, we need to transform the data files by collapsing multiple rows into one. To do this and still track the necessary components, we need to take each of the coding columns (listed below for each file), encode its value into the names of the remaining columns, and then combine the columns. We should probably talk about this over the phone.
use: api_2012
celdt
- subgroup ID 00
- overall performance level 0
- test purpose 0
- grade 00
column name format: 000000_columnName
star
- subgroup 000
- grade 00
- test_id 00
column name format: 0000000_columnName
enroll_2013
- ethnic 0
- gender L
column name format: 0L_columnName
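
The renaming scheme above can be sketched roughly as follows, using the enroll_2013 case (ethnic code plus gender letter as the prefix). This is only one possible way to do it; the function name, the enr_total column, and the sample rows are hypothetical.

```python
from collections import defaultdict

def widen(rows, key, code_fields, value_fields):
    """Collapse many rows per school into one row per school.

    Each value column is renamed to '<codes>_<columnName>', where <codes>
    is the concatenation of that row's coding-column values.
    """
    wide = defaultdict(dict)
    for row in rows:
        prefix = "".join(str(row[field]) for field in code_fields)
        for column in value_fields:
            wide[row[key]][f"{prefix}_{column}"] = row[column]
    return dict(wide)

# Hypothetical enroll_2013 rows (made-up codes and counts).
enroll_rows = [
    {"cds_code": "37000000000001", "ethnic": "5", "gender": "F", "enr_total": 120},
    {"cds_code": "37000000000001", "ethnic": "5", "gender": "M", "enr_total": 115},
    {"cds_code": "37000000000002", "ethnic": "7", "gender": "F", "enr_total": 80},
]

enroll_wide = widen(enroll_rows, "cds_code", ["ethnic", "gender"], ["enr_total"])
for cds_code, columns in enroll_wide.items():
    print(cds_code, columns)
```

For the prefix to stay unambiguous, each code needs a fixed width (for example, a zero-padded two-digit grade), which is what the 00 and 000 placeholders in the lists suggest. If two rows for the same school share the same code combination, this sketch keeps only the last one.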