This project uses data from speed dating experiments to predict the compatibility between two people, one male and one female. It’s a very interesting topic, applicable in the right situations, and a good read. Your description of the data set is excellent and leads me into the project without much effort; it’s a great introduction overall. It might help to define compatibility up front, as the experiment itself measures it: perhaps it’s a number from a survey or a yes/no response. I think this is mentioned later in the report, but it would be best at the beginning so we know what your model will be looking for.
The removed data is a bit tricky in this case, because you mentioned earlier that a lot of the data is open-ended and therefore much of it is missing or incomplete. Interpreting an “NA” or blank value as a 0 seems like a mistake to me, since a number like that can affect the model unintentionally. Honestly, I don’t know how you’d fix this, or whether it even needs fixing. Wouldn’t an uninterested participant more often be recorded as a 1 rather than left blank? This is a tough call.
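To make the concern concrete, here is a rough sketch of one possible alternative, not necessarily what you should do. It assumes a 1-10 rating scale and uses made-up values; the names `attraction`, `zero_fill`, and `median_fill_with_flag` are hypothetical and not from your report. The idea is to fill blanks with the median of the observed answers and keep a separate 0/1 flag, so the model can still see that the answer was missing.

```python
from statistics import mean, median

# Hypothetical 1-10 ratings; None marks a question the participant left blank.
attraction = [7, None, 3, 8]


def zero_fill(values):
    """Treat every blank as a 0 (the approach the report describes)."""
    return [0 if v is None else v for v in values]


def median_fill_with_flag(values):
    """Fill blanks with the median of the observed answers and return a
    parallel 0/1 list marking which entries were originally blank.

    Raises statistics.StatisticsError if every value is blank.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


zero_filled = zero_fill(attraction)
filled, was_missing = median_fill_with_flag(attraction)

print(mean(zero_filled))  # 4.5: the blank drags the average down
print(mean(filled))       # 6.25: the blank no longer counts as a strong "no"
print(was_missing)        # [0, 1, 0, 0]
```

In this toy example, a single blank treated as 0 pulls the average from 6.25 down to 4.5, which is the kind of unintended effect I mean. Whether median filling is right for your data is a separate judgment call.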
It would also be nice if the regressions you mentioned were included in the report. Maybe you didn’t have time or couldn’t bring them up, so I don’t think it’s a big deal for now; just make sure you include them. It’d make this a lot easier to read. A lot of your analysis at this point feels incomplete, or needs a lot of refining before you can say any patterns are present. But I do like the direction you’re headed in with the Next Steps portion of the project: well-crafted ideas based on the information you’ve already gathered.