Repository of data and code to use the models described in the paper "Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability"
Hi, thanks for this! I like the simplicity of your approach: you cleaned up the original script just enough, and the HTML parsing doesn't make too many assumptions about the structure of the document.
Here are a couple of questions/comments I would have raised in a code review.
I'm curious about your splitter function: why split on . and then on .[*]?
Although your HTML parsing is simple enough, I wonder if, in your testing, you've found corner cases where it doesn't work -- e.g., <p> tags that you're grabbing but didn't mean to?
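To make the kind of corner case I have in mind concrete, here is a rough sketch using only the standard library. It is not your parser: the class name, the SKIP_TAGS set, and the sample HTML are all made up for illustration. It collects <p> text but ignores paragraphs nested inside containers you may not want, and drops empty paragraphs.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect text from <p> tags, skipping those nested in unwanted containers."""

    # Assumption: containers whose paragraphs are not body text.
    SKIP_TAGS = {"table", "figure"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped containers we are inside
        self.in_p = False     # True while inside a <p> we want to keep
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self._buf).strip()
            if text:  # drop empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)


# Made-up sample: one body paragraph, one paragraph inside a table, one empty paragraph.
sample_html = (
    "<div><p>Body text.</p>"
    "<table><tr><td><p>Caption in a table.</p></td></tr></table>"
    "<p></p></div>"
)
collector = ParagraphCollector()
collector.feed(sample_html)
collector.close()
paragraphs = collector.paragraphs
```

If your testing never hit pages like this, the extra bookkeeping may not be worth it; I'm mostly asking whether it came up.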
Not a big deal, but I think making your sortPred a lambda would have been more idiomatic.
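Something along these lines is what I mean. This is only a sketch: I'm assuming the predictions are (sentence, score) pairs sorted by score, which may not match what your sortPred actually receives, and the data here is invented.

```python
# Hypothetical data: (sentence, score) pairs; the real script's structure may differ.
predictions = [
    ("Sentence A.", 0.12),
    ("Sentence B.", 0.87),
    ("Sentence C.", 0.45),
]


# Named key function, roughly the shape a sortPred-style helper might have.
def sort_pred(item):
    return item[1]


ranked_with_helper = sorted(predictions, key=sort_pred, reverse=True)

# Same ordering with an inline lambda, so the sort criterion sits next to the call.
ranked_with_lambda = sorted(predictions, key=lambda item: item[1], reverse=True)
```

Both calls give the same ordering; the lambda just saves a one-off named function.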