Comments (2)
Indeed, according to the website, the data was moved to the Internet Archive a week ago:
https://archive.org/download/stackexchange/stackexchange_archive.torrent
Tonight I will download the torrent, check its contents, and report back in this thread.
Thanks for pointing this out, @TFGIT!
from buildingmachinelearningsystemswithpython.
So, I looked into the torrent with KTorrent, and it contains stackoverflow.com-*.7z files of the proper sizes. In our case, we are interested in stackoverflow.com-Posts.7z. To convert it to the easier-to-digest TSV format, you might want to change https://github.com/luispedro/BuildingMachineLearningSystemsWithPython/blob/master/ch05/so_xml_to_tsv.py#L27 accordingly.
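For anyone who wants to sanity-check the extracted file before touching the book's script, here is a minimal sketch of the XML-to-TSV idea. It is not the code from so_xml_to_tsv.py. It assumes the archive unpacks to an XML file whose posts are `<row>` elements carrying attributes such as `Id`, `ParentId`, `CreationDate`, `Score`, and `Body`; the function name `posts_xml_to_tsv` and the choice of columns are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names on each <row> element; adjust to what the dump contains.
FIELDS = ("Id", "ParentId", "CreationDate", "Score", "Body")


def posts_xml_to_tsv(xml_source, tsv_target, fields=FIELDS):
    """Stream <row> elements from xml_source and write one TSV line per row.

    xml_source is a filename or a binary file object; tsv_target is a
    writable text file object. Returns the number of rows written.
    """
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # Collapse all whitespace runs so that tabs and newlines inside a
        # field cannot break the one-line-per-post TSV layout.
        values = [" ".join(elem.get(name, "").split()) for name in fields]
        tsv_target.write("\t".join(values) + "\n")
        count += 1
        elem.clear()  # drop the row's attributes once they are written
    return count
```

After extracting the archive (for example with a 7-Zip tool), this could be called as `posts_xml_to_tsv("Posts.xml", open("posts.tsv", "w", encoding="utf-8"))`, where both file names are assumptions. Parsing incrementally with `iterparse` avoids loading the whole dump into memory at once, which matters for a file of this size.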