brentozarultd / soddi
StackOverflow Data Dump Importer. Forked from https://bitbucket.org/bitpusher/soddi/ after the original author passed away.
License: Other
In v1.4 and v1.5. More info later.
In the app, if you click Help/About, everything's outdated. (Sky passed away, his web site is no longer available, Jeremiah took his mirror of the project down, and it's probably not .NET 3.5 SP1.) Let's just point people to the GitHub repo for now to read the readme there:
Novice users may find it difficult to craft their own ADO.NET connection string. For convenience, let's add Microsoft's DataConnectionDialog.
Please provide a zip-only variant of the binaries. :-)
Start SODDI, check the "Set Ids as Identity Fields" box, import data, close the app, and open it back up again. The checkbox doesn't stay checked.
(Using v1.4.)
In SQL Server:
I am using the latest version of SODDI (1.5) with the latest StackOverflow dumps (122018). The import fails every time for the Posts.XML dump file after roughly 1 hour of execution. The other dumps import successfully. I tried various BatchSize values, but it did not help.
I have verified that remote query timeout is set to 0 on the SQL Server side.
When trying to import the latest Stack Overflow data files into MySQL 5.7 using SODDI v1.5, the Users table import fails with the error:
Unknown column AccountId in field list
The recent dump (122018) is large. To speed up the XML data import, I added multiple data files in a new filegroup, 'DATA', and marked it as default. Still, SODDI created all the tables in the PRIMARY filegroup.
Could it be modified to create tables in the DEFAULT filegroup rather than in PRIMARY?
If your table's row count is smaller than the batch size setting, it'll show 0 for "Count" even when rows are successfully imported.
Repro steps:
Just adding this in Issues so we know we have a known issue.
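The row-count issue above is consistent with a counter that is only updated when a full batch is written. The following Python sketch is hypothetical and does not reflect SODDI's actual code; `import_rows` and `write_batch` are made-up names. It shows a batching loop that also flushes and counts the final partial batch, so a table smaller than the batch size still reports its rows.

```python
def import_rows(rows, batch_size, write_batch):
    """Write rows in batches and return how many rows were written.

    write_batch is a placeholder for whatever performs the bulk insert.
    """
    imported = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(batch)
            imported += len(batch)
            batch = []
    if batch:
        # Final partial batch: without this step, a table with fewer rows
        # than batch_size would be reported as 0 rows.
        write_batch(batch)
        imported += len(batch)
    return imported


written = []
count = import_rows(range(3), 5000, written.append)
print(count, written)
```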
Right now, every table has an Id field. It's not defined as an identity because in theory, users shouldn't be adding new rows to the Stack Overflow database - that just wasn't a goal for the export.
Unfortunately, with identity fields, you can't load them in parallel. To explicitly insert data into identity fields, you have to turn on identity_insert - but that can only be turned on for one table at a time:
https://docs.microsoft.com/en-us/sql/t-sql/statements/set-identity-insert-transact-sql
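To make the constraint concrete: the linked documentation states that only one table in a session can have IDENTITY_INSERT set to ON at any time. The Python sketch below generates the statement order a single session would have to follow, one table after another. `identity_insert_script` and the `load_statement` callback are hypothetical names used for illustration, and the load step is only a placeholder comment.

```python
def identity_insert_script(tables, load_statement):
    """Return T-SQL statements that load tables one at a time,
    switching IDENTITY_INSERT off before moving to the next table."""
    statements = []
    for table in tables:
        statements.append(f"SET IDENTITY_INSERT [{table}] ON;")
        # Placeholder for the actual bulk load of this table.
        statements.append(load_statement(table))
        statements.append(f"SET IDENTITY_INSERT [{table}] OFF;")
    return statements


script = identity_insert_script(
    ["Users", "Posts"],
    lambda table: f"-- load rows into [{table}] here",
)
print("\n".join(script))
```

The loads are strictly sequential within the session, which is why enabling this for every table by default would slow imports down.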
So we shouldn't do this by default across the board (because then everybody's loads will go slower).
Instead:
Tried setting the Batch Size to 500,000 (no comma), but still went through 5,000 rows at a time.
(Using v1.4)
When opening the solution file in VS2017 for the first time, there is a warning that the setup project SODDI_Setup is not supported. The way forward is unclear. A note needs to be added to provide an easier path for novice programmers who would like to complete How to Think Like the SQL Server Engine.