soddi's People

Contributors

bennett-elder, bonskijr, brentozar, jorriss

soddi's Issues

Change data types on PostHistory

In SQL Server:

  • RevisionGUID - is currently a char(36), should be a uniqueidentifier
  • Comment - is currently an ntext, should be nvarchar(max)
  • Text - is currently an ntext, should be nvarchar(max)
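For a database that has already been imported, the three changes above could be applied with ALTER TABLE statements. The following is only a sketch that builds those statements as strings: the `alter_statements` helper is hypothetical, the table name `PostHistory` in the default schema is an assumption, and the char(36) conversion assumes every existing RevisionGUID value is a valid GUID string.

```python
# Hypothetical helper, not part of SODDI: builds T-SQL for the proposed
# PostHistory type changes. Column names and target types come from the
# issue text; the table name is an assumption.
CHANGES = [
    ("RevisionGUID", "uniqueidentifier"),
    ("Comment", "nvarchar(max)"),
    ("Text", "nvarchar(max)"),
]


def alter_statements(table="PostHistory", changes=CHANGES):
    """Return one ALTER TABLE ... ALTER COLUMN statement per column.

    Nullability is deliberately not specified here; append NULL or
    NOT NULL to match the existing column definitions.
    """
    return [
        f"ALTER TABLE [{table}] ALTER COLUMN [{column}] {new_type};"
        for column, new_type in changes
    ]


if __name__ == "__main__":
    for statement in alter_statements():
        print(statement)
```

The longer-term fix would presumably be in the table creation script itself, so that new imports get these types from the start.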

Query TimeOut while Importing Data of Posts.XML after 1 Hour

I am using the latest version of SODDI (1.5) with the latest Stack Overflow dumps (122018). The import fails every time on the Posts.XML dump file after roughly 1 hour of execution. The other dump files import successfully. I tried various BatchSize values, but it did not help.

I have verified that remote query timeout is set to 0 on the SQL Server side.

Users table creation fails with an error

When trying to import the latest Stack Overflow data files into MySQL 5.7 using SODDI v1.5, the Users table import fails with the error:
Unknown column AccountId in field list

Table Creation Ignoring DEFAULT FileGroup

The recent 122018 dump is large. To speed up the XML data import, I added multiple data files to a new filegroup, 'DATA', and marked it as the default. SODDI still created all the tables in the PRIMARY filegroup.

Could it be modified to create tables in the DEFAULT filegroup rather than in PRIMARY?
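One possible shape for the change, sketched under the assumption that the tool currently appends an explicit `ON [PRIMARY]` clause to its CREATE TABLE statements (I have not confirmed how SODDI builds its DDL). In SQL Server, a CREATE TABLE with no ON clause places the table on the database's default filegroup, so omitting the clause should be enough. The `create_table_sql` helper and the column definition below are hypothetical.

```python
def create_table_sql(table, column_defs, filegroup=None):
    """Build a CREATE TABLE statement as a string.

    With filegroup=None the ON clause is omitted, which lets SQL Server
    place the table on the database's default filegroup. Passing a name
    (for example "PRIMARY") pins the table to that filegroup.
    """
    columns = ",\n    ".join(column_defs)
    sql = f"CREATE TABLE [{table}] (\n    {columns}\n)"
    if filegroup is not None:
        sql += f" ON [{filegroup}]"
    return sql + ";"


if __name__ == "__main__":
    # Hypothetical column list, for illustration only.
    print(create_table_sql("Tags", ["[Id] int NOT NULL"]))
    print(create_table_sql("Tags", ["[Id] int NOT NULL"], filegroup="PRIMARY"))
```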

"Count" column shows 0 for small tables

If your table's row count is smaller than the batch size setting, it'll show 0 for "Count" even when rows are successfully imported.

Repro steps:

  1. Set the "Batch Size" count higher than the smallest table's rows (the default of 5000 is fine for dba.stackexchange.com's Tags table)
  2. Import a relatively small site, like dba.stackexchange.com
  3. When it finishes, the "Count" column shows 0 for Tags. I'm guessing the Count is only updated each time a batch finishes. (Which is fine, we don't need an exact row count.)

Just adding this in Issues so we know we have a known issue.
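The guess above (the count only moves when a full batch finishes) can be illustrated with a small simulation. This is not SODDI's code; `import_rows` and the row count of 1200 are hypothetical, chosen only to be smaller than the default batch size of 5000.

```python
def import_rows(row_count, batch_size, flush_remainder):
    """Simulate a batched import and return the count a UI would display.

    The displayed count is bumped only when a full batch completes.
    If flush_remainder is True, the final partial batch is counted too.
    """
    displayed_count = 0
    pending = 0
    for _ in range(row_count):
        pending += 1
        if pending == batch_size:
            displayed_count += pending
            pending = 0
    if flush_remainder and pending:
        displayed_count += pending
    return displayed_count


if __name__ == "__main__":
    # Hypothetical small table: fewer rows than one batch.
    print(import_rows(1200, 5000, flush_remainder=False))
    print(import_rows(1200, 5000, flush_remainder=True))
```

If the cause is what the simulation assumes, updating the count once more after the last partial batch would be one way to make small tables show a nonzero value.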

Option to make Id fields identities

Right now, every table has an Id field. It's not defined as an identity because in theory, users shouldn't be adding new rows to the Stack Overflow database - that just wasn't a goal for the export.

Unfortunately, with identity fields, you can't load them in parallel. To explicitly insert data into identity fields, you have to turn on identity_insert - but within a session, that can only be turned on for one table at a time:

https://docs.microsoft.com/en-us/sql/t-sql/statements/set-identity-insert-transact-sql

So we shouldn't do this by default across the board (because then everybody's loads will go slower).

Instead:

  • Add a checkbox on the main UI to "Set Ids as Identity Fields"
  • During table creation, define the Id field as IDENTITY(1,1)
  • Load tables one at a time, using SET IDENTITY_INSERT ON/OFF
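The one-table-at-a-time load in the last step could follow a sequence like the one below. This is a rough sketch that only builds the ordered T-SQL steps as strings; `identity_load_plan` is a hypothetical helper, the table names in the usage are examples, and the actual row loading is left as a placeholder comment.

```python
def identity_load_plan(tables):
    """Return the ordered T-SQL steps for loading tables sequentially.

    IDENTITY_INSERT is switched ON for one table, that table is loaded,
    and it is switched OFF again before moving to the next table.
    """
    steps = []
    for table in tables:
        steps.append(f"SET IDENTITY_INSERT [{table}] ON;")
        steps.append(f"-- load rows into [{table}] here, supplying explicit Id values")
        steps.append(f"SET IDENTITY_INSERT [{table}] OFF;")
    return steps


if __name__ == "__main__":
    for step in identity_load_plan(["Users", "Posts"]):
        print(step)
```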
