Giter Club home page Giter Club logo

Comments (7)

ryan-loveland avatar ryan-loveland commented on September 3, 2024 1

@fivetran-jamie I was testing the data in the data warehouse from which Fivetran piped in Stripe data and using the built-in query writer in Stripe via Sigma. I am no longer working on this project however, so I cannot provide any additional information beyond what I provided previously.

from dbt_stripe.

ryan-loveland avatar ryan-loveland commented on September 3, 2024

🤔 I have been looking over some of the SQL templates provided by Stripe-Sigma and see that when looking at subscription invoices and invoice line items they are applying the following filter (invoice_line_items.source_type = 'subscription' or invoice_line_items.subscription is not null). I see the the package uses just the invoice_line_item.subscription_id is not null in the stripe__subscription_line_items model...

When looking at my data I am seeing some inconsistencies in whether there is a subscription_id at the invoice or invoice_line_item level as well as the usage of type.

At this point I am not sure what would be the correct value to join, filter on. My current thought is there may be a reason why it is on the invoice (main line) and not on the invoice line item (line item) or vice versa in cases of multiple line items on an invoice having their own relationship with a subscription - granted I am not sure this is even possible through Stripe - I just have seen similar structures within finance systems. Stripe including the type only makes this more confusing though.

from dbt_stripe.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on September 3, 2024

This is really interesting 🤔

I wouldn't have thought that the invoice_line_items.source_type = 'subscription' could result in null subscription_ids. Let me check with the Stripe connector PM if they have any insight into this and if they can help provide any direction on this join.

Thanks!

from dbt_stripe.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on September 3, 2024

I just wanted to provide an update here that I have not been able to get a strong understanding around the best path forward with this open issue. The limitation of not having production data is seemingly one of the largest blockers for testing out the behaviors.

If anyone else that comes along to this thread has some insight or commentary to provide, we would extremely welcome it! In the meantime I will keep investigating.

from dbt_stripe.

ryan-loveland avatar ryan-loveland commented on September 3, 2024

I did some additional testing against our production data.

First, I validated data within Stripe alone to check differences between the subscription_id between the two tables in question:

with invoice_subs as (
  
    select
      invoices.id as invoice_id,
      invoices.date as invoice_created_at,
      invoices.status as invoice_status,
      invoices.subscription_id,
      invoices.total / 100.00 as invoice_total,
      invoice_line_items.source_type,
      invoice_line_items.proration as is_proration,
      invoice_line_items.subscription as invoice_line_item_sub_id,
      invoice_line_items.amount / 100.00 as invoice_line_item_amount,
      case when invoices.subscription_id = invoice_line_items.subscription
        then true
      else false end as sub_matches,
      prices.id as price_id,
      prices.recurring_interval,
      products.id as product_id,
      products.active as is_active

    from invoices

    join invoice_line_items 
      on invoices.id = invoice_line_items.invoice_id
    left join prices
      on invoice_line_items.price_id = prices.id
    left join products
      on prices.product_id = products.id
  
  )
  
  select * from invoice_subs where sub_matches = false

I exported the data and reviewed several scenarios and pulled up invoices in the UI to confirm my findings. See below:

  • Reviewing invoices, invoice line items, subscriptions data in Stripe where the subscription_id does not match between the invoices and invoice_line_items. Total count: 4,600
    • The invoice_line_items that do not have a subscription_id and invoices does. Total count: 625
      • All but 1 were related to proration and all were invoiceitem source type.
      • Spot checking some of these and they have multiple line items that appear related to subscription changes so I would have expected a subscription_id on the invoice_line_items instead.
      • Only 16 were related to invoices where the total was greater than 0.
      • There doesn't appear to be any trends related to timing of the creation of the invoices. Seen as far back as August 2016 and as recent as the April 2022.
      • Conclusion: These invoices were either created in a way that broke the invoice_line_items relationship or there is a bug.
    • The invoice_line_items that do have a subscription_id and invoices does not is related to proration where each line item links to the current and previous subscription. Total count: 3888
      • Note: there were 3 that weren't related to proration, but they were One-time price/products - not sure why a subscription_id would be recorded for these.
      • Conclusion: Other than the 3, this was an expected difference. I do not believe it is worth investigating further.
    • Both the invoice_line_items and invoices subscription_id are null. Total count: 87
      • All but 1 are related to One-time price/products. The single exception also showed subscription as the source type. Not sure why a subscription_id wasn't recorded.
      • Conclusion: Other than the 1, this was an expected result. I do not believe it is worth investigating further.
    • Overall conclusion: There appears to be scenarios where only the invoices records a subscription_id which is a cause for concern if only relying on the invoice_line_items table. It is unclear to me if this problem is created by the implementation and creation of the invoices on our end or is caused by a bug. In either case, it may make sense to fallback to the invoices.subscription_id if there is no invoice_line_items.subscription_id.

The next step is checking against the data being synced to our data warehouse which when running a similar query gave far more results which leads me to believe there are changes happening on the invoice and/or the line items that isn't syncing or there is a delay. More to come.

Update: When running the same query against the data warehouse:

  • There was an additional 4,325 records that do not have a invoice_line_items.subscription_id.
    • All of them show as subscription type.
    • Only 28 records were prior to 01/01/22 with the oldest being Sept 2019.
    • The invoice_line_item.id appears to reflect the subscription_id for these records.
    • Conclusion: There appears to be a long delay or bug with syncing, updating the invoice_line_items records to reflect the correct IDs. This has significant impact since this value would be used to filter and join subscription data.
  • Additional testing: I did the same query using invoice_items and that came much closer to the sources differences. 4,577. Though, in both the source and destination it would appear this table has less records than the invoice_line_items table and is missing some needed columns.

from dbt_stripe.

fivetran-jamie avatar fivetran-jamie commented on September 3, 2024

hey @d3ad-pix3l, picking this up again! were there any cases in which

invoices.subscription_id != invoice_line_items.subscription_id 
and invoices.subscription_id is not null 
and invoice_line_items.subscription_id is not null

and could you explain the difference in the two environments you ran that query in? is the data warehouse your Fivetran-piped Stripe data?

from dbt_stripe.

fivetran-reneeli avatar fivetran-reneeli commented on September 3, 2024

Subscription_id is now brought into the invoice staging model. While it is brought into the downstream invoice_details model via the subscription object, it is being joined to the invoice via invoice.subscription_id.

from dbt_stripe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.