Spotlight issues from getsentry

Overlay needs to take over entire screen real estate to prevent scrolling content

Scroll bar is active and represents the main page. The overlay has focus and you should not be able to interact w/ the main page without closing it.

Event normalization

As we aim to support most of our SDKs, we'll encounter some slight variations on where specific event properties are attached.
We should add some normalization, a light version of what Relay is doing.

PHP: No type in item payload, should use item header
Python: No SDK identifier in envelope header, should use item payload
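
A minimal sketch of the two fallbacks listed above, assuming the envelope header, item header, and item payload have already been decoded into dicts. The function names are hypothetical, and the assumption that the SDK identifier lives under an "sdk" object with a "name" key should be checked against the envelope documentation.

```python
from typing import Optional


def normalize_item_type(item_header: dict, payload: dict) -> Optional[str]:
    """Prefer the payload's type; fall back to the item header (the PHP case)."""
    return payload.get("type") or item_header.get("type")


def normalize_sdk_name(envelope_header: dict, payload: dict) -> Optional[str]:
    """Prefer the envelope header's SDK name; fall back to the payload (the Python case)."""
    header_sdk = envelope_header.get("sdk") or {}
    payload_sdk = payload.get("sdk") or {}
    return header_sdk.get("name") or payload_sdk.get("name")
```

Keeping each fallback in its own small function would make it easy to add further per-SDK variations as they are discovered.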

Toolbar: Fix missing paddings and span tree view

Evaluate if necessary padding has been applied to all visual elements (including states like hover etc., see screenshot)

the contents in the overlay have no padding left (9ms)
the hover state of the HTTP span shows that the background is not reaching into the left enough
the tree rendering of spans seems off. This is not so obvious if you have additional information present like "db", but becomes clear once this is missing. Maybe we can hide the additional dash in case there is no such value like "db"

Add independent sidecar capability

As a default "Other" installation method for the sidecar, we should simply provide an 'npx @spotlightjs/sidecar' behavior to just run the proxy standalone.

[Epic] Design / UX issues

This is an epic for all the pre-launch design and UX issues that have been noticed.

Issues

Old Readme Notes

Saving here for easier access:

Zero to One

To make this production grade there's a number of changes we'd want to make.

First and foremost is how the relay works. Currently there is a sidecar implemented in both sentry-javascript and sentry-python. Whoever gets the port first wins. The implementations are independent (desirable for compatibility), but are also not equal.

The next issue we see on the relay is that it's always-on. It's a little odd behaviorally that a web server is launching when, for example, you are simply running a CLI command. This could be resolved by making the relay lazy, but if the relay is lazy it must buffer data. Per the first problem, the relays do not implement this uniformly at the moment.

Third is how the SDKs communicate with the Relay. It's just assumed it's always available on :8969. That obviously wouldn't be true, and the main concern is making sure it's only pushed to, and only running, when we're in a development environment (how do we detect that?).

On the same front of SDK communication, we'd want to pass the relay information (the port, if it's active) downstream via baggage. It's ok for SDKs to attempt to auto discover it, but it'd be a lot more reliable if our trace baggage contained the relay connection information.

For the overlay there's also a number of improvements that we'd like to make:

Cluster similar events together, primarily visually. You might hit 40 "resource not found" errors, and while you may want to look at that, it makes navigating the UI quite difficult.
SDKs could contain more information, such as the process they're running under, the number of events received, and meta configuration (release, environment, etc). They could also guide you on how to send data to Sentry (e.g. set the DSN).
More debug information would be useful locally. Take a look at Django's error page - it contains information around the HTTP request headers, environment configuration, and other settings. We could enable those kinds of debug experiences from the SDK and lock them to only working with Spotlight. Some of this should be available within the trace, and could just need better exposure.
Framework native integration. The overlay already supports the concept of "fullscreen by default", which means we could hook some frameworks' error pages and simply load Spotlight in place of their existing (or nonexistent) error page.

Technical Notes

There are some interesting choices I made, and things I learned along the way.

CSS Scoping

Because this is an embedded overlay we needed to scope styles to only our code, and at the same time we needed to make sure the parent document didn't impact us. To do this we're using a shadow DOM. This allows us to fully render all of our components, our styles (which is just Tailwind!), inside of what might as well be considered an iframe.

Relay Sidecar

The relay has a bunch of challenges to make it work:

It should be invisible to the user, thus it launches from the SDKs themselves. This also means it could attempt to launch multiple processes.
It needs to be seamless, meaning we can't install a sidecar that requires a Python runtime if we're using Node.
It needs to cleanly start up and shut down - this proved annoying in Python (and doesn't cleanly shut down).
It needs to both buffer events to deal w/ the async nature of when the overlay loads, but also expire events so future loads aren't filled with previous requests. This could be made a lot better with some kind of session ID (or if traces were more encapsulated) so you could auto hide prior sessions.

In the end I ended up with a variation of a circular buffer for keeping events around, using a time-based expiration to ensure only recent events were pushed out.
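
A minimal sketch of that idea, not the sidecar's actual code: a bounded buffer that drops the oldest event when full and filters out anything older than a cutoff when a client reads from it. The class name, default sizes, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List, Tuple


class ExpiringRingBuffer:
    """Keeps at most max_items events and hides events older than max_age_seconds."""

    def __init__(
        self,
        max_items: int = 500,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # deque(maxlen=...) silently discards the oldest entry once full.
        self._items: Deque[Tuple[float, Any]] = deque(maxlen=max_items)
        self._max_age = max_age_seconds
        self._clock = clock

    def push(self, event: Any) -> None:
        self._items.append((self._clock(), event))

    def recent(self) -> List[Any]:
        cutoff = self._clock() - self._max_age
        # Entries are stored oldest-first, so expired ones sit at the left end.
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()
        return [event for _, event in self._items]
```

A newly connected overlay would call recent() once to receive the backlog and then switch to the live stream.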

Data Quality

If anything, this project has showcased the importance of data quality. While our error data is generally good, there's a number of areas for improvement, some of which are extremely critical.

JavaScript

We inject browser related instrumentation as faux spans, except those spans are not accurate. e.g. the "browser.request" span will show up after the server has received and hydrated the request. Traces need to look correct, and this not only doesn't look correct, but it isn't.
What is and isn't a trace is extremely confusing once you start rendering full traces. When does a trace end? More importantly than that, when does it start? For example, when I click a link on the page, the trace is still going, but once the browser receives the navigation it creates a new trace.
There are a lot of cases where we fail to achieve a parent (root) transaction, meaning we're left with a bunch of siblings that have to be clustered together. This shouldn't happen, and happens even in Remix where we have end-to-end instrumentation available to us.
Various data is missing. I patched the SDK a bunch to materialize as much of the payload as I could, but for example the status attribute is sometimes not present on transactions.
Within errors, filenames are illegible. We do not strip prefixes in any situation so we're left with long absolute paths for application-relative code. This problem is even worse with node packages (and subpackages). Example of app-local code:

# what Sentry shows as filename
/home/dcramer/src/peated/apps/api/src/routes/triggerSentry.tsx

# This would be better
src/peated/apps/api/src/routes/triggerSentry.tsx

# This would be best, and accurate
~/src/routes/triggerSentry.tsx
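
A minimal sketch of the "best" variant above, assuming the SDK (or Spotlight) knows the application root directory. The function name and the app_root parameter are hypothetical; how the root would actually be discovered is not covered here.

```python
from pathlib import PurePosixPath


def shorten_filename(filename: str, app_root: str) -> str:
    """Rewrite an absolute path under app_root as an app-relative '~/...' path."""
    try:
        relative = PurePosixPath(filename).relative_to(PurePosixPath(app_root))
    except ValueError:
        # The file is outside the application root; leave it untouched.
        return filename
    return "~/" + relative.as_posix()
```

With app_root set to /home/dcramer/src/peated/apps/api, the first path in the example above would be shortened to the last one.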

When you stitch together a full trace you see a lot of gaps. One of the biggest is "what service is this". We do not have names of services, process names, or any other kind of identifying information exposed.
Orphan spans exist from setTimeout calls (no parent, but valid trace)

Notes from Development

(this is a loose collection of scribbles I made while building the POC)

Data needs to be pushed from an SDK into the Debugger. To do this we have a few constraints:

The Widget must be pull-based, as there's no way for it to accept push data natively. We're using Server Sent Events for this.
The SDK needs access to a sidecar in order to set up a push-based stream to power SSE.
The SDK will send all events to this local sidecar, and the widget will connect to it to receive events.

There's a race condition between when it's running and when the UI connects. That could be ok as long as the sidecar maintains a ring buffer backlog (TODO).

The widget is an embeddable React application. It's currently using Tailwind which wouldn't work in prod without using an IFRAME (which would mean the sidecar has to render the iframe in addition to the framework loading the trigger JS). Both the debugger and the trigger could be embedded from the same sidecar, which means the sidecar could be implemented per-language with a simple JS shim that also gets bundled (either via CDN or packaged locally, maybe both?).

TODO

Multi service POC. Both for multiple python services as well as a JS intercept.

For multi sidecar what we could do is:

Have the first process create a sidecar
Pass sidecar connection info downstream via baggage
Sub services funnel payloads back to the root sidecar, which then transfers them to the JS widget
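
The second step could be sketched as follows, assuming the sidecar URL travels as one more member of the comma-separated baggage header with a percent-encoded value. The key name "spotlight-sidecar" is hypothetical; no such key is defined by the notes above.

```python
from typing import Optional
from urllib.parse import quote, unquote

SIDECAR_KEY = "spotlight-sidecar"  # hypothetical baggage key, for illustration only


def add_sidecar_to_baggage(baggage: str, sidecar_url: str) -> str:
    """Return the baggage header with the sidecar URL added (or replaced)."""
    entry = f"{SIDECAR_KEY}={quote(sidecar_url, safe='')}"
    members = [m.strip() for m in baggage.split(",") if m.strip()]
    members = [m for m in members if not m.startswith(SIDECAR_KEY + "=")]
    return ",".join(members + [entry])


def sidecar_from_baggage(baggage: str) -> Optional[str]:
    """Extract the sidecar URL from a baggage header, if present."""
    for member in baggage.split(","):
        key, _, value = member.strip().partition("=")
        if key == SIDECAR_KEY and value:
            # Drop any ";property" suffix before decoding the value.
            return unquote(value.split(";", 1)[0])
    return None
```

A sub service that finds no such member would fall back to auto discovery.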

Per-trace clustering. You've effectively got a stream open, but if the stream changes, that is, you SPA to another location (and the core trace ID changes) it should hide all old events. This is probably a way to navigate to a "list of traces captured", and it just defaults to showing the current one.
Some kind of pinning. There's a little bit of a confusing flow that would happen if you had multiple windows open, creating multiple requests. Or if you're on a shared env and multiple people are making requests. May be solvable via the trace clustering solution.

Another issue we've hit is the fact that we need various exposure to hooks:

~~Python was somewhat easy to hook in _capture_event~~. We're now hooking the envelope endpoints. Python still easy.
JavaScript is a nightmare, and requires overrides in a number of spots. Somewhat easy in Node (extend _captureEvent or w/e). Browser is awful. Can't inject via integrations as integrations don't do a damn thing, and the only other way would be beforeSend (which probably doesn't trigger).
Some events are not fully materialized even in captureEvent. For example, Node seems to be missing some things like timestamps and event_ids in transactions.
JS SDK seems to not pass baggage if DSN is not set.
JS SDK - why does captureCheckIn not call captureEvent? It's duplicating an enabled check. More malformed data.
Sentry's frontend is generating multiple trace IDs for what should be one trace (e.g. on the issues list load)
The parent span (e.g. the root span or root transaction span) is not present in the span tree.
Sampling would need to happen outside of the payload creation, so we can still get debug information locally and only apply sampling decisions to whether we send data upstream or not.
Attachments are probably not parsed correctly out of the Envelope. Docs are quite complex to read.
When you navigate to a new page in Remix it's creating a transaction coupled to the prior trace (the origin load), and upon navigation creates a new trace. I'm not sure I'd expect this behavior.

Generally speaking we need:

Better hooks, they're all half baked and incomplete
An "event was fully processed" hook should exist in all SDKs.

Sidecar concerns:

There's a race condition for when you connect to the sidecar, which means it needs to keep a buffer of events. That said, it's possible to connect and get either 1) no events, or 2) events that are too old. We need to solve this one.
Python implementation is hypothetically superior right now, but it's got some issues w/ deadlocking the uwsgi process.

Setup POC

Pull down sentry-python (most up to date sidecar implementation)

git clone git@github.com:getsentry/sentry-python.git ~/src/sentry-python
cd ~/src/sentry-python
git checkout feat/hackweek-2023-spotlight

If you are using the Python SDK simply symlink the SDK into your project:

cd ~/src/myproject
pip install -e ~/src/sentry-python

If you are not using the Python SDK in your app, run the sidecar manually:

cd ~/src/sentry-python
python sentry_sdk/spotlight.py &

If you are using the JavaScript SDK in your app, set up the repo similarly within your project:

git clone git@github.com:getsentry/sentry-python.git ~/src/sentry-python
cd ~/src/sentry-python
git checkout feat/hackweek-2023-spotlight

Note: sentry-python has the most functional sidecar implementation currently. Both SDKs automatically attempt to keep a sidecar running, so it's a race to whoever claims the port.

Lastly, add Spotlight to your app:

import * as Spotlight from "sentry-spotlight";
Spotlight.init();

Expand toolbar design

Right now it's simplistic, which was fine for hackweek, but in practice it could be a lot more useful.

Current:

There's a bit of complexity with things like Astro, but given we hide our overlay when Astro is present I think we can ignore that.

What if we take this, allow you to define its snap point (e.g. bottomLeft, bottomRight, bottomCenter), and extend it so integrations can expose hooks?

What I'd like:

[Spotlight Icon] [Error Icon (Count)] [Trace Icon (Span Count)]

Disable Spotlight in production on Docs

We can revisit if we want to try to expose a "demo" spotlight later, but this will likely just confuse people.

Tree rendering issue on children of children

Docs: Add proxy design to Sidecar

Sidecar section needs to detail the proxy design and implementation details.

SDKs should become supplemental information

This is a carry over from hackweek when it was fully Sentry-coupled. It's still valuable information, but maybe we can combine SDKs with some kind of Meta tab that also includes integration information and versions?

e.g.

[Integrations]

Sentry - 1.0
├── @sentry/node - 3s ago - blah blah
└── @sentry/browser - blah blah
Astro - 1.0

Current display:

TODO Brain Dump

We probably want to break this out, but I wanted to get a quick brain dump of stuff I don't want to forget.

Mechanism to render Spotlight with a default error - for example if the backend has a 500 error, we want to be able to just render Spotlight with all its debug information in place of the backend's page. That just means loading the modal immediately. Some draft API is already in place for this (you have to load data sync via a hydration payload, dedupe it if it comes async, and then trigger it to open/show a specific thing)
Sidecar probably needs to have a way to inform downstream SDKs where it is. We don't really want to hardcode a port, but if we don't then we need a way to tell the other SDKs where it is. It's possible this same mechanism could tell the other SDKs that it's enabled at all, meaning they could automatically send to the sidecar or not send at all if it's not available. This would solve the "how does this still work in prod" problem.
UI needs to be extensible both from an augmentation (new panels/tabs?) and data payloads POV. This means that both the UI has to be able to register new widgets and the event source stream needs to be able to forward those payloads. I think we should POC a logging adapter as part of this (basically make a simple widget that's an integration that lets you tail/filter logs).
Protocols probably should come first. Defining what the proxy should do, how we think it should work, and doing similar for the event source stream. For example, should traces come in as "envelope" type events? "sentry:envelope" type events? This plays hand in hand with the sidecar as there's no reason it shouldn't be something that could be re-implemented in a framework or any language if it made sense.
The little toolbar/widget thing we will want to be more thoughtful about as it gets in the way. [Partner] has an idea so let's sync w/ them. Additionally we probably want a clean way to show/hide the toolbar so if it is in the way you can hide it. Maybe a default shortcut like tilde?

I will update this doc as more comes to mind.

Distributed Traces don't show all spans

This trace is a connected BE->FE trace but I only see spans from the BE. It says though that there are 48 spans. Probably something going wrong with us creating the span tree:

[v0] Spotlight 🤝 Astro

This issue is a collection of open tasks and issues that we need to implement for a v0 launch of spotlight for Astro.
Some of these tasks pose fundamental problems while many others are smaller items. This list is by no means 100% complete or specifically ordered but it should serve as an overview.

Tasks for the spotlight core package:

Spotlight Core

Finalize Sentry integration refactor (finalize integration API and routing) #19 (4 of 4)
ref(integrations): Refactor processEvent and tab content props #42
ref(integrations): Make tabs functional to restore notification counts #41
fix(sentry): Reverse stack traces of incoming Sentry error events #56
(low prio): Proper Spotlight Icon
Figure out good back navigation UX #51
(low prio): Open with keypress #158 (Type: Enhancement)
(low prio): Open up styling #157 (Type: Enhancement)

Tasks for spotlight so it works well with Astro:

Spotlight in Astro:

fix(astro): Reset button state correctly #49
Only open spotlight as window, not fullscreen
[Prio High] Show Spotlight in SSR error pages #59
[Prio High] Add Context Lines to client-side island errors #60
Spotlight Code is bundled twice in Astro Dev mode #68 (Type: Bug-confirmed)
[Astro] Show default button if dev overlay not activated #136
[Astro] Fix toolbar notification dot #137

Everything related to repo infrastructure, publishing, deployment, etc:

Repo, Publishing, Deployment:

Configure Package publishing #18
Extract and publish @spotlightjs/astro package #38
Fix Website deploy

Things that are relevant for spotlight in the Astro SDK:

Astro SDK:

Instrument Astro Server via Middleware sentry-javascript#9444 (4 of 4; Package: astro, Type: Improvement)
Test islands

Side-car related tasks (not strictly Astro-related but definitely something we want to tackle rather soon):

Sidecar

Ensure it works generally with Vite apps

Docs: Consolidate Integrations

Let's consolidate the "How to build an integration" and "List of integrations" into one section of the docs.

Docs: Document linking in development

Figure out good back navigation UX

Currently, going back (e.g. from an error detail to the errors list) is broken. It used to be possible by pressing [ESC], however it had weird interchanges with pressing ESC on top level pages (e.g. errors/traces list). I think we should re-think how navigating back is done and most obvious for users.

Ideas:

Add a back button
Intercept browser back button presses/history changes to navigate back. This might be tricky though with the React Router MemoryRouter
Bring back the [ESC] back functionality (is this obvious for users though?)
Avoid explicit back logic but reset tabs when clicking onto the tabs?

Sentry: Reject unsupported envelopes

We should reject any unknown envelope sent to the relay. There might be some complexity here, but basically we don't want to pollute the in-memory buffer with replays/attachments when they're entirely unused right now.
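
A rough sketch of such a filter, assuming the newline-delimited envelope layout (envelope header line, then an item header line followed by its payload for each item, with an optional "length" in the item header). The set of supported item types is an assumption for illustration, not a decided list.

```python
import json
from typing import List, Tuple

SUPPORTED_ITEM_TYPES = {"event", "transaction"}  # assumed allow-list, for illustration


def parse_envelope(raw: bytes) -> Tuple[dict, List[Tuple[dict, bytes]]]:
    """Split a raw envelope into its header and (item_header, payload) pairs."""
    header_line, _, rest = raw.partition(b"\n")
    envelope_header = json.loads(header_line)
    items: List[Tuple[dict, bytes]] = []
    while rest.strip():
        rest = rest.lstrip(b"\n")
        item_line, _, rest = rest.partition(b"\n")
        item_header = json.loads(item_line)
        length = item_header.get("length")
        if length is None:
            # Without an explicit length the payload runs to the next newline.
            payload, _, rest = rest.partition(b"\n")
        else:
            payload, rest = rest[:length], rest[length:]
        items.append((item_header, payload))
    return envelope_header, items


def should_accept(raw: bytes) -> bool:
    """Accept the envelope only if at least one item has a supported type."""
    _, items = parse_envelope(raw)
    return any(header.get("type") in SUPPORTED_ITEM_TYPES for header, _ in items)
```

The relay would call should_accept before pushing anything into the in-memory buffer, so replays and attachments never occupy a slot.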

Website: Add focal point of large splash image of trace details

Website layout:

[Spotlight]
[Small: Slogan]

[Large Splash Image]

[Code block with npm install, init, and run command, with links to other setups]

[Everything Else]

Docs: Document SDK requirements

How do SDKs currently identify whether they should be relaying to the proxy? Do they just try no matter what? Do I need specific SDK versions?

Docs: Document conditional load

Folks will not want to enable this in prod, so somehow, some way, we should be onboarding them to have this set up correctly using e.g. NODE_ENV

Proxy Design

Some rough thoughts on design of proxy and protocol.

The proxy is a simple HTTP server that acts as a relay. SDKs push events to this server, and Spotlight pulls them using server-sent events. This makes the implementation fairly simple, but there are a few concerns to note down:

The proxy needs to buffer events, so that a Spotlight client may connect after an SDK has sent events, and still receive history. This is likely best done via a circular buffer, but said buffer also should expire events after some reasonable time period (that is, there's no reason for the proxy to return events from 3 hours ago, even if it still has them in memory).
The protocol should be extensible without code (meaning new extensions don't require a new proxy). By default the SDK will transmit event and envelope formats, and the proxy may simply decode and reroute those through the EventSource pipe. Additionally, however, we want to enable third parties to write to and receive from this stream, which means that the EventSource protocol should probably be 1) namespaced (e.g. Sentry SDK payloads use a 'sentry:' prefix for their events), and 2) freeform, meaning payloads can be freely sent in a known format, and the UI is primarily responsible for hydrating that into what it needs.
The proxy needs to be able to fail gracefully if another is already running. This may not always be true, for example if two different implementations of a sidecar were running on two different ports, that might be ok. As long as SDKs (and anything else) can communicate downstream where the proxy is running it should ensure seamless communication.
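
To make the namespacing point concrete, here is a minimal sketch of how the proxy could frame one message on the server-sent events stream. The function name and the "logs" namespace in the usage note are hypothetical; only the 'sentry:' prefix comes from the notes above.

```python
import json
from typing import Any


def format_sse(namespace: str, event_type: str, payload: Any) -> str:
    """Frame a payload as one server-sent event named '<namespace>:<event_type>'."""
    data = payload if isinstance(payload, str) else json.dumps(payload)
    lines = [f"event: {namespace}:{event_type}"]
    # Each line of the payload needs its own "data:" field.
    lines += [f"data: {line}" for line in data.split("\n")]
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

A third-party log integration could then emit format_sse("logs", "line", text) without the proxy needing to know anything about the payload.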

Extract and publish @spotlightjs/astro package

We want to extract the Astro integration from the website package and make it a proper Astro integration package.

For the package to work properly with the npx astro add command, we need to keep a few things in mind:

The integration factory function needs to be the default export of the package
keywords need to be set according to the integrations guide
Other fields mentioned in the guide should also be set
AFAICT, we technically don't need to transpile to JS and we should be good to simply publish the TS source files. (Astro will bundle the integrations in its build). Let's see if this actually works.

Note: We can also first simply extract the package and worry about the package.json, keywords, etc later.

Make ContextLines Integration more robust

Sidecar doesn't accept gzipped Envelopes/Requests?

Add OG / Social Share Images on Website

Sentry should be always-enabled

https://spotlightjs.com/reference/configuration/

If this is really a problem we can add a flag to disable it in the future, but Sentry provides the core dataset that makes it valuable, so making this a config step seems silly.

Productionize Spotlight

Quick writeup on how this works to get people up to speed.

Overview

Spotlight leverages the data collection of the Sentry SDKs to create a better local debugging experience. It does this without using the Sentry service at all, and is simply an embedded UI application that presents the existing rich events in a similar form factor.

Technical Design

Spotlight consists of three primary concerns:

Sentry SDKs and envelope/event manifestation - Spotlight consumes raw envelopes/events, and is aware of their format and attributes.
A proxy to relay SDK events to the widget, which is required due to the nature of event generation being async (and distributed).
Lastly, the Spotlight widget, which is this repository.

SDK Changes

To make Spotlight work, all SDKs need to fully manifest envelopes even when the DSN is not configured. This means they cannot short circuit envelope generation (at least for traces and errors right now), and should delay that logic until the transport would happen.

e.g. captureException() would generate the full event, and sendEvent() would short-circuit
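
A toy sketch of that split, not the real SDK client: the capture step always builds the full event, and only the send step checks the DSN. The class, its attributes, and the spotlight_sink callback are all hypothetical names chosen for illustration.

```python
import time
import uuid
from typing import Callable, List, Optional


class SketchClient:
    def __init__(
        self,
        dsn: Optional[str],
        spotlight_sink: Optional[Callable[[dict], None]] = None,
    ) -> None:
        self.dsn = dsn
        self.spotlight_sink = spotlight_sink
        self.sent_upstream: List[dict] = []  # stands in for the real transport

    def capture_exception(self, exc: BaseException) -> dict:
        # The event is fully materialized regardless of whether a DSN is set.
        event = {
            "event_id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "exception": {"values": [{"type": type(exc).__name__, "value": str(exc)}]},
        }
        self.send_event(event)
        return event

    def send_event(self, event: dict) -> None:
        if self.spotlight_sink is not None:
            self.spotlight_sink(event)  # local consumers always see the event
        if self.dsn is None:
            return  # short-circuit only at the transport step
        self.sent_upstream.append(event)
```

With dsn=None the event still reaches the Spotlight sink, which is the behavior the paragraph above asks of the SDKs.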

SDKs would then need to be Spotlight aware, meaning they would need to check for (or blindly send to) a Spotlight proxy. This proxy is extremely simple (see later notes), and is used to take the generated SDK events and allow a JavaScript consumer to subscribe and receive them.

In an ideal world, SDKs could also launch this proxy and automatically inject spotlight, meaning you could fully utilize a Sentry SDK as simply a better debug experience, with zero additional configuration.

There are some additional notes about SDK challenges and potential changes scattered throughout the other sections.

Proxy

In order for Spotlight to receive events, we create a unidirectional channel of events. The way this works is SDKs push envelopes to a local proxy over HTTP - using the same existing transport implementations ideally - and Spotlight connects to that proxy using Server-Sent Events. This means that Spotlight can receive events from any SDK that is running locally, but may receive them with a small time delay, or out of order.

One of the biggest complexities is how this proxy server gets run. In the hackweek POC it is simply spun up within the SDK, and it fails silently if the port is already in use. To productionize this we would need a few things:

A protocol for the proxy's channel API (mostly straightforward).
A protocol for SDKs discovering if the proxy is online, potentially running the proxy themselves, and explicitly disabling running the proxy in case e.g. a framework bundles one (or one is already running elsewhere).
(Recommended) A protocol to pass the proxy's information over trace baggage to children so that they can avoid the discovery process.
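
The discovery step could start as simply as a TCP probe, sketched below. It assumes the default port 8969 mentioned earlier and only proves that something is listening there, not that the listener is actually a Spotlight proxy; a real protocol would need a handshake on top. The function name and timeout are illustrative.

```python
import socket


def is_proxy_online(host: str = "127.0.0.1", port: int = 8969, timeout: float = 0.25) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

An SDK could call this once at startup and skip launching its own proxy when it returns True.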

Importantly the proxy does two things:

It relays all events through this EventStream channel
It utilizes a ring buffer with item expiration to constrain memory consumption, handle network flakiness (and race conditions)

Note: Spotlight exposes an API to preload events, which means a framework (say Django) could inject the events it caught during server-rendering, which means Spotlight could instantly load and render debug information.

Spotlight Widget

The widget is the most straightforward part. It's simply an embedded React SPA that receives events and renders various components. It does this by showing a small widget in the corner, streaming in the various events, and rendering whatever's needed.

One technical complexity of the widget that is worth understanding is that it uses the shadow DOM APIs. It does this to avoid any conflict with the host application's stylesheets.

Going forward there's some additions we'd want in the existing POC:

The ability to augment Spotlight with additional widgets (at the very least, new tabs/panels). This means e.g. a framework like Astro could add its own debug information.
Improvements to the SDK layer to know which service is generating an event. That is, ideally services have some kind of name, so Spotlight (and Sentry) could reference which service an error came from, or triggered a portion of a trace.

Design

What do we need?

Generic component/button on the page (currently bottom right)
Component Modal: How should it show up on the page? (Fullscreen or Modal)
- Empty State
- Error Detail (Stacktrace / Source Context / various key:value pairs )
- Trace View (Spans)
Logo
Landing page describing Spotlight
Docs page (simple left-hand nav + content)

Things to keep in mind

Spotlight is a standalone project and doesn't (maybe even shouldn't) have to be in the design of Sentry
Spotlight should become a standalone community-driven project and while Sentry will be its first integration, it should support an open API so that any framework or SDK can send errors/traces to Spotlight in a local development environment
It has to look flashy while at the same time providing a lot of clarity toward important information

Goal

Users should want to have this in their dev setup, and it should feel in every way superior to the framework-provided development error pages.

Inspirations / Examples

https://flareapp.io/docs/ignition/introducing-ignition/overview
https://github.com/barryvdh/laravel-debugbar
https://reactnative.dev/blog/2020/07/06/version-0.63

React Error page

Screenshots of Spotlight (Today)

Component/Button bottom right of the page

Empty Modal

Trace View

Docs: Visual styles have odd margins

Large blank space on the left side. Maybe we want this but it feels odd. I'd expect the nav to be attached to the center column.

Local variables missing in stack trace

Scroll issues with Spotlight overlay

There are two key issues present here:

Width/overflow is not correct at least on the trace view, as there should not be a horizontal scrollbar in this case.
The vertical scroll is actually masking another issue, where it's not taking over page focus entirely, meaning you can scroll the main page when you're trying to scroll this dialog.

Traces not present from node-experimental

Going to attempt to debug. Unclear what's going on but using connect() and they're clearly getting bubbled up.

Maybe serialization issues w/ the implementation:

Trace rendering failing

Trace rendering fails with "Cannot create a faux parent" when traces are incomplete.

dcramer/peated#107

We should also consider adding an empty trace dialog, capture if there were errors, and direct people to open a GitHub issue.
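
One way to tolerate incomplete traces, sketched under assumptions: instead of synthesizing a faux parent, treat any span whose parent is missing from the payload as an additional root. This is not Spotlight's actual tree builder; the function name is hypothetical, and spans are assumed to be dicts carrying "span_id" and "parent_span_id".

```python
from typing import Dict, List


def build_span_tree(spans: List[dict]) -> List[dict]:
    """Return root nodes; spans with an unknown parent become roots instead of failing."""
    nodes: Dict[str, dict] = {s["span_id"]: {**s, "children": []} for s in spans}
    roots: List[dict] = []
    for node in nodes.values():
        parent = nodes.get(node.get("parent_span_id"))
        if parent is not None:
            parent["children"].append(node)
        else:
            # Either a true root (no parent id) or an orphan whose parent never arrived.
            roots.append(node)
    return roots
```

The UI could then render multiple roots side by side and flag the orphaned ones, which would also give the empty-trace dialog something concrete to report.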

Sidecar: OOM happens in undefined scenarios

Not entirely sure what's going on - I noticed it with what appeared to be a recursive serialization error (when trying to serialize from @sentry/remix to connect()).

We may want to 1. reduce max items, 2. add a bytes-based eviction policy to the ring buffer.
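
A minimal sketch of the second option, assuming payloads are kept as raw bytes: the buffer evicts from the oldest end until both the item count and the total byte size are within limits. The class name and default limits are illustrative, not values from the sidecar.

```python
from collections import deque
from typing import Deque, List


class ByteBoundedBuffer:
    """FIFO buffer capped by both item count and total payload bytes."""

    def __init__(self, max_items: int = 100, max_bytes: int = 1_000_000) -> None:
        self._items: Deque[bytes] = deque()
        self._max_items = max_items
        self._max_bytes = max_bytes
        self.total_bytes = 0

    def push(self, payload: bytes) -> bool:
        if len(payload) > self._max_bytes:
            return False  # a single oversized payload is dropped outright
        self._items.append(payload)
        self.total_bytes += len(payload)
        # Evict oldest entries until both limits hold again.
        while len(self._items) > self._max_items or self.total_bytes > self._max_bytes:
            self.total_bytes -= len(self._items.popleft())
        return True

    def items(self) -> List[bytes]:
        return list(self._items)
```

Rejecting oversized payloads up front would also guard against the recursive serialization case, where a single event could otherwise flush the whole buffer.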

Configure Package publishing

We want to configure publishing of the following packages:

@spotlightjs/core
@spotlightjs/sidecar

Later, we'll add more packages, for instance @spotlightjs/astro, but this can be done afterwards.

{
  "type": "transaction",
  "content_type": "application/json"
}
{
  "timestamp": 1700655103.176607,
  "platform": "PHP",
  ...

Tasks

Figure out nested inter- and intra-tab routing
Remove all event/trace/span ids from navigation context
Remove integrations from navigation context and remove the context all together
Figure out/bring back [ESC] key functionality
