getsentry / spotlight
Your Universal Debug Toolbar
Home Page: https://spotlightjs.com
License: Other
As we aim to support most of our SDKs, we'll encounter some slight variations on where specific event properties are attached.
We should add some normalization, a light version of what Relay is doing.
type in item payload, should use item header
Evaluate if necessary padding has been applied to all visual elements (including states like hover etc., see screenshot)
As a default "Other" installation method for the sidecar, we should simply provide an 'npx @spotlightjs/sidecar' behavior to just run the proxy standalone.
This is an epic for all the pre-launch design and UX issues that have been noticed.
Saving here for easier access:
To make this production grade there's a number of changes we'd want to make.
First and foremost is how the relay works. Currently there is a sidecar implemented in both sentry-javascript and sentry-python. Whoever gets the port first wins. The implementations are independent (desirable for compatibility), but are also not equal.
The next issue we see on the relay is that it's always-on. It's a little odd behaviorally that a web server is launching when, for example, you are simply running a CLI command. This could be resolved by making the relay lazy, but if the relay is lazy it must buffer data. Per the first problem, the relays do not implement this uniformly at the moment.
Third is how the SDKs communicate with the relay. It's just assumed it's always available on :8969. That obviously wouldn't be true, and the main concern is making sure it's only pushed to, and only running, when we're in a development environment (how do we detect that?).
On the same front of SDK communication, we'd want to pass the relay information (the port, if it's active) downstream via baggage. It's ok for SDKs to attempt to auto discover it, but it'd be a lot more reliable if our trace baggage contained the relay connection information.
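A minimal sketch of passing the relay location downstream via baggage. The key name `sentry-spotlight` and both helper functions are hypothetical; nothing in the SDKs defines them today.

```typescript
// Hypothetical baggage key; the notes above do not name one.
const SPOTLIGHT_BAGGAGE_KEY = "sentry-spotlight";

/** Append (or replace) the relay URL in a comma-separated baggage header value. */
function withRelayBaggage(baggage: string, relayUrl: string): string {
  const entry = `${SPOTLIGHT_BAGGAGE_KEY}=${encodeURIComponent(relayUrl)}`;
  const kept = baggage
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part !== "" && !part.startsWith(`${SPOTLIGHT_BAGGAGE_KEY}=`));
  return [...kept, entry].join(",");
}

/** Read the relay URL back out, or undefined when no relay was advertised. */
function relayFromBaggage(baggage: string): string | undefined {
  for (const part of baggage.split(",")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === SPOTLIGHT_BAGGAGE_KEY && rest.length > 0) {
      return decodeURIComponent(rest.join("="));
    }
  }
  return undefined;
}
```

A downstream SDK that finds no entry could fall back to auto discovery, or skip sending entirely.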
For the overlay there's also a number of improvements that we'd like to make:
Cluster similar events together, primarily visually. You might hit 40 "resource not found" errors, and while you may want to look at that, it makes navigating the UI quite difficult.
SDKs could contain more information, such as the process they're running under, the number of events received, and meta configuration (release, environment, etc). They could also guide you on how to send data to Sentry (e.g. set the DSN).
More debug information would be useful locally. Take a look at Django's error page - it contains information around the HTTP request headers, environment configuration, and other settings. We could enable those kinds of debug experiences from the SDK and lock them to only working with Spotlight. Some of this should be available within the trace, and could just need better exposure.
Framework native integration. The overlay already supports the concept of "fullscreen by default", which means we could hook some frameworks' error pages and simply load Spotlight in place of their existing (or non-existent) error page.
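For the first item in the list above (clustering similar events), a rough sketch could group errors by type and message. The `CapturedError` shape and the choice of grouping key are assumptions for illustration, not the overlay's actual data model.

```typescript
// Assumed minimal shape of an error as the overlay might see it.
interface CapturedError {
  type: string;
  message: string;
  timestamp: number;
}

interface ErrorCluster {
  key: string;
  count: number;
  latest: CapturedError;
}

/** Collapse errors that share a type and message into one cluster each. */
function clusterErrors(errors: CapturedError[]): ErrorCluster[] {
  const clusters = new Map<string, ErrorCluster>();
  for (const err of errors) {
    const key = `${err.type}: ${err.message}`;
    const existing = clusters.get(key);
    if (existing === undefined) {
      clusters.set(key, { key, count: 1, latest: err });
    } else {
      existing.count += 1;
      if (err.timestamp >= existing.latest.timestamp) existing.latest = err;
    }
  }
  return Array.from(clusters.values());
}
```

With this, 40 "resource not found" errors would render as a single row with a count of 40.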
There's some interesting choices I made, and I learned along the way.
Because this is an embedded overlay we needed to scope styles to only our code, and at the same time we needed to make sure the parent document didn't impact us. To do this we're using a shadow DOM. This allows us to fully render all of our components, our styles (which is just Tailwind!), inside of what might as well be considered an iframe.
The relay has a bunch of challenges to make it work:
It should be invisible to the user, thus it launches from the SDKs themselves. This also means it could attempt to launch multiple processes.
It needs to be seamless, meaning we can't install a sidecar that requires a Python runtime if we're using Node.
It needs to cleanly start up and shut down - this proved annoying in Python (and it doesn't cleanly shut down).
It needs to both buffer events to deal w/ the async nature of when the overlay loads, but also expire events so future loads aren't filled with previous requests. This could be made a lot better with some kind of session ID (or if traces were more encapsulated) so you could auto hide prior sessions.
In the end I ended up with a variation of a circular buffer for keeping events around, using a time-based expiration to ensure only recent events were pushed out.
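A minimal sketch of that idea, not the POC's actual code: a fixed-size ring that overwrites the oldest slot and filters by age on read. The class name, the capacity, and passing `now` explicitly are all assumptions made to keep it testable.

```typescript
interface Slot<T> {
  value: T;
  addedAt: number;
}

class ExpiringRingBuffer<T> {
  private readonly slots: Array<Slot<T> | undefined>;
  private readonly maxAgeMs: number;
  private nextWrite = 0; // index of the slot the next push overwrites

  constructor(capacity: number, maxAgeMs: number) {
    this.slots = new Array<Slot<T> | undefined>(capacity).fill(undefined);
    this.maxAgeMs = maxAgeMs;
  }

  push(value: T, now: number): void {
    this.slots[this.nextWrite] = { value, addedAt: now };
    this.nextWrite = (this.nextWrite + 1) % this.slots.length;
  }

  /** Items oldest-first, skipping anything older than maxAgeMs. */
  recent(now: number): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.slots.length; i++) {
      const slot = this.slots[(this.nextWrite + i) % this.slots.length];
      if (slot !== undefined && now - slot.addedAt <= this.maxAgeMs) {
        out.push(slot.value);
      }
    }
    return out;
  }
}
```

A newly connected overlay would call `recent(Date.now())` once to receive the backlog, then switch to live events.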
If anything, this project has showcased the importance of data quality. While our error data is generally good, there's a number of areas for improvement, some of which are extremely critical.
We inject browser related instrumentation as faux spans, except those spans are not accurate. e.g. the "browser.request" span will show up after the server has received and hydrated the request. Traces need to look correct, and this not only doesn't look correct, it isn't correct.
What is and isn't a trace is extremely confusing once you start rendering full traces. When does a trace end? More importantly than that, when does it start? For example, when I click a link on the page, the trace is still going, but once the browser receives the navigation it creates a new trace.
There are a lot of cases where we fail to achieve a parent (root) transaction, meaning we're left with a bunch of siblings that have to be clustered together. This shouldn't happen, and happens even in Remix where we have end-to-end instrumentation available to us.
Various data is missing. I patched the SDK a bunch to materialize as much of the payload as I could, but for example the status attribute is sometimes not present on transactions.
Within errors, filenames are illegible. We do not strip prefixes in any situation so we're left with long absolute paths for application-relative code. This problem is even worse with node packages (and subpackages). Example of app-local code:
# what Sentry shows as filename
/home/dcramer/src/peated/apps/api/src/routes/triggerSentry.tsx
# This would be better
src/peated/apps/api/src/routes/triggerSentry.tsx
# This would be best, and accurate
~/src/routes/triggerSentry.tsx
When you stitch together a full trace you see a lot of gaps. One of the biggest is "what service is this". We do not have names of services, process names, or any other kind of identifying information exposed.
Orphan spans exist from setTimeout calls (no parent, but valid trace)
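The filename shortening suggested in the list above ("this would be best") can be sketched as a prefix strip. It assumes the SDK somehow knows the application root; the `projectRoot` parameter is hypothetical and nothing in the current payload provides it.

```typescript
/**
 * Replace the application root with "~/" so frames show app-relative paths.
 * Paths outside the root are returned unchanged.
 */
function shortenFilename(filename: string, projectRoot: string): string {
  const root = projectRoot.endsWith("/") ? projectRoot : projectRoot + "/";
  return filename.startsWith(root) ? "~/" + filename.slice(root.length) : filename;
}
```

Called with the path from the example and a root of /home/dcramer/src/peated/apps/api, this yields ~/src/routes/triggerSentry.tsx. Node packages would need a separate rule, which this sketch does not attempt.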
(this is a loose collection of scribbles I made while building the POC)
Data needs to be pushed from an SDK into the Debugger. To do this we have a few constraints:
There's a race condition between when it's running and when the UI connects. That could be ok as long as the sidecar maintains a ring buffer backlog (TODO).
The widget is an embeddable React application. It's currently using Tailwind which wouldn't work in prod without using an IFRAME (which would mean the sidecar has to render the iframe in addition to the framework loading the trigger JS). Both the debugger and the trigger could be embedded from the same sidecar, which means the sidecar could be implemented per-language with a simple JS shim that also gets bundled (either via CDN or packaged locally, maybe both?).
For multi sidecar what we could do is:
Per-trace clustering. You've effectively got a stream open, but if the stream changes, that is, you SPA to another location (and the core trace ID changes), it should hide all old events. There is probably a way to navigate to a "list of traces captured", and it just defaults to showing the current one.
Some kind of pinning. There's a little bit of a confusing flow that would happen if you had multiple windows open, creating multiple requests. Or if you're on a shared env and multiple people are making requests. May be solvable via the trace clustering solution.
Another issue we've hit is the fact that we need various exposure to hooks:
Python was somewhat easy to hook in _capture_event. We're now hooking the envelope endpoints. Python still easy.
JavaScript is a nightmare, and requires overrides in a number of spots. Somewhat easy in Node (extend _captureEvent or w/e). Browser is awful. Can't inject via integrations as integrations don't do a damn thing, and the only other way would be beforeSend (which probably doesn't trigger).
Some events are not fully materialized even in captureEvent. For example, Node seems to be missing some things like timestamps and event_ids in transactions.
JS SDK seems to not pass baggage if DSN is not set.
JS SDK - why does captureCheckIn not call captureEvent? It's duplicating an enabled check. More malformed data.
Sentry's frontend is generating multiple trace IDs for what should be one trace (e.g. on the issues list load)
The parent span (e.g. the root span or root transaction span) is not present in the span tree.
Sampling would need to happen outside of the payload creation, so we can still get debug information locally and only apply sampling decisions to if we send data upstream or not.
Attachments are probably not parsed correctly out of the Envelope. Docs are quite complex to read.
When you navigate to a new page in Remix it's creating a transaction coupled to the prior trace (the origin load), and upon navigation creates a new trace. I'm not sure I'd expect this behavior.
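One item in the list above asks for sampling to happen outside of payload creation. A rough sketch of that split, with hypothetical names throughout (`routeCapturedEvent`, `TransportTargets`); this is not how any Sentry SDK is structured today.

```typescript
interface TransportTargets<E> {
  sendToSpotlight?: (item: E) => void;
  sendUpstream?: (item: E) => void;
}

/**
 * The event is always fully built before this is called.
 * Sampling only decides whether it leaves the machine.
 */
function routeCapturedEvent<E>(
  item: E,
  targets: TransportTargets<E>,
  sampleRate: number,
  random: () => number = Math.random,
): void {
  // Local debugging always gets the full event, regardless of sampling.
  if (targets.sendToSpotlight) targets.sendToSpotlight(item);
  // The sampling decision gates only the upstream send.
  if (targets.sendUpstream && random() < sampleRate) targets.sendUpstream(item);
}
```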
Generally speaking we need:
Sidecar concerns:
There's a race condition for when you connect to the sidecar, which means it needs to keep a buffer of events. That said, it's possible to connect and get either 1) no events, or 2) events that are too old. We need to solve this one.
Python implementation is hypothetically superior right now, but it's got some issues w/ deadlocking the uwsgi process.
Pull down sentry-python
(most up to date sidecar implementation)
git clone git@github.com:getsentry/sentry-python.git ~/src/sentry-python
cd ~/src/sentry-python
git checkout feat/hackweek-2023-spotlight
If you are using the Python SDK simply symlink the SDK into your project:
cd ~/src/myproject
pip install -e ~/src/sentry-python
If you are not using the Python SDK in your app, run the sidecar manually:
cd ~/src/sentry-python
python sentry_sdk/spotlight.py &
If you are using the JavaScript SDK in your app, set up the repo similarly within your project:
git clone git@github.com:getsentry/sentry-python.git ~/src/sentry-python
cd ~/src/sentry-python
git checkout feat/hackweek-2023-spotlight
Note: sentry-python has the most functional sidecar implementation currently. Both SDKs automatically attempt to keep a sidecar running, so it's a race for whoever claims the port first.
Lastly, add Spotlight to your app:
import * as Spotlight from "sentry-spotlight";
Spotlight.init();
Right now it's simplistic, which was fine for hackweek, but in practice it could be a lot more useful.
Current:
There's a bit of complexity with things like Astro, but given we hide our overlay when Astro is present I think we can ignore that.
What if we take this, allow you to define its snap point (e.g. bottomLeft, bottomRight, bottomCenter), and extend it so integrations can expose hooks?
What I'd like:
[Spotlight Icon] [Error Icon (Count)] [Trace Icon (Span Count)]
We can revisit if we want to try to expose a "demo" spotlight later, but this will likely just confuse people.
Sidecar section needs to detail the proxy design and implementation details.
This is a carry over from hackweek when it was fully Sentry-coupled. It's still valuable information, but maybe we can combine SDKs with some kind of Meta tab that also includes integration information and versions?
e.g.
[Integrations]
Sentry - 1.0
├── @sentry/node - 3s ago - blah blah
└── @sentry/browser - blah blah
Astro - 1.0
Current display:
We probably want to break this out, but I wanted to get a quick brain dump of stuff I don't want to forget.
Mechanism to render Spotlight with a default error - for example if the backend has a 500 error, we want to be able to just render Spotlight with all its debug information in place of the backend's page. That just means loading the modal immediately. Some draft API is already in place for this (you have to load data synchronously via a hydration payload, dedupe it if it also arrives asynchronously, and then trigger it to open/show a specific thing).
Sidecar probably needs to have a way to inform downstream SDKs where it is. We don't really want to hardcode a port, but if we don't then we need a way to tell the other SDKs where it is. It's possible this same mechanism could tell the other SDKs that it's enabled at all, meaning an SDK could automatically send to the sidecar, or not send at all if it's not available. This would solve the "how does this still work in prod?" question.
UI needs to be extensible both from an augmentation (new panels/tabs?) and data payloads POV. This means that the UI has to be able to register new widgets, and the event source stream needs to be able to forward those payloads. I think we should POC a logging adapter as part of this (basically make a simple widget that's an integration that lets you tail/filter logs).
Protocols probably should come first. Defining what the proxy should do, how we think it should work, and doing similar for the event source stream. For example, should traces come in as "envelope" type events? "sentry:envelope" type events? This plays hand in hand with the sidecar, as there's no reason it shouldn't be something that could be re-implemented in a framework or any language if it made sense.
The little toolbar/widget thing we will want to be more thoughtful about as it gets in the way. [Partner] has an idea so let's sync w/ them. Additionally we probably want a clean way to show/hide the toolbar so if it is in the way you can hide it. Maybe a default shortcut like tilde?
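The dedupe step from the first item in the list above (hydration payload first, then the same events arriving over the stream) can be sketched as a store keyed by event ID. This assumes every event carries an `event_id`, which the notes elsewhere say is not always true for transactions; the `DedupingEventStore` name is made up.

```typescript
interface IdentifiedEvent {
  event_id: string;
}

class DedupingEventStore<E extends IdentifiedEvent> {
  private readonly byId = new Map<string, E>();

  /** Returns true if the event was new, false if it was already known. */
  add(item: E): boolean {
    if (this.byId.has(item.event_id)) return false;
    this.byId.set(item.event_id, item);
    return true;
  }

  /** All known events in insertion order. */
  all(): E[] {
    return Array.from(this.byId.values());
  }
}
```

The hydration payload would be added synchronously at load, and anything the stream replays afterwards is dropped by `add` returning false.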
I will update this doc as more comes to mind.
This issue is a collection of open tasks and issues that we need to implement for a v0 launch of spotlight for Astro.
Some of these tasks pose fundamental problems while many others are smaller items. This list is by no means 100% complete or specifically ordered but it should serve as an overview.
Tasks for the spotlight core package:
Tasks for spotlight so it works well with Astro:
Everything related to repo infrastructure, publishing, deployment, etc:
Things that are relevant for spotlight in the Astro SDK:
Side-car related tasks (not strictly Astro-related but definitely something we want to tackle rather soon):
Let's consolidate the "How to build an integration" and "List of integrations" into one section of the docs.
See also #81
Need to easily articulate how folks should develop against Spotlight, which often requires a semi-real world app to test behaviors. aka step us dumb dumbs through using pnpm link or w/e
Currently, going back (e.g. from an error detail to the errors list) is broken. It used to be possible by pressing [ESC], however it had weird interchanges with pressing ESC on top level pages (e.g. errors/traces list). I think we should re-think how navigating back is done and most obvious for users.
Ideas:
MemoryRouter
We should reject any unknown envelope sent to the relay. There might be some complexity here, but basically we don't want to pollute the in-memory buffer with replays/attachments when they're entirely unused right now.
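A minimal sketch of that filter, assuming items have already been parsed into a header and payload. The accepted list here ("event" and "transaction") is a guess at what the UI currently renders, not a confirmed list.

```typescript
// Assumption: only these item types are used by the UI right now.
const ACCEPTED_ITEM_TYPES = new Set<string>(["event", "transaction"]);

interface EnvelopeItem {
  header: { type: string };
  payload: unknown;
}

/** Keep only items the overlay can render; drop replays, attachments, etc. */
function filterAcceptedItems(items: EnvelopeItem[]): EnvelopeItem[] {
  return items.filter((item) => ACCEPTED_ITEM_TYPES.has(item.header.type));
}
```

If every item in an envelope is dropped, the relay can skip buffering it entirely.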
Website layout:
[Spotlight]
[Small: Slogan]
[Large Splash Image]
[Code block with npm install, init, and run command, with links to other setups]
[Everything Else]
How do SDKs currently identify whether they should be relaying to the proxy? Do they just try no matter what? Do I need specific SDK versions?
Folks will not want to enable this in prod, so somehow, someway we should be onboarding them to have this setup correctly using e.g. NODE_ENV
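One conservative way to do this, as a sketch: only enable Spotlight when NODE_ENV is explicitly "development". The function name is hypothetical, and whether an unset NODE_ENV should count as development is an open question this sketch answers with "no".

```typescript
/**
 * Enable Spotlight only for an explicit development environment.
 * An unset or unknown NODE_ENV is treated as "not development".
 */
function shouldEnableSpotlight(env: Record<string, string | undefined>): boolean {
  return env["NODE_ENV"] === "development";
}
```

In a Node app this would be called as shouldEnableSpotlight(process.env) before calling the init function.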
Some rough thoughts on design of proxy and protocol.
The proxy is a simple HTTP server that acts as a relay. SDKs push events to this server, and Spotlight pulls them using server-sent events. This makes the implementation fairly simple, but there are a few concerns to note down:
The proxy needs to buffer events, so that a Spotlight client may connect after an SDK has sent events, and still receive history. This is likely best done via a circular buffer, but said buffer also should expire events after some reasonable time period (that is, there's no reason for the proxy to return events from 3 hours ago, even if it still has them in memory).
The protocol should be extensible without code (meaning new extensions don't require a new proxy). By default the SDK will transmit event and envelope formats, and the proxy may simply decode and reroute those through the EventSource pipe. Additionally, however, we want to enable third parties to write to and receive from this stream, which means that the EventSource protocol should probably be 1) namespaced (e.g. Sentry SDK payloads use a 'sentry:' prefix for their events), and 2) freeform, meaning payloads can be freely sent in a known format, and the UI is primarily responsible for hydrating that into what it needs.
The proxy needs to be able to fail gracefully if another is already running. This may not always be true, for example if two different implementations of a sidecar were running on two different ports, that might be ok. As long as SDKs (and anything else) can communicate downstream where the proxy is running it should ensure seamless communication.
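The namespacing idea from the list above can be sketched at the wire level. Server-sent events use `event:` and `data:` fields with a blank line ending each message; the `namespace:kind` naming is the proposal from these notes, and the helper name is made up.

```typescript
/**
 * Serialize one payload as a server-sent event with a namespaced event name,
 * e.g. "sentry:envelope". Multi-line payloads become multiple data: lines.
 */
function toServerSentEvent(namespace: string, kind: string, payload: string): string {
  const dataLines = payload
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${namespace}:${kind}\n${dataLines}\n\n`;
}
```

On the UI side, an integration would subscribe with EventSource's addEventListener using the same namespaced name, so the proxy never needs to know what a given payload means.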
We want to extract the Astro integration from the website package and make it a proper Astro integration package.
For the package to work properly with the npx astro add command, we need to keep a few things in mind:
keywords need to be set according to the integrations guide
Note: We can also first simply extract the package and worry about the package.json, keywords, etc later.
https://spotlightjs.com/reference/configuration/
If this is really a problem we can add a flag to disable it in the future, but Sentry provides the core dataset that makes it valuable, so making this a config step seems silly.
Quick writeup on how this works to get people up to speed.
Spotlight leverages the data collection of the Sentry SDKs to create a better local debugging experience. It does this without using the Sentry service at all, and is simply an embedded UI application that presents the existing rich events in a similar form factor.
Spotlight consists of three primary concerns:
To make Spotlight work, all SDKs need to fully manifest envelopes even when the DSN is not configured. This means they cannot short circuit envelope generation (at least for traces and errors right now), and should delay that logic until the transport would happen.
e.g. captureException() would generate the full event, and sendEvent() would short-circuit
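A toy sketch of that ordering. `SketchClient` is not the real Sentry client and none of these signatures come from an SDK; it only illustrates building the full event first and deferring the DSN check to the send step.

```typescript
interface BuiltEvent {
  event_id: string;
  message: string;
  timestamp: number;
}

class SketchClient {
  readonly sentUpstream: BuiltEvent[] = [];
  private readonly dsn: string | undefined;
  private readonly spotlightSink: (built: BuiltEvent) => void;
  private nextId = 1;

  constructor(dsn: string | undefined, spotlightSink: (built: BuiltEvent) => void) {
    this.dsn = dsn;
    this.spotlightSink = spotlightSink;
  }

  /** Always builds the full event, even when no DSN is configured. */
  captureException(error: Error): BuiltEvent {
    const built: BuiltEvent = {
      event_id: `sketch-${this.nextId++}`,
      message: error.message,
      timestamp: Date.now() / 1000,
    };
    this.sendEvent(built);
    return built;
  }

  /** The DSN short-circuit lives here, after Spotlight has seen the event. */
  private sendEvent(built: BuiltEvent): void {
    this.spotlightSink(built);
    if (this.dsn === undefined) return;
    this.sentUpstream.push(built); // stand-in for the real transport
  }
}
```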
SDKs would then need to be Spotlight aware, meaning they would need to check for (or blindly send to) a Spotlight proxy. This proxy is extremely simple (see later notes), and is used to take the generated SDK events and allow a JavaScript consumer to subscribe and receive them.
In an ideal world, SDKs could also launch this proxy and automatically inject spotlight, meaning you could fully utilize a Sentry SDK as simply a better debug experience, with zero additional configuration.
There are some additional notes about SDK challenges and potential changes scattered throughout the other sections.
In order for Spotlight to receive events, we create a unidirectional channel of events. The way this works is SDKs push envelopes to a local proxy over HTTP - using the same existing transport implementations ideally - and Spotlight connects to that proxy using Server-Sent Events. This means that Spotlight can receive events from any SDK that is running locally, but may receive them with a small time-delay*, or out of order.
One of the biggest complexities is how this proxy server gets run. In the hackweek POC it is simply spun up within the SDK, and it fails silently if the port is already in use. To productionize this we would need a few things:
Importantly the proxy does two things:
Note: Spotlight exposes an API to preload events, which means a framework (say Django) could inject the events it caught during server-rendering, which means Spotlight could instantly load and render debug information.
The widget is the most straightforward part. It's simply an embedded React SPA that receives events and renders various components. It does this by showing a small widget in the corner, streaming in the various events, and rendering whatever's needed.
One technical complexity of the widget that is worth understanding is that it uses the shadow DOM APIs. It does this to avoid any conflict with the host application's stylesheets.
Going forward there's some additions we'd want in the existing POC:
Users should want to have this in their dev setup, and it should feel superior in every way to the framework-provided development error pages.
https://flareapp.io/docs/ignition/introducing-ignition/overview
https://github.com/barryvdh/laravel-debugbar
https://reactnative.dev/blog/2020/07/06/version-0.63
There are two key issues present here:
Trace rendering fails with "Cannot create a faux parent" when traces are incomplete.
We should also consider adding an empty trace dialog, capture if there were errors, and direct people to open a GitHub issue.
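A sketch of a more tolerant grouping, not the current Spotlight code: spans whose parent is missing are treated as roots, and a faux root is synthesized whenever there is not exactly one root, so an incomplete or empty trace still yields a tree. The field names follow the span payload; the "faux-root" ID and the function name are made up. Cyclic parent references are not handled.

```typescript
interface FlatSpan {
  span_id: string;
  parent_span_id?: string;
  op: string;
}

interface SpanNode {
  span: FlatSpan;
  children: SpanNode[];
}

/** Build a tree from flat spans; never throws on orphans or an empty list. */
function buildSpanTree(spans: FlatSpan[]): SpanNode {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) nodes.set(span.span_id, { span, children: [] });

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parentId = node.span.parent_span_id;
    const parent = parentId === undefined ? undefined : nodes.get(parentId);
    // A span whose parent was never received is promoted to a root.
    if (parent !== undefined) parent.children.push(node);
    else roots.push(node);
  });

  const onlyRoot = roots[0];
  if (roots.length === 1 && onlyRoot !== undefined) return onlyRoot;
  // Zero or several roots: cluster them under a synthetic parent.
  return { span: { span_id: "faux-root", op: "faux" }, children: roots };
}
```

An empty result (a faux root with no children) is where the "empty trace" dialog mentioned above could be shown.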
Not entirely sure what's going on - I noticed it with what appeared to be a recursive serialization error (when trying to serialize from @sentry/remix to connect()).
We may want to 1. reduce max items, 2. add a bytes-based eviction policy to the ring buffer.
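A rough sketch of option 2 combined with an item cap. It approximates size with string length (UTF-16 code units, not true bytes) to stay dependency-free; the class name and both limits are hypothetical.

```typescript
class ByteBoundedBuffer {
  private readonly items: string[] = [];
  private readonly maxItems: number;
  private readonly maxBytes: number;
  private totalBytes = 0;

  constructor(maxItems: number, maxBytes: number) {
    this.maxItems = maxItems;
    this.maxBytes = maxBytes;
  }

  /** Append a serialized payload, then evict oldest-first until within both limits. */
  push(serialized: string): void {
    this.items.push(serialized);
    this.totalBytes += serialized.length; // approximation, see note above
    while (
      this.items.length > 0 &&
      (this.items.length > this.maxItems || this.totalBytes > this.maxBytes)
    ) {
      const evicted = this.items.shift();
      if (evicted !== undefined) this.totalBytes -= evicted.length;
    }
  }

  snapshot(): string[] {
    return this.items.slice();
  }
}
```

Note that a single payload larger than the byte limit evicts everything including itself, which may or may not be the behavior we want for a runaway serialization.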
We want to configure publishing of the following packages:
@spotlightjs/core
@spotlightjs/sidecar
Later, we'll add more packages, for instance @spotlightjs/astro, but this can be done afterwards.
e.g. how do we get this onto npm/etc
Let's add a "Contributions" section or similar to Docs to contain this.
Some SDKs do not send a type: property in the item payload.
However, the type is required in the item header, so let's use this instead.
{
"type": "transaction",
"content_type": "application/json"
}
{
"timestamp": 1700655103.176607,
"platform": "PHP",
...
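A sketch of reading the type from the item header instead of the payload. It makes a simplifying assumption that every item payload is single-line JSON, which ignores length-prefixed and binary items; the `parseEnvelopeItems` name is made up.

```typescript
interface ParsedItem {
  type: string;
  payload: Record<string, unknown>;
}

/**
 * Parse a newline-delimited envelope: first line is the envelope header,
 * followed by (item header, item payload) line pairs.
 */
function parseEnvelopeItems(raw: string): ParsedItem[] {
  const lines = raw.split("\n").filter((line) => line.trim() !== "");
  const items: ParsedItem[] = [];
  for (let i = 1; i + 1 < lines.length; i += 2) {
    const headerLine = lines[i];
    const payloadLine = lines[i + 1];
    if (headerLine === undefined || payloadLine === undefined) break;
    const itemHeader = JSON.parse(headerLine) as { type?: string };
    const payload = JSON.parse(payloadLine) as Record<string, unknown>;
    const payloadType = payload["type"];
    // Prefer the item header; fall back to the payload only if the header lacks it.
    const type =
      itemHeader.type ?? (typeof payloadType === "string" ? payloadType : "unknown");
    items.push({ type, payload });
  }
  return items;
}
```

For the PHP example above, the payload has no type property, and the item is still classified as a transaction because the header says so.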
Replace Miro embed with native image. Excalidraw probably easiest.
Using a tabbed code block.