Comments (13)
For storing/mmap-ing on demand, it looks like we want to improve this on the WAL side of TSDB anyway (e.g. https://matrix.to/#/!WaUKIfoqfiyWQhenET:matrix.org/$1568229168815166NkByc:matrix.org?via=matrix.org). We should start a design discussion on Prometheus at some point. However, this will not solve querying, as you mentioned.
For querying those two facts that you mentioned are crucial:
One instance generates 10 series. This is - at least in my experience - a lot less than the usual metric series count per instance.
What's the instance here? Anyway - yea the cardinality will be smaller for sure.
One datapoint is comparatively much bigger than with metrics. A cpu/mem profile is tens of kilobytes, a trace is tens of megabytes - as opposed to a float being 64 bits.
Let’s say we want to query long timespans. At query time, we are only interested in metadata. So the datapoint sizes will negatively impact the hot-path performance, even though we don’t need them here (as opposed to metric-scraping prometheus, where we are aggregating data over time while querying).
I think those summarize very well the problem we are solving here. No aggregations needed, huge sample data and low cardinality.
To me, it means that:
- Lowering the min block size would help a lot. We have a smaller index due to the lower cardinality, so it might actually bring some benefits.
- Fetching on demand from the WAL would help, but even then, while constructing a block we would probably need to keep all or most of it in memory.
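To put rough numbers on "huge sample data and low cardinality", a back-of-envelope sketch (the per-profile size and scrape interval here are assumptions based on the figures above, not measurements):

```go
package main

import "fmt"

// estimateDailyBytes returns the bytes one scrape target produces per
// day, given its series count, the scrape interval in minutes and an
// average sample size in bytes.
func estimateDailyBytes(series, intervalMin, sampleBytes int) int {
	samplesPerDay := (24 * 60 / intervalMin) * series
	return samplesPerDay * sampleBytes
}

func main() {
	// Assumption: ~10 series per target, one profile per minute,
	// ~50KB per profile (a cpu/mem profile is tens of kilobytes).
	profiles := estimateDailyBytes(10, 1, 50*1024)
	metrics := estimateDailyBytes(10, 1, 8) // a float64 sample is 8 bytes
	fmt.Printf("profiles: %.1f GB/day, metrics: %d KB/day\n",
		float64(profiles)/(1<<30), metrics/1024)
	// prints: profiles: 0.7 GB/day, metrics: 112 KB/day
}
```

Even with only 10 series per target, the payload size dominates: the same sample rate stored as plain float64 metrics is nearly four orders of magnitude smaller.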
I think the idea you propose - storing only references to the profiles in other storage (e.g. an object store) instead of the profiles themselves - might make sense. We might want to look at how Loki does this, as it solves a pretty similar problem (metadata + payload).
from parca.
(I'm about to go on vacation, so I apologize in advance for delayed responses after this one)
First of all, I think it's super awesome that you are getting involved and want to improve things. I highly appreciate the effort you are putting in, and I want to work with you to improve this situation! :)
The way I've been thinking about it is: Prometheus TSDB has some of the exact same problems, just on a much smaller scale in regards to the sample value size. Prometheus TSDB WAL already writes segments of 128MB on disk, but keeps the data in-memory. My idea was that if we could mmap the segments instead of just using them to re-create the in-memory representation of the WAL, then we would only have things in memory that we are actually querying + at most 128MB. To be clear, I realize that "just mmap-ing" is unlikely to be enough; it might require us to rethink the segment format a bit to allow for a strategy like this, as well as adapt the read path to potentially be aware of it, kind of as a way of layered caches.
@bwplotka @krasi-georgiev you know the TSDB and WAL code better than I do, do you think this could be feasible?
from parca.
Just saying, I'm also concerned about the read path, because a lot of raw bytes are put into the data structure we use for querying even though they don't need to be there (later we do a single get for the single datapoint of interest).
(reiterating this as I'd like to hear their thoughts on this too, because maybe I'm trying to fix a problem that just isn't there)
It would also hugely impact query performance if using some kind of remote storage if I understand those mechanisms correctly. Downloading multiple TB from object storage and scanning them, even though we only really care about metadata at this stage, is fairly suboptimal.
from parca.
Could you save me a bit of time digging through the conprof code and give an example of the data format that you are saving in tsdb at the moment?
Also, what are the most common queries to get that data?
from parca.
I think that's the code conprof/tsdb@70f0d4a
We're saving go pprof/trace files, which are, from the tsdb point of view, byte slices ranging from a few tens of kilobytes up to a few tens of megabytes.
I'm not sure what you mean by most common queries:
Usually you want to scan some small timespan for datapoints with the given labels, but you are not interested in the actual data stored there. Then you select a datapoint to open the pprof. That's when you get an iterator for the wanted series, seek to the single timestamp you're interested in, and read the bytes there.
storageFetcher := func(_ string, _, _ time.Duration) (*profile.Profile, string, error) {
    q, err := p.db.Querier(0, math.MaxInt64)
    if err != nil {
        level.Error(p.logger).Log("err", err)
        return nil, "", err
    }
    defer q.Close()

    ss, err := q.Select(m...)
    if err != nil {
        level.Error(p.logger).Log("err", err)
        return nil, "", err
    }
    if !ss.Next() {
        return nil, "", errors.New("no series matched the given matchers")
    }
    it := ss.At().Iterator()

    t, err := stringToInt(timestamp)
    if err != nil {
        return nil, "", err
    }
    if !it.Seek(t) {
        return nil, "", errors.New("no sample found at the requested timestamp")
    }
    _, buf := it.At()
    prof, err := profile.Parse(bytes.NewReader(buf))
    return prof, "", err
}
Currently there are no aggregations.
from parca.
What's the instance here? Anyway - yea the cardinality will be smaller for sure.
By one instance I mean one scrape target. One scrape target has ~10 different kinds of profiles to collect, each of which creates one series.
from parca.
Yeah, I think the simplest solution would be to just keep a reference to the file path in tsdb and, when needed, open the file from disk rather than keeping the raw bytes in memory.
from parca.
I think that's actually pretty reasonable - essentially using tsdb just as an index. In order for that to work, though, we absolutely need isolation/MVCC in tsdb, otherwise we're gonna have a bad time. I know it's been on our plate for a long time; maybe it's time to finally finish it.
from parca.
Why would isolation/MVCC be required? The access to the files would be read-only, so more than a single process can read from the same file.
from parca.
No, when inserting we need isolation between the series showing up in the index and the samples being appended to it.
from parca.
Could you please expand on the problem? As I see it you'd first write the file to some storage layer (file system, object storage, etc.) and after it's closed (and immutable) write its file path to TSDB, the same way you write a byte slice now.
from parca.
Yes you're right, doing it in that order should solve that problem.
from parca.
We did a major re-work of the storage as part of the Conprof -> Parca re-brand: https://www.parca.dev/docs/storage
from parca.