06chaynes / http-cache
A caching middleware that follows HTTP caching rules
Home Page: https://http-cache.rs/
License: Apache License 2.0
Hi, would love a changelog on the project, so we know what changes between versions - especially the more major releases (eg 0.3 -> 0.4)
Thanks!
When I use a client with this middleware installed to send any request (even one using an "uncacheable" HTTP method such as PATCH) the middleware attempts to clone the request, which results in an error for uncloneable requests, such as those with a streaming body. For example:
let patch_response_result = client
    .patch(upload_file_url.clone())
    .header("Content-Type", "application/offset+octet-stream")
    .header("Tus-Resumable", "1.0.0")
    .header("Upload-Offset", bytes_acknowledged)
    .body(Body::wrap_stream(strm))
    .send()
    .await;
yields: Middleware error: Request object is not cloneable. Are you passing a streaming body?
I believe the issue is here: https://github.com/06chaynes/http-cache/blob/main/http-cache-reqwest/src/lib.rs#L138 (gets called regardless of whether the request is cacheable)
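One possible shape for a fix, sketched with stand-in types rather than the real reqwest or http-cache API: check whether the method is cacheable first, and only attempt the clone when it is. `Request`, `try_clone`, `Plan`, and `plan` below are hypothetical names used for illustration, and treating only GET and HEAD as cacheable is an assumption of this sketch.

```rust
/// Stand-in for an HTTP request (hypothetical type, not reqwest's).
#[derive(Debug)]
pub struct Request {
    pub method: String,
    /// `None` models a streaming body that cannot be cloned.
    pub body: Option<Vec<u8>>,
}

impl Request {
    /// Returns a copy only when the body is fully buffered.
    pub fn try_clone(&self) -> Option<Request> {
        self.body.as_ref().map(|b| Request {
            method: self.method.clone(),
            body: Some(b.clone()),
        })
    }
}

/// Assumption for this sketch: only GET and HEAD are treated as cacheable.
pub fn is_cacheable_method(method: &str) -> bool {
    matches!(method.to_ascii_uppercase().as_str(), "GET" | "HEAD")
}

/// What the middleware would do with a request.
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Forward the request untouched; no clone is attempted.
    PassThrough,
    /// A copy could be made, so the cache path can be used.
    UseCache,
}

/// Decides how to handle a request, cloning only on the cacheable path.
pub fn plan(req: &Request) -> Result<Plan, String> {
    if !is_cacheable_method(&req.method) {
        return Ok(Plan::PassThrough);
    }
    match req.try_clone() {
        Some(_copy) => Ok(Plan::UseCache),
        None => Err("Request object is not cloneable".to_string()),
    }
}
```

With this ordering, a PATCH request with a streaming body is forwarded without ever reaching the clone, while a GET with a streaming body still reports the error.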
I am wanting to use this library in a reverse proxy.
However, I would like GET requests to be cached without expiring until I receive a POST request for the same origin.
I can control the HTTP headers of each server without any problems.
However, to implement this logic, I needed Clear-Site-Data to be taken into account in the cache system.
Would you know if it is possible to do what I would like and if there is any chance Clear-Site-Data will be implemented?
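A minimal sketch of the requested behaviour, assuming an in-memory store grouped by origin. `OriginCache` and its methods are hypothetical and not part of http-cache; the only detail taken from the Clear-Site-Data header itself is that its value is a comma-separated list of quoted directives such as "cache" or "*".

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache grouped by origin.
#[derive(Default)]
pub struct OriginCache {
    // origin -> (url -> stored body)
    entries: HashMap<String, HashMap<String, Vec<u8>>>,
}

impl OriginCache {
    /// Stores a body for `url` under `origin`.
    pub fn put(&mut self, origin: &str, url: &str, body: Vec<u8>) {
        self.entries
            .entry(origin.to_string())
            .or_default()
            .insert(url.to_string(), body);
    }

    /// Looks up a stored body.
    pub fn get(&self, origin: &str, url: &str) -> Option<&Vec<u8>> {
        self.entries.get(origin)?.get(url)
    }

    /// Drops every entry for `origin` when the header value contains the
    /// "cache" or "*" directive. Returns whether a purge was requested.
    pub fn apply_clear_site_data(&mut self, origin: &str, header_value: &str) -> bool {
        let wants_purge = header_value
            .split(',')
            .map(|d| d.trim().trim_matches('"'))
            .any(|d| d == "cache" || d == "*");
        if wants_purge {
            self.entries.remove(origin);
        }
        wants_purge
    }
}
```

In the reverse-proxy scenario described above, the proxy would call `apply_clear_site_data` on the response to the POST, so later GETs for that origin miss the cache and are fetched again.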
See docs
Enhancement: Store Deserialized HttpResponse for Improved Performance
To enhance the performance of the HTTP cache library, consider adding the capability to store the deserialized version of the HttpResponse. This would help developers avoid repeated deserialization after retrieving from the cache.
Current Behavior:
HttpResponse, developers need to deserialize the response every time.
Proposed Enhancement:
HttpResponse.
Let me know what you think?
For my current usage of this library it would be great if we could:
Happy to propose a PR and thank you for this amazing crate!
chop chop
First, thanks for http-cache, I'm using it in various places in Stencila and it's been working great!
I'm currently using it as part of our Docker / OCI integration. I am finding that it is caching the response despite the Cache-Control: no-cache header being set on the request (note "x-cache-lookup": "HIT" below). Is this intended, or am I doing something wrong or misinterpreting the standard?
let response = self
    .get(["/blobs/", digest].concat())
    .header("Cache-Control", "no-cache")
    .send()
    .await?;
println!("{:#?}", response);
Response {
    url: Url {
        scheme: "http",
        cannot_be_a_base: false,
        username: "",
        password: None,
        host: Some(
            Domain(
                "localhost",
            ),
        ),
        port: Some(
            5000,
        ),
        path: "/v2/test/blobs/sha256:125a6e411906fe6b0aaa50fc9d600bf6ff9bb11a8651727ce1ed482dc271c24c",
        query: None,
        fragment: None,
    },
    status: 200,
    headers: {
        "docker-content-digest": "sha256:125a6e411906fe6b0aaa50fc9d600bf6ff9bb11a8651727ce1ed482dc271c24c",
        "x-content-type-options": "nosniff",
        "x-cache": "HIT",
        "x-cache-lookup": "HIT",
        "content-length": "30421006",
        "content-type": "application/octet-stream",
        "docker-distribution-api-version": "registry/2.0",
        "etag": "\"sha256:125a6e411906fe6b0aaa50fc9d600bf6ff9bb11a8651727ce1ed482dc271c24c\"",
        "accept-ranges": "bytes",
        "date": "Sun, 29 May 2022 22:08:28 +0000",
        "cache-control": "max-age=31536000",
        "age": "0",
    },
}
I can add a random ?cache-buster= to the URL to work around it, but that feels a bit hackish.
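For reference, under the HTTP caching rules (RFC 9111) a no-cache request directive asks the cache not to reuse a stored response without successful validation against the origin server. A minimal sketch of that check follows; `request_forces_revalidation`, `Lookup`, and `decide` are hypothetical helpers for illustration and do not describe http-cache's internals.

```rust
/// Returns true when a request `Cache-Control` value contains `no-cache`.
pub fn request_forces_revalidation(cache_control: &str) -> bool {
    cache_control
        .split(',')
        // Directive names are case-insensitive; drop any "=value" part.
        .map(|d| d.split('=').next().unwrap_or("").trim().to_ascii_lowercase())
        .any(|name| name == "no-cache")
}

/// Outcome of a cache lookup in this sketch.
#[derive(Debug, PartialEq)]
pub enum Lookup {
    /// Nothing usable is stored; go to the network.
    Miss,
    /// A fresh stored response may be returned directly.
    ServeStored,
    /// A stored response exists but must be validated first.
    Revalidate,
}

/// Decides what to do given whether a fresh entry exists and the
/// request's `Cache-Control` header value, if any.
pub fn decide(has_fresh_entry: bool, request_cache_control: Option<&str>) -> Lookup {
    if !has_fresh_entry {
        return Lookup::Miss;
    }
    match request_cache_control {
        Some(v) if request_forces_revalidation(v) => Lookup::Revalidate,
        _ => Lookup::ServeStored,
    }
}
```

Under this reading, the request shown above would be revalidated against the registry rather than served straight from the cache.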
In reqwest-middleware-cache I added url to store, which is supposed to store the response URL, but this implementation seems to be storing the request URL. The two can differ when the HTTP client is following redirects (it's the whole point of storing the URL in the cache).
In reqwest, redirect handling is baked into its core, and the response URL can be retrieved from its Response object. In surf, however, redirect handling is a middleware. I believe the consequence is that cache behavior would change depending on the registration order of middlewares, though I haven't tested it. It's not ideal, but I'm not sure if it can be fixed...
In my use case it'd be useful to have the information whether a request hit the cache or was sent to the server. Is it possible to get that info?
Thanks for a useful caching implementation!
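The debug output in the earlier no-cache report shows "x-cache" and "x-cache-lookup" headers on a response that went through the middleware. Assuming those headers are present on every response, a small helper could classify it. The header map here is a plain HashMap stand-in rather than reqwest's HeaderMap, and the "MISS" value is an assumption, since only "HIT" appears in this document.

```rust
use std::collections::HashMap;

/// Classification of a response in this sketch.
#[derive(Debug, PartialEq)]
pub enum CacheStatus {
    Hit,
    Miss,
    /// Header absent or carrying an unexpected value.
    Unknown,
}

/// Reads the "x-cache" header (lower-case key assumed) and maps it
/// to a `CacheStatus`.
pub fn cache_status(headers: &HashMap<String, String>) -> CacheStatus {
    match headers.get("x-cache").map(|v| v.trim().to_ascii_uppercase()) {
        Some(v) if v == "HIT" => CacheStatus::Hit,
        Some(v) if v == "MISS" => CacheStatus::Miss,
        _ => CacheStatus::Unknown,
    }
}
```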
When using [CacheMode::IgnoreRules], how do I manually handle cache expiration?
http-cache/http-cache/src/managers/cacache.rs
Lines 10 to 13 in 20bbf6e
cacache's methods accept anything that implements AsRef<Path>, and if you already have a PathBuf (say from dirs) it is harder to get a String. (But it is easy to go from a String to a PathBuf via PathBuf::from.)
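A minimal sketch of what the suggestion could look like: a manager that stores a PathBuf and accepts anything convertible into one. `DiskManager` is a hypothetical type for illustration, not the crate's actual CACacheManager definition.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical cache manager that keeps its directory as a `PathBuf`.
#[derive(Debug, Clone)]
pub struct DiskManager {
    path: PathBuf,
}

impl DiskManager {
    /// Accepts `&str`, `String`, `PathBuf`, and other `Into<PathBuf>` types.
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    /// Borrowed view of the cache directory; `&Path` implements
    /// `AsRef<Path>`, so it can be handed to APIs that take that bound.
    pub fn path(&self) -> &Path {
        &self.path
    }
}
```

Callers holding a String keep working unchanged, while callers holding a PathBuf no longer need a lossy conversion to a string.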
For some of the requests we make with reqwest, I'd like to store not the response itself but some transformed (much smaller) data derived from it. Is that possible with this library? I'm imagining something like:
if let Some((transformed, http_cache_info)) = cache.get(cache_key) {
    // Returns None if the cache is fresh
    if let Some((fresh_response, http_cache_info)) = reqwest_maybe_get_cached(request, cache_key).await? {
        // Cache is outdated
        let transformed = parse(fresh_response.error_for_status()?.text().await?)?;
        cache.store((transformed, http_cache_info))
    } else {
        // The transformed response is fresh
        transformed
    }
} else {
    // No cache hit
    let (fresh_response, http_cache_info) = reqwest_get_cached(request, cache_key).await?;
    let transformed = parse(fresh_response.error_for_status()?.text().await?)?;
    cache.store((transformed, http_cache_info))
}
For a different request type we make an initial range request and then several range requests to extract some data, where i need effectively the same functionality: Make an initial HEAD request and either return the cached response or extract some header and continue with some (uncached) GET range requests.
I tried setting mode to ForceCache, and playing around with options, to no avail. Is this simply not implemented?
I have an HTTP cache that I've built up using Python requests-cache. It's in an SQLite database, as is the default for that project. I'm considering porting the scraper to Rust, but I don't want to lose the gigabytes of cached responses I've already fetched.
Is there a straightforward way to use this SQLite database with http-cache? It looks like the existing DB uses Python pickle to serialize the data, so... I'm guessing not. But I thought I'd ask just in case anyone has addressed this use case before.
The goal is to add WASM support without changing much of the functionality.
I don't have much to explain. Comments to improve the description, or edits to the issue, would be appreciated.
About Bounty claim:
I request @06chaynes to setup something like algora to officially put bounty or we can settle it with Paypal or Github Sponsors (one time) or UPI(for Indians).
Edit: this is quite urgent, I am sorry but I would have to put a deadline as 31st Dec 2023, 12am IST(+5:30).
Hi and thanks for your code. I am trying to cache JSON responses from a REST API. I have tried both the DarkBird and CACache cache managers. The API request is:
pub async fn process_file_through_api(file_path: &PathBuf, client: &ClientWithMiddleware) -> Result<()> {
    let url = "http://localhost:8000/general/v0/general";
    println!("Converting PDF file {}...", file_path.display());
    // Read the file content
    let file_content = fs::read(&file_path).await.unwrap();
    // Build the multipart form that carries the PDF and its options
    let form = reqwest::multipart::Form::new()
        .text("strategy", "hi_res")
        .text("languages", "eng")
        .text("pdf_infer_table_structure", "false")
        .part(
            "files",
            multipart::Part::bytes(file_content)
                .file_name(file_path.file_name().unwrap().to_str().unwrap().to_string()),
        );
    let response = client
        .post(url)
        .header("Accept", "application/json")
        .header("api-key", "XXXXXXX".to_string())
        .multipart(form)
        .timeout(Duration::from_secs(3 * 60)) // 3 minutes in seconds
        .send()
        .await?;
    Ok(())
}
The request uses reqwest::multipart to construct the form to be submitted. The code works fine with reqwest without the middleware but fails with both CACache and DarkBird.
Here is the trace:
Error: Middleware(Middleware error: Request object is not cloneable. Are you passing a streaming body?
Caused by:
Request object is not cloneable. Are you passing a streaming body?
Stack backtrace:
0: std::backtrace_rs::backtrace::dbghelp::trace
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
1: std::backtrace_rs::backtrace::trace_unsynchronized
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
2: std::backtrace::Backtrace::create
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\backtrace.rs:331
3: std::backtrace::Backtrace::capture
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\backtrace.rs:297
4: anyhow::kind::Boxed::new
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.77\src\kind.rs:116
5: http_cache_reqwest::from_box_error
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\http-cache-reqwest-0.12.0\src\lib.rs:193
6: core::ops::function::FnOnce::call_once<enum2$<reqwest_middleware::error::Error> (*)(alloc::boxed::Box<dyn$<core::error::Error,core::marker::Send,core::marker::Sync>,alloc::alloc::Global>),tuple$<alloc::boxed::Box<dyn$<core::error::Error,core::marker::Send
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\ops\function.rs:250
7: enum2$<core::result::Result<tuple$<>,alloc::boxed::Box<dyn$<core::error::Error,core::marker::Send,core::marker::Sync>,alloc::alloc::Global> > >::map_err<tuple$<>,alloc::boxed::Box<dyn$<core::error::Error,core::marker::Send,core::marker::Sync>,alloc::alloc
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\result.rs:829
8: http_cache_reqwest::impl$2::handle::async_block$0<http_cache::managers::cacache::CACacheManager>
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\http-cache-reqwest-0.12.0\src\lib.rs:214
9: core::future::future::impl$1::poll<alloc::boxed::Box<dyn$<core::future::future::Future<assoc$<Output,enum2$<core::result::Result<reqwest::async_impl::response::Response,enum2$<reqwest_middleware::error::Error> > > > >,core::marker::Send>,alloc::alloc::Glo
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\future\future.rs:125
10: core::future::future::impl$1::poll<alloc::boxed::Box<dyn$<core::future::future::Future<assoc$<Output,enum2$<core::result::Result<reqwest::async_impl::response::Response,enum2$<reqwest_middleware::error::Error> > > > >,core::marker::Send>,alloc::alloc::Glo
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\future\future.rs:125
11: reqwest_middleware::client::impl$1::execute_with_extensions::async_fn$0
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\reqwest-middleware-0.2.4\src\client.rs:160
12: reqwest_middleware::client::impl$4::send::async_fn$0
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\reqwest-middleware-0.2.4\src\client.rs:314
13: testbirdhttp::process_file_through_api::async_fn$0
at .\src\main.rs:94
14: testbirdhttp::scan_folder::async_fn$0
at .\src\main.rs:129
15: testbirdhttp::main::async_block$0
at .\src\main.rs:170
16: tokio::runtime::park::impl$4::block_on::closure$0<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\park.rs:282
17: tokio::runtime::coop::with_budget
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\coop.rs:107
18: tokio::runtime::coop::budget
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\coop.rs:73
19: tokio::runtime::park::CachedParkThread::block_on<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\park.rs:282
20: tokio::runtime::context::blocking::BlockingRegionGuard::block_on<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\context\blocking.rs:66
21: tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure$0<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\scheduler\multi_thread\mod.rs:87
22: tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<testbirdhttp::main::async_block_env$0> >,enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\context\runtime.rs:65
23: tokio::runtime::scheduler::multi_thread::MultiThread::block_on<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\scheduler\multi_thread\mod.rs:86
24: tokio::runtime::runtime::Runtime::block_on<enum2$<testbirdhttp::main::async_block_env$0> >
at C:\Users\enric\.cargo\registry\src\index.crates.io-6f17d22bba15001f\tokio-1.35.1\src\runtime\runtime.rs:350
25: testbirdhttp::main
at .\src\main.rs:178
26: core::ops::function::FnOnce::call_once<enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > (*)(),tuple$<> >
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\core\src\ops\function.rs:250
27: std::sys_common::backtrace::__rust_begin_short_backtrace<enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > (*)(),enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > >
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\std\src\sys_common\backtrace.rs:154
28: std::rt::lang_start::closure$0<enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > >
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\std\src\rt.rs:166
29: std::rt::lang_start_internal::closure$2
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\rt.rs:148
30: std::panicking::try::do_call
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\panicking.rs:504
31: std::panicking::try
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\panicking.rs:468
32: std::panic::catch_unwind
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\panic.rs:142
33: std::rt::lang_start_internal
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library\std\src\rt.rs:148
34: std::rt::lang_start<enum2$<core::result::Result<tuple$<>,enum2$<reqwest_middleware::error::Error> > > >
at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962\library\std\src\rt.rs:165
35: main
36: invoke_main
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
37: __scrt_common_main_seh
at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
38: BaseThreadInitThunk
39: RtlUserThreadStart)
error: process didn't exit successfully: `target\debug\testbirdhttp.exe` (exit code: 1)
Any suggestions?
Thanks and Happy New Year