Comments (14)
Can confirm persistence is not broken on 0.5.1. This is a regression into release channel
from skytable.
This needs bisection
from skytable.
Bisection (to be updated ..)
from skytable.
This is the offending commit: ce466eb. Bisection complete
from skytable.
Is this related to the cloned file descriptor?
from skytable.
Updates:
flock::FileLock::reacquire
doesn't cause problems
from skytable.
This patch fixes it: bisect-bad-persistence-0.5.2.patch
from skytable.
Yes, the cloned descriptor is the source of errors although it isn't immediately clear why
from skytable.
This is the best patch we can have with a reacquire:
diff --git a/server/src/coredb/mod.rs b/server/src/coredb/mod.rs
index 76f1fe4..2887bbd 100644
--- a/server/src/coredb/mod.rs
+++ b/server/src/coredb/mod.rs
@@ -286,7 +286,7 @@ impl CoreDB {
bgsave: BGSave,
snapshot_cfg: SnapshotConfig,
restore_file: Option<PathBuf>,
- ) -> TResult<(Self, Option<flock::FileLock>, flock::FileLock)> {
+ ) -> TResult<(Self, Option<flock::FileLock>)> {
let coretable = diskstore::get_saved(restore_file)?;
let mut background_tasks: usize = 0;
if !bgsave.is_disabled() {
@@ -324,13 +324,12 @@ impl CoreDB {
));
let lock = flock::FileLock::lock("data.bin")
.map_err(|e| format!("Failed to acquire lock on data file with error '{}'", e))?;
- let cloned_descriptor = lock.try_clone()?;
if bgsave.is_disabled() {
- Ok((db, Some(lock), cloned_descriptor))
+ Ok((db, Some(lock)))
} else {
// Spawn the BGSAVE service in a separate task
tokio::spawn(diskstore::bgsave_scheduler(db.clone(), bgsave, lock));
- Ok((db, None, cloned_descriptor))
+ Ok((db, None))
}
}
/// Create an empty in-memory table
diff --git a/server/src/dbnet/mod.rs b/server/src/dbnet/mod.rs
index 598142f..38516b4 100644
--- a/server/src/dbnet/mod.rs
+++ b/server/src/dbnet/mod.rs
@@ -44,8 +44,7 @@ use crate::config::PortConfig;
use crate::config::SnapshotConfig;
use crate::config::SslOpts;
use crate::dbnet::tcp::Listener;
-use crate::diskstore::{self, snapshot::DIR_REMOTE_SNAPSHOT};
-use diskstore::flock;
+use crate::diskstore::snapshot::DIR_REMOTE_SNAPSHOT;
mod tcp;
use crate::CoreDB;
use libsky::TResult;
@@ -316,17 +315,16 @@ pub async fn run(
snapshot_cfg: SnapshotConfig,
sig: impl Future,
restore_filepath: Option<PathBuf>,
-) -> (CoreDB, flock::FileLock) {
+) -> CoreDB {
let (signal, _) = broadcast::channel(1);
let (terminate_tx, terminate_rx) = mpsc::channel(1);
- let (db, lock, cloned_descriptor) =
- match CoreDB::new(bgsave_cfg, snapshot_cfg, restore_filepath) {
- Ok((db, lock, cloned_descriptor)) => (db, lock, cloned_descriptor),
- Err(e) => {
- log::error!("ERROR: {}", e);
- process::exit(0x100);
- }
- };
+ let (db, lock) = match CoreDB::new(bgsave_cfg, snapshot_cfg, restore_filepath) {
+ Ok((db, lock)) => (db, lock),
+ Err(e) => {
+ log::error!("ERROR: {}", e);
+ process::exit(0x100);
+ }
+ };
match fs::create_dir_all(&*DIR_REMOTE_SNAPSHOT) {
Ok(_) => (),
Err(e) => match e.kind() {
@@ -388,12 +386,14 @@ pub async fn run(
log::info!("Signalling all workers to shut down");
}
}
- server.finish_with_termsig().await;
+ {
+ server.finish_with_termsig().await;
+ }
if let Some(Err(e)) = lock.map(|mut val| val.unlock()) {
log::error!("Failed to release lock on data file with '{}'", e);
process::exit(0x100);
}
- (db, cloned_descriptor)
+ db
}
/// This is a **test only** function
diff --git a/server/src/diskstore/flock.rs b/server/src/diskstore/flock.rs
index 9f6a313..9f37756 100644
--- a/server/src/diskstore/flock.rs
+++ b/server/src/diskstore/flock.rs
@@ -97,18 +97,6 @@ impl FileLock {
// Now write to the file
self.file.write_all(bytes)
}
- pub fn try_clone(&self) -> Result<Self> {
- Ok(Self {
- file: self.file.try_clone()?,
- unlocked: false,
- })
- }
- /// Reacquire a lock
- pub fn reacquire(&mut self) -> Result<()> {
- Self::_lock(&self.file)?;
- self.unlocked = false;
- Ok(())
- }
}
impl Drop for FileLock {
diff --git a/server/src/main.rs b/server/src/main.rs
index b0b0074..2d461c3 100644
--- a/server/src/main.rs
+++ b/server/src/main.rs
@@ -77,10 +77,10 @@ fn main() {
.enable_all()
.build()
.unwrap();
- let (db, mut descriptor) = runtime.block_on(async {
+ let db = runtime.block_on(async {
let (tcplistener, bgsave_config, snapshot_config, restore_filepath) =
check_args_and_get_cfg().await;
- let (db, descriptor) = run(
+ let db = run(
tcplistener,
bgsave_config,
snapshot_config,
@@ -88,7 +88,7 @@ fn main() {
restore_filepath,
)
.await;
- (db, descriptor)
+ db
});
// Make sure all background workers terminate
drop(runtime);
@@ -97,19 +97,21 @@ fn main() {
1,
"Maybe the compiler reordered the drop causing more than one instance of CoreDB to live at this point"
);
- // Try to acquire lock almost immediately
- if let Err(e) = descriptor.reacquire() {
- log::error!("Failed to reacquire lock on data file with error: '{}'", e);
- panic!("FATAL: data file relocking failure");
- }
- if let Err(e) = flush_db!(db, descriptor) {
+ let mut lock = match diskstore::flock::FileLock::lock("data.bin") {
+ Ok(lck) => lck,
+ Err(e) => {
+ log::error!("Failed to reacquire lock on data file with '{}'", e);
+ std::process::exit(0x100);
+ }
+ };
+ if let Err(e) = flush_db!(db, lock) {
log::error!("Failed to flush data to disk with '{}'", e);
loop {
// Keep looping until we successfully write the in-memory table to disk
log::warn!("Press enter to try again...");
io::stdout().flush().unwrap();
io::stdin().read(&mut [0]).unwrap();
- if let Ok(_) = flush_db!(db, descriptor) {
+ if let Ok(_) = flush_db!(db, lock) {
log::info!("Successfully saved data to disk");
break;
} else {
from skytable.
Patch updated to remove reacquire
from skytable.
We should backport this fix and rollout to 0.5.3
from skytable.
Fix backported: https://github.com/skytable/skytable/tree/0.5.2/backports/fix-storage
from skytable.
This should be tagged into 0.5.3
from skytable.
Backport tagged and pushed into release channel: https://github.com/skytable/skytable/releases/tag/v0.5.3
from skytable.
Related Issues (20)
- Feature: Subscribe to inserts / updates on a table
- Feature: Supply initial schema to the server HOT 1
- Feature: Support 'cargo install' for the Skytable binaries
- Tracking issue: Skytable 0.8.0 HOT 3
- how is multithreading achieved? through glommio or monoio? HOT 4
- Feature: 1. golang client connector, 2. distributed / cluster. HOT 2
- Feature: disk backed? HOT 1
- Infra: Resume M1 builds HOT 5
- I'm running into a weird issue with subprocess output HOT 2
- Performance: PGO applicability HOT 2
- Feature: Support key filter api HOT 3
- Feature: support uuid as key HOT 1
- Bug: `lskeys` with `u64::MAX` crashes the server HOT 4
- Infra: Resume ARM builds HOT 3
- Feature: Raft clustering HOT 1
- Feature: Range queries HOT 1
- CLI: Error when storing a json object with python HOT 2
- Docs: how to specify keyspace and table with the skysh CLI without REPL? HOT 2
- Release track: Skytable Octave (0.8) HOT 6
- [Octave] Data upgrades from 0.7 to 0.8 HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from skytable.