ark-builders / arklib
Core of the programs in ARK family
License: MIT License
At the moment, we write all fields into a single JSON: URL, title and description; all these fields are treated as parts of the resource. This causes ResourceId to be recalculated every time the title or description changes. ResourceId should depend only on the URL. The rest must be written into the upcoming metadata storage.
This changes the very core of ARK Shelf app (mobile and desktop versions).
We are interested in the overall performance of index construction, i.e. both pure hashing speed and the number of collisions are important. Note that there are at least 2 kinds of t1ha function: t1ha0 (the fastest) and t1ha1 (portable).
We should also measure t1ha2 and t1ha3 since they should have fewer collisions.
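A minimal sketch of such a measurement, assuming a hypothetical helper named measure that times a hash function over a set of inputs and counts collisions between distinct inputs. The standard library hasher is only a stand-in here; a real benchmark would plug in the t1ha variants instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;
use std::time::{Duration, Instant};

/// Result of running one hash function over a set of inputs.
#[derive(Debug, PartialEq)]
pub struct HashReport {
    /// Wall-clock time spent hashing all inputs.
    pub elapsed: Duration,
    /// Number of inputs whose hash was already produced by different content.
    pub collisions: usize,
}

/// Hashes every input with `hash` and counts collisions between distinct inputs.
pub fn measure<F>(inputs: &[Vec<u8>], hash: F) -> HashReport
where
    F: Fn(&[u8]) -> u64,
{
    let started = Instant::now();
    let hashes: Vec<u64> = inputs.iter().map(|bytes| hash(bytes.as_slice())).collect();
    let elapsed = started.elapsed();

    // Collision counting is kept out of the timed section.
    let mut seen: HashMap<u64, &[u8]> = HashMap::new();
    let mut collisions = 0;
    for (bytes, id) in inputs.iter().zip(hashes) {
        match seen.get(&id).copied() {
            Some(previous) if previous != bytes.as_slice() => collisions += 1,
            Some(_) => {} // identical content, not a collision
            None => {
                seen.insert(id, bytes.as_slice());
            }
        }
    }
    HashReport { elapsed, collisions }
}

/// Stand-in candidate (assumption): the standard library hasher, not t1ha.
pub fn std_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}
```

Running measure once per candidate over the same inputs gives comparable numbers for both speed and collisions.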
It seems that we have 3 different ways to provide the created_at timestamp:
.ark/cache/metadata
.ark/user/properties
.ark/index
Here is an example:
https://github.com/mozilla/uniffi-rs/blob/main/examples/todolist/src/todolist.udl
We have bindings for Android in https://github.com/ARK-Builders/arklib-android but we maintain them manually. Uniffi-rs might help reduce boilerplate and the need to update bindings manually.
If it also provides equivalent bindings for iOS from the same file, it's especially beneficial.
This repo must be the "single source of truth" for the whole ARK project.
We also have the ArkLib Android repo, which provides bindings to the Rust code for Android as well as some other functionality. That extra functionality will go to this repo eventually.
An important file from the Android repo defines storage path constants:
object ArkFiles {
const val ARK_FOLDER = ".ark"
const val STATS_FOLDER = "stats"
const val FAVORITES_FILE = "favorites"
// User-defined data
const val TAG_STORAGE_FILE = "user/tags"
const val SCORE_STORAGE_FILE = "user/scores"
const val PROPERTIES_STORAGE_FOLDER = "user/properties"
// Generated data
const val METADATA_STORAGE_FOLDER = "cache/metadata"
const val PREVIEWS_STORAGE_FOLDER = "cache/previews"
const val THUMBNAILS_STORAGE_FOLDER = "cache/thumbnails"
}
These constants must be moved to this repo and be imported to the Android repo via bindings.
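On the Rust side, the same constants could look roughly like the sketch below. The values are copied from the Kotlin object above; the storage_path helper is hypothetical and only illustrates how a path under a root folder would be resolved.

```rust
use std::path::{Path, PathBuf};

pub const ARK_FOLDER: &str = ".ark";
pub const STATS_FOLDER: &str = "stats";
pub const FAVORITES_FILE: &str = "favorites";

// User-defined data
pub const TAG_STORAGE_FILE: &str = "user/tags";
pub const SCORE_STORAGE_FILE: &str = "user/scores";
pub const PROPERTIES_STORAGE_FOLDER: &str = "user/properties";

// Generated data
pub const METADATA_STORAGE_FOLDER: &str = "cache/metadata";
pub const PREVIEWS_STORAGE_FOLDER: &str = "cache/previews";
pub const THUMBNAILS_STORAGE_FOLDER: &str = "cache/thumbnails";

/// Hypothetical helper: resolves a storage location relative to a root folder.
pub fn storage_path<P: AsRef<Path>>(root: P, storage: &str) -> PathBuf {
    root.as_ref().join(ARK_FOLDER).join(storage)
}
```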
By using a Rust crate we can solve both issues with PDFs in ARK Navigator:
ARK-Builders/ARK-Navigator#153
ARK-Builders/ARK-Navigator#157
Maybe something from here?
Previews should be possible to generate in both low and high quality: the function should accept a quality parameter with acceptable values low, medium, high, where high looks nice on a laptop and is zoomable. For easier verification, this function should have a dedicated command in this tool: https://github.com/ARK-Builders/ark-cli which would accept a path to a PDF file and save JPG/PNG to another file.
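The quality parameter could be modelled as an enum that the CLI parses from its argument. This is only a sketch: the name PreviewQuality and the error type are assumptions, and the actual rendering is left out.

```rust
use std::str::FromStr;

/// Hypothetical quality levels for PDF previews.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PreviewQuality {
    Low,
    Medium,
    High,
}

impl FromStr for PreviewQuality {
    type Err = String;

    /// Parses the values `low`, `medium` and `high`, ignoring ASCII case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "low" => Ok(Self::Low),
            "medium" => Ok(Self::Medium),
            "high" => Ok(Self::High),
            other => Err(format!("unknown preview quality: {}", other)),
        }
    }
}
```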
It's necessary to initialize compound indexes, store them somewhere in the library process, allow re-using them, and allow using them as normal indexes. Aggregated indexes must provide an interface as close to the "plain" index interface as possible. Aggregated indexes should use plain indexes as shards and delegate operations to them (execute an operation on all shards; if any succeeded, return its result).
https://github.com/ARK-Builders/ARK-Navigator/blob/1d6cfa9a15d95a2ca1d7628042142f972931393f/app/src/main/java/space/taran/arknavigator/mvp/model/repo/index/AggregatedResourcesIndex.kt
Am I right that after we loaded Rust library in an Android app, we can:
Because the same index should be re-usable in different aggregated indexes, and the app can also request the same aggregated index several times, it would be wasteful to re-construct them every time.
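The shard delegation described in this issue could be sketched as follows. The Index trait, its single path_of method and the u64 id are assumptions made for illustration; the real index interface is larger. Shards are held behind Arc so one plain index can be shared by several aggregated indexes without being rebuilt.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::Arc;

/// Placeholder id type for this sketch.
pub type ResourceId = u64;

/// Minimal "plain" index interface assumed for this sketch.
pub trait Index {
    fn path_of(&self, id: ResourceId) -> Option<PathBuf>;
}

pub struct PlainIndex {
    id2path: HashMap<ResourceId, PathBuf>,
}

impl PlainIndex {
    pub fn new(entries: Vec<(ResourceId, PathBuf)>) -> Self {
        Self { id2path: entries.into_iter().collect() }
    }
}

impl Index for PlainIndex {
    fn path_of(&self, id: ResourceId) -> Option<PathBuf> {
        self.id2path.get(&id).cloned()
    }
}

/// Uses plain indexes as shards and exposes the same interface.
pub struct AggregatedIndex {
    shards: Vec<Arc<dyn Index>>,
}

impl AggregatedIndex {
    pub fn new(shards: Vec<Arc<dyn Index>>) -> Self {
        Self { shards }
    }
}

impl Index for AggregatedIndex {
    fn path_of(&self, id: ResourceId) -> Option<PathBuf> {
        // Ask the shards in order and return the first successful answer.
        self.shards.iter().find_map(|shard| shard.path_of(id))
    }
}
```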
Hidden subfolders must be ignored during the filesystem traversal. But if the root is hidden we still need to perform the traversal.
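A sketch of such a traversal using only the standard library. The function name discover_files is hypothetical, and "hidden" is assumed to mean a name starting with a dot. The root is never checked, so a hidden root is still traversed; hidden files (as opposed to folders) are kept in this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumption: a folder is hidden when its name starts with a dot.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|name| name.starts_with('.'))
        .unwrap_or(false)
}

/// Collects all files under `root`, skipping hidden subfolders.
pub fn discover_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    // The root is pushed unconditionally, even if it is hidden itself.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                if !is_hidden(&path) {
                    pending.push(path);
                }
            } else {
                files.push(path);
            }
        }
    }
    Ok(files)
}
```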
At this moment, our index is structured like this:
pub struct ResourceIndex {
pub id2path: HashMap<ResourceId, CanonicalPathBuf>,
pub path2id: HashMap<CanonicalPathBuf, IndexEntry>,
pub collisions: HashMap<ResourceId, usize>,
root: PathBuf,
}
Ideally, we need id2path to have values of type HashSet<CanonicalPathBuf> because an id can have multiple paths attached due to id collisions. We track the number of these collisions, but removing an id in a generic way still requires iterating through all paths to find matching entries (see the PR #57). Otherwise, if we just take the path from id2path and remove it, we'll have the id left in path2id, which is unreachable from id2path. And collisions[id] would be positive, too.
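To illustrate the idea, here is a reduced sketch where every id maps to a set of paths. It assumes u64 ids and plain PathBuf instead of CanonicalPathBuf, and it derives the collision count from the set size instead of tracking it separately. Removing an id then touches only the paths attached to it.

```rust
use std::collections::{HashMap, HashSet};
use std::path::{Path, PathBuf};

/// Placeholder id type for this sketch.
pub type ResourceId = u64;

#[derive(Default)]
pub struct ResourceIndex {
    id2paths: HashMap<ResourceId, HashSet<PathBuf>>,
    path2id: HashMap<PathBuf, ResourceId>,
}

impl ResourceIndex {
    /// Registers a path under an id; assumes the path is not indexed yet.
    pub fn insert(&mut self, path: PathBuf, id: ResourceId) {
        self.id2paths.entry(id).or_default().insert(path.clone());
        self.path2id.insert(path, id);
    }

    /// Number of extra paths sharing this id, derived from the set size.
    pub fn collisions(&self, id: ResourceId) -> usize {
        self.id2paths
            .get(&id)
            .map_or(0, |paths| paths.len().saturating_sub(1))
    }

    pub fn id_of(&self, path: &Path) -> Option<ResourceId> {
        self.path2id.get(path).copied()
    }

    /// Removes the id and every path attached to it, without scanning all paths.
    pub fn forget_id(&mut self, id: ResourceId) -> HashSet<PathBuf> {
        let paths = self.id2paths.remove(&id).unwrap_or_default();
        for path in &paths {
            self.path2id.remove(path);
        }
        paths
    }
}
```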
An idea I had some time before is composite ids:
The way I see it functioning is:
If there is no entry in the collisions mapping for the id, we just add it into id2path.
If there is an entry in the collisions mapping for the id, we add the value of another quick hashing function to it.
We can use SHA-256 instead of using several hash functions, but it's very slow, so we want to avoid it because we work with local user files. We might have a background indexing process which upgrades fast ids to secure ids, though.
Either way, if we do tricks with ids we have a problem:
Upgraded(old_id, new_id).
We can't allow half-written files in storages, even in cache.
Write data to a temporary file, then move it to the target path.
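A minimal sketch of this procedure with the standard library. The function name and the temporary file naming scheme are assumptions. The temporary file is created next to the target so that the final rename stays within one filesystem.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` to `target` so that readers never observe a half-written file.
pub fn write_atomically(target: &Path, data: &[u8]) -> io::Result<()> {
    let file_name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;

    // Hypothetical naming scheme: "<name>.tmp.<pid>" in the same directory.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = target.with_file_name(tmp_name);

    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(data)?;
    file.sync_all()?; // flush to disk before the file becomes visible
    drop(file);

    // Moving the finished file into place; clean up the temp file on failure.
    fs::rename(&tmp_path, target).map_err(|err| {
        let _ = fs::remove_file(&tmp_path);
        err
    })
}
```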
This is a blocker for cross-platform functioning.
ark-cli monitor.
.ark/index exists.
Pay attention to WARN lines:
[kirill@lenovo TEST]$ RUST_LOG=info ark-cli monitor
Building index of folder /tmp/TEST
[2023-03-06T16:21:33Z INFO arklib] Index has not been registered before
[2023-03-06T16:21:33Z INFO arklib::index] Loading the index from file
[2023-03-06T16:21:33Z WARN arklib::index] No persisted index was found by path /tmp/TEST/.ark/index
[2023-03-06T16:21:33Z INFO arklib::index] Building the index from scratch
[2023-03-06T16:21:36Z INFO arklib::index] Index built
[2023-03-06T16:21:36Z INFO arklib::index] Storing the index to file
[2023-03-06T16:21:36Z INFO arklib] Index was registered
Build succeeded in 3.667931346s
Updating succeeded in 158.747289ms
^C
[kirill@lenovo TEST]$ RUST_LOG=info ark-cli monitor #just load index from the file and check for updates
Building index of folder /tmp/TEST
[2023-03-06T16:21:57Z INFO arklib] Index has not been registered before
[2023-03-06T16:21:57Z INFO arklib::index] Loading the index from file
[2023-03-06T16:21:58Z INFO arklib::index] Storing the index to file
[2023-03-06T16:21:58Z INFO arklib] Index was registered
Build succeeded in 297.370813ms
^C
[kirill@lenovo TEST]$ grep gagarin .ark/index
1678115269692 200433-880886451 gagarin.jpg
[kirill@lenovo TEST]$ mv gagarin.jpg /tmp/
[kirill@lenovo TEST]$ RUST_LOG=info ark-cli monitor #must load the index and remove disappeared resource
Building index of folder /tmp/TEST
[2023-03-06T16:22:43Z INFO arklib] Index has not been registered before
[2023-03-06T16:22:43Z INFO arklib::index] Loading the index from file
[2023-03-06T16:22:43Z WARN arklib::index] No such file or directory (os error 2)
[2023-03-06T16:22:43Z INFO arklib::index] Building the index from scratch
[2023-03-06T16:22:47Z INFO arklib::index] Index built
[2023-03-06T16:22:47Z INFO arklib::index] Storing the index to file
[2023-03-06T16:22:47Z INFO arklib] Index was registered
Build succeeded in 3.651734571s
Thanks @mdrlzy for discovering this bug.
This would prevent pushing code that breaks the code style.
The Index module must be well-tested to ensure there are no significant bugs:
https://github.com/ARK-Builders/arklib/blob/main/src/index.rs
Test cases must be run during CI.
Using ark-cli:
Remove the .ark/index file.
Run ark-cli monitor, wait till the index is computed, then press Ctrl+C.
Check .ark/index.
Expected: the index file exists.
Actual: it is absent.
Example:
[kirill@lenovo test]$ rm .ark/index
[kirill@lenovo test]$ ../ark-cli monitor
Building index of folder /tmp/test
Build succeeded in 12.901021531s
Updating succeeded in 287.573914ms
Updating succeeded in 283.567582ms
Updating succeeded in 292.411364ms
^C
[kirill@lenovo test]$ ls -lah .ark/index
ls: cannot access '.ark/index': No such file or directory
If you enable debug logging, you can see there is an IO error in arklib (absent file).
[kirill@lenovo test]$ RUST_LOG=debug ../ark-cli monitor
Building index of folder /tmp/test
[2024-01-03T15:29:07Z INFO arklib] Index has not been registered before
[2024-01-03T15:29:07Z INFO arklib::index] Loading the index from file /tmp/test/.ark/index
[2024-01-03T15:29:07Z WARN arklib::index] IO error
[2024-01-03T15:29:07Z INFO arklib::index] Building the index from scratch
[2024-01-03T15:29:07Z DEBUG arklib::index] Discovering all files under path /tmp/test
^C
Apparently, IO errors are not handled in a nice way.
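One way to handle this more gracefully is to treat a missing index file as "nothing persisted yet" and to propagate every other IO error. The sketch below assumes a hypothetical function that returns the raw file content; parsing is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the persisted index file.
///
/// Returns `Ok(None)` when the file does not exist, so the caller can build
/// the index from scratch; any other IO error is returned unchanged.
pub fn read_persisted_index(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(content) => Ok(Some(content)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

With this split, the caller can log the absent file at INFO level and reserve WARN or ERROR for failures that actually need attention.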
See ARK-Builders/ARK-Navigator#142 for the context.
TL;DR: At the moment, the index is built for any "root" folder and stored in memory.
It would be good to persist it, so we wouldn't recalculate resource ids on different devices. If we store the index in our .ark folder, then it gets synced to other devices (that's why it is called replicated). A smart write procedure to avoid conflicts is necessary, but it would be too difficult for now; let's assume that only 1 device writes the index into some root at any given moment.
Let's fix the .ark/index path for it.
ARK Shelf app crashes when loading some test data provided here:
ARK-Builders/ARK-Navigator#412 (comment)
Below is the crash stack trace:
Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 13813 (DefaultDispatch), pid 13787 (ilders.arkshelf)
Cmdline: dev.arkbuilders.arkshelf
pid: 13787, tid: 13813, name: DefaultDispatch >>> dev.arkbuilders.arkshelf <<<
#01 pc 000000000060f204 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#02 pc 000000000060c9d0 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#03 pc 000000000060c7f4 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#04 pc 000000000060c540 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#05 pc 000000000060b234 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#06 pc 000000000060c290 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#07 pc 000000000062b63c /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#08 pc 000000000062b95c /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so
#09 pc 00000000002e9f84 /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!libarklib.so (Java_dev_arkbuilders_arklib_LibKt_loadLinkFileNative+1272)
#12 pc 0000000000422c3a [anon:dalvik-classes.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk]
#14 pc 000000000000558e [anon:dalvik-classes3.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes3.dex]
#16 pc 0000000000005234 [anon:dalvik-classes3.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes3.dex]
#18 pc 000000000000514a [anon:dalvik-classes3.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes3.dex]
#20 pc 00000000004a7baa [anon:dalvik-classes.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk]
#22 pc 00000000001586aa [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#24 pc 000000000014af02 [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#26 pc 00000000004a7bfe [anon:dalvik-classes.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk]
#28 pc 0000000000150eac [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#30 pc 000000000018654e [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#32 pc 000000000018dffe [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#34 pc 000000000018d102 [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#36 pc 000000000018be12 [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#38 pc 000000000018bf40 [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
#40 pc 000000000018bef0 [anon:dalvik-classes9.dex extracted in memory from /data/app/~~IUZgOgwoWHgu_I0i8Furdg==/dev.arkbuilders.arkshelf--T4EM_LqAKrWhBDSX9c5YA==/base.apk!classes9.dex]
We can move Link-to-Web resource kind from ARK Shelf (ARK-Builders/ARK-Shelf#1)
and ARK Navigator (ARK-Builders/ARK-Shelf#37) into this library.
We would benefit from this in 2 ways:
Benchmarks should be run in CI on all commits to main.
The simplified procedure for atomic writing is "write to a temporary file, then hard link to the destination". This becomes a problem when the "destination" and the "temporary file" reside on different filesystems, e.g. /tmp and /home.
Possible solutions:
The test multiple_version_files() in atomic/files.rs leaves the artifact {}_cellphoneID in the tmp folder.
Correct behavior: it should be cleaned up after the test.
This issue is an open discussion of the changes proposed to the API in src/index.rs.
pub struct ResourceIndex {
/// A mapping of resource IDs to their corresponding file paths
id2path: HashMap<ResourceId, PathBuf>,
/// A mapping of file paths to their corresponding index entries
path2id: HashMap<PathBuf, IndexEntry>,
/// A mapping of resource IDs to the number of collisions they have
pub collisions: HashMap<ResourceId, usize>,
/// The root path of the index
root: PathBuf,
}
impl ResourceIndex {
/// Returns the number of entries in the index
///
/// Note that the amount of resource can be lower in presence of collisions
pub fn count_files(&self) -> usize;
/// Returns the number of resources in the index
pub fn count_resources(&self) -> usize;
/// Builds a new resource index from scratch using the root path
///
/// This function recursively scans the directory structure starting from
/// the root path, constructs index entries for each resource found, and
/// populates the resource index
pub fn build<P: AsRef<Path>>(root_path: P) -> Self;
/// Loads a previously stored resource index from the root path
///
/// This function reads the index from the file system and returns a new
/// [`ResourceIndex`] instance. It looks for the index file in
/// `$root_path/.ark/index`.
///
/// Note that the loaded index can be outdated and `update_all` needs to
/// be called explicitly by the end-user. For automated updating and
/// persisting the new index version, use [`ResourceIndex::provide()`] method.
pub fn load<P: AsRef<Path>>(root_path: P) -> Result<Self>;
/// Stores the resource index to the file system
///
/// This function writes the index to the file system. It writes the index
/// to `$root_path/.ark/index` and creates the directory if it's absent.
pub fn store(&self) -> Result<()>;
/// Provides the resource index, loading it if available or building it from
/// scratch if not
///
/// If the index exists at the provided `root_path`, it will be loaded,
/// updated, and stored. If it doesn't exist, a new index will be built
/// from scratch
pub fn provide<P: AsRef<Path>>(root_path: P) -> Result<Self>;
/// Updates the index based on the current state of the file system
///
/// Returns an [`IndexUpdate`] object containing the paths of deleted and
/// added resources
pub fn update_all(&mut self) -> Result<IndexUpdate>;
/// Indexes a new entry identified by the provided path, updating the index
/// accordingly.
///
/// The caller must ensure that:
/// - The index is up-to-date except for this single path
/// - The path hasn't been indexed before
///
/// Returns an error if:
/// - The path does not exist
/// - Metadata retrieval fails
pub fn index_new(&mut self, path: &dyn AsRef<Path>) -> Result<IndexUpdate>;
/// Updates a single entry in the index with a new resource located at the
/// specified path, replacing the old resource associated with the given
/// ID.
///
/// # Restrictions
///
/// The caller must ensure that:
/// * the index is up-to-date except for this single path
/// * the path has been indexed before
/// * the path maps into `old_id`
/// * the content by the path has been modified
///
/// # Errors
///
/// Returns an error if the path does not exist, if the path is a directory
/// or an empty file, if the index cannot find the specified path, or if
/// the content of the path has not been modified.
pub fn update_one(
&mut self,
path: &dyn AsRef<Path>,
old_id: ResourceId,
) -> Result<IndexUpdate>;
/// Inserts an entry into the index, updating associated data structures
///
/// If the entry ID already exists in the index, it handles collisions
/// appropriately
fn insert_entry(&mut self, path: PathBuf, entry: IndexEntry);
/// Removes the given resource ID from the index and returns an update
/// containing the deleted entries
pub fn forget_id(&mut self, old_id: ResourceId) -> Result<IndexUpdate>;
/// Removes an entry with the specified path and updates the collision
/// information accordingly
///
/// Returns an update containing the deleted entries
fn forget_path(
&mut self,
path: &Path,
old_id: ResourceId,
) -> Result<IndexUpdate>;
}
Some of the issues with the current structure are:
ResourceIndex which prevents us from simply serializing the index to a file
provide may be confusing when compared to other methods
ResourceIndex
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct IndexedResource {
pub id: ResourceId,
pub path: PathBuf,
pub last_modified: SystemTime,
}
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct ResourceIndex {
pub resources: Vec<IndexedResource>,
pub root: PathBuf,
}
Technically, it is more sound to perform a full update, even when only a single resource is of interest, because other resources could have been changed, too. However, it is not always easy for the client to process additional updates.
A single-resource update function could be convenient. This function should receive the path of the resource to update.
Here are some scenarios where this function would be applicable:
Deleted event.
This is an oversight of implementation in #42
See #38 for more context. TL;DR: We use update_one for cases when a resource at some path was changed or deleted. We can't canonicalize a non-existent path, so we fail to call update_one in this case. This API should receive a plain Path-like type: Path, PathBuf, or be generic with AsRef<Path>.
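A reduced sketch of the generic variant. The SingleUpdate enum and the bare path2id map are stand-ins invented for illustration, not the real IndexUpdate or ResourceIndex. The point is that the path is used as given, so a deleted file can still be looked up and removed.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Placeholder id type for this sketch.
pub type ResourceId = u64;

/// Outcome of checking a single path; a reduced stand-in for `IndexUpdate`.
#[derive(Debug, PartialEq)]
pub enum SingleUpdate {
    /// The file is gone and its entry was removed from the index.
    Deleted(ResourceId),
    /// The file is still there; re-hashing it is out of scope for this sketch.
    NeedsRehash,
    /// The path was not present in the index.
    NotIndexed,
}

pub fn update_one<P: AsRef<Path>>(
    path2id: &mut HashMap<PathBuf, ResourceId>,
    path: P,
) -> SingleUpdate {
    let path = path.as_ref();
    if !path2id.contains_key(path) {
        return SingleUpdate::NotIndexed;
    }
    if path.exists() {
        return SingleUpdate::NeedsRehash;
    }
    // No canonicalization here: it would fail because the file is gone.
    match path2id.remove(path) {
        Some(id) => SingleUpdate::Deleted(id),
        None => SingleUpdate::NotIndexed,
    }
}
```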
ResourceIndex should pass complete resources with their details during update.
Right now, only ResourceIds are passed. This forces the library clients to reconstruct the details again.
See fun compute in Resource.kt (https://github.com/ARK-Builders/arklib-android):
https://github.com/ARK-Builders/arklib-android/blob/f95ffc3b97c18e00f0cbed40b6b5c854254cab1c/lib/src/main/java/space/taran/arklib/domain/index/Resource.kt#L22
See RootIndex.kt in arklib-android:
BindingIndex.update(path)
BindingIndex.store(path)
Right now, we update twice if the index is being provided for the first time.
This updating should be done in arklib during provision, if we already have an index instance.
We introduce machine-uid crate in this PR:
The crate comes with a disclaimer:
In Linux, machine id is a single newline-terminated, hexadecimal, 32-character, lowercase ID. When decoded from hexadecimal, this corresponds to a 16-byte/128-bit value. This ID may not be all zeros. This ID uniquely identifies the host. It should be considered “confidential”, and must not be exposed in untrusted environments. And do note that the machine id can be re-generated by root.
An alternative would be generating a random device id, storing it in the app data folder, sharing it privately with other devices when necessary, etc.
We've decided in favor of machine-uid because it's much easier to implement, and an outside entity can't figure out the ids if secure transport is utilized. With unique random ids, however, even unencrypted transport could be used. If privacy becomes a concern, we should implement the random-id approach.
Because of this, in arklib-android we have to assume that all resources were just added at startup.
And we can't find out which resources were deleted.
Usually, the "last modified" timestamp is used for sorting by "date". This timestamp comes from common filesystems and means the last time the file was modified, but it can also be updated without actual content modification. In case of such a modification, our app considers the resource to be the same due to content addressing. Our app is also supposed to be used in a distributed setup (at this moment, by using an external syncing app like Syncthing). When a resource is replicated to other devices, all replicas have different "last modified" attributes.
It makes more sense for a user to think about resource creation time. Some filesystems have a "created" timestamp, but such a timestamp would be reset every time the user moves the resource. We can provide a semantically similar attribute, "first discovered", which would mean the time when the resource was first indexed by any of our apps on any of the user's devices.
This timestamp should be stored in the (persisted and replicated) index, so it would be propagated to other devices.
Can we benefit from this crate?
https://github.com/notify-rs/notify
Core feature of ARK Navigator is tags-based resources filtering. A resource identified by ResourceId can have multiple tags attached to it. A tag is just a string. This way, tags storage is just a mapping between identifiers and sets of strings.
The storage is persisted using a hidden .ark-tags file. It is assumed that the .ark-tags file is replicated to other user devices (e.g. phone and tablet) by using external software like Syncthing. This way, all user devices can have tags in sync.
In this task, loading and persisting of the .ark-tags file must be ported from Kotlin to Rust. Functions of the Storage interface must be implemented in the lib. The library must be stateful, keeping data in memory between calls. The library must support having several tag storages in memory simultaneously, so a StorageId should be returned upon loading and be used in function calls by the client app.
Loading of the storage should be done by passing the root folder path from the client app, not the path of .ark-tags itself, since the structure of internal files can change in the future and the library must automatically locate the necessary internal files.
At this stage, it seems redundant to port the Sharded storage. The most important thing is to tie the storage format to the lib, so apps depending on the lib would always use the same format of tags storage.
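The stateful part could be sketched as a registry that hands out a StorageId per root folder. All names below are assumptions, ids are plain u64, and reading or writing the .ark-tags file is omitted, so each storage starts empty in this sketch.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::{Path, PathBuf};

/// Placeholder id type for this sketch.
pub type ResourceId = u64;
/// Handle returned to the client app; an index into the registry.
pub type StorageId = usize;
pub type Tags = BTreeSet<String>;

/// Keeps several tag storages in memory between calls.
#[derive(Default)]
pub struct TagStorages {
    roots: Vec<PathBuf>,
    data: Vec<HashMap<ResourceId, Tags>>,
}

impl TagStorages {
    /// Registers the storage for `root` and returns its handle.
    /// Loading the same root twice returns the same handle.
    pub fn load(&mut self, root: &Path) -> StorageId {
        if let Some(id) = self.roots.iter().position(|known| known.as_path() == root) {
            return id;
        }
        self.roots.push(root.to_path_buf());
        self.data.push(HashMap::new());
        self.roots.len() - 1
    }

    /// Replaces the tags of a resource; returns `None` for an unknown storage.
    pub fn set_tags(&mut self, storage: StorageId, id: ResourceId, tags: Tags) -> Option<()> {
        self.data.get_mut(storage)?.insert(id, tags);
        Some(())
    }

    pub fn get_tags(&self, storage: StorageId, id: ResourceId) -> Option<&Tags> {
        self.data.get(storage)?.get(&id)
    }
}
```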
We use the crc32fast crate to generate ResourceId. It was one of the fastest hash functions 3 years ago, when blake3 was invented.
Official metrics:
Blake3, AWS c5.metal, 16 KiB input, 1 thread:
6866 MiB/s
CRC32, unknown env and run parameters:
baseline: 1499 MiB/s,
pclmulqdq: 7314 MiB/s
Let's create a small benchmark to compare them in same environment, with same parameters.
Even if blake3 is only as performant as crc32, it would be worth updating arklib, because blake3 is a cryptographic hash function, which means practically no collisions in the index.
update_one
update_all
Then, we want to verify that:
update_all can be replaced by multiple update_one
update_all and update_one
We can abstract ResourceIndex over the id type and, consequently, over the hashing function.
#[derive(Eq, Ord, PartialEq, PartialOrd, Hash, Clone, Debug)]
pub struct IndexEntry<Id> {
pub modified: SystemTime,
pub id: Id,
}
#[derive(PartialEq, Clone, Debug)]
pub struct ResourceIndex<Id> {
pub id2path: HashMap<Id, CanonicalPathBuf>,
pub path2id: HashMap<CanonicalPathBuf, IndexEntry<Id>>,
pub collisions: HashMap<Id, usize>,
root: PathBuf,
}
This way, we can use cryptographic hash functions to test index in simpler cases, when no collisions are present. This can simplify development and debugging. We could also use fake hash functions for testing only collisions.
A cryptographic hash function can also be used for apps where safety is more important than performance. The fast hash functions will be used in an experimental "fast mode", which is most useful for file browser apps.
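As an illustration of the "fake hash function" idea, the sketch below adds a hypothetical ContentId trait and a LengthId type whose id is just the content length, which makes collisions trivial to provoke in tests. The index is reduced to two maps, and collisions is assumed to count the extra paths seen for an id.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::path::PathBuf;

/// Hypothetical trait: computes an id from file content.
pub trait ContentId: Eq + Hash + Clone {
    fn compute(content: &[u8]) -> Self;
}

/// Fake id for tests: only the content length matters,
/// so any two inputs of equal length collide.
#[derive(PartialEq, Eq, Hash, Clone, Debug)]
pub struct LengthId(pub usize);

impl ContentId for LengthId {
    fn compute(content: &[u8]) -> Self {
        LengthId(content.len())
    }
}

pub struct ResourceIndex<Id: ContentId> {
    pub id2path: HashMap<Id, PathBuf>,
    pub collisions: HashMap<Id, usize>,
}

impl<Id: ContentId> ResourceIndex<Id> {
    pub fn new() -> Self {
        Self { id2path: HashMap::new(), collisions: HashMap::new() }
    }

    /// Indexes `content` found at `path` and returns the computed id.
    pub fn insert(&mut self, path: PathBuf, content: &[u8]) -> Id {
        let id = Id::compute(content);
        if self.id2path.contains_key(&id) {
            // The id is already taken: record one more collision.
            *self.collisions.entry(id.clone()).or_insert(0) += 1;
        } else {
            self.id2path.insert(id.clone(), path);
        }
        id
    }
}
```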
Depends on ARK-Builders/ARK-Navigator#163
The library must be built in 2 modes: Debug and Release, both must be published and downloadable from GitHub Actions.
ARK Navigator and other dependents should be configured to use Release build by default.
Currently, errors on the Rust side are not really managed. The arklib crate should return a custom error and not an anyhow error. It's possible to wrap anyhow in a custom error. That way, different kinds of errors could be distinguished.
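A custom error type could look roughly like the sketch below. The variant names and the read_index_file helper are assumptions, and the wrapping is shown for std::io::Error only so that the example does not depend on external crates; a variant wrapping anyhow could be added in the same way.

```rust
use std::fmt;
use std::io;
use std::path::Path;

/// Hypothetical error type for the crate.
#[derive(Debug)]
pub enum ArklibError {
    /// Underlying filesystem failure.
    Io(io::Error),
    /// Persisted data could not be understood.
    Parse(String),
}

impl fmt::Display for ArklibError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArklibError::Io(err) => write!(f, "IO error: {}", err),
            ArklibError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl std::error::Error for ArklibError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ArklibError::Io(err) => Some(err),
            ArklibError::Parse(_) => None,
        }
    }
}

/// Lets `?` convert IO errors into the custom type.
impl From<io::Error> for ArklibError {
    fn from(err: io::Error) -> Self {
        ArklibError::Io(err)
    }
}

pub type ArkResult<T> = std::result::Result<T, ArklibError>;

/// Example function returning the custom error.
pub fn read_index_file(path: &Path) -> ArkResult<String> {
    Ok(std::fs::read_to_string(path)?)
}
```

Callers, including the bindings, can then match on the variant instead of inspecting an opaque error message.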
A storage is a subfolder of .ark, e.g. .ark/index or .ark/tags. It represents a mapping from ResourceId to some T.
For .ark/index, the T is Path. And for .ark/tags, the T is Set<String>. Each entry can be represented by a file .ark/<storage>/<resource_id> with single-line content. This kind of storage should give us the least amount of read/write conflicts, but it is not very efficient for syncing and reading. Old chunks could be batched into bigger multi-line files.
So, chunked storage would be a set of files like this:
.ark/<storage_name>/<batch_id1>
|-- <resource_id1> -> <value1>
|-- <resource_id2> -> <value2>
.ark/<storage_name>/<resource_id3>
|-- <value3>
.ark/<storage_name>/<batch_id2>
|-- <resource_id4> -> <value4>
|-- <resource_id5> -> <value5>
|-- <resource_id6> -> <value6>
.ark/<storage_name>/<resource_id7>
|-- <value7>
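The layout above can be modelled in memory as a list of chunks, where each chunk is either a single-entry file or a batch. The sketch below is an assumption-heavy illustration: ids and values are plain strings, the functions lookup and compact are hypothetical, and conflict resolution between chunks is not addressed.

```rust
/// Placeholder id type for this sketch.
pub type ResourceId = String;

#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    /// One file per entry: `.ark/<storage>/<resource_id>` holding one value.
    Single { id: ResourceId, value: String },
    /// One multi-line file holding several `<resource_id> -> <value>` entries.
    Batch(Vec<(ResourceId, String)>),
}

/// Finds the value stored for `wanted`, looking through all chunks in order.
pub fn lookup<'a>(chunks: &'a [Chunk], wanted: &str) -> Option<&'a str> {
    chunks.iter().find_map(|chunk| match chunk {
        Chunk::Single { id, value } if id == wanted => Some(value.as_str()),
        Chunk::Single { .. } => None,
        Chunk::Batch(entries) => entries
            .iter()
            .find(|(id, _)| id == wanted)
            .map(|(_, value)| value.as_str()),
    })
}

/// Merges all single-entry chunks into one new batch; existing batches stay as-is.
pub fn compact(chunks: Vec<Chunk>) -> Vec<Chunk> {
    let (singles, mut rest): (Vec<Chunk>, Vec<Chunk>) = chunks
        .into_iter()
        .partition(|chunk| matches!(chunk, Chunk::Single { .. }));
    let merged: Vec<(ResourceId, String)> = singles
        .into_iter()
        .filter_map(|chunk| match chunk {
            Chunk::Single { id, value } => Some((id, value)),
            Chunk::Batch(_) => None,
        })
        .collect();
    if !merged.is_empty() {
        rest.push(Chunk::Batch(merged));
    }
    rest
}
```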