jvilk / browserfs
BrowserFS is an in-browser filesystem that emulates the Node JS filesystem API and supports storing and retrieving files from various backends.
License: Other
Validate the values passed to the various buffer write methods.
FileSystems can have custom File objects.
What API do they need to specify?
What can they optionally specify?
I'm already working on writing this.
Add a tool that emits a JavaScript script that loads a directory of test files into the current file system.
This will allow us to use Node's fixtures.
Note that this will need to be synchronous for now due to how our tests are run, so I'll likely hack it in specifically for the localStorage test. Later on, if we can make the tests asynchronous, we can alter the tool to emit asynchronous loading code that triggers the test start when it completes.
IndexedDB requires the user to allow the current webpage to store data, Dropbox requires the user to log in...
I think it would be good to get some performance tests added to the test suite before spending any time implementing new methods to improve performance.
This would help us prevent regressions, and even in cases like lavelle/BrowserFS#11 where it's fairly obvious that it will improve performance, it's nice to be able to quantify and measure the improvement.
This would also allow us to make comparisons of the relative speeds of the different backends, to have some quantitative evidence for users when they're weighing up the advantages and drawbacks of each.
I couldn't find any performance tests in the Node repo, but it wouldn't be too difficult to write our own.
Doppio is unusably slow in IE9 due to all of the copying we are doing. To download a binary file using XmlHttpRequest, we copy the entire file eagerly this many times:
We can skip the third and fourth copies if we instantiate our Buffer directly with the JavaScript array, which is in the same format that our polyfill internally stores the data as. This requires the Buffer implementation to be aware of the polyfill.
I need a simple build system to compile and compress BrowserFS
into a single module.
Support reading 2-byte numbers from odd offsets, and similar unaligned accesses.
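As a sketch of why this matters: typed-array views like Uint16Array require aligned byte offsets, so an unaligned read has to be assembled byte-by-byte. The helper name below is illustrative, not BrowserFS code.

```javascript
// Illustrative sketch: read a 16-bit little-endian value from an arbitrary,
// possibly odd, byte offset by assembling it from individual bytes.
// A Uint16Array view would throw (or silently misbehave) on offset 1.
function readUInt16LE(bytes, offset) {
  // bytes is a plain array (or Uint8Array) of byte values 0-255.
  return bytes[offset] | (bytes[offset + 1] << 8);
}

var data = [0xff, 0x34, 0x12, 0x00];
readUInt16LE(data, 1); // reads the value 0x1234 from the odd offset 1
```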
We're already storing a header of stat information for files, so it's natural to store a few bytes for directories as well. This allows us to persist empty directories, and it also paves the way for eventual symlink support.
LocalStorage's API is synchronous, so I should add synchronous support to its file system.
Doing this will require adding synchronous API stubs in node_fs.coffee, filesystem.coffee, and file.coffee, which means this is blocked by #23.
Right now, Karma loads each script into the browser, and they execute immediately.
It would be better if we could dynamically trigger test start so we can better use Jasmine to kick off tests (and maybe to emulate process.exit when all callbacks finish...). This would also make it simple to load in test files asynchronously by dogfooding our asynchronous API.
There are two ways to do this -- the good way, and the easy-but-bad way:
window.bfs_tests['test_name'] = function() {...}
The good way would allow us to set up individual Jasmine tests for each Node test, which would be awesome!
This currently depends on an implementation bug where parent directories are generated when files are written to.
This is a requirement for the newly-minted doppio-demo repository.
I can see this happening in two ways:
You'll laugh, but due to a bug, this is how we currently download listings.json:
We download listings.json as text, and then run JSON.parse on the Buffer. This implicitly runs Buffer.toString, which converts the buffer contents from the binary representation of UTF8 strings back into a JavaScript string. These conversions are all unnecessary.
And here I was, wondering why Doppio with BFS hangs IE8/IE9 on load! Doppio's listings file is something like 900KB.
I opted to make my Dropbox backend a separate repo, because it seems like it should be an optional plug-in rather than core functionality. The four existing filesystems are universal enough to be part of the core, but I propose that any future ones be modular plugins.
We could create a method for registering a new backend on the BrowserFS namespace, like _.mixin, so validation can be performed.
Others could then add their own backends without needing to send PRs to this repo.
We could have a wiki page with a list of all backends.
I can then add other cloud storage providers (Box, Google Drive, etc.) this way.
Backends can list this repo as a dependency in their bower.json.
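The registration idea above could look something like the following. This is a hypothetical sketch: the names `registerBackend`, `isAvailable`, and `DropboxFS` are illustrative, not an existing BrowserFS API.

```javascript
// Hypothetical sketch of a backend-registration hook on the BrowserFS
// namespace. Core validates the backend before exposing it, so plugin
// repos can register themselves without sending PRs to core.
var BrowserFS = { FileSystem: {} };

BrowserFS.registerBackend = function (name, backend) {
  // Minimal validation: a backend must report whether it can run in the
  // current browser, and must implement at least the core stat method.
  if (typeof backend.isAvailable !== 'function' ||
      !backend.prototype || typeof backend.prototype.stat !== 'function') {
    throw new Error("Backend '" + name + "' is missing required methods.");
  }
  BrowserFS.FileSystem[name] = backend;
};

// A plugin repo could then register itself like so:
function DropboxFS() {}
DropboxFS.isAvailable = function () { return true; };
DropboxFS.prototype.stat = function (path, isLstat, cb) { /* ... */ };
BrowserFS.registerBackend('Dropbox', DropboxFS);
```

The validation step is the point of routing registration through core: it lets us reject backends that don't satisfy the FileSystem interface before any user code touches them.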
This would also suggest separating the test suite into its own repo. I'm trying to hook the Dropbox backend up to the Karma test suite, but because the tests are in this repo, I'm just doing a messy ../dropbox-fs/dropbox.js hack in karma.conf.js.
Since the external API is always the same and only the implementation differs between backends, the one test suite should be enough for all plugins (plugins can obviously have their own test suites as well for testing internal behaviour).
See commit b7d3d41.
Even with the hackfix, it's still possible to get errors on occasion.
Both sides need to set up the message passing interface correctly to handle browser FS requests without disturbing any existing messages.
Ideal solution: User can encapsulate the default supplied BrowserFS send/receive functions how they want, or use them directly. Allows users to use BrowserFS easily, or in place of webworker messages.
This will make it even easier to make a custom BrowserFS file system (and I'd use them, too).
I should start by making templates for the following cases:
We need a stat object to mirror Node's stat.
I need to investigate if this is mutable in Node. We can do clever things with getters/setters if needed.
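A minimal sketch of what such a stat object might look like, using the FILE/DIRECTORY item-type constants that appear elsewhere in the built BrowserFS output; the exact field set here is illustrative, not the final design.

```javascript
// Minimal sketch of a Stats object mirroring Node's fs.Stats.
// The item-type constants follow the Stats.FILE = 1 / Stats.DIRECTORY = 2
// scheme seen in the compiled BrowserFS dist; other fields are assumptions.
function Stats(itemType, size, mode, mtime) {
  this.size = size;
  this.mode = mode;
  this.mtime = mtime || new Date();
  this._itemType = itemType;
}
Stats.FILE = 1;
Stats.DIRECTORY = 2;
Stats.prototype.isFile = function () { return this._itemType === Stats.FILE; };
Stats.prototype.isDirectory = function () { return this._itemType === Stats.DIRECTORY; };

var s = new Stats(Stats.FILE, 1024, 0x1a4);
s.isFile();      // true
s.isDirectory(); // false
```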
Add an IndexedDB file system that is the following:
I may eventually add symlink support (just to have a symlink-capable file system), but it's much less important right now than the above.
I imagine this will be the most useful browser-local backend for BrowserFS, as IndexedDB is decently supported across browsers and can store large files.
Currently they are executed as
fs.mkdir("./test", mcb);
fs.mkdir("./test/fixtures", mcb);
fs.mkdir("test/fixtures/node", mcb);
fs.mkdir is an asynchronous method, meaning that, for example, /test/fixtures/ could be created before /test/, which would trigger an error since the parent of fixtures would not exist at that point. These calls should be chained, moving each one into the callback of its predecessor, to avoid this race condition.
I think we've agreed on keeping things modular and having separate repos for each backend (or each one not currently this repo, at least). To make it easier for users to find all the backends, it would be good to put them all under one account.
This account would have:
[Core + Test suite] <- [Backends]
[Test suite] <- [Backends] <- [Core]
Implement a buffer, which is how node users specify what to write to a file.
Add an XmlHttpRequest filesystem that grabs files from a server. To start, it will use a JSON-based index file that contains any relevant file properties (mainly: file size, whether or not it's a directory, and potentially permissions information).
I'm working on this now, as it's required for Doppio integration.
Many methods in this library, and in APIs upon which it depends, are asynchronous, which could lead to some messy callback situations down the line.
I know the plan is to move to TypeScript some time. Async support in TypeScript is a while off (the 1.x series at least). Is there any way we can use some nice await/defer constructs in the meantime?
ES6's native async support is obviously ages away, but moving to IcedCoffeeScript would be pretty easy (it's a superset).
How do ProxyFile stubs work when the filesystem uses no specialization?
How can they work with specialization?
Does the filesystem know ahead of time if the request is going to be proxied, or does it only support translating existing FDs into ProxyFiles (which I think is ideal)?
Maybe it would be good to have a FileSystem.GetProxyFd(fd) function.
Currently the Dropbox and Google Drive apps are registered under my accounts. We should create accounts with these services with some BrowserFS email address, so that we can all log in and access the app consoles, generate new API keys etc.
Any feedback users have would then go to the shared email account too, which is better than it going to my personal account.
PreloadFile should have a generic implementation for property-related functions, like chmod.
Since PreloadFile has a reference to the file system that created it, these implementations should be guarded with:
unless @_fs.supportsProps()
  cb new BrowserFS.ApiError BrowserFS.ApiError.NOT_SUPPORTED
I'm not sure if this is possible, but it would make things nicer.
Currently, here's how our tests run:
fs callbacks are handled. It would be nice to insert a second part of the unit test where the function registered with process.exit is run. I'll need to look at the Jasmine docs.
The BrowserFS core is meant to validate and normalise paths, so that backends can depend on the paths they receive being valid.
The empty string is not a valid path; however, when the stat method is called with it, the string reaches the backend's method, and that method must check for it and throw an error, as I have done in the Dropbox and HTML5 FS backends.
This should be fixed in the core, and the workarounds removed from these two backends.
Here's the relevant test case that exposes this bug.
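The core-side fix could be a single validation pass before dispatching to any backend. This is a hedged sketch: `checkPath` is a hypothetical helper, not an existing BrowserFS function.

```javascript
// Sketch of core-side path validation, run before dispatch, so backends
// can rely on never receiving an invalid path. `checkPath` is a made-up
// name for illustration.
function checkPath(path) {
  if (typeof path !== 'string' || path === '') {
    throw new Error('Invalid path.');
  }
  return path;
}

// A core-level stat wrapper would validate first, then dispatch:
function stat(path, callback) {
  var err = null;
  try { checkPath(path); } catch (e) { err = e; }
  if (err) return callback(err);
  // ...dispatch to the backend's stat implementation...
  callback(null, { /* stats */ });
}
```

With this in core, the duplicated empty-string checks in the Dropbox and HTML5 FS backends could be deleted.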
You should be able to instantiate the API in some way.
For example, for an easy one-filesystem case:
fs.instantiate(BrowserFS.BrowserStorage())
For multiple in a hierarchy, you'd have to use something more complex.
Currently, PreloadFile stores no metadata relating to what has changed.
At the least, PreloadFile should track:
We need a mechanism to say 'run these tests for this particular backend'.
Some example tests:
- localStorage: Fill up localStorage and ensure an appropriate error is thrown. I've heard some versions of IE silently fail; this would be a useful test.
- FileSystem: Try to use a FS that isn't appropriately provisioned.
- Dropbox: Try to use Dropbox when the user hasn't authenticated.
The easiest way to do this: Define a generateUnitTest function on file systems with specific tests. We include this as part of the testing code; it's not distributed with BFS at all. We pass it the structures it needs to generate its specific tests.
We could even potentially use this as a factory method for constructing backends for testing. Meaning, it's responsible for constructing the backend, generating generic unit tests, generating specific unit tests, etc. We can define generateUnitTest on FileSystem, which handles generating generic unit tests.
e.g. BrowserStorage defaults to IndexedDB if it's available, but falls back to localStorage if it's not.
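The fallback logic amounts to simple feature detection. In this sketch the environment object stands in for `window`, and `chooseStorageBackend` is an illustrative name rather than a real BrowserFS function.

```javascript
// Illustrative feature detection for the proposed BrowserStorage fallback.
// `env` stands in for `window` so the logic is testable outside a browser.
function chooseStorageBackend(env) {
  if (env.indexedDB) return 'IndexedDB';
  if (env.localStorage) return 'localStorage';
  return null; // no browser-local storage available
}

chooseStorageBackend({ indexedDB: {}, localStorage: {} }); // 'IndexedDB'
chooseStorageBackend({ localStorage: {} });                // 'localStorage'
```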
I need to use a unit testing framework that I can run in the browser, and potentially outside of the browser.
When designing this, maybe I should specify how to write an encrypted file with authentication information / test configuration for various filesystems. It would be password protected, so a password would be required to start the test.
Alternatively: Prompt for username/password before test begins, then maybe a browser-side password saver could handle saving things for convenience.
Right now, error messages thrown by the MFS are created by the mounted file systems themselves, which have no idea where they are mounted.
If we standardize our error messages, then the MFS can interpose on error messages to append the mount point to them.
This is low priority for now.
Write a simple in-memory filesystem that allows you to mount other filesystems at arbitrary locations.
This provides a simple mechanism for interacting with files across persistent storage mechanisms.
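The core of such a filesystem is the mount-table lookup: find the longest mount-point prefix for a path and hand back the backend plus the path relative to that mount point. All names in this sketch are illustrative.

```javascript
// Sketch of a mountable filesystem's path resolution: longest-prefix match
// from mount point to backend. Strings stand in for backend objects here.
function MountableFS() {
  this.mounts = {}; // mount point -> backend filesystem
}
MountableFS.prototype.mount = function (point, fs) {
  this.mounts[point] = fs;
};
MountableFS.prototype.resolve = function (path) {
  var best = '';
  for (var point in this.mounts) {
    if (path.indexOf(point) === 0 && point.length > best.length) best = point;
  }
  if (best === '') return null; // nothing mounted over this path
  return { fs: this.mounts[best], rel: path.slice(best.length) || '/' };
};

var mfs = new MountableFS();
mfs.mount('/', 'memoryFS');
mfs.mount('/home', 'localStorageFS');
mfs.resolve('/home/user/a.txt'); // { fs: 'localStorageFS', rel: '/user/a.txt' }
```

A production version would also need to match on path-component boundaries (so /homework doesn't hit the /home mount), but the prefix lookup above is the essential mechanism.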
What do users need to implement for custom filesystems?
What can they optionally implement?
We currently store files in localStorage like so:
localStorage[/path/to/file] = binary string representation of data
We should be able to convert the path into a binary string:
localStorage[binary string representation of /path/to/file] = binary string representation of data
The keys are counted against the localStorage storage quota, so this could be a decent savings.
We'd still repeat parent directories, but it's better than what we currently do.
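The savings come from how localStorage quotas are counted: they are measured in UTF-16 code units, so packing two bytes of an ASCII path into each character roughly halves what the key costs. A sketch of the packing (helper name is illustrative):

```javascript
// Sketch: pack an ASCII path into a "binary string" key, two bytes per
// UTF-16 character, roughly halving the quota the key consumes.
function packBinaryString(str) {
  var out = '';
  for (var i = 0; i < str.length; i += 2) {
    var hi = str.charCodeAt(i);
    var lo = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

packBinaryString('/path/to/file').length; // 7 characters instead of 13
```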
We only have little endian support atm.
After experimenting with TypeScript with Doppio, I have determined that it is very suitable for projects such as this one.
As a result, sometime later in the summer I will likely move to TypeScript. I do not anticipate that it will take longer than a week of work, and the payoff will be worth the amount of churn required to perform the conversion.
List of benefits:
- FileSystem implementations are appropriately defined to the interface.

IE9 passes all of our unit tests, but fails to load classes properly in Doppio (magic number failure). It's also absurdly slow due to the TypedArray implementation; so slow that I was unable to get IE9's debug mode past preloading.
Solution (for slowness):
We only need a polyfill for DataView, which is what Buffers use. It should be easy to modify the DataView polyfill to directly use methods to convert its internal array of bytes into other forms. Right now, it constructs a new TypedArray and ArrayBuffer each time you call a get function, just for the small segment of the array that you are processing.
They did this to reuse code, but it's very very very slow.
It would be manually intensive, but probably simple, to change the DataView polyfill to use the individual get and set functions that the typed arrays use directly, bypassing this expensive object construction (which a profile reveals is the most expensive operation that's going on).
I am OK with absolutely ditching the TypedArray polyfills in favor of a shaved-down DataView-only polyfill. In fact, that's my favored solution right now.
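The shaved-down approach amounts to reading straight off the polyfill's internal byte array. In this sketch, the `bytes` argument stands in for the polyfill's internal storage; the function name is illustrative.

```javascript
// Sketch of the proposed polyfill change: compute a DataView-style
// getUint16 directly from the internal byte array, instead of building a
// fresh ArrayBuffer + TypedArray on every call.
function polyfillGetUint16(bytes, byteOffset, littleEndian) {
  var a = bytes[byteOffset], b = bytes[byteOffset + 1];
  return littleEndian ? ((b << 8) | a) : ((a << 8) | b);
}

var bytes = [0x12, 0x34];
polyfillGetUint16(bytes, 0, false); // 0x1234 -- big-endian, DataView's default
polyfillGetUint16(bytes, 0, true);  // 0x3412
```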
This is high priority for the Doppio merger, but low priority for me for now. As far as I can tell, this is the only issue blocking the Doppio merger.
Note to @perimosocordiae: I would be OK with performing the merger as-is and leaving this bug for later fixing.
We should make it possible for file systems to implement a synchronous API.
Like system properties, this will be optional.
The default FileSystem implementation can be augmented to automatically provide the asynchronous API if the file system implements the synchronous API -- it'll just call the synchronous equivalent and then run the callback.
This should happen after asynchronous support is solid.
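The automatic wrapping could look like this. `makeAsync` and `readFileSync` are hypothetical names for illustration; the real method names would come from the FileSystem API.

```javascript
// Sketch: derive an async method from a sync implementation by calling the
// sync version and deferring the callback, so the async API stays async
// even when backed by a synchronous filesystem like localStorage.
function makeAsync(fs, syncName) {
  return function (path, cb) {
    var result, error = null;
    try { result = fs[syncName](path); } catch (e) { error = e; }
    // Defer so callers never see the callback fire re-entrantly.
    setTimeout(function () { cb(error, result); }, 0);
  };
}

var backend = { readFileSync: function (p) { return 'contents of ' + p; } };
var readFile = makeAsync(backend, 'readFileSync');
readFile('/a.txt', function (err, data) { /* data === 'contents of /a.txt' */ });
```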
À la http://www.gnu.org/software/libc/manual/html_node/Error-Codes.html
Not a big priority, but it would mean we're more complete.
Currently this is only supported in Chrome, but apparently it's coming in 'near future' versions of all other desktop browsers.
I found a library which wraps it in a UNIX-like API, which should make tying it to Node's own UNIX-like API super easy. Of all the non-cloud backends, this looks like the best, feature-wise, so I think we should plan for the future when this has better support.
Chrome is the most popular browser already, so IMO it would be worth starting on it now, rather than waiting for the other browsers.
The Node API encourages using these when queueing up writes. Should be simple to implement generically as a queue of requests, right?
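A generic version could be a queue that runs one operation at a time, starting each only when its predecessor's callback fires. This is a sketch, not BrowserFS code; the class name is illustrative.

```javascript
// Sketch of a generic write-serializing request queue: operations run one
// at a time, each starting only when the previous one's done() fires.
function RequestQueue() {
  this.pending = [];
  this.busy = false;
}
RequestQueue.prototype.push = function (op) {
  this.pending.push(op);
  if (!this.busy) this._next();
};
RequestQueue.prototype._next = function () {
  var op = this.pending.shift();
  if (!op) { this.busy = false; return; }
  this.busy = true;
  var self = this;
  op(function () { self._next(); }); // op calls done() when finished
};

var order = [];
var q = new RequestQueue();
q.push(function (done) { setTimeout(function () { order.push(1); done(); }, 10); });
q.push(function (done) { order.push(2); done(); });
// order ends up [1, 2] even though the first operation is slower
```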
Certain backends will generate parent directories that don't exist when a file is written.
They should be updated so that writes only succeed when the parent directory of the file in the given path already exists. This will probably be achieved by editing the sync method of their custom File subclass.
Backends currently known to have this issue:
Hi,
I ran into some build issues with the latest version of doppio and browserfs.
Everything is fine if I use the pre-built git versions of vendor/browserfs/dist;
browserfs also passes its tests.
However, when I build my own browserfs and copy the new versions from browser/lib into browserfs/dist, I can no longer run doppio debug in the browser.
But make doppio test passes! Grrrr. Here's the original symptom.
Line 699
BrowserFS.node.fs.Stats = function() {
Uncaught TypeError: Cannot read property 'node' of undefined
Stats.FILE = 1;
Stats.DIRECTORY = 2;
Stats.SYMLINK = 3;
Stats.SOCKET = 4;
The underlying cause: a build order / dependency issue, because Make's 'wildcard' does not guarantee a build order.
Using Make's sort function is a reasonable workaround, if you're happy requiring a modern version of Make.
FYI steps to finding the problem.
In the official 'dist' builds Browser.node is defined early ...
grep 'BrowserFS.node.*=' dist-orig/browserfs.js
BrowserFS.node = {};
BrowserFS.node.Buffer = function() {
BrowserFS.node.fs = function() {
BrowserFS.node.fs.Stats = function() {
BrowserFS.node.path = function() {
resolved = BrowserFS.node.path.normalize(cwd + (cwd !== "/" ? path.sep : "") + resolved);
BrowserFS.node.process = function() {
data = BrowserFS.node.Buffer((_ref2 = req.response) != null ? _ref2 : 0);
In my locally built version, BrowserFS.node = {} appears too late...
grep 'BrowserFS.node.*=' dist/browserfs.js
BrowserFS.node.fs.Stats = function() { <-- fs_stats TOO EARLY!
BrowserFS.node.process = function() {
BrowserFS.node.path = function() {
resolved = BrowserFS.node.path.normalize(cwd + (cwd !== "/" ? path.sep : "") + resolved);
BrowserFS.node.Buffer = function() {
BrowserFS.node.fs = function() { <-- SHOULD BE BEFORE fs.Stats above
BrowserFS.node = {}; <-- 000-browserfs.js SHOULD BE FIRST
data = BrowserFS.node.Buffer((_ref2 = req.response) != null ? _ref2 : 0);
My suspicion fell on the following ...
SRCS_CORE := $(wildcard src/core/*.coffee)
FYI On my OSX machine with Make v3.82
SRCS_CORE := $(wildcard src/core/*.coffee)
is not sorted
In fact, a Google search shows this is true for 3.82 onwards, despite what the docs say -
https://bugzilla.redhat.com/show_bug.cgi?id=635607
"The NEWS file for make-3.82 says:
If I'm reading "up to and including this release" right, the order was NOT supposed to change in this version. But it did.
"
Fortunately, Make supports a sort function, so I'll send a pull-request with the following -
# From Make 3.82 onwards, wildcard returns filenames in arbitrary order.
# Alphabetically sorting the values is sufficient for browserfs build dependencies.
SRCS_CORE_UNSORTED := $(wildcard src/core/*.coffee)
SRCS_CORE := $(sort $(SRCS_CORE_UNSORTED))
Alternatively one could split the core, or just explicitly name the files in the Makefile.
Best,
Lawrence.
OK, tests are launched properly now, but they are registered on page load.
If we can figure out how to register tests asynchronously, then we can trigger test registration after loading in any test files for #19.
Right now, the issue is that Karma's Jasmine adapter appears to trigger Jasmine tests once page load finishes. I wonder if there is a way to trigger it manually?
It appears that using the ASCII string util for the binary string format is incorrect, and the Node test cases don't stress the issue.
I figured this out when I realized that Doppio used to use BINARY to get an array of numbers that represents the file. Our ASCII behavior actually removes the highest bit from the data, like Node does, as Node defines ASCII as 7-bit ASCII.
I think this is a matter of copying+pasting the ASCII behavior without the truncation, but I'm not completely certain yet. This is low priority for now.
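The difference between the two decodings is exactly one bitmask. A sketch of the per-byte behavior (helper names are illustrative):

```javascript
// Sketch of the ascii-vs-binary distinction: Node's 'ascii' decoding masks
// each byte to 7 bits, while 'binary' (a.k.a. latin1) must preserve all 8.
function byteToAsciiChar(b)  { return String.fromCharCode(b & 0x7f); }
function byteToBinaryChar(b) { return String.fromCharCode(b & 0xff); }

byteToAsciiChar(0xca).charCodeAt(0);  // 0x4a -- high bit stripped
byteToBinaryChar(0xca).charCodeAt(0); // 0xca -- byte preserved
```

Using the first helper where the second is required silently corrupts any byte >= 0x80, which is why Doppio's class files (arbitrary binary data) break while ASCII-only test fixtures pass.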