Comments (9)

ThomasRooney commented on June 2, 2024

I had this problem because I needed to install this on a machine without admin access (which you need to install leveldb on Windows). I changed the implementation to use medea instead of leveldb; here's the fork if anyone's interested: https://github.com/ThomasRooney/hard-source-webpack-plugin

from hard-source-webpack-plugin.

wmertens commented on June 2, 2024

I'd just like to point out that sqlite is very small, very fast, battle-tested and can store and query JSON. It stores everything in one file, and it can also store to memory, which is awesome for unit tests.

It can also index arbitrary expressions (e.g. for direct lookups of JSON values), but an expression index yields only one value per row, so if you need to index the elements of an array inside a JSON blob, you need a helper table.

I'm working on a small wrapper that treats sqlite as a document store with arbitrary expression indexing, if that would be interesting, but I haven't released it yet.

mzgoddard commented on June 2, 2024

Thank you for opening this issue @bsideup!

I'd be down to go this route. I'm a little annoyed by needing to build leveldb too when I add this to projects. We would need to find a pure Node.js solution that covers our primary needs for disk storage.

Those needs are:

  1. Read out all the cache items as quickly as possible.

    LevelDB provides an API for this that iterates over every key in its database, and it isn't just for show: LevelDB does this quickly.

  2. Write to the database only the changed items.

    After the first build, incremental builds usually don't need to store many new items. Rewriting the whole database for relatively few new entries should be avoided. Databases like LevelDB and SQLite do this well. I don't recall how SQLite does it, but LevelDB does it by being a log-style database: new entries, and new values for existing entries, are appended to a log, and LevelDB occasionally garbage-collects to reduce the size of the database.

  3. Use as few files as possible.

    Many modules in webpack projects are pretty small. Reading lots of tiny files at boot would, in most cases, eat into the gains HardSource provides; we can keep those gains with a cache store that writes to as few files as possible. Assets are the opposite: they are stored as individual files, on the current assumption that they are large enough that combining them would not noticeably improve boot time.
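The second and third needs together describe a log-structured store kept in a single file. Below is a minimal sketch of that idea, assuming newline-delimited JSON records; `encodeBatch`, `replay`, `compact`, and `shouldCompact` are hypothetical helpers for illustration, not HardSource's or LevelDB's actual code. The functions operate on the file's text so the logic stays independent of I/O; a real store would append the `encodeBatch(...)` output to one file (for example with Node's `fs.appendFileSync`) and read the whole file back at boot.

```typescript
// Sketch of a single-file, append-only cache log (hypothetical helpers,
// not HardSource's or LevelDB's implementation).
// Each line is one JSON record: a put { k, v } or a delete { k, d: true }.
// JSON.stringify escapes newlines inside strings, so every record stays on one line.

type PutRecord = { k: string; v: unknown };
type DeleteRecord = { k: string; d: true };
type LogRecord = PutRecord | DeleteRecord;

// Need 2: serialize only the changed entries; an undefined value marks a delete.
// The caller appends the returned text to the end of the single log file.
function encodeBatch(changes: Map<string, unknown>): string {
  let out = "";
  for (const [k, v] of changes) {
    const rec: LogRecord = v === undefined ? { k, d: true } : { k, v };
    out += JSON.stringify(rec) + "\n";
  }
  return out;
}

// Need 1: read every cache item in one sequential pass over the log.
// Later records override earlier ones, so the last write for a key wins.
function replay(logText: string): Map<string, unknown> {
  const state = new Map<string, unknown>();
  for (const line of logText.split("\n")) {
    if (line === "") continue; // skip the empty string after the trailing newline
    const rec = JSON.parse(line) as LogRecord;
    if ("d" in rec) state.delete(rec.k);
    else state.set(rec.k, rec.v);
  }
  return state;
}

// Occasional garbage collection: rewrite the log with one record per live key.
function compact(logText: string): string {
  return encodeBatch(replay(logText));
}

// Compact once the log holds many more records than live keys
// (the factor of 2 is an arbitrary illustrative threshold).
function shouldCompact(logText: string): boolean {
  const records = logText.split("\n").filter((l) => l !== "").length;
  return records > 2 * Math.max(1, replay(logText).size);
}
```

Appending a batch costs time proportional to the number of changed entries rather than the size of the whole cache, and the file only grows until the next compaction.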

I think it would be alright to have a solution that provides these APIs with the desired performance for these needs. I want to make LevelDB optional, so if we can find or build such a store and get within some window of LevelDB's performance, that would be fine, since anyone who still wants LevelDB's performance could install it.


A bit of HardSource history: early on it used a single giant JSON blob. This was slow for two reasons:

  • The assets were stored inside it, so they had to be serialized to and deserialized from JSON. Assets are big, so this was really slow.
  • The whole blob had to be rewritten to disk on every update, because the JSON had to stay valid and changed entries could be anywhere in the blob.

mzgoddard commented on June 2, 2024

Thanks @wmertens. The main point I took from what @bsideup is bringing up is that LevelDB is a compiled (native) dependency. While sqlite has those qualities, it's also implemented in C, like LevelDB. sql.js may be an option, but it's less tested in that form and holds the whole database in memory, which could be a problem for large projects.

Really, I want to open this up with an API that others can use to define their own backend for the cache; I imagine some devs might even use a remote server. There is already an internal Serializer API. What I still need to figure out is the API that decides which Serializer, with which options, is used by which cache. Besides the Serializer API being very simple, HardSource currently uses multiple LevelDB instances to store separate cache data sets.
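A pluggable backend along these lines could be expressed as a small interface plus a per-cache factory. The sketch below is hypothetical: `CacheOp`, `CacheBackend`, `BackendFactory`, `MemoryBackend`, and `makeMemoryFactory` are illustrative names, not HardSource's internal Serializer API. The factory receives the cache name and options, which is one possible shape for deciding which backend, with which options, serves which cache; the in-memory backend stands in for LevelDB, medea, or a remote server.

```typescript
// Hypothetical pluggable cache backend; these names are illustrative and do
// not reflect HardSource's actual Serializer API.

// One change to persist; value === null marks a deletion.
interface CacheOp {
  key: string;
  value: string | null;
}

interface CacheBackend {
  // Read every stored entry at boot.
  read(): Promise<Map<string, string>>;
  // Persist only the changed entries.
  write(ops: CacheOp[]): Promise<void>;
}

// Chooses a backend (and its options) per cache data set;
// cache names such as "module" or "asset" are illustrative.
type BackendFactory = (cacheName: string, options: Record<string, unknown>) => CacheBackend;

// In-memory stand-in for LevelDB, medea, or a remote store; handy in tests.
class MemoryBackend implements CacheBackend {
  private data = new Map<string, string>();

  async read(): Promise<Map<string, string>> {
    return new Map(this.data); // copy so callers can't mutate the store
  }

  async write(ops: CacheOp[]): Promise<void> {
    for (const op of ops) {
      if (op.value === null) this.data.delete(op.key);
      else this.data.set(op.key, op.value);
    }
  }
}

// Example factory: one independent backend instance per cache name,
// similar to keeping a separate LevelDB instance per cache data set.
function makeMemoryFactory(): BackendFactory {
  const instances = new Map<string, CacheBackend>();
  return (cacheName) => {
    let backend = instances.get(cacheName);
    if (!backend) {
      backend = new MemoryBackend();
      instances.set(cacheName, backend);
    }
    return backend;
  };
}
```

Keeping the interface to a bulk `read` and a batched `write` maps directly onto the first two needs listed earlier, so a medea, SQLite, or remote implementation only has to provide those two operations.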

For a default backend implemented in JS, I want to look at medea, which has a levelup backend, so hooking it up should be easy; the work will mostly be making sure it works with some test repos. Along those lines, there is also SQLDown, which may give users an easy way to replace medea (if it fulfills the default needs) with sqlite.

reconbot commented on June 2, 2024

A network-backed cache scares me a ton. Flexibility, however, does not scare
me. How big do the caches get?

(Replying by email to mzgoddard's comment of Mon, Nov 21, 2016, 3:22 PM.)

mzgoddard commented on June 2, 2024

@reconbot As a measure, I pulled some numbers from two projects that use it. One has ~400 modules with a modules cache of 1.1 MB; the other has ~800 modules with a modules cache of 6.6 MB. Both values were measured after a run with any old caches deleted, so they are likely the smallest cache sizes for those projects. Reused caches will grow and shrink as any data store accumulates changes (it'd be extremely inefficient to keep the stores at their smallest possible size all the time).

With source maps enabled, those grow to 7.9 MB and 10.9 MB, respectively.

I think these are fairly small projects; you could easily have far more files. And since modules vary in size, yours could also be much larger. So I'd treat these numbers as rough starting points for cache size: most other projects will be about this size or larger.
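Dividing the reported cache sizes by the approximate module counts gives a rough per-module footprint (treating 1 MB as 1000 KB; the module counts are approximate, so these are ballpark figures):

```latex
\begin{aligned}
\frac{1.1\ \text{MB}}{400} &\approx 2.75\ \text{KB/module}, &\quad \frac{6.6\ \text{MB}}{800} &\approx 8.25\ \text{KB/module} &&\text{(no source maps)}\\
\frac{7.9\ \text{MB}}{400} &\approx 19.75\ \text{KB/module}, &\quad \frac{10.9\ \text{MB}}{800} &\approx 13.6\ \text{KB/module} &&\text{(with source maps)}
\end{aligned}
```

The roughly 3× spread in per-module size between the two projects (without source maps) illustrates that module size, not just module count, drives cache size.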

thangngoc89 commented on June 2, 2024

I think you can use levelup with a pure-JS adapter for this purpose (like medea or level-js).

mzgoddard commented on June 2, 2024

I'm adding a plugin hook to hard-source for customizing the disk-level cache mechanism (#130). That'll be in v0.4 of hard-source. I'll be replacing the default LevelDbSerializer with a MedeaSerializer in v0.5. If anyone opens a PR with that work before I get to it, I'll include the MedeaSerializer in a v0.4 patch release through a plugin like the HardSourceJsonSerializerPlugin in #130 (so it'll be available, but not the default).

earnubs commented on June 2, 2024

Hi, I had a go at a MedeaSerializer to replace LevelDB (master...earnubs:master), but some of the tests fail with larger files. The same test failures can be seen in https://github.com/ThomasRooney/hard-source-webpack-plugin, which uses MedeaDown as a backend for LevelUp; my fork uses Medea directly.

If you run npm test and then this script, https://github.com/earnubs/hard-source-webpack-plugin/blob/master/test-case.js, you'll see that the value returned by Medea.get() is truncated.

(node v6.11.1)
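A backend-agnostic round-trip check can catch this class of bug before a store becomes the default: write values of increasing size, read them back, and report any that changed. The sketch below assumes a minimal async get/put interface; `KeyValueStore`, `findTruncations`, and `doublingSizes` are hypothetical names, and a real test would first wrap medea or MedeaDown behind that interface.

```typescript
// Hypothetical minimal async key-value interface; adapt a real store
// (e.g. a medea or levelup wrapper) to it before running the check.
interface KeyValueStore {
  put(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

// Writes one value per size, reads each back, and returns the sizes
// whose value did not round-trip exactly (truncated, padded, or altered).
async function findTruncations(store: KeyValueStore, sizes: number[]): Promise<number[]> {
  const failures: number[] = [];
  for (const size of sizes) {
    const key = `roundtrip-${size}`;
    // A repeating digit pattern makes a cut-off point easy to locate.
    const value = "0123456789".repeat(Math.ceil(size / 10)).slice(0, size);
    await store.put(key, value);
    const back = await store.get(key);
    if (back !== value) failures.push(size);
  }
  return failures;
}

// Sizes from min to max (inclusive), doubling each step, e.g. 1 KB to 1 MB.
function doublingSizes(min: number, max: number): number[] {
  const sizes: number[] = [];
  for (let s = min; s <= max; s *= 2) sizes.push(s);
  return sizes;
}
```

Reporting the failing sizes rather than a single pass/fail makes it easier to see whether truncation starts at a fixed threshold.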
