Giter Club home page Giter Club logo

Comments (10)

johnholdun avatar johnholdun commented on May 18, 2024 1

I did some benchmarking:

Running first migration to create 100 tables…
Finished in 88ms
Adding 100000 records to each table…
Finished in 17144ms
Running second migration, no change to schema…
Finished in 1ms

This will get more complicated with indices and so on but even so, I think it'll last a long time. I don't expect a ton of churn in this schema anyway, especially when we can dump a lot of stuff into the settings table.

from postmarks.

johnholdun avatar johnholdun commented on May 18, 2024 1

As evidenced by the mention spam in this thread, I've created new tickets for each of the groups of shippable work outlined above. This ticket remains valid for step 9 in ckolderup's list above—it now depends on #135, #136, and #137.

I don't actually think that "depends on #foo" has any special ability in the way that "fixes #foo" does but I wish it did! I don't know what it would do though.

from postmarks.

johnholdun avatar johnholdun commented on May 18, 2024

I'm going to claim this one if you don't mind!

from postmarks.

ckolderup avatar ckolderup commented on May 18, 2024

@johnholdun I'm okay with that, but I do think it will require some changes to the activitypub.db schema, and we need to figure out a strategy for not only ongoing migrations from this point forward but also an upgrade path for people who have existing installs and need new schema updates applied to their existing db. Does that sound right?

Timing-wise I'm about to head into a 2-week period where my time availability is pretty low; I'll be helping out with an in-law family event from 9/27-10/2 and then traveling for a work event 9/2-9/5. I do trust you to figure this out if you're still up for it, but maybe we should talk on Mon or Tue before I leave?

from postmarks.

johnholdun avatar johnholdun commented on May 18, 2024

No prob! I'm going to write out my current thoughts on a plan here for pondering and/or additional input:

  1. Rework firstTimeSetup in activity-pub-db.js to run every time the server restarts (i.e. merge it with setup). All the schema details in there look like they're idempotent already; the key generation stuff isn't, but that's solved by part 2 of this plan. Part of this would probably be moving the existing CREATE TABLE IF NOT EXISTS statements to a new sql file, like schema.sql, which is read and executed by this function. Any additional changes can be appended to this file to be executed the first time the server is restarted after they're added, and ignored on subsequent runs. I can't reasonably think of a future migration that couldn't be handled by this process; a risk is that this file will get unmanageably long if there's a lot of churn on the schema, but there's always the opportunity for a clean break in the future, especially after point 2 is implemented.
  2. Rework the activitypub.db schema—instead of an actor table that only ever has one record, add a new table of generic key-value pairs to use for this (and future!) uses. I suggest calling it settings, and I imagine the columns of this table being called something like name and value, where name is a unique string and value is JSON. Each column of the accounts table can be ported to a row in that new table. This step is not strictly necessary for this issue's requirements, but we can take advantage of SQLite's upsert syntax to always try to generate a new key pair unless one exists, for example. (This table will also be useful for things like theme options, which are the real reason I wanted to claim this issue.) This table could also be used for things like feature flags and even schema migration generations, for rare instances where idempotent create or alter statements are insufficient.
  3. Rename activitypub.db to something like application.db, to make it more clear that it's storing everything except bookmarks. Not necessary, especially for this issue, but I think this database will continue to drift away from AP-specific data over time. I'm also not sold on bookmarks being in a separate database. I'm guessing this is for portability, but I think that adding some sort of export utility to create a portable bookmarks-only database on demand would be a better solution here for performance and general codebase complexity, especially considering that comments are also stored in there.
  4. Implement UI-driven controls for setting up actor details, potentially with a first-run-only setup wizard-type experience, to populate everything that's currently stored in account.json instead of actually requiring (or looking for) an account.json file. (Step 2 will also include reading data from account.json if it exists, but prioritize existing database values, to remain idempotent but also populate legacy data as needed, so upgrading users won't have to deal with this.)
  5. Now that all relevant data is backed by the database, build a UI and accompanying backend logic to satisfy the requirements of this issue. (I've actually already built the UI for the AP data and the theme settings lol)

This is a lot of detail and makes some assumptions about the future of this project, so as always please steer me elsewhere if you have a different vision. I can also break this down a little if it's too much to do all at once; each step could probably be its own self-contained chunk of work, so long as previous steps are already completed. There's also probably a lot of opportunities to add tests while implementing this stuff.

edited to amend some more database thoughts to points 1 and 3

from postmarks.

ckolderup avatar ckolderup commented on May 18, 2024

Thanks for writing all this up! It definitely feels like an inevitable transition of going from something that I tried to make "developer-friendly" to being...actually just friendly. It's also a lot of work, and I agree that we should find some way to break it into multiple steps. Quoting some specifics below to get into them:

Any additional changes can be appended to this file to be executed the first time the server is restarted after they're added, and ignored on subsequent runs.

Okay, so essentially a one-file version of the ActiveRecord migration directory pattern? How would we tell the server to ignore things on subsequent runs? Would we want to go full ActiveRecord style at that point and just have individual .sql files in a directory, and store the last-processed file in a db table? It feels like that would also be easy to write Github checks for-- don't allow branches to have edits to existing .sql files, only allow new ones?

Rework the activitypub.db schema—instead of an actor table that only ever has one record, add a new table of generic key-value pairs to use for this (and future!) uses.
[...]
Rename activitypub.db to something like application.db
[...]
I'm also not sold on bookmarks being in a separate database

I'm okay with moving away from separate databases-- it would simplify the code a lot, and the original strategy was mostly a stopgap to make it easy for me to copy bookmarks around between test instances. We've got CSV export as of #86 and will have CSV import once #23 is complete, which satisfies my "just bookmarks" data portability as a baseline requirement (porting activitypub-related stuff is kind of fraught since the canonical online version of the data is signed by the private key and tied to that particular actor name + instance version-- but we can discuss whether there are ways to improve that at some point in the future).

So yeah. I think all of these proposed DB changes would be a net good for the project. Order-wise, though, I think we might want to shuffle the order of work around a bit to smooth things out:

  1. Create a new application.db database file with a unified setup process that establishes the new schema + migration process with a barebones schema
  2. Create a process by which the setup process detects existing account.json + activitypub.db actor singleton tables and imports them into an application.db settings table with schema defined in the migration .sql file (or files).
  3. Make the application code read this info from application.db instead of account.json/activitypub.db. Add the admin settings to manage the relevant settings data in application.db to make account.json obsolete. Set things up so that on startup, if all the required settings table key/value pairs are present, the owner is notified that they can remove account.json from their filesystem.
  4. Ship previous steps.
  5. Encode the messages and permissions tables from activitypub.db as schemas in application.db using its migration functionality. Make the setup process detect an existing activitypub.db file and copy all data over from it to the new tables in application.db. Update the code to read from application.db for this data. Notify the owner on startup that they can remove activitypub.db.
  6. Ship that step.
  7. Repeat the activitypub.db obsoletion process from the previous PR, but for bookmarks.db
  8. Ship that step.
  9. Return to the two immediate benefits of this entire refactor, one being the Fediverse Identity stuff that this ticket is meant to address, and the other being the webpage customization stuff that I know you wanted to do-- setting theme colors for low-lift style changes, having a field for custom stylesheet file locations, etc.

My main hope of ordering things like this is to make it so that we only go through the set up for the whole schema/migration process for the new database, rather than doing it for databases that will be made obsolete soon, and (crucially imo) means that none of what we change actually writes to the two existing .db files. This means that if anyone runs into trouble they can always downgrade, report to us what happened, and try again when we've addressed their problems. Does that make sense?

from postmarks.

johnholdun avatar johnholdun commented on May 18, 2024

I like this order, and I think making sure that none of the new stuff ever even touches the old database files is really smart! I'll make some new issues for the sets of steps between shipping statements if you're cool with that.

By "ignored on subsequent runs," I meant that a statement like CREATE TABLE IF NOT EXISTS can be executed multiple times and never create the given table more than once. There are similar approaches for things like altering columns and adding indices. After a good number of these, the first run of a fresh instance with an empty database could have some weird churn—creating columns and immediately dropping them, that sort of thing—but I think the benefits of keeping all the schema statements in one long file outweigh it. If it ever gets way too cumbersome or there's a use case that simply isn't covered by this strategy, we could adopt a more sophisticate migration tool and this file could be migration 1.

from postmarks.

ckolderup avatar ckolderup commented on May 18, 2024

gotcha, thanks! I was aware of IF NOT EXISTS &c but I thought you were referring to some kind of higher-level strategy for skipping over existing changes. I'm fine with the simpler approach, though I guess I'm a bit concerned that it'll slow down the development process-- but I'll withhold my judgment until I see how the performance of that changes over time, and we can always add a start:nomigrate script or something (similar) that lets people skip the entire process if they're just standing the app up to test a frontend change or whatever.

from postmarks.

jeffsikes avatar jeffsikes commented on May 18, 2024

Going back to the profile management piece originally mentioned, something GoToSocial has implemented is the ability for each user to have custom css on their public facing profile page. That would solve an issue I'm having in regards to a docker build of Postmarks. I currently have to run a build every time I change the CSS. If it was stored as part of the profile (a database field) this would no longer be an issue.

Firefish has a similar option as well, tho it only affects that specific user's browser, not what others see.

Just a thought. Thanks for the consideration!

https://docs.gotosocial.org/en/latest/user_guide/custom_css/

image

from postmarks.

ckolderup avatar ckolderup commented on May 18, 2024

Yeah, this is interesting. There's some purist tendency in me that says that storing "code in the database" is scary/bad, which is why I was thinking "settings table key/value for repo fork's standalone CSS file path", but maybe that's being too precious. I can appreciate wanting to make this kind of change without a full build cycle.

from postmarks.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.