stargazer-reborn's Introduction

Stargazer

Stargazer is a flexible vtuber tracker, reborn.

It's the second attempt to rewrite pystargazer. The first attempt was stargazer-rs, whose development has been stalled for a while.

This new version attempts to utilize a more modern microservice architecture, and is meant to be stable and simple to develop and deploy.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Architecture

Note: this is subject to change during the development process. For an up-to-date graph, see arch.drawio.

[Figure: architecture diagram, 2022-03-25T14:04:34Z.png]

stargazer-reborn's Issues

API design

API Design

The API will be RPC-like. It is used by both the bots and the frontend.

Terminology

Entity - Represents a VTB
Session - Valid for a limited period of time, lets users update their settings and subscriptions, and is used to interact with the API

Request

A Request is a method invocation. The method name is in the URL, and the params are serialized into JSON and sent in the request body.

Response

Data transferred from server to client.

struct Response<T> {
  success: bool,
  time: u64, // UNIX timestamp
  data: T,
}
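
A minimal sketch of how a server might fill in this envelope, assuming `time` is counted in seconds since the UNIX epoch (the unit is not stated above). The `ok` and `err` constructors are hypothetical helpers, not part of the design.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Envelope for every API response, mirroring the struct above.
#[derive(Debug)]
pub struct Response<T> {
    pub success: bool,
    pub time: u64, // UNIX timestamp, assumed here to be in seconds
    pub data: T,
}

impl<T> Response<T> {
    /// Wrap `data` with the given status and the current time.
    fn new(success: bool, data: T) -> Self {
        let time = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0); // clock set before 1970: fall back to 0
        Response { success, time, data }
    }

    /// Hypothetical helper for a successful call.
    pub fn ok(data: T) -> Self {
        Self::new(true, data)
    }

    /// Hypothetical helper for a failed call; `data` carries the error payload.
    pub fn err(data: T) -> Self {
        Self::new(false, data)
    }
}
```

Keeping the same envelope for success and failure lets clients branch on `success` before they look at `data`.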

Methods

General methods

  • getUser(user_id) - Get user information
  • delUser(user_id) - User chooses to remove their account
  • getEntities() - Get a table of all entities that can be subscribed
  • newSession() - Create a new session
  • checkSession(session_id) - Check if a session_id is valid
  • authMe - Check if a token is valid and corresponds to specified user_id
  • getSettings(session_id) - Get the settings of a User
  • updateSetting(session_id, setting) - Update the settings of a User
  • getSubscription(session_id) - Get the subscription table of a User
  • updateSubscription(session_id) - Update the entities table of a User
  • {new, get, update, del}Entity(entity) - CRUD an Entity, Admin only
  • getUserIndex() - Used to get an Index, which is discussed here
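
The session methods above can be sketched as a small in-memory table. This assumes a fixed time-to-live per session and uses a counter for ids purely to keep the example deterministic; `SessionStore`, `ttl_secs`, and the id format are all hypothetical, and a real server would need unguessable ids.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory session table: session id -> expiry (UNIX seconds).
pub struct SessionStore {
    ttl_secs: u64,
    next_id: u64,
    sessions: HashMap<String, u64>,
}

impl SessionStore {
    pub fn new(ttl_secs: u64) -> Self {
        SessionStore { ttl_secs, next_id: 0, sessions: HashMap::new() }
    }

    /// newSession: issue an id that stays valid for `ttl_secs` after `now`.
    pub fn new_session(&mut self, now: u64) -> String {
        self.next_id += 1;
        let id = format!("session-{}", self.next_id);
        self.sessions.insert(id.clone(), now + self.ttl_secs);
        id
    }

    /// checkSession: true while the session exists and has not expired.
    pub fn check_session(&self, session_id: &str, now: u64) -> bool {
        self.sessions
            .get(session_id)
            .map_or(false, |&expiry| now < expiry)
    }
}
```

Passing `now` in explicitly keeps the expiry logic easy to test without touching the system clock.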

Transportation

All API calls are sent with the HTTP POST method. The request info is encoded as JSON in the HTTP body.
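
As an illustration, the wire format of one call might look like the request built below. The sketch assumes the method name is the whole URL path (`/getUser`); the exact URL layout is not specified here, and `build_request` is a hypothetical helper.

```rust
/// Build the raw HTTP/1.1 request for one method call.
/// Assumption: the method name is the whole URL path, e.g. `/getUser`.
pub fn build_request(host: &str, method: &str, json_body: &str) -> String {
    format!(
        "POST /{method} HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         \r\n\
         {body}",
        method = method,
        host = host,
        len = json_body.len(), // length in bytes, as HTTP requires
        body = json_body,
    )
}
```

For example, `build_request("api.example.com", "getUser", r#"{"user_id":"42"}"#)` yields a request line of `POST /getUser HTTP/1.1` followed by the JSON params as the body.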

Switch from bincode to json

Bincode is a schemaful format: it does not embed type information in the encoded data. However, our Value type depends on the format itself to provide type info. We may therefore migrate to JSON, and we may use serde_json::Value instead of our hand-made format.

Entity schema

Design entity schema of mongodb and write basic binding code.

Authentication

Background

There are three centralized components in this system: api, mq, and coordinator. We need an authentication mechanism to ensure that no unauthorized entity can access their resources.

Design

Each client must provide a valid user:pass pair when connecting to any central component, after a secure connection is established (through TLS). The credential carries the client's set of permissions. A permission set is represented as a partial function whose domain is the set of central components and whose codomain is {read-only, read-write}.

Example:

{
    "API": "ro",
    "MQ": "rw"
}

Note that the example client has read-only permission to API, read-write permission to MQ, and no permission to coordinator.

Bots are supposed to have read-write permission to API and read-only permission to MQ.

Workers are supposed to have read-only permission to coordinator.

Middlewares are supposed to have read-write permission to MQ.
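
A minimal sketch of the permission check, modelling the partial function as a map in which a missing key means "no permission". It assumes that read-write also satisfies a read-only requirement, which is not stated explicitly above. The type and method names are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Component {
    Api,
    Mq,
    Coordinator,
}

/// Declared in this order so that `ReadWrite >= ReadOnly` (assumption: rw implies ro).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Access {
    ReadOnly,
    ReadWrite,
}

/// A permission set: a partial map from component to access level.
pub struct Permissions(HashMap<Component, Access>);

impl Permissions {
    pub fn new(entries: &[(Component, Access)]) -> Self {
        Permissions(entries.iter().copied().collect())
    }

    /// True if the client holds at least `needed` on `component`.
    pub fn allows(&self, component: Component, needed: Access) -> bool {
        self.0
            .get(&component)
            .map_or(false, |&granted| granted >= needed)
    }
}
```

With the example client above (`"API": "ro"`, `"MQ": "rw"`), a write to API and any access to coordinator are both rejected.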

Implementation

Database Schema

Add a new collection auth to MongoDB. Its schema is defined as follows:

{
    "username": <string>,
    "hash": <pbkdf2 derived key>,
    "permissions": {<permissions map>}
}

Only the API, the coordinator, and the rmq auth server can access this collection, and their access is read-only.

Authentication Crate

A new crate is implemented to query the db with a given credential and return the granted permissions.

Coordinator & API

Credentials are attached through HTTP Basic Authentication. Only requests with proper permissions can be accepted and processed.
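
For reference, a client-side sketch of how a user:pass pair becomes a Basic Authentication header. The Base64 encoder is hand-written only to keep the example dependency-free; real code would more likely use an existing crate. `base64_encode` and `basic_auth_header` are hypothetical names.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, padding with zeros.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[((n >> 6) & 63) as usize] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[(n & 63) as usize] as char
        } else {
            '='
        });
    }
    out
}

/// Header line carrying `user:pass` as HTTP Basic Authentication.
pub fn basic_auth_header(user: &str, pass: &str) -> String {
    let pair = format!("{}:{}", user, pass);
    format!("Authorization: Basic {}", base64_encode(pair.as_bytes()))
}
```

Since Base64 is an encoding and not encryption, this only makes sense over the TLS connection required in the design above.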

RabbitMQ

An rmq auth server is implemented to integrate the authentication mechanism into RabbitMQ.

See https://github.com/rabbitmq/rabbitmq-auth-backend-http.

Switch to stable Rust

We currently rely on arc_cyclic, which is still unstable. It is already implemented in nightly and will be stabilized in Rust 1.60. Once 1.60 is released, switch the codebase to Rust 1.60.

Frontend

For now, the frontend will be written in Next.js.

One thing to note: there will be two separate deployments, one for Mainland China and one for other areas, because of known issues. For now we simply put all QQ users on the Mainland deployment, and everyone else uses the global deployment, which will be on Vercel. In the future we may need a smarter mechanism, such as a load balancer, to redirect users to the correct deployment based on GeoIP.

Worker management

Assign entities to workers using consistent hashing (m:n), kick dead workers out, and persist state across short restarts.
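
A minimal sketch of consistent hashing with a hash ring, using only the standard library. It assigns each entity to a single worker, so it does not cover the m:n part, and the `Ring` type, its method names, and the use of virtual nodes are assumptions made for illustration, not the project's actual implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hash ring mapping points on a u64 circle to worker ids.
pub struct Ring {
    replicas: u32, // virtual nodes per worker, to even out the load
    points: BTreeMap<u64, String>,
}

impl Ring {
    pub fn new(replicas: u32) -> Self {
        Ring { replicas, points: BTreeMap::new() }
    }

    pub fn add_worker(&mut self, worker: &str) {
        for i in 0..self.replicas {
            self.points.insert(hash_of(&(worker, i)), worker.to_string());
        }
    }

    /// Kick a (dead) worker out; only entities it owned move elsewhere.
    pub fn remove_worker(&mut self, worker: &str) {
        self.points.retain(|_, w| w.as_str() != worker);
    }

    /// Owner of `entity`: first point clockwise from its hash, wrapping around.
    /// Returns `None` when the ring is empty.
    pub fn worker_for(&self, entity: &str) -> Option<&str> {
        let h = hash_of(&entity);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, w)| w.as_str())
    }
}
```

The useful property is that removing one worker leaves every entity owned by the surviving workers where it was, so only the dead worker's entities need to be rescheduled.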

MongoDB cannot deserialize `Entity` correctly

Minimum reproducible case

#[tokio::test]
async fn test_fetch_entity_from_db() {
    let client = Client::with_uri_str("...")
        .await
        .unwrap();
    let db = client.database("...");
    let col = db.collection::<Entity>("...");

    col
        .find_one(doc! { "meta.name.name": { "$gt": {} } }, None)
        .await
        .unwrap();
}

Error

thread 'server::context::test_fetch_entity_from_db' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: BsonDeserialization(DeserializationError { message: "invalid type: string \"en\", expected enum LanguageCode" }), labels: {}, wire_version: None }'

When meta.name.name is empty, this can be deserialized correctly.

In DB

Data in db:

> db.entities.find({ "meta.name.name": { $gt: {} } }).pretty()
{
	"_id" : ObjectId("623fa43e3e7438502c11c416"),
	"id" : UUID("b40c53a7-742a-4493-9bcf-b6231143666d"),
	"meta" : {
		"name" : {
			"name" : {
				"en" : "Test"
			},
			"default_language" : "en"
		},
		"group" : null
	},
	"tasks" : [
		UUID("1f715edb-1cf0-4c4b-9d94-03787eecf92b")
	]
}

Inconsistency of `uuid::Uuid` and `bson::Uuid`

Currently, there are two UUID types:

  • One from crate uuid
  • One from crate bson

bson::Uuid is a wrapper around a v4 UUID. It provides binary compatibility when serializing to and deserializing from BSON, so it can be used with MongoDB.

Our core types are all defined with the bson UUID, whereas API interfaces should use the bare uuid type. However, several API methods return core types directly, and those include a bson UUID as a field. Normally this causes no problem, since apart from BSON serialization bson::Uuid behaves the same as uuid::Uuid. But the API crate will also be used as an SDK for interacting with the API. On the client side, some methods now return bson::Uuid and others return uuid::Uuid, which is confusing and inconsistent. We will need a solution for this. As a temporary measure, I have changed all API return types to use bson::Uuid.

Switch to stable MongoDB crate

The change stream feature currently exists only in the git repository. Switch to the crates.io version once a new MongoDB version is released.

Database Design

Currently we need three tables:

  • Meta for VTB meta info
  • Tasks for tasks that can be distributed from coordinator to workers
  • User for IM subscribers

Middleware design

Background

Previously, when we were developing pystargazer, a middlebox-based translation framework was proposed. The general idea was that we could tag a certain event/message as 'to be translated'; a middlebox would then receive this event, translate it, and put it back into the queue with the message body translated.

There's another use case now. A YouTube live event should be scheduled to be sent at a specific time. In pystargazer, we spawn a scheduled task internally and periodically persist pending tasks to disk. However, stargazer-reborn now takes a microservice approach, and this internal scheduling implementation is considered obscure and hard to maintain.

Concept

A message can be tagged to be processed by a middleware and is then put into the mq. A message is considered not ready while any middleware tag is attached to it. An IM adapter won't process any message that is not ready.

A middleware receives messages tagged with its identifier. It processes the message, removes the tag, and puts the message back into the mq.

Implementation

  1. We change the main exchange type to topic.
  2. Normal messages conform with the * pattern.
  3. Tags are attached after the original message topic with the pattern #.{identifier}, such as foo.delay and bar.delay.translate.
  4. Special fields are required to pass parameters to a middleware, such as x-delay-at and x-translate-from.
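
The tagging scheme above can be sketched as plain string handling on routing keys. This assumes a ready message has a single-word key (so it matches `*`), that tags are appended at the end, and that each middleware strips only its own trailing tag. The function names are hypothetical.

```rust
/// A key is ready for IM adapters when it is a single word, i.e. it matches
/// the `*` pattern and carries no middleware tag.
pub fn is_ready(routing_key: &str) -> bool {
    !routing_key.is_empty() && !routing_key.contains('.')
}

/// Append a middleware tag to the end of a routing key.
pub fn add_tag(routing_key: &str, identifier: &str) -> String {
    format!("{}.{}", routing_key, identifier)
}

/// If the key ends with `.{identifier}`, return it with that tag removed.
/// This is the key a middleware republishes under after processing.
/// Returns `None` when the tag is absent or nothing would be left.
pub fn strip_tag(routing_key: &str, identifier: &str) -> Option<String> {
    let suffix = format!(".{}", identifier);
    routing_key
        .strip_suffix(suffix.as_str())
        .filter(|rest| !rest.is_empty())
        .map(str::to_string)
}
```

With these rules, `bar.delay.translate` is first picked up by `translate`, republished as `bar.delay`, then picked up by `delay` and republished as `bar`, at which point an IM adapter can process it.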

Potential middleware

  1. translate - call deepl to translate twitter body
  2. delay - delayed message (capable of removing a previously scheduled message by x-delay-id).
  3. youtube-live-check - delay until x-channel-id live is started.
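
A rough in-memory sketch of the core of the `delay` middleware, assuming `x-delay-at` is a UNIX timestamp and `x-delay-id` uniquely identifies a scheduled message. `DelayQueue` and its methods are hypothetical, and persistence is deliberately left out.

```rust
use std::collections::HashMap;

/// Pending messages keyed by their `x-delay-id`: id -> (deliver_at, body).
#[derive(Default)]
pub struct DelayQueue {
    pending: HashMap<String, (u64, String)>,
}

impl DelayQueue {
    /// Schedule `body` for time `at`; rescheduling an id replaces the old entry.
    pub fn schedule(&mut self, id: &str, at: u64, body: &str) {
        self.pending.insert(id.to_string(), (at, body.to_string()));
    }

    /// Remove a previously scheduled message; returns whether it existed.
    pub fn cancel(&mut self, id: &str) -> bool {
        self.pending.remove(id).is_some()
    }

    /// Take every message due at `now`, ordered by delivery time.
    pub fn pop_due(&mut self, now: u64) -> Vec<String> {
        let mut due: Vec<(u64, String)> = self
            .pending
            .iter()
            .filter(|(_, (at, _))| *at <= now)
            .map(|(id, (at, _))| (*at, id.clone()))
            .collect();
        due.sort();
        due.into_iter()
            .filter_map(|(_, id)| self.pending.remove(&id))
            .map(|(_, body)| body)
            .collect()
    }
}
```

A linear scan per tick is fine for a sketch; a heap or a timer wheel would suit a large number of pending messages better.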

Youtube worker

Register a callback with the YouTube WebSub API. Push an event to the mq once one is received. There are multiple types of events (e.g., on_live, on_schedule). Scheduling events is out of scope for the worker and should be handled by a separate middleware.

Coordinator

The coordinator should have access to the entity backend, book-keep different kinds of workers, and schedule entities to workers.

It should keep a connection to each worker and check its health state.
It should use consistent hashing to distribute entities (almost) evenly across workers, and react to changes in entities and workers by asking workers to take or abandon specific entities.
It should be stable across restarts and persist its last state for at least one heartbeat cycle.
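
The health check could be sketched as a table of last-seen heartbeat times. The timeout rule (a worker is dead once it has been silent for longer than `timeout_secs`) and the `Heartbeats` type are assumptions for illustration; the actual protocol between coordinator and workers is not specified here.

```rust
use std::collections::HashMap;

/// Tracks the last heartbeat time (in seconds) seen from each worker.
pub struct Heartbeats {
    timeout_secs: u64,
    last_seen: HashMap<String, u64>,
}

impl Heartbeats {
    pub fn new(timeout_secs: u64) -> Self {
        Heartbeats { timeout_secs, last_seen: HashMap::new() }
    }

    /// Record a heartbeat from `worker` at time `now`.
    pub fn beat(&mut self, worker: &str, now: u64) {
        self.last_seen.insert(worker.to_string(), now);
    }

    pub fn is_alive(&self, worker: &str) -> bool {
        self.last_seen.contains_key(worker)
    }

    /// Remove and return (sorted) the workers silent for longer than the timeout.
    pub fn evict_dead(&mut self, now: u64) -> Vec<String> {
        let timeout = self.timeout_secs;
        let mut dead: Vec<String> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.saturating_sub(**seen) > timeout)
            .map(|(worker, _)| worker.clone())
            .collect();
        dead.sort();
        for worker in &dead {
            self.last_seen.remove(worker);
        }
        dead
    }
}
```

Each evicted worker would then be removed from the hash ring so that its entities get reassigned.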

API

API for management.

For more info see #30

Method implementation

  • getUser - Get user information
  • addUser - Create new user
  • delUser - User chooses to remove their account
  • getEntities - Get a table of all entities that can be subscribed
  • newSession - Create a new session
  • authMe - Check if a token is valid and corresponds to specified user_id
  • updateSetting - Update settings of an User
  • {new, get, update, del}Entity - CRUD an Entity, Admin only

Bot design

Several issues.

  • How will bot receive events, by MQ?
  • Are bots going to interact directly with the DB that stores session info, or should they go through the API?
  • If they do, should we move the session-related code into a separate submodule?
  • If they don't, does that mean bots will not depend on mongodb, and that we should also write an API client?

Coordinator panic in certain conditions

Assume there are some tasks to be scheduled.

When a worker enters the group, the coordinator schedules a task onto it. Later this worker exits, leaving no worker available. When a worker enters again, the coordinator panics because of the inconsistent state.

Problem: the task's bound worker was not reset to None when the last worker in the group exited, because balance_impl returns early: if ring.empty: return.

Solution: if ring.empty: for task in tasks: task.bound_worker = None.
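
A sketch of the fix in Rust. `Task` and `balance` are simplified, hypothetical stand-ins for the real types and for balance_impl, and the placement rule is a plain modulo only to keep the example short.

```rust
/// A task and the worker it is currently bound to, if any.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u32,
    pub bound_worker: Option<String>,
}

/// Rebalance `tasks` over `workers`.
/// Placement is a simple modulo here; the coordinator itself uses a hash ring.
pub fn balance(workers: &[String], tasks: &mut [Task]) {
    if workers.is_empty() {
        // The fix: unbind every task instead of returning early, so no task
        // keeps pointing at a worker that has already left.
        for task in tasks.iter_mut() {
            task.bound_worker = None;
        }
        return;
    }
    for task in tasks.iter_mut() {
        let idx = task.id as usize % workers.len();
        task.bound_worker = Some(workers[idx].clone());
    }
}
```

After the last worker leaves, every task is unbound, so a worker joining later starts from a consistent state.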
