thematters / matters-server
Server code for Matters.Town
Home Page: https://server.matters.town/playground
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
Display (human-readable) texts for OAuth scopes are hard-coded on the client right now, which is hard to maintain.
Describe the solution you'd like
Move them to matters-server and declare the related texts with a field-level directive or in a single file.
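For the single-file option, one possible shape is a plain map from scope to display text per language. This is a minimal sketch; the scope name, language keys, and translations are made up for illustration:

```ts
// One possible shape for the "single file" option. The scope name and
// translations below are illustrative, not the real ones.
type Lang = 'en' | 'zh_hant' | 'zh_hans'

const SCOPE_TEXTS: Record<string, Record<Lang, string>> = {
  'query:viewer:likerId': {
    en: 'Read your Liker ID',
    zh_hant: '讀取你的 Liker ID',
    zh_hans: '读取你的 Liker ID',
  },
}

// Fall back to the raw scope name when no text is declared.
export const scopeText = (scope: string, lang: Lang): string =>
  SCOPE_TEXTS[scope]?.[lang] ?? scope
```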
Previous permissions:
2 * appreciation + read >= 30
New permissions:
2 * appreciation + read >= 10
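For reference, the new rule as a predicate; the variable names are assumptions:

```ts
// Hypothetical helper encoding the new threshold: 2 * appreciation + read >= 10
const meetsThreshold = (appreciation: number, read: number): boolean =>
  2 * appreciation + read >= 10
```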
Currently we cache translation results for 10 days. This wastes API calls, since we only need to call language detection once per article, and translation once per language per article.
We also have a high-volume concurrency issue. If we update the cache key schema or the cache service, we get a high volume of translation API calls, resulting in many (403) User Rate Limit Exceeded errors. Although we haven't surpassed GCP quotas, it looks like GCP has some hard limit on peak API call frequency. Since we also query `Article.language` to determine whether to show the translation button, hitting the rate limit also slows down SSR.
A better long-term strategy is to store `language`, `Translation.title`, and `Translation.content` in the database or even S3.
We can probably progress in several steps according to our needs: first, make `Article.language` non-blocking, so that the query returns null and does not block SSR, but still fires the API call and fills the cache storage.
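A minimal sketch of that first step, assuming a stored `language` value on the article row and a hypothetical `detectAndCacheLanguage` background job:

```ts
// Non-blocking Article.language resolver: return the stored value if we
// have one; otherwise return null immediately and fill the cache in the
// background. `detectAndCacheLanguage` is a made-up job name.
const language = async (
  { id, language: stored }: { id: string; language?: string | null },
  _: any,
  { dataSources: { articleService } }: any
) => {
  if (stored) {
    return stored
  }
  // fire and forget: do not await, so SSR is not blocked
  articleService.detectAndCacheLanguage(id).catch(() => {
    /* log and move on */
  })
  return null
}
```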
We have discovered that the Matters IPFS public gateway was being used to pirate video files, and we will therefore temporarily remove it.
Is your feature request related to a problem? Please describe.
The server cannot be decoupled from ElasticSearch because of this error.
Describe the solution you'd like
As title.
This issue involves:
To make the home page look better, a tag description is required for tags presented on the home page.
Our GraphQL schema has grown very large; maintenance and optimization are needed. With a better partition between public and private data, we should be able to fit the JAMstack pattern on the frontend, simplify the private cache pattern on the backend, make the codebase easier for developers to start using after we open source it, better support future iterations on the follow page, and much more.
We can separate public and private data into different types. For example, each `Node` type can have a `viewer` field, which holds the corresponding type for private/viewer data. Therefore the `Article` type can be:
Article {
...
viewer: ViewerArticle
}
ViewerArticle {
isBookmarked
appreciationCount
appreciationLeft
...
}
The `User` type can be:
User {
...
viewer: ViewerUser
}
ViewerUser {
isFollower
isFollowee
isBlocked
...
}
The `Comment` type can be:
Comment {
...
viewer: ViewerComment
}
ViewerComment {
isCollapsed
...
}
In this way, we do not need to keep a special keyword for CSR on the frontend as proposed in thematters/matters-web#1051 (comment), but we do need to assemble `Viewer${NodeType}` fragments with a client-side query. This could be cleaner logic, and it improves cache hit rate and share rate.
We can also separate private and public data under different root fields, so that we can apply different auth and cache patterns. For example, we can group all private data under `viewer` and all public data under `system`, and the schema could look like:
query: {
system: {
node(input: {
id
}): Node
article(input: {
mediaHash
dataHash
}): Article
user(input: {
userName
}): User
feeds: {
icymi: ArticleConnection
hottest: ArticleConnection
...
}
},
viewer: {
// followers feed
feeds {
// follower publish feed
articles: ArticleConnection
// follower comment feed, group by article and user
discussions: {
...
edges {
// comment on which article
article: Article
// the grouped comments
comments: CommentConnection
cursor: String
}
}
// follower donation feed, group by article
donations: {
...
edges {
article: Article
users: UserConnection
cursor: String
}
}
}
setting {
language
...
}
status {
...
}
}
}
Visitors and SSR would only need to query the `system` root field. If we want to be safe and strict, we can even enforce on the backend that queries of `Viewer${NodeType}` types and the `viewer` root field only return data when fetched from the client.
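A minimal sketch of that enforcement, assuming client fetches can be distinguished from SSR; the `x-client-render` header here is a made-up convention, not an existing one:

```ts
// Wrap resolvers of the `viewer` root field / Viewer${NodeType} types so
// they only return data for client-side fetches.
const requireClientFetch =
  (resolve: (...args: any[]) => any) =>
  (root: any, args: any, context: any, info: any) => {
    if (context.req?.headers['x-client-render'] !== 'true') {
      return null // or throw a ForbiddenError, if we prefer loud failures
    }
    return resolve(root, args, context, info)
  }
```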
There should also be other optimization patterns we can apply.
Very curious!
As title.
In one of the new user tasks, there will be two new feed types (popular tags and selected tags) for newly registered users to follow.
Due to analytics needs, we need to design a way to store card exposure information.
A user should automatically follow a tag when s/he:
Currently, tags are presented on the tag owner's profile page. To improve the user experience, tags that a user helps maintain will be displayed in the profile tag feed as well.
Describe the bug
Under certain conditions, a user can appreciate more than 5 times.
Expected behavior
A user should not be able to appreciate more than 5 times.
Additional context
We are currently writing appreciation records to the database directly. We should instead use a queue to avoid concurrency issues, and perform the database check before updating, similar to donation and withdrawal; see the sketch below.
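A minimal sketch of the queue-based write, assuming Bull (which we already use for other jobs); the table and column names are assumptions:

```ts
import Queue from 'bull'
import Knex from 'knex'

const knex = Knex({ client: 'pg', connection: process.env.PG_CONNECTION_STRING })

// Run all appreciation writes through a single worker, so the limit check
// and the insert cannot interleave.
const appreciationQueue = new Queue('appreciation')

appreciationQueue.process(1, async (job) => {
  const { senderId, articleId, amount } = job.data
  // check inside the worker, where no other appreciation job can run
  const [{ total }] = await knex('appreciation')
    .where({ senderId, articleId })
    .sum('amount as total')
  if ((Number(total) || 0) + amount > 5) {
    throw new Error('appreciation limit exceeded')
  }
  await knex('appreciation').insert({ senderId, articleId, amount })
})
```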
DataServices such as `userService.ts` and `articleService.ts` have a lot of code in a single file, and as our business logic grows, they become difficult to maintain.
Based on Protecting Amazon S3 Against Object Deletion, we can:
We have been delivering a single version of SSR webpages on the CDN, with certain user groups refetching data for A/B tests. This makes responses slow. We can instead write the user group into a cookie, differentiate user groups by cookie on the CDN, and implement A/B tests behind GraphQL resolvers:
- Write `group` to the cookie: https://github.com/thematters/matters-server/blob/develop/src/common/utils/cookie.ts#L22
- Differentiate by `cookie.group` on the CDN for certain pages (currently only the homepage)
- Read `group` from the cookie on the server: https://github.com/thematters/matters-server/blob/develop/src/common/utils/getViewer.ts#L31
- Implement the A/B test in the `recommendation.hottest` resolver

It seems two notice types have logic inconsistencies. Below are possible problems, based on my understanding:
Before the queue trigger inserts notice data into the DB, it generates different data objects according to the notice type, like this:
-------------------------------------------------
File: src/connectors/notificationService/index.ts
-------------------------------------------------
private getNoticeParams = async (params): Promise<any> => {
switch (params.event) {
case 'user_new_follower':
case 'comment_new_upvote':
return {
type: params.event,
recipientId: params.recipientId,
actorId: params.actorId
}
...
}
In the above, there are no entities in `comment_new_upvote`'s returned object. However, entities are required in the query API and they will be displayed in the notice digest:
-------------------------------------------------
File: src/common/utils/notice.ts
-------------------------------------------------
const actorsRequired = {
...
}
const entitiesRequired = {
comment_new_upvote: true
...
}
export const filterMissingFieldNoticeEdges = (params): any => {
return edges.filter(({ node: notice }) => {
const noticeType = notice.type
...
// check entities
if (entitiesRequired[noticeType] && _.isEmpty(notice.entities)) {
return false
}
...
return true
})
}
So there is a logic inconsistency here. Besides that, it causes another problem: since `comment_new_upvote` notices are always filtered out of the query result and their state stays `unread`, our notice service will try to bundle all `comment_new_upvote` notices as one, even when the upvote targets are different. 🤔
Pretty similar to the previous case:
private getNoticeParams = async (params): Promise<any> => {
switch (params.event) {
case 'comment_pinned':
return {
type: params.event,
recipientId: params.recipientId,
entities: params.entities
}
...
}
Our filter will kick `comment_pinned` out of the query result because `actors` is missing:
const actorsRequired = {
comment_pinned: true
...
}
export const filterMissingFieldNoticeEdges = (params): any => {
return edges.filter(({ node: notice }) => {
const noticeType = notice.type
// check actors
if (actorsRequired[noticeType] && _.isEmpty(notice.actors)) {
return false
}
...
return true
})
}
And I found we tried to get `actor` from the comment data, so that we don't have to record `actor` in the DB:
-------------------------------------------------
File: src/queries/notice/index.ts
-------------------------------------------------
...
CommentPinnedNotice: {
id: ({ uuid }) => uuid,
actor: ({ entities }, _: any, { dataSources: { userService } }) => {
const target = entities.target
return userService.dataloader.load(target.authorId)
},
target: ({ entities }) => entities.target
},
...
But the `target` here is the comment, so `target.authorId` is the comment creator instead of the article author. In notice digests, it looks like comment authors pinned their own comments.
In one of the new user tasks, there will be three new feed types (most trendy, most appreciated, and most active) for newly registered users to follow.
Describe the bug
The search API should only record searches made in the search bar, but it currently records every search, including @ user mentions, connecting articles, and adding an article to a tag.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Queries from @ user mentions, connecting articles, or adding an article to a tag should not be included in search history.
Additional context
The search API input should include a `record: boolean` field to denote whether the current search should be recorded, and the frontend should set it accordingly.
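A resolver-side sketch of the flag; the input shape follows the proposal above, while the service calls are hypothetical stand-ins for our real persistence layer:

```ts
interface SearchInput {
  key: string
  record?: boolean // proposed field
}

// Only write search history when the client explicitly asks for it.
const search = async (
  _root: any,
  { input }: { input: SearchInput },
  { dataSources: { systemService } }: any
) => {
  if (input.record) {
    // hypothetical call; the real persistence method may differ
    await systemService.recordSearch(input.key)
  }
  return systemService.search(input)
}
```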
In the new design of the comment section, 2nd-level comments are bundled together based on reply relationships.
We can view the newly added structure either as sections within subthreads, or as subthreads of subthreads flattened into the same level (level 3 flattened into level 2).
The structure on the left has subthreads with sections, so subthreads are lists of comment lists. It is closer to the UI layout.
The structure on the right is fractal, with subthreads having their own subthreads. It is closer to the actual relationship between comments, which is a directed acyclic graph.
With the mental model on the right, we won't need to update our API, since the `Comment` type has a `comments` field. But we need to either resolve all remaining level-3 comments in the API, or recursively fetch child comments on the frontend until no more comments are returned (see the sketch below).
Both have pros and cons, but I think the right one is more concise and closer to the actual shape of the data.
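A frontend-side sketch of the recursion for the fractal model; `fetchComments` is a hypothetical wrapper around the `Comment.comments` field:

```ts
// Recursively load child comments until none are returned.
declare function fetchComments(parentId: string): Promise<Array<{ id: string }>>

interface CommentNode {
  id: string
  children: CommentNode[]
}

async function loadSubthread(parentId: string): Promise<CommentNode[]> {
  const comments = await fetchComments(parentId)
  return Promise.all(
    comments.map(async ({ id }) => ({
      id,
      children: await loadSubthread(id), // an empty array terminates the recursion
    }))
  )
}
```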
After checking the queries listed in the card, there are some difficulties in tuning the cache. For instance:
query {
user {
articles {
title
isSubscribed
}
}
}
Most fields in the query could be `public`, but the personalized field `isSubscribed` makes the entire response `private`. Besides that, our digest components like `ArticleDigest`, `UserDigest` and `Comment` all include more or less personalized data. More examples:
query {
comment {
author {
name
isBlocking
}
}
}
and
query {
user {
name
isFollower
}
}
As you can see, it's quite hard to separate data like `Author` from `Article`, `Comment` and `Response (Article | Comment)` in the queries we have now. It also conflicts a bit with the GraphQL philosophy if we try to separate those fields. 🤔
Increasing the TTL might be a temporary solution for now?
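If we do go that way, Apollo's `@cacheControl` hints are one place to put the numbers. A sketch with placeholder values, not decisions:

```ts
import { gql } from 'apollo-server'

// Keep public fields cacheable while marking personalized ones private.
// The maxAge numbers below are placeholders.
const typeDefs = gql`
  type Article @cacheControl(maxAge: 300) {
    title: String!
    isSubscribed: Boolean! @cacheControl(maxAge: 0, scope: PRIVATE)
  }
`
```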
It would be better to merge our cache plugin changes into the official Apollo repo, so we don't have to maintain and track them ourselves. Related discussions: apollographql/apollo-server#3228 apollographql/apollo-server#2437
We might also want to move the cache directives into a separate repo.
We need to reject certain operations when their frequency exceeds a given threshold. This might not be useful for preventing spam in general, but it is still good practice for mitigating attacks; a limiter sketch follows the list.
Operations currently under discussion include:
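A minimal fixed-window limiter sketch using the Redis instance we already run; the key naming and the thresholds are assumptions:

```ts
import Redis from 'ioredis'

const redis = new Redis()

// Count operations per user per window and reject above the threshold.
const checkOperationLimit = async (
  userId: string,
  operation: string,
  limit: number,
  windowSec: number
) => {
  const key = `op-limit:${operation}:${userId}`
  const count = await redis.incr(key)
  if (count === 1) {
    await redis.expire(key, windowSec) // start the window on first hit
  }
  if (count > limit) {
    throw new Error(`rate limit exceeded for ${operation}`)
  }
}
```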
type Article {
...
content: JSON!
...
}
Since `content` is required to be the article's HTML (`index.html`), using a custom scalar (`ContentJSON`) to restrict it may be better than plain `JSON`; a sketch follows.
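A sketch of such a restrictive scalar; the validation rule here is a placeholder for whatever HTML/shape check we settle on:

```ts
import { GraphQLScalarType, Kind } from 'graphql'

// Accept only strings (the sanitized HTML content) instead of arbitrary JSON.
export const ContentJSON = new GraphQLScalarType({
  name: 'ContentJSON',
  description: 'Article content as sanitized HTML',
  serialize: (value) => value,
  parseValue: (value) => {
    if (typeof value !== 'string') {
      throw new TypeError('ContentJSON must be a string')
    }
    return value
  },
  parseLiteral: (ast) => {
    if (ast.kind !== Kind.STRING) {
      throw new TypeError('ContentJSON must be a string')
    }
    return ast.value
  },
})
```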
Currently, `Tag` only supports `Article`:
type Tag {
text: String
count: Int
articles: [Article]
}
Our pagination is based on `offset` and `limit`, but its performance gets worse as records accumulate, because `offset` always scans from scratch (page 1 -> page 2 -> page 3).
Take followers as an example:
findFollowers = async (...) =>
this.knex
.select()
.from('action_user')
.where({ targetId, action: USER_ACTION.follow })
.orderBy('id', 'desc')
.offset(offset)
.limit(limit)
In this example, users can follow multiple users, so records accumulate quickly. We now have 197,989 records, and the CPU usage of this query usually stays in the Top 5. 🤦
In order to fix this, we need to change how we do pagination. An easier solution is to use `id` as the cursor:
findFollowers = async (...) =>
this.knex
.select()
.from('action_user')
.where({ targetId, action: USER_ACTION.follow })
.andWhere('id', '<', id)
.orderBy('id', 'desc')
.limit(limit)
In this query, the order of `id` exactly matches the order of followers, so we can take advantage of comparing `id`:
- Comparing `id` (a number) gives more stable and consistent performance than using `offset`.
- Postgres can use `id` (pk or index) to search in reverse.

I've used PG's EXPLAIN to analyze it, and the cost looks OK. Once I commit the new query, we can observe the results. At the same time, we might need to think about the rest of the queries where this fix cannot be applied.
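As a side note, if we keep the Connection interface, the `id` can travel as an opaque cursor matching the `andWhere('id', '<', id)` filter above. A tiny sketch; the encoding scheme is an assumption:

```ts
// Base64-wrap the id so clients don't depend on raw database ids.
const toCursor = (id: string): string =>
  Buffer.from(`cursor:${id}`).toString('base64')

const fromCursor = (cursor: string): string =>
  Buffer.from(cursor, 'base64').toString().split(':')[1]
```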
Describe the solution you'd like
Hi, I want to make an application for Matters users to manage their articles at matters.news. Could I get an OAuth2 application so that users can authorize their article scope to me? For now, I only need to read the authorized user's articles.
Describe alternatives you've considered
For now, maybe I can get a user's articles by scraping the GraphQL API, but I don't think that's a good use case.
Additional context
Thanks!
We have lots of images uploaded by users: avatars, profile covers, article embedded images, etc. Based on the Lighthouse Report, images have a big negative impact on our performance score.
In general, there are three sides to be optimized:
Currently, we do simple image processing (compressing & resizing) in `connectors/aws` during image upload. But there are cons:
Lambda to the rescue!
With AWS Lambda, image processing becomes asynchronous and separate. There are two ways to implement it:
Use Serverless Image Handler: process images on client requests, cached by CDN.
Pros & Cons:
- `avatar`: raw, 144w
- `embed`: raw, 1080w, 540w, 360w, 144w
- `profileCover`: raw, 1080w, 540w

# AWS S3
/matters-server-stage
├── 1080w
│   ├── uuid.jpeg
│   └── uuid.webp
├── 540w
│   ├── uuid.jpeg
│   └── uuid.webp
├── 360w
│   ├── uuid.jpeg
│   └── uuid.webp
├── 144w
│   ├── uuid.jpeg
│   └── uuid.webp
└── uuid.jpeg
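For the pre-generate-on-upload approach, a hedged sketch of the Lambda, assuming an S3 put trigger and the bucket layout above; `sharp` is one option for the processing, and the key handling is simplified:

```ts
import { S3 } from 'aws-sdk'
import sharp from 'sharp'

const s3 = new S3()
const SIZES = [1080, 540, 360, 144]

// On upload of a raw image, write one jpeg + webp pair per target width.
export const handler = async (event: any) => {
  const { bucket, object } = event.Records[0].s3
  const raw = await s3
    .getObject({ Bucket: bucket.name, Key: object.key })
    .promise()
  const base = object.key.replace(/\.[^.]+$/, '') // strip the extension

  for (const width of SIZES) {
    const resized = sharp(raw.Body as Buffer).resize({ width })
    await s3
      .putObject({
        Bucket: bucket.name,
        Key: `${width}w/${base}.jpeg`,
        Body: await resized.clone().jpeg().toBuffer(),
        ContentType: 'image/jpeg',
      })
      .promise()
    await s3
      .putObject({
        Bucket: bucket.name,
        Key: `${width}w/${base}.webp`,
        Body: await resized.clone().webp().toBuffer(),
        ContentType: 'image/webp',
      })
      .promise()
  }
}
```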
<!-- <ArticleDetail.Content> -->
<figure>
  <picture>
    <source type="image/webp" media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/1080w/uuid.webp">
    <source type="image/webp" srcset="https://xxx.cloudfront.net/embed/540w/uuid.webp">
    <source media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/1080w/uuid.jpeg">
    <img src="https://xxx.cloudfront.net/embed/540w/uuid.jpeg" alt="...">
  </picture>
  <figcaption>...</figcaption>
</figure>

<!-- <ArticleDigest.Cover>, View Mode = default -->
<picture>
  <source type="image/webp" media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/1080w/uuid.webp">
  <source type="image/webp" srcset="https://xxx.cloudfront.net/embed/540w/uuid.webp">
  <source media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/1080w/uuid.jpeg">
  <img src="https://xxx.cloudfront.net/embed/540w/uuid.jpeg" alt="...">
</picture>

<!-- <ArticleDigest.Cover>, View Mode = compact -->
<picture>
  <source type="image/webp" media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/360w/uuid.webp">
  <source type="image/webp" srcset="https://xxx.cloudfront.net/embed/144w/uuid.webp">
  <source media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/embed/360w/uuid.jpeg">
  <img src="https://xxx.cloudfront.net/embed/144w/uuid.jpeg" alt="...">
</picture>

<!-- <UserProfile.Cover> -->
<picture>
  <source type="image/webp" media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/profileCover/1080w/uuid.webp">
  <source type="image/webp" srcset="https://xxx.cloudfront.net/profileCover/540w/uuid.webp">
  <source media="(min-width: 768px)" srcset="https://xxx.cloudfront.net/profileCover/1080w/uuid.jpeg">
  <img src="https://xxx.cloudfront.net/profileCover/540w/uuid.jpeg" alt="...">
</picture>

<!-- <Avatar> -->
<picture>
  <source type="image/webp" srcset="https://xxx.cloudfront.net/avatar/144w/uuid.webp">
  <img src="https://xxx.cloudfront.net/avatar/144w/uuid.jpeg" alt="...">
</picture>
[1] We did set a long cache TTL in CloudFront, but Lighthouse doesn't think so.
[2] https://css-tricks.com/responsive-images-css/
[3] https://dev.to/jsco/a-comprehensive-guide-to-responsive-images-picture-srcset-source-etc-4adj
[4] https://css-tricks.com/using-webp-images/#article-header-id-3
Here are the flows for preventing concurrent transactions to Postgres:
A couple of things need to be clarified, since the flows are written in a synchronous style:
Any additions to these flows? @robertu7 @guoliu
cc @gyuetong
Is your feature request related to a problem? Please describe.
We are currently using Knex.js without an ORM. This provided flexibility in the beginning, but as our database schema grows more complex, we need a cleaner model for database objects.
Describe the solution you'd like
We can migrate to Prisma, which has matured over the past few years. It automates most of the mapping between the GraphQL schema and the database schema, which will make our codebase much cleaner.
However, Prisma Migrate is still experimental. We can take some risk and try it out, or keep knex-migrate for migrations and do the introspection separately (see example).
Since Knex is a query builder and Prisma is closer to an ORM, there needs to be a change in design pattern. We need to decide which migration method to use, and what the easiest path to switching to Prisma would be.
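For a feel of the pattern change, roughly what the knex follower query from the pagination issue could become; the model and field names depend on what introspection generates from our schema:

```ts
import { PrismaClient } from '@prisma/client'

const prisma = new PrismaClient()

// knex('action_user').where({ targetId, action: 'follow' }) ... becomes:
const findFollowers = (targetId: number, take: number) =>
  prisma.actionUser.findMany({
    where: { targetId, action: 'follow' },
    orderBy: { id: 'desc' },
    take,
  })
```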
Additional context
Related to #897
Is your feature request related to a problem? Please describe.
Currently the two home article feeds, 「熱門」 (hottest) and 「熱議」 (trending discussions), have vague design goals and significant overlap with each other. 「熱門」 should be a collective choice of articles worth reading at a given moment, and 「熱議」 should be a collective choice of articles worth discussing at a given moment.
For 「熱門」, in the past we did not have a direct measurement of reading time. Now that we are recording read time, we can move away from appreciations/Likes, which cost the actor nothing and signify many different things such as friendship, support, or greetings, and move towards read time and donations: the former a direct measurement of reading, the latter carrying cost and therefore resilient to spam. We can also start recording impressions, the number of times an article card appears to a user, to calculate the efficiency of reads for a given number of impressions.
「熱議」 still requires more discussion. A general direction might be to focus more on the number of participants, the number of votes on comments, or different ways of weighting commenters.
We need to display certain data visualizations for current and potential users, and we need to evaluate whether it is secure to embed a Metabase public dashboard as an iframe. Another option is to display visualizations as static images, which is secure but involves manual updates.
After a first glance at the currently open security issues on Metabase, none seems related to public dashboards, but further evaluation is still needed.
Some measures we can take (e.g. host the dashboards only at `data.matters.news/about` and `data.matters.news/community`?):
refs:
[1] https://blog.apollographql.com/securing-your-graphql-api-from-malicious-queries-16130a324a6b
[2] https://codeburst.io/use-custom-directives-to-protect-your-graphql-apis-a78cbbe17355
Pros
Cons
Discussion context: https://mattersnews.slack.com/archives/CF78WGNNM/p1582930152004800
We need a mechanism to reduce re-registration by banned users. We still need to confirm legal regulations with lawyers, but here are some initial ideas.
When a banned user logs on, the backend records their IP, canvas fingerprint, and email in a blacklist table. The fingerprint can be passed to the backend either through a mutation or through a particular header. In this way, we don't have to track the fingerprint of every user, only of banned users.
When a verification code is requested during registration, the frontend also sends the canvas fingerprint to the backend. If the backend finds a match on fingerprint, IP, or email, it adds the other two to the blacklist and declines to send the verification code (see the sketch below).
In this way, a banned user is likely to inherit their banned state across agents. For example, when a banned user uses the same browser or IP to register with a new email, the verification code is not sent and the new email is also banned; when they change browser and IP to try again, the email is matched and the new IP and canvas fingerprint are also banned. This makes it harder to crack.
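A sketch of the verification-code check, knex-style for consistency with the rest of the codebase; the table and column names are assumptions:

```ts
// Returns true if the requester matches the blacklist; on a match, spread
// the ban to the other two identifiers.
const shouldDeclineCode = async (
  knex: any,
  { ip, fingerprint, email }: { ip: string; fingerprint?: string; email: string }
) => {
  const match = await knex('blacklist')
    .where({ ip })
    .orWhere({ fingerprint })
    .orWhere({ email })
    .first()
  if (match) {
    // guard for missing fingerprint omitted in this sketch
    await knex('blacklist').insert([{ ip }, { fingerprint }, { email }])
  }
  return !!match
}
```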
Since we're refactoring the structure of draft and article data, we have to reorganize and clean up their assets.
@proformatters reported that the transaction page displays duplicate transactions with different states.
We will implement article edit functionality on the frontend. On the backend, we need to implement minimal version control. The idea is to use the `draft` table for versions, and the `article` table as a pointer to the newest version.
Currently, articles don't record the draft id when publishing. We need to backfill these ids before implementing version control for articles.
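A minimal sketch of publishing an edit under that model; the column names are assumptions:

```ts
// Insert a new draft row as the new version, then move the article pointer.
const publishNewVersion = async (knex: any, articleId: string, draft: any) =>
  knex.transaction(async (trx: any) => {
    const [draftId] = await trx('draft')
      .insert({ ...draft, articleId })
      .returning('id')
    await trx('article').where({ id: articleId }).update({ draftId })
  })
```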
Per LikeCoin's request, we will pass `X-LIKECOIN-USER-AGENT` in the HTTP headers when calling the LikeCoin API.
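A small sketch of where that header could live, assuming axios; the base URL and user-agent value are placeholders:

```ts
import axios from 'axios'

// Shared client for LikeCoin API calls.
const likecoin = axios.create({
  baseURL: 'https://api.like.co', // assumed endpoint
  headers: { 'X-LIKECOIN-USER-AGENT': 'matters-server/1.0' },
})
```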
To simplify the registration process, we will send an email with an attached link in order to confirm and activate the following steps.
We will need:
Since we are reshaping this feature from calling the Medium API to uploading Medium export files, a couple of things need discussion.
The archive downloaded from Medium is quite big because it contains lots of irrelevant files. Below are the unpacked files:
├── blocks
│   └── blocked-users-0001.html
├── bookmarks
│   └── bookmarks-0001.html
├── claps
│   └── claps-0001.html
├── highlights
│   └── highlights-0001.html
├── interests
│   ├── publications.html
│   ├── tags.html
│   ├── topics.html
│   └── writers.html
├── ips
│   └── ips-0001.html
├── posts
│   ├── 2018-04-02_-----------Arendt----51bc52c880f3.html
│   ├── 2018-04-12_-----------------3a905851316e.html
│   ├── 2018-04-12_------------1253c94fa6ac.html
│   ├── 2018-05-31_Matters--------------ae9f9aa98249.html
│   ├── 2018-11-10_-Matters--------------70c1ab6d47e2.html
│   ├── 2019-03-22_Matters-------------------6dc72e6753f9.html
│   ├── 2019-03-27_---------------------c4336ab683df.html
│   ├── 2019-04-02_------------------12bdf59fe4a9.html
│   ├── 2019-04-24_----------------------2fd7c25b0934.html
│   └── draft_nn-813be4d2bd80.html
├── profile
│   ├── memberships.html
│   ├── profile.html
│   └── publications.html
├── pubs-following
│   └── pubs-following-0001.html
├── sessions
│   └── sessions-0001.html
├── topics-following
│   └── topics-following-0001.html
└── users-following
    ├── users-following-0001.html
    └── users-following-0002.html
As you can see, the archive contains some user information and settings; all we need is the `posts` folder. Medium treats comments as posts, so comments are also packed into the `posts` folder. Do we want users to upload the whole package, or just the real posts they pick?
Possible process flows are here:
Based on the current design, uploading the packed archive would be the easiest way for users but not for us; also, some comments' contents would end up listed as drafts. In contrast, uploading multiple files would be the simplest way for us, and we would get exactly the right uploaded files (real posts), but users might need to drop files a couple of times.
FYI, the editor has an upload button on the right sidebar.
Love to hear your ideas 🧑‍💻