rudderlabs / rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
Home Page: https://www.rudderstack.com/
License: Other
We currently initialize Bugsnag from our main package. This should go through an abstraction so that custom plugins can be written for Sentry, New Relic, etc.
Using the Docker setup instructions here:
https://github.com/rudderlabs/rudder-server#setup-instructions-docker
running docker-compose up --build throws this error:
ERROR: Version in "./docker-compose.yml" is unsupported.
This happens for me with docker-compose version 1.17.1. Changing the version from "3.7" to "3" in rudder-server/build/docker-compose.yml allows the build process to continue.
Right now, backend polls the config-backend every n secs and forwards the latest config to all subscribers (eg. router, processor). We would eventually want to move to sockets to handle the changes instead of polling.
backend-config module should only notify changes to subscribers. It should also expose an API to return the complete config. This API would help subscribers who are coming online for the first time or if it missed handling some changes.
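A minimal sketch of what this could look like, assuming a map-based config: subscribers are notified only when the config actually changes, and a Get method returns the complete config for subscribers that start late or missed an update. The Config and Notifier types are hypothetical and are not taken from the rudder-server codebase.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Config is a hypothetical stand-in for the workspace configuration.
type Config map[string]string

// Notifier keeps the latest config and notifies subscribers only on change.
type Notifier struct {
	mu      sync.RWMutex
	current Config
	subs    []chan Config
}

// Subscribe registers a subscriber and returns its buffered update channel.
func (n *Notifier) Subscribe() <-chan Config {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan Config, 8)
	n.subs = append(n.subs, ch)
	return ch
}

// Get returns the complete current config, for subscribers that come
// online late or that missed a change.
func (n *Notifier) Get() Config {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return n.current
}

// Update stores c and reports whether it differed from the previous config.
// Subscribers are notified only when it did.
func (n *Notifier) Update(c Config) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if reflect.DeepEqual(n.current, c) {
		return false
	}
	n.current = c
	for _, ch := range n.subs {
		select {
		case ch <- c:
		default: // subscriber is behind; it can catch up through Get
		}
	}
	return true
}

func main() {
	n := &Notifier{}
	ch := n.Subscribe()
	fmt.Println(n.Update(Config{"router": "v1"})) // true: config changed
	fmt.Println(n.Update(Config{"router": "v1"})) // false: nothing changed
	fmt.Println(len(ch), n.Get()["router"])       // 1 v1
}
```

Whether the update comes from polling or from a socket, the same Update call could sit behind it, so the transport can change without touching subscribers.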
Currently, we support S3 as a destination for events. We should extend the support to MinIO.
Following the Docker Setup Instructions produces a Docker image for the backend that crashes because sql.NullTime does not exist.
It looks like sql.NullTime was added in Go 1.13, but build/Dockerfile-dev uses Go 1.12.
I was able to get the backend running by switching the base image to golang:1.14-alpine in build/Dockerfile-dev. It runs on my local machine, but I don't know whether everything works correctly with a newer version of Go.
Hello there!
It's a bit counter-intuitive to me that self-hosting this application would require creating an account on the project's web page. I'm wondering whether there is a method for running the rudder server without an account-based workspace token, or otherwise to know what the purpose of this token is. The docs don't really explain its purpose, but they all state that it's a required part of setup.
Keeping in mind that I haven't read the code yet nor do I have a specific example of a failure mode, the need for an externally-generated key makes me a bit apprehensive about placing potentially sensitive information through the system. Just to give an idea of where my concerns are coming from.
Thanks for any clarification!
Segment's documentation recommends setting an Authorization header
rudder-server/gateway/gateway.go
Line 147 in b834143
However, there is at least one other way to supply a write key that is supported by Segment's API.
Supplying a write key in the writeKey property of the event body will register as a valid event of the specified type. The writeKey in the event body will override the write key in the Authorization header if present.
Also, each item in a batch may contain a different write key from its parent. IIRC, Segment uses the batch write key by default and overrides it with the event-specific write key for each event, if present.
This API has been designed for maximum interoperability; best to make all possible accommodations.
We never know who might have rolled their own client relying on some quirk like this (they exist).
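To make the precedence concrete, here is a rough sketch of how the write key could be resolved per event: event-level writeKey first, then the batch-level writeKey, then the key from the Authorization header. The struct shapes and function names are hypothetical, and the precedence follows my recollection of Segment's behaviour described above rather than anything verified against their implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event and batch are hypothetical, trimmed-down request shapes.
type event struct {
	Type     string `json:"type"`
	WriteKey string `json:"writeKey,omitempty"`
}

type batch struct {
	WriteKey string  `json:"writeKey,omitempty"`
	Batch    []event `json:"batch"`
}

// firstNonEmpty returns the first non-empty string, or "" if all are empty.
func firstNonEmpty(keys ...string) string {
	for _, k := range keys {
		if k != "" {
			return k
		}
	}
	return ""
}

// resolveWriteKeys returns one write key per event in the batch body.
// Assumed precedence: event writeKey, then batch writeKey, then header key.
func resolveWriteKeys(headerKey string, body []byte) ([]string, error) {
	var b batch
	if err := json.Unmarshal(body, &b); err != nil {
		return nil, err
	}
	keys := make([]string, len(b.Batch))
	for i, e := range b.Batch {
		keys[i] = firstNonEmpty(e.WriteKey, b.WriteKey, headerKey)
	}
	return keys, nil
}

func main() {
	body := []byte(`{"writeKey":"batchKey","batch":[{"type":"track"},{"type":"page","writeKey":"eventKey"}]}`)
	keys, err := resolveWriteKeys("headerKey", body)
	fmt.Println(keys, err) // [batchKey eventKey] <nil>
}
```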
In the misc package, GetIPFromReq returns the IP address unchanged. strings.Replace does not modify the string in place; it returns a new string, so its return value has to be used.
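A small illustration of the pitfall. The helper names and the whitespace-stripping are hypothetical and do not reproduce the actual body of GetIPFromReq; the point is only that Go strings are immutable and strings.Replace returns the modified copy.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanBroken mirrors the reported pattern: the result of strings.Replace
// is discarded, so addr comes back unchanged.
func cleanBroken(addr string) string {
	_ = strings.Replace(addr, " ", "", -1) // result thrown away
	return addr
}

// cleanFixed returns the string produced by strings.Replace.
func cleanFixed(addr string) string {
	return strings.Replace(addr, " ", "", -1)
}

func main() {
	fmt.Printf("%q\n", cleanBroken(" 10.0.0.1 ")) // " 10.0.0.1 "
	fmt.Printf("%q\n", cleanFixed(" 10.0.0.1 "))  // "10.0.0.1"
}
```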
Hi !
I started using the rudder server locally, but when adding the BigQuery destination I noticed that the "Location" configuration isn't being used. (My dataset is always created in the US.)
Looking at the code, I confirmed that this location configuration isn't used. I've made a fix that creates the dataset with the correct location using this configuration. Could I open a pull request with this change?
Thanks :)
Need to fix this
https://github.com/segmentio/analytics-go
Hi,
I wanted to give rudder a try with our hosted Postgres DB (via DigitalOcean), and we need to connect with sslmode=require. Currently it is hardcoded to disable in jobsdb.go.
Would it be an idea to support JOBS_DB_SSLMODE and default it to disable?
If I'm planning on sending in data via HTTP, what is the correct way to configure my source?
Currently, we support S3 as a destination for events. We should extend the support to Azure blob.
Hi, I was wondering if there were plans to support TimescaleDB. TimescaleDB is implemented as a PostgreSQL extension, so it might be a matter of tweaking your PostgreSQL connector to be TimescaleDB-aware. It looks like this is the approach Grafana may have taken.
UI when adding a PostgreSQL source in Grafana
Thanks!
Has anyone tried?
The usual thing that breaks with postgresql compatibility is triggers.
Our current dumping to files only works if the server is running on the same machine as DB.
Fix the query to read the data into memory and write it back to a local file efficiently.
Hi, events with a null userId are being stored in Snowflake with the string value <nil> rather than the null value.
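I haven't traced the code path, so this is only a guess at the cause: in Go, formatting a nil interface value with %v yields the literal string "<nil>", which would explain what ends up in the column. A sketch of the symptom and one possible guard, with hypothetical function names:

```go
package main

import (
	"database/sql"
	"fmt"
)

// naive stringifies any value; a nil value becomes the text "<nil>".
func naive(v interface{}) string {
	return fmt.Sprintf("%v", v)
}

// toNullString maps nil to SQL NULL and stringifies everything else.
func toNullString(v interface{}) sql.NullString {
	if v == nil {
		return sql.NullString{} // Valid is false, so drivers write NULL
	}
	return sql.NullString{String: fmt.Sprintf("%v", v), Valid: true}
}

func main() {
	fmt.Println(naive(nil))              // <nil>
	fmt.Println(toNullString(nil).Valid) // false
	fmt.Println(toNullString("user-1"))  // {user-1 true}
}
```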
Could we configure the hint ("us-east-1") via env/config or even workspaceconfig?
Anonymous ID appears as null when I make a track call using the Analytics.NET code.
I am using the source code from https://github.com/segmentio/Analytics.NET
If I do not pass this parameter, a new object is created at runtime with a null anonymous ID: https://github.com/segmentio/Analytics.NET/blob/master/Analytics/Model/BaseAction.cs#L34
Also, the SDK is not setting this anonymous ID.
I am using it on the server side (.NET application).
@Team Please let me know when I can expect this to be resolved.
Hi,
I am trying to run the application backed by an AWS RDS managed database. I verified that the AWS Postgres database is accessible from my machine, but when I pass the DB endpoint in the environment variables of rudder-docker.yaml, it doesn't seem to work.
entrypoint: sh -c '/wait-for aws-rds-postgres.amazonaws.com:5432 -- /rudder-server'
ports:
- "8080:8080"
environment:
- JOBS_DB_HOST=aws-rds-postgres.amazonaws.com
- JOBS_DB_USER=rudder
- JOBS_DB_PORT=5432
- JOBS_DB_DB_NAME=jobsdb
- JOBS_DB_PASSWORD=password
Please suggest what I need to do in order to run it in docker or kubernetes?
Hi, I have a Java source (using rudder-sdk-java) connected to an Amplitude destination, and I'm getting the following error in the destination:
Source ID | Attempt No. | Job State | Error Code | Error Response |
---|---|---|---|---|
1fIIb8hISkVlmckcwy54umPI0A0 | 1 | aborted | 400 | { "error": "Cannot read property 'name' of undefined" } |
We are using Ginkgo for integration tests.
Right now, we have different test suites for the following:
We need to add a test to check whether the MinIO destination works as expected.
I tried to deploy rudder using this tutorial: https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker
On docker-compose up -d, I got an error in the backend service:
--
-- wh_schemas
--
CREATE TABLE IF NOT EXISTS wh_schemas (
id BIGSERIAL PRIMARY KEY,
wh_upload_id BIGSERIAL,
source_id VARCHAR(64) NOT NULL,
namespace VARCHAR(64) NOT NULL,
destination_id VARCHAR(64) NOT NULL,
destination_type VARCHAR(64) NOT NULL,
schema JSONB NOT NULL,
error TEXT,
created_at TIMESTAMP NOT NULL);
DROP INDEX IF EXISTS wh_schemas_source_destination_id_index;
CREATE INDEX IF NOT EXISTS wh_schemas_destination_id_namespace_index ON wh_schemas (destination_id, namespace); (details: read tcp 172.21.0.5:52578->172.21.0.4:5432: read: connection reset by peer)
* driver: bad connection in line 0: SELECT pg_advisory_unlock($1)
goroutine 27 [running]:
github.com/bugsnag/bugsnag-go.AutoNotify(0xc0005a5d60, 0x3, 0x3)
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/vendor/github.com/bugsnag/bugsnag-go/bugsnag.go:109 +0x2bc
panic(0x1449880, 0xc0001490c0)
/root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/rruntime.Go.func1.1(0x1a22ea0, 0xc00023acf0)
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:37 +0x33a
panic(0x1449880, 0xc0001490c0)
/root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/warehouse.setupTables(0xc000678000)
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1491 +0xfb
github.com/rudderlabs/rudder-server/warehouse.Start()
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1664 +0x119
main.startWarehouseService(...)
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:153
main.main.func5()
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:352 +0x21
github.com/rudderlabs/rudder-server/rruntime.Go.func1(0x170eba0)
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:40 +0x81
created by github.com/rudderlabs/rudder-server/rruntime.Go
/codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:26 +0x3f
Batch timeout logic measures time between events rather than time between batches.
As a result, the default configuration of maxBatchSize=32 and batchTimeout=20ms can add up to 600ms of latency to a request.
Example: https://play.golang.org/p/nKCpiwIMxFs
See gateway.webRequestBatcher:
rudder-server/gateway/gateway.go
Line 246 in 995e9ca
Hi there! I wanted to propose adding the following badge to the README to indicate how many TODO comments are in this codebase:
The badge links to tickgit.com, which is a free service that indexes and displays TODO comments in public GitHub repos. It can help surface latent work and be a way for contributors to find areas of code to improve that might not be otherwise documented.
The markdown is:
[![TODOs](https://badgen.net/https/api.tickgit.com/badgen/github.com/rudderlabs/rudder-server)](https://www.tickgit.com/browse?repo=github.com/rudderlabs/rudder-server)
Thanks for considering, feel free to close this issue if it's not appropriate or you prefer not to!
(full disclosure, I am the creator/maintainer of tickgit)
This is a duplicate of the closed issue #163.
Nevertheless, I tried to run rudder-server with CockroachDB today but got an error:
ERROR Rudder server needs postgres version >= 10. Exiting.
I understand from the CockroachDB docs that it is compatible with Postgres v9.5 onwards:
https://www.cockroachlabs.com/docs/v20.1/postgresql-compatibility.html
Does rudder-server use Postgres-specific features like stored procedures, functions or triggers? (If not, it would be nice to have the scale-out features of CRDB for the backend.)
Hi Rudder team,
I've got an on-prem Tableau install, and I'm looking for an event processing service that will bucket events sent from my app and then present the bucketed/transformed data to Tableau as a data source. Is Rudder a good fit for this?
Taken from a message in the chat
Rudder for Bigquery creates a different BQ dataset for each source. So if you have a single project you're tracking from two different sources, it will be stored in two different data sets.
We should be able to take the schema as an input in our UI.
We have Terraform scripts to deploy rudder-server and its dependencies (i.e., PostgreSQL, transformer) in AWS in the repo https://github.com/rudderlabs/rudder-terraform
We would need help to set up on Azure.
The README says to install and start config-gen, but there is no such directory.
https://github.com/rudderlabs/rudder-server/blob/master/README.md#setup
Checkout the config-gen
git checkout -b config-gen
cd utils/config-gen
npm install
npm start
Hi, when I send a screen event via the HTTP API, the screen name is missing in both Amplitude and Snowflake destinations.
In Amplitude, the following fields are incorrect:
key | rudder | segment |
---|---|---|
display_name | screenview | Viewed <name> Screen |
event_type | screenview | Viewed <name> Screen |
event_properties.name | <name> |
In Snowflake, the Name column in the Screens table is null.
Hi, it looks like the client IP address in my destinations (Amplitude and Snowflake) matches the IP of the rudder-server node rather than the actual client.
Tested with the latest version here, deployed via rudderstack-helm. As an aside, is there a way to check the version or git hash of a rudder-server instance?
I have created an Angular service which can be used if someone is already using Angulartics2Segment in their Angular2+ project.
After injecting the rudder JS script in index.html, with this service in place, I just needed to rename the injected class from Angulartics2Segment to AngularRudderService and import AngularRudderService everywhere.
It might be helpful if you guys could create an AngularticsRudder package which can be installed through npm.
import {Injectable} from '@angular/core';

@Injectable({
  providedIn: 'root'
})
export class AngularRudderService {
  constructor() {
  }

  pageTrack(path) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.page(path);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  eventTrack(action, properties) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.track(action, properties);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setUserProperties(properties) {
    try {
      if (window.rudderanalytics) {
        if (properties.userId) {
          window.rudderanalytics.identify(properties.userId, properties);
        } else {
          window.rudderanalytics.identify(properties);
        }
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setAlias(alias) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.alias(alias);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }
}
I was trying to start rudder-server but it just failed silently. It turned out that my port 8080 was already in use on my local machine.
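As an illustration of the kind of error I would have expected to see, here is a sketch that surfaces the bind failure instead of swallowing it. The listen helper is hypothetical and is not how rudder-server currently starts its HTTP server.

```go
package main

import (
	"fmt"
	"net"
)

// listen binds addr and wraps any failure (for example, the port already
// being in use) with the address that was attempted.
func listen(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("cannot listen on %s: %w", addr, err)
	}
	return ln, nil
}

func main() {
	// Port 0 asks the OS for any free port, so the demo does not depend on 8080.
	first, err := listen("127.0.0.1:0")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer first.Close()

	// Binding the same address again fails; the error is printed, not hidden.
	if _, err := listen(first.Addr().String()); err != nil {
		fmt.Println(err)
	}
}
```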
I'm not sure how I can replay my S3 backups for new destinations? Is this something that we can create docs for?
API Requests to rudder-server will only ever return a status code of 200 or 400.
rudder-server/gateway/gateway.go
Lines 323 to 329 in 995e9ca
This is problematic for at least 2 reasons:
Services should differentiate between client errors and server errors. If the database becomes unavailable, you want the service to return 500 so that your alerting system can page the OPS team.
Some clients will retry events when they receive a 500 and remove events when they receive a 400.
The service must return 5xx on server error in order to minimize event loss during service outage.
Follow instructions on the repo
Since rudderstack aims to be the one SDK that rules them all, and since performance is hindered by loading so many SDKs such as GA, FB pixel, etc.,
I would like to ask for cloud support for the FB pixel so we don't have to load it in the DOM.
Here is the documentation of the server API:
https://developers.facebook.com/docs/marketing-api/server-side-api/using-the-api/
Thank you!
I think the Bugsnag API Key should be removed from the code and be externalised.
Should everyone trying the project get their own key?
Managed database providers like DigitalOcean's managed Postgres offering require clients to connect securely, with sslmode set to at least the require level. But right now, the sslmode parameter is hard-coded to disable in rudder-server's jobsdb.
Your website link on the GitHub description gives a 404 error.
Shouldn't it be changed to https://rudderlabs.com/?
Hello, I tried setting up rudderstack with docker following the official documentation (https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker)
The only change I made to the docker-compose file was replacing WORKSPACE_TOKEN with the correct value.
The server starts and the events get dispatched correctly, however the CPU usage at idle, with no events whatsoever is about 20-25% on my machine (i7-8550U 3.9GHz). I got similar results on a different machine too. I am running Linux and Docker 19.03.12.
All containers are on the "latest" tag, like in the provided docker-compose file (rudderlabs/rudder-server's hash is b0cf66d1817c and rudderlabs/rudder-transformer's hash is 4bb81602b25f).
Is this normal? If so, is there a way to reduce the CPU usage by tweaking the config file?
Thanks in advance!
Hi,
I just read that event data is deleted from PostgreSQL after the event is sent to the destination.
I want to back up the events on my own S3 bucket.
Let me know how to customise that.
We have Terraform scripts to deploy rudder-server and its dependencies (i.e., PostgreSQL, transformer) in AWS in the repo https://github.com/rudderlabs/rudder-terraform
We would need help to set up on GoogleCloud.