rudderlabs / rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

Home Page: https://www.rudderstack.com/

License: Other

Go 85.24% Lua 0.03% Dockerfile 0.03% Shell 0.07% Makefile 0.09% PLpgSQL 0.14% HTML 14.41%
golang hybrid-cloud privacy security warehouse-management data-warehouse rudderstack customer-data customer-data-pipeline customer-data-platform

rudder-server's Issues

Create CrashReporting abstraction

We are initializing Bugsnag directly from our main package. This should go through an abstraction, so that custom plugins can be written for Sentry, New Relic, etc.
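
One possible shape for such an abstraction is sketched below in Go. Everything here is hypothetical and not taken from rudder-server: the `CrashReporter` interface, the `Register`/`Get` registry, the `Guard` helper, and `memoryReporter`, which only stands in for a real Bugsnag or Sentry plugin.

```go
package main

import (
	"fmt"
	"sync"
)

// CrashReporter is a hypothetical plugin interface; the names are
// illustrative and do not come from rudder-server.
type CrashReporter interface {
	Name() string
	// Notify forwards an error to the backing crash-reporting service.
	Notify(err error)
}

var (
	mu       sync.RWMutex
	registry = map[string]CrashReporter{}
)

// Register makes a reporter available under its Name().
func Register(r CrashReporter) {
	mu.Lock()
	defer mu.Unlock()
	registry[r.Name()] = r
}

// Get returns the reporter registered under name, or a no-op reporter.
func Get(name string) CrashReporter {
	mu.RLock()
	defer mu.RUnlock()
	if r, ok := registry[name]; ok {
		return r
	}
	return noopReporter{}
}

type noopReporter struct{}

func (noopReporter) Name() string { return "noop" }
func (noopReporter) Notify(error) {}

// memoryReporter stands in for a real plugin and only records errors.
type memoryReporter struct {
	name string
	seen []error
}

func (m *memoryReporter) Name() string     { return m.name }
func (m *memoryReporter) Notify(err error) { m.seen = append(m.seen, err) }

// Guard runs fn, recovers from a panic, reports it, and returns it as an error.
func Guard(r CrashReporter, fn func()) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("panic: %v", p)
			r.Notify(err)
		}
	}()
	fn()
	return nil
}

func main() {
	m := &memoryReporter{name: "memory"}
	Register(m)
	_ = Guard(Get("memory"), func() { panic("boom") })
	fmt.Printf("%s reporter recorded %d crash(es)\n", m.Name(), len(m.seen))
}
```

With this layout the main package would only choose a reporter by name, and each vendor integration would live behind the interface.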

OnChange notifiers for backend config

Right now, the backend polls the config-backend every n seconds and forwards the latest config to all subscribers (e.g. router, processor). We would eventually want to move to sockets to handle the changes instead of polling.

The backend-config module should notify subscribers only when the config changes. It should also expose an API that returns the complete config. This API would help subscribers that are coming online for the first time or that missed some changes.
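
A minimal Go sketch of this behaviour follows, assuming a simple in-process design. The `Config` type and the `Notifier` with its `Subscribe`, `Get`, and `Update` methods are hypothetical names for illustration; the real backend-config module may be structured differently.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Config is a stand-in for the real backend config; its shape is an assumption.
type Config map[string]string

// Notifier keeps the latest config and notifies subscribers only on change.
type Notifier struct {
	mu      sync.RWMutex
	current Config
	subs    []chan Config
}

// Subscribe returns a channel that receives the config whenever it changes.
func (n *Notifier) Subscribe() <-chan Config {
	ch := make(chan Config, 1)
	n.mu.Lock()
	n.subs = append(n.subs, ch)
	n.mu.Unlock()
	return ch
}

// Get returns the complete current config, for subscribers that start late
// or missed a change. Callers must treat the returned map as read-only.
func (n *Notifier) Get() Config {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return n.current
}

// Update stores next and notifies subscribers only if it differs from the
// current config. It reports whether a notification was sent.
func (n *Notifier) Update(next Config) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if reflect.DeepEqual(n.current, next) {
		return false
	}
	n.current = next
	for _, ch := range n.subs {
		// Drop a stale pending value so a slow subscriber never blocks Update.
		select {
		case <-ch:
		default:
		}
		ch <- next
	}
	return true
}

func main() {
	n := &Notifier{}
	ch := n.Subscribe()
	fmt.Println(n.Update(Config{"router": "v1"})) // true: config changed
	fmt.Println(n.Update(Config{"router": "v1"})) // false: identical config
	fmt.Println((<-ch)["router"])                 // v1
	fmt.Println(n.Get()["router"])                // v1
}
```

The poller (or, later, a socket listener) would call `Update` with whatever it receives, so switching the transport would not affect subscribers.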

Following Setup Instructions for Docker Causes a Crash of the Backend

Following the Docker Setup Instructions produces a Docker image for the backend that crashes because sql.NullTime does not exist.

It looks like sql.NullTime was added in Go 1.13, but build/Dockerfile-dev uses Go 1.12.

I was able to get the backend to run by switching the base image to golang:1.14-alpine in build/Dockerfile-dev. That gets it running on my local machine, but I don't know whether everything works correctly with a newer version of Go.

Running rudder-server without a rudderstack account?

Hello there!

It's a bit counter-intuitive to me that self-hosting this application would require creating an account on the project's web page. I'm wondering whether there is a way to run the rudder server without an account-based workspace token, or otherwise to learn what the purpose of this token is. The docs don't really explain its purpose, but they all state that it's a required part of setup.

Keeping in mind that I haven't read the code yet nor do I have a specific example of a failure mode, the need for an externally-generated key makes me a bit apprehensive about placing potentially sensitive information through the system. Just to give an idea of where my concerns are coming from.

Thanks for any clarification!

Support writeKey in event body

Segment's documentation recommends setting an Authorization header

writeKey, _, ok := req.request.BasicAuth()

However, there is at least one other way to supply a write key that is supported by Segment's API.

  1. Supplying a write key in the writeKey property of the event body will register as a valid event of the specified type. The writeKey in the event body will override the write key in the Authorization header if present.

  2. Also, each item in a batch may contain a different write key from its parent. IIRC, Segment uses the batch write key by default and overrides it with the event-specific write key for each event, if present.


This API has been designed for maximum interoperability; best to make all possible accommodations.
We never know who might have rolled their own client relying on some quirk like this (they exist).
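
The precedence described above can be sketched in Go as follows. This is an illustration only, not rudder-server's gateway code: the `event` and `batch` types, `firstNonEmpty`, and `resolveBatchKeys` are hypothetical names, and the order (event key, then batch key, then header key) follows the behaviour reported in this issue rather than a verified Segment specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event and batch model only the fields needed here; the layout is an assumption.
type event struct {
	WriteKey string `json:"writeKey"`
	Type     string `json:"type"`
}

type batch struct {
	WriteKey string  `json:"writeKey"`
	Batch    []event `json:"batch"`
}

// firstNonEmpty returns the first non-empty string, or "" if all are empty.
func firstNonEmpty(keys ...string) string {
	for _, k := range keys {
		if k != "" {
			return k
		}
	}
	return ""
}

// resolveBatchKeys returns the effective write key for each event in the
// batch: the event's own key, else the batch key, else the header key.
func resolveBatchKeys(headerKey string, body []byte) ([]string, error) {
	var b batch
	if err := json.Unmarshal(body, &b); err != nil {
		return nil, err
	}
	keys := make([]string, len(b.Batch))
	for i, e := range b.Batch {
		keys[i] = firstNonEmpty(e.WriteKey, b.WriteKey, headerKey)
	}
	return keys, nil
}

func main() {
	body := []byte(`{"writeKey":"batchKey","batch":[{"type":"track"},{"type":"page","writeKey":"eventKey"}]}`)
	keys, err := resolveBatchKeys("headerKey", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [batchKey eventKey]
}
```

In a handler, `headerKey` would be the value already obtained from `BasicAuth()`, so existing clients that only send the header keep working.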

BigQuery Warehouse - DataSet Location

Hi!

I started using the rudder server locally, but when adding the BigQuery destination I noticed that the "Location" configuration isn't being used. (My dataset is always created in the US.)

Looking at the code, I confirmed that this location configuration isn't used. I've made a fix that creates the dataset with the correct location using this configuration. Could I open a pull request with this change?

Thanks :)

Test segment's Go SDK with Rudder

Bring Up Rudder

  1. Go to the dashboard https://app.rudderlabs.com and set up your account. Copy your <workspace_token> from top of the home page.
  2. git clone --single-branch --branch segment-api https://github.com/rudderlabs/rudder-server.git
  3. git submodule init
  4. git submodule update
  5. cd rudder-transformer
  6. git checkout --track origin/segment-api
  7. replace <workspace_token> in build/docker.env with the above token.
  8. docker-compose up

Setup Source/Dest

  1. Login to app.rudderlabs.com
  2. Create one source (Android or iOS) and configure a Google Analytics destination with the tracking ID

Check Segment's Go SDK

Need to fix this
https://github.com/segmentio/analytics-go

Test segment's python SDK with Rudder

Bring Up Rudder

  1. Go to the dashboard https://app.rudderlabs.com and set up your account. Copy your <workspace_token> from top of the home page.
  2. git clone --single-branch --branch segment-api https://github.com/rudderlabs/rudder-server.git
  3. git submodule init
  4. git submodule update
  5. cd rudder-transformer
  6. git checkout --track origin/segment-api
  7. replace <workspace_token> in build/docker.env with the above token.
  8. docker-compose up

Setup Source/Dest

  1. Login to app.rudderlabs.com
  2. Create one source (Android or iOS) and configure a Google Analytics destination with the tracking ID

Check Segment's Python SDK

  1. git clone https://github.com/segmentio/analytics-python.git
  2. In analytics/__init__.py replace "host=None" with "host='http://localhost:8080'"
  3. python simulator.py --type track --event track --writeKey <write_key> --userId abcd --anonymousId abcd
    (replace <write_key> with the correct key from the dashboard)

TimescaleDB support?

Hi, I was wondering if there were plans to support TimescaleDB. TimescaleDB is implemented as a PostgreSQL extension, so it might be a matter of tweaking your PostgreSQL connector to be TimescaleDB-aware. It looks like this is the approach Grafana may have taken.

[Image: UI when adding a PostgreSQL source in Grafana]

Thanks!

Anonymous ID appearing as null when making track call using Analytics.Net code

The anonymous ID appears as null when I make a track call using the Analytics.NET code.

I am using the source code from https://github.com/segmentio/Analytics.NET
If I do not pass this parameter, the SDK creates a new object at runtime whose anonymous ID is null: https://github.com/segmentio/Analytics.NET/blob/master/Analytics/Model/BaseAction.cs#L34
The SDK also does not set the anonymous ID itself.
I am using it on the server side (.NET application).

@Team Please let me know when I can expect this to be resolved.

How to connect rudder-server with aws rds managed database instead of running database on container?

Hi,

I am trying to run the application backed by an AWS RDS managed database. I verified that the AWS Postgres database is accessible from my machine, but when I pass the DB endpoint in the environment variables of rudder-docker.yaml, it doesn't seem to work.

    entrypoint: sh -c '/wait-for aws-rds-postgres.amazonaws.com:5432 -- /rudder-server'
    ports:
      - "8080:8080"
    environment:
      - JOBS_DB_HOST=aws-rds-postgres.amazonaws.com
      - JOBS_DB_USER=rudder
      - JOBS_DB_PORT=5432
      - JOBS_DB_DB_NAME=jobsdb
      - JOBS_DB_PASSWORD=password

Please suggest what I need to do in order to run it in docker or kubernetes?

Amplitude error: Cannot read property 'name' of undefined

Hi, I have a Java source (using rudder-sdk-java) connected to an Amplitude destination, and I'm getting the following error in the destination:

Source ID                   | Attempt No. | Job State | Error Code | Error Response
1fIIb8hISkVlmckcwy54umPI0A0 | 1           | aborted   | 400        | { "error": "Cannot read property 'name' of undefined" }

Create a test that verifies MinIO/S3 destination is working

We are using Ginkgo for integration tests.
Right now, we have different test suites that verify the following:

  1. Events are being sent to router tables after all the required transformations
  2. Uploading of event schema to config-backend is working as expected
  3. Tables are being migrated/deleted/backed up after successful completion.

We need to add a test to see if MinIO destination is working as expected.

Docker setup is broken

I tried to deploy Rudder using this tutorial: https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker
On docker-compose up -d, I got the following error in the backend service:

--
-- wh_schemas
--

CREATE TABLE IF NOT EXISTS wh_schemas (
    id BIGSERIAL PRIMARY KEY,
    wh_upload_id BIGSERIAL,
    source_id VARCHAR(64) NOT NULL,
    namespace VARCHAR(64) NOT NULL,
    destination_id VARCHAR(64) NOT NULL,
    destination_type VARCHAR(64) NOT NULL,
    schema JSONB NOT NULL,
    error TEXT,
    created_at TIMESTAMP NOT NULL);

DROP INDEX IF EXISTS wh_schemas_source_destination_id_index;

CREATE INDEX IF NOT EXISTS wh_schemas_destination_id_namespace_index ON wh_schemas (destination_id, namespace); (details: read tcp 172.21.0.5:52578->172.21.0.4:5432: read: connection reset by peer)
        * driver: bad connection in line 0: SELECT pg_advisory_unlock($1)



goroutine 27 [running]:
github.com/bugsnag/bugsnag-go.AutoNotify(0xc0005a5d60, 0x3, 0x3)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/vendor/github.com/bugsnag/bugsnag-go/bugsnag.go:109 +0x2bc
panic(0x1449880, 0xc0001490c0)
        /root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/rruntime.Go.func1.1(0x1a22ea0, 0xc00023acf0)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:37 +0x33a
panic(0x1449880, 0xc0001490c0)
        /root/.goenv/versions/1.13.8/src/runtime/panic.go:679 +0x1b2
github.com/rudderlabs/rudder-server/warehouse.setupTables(0xc000678000)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1491 +0xfb
github.com/rudderlabs/rudder-server/warehouse.Start()
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/warehouse/warehouse.go:1664 +0x119
main.startWarehouseService(...)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:153
main.main.func5()
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/main.go:352 +0x21
github.com/rudderlabs/rudder-server/rruntime.Go.func1(0x170eba0)
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:40 +0x81
created by github.com/rudderlabs/rudder-server/rruntime.Go
        /codebuild/output/src839591307/src/github.com/rudderlabs/rudder-server/rruntime/goroutine-factory.go:26 +0x3f

Add TODOs Badge to README

Hi there! I wanted to propose adding the following badge to the README to indicate how many TODO comments are in this codebase:

[Badge: TODOs]

The badge links to tickgit.com which is a free service that indexes and displays TODO comments in public github repos. It can help surface latent work and be a way for contributors to find areas of code to improve, that might not be otherwise documented.

The markdown is:

[![TODOs](https://badgen.net/https/api.tickgit.com/badgen/github.com/rudderlabs/rudder-server)](https://www.tickgit.com/browse?repo=github.com/rudderlabs/rudder-server)

Thanks for considering, feel free to close this issue if it's not appropriate or you prefer not to!

(full disclosure, I am the creator/maintainer of tickgit)

Compatibility with CockroachDB

This is a duplicate of the closed issue #163

Nevertheless, I tried to run rudder-server with CockroachDB today but got an error -

ERROR   Rudder server needs postgres version >= 10. Exiting.

I understand from the CockroachDB docs that it is compatible with Postgres v9.5 onwards:
https://www.cockroachlabs.com/docs/v20.1/postgresql-compatibility.html

Does rudder-server use Postgres-specific features like stored procedures, functions, or triggers? (If not, it would be nice to have the scale-out features of CRDB for the backend.)

Rudder as a datasource for Tableau?

Hi Rudder team,

I've got an on-prem Tableau install, and I'm looking for an event processing service that will bucket events sent from my app and then present the bucketed/transformed data to Tableau as a data source. Is Rudder a good fit for this?

Screen event name missing in destinations

Hi, when I send a screen event via the HTTP API, the screen name is missing in both Amplitude and Snowflake destinations.

In Amplitude, the following fields are incorrect:

key                   | rudder     | segment
display_name          | screenview | Viewed <name> Screen
event_type            | screenview | Viewed <name> Screen
event_properties.name | (empty)    | <name>

In Snowflake, the Name column in the Screens table is null.

Client IP in destination is incorrect

Hi, it looks like the client IP address in my destinations (Amplitude and Snowflake) matches the IP of the rudder-server node rather than the actual client.

Tested with the latest version here deployed via rudderstack-helm. As an aside, is there a way to check the version or git hash of rudder-server instance?

It's not an issue.

I have created an Angular service that can be used by anyone who is already using Angulartics2Segment in their Angular 2+ project.
After injecting the rudder JS script in index.html, with this service in place I just needed to rename the injected class from Angulartics2Segment to AngularRudderService and import AngularRudderService everywhere.
It might be helpful if you could create an AngularticsRudder package that can be installed through npm.

import {Injectable} from '@angular/core';

@Injectable({
  providedIn: 'root'
})
export class AngularRudderService {
  constructor() {
  }

  pageTrack(path) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.page(path);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  eventTrack(action, properties) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.track(action, properties);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setUserProperties(properties) {
    try {
      if (window.rudderanalytics) {
        if (properties.userId) {
          window.rudderanalytics.identify(properties.userId, properties);
        } else {
          window.rudderanalytics.identify(properties);
        }
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }

  setAlias(alias) {
    try {
      if (window.rudderanalytics) {
        window.rudderanalytics.alias(alias);
      }
    } catch (e) {
      if (!(e instanceof ReferenceError)) {
        throw e;
      }
    }
  }
}

S3 Replay Docs

I'm not sure how I can replay my S3 backups for new destinations. Is this something that we can create docs for?

Gateway never returns 500

API Requests to rudder-server will only ever return a status code of 200 or 400.

if errorMessage != "" {
	logger.Debug(errorMessage)
	http.Error(w, errorMessage, 400)
} else {
	logger.Debug(respMessage)
	w.Write([]byte(respMessage))
}

This is problematic for at least 2 reasons:

1) Lack of differentiation between 400 and 500 reduces visibility into service operation.

Services should differentiate between client errors and server errors. If the database becomes unavailable, you want the service to return 500 so that your alerting system can page the OPS team.

2) Clients treat 4xx and 5xx differently

Some clients will retry events when they receive a 500 and remove events when they receive a 400.

https://github.com/segmentio/analytics-android/blob/47f7341d81766b1b4a101ef69f491835d11f7532/analytics/src/main/java/com/segment/analytics/SegmentIntegration.java#L386-L394

The service must return 5xx on server error in order to minimize event loss during service outage.

Bugsnag Configuration

I think the Bugsnag API Key should be removed from the code and be externalised.
Should everyone trying the project get their own key?

"High" CPU usage at idle with official Docker setup

Hello, I tried setting up RudderStack with Docker following the official documentation (https://docs.rudderstack.com/get-started/installing-and-setting-up-rudderstack/docker).
The only change I made to the docker-compose file was changing WORKSPACE_TOKEN with the correct value.
The server starts and the events get dispatched correctly, however the CPU usage at idle, with no events whatsoever is about 20-25% on my machine (i7-8550U 3.9GHz). I got similar results on a different machine too. I am running Linux and Docker 19.03.12.
All containers are on the "latest" tag, like in the provided docker-compose file (rudderlabs/rudder-server's hash is b0cf66d1817c and rudderlabs/rudder-transformer's hash is 4bb81602b25f)
Is this normal? If so, is there a way to reduce the CPU usage by tweaking the config file?

Thanks in advance!

How to take a backup on S3

Hi,
I just read that event data is deleted from PostgreSQL after the event is sent to the destination.
I want to back up the events to my own S3 bucket.
Please let me know how to configure that.
