
immersive-go-course's Introduction

Let's go!

An immersive, introductory course to backend software engineering using Go.

This course is split into two strands: workbooks to read, and projects to build. In the future, there will also be troubleshooting exercises to do, and potentially product-focused work.

You can view all of the projects in the projects section, and all of the reading in the primers section. We recommend working through both in parallel.

Don't worry about all of the details in the primers - they're intended as introductions to topics that you may want to refer back to as you run into specific issues. Writing code and solving the problems that these topics address will help you understand the primers a lot better.

Requirements

Before you start this course, make sure you've read and followed all of the instructions in the prep document. This will get you set up and explain how to work through the projects.

Remember: you can always Google or ask for help if you get stuck.

Contributing

We'd love your help to improve this course. See CONTRIBUTING.md for a guide on how to get involved.

Feel free to ask for help in the CYF Slack!

immersive-go-course's People

Contributors

bandya2003, bazmurphy, chettriyuvraj, dependabot[bot], illicitonion, mbarrin, nahratzah, radha13, sallymcgrath, sre-is-laura, tgvashworth


immersive-go-course's Issues

Add some reference material on golang Context

Both of our Cohort 2 trainees have raised existential questions around Context. We may want to find some useful references or write up some material about this.

We may even want to make an "implement WithTimeout using WithValue" exercise to show that there's no magic here, it's really just a fancy key-value store.

(I've also gone on a tangent around persistent data structures when talking through this, unclear whether that's actually useful/worthwhile :))

Suggestion: PR templates

It's important to embed good commit discipline in our trainees. Can we possibly set some guidelines around

  • small commits
  • clear explanations in and titling of PRs
  • asking for code review
  • making changes

Some example templates in this repo might be useful. Would also be amazing if PRs that don't meet defined standards are automatically rejected (with explanation).

Typo in the Sentence

Description

I found a typo in the statement in the document #2 of the distributed-software-systems-architecture

https://github.com/CodeYourFuture/immersive-go-course/blob/main/primers/distributed-software-systems-architecture/state.md#cache-invalidation-cache-invalidation

Expected Behavior

There are three hard things in computer science: cache invalidation, naming things, and off-by-one errors.

The statement should say three instead of two.

Actual Behavior

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

The statement says two instead of three.

Set expectations better around paper reading

  1. We should call out in the sprint workbook / distributed systems primer that there's a "read some papers and book time to discuss them" element which may take disproportionate time, and needs planning
  2. We should call out how many papers we expect the fellows to read - sprint 2 currently has 5 pretty beefy papers, and this is probably also our fellows' first experience of reading academic papers, so we should set expectations about how many of them we expect them to read (and in how much depth)

Repository needs a contributing guide

We need a CONTRIBUTING.md file that covers:

  • Project structure (implementation branch → IMPLEMENTATION.md; README.md on main)
  • How to add and update a project
  • Preferences on how we teach
  • Code of conduct

Other?

Add a project around parsing text files

It's useful to be comfortable reading and parsing files in assorted formats. In particular, parsing a custom hand-rolled format - rather than just passing input to an existing parser - would be useful.

Let's put together a project which exercises this.

Maybe: Read several files which contain game score data (repeated name + score pairs) in different formats, aggregate them together, and output some aggregate statistics (highest score, most-games-played player, ...).

Let's parse some JSON, YAML, CSV, and a custom hand-rolled format of some kind (maybe something that looks like proto textformat).
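As a sketch of what the hand-rolled-format part of the project could look like - the line format, function names, and aggregation here are all invented for illustration:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Score is one parsed record, regardless of which input format it came from.
type Score struct {
	Name  string
	Score int
}

// parseCustom parses a hypothetical hand-rolled format: one "name score"
// pair per line, with "#" starting a comment line. This is the kind of
// parser trainees would write by hand rather than delegating to a library.
func parseCustom(input string) ([]Score, error) {
	var out []Score
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("bad score in %q: %v", line, err)
		}
		out = append(out, Score{Name: fields[0], Score: n})
	}
	return out, sc.Err()
}

// highest is one example aggregate: the record with the largest score.
func highest(scores []Score) Score {
	best := scores[0]
	for _, s := range scores[1:] {
		if s.Score > best.Score {
			best = s
		}
	}
	return best
}

func main() {
	data := "# league scores\nalice 10\nbob 7\nalice 3\n"
	scores, err := parseCustom(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(highest(scores).Name) // alice
}
```

The JSON/YAML/CSV variants would produce the same `[]Score`, so the aggregation code is shared across formats.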

Add some further Kubernetes to curriculum

There's currently only one small exercise using HPA on minikube; more Kubernetes content would be helpful.
Perhaps this could include writing a controller - modifying an existing controller could also cover the "more modification of existing code" goal.

Scrolling is really slow and jerky/jittery in Chrome on macOS

On any page of systems.codeyourfuture.io in Chrome on macOS, scrolling is really slow and jittery.

See this video of me trying to scroll reasonably smoothly and quickly:

scrolling-chrome.mov

Safari seems much smoother:

scrolling-safari.mov

System details:
Chrome Version 120.0.6099.234 (Official Build) (arm64)
macOS 13.5.2 (22G91)
2022 MacBook Air M2
Incognito tab with no extensions active

Prompt specific consideration of trade-offs

Ideally our fellows should be able to fluently answer questions, e.g. "what trade-offs are involved with increasing this timeout" - "more likely to succeed in face of network disruption, but slower throughput", or "what trade-offs are involved with setting this cache TTL longer".

We aren't currently setting up very clear frameworks of thought around this.

Rework curriculum to use Envoy LB

Currently some projects use nginx and the free version has some limitations.
Also writing an Envoy plugin would be a super project.

Write up example of a protobuf migration

In motivating why protobuf (aside from efficient binary encoding), one of the big ideas is around API evolution, and simplifying coordinating rollouts.

An example I've been using to explain this is the example of migrating a field. For instance with this protobuf:

message PersonResponse {
    string name = 1;
}

to this protobuf:

message PersonResponse {
    string full_name = 1;
    string display_name = 2;
}

and maybe even then to this one:

message PersonResponse {
    reserved 1; // Used to be full_name
    string display_name = 2;
}

(an alternate I've used is making a field repeated)

and with three services (a client, a server, and a proxy in the middle), having to think about things like the order in which these changes are deployed. Compare this to a JSON world, where these field renames (and possibly not preserving unknown fields when round-tripping in the proxy) introduce a lot more deployment coordination requirements.

We should maybe write up such an example, showing all of the ways things can go wrong if you're not coordinating these rollouts well (including e.g. needing to coordinate rollbacks!), and showing how protobuf (while still requiring thought, as well as following best practices) reduces the space of things that can go wrong.

Add a mandatory intro to tracing to the end of kafka-cron

kafka-cron introduces prometheus metrics in a really useful way.

The following project, raft, has a lot of new concepts and design space for the fellows.

There are some simple, meaningful tracing points we could add to kafka cron to introduce them to the basic concepts of tracing, so there are fewer new concepts for them to learn all at once in raft; I think we should consider doing so.

This could fall into sprint 5 (as an "intro to tracing"), rather than sprint 4 (as a "rest of kafka-cron"), if we wanted.

Inconsistent use of we vs you

The project readmes make inconsistent use of "we" and "you":

  • "In this project you're going to..."
  • "In your command line/terminal, make sure your..."
  • "We've been using single-quotes..."
  • "In this exercise, we chose to make some of our requests fail fast..."

We should make this consistent: always use "we".

We should also always use active voice ("We are going to build a server..." vs "A server will be built...")

This issue applies to existing README.md files on main: fixing impl branches is out of scope.

One exception will be if the trainee has to figure it out for themselves. In this case it is OK: "This will be up to you to figure out!"

Demos/Presentation skills: Prep more for considering different audiences

In iteration 1 of the course, all of the demos were being given to fairly similar, technically minded folks with high context of the projects being demo'd.

Considering different audiences, their pre-existing knowledge, and what they're going to find interesting/engaging from a demo is an important skill to develop.

We should help our trainees practice this.

Introduce a pre-cobra CLI project

I think we should add a no-cobra no-dependencies project which exercises:

  1. stdout vs stderr
  2. Reporting errors to the user (i.e. having a main wrap an error-returning function, separating the concerns of user output vs functionality)
  3. Intro to testing CLI app components
  4. Implementing an interface on a novel struct

Possibly this could take the form of two exercises along the lines of:

  1. Write all args comma-separated to a file
  2. Implement your own version of bytes.Buffer (or a slightly cut-down API thereof)

Revise Project Learning Objectives

Overview

This course currently has a few projects that have learning objectives defined. However, they can be improved. This ticket will list the changes we want to make and where we want to make them.

Marking tasks as Core and Stretch

Currently it is not clear when a project is done, versus what are stretch goals.

We should define each task as being core or stretch, so that those doing it know when they can consider the work done, as opposed to pushing themselves further.

Extract core objectives as success criteria

Once we have marked what is core versus what is stretch, we should define the success criteria based on the core objectives. This will provide people with a good overview of what they need to learn from the project, and reinforce the stretch goals as being not required to take away the key learnings.

Projects

immersive-go-course

Develop terraform project

Build on the ideas from docker-cloud to terraform the infrastructure required.

Learning objectives:

  • What is IaC and why do we use it?
  • How does IaC fit with CI/CD?
  • What is terraform & how do we use it?
  • How do we run terraform within a CI/CD process?
  • How do we deploy one or more servers using terraform?

Introduce preparing a post-incident review as part of troubleshooting

One thing we encourage our trainees to do is write up a log of what they did in their troubleshooting exercises.

We should maybe frame this as preparing for a post-incident review - a more realistic deliverable that transfers into work - to give it a bit more structure and motivation.

Develop simple batch-processing project

Image processing. Consider using channels to build a concurrent architecture.

Learning objectives:

  • What is batch processing and how does it differ from building servers?
  • How do we build resilience into batch processing software?
  • How do we use Go to run existing software to complete tasks?
  • What is cloud storage technology and how do we read from it & upload data to it?
  • How do we deploy batch processing tasks in the cloud?

AWS Lambda might give us a better model for this, but we'll build up to that.

The task:

  • Run: go run ./cmd/process --file images.csv
  • Run within Docker
  • Download each image in the file, make it monochrome, and upload it to S3
  • Log what is done
  • Don't process the same image twice
  • Support reading from S3: go run ./cmd/process --file-remote [s3 url]
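The channel-based concurrent architecture could be sketched as a worker pool. Everything here is illustrative (the names and the stub `process` function are invented); in the real project each worker would download an image, convert it to monochrome, and upload the result:

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans the input URLs out to nWorkers goroutines over a jobs
// channel, and collects the processed results from a results channel.
func processAll(urls []string, nWorkers int, process func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string)

	// Start a fixed pool of workers draining the jobs channel.
	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- process(u)
			}
		}()
	}

	// Feed jobs, then close the channel so workers terminate.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() { wg.Wait(); close(results) }()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	urls := []string{"a.png", "b.png", "c.png"}
	out := processAll(urls, 2, func(u string) string { return "mono-" + u })
	fmt.Println(len(out)) // 3
}
```

Note that results arrive in nondeterministic order; deduplication ("don't process the same image twice") could be a map check before sending to the jobs channel.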

Libraries to use:

Develop "buggy app" project

Debug and fix a buggy application.

Learning objectives:

  • How can we quickly read, understand and fix existing application code?
  • How do we QA running code by thinking about security, edge cases and performance?
  • How do we ensure safe data access with authentication & authorisation?
  • What are some common architectures for services in tech companies?

The following may be covered by previous projects:

  • How do we run multiple services and dependencies locally?
  • How can services interact beyond HTTP?

Plan

This will be a simple three-service application: API service, auth service and database.

The API service will pull in a client module for the auth service, and communicate with it over gRPC.

Architecture

               ┌───────────────────────────────────────┐      ┌─────────────────┐
               │              API Service              │      │       DB        │
               │                                       │      │                 │
               │ ┌────────────┐           ┌─────────┐  │      │                 │
     ┌────┐    │ │            │           │         │  │      │ ┌─────────────┐ │
     │HTTP│    │ │            │           │         │  │      │ │             │ │
─────┴────┴────┼▶│    Auth    │──────────▶│  Notes  │──┼──────┼▶│    Notes    │ │
               │ │            │           │         │  │      │ │             │ │
               │ │            │           │         │  │      │ └─────────────┘ │
               │ └────────────┘           └─────────┘  │      │                 │
               │        ▲                              │      │                 │
               └────────┼──────────────────────────────┘      │                 │
                        ├────┐                                │                 │
                        │gRPC│                                │                 │
                        ├────┘                                │                 │
                        ▼                                     │                 │
                ┌──────────────┐                              │ ┌─────────────┐ │
                │              │                              │ │             │ │
                │ Auth Service │──────────────────────────────┼▶│    Users    │ │
                │              │                              │ │             │ │
                └──────────────┘                              │ └─────────────┘ │
                                                              └─────────────────┘
  1. An HTTP request hits the API service with simple HTTP auth, and goes through the auth layer first
  2. Auth client calls Auth Service over gRPC, which verifies credentials against Users table
  3. Once validated, auth client allows request to continue to the Notes module, which returns the data

Possibly there would be a simple frontend that interacts with the API.

API

  • GET /1/my/notes -- Get all notes for the authenticated user
  • GET /1/my/notes/:id -- Get a specific note for the authenticated user

The Notes model will return a "tags" field derived from the content of the note, by looking for #hashtag patterns.
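A sketch of what that tag extraction might look like; the exact regex isn't specified by the project, and `#(\w+)` is just one reasonable choice:

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern matches "#" followed by word characters only, so a tag
// ends at the first space or punctuation character.
var tagPattern = regexp.MustCompile(`#(\w+)`)

// extractTags returns every hashtag found in a note's content.
func extractTags(content string) []string {
	var tags []string
	for _, m := range tagPattern.FindAllStringSubmatch(content, -1) {
		tags = append(tags, m[1]) // m[1] is the capture group without "#"
	}
	return tags
}

func main() {
	fmt.Println(extractTags("Buy milk #errands and call Bob #family #urgent!"))
	// [errands family urgent]
}
```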

DB

The database will be Postgres.

User:

  • id: primary key
  • status: active | inactive
  • password: bcrypt
  • created: timestamp

Note:

  • id: primary key
  • owner: fkey into User
  • content: text
  • created: timestamp

File structure

This will follow a mono-repo structure, with api and auth at the root.

The auth package will expose a client that the api will depend on.

The services will be coordinated locally with docker-compose.

Challenge

This application will contain some bugs, as follows:

⚠️ Don't read this if you are working through the project!

Bugs

  1. The Notes model will allow access to any note regardless of the authenticated user
  2. The Notes model will query all Notes, and then filter them by ID on the server-side
  3. The Notes DB tables will be missing an index on owner
  4. The Auth Client will not check if the user is active or inactive
  5. The Auth Client will cache authentication results in-memory with no TTL
  6. The Notes tags implementation will have a buggy regex that is too eager (#this will be a tag up to punctuation)

The instructions will be that there are at least N bugs, and to use the application to find and fix them.

TODO

  • ... some stuff that came before this TODO list ...
  • Connect the auth service to the DB in a testable way & build the auth logic
  • Auth client talks to auth service to check verification
  • Verify the authentication works end-to-end
  • Add schema for notes & build the notes API layer
  • Add auth client caching features
  • Add "tags" field with the content of the note, by looking for #hashtag.
  • Switch status to string from int
  • Document DB migrations and link to some resources about them

Buggy app: Document expected outputs

I'd suggest the minimum is: a high-quality bug report per issue (as you would file for an open source project), sufficient for the project owners to understand the issue and its impact, and to fix it without further questions.

As an extension, putting together PRs fixing the issues.

We should work out where we want these. Maybe as issue reports on the fellow's fork? Or a PR to their fork with the issues?

Develop gRPC project

Two servers communicating over gRPC.

Consider introducing simple IaC (Terraform) for deployment.

server-database doesn't error or recover from a closed connection

Steps to repro:

  • git checkout impl/server-database
  • Set up the database according to the README instructions (in particular, create the go-server-database database)
  • DATABASE_URL='postgres://localhost:5432/go-server-database' go run .
  • curl "http://localhost:8080/images.json" -iL
  • turn off the database
  • curl "http://localhost:8080/images.json" -iL

Expected:

  • An error or
  • An automatically reopened connection
  • Error message in the logs

Actual:

  • null HTTP response

Add more gRPC practice projects

Our fellows should have written so many gRPC servers and clients that they can spin up a new one from scratch in 15 minutes.

Right now we only have one gRPC project, and it lacks:

  1. Writing the proto definitions from scratch
  2. Hosting multiple services (both on the same port, and on separate ports)
  3. Single binaries acting as both clients and servers

We should bulk out the gRPC experience a bit with some repetition so that by the time they're implementing raft, this stuff is second nature.
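To illustrate point 1, writing proto definitions from scratch might mean producing something like the following; all package, service, and message names here are invented for illustration, echoing the PersonResponse shape from the migration issue above:

```protobuf
syntax = "proto3";

package people.v1;

option go_package = "example.com/people/gen;peoplepb";

// A small service definition of the kind trainees would write from
// scratch: one unary RPC and one server-streaming RPC.
service PersonService {
  rpc GetPerson(GetPersonRequest) returns (PersonResponse);
  rpc ListPeople(ListPeopleRequest) returns (stream PersonResponse);
}

message GetPersonRequest {
  string id = 1;
}

message ListPeopleRequest {
  int32 page_size = 1;
}

message PersonResponse {
  reserved 1; // used to be full_name
  string display_name = 2;
}
```

Hosting two such services on one port vs separate ports, and a binary that serves one service while acting as a client of another, would then be natural follow-on exercises.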

Find the largest file on the filesystem

Description

An exercise for the candidates to find the largest files on the filesystem on Linux.

Steps to setup

Create a big file. There are many options; truncate and dd are the most common, but not the only ones.

dd if=/dev/zero of=/path/to/largefile bs=1M count=10240
(count is a number of blocks, so bs=1M count=10240 gives 10G; the original bs=4k count=10G would request 10G blocks of 4k each)

OR

truncate -s 10G /path/to/largefile
(-s for size; this creates a sparse file, so it is instant)

Find the largest file

The simplest tool for this is du. For example, du -ah / | sort -rh | head lists the largest entries first (sort -h understands human-readable sizes such as 10G, which sort -n would mis-order).

One can also use ls -lshR to search recursively.
(R for recursive, l for long list format, s for size, h for human readable)

Note: I didn't use df -h to find the fullest partition first, because a partition might contain multiple medium-sized files adding up to 11G - that doesn't mean it holds the largest single 10G file.

Intro transaction concept

We talk about txns but don't explain them

In the first case, your primary datastore can handle the increased read load and the only consequence for your application will be an increase in duration of requests served. Your application will also consume additional system resources: it will hold more open transactions to your primary datastore than usual, and it will have more requests in flight than usual. Over time your cache service fills with a useful working set of data and everything returns to normal. Some systems offer a feature called _[cache warmup](https://github.com/facebook/mcrouter/wiki/Cold-cache-warm-up-setup)_ to speed this process up.
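A toy in-memory sketch could illustrate the core property the primer leans on (atomicity: a commit applies all of a transaction's writes, and an uncommitted transaction applies none). This is purely illustrative; real code would use database/sql's BeginTx, Commit, and Rollback against Postgres:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a toy key-value datastore used only to illustrate
// transaction semantics.
type Store struct {
	data map[string]int
}

// Tx buffers writes until Commit; abandoning the Tx discards them,
// which stands in for Rollback.
type Tx struct {
	store   *Store
	pending map[string]int
}

func NewStore() *Store       { return &Store{data: map[string]int{}} }
func (s *Store) Begin() *Tx  { return &Tx{store: s, pending: map[string]int{}} }
func (s *Store) Get(k string) int { return s.data[k] }

func (t *Tx) Set(k string, v int) { t.pending[k] = v }

// Get reads the transaction's own pending writes first, then the store.
func (t *Tx) Get(k string) int {
	if v, ok := t.pending[k]; ok {
		return v
	}
	return t.store.data[k]
}

// Commit applies all buffered writes at once: readers never observe a
// half-applied transfer.
func (t *Tx) Commit() {
	for k, v := range t.pending {
		t.store.data[k] = v
	}
}

// transfer moves amount between accounts atomically, abandoning the
// transaction (no Commit) if funds are insufficient.
func transfer(s *Store, from, to string, amount int) error {
	tx := s.Begin()
	if tx.Get(from) < amount {
		return errors.New("insufficient funds") // pending writes discarded
	}
	tx.Set(from, tx.Get(from)-amount)
	tx.Set(to, tx.Get(to)+amount)
	tx.Commit()
	return nil
}

func main() {
	s := NewStore()
	tx := s.Begin()
	tx.Set("alice", 100)
	tx.Commit()

	fmt.Println(transfer(s, "alice", "bob", 30), s.Get("alice"), s.Get("bob"))
	fmt.Println(transfer(s, "alice", "bob", 1000))
}
```

The "open transactions" in the quoted paragraph are exactly these in-flight, not-yet-committed units of work; holding more of them open consumes connections and memory on the primary datastore.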

Explain how to look up standard library documentation

It's not obvious how to look up standard library documentation. There should be an explainer in the prep section, expanding on what is currently there: "You'll also need to navigate packages and documentation."

A few Google incantations would be useful — e.g. "golang net/http" — and which websites to focus on.

Rework the RAFT project to be more incremental

It's a bit much as it stands.
I can see two ways to make it more reasonable:

  1. Provide a starter with all the mechanics of RPCs etc. defined, but some methods left to be filled in.
  2. Provide a full implementation but without the OpenTelemetry tracing element, and include some bug or bugs that can be tracked down with tracing.
