nrelabs-curriculum's Introduction

❗ NOTE - this project has been archived. Please see this blog post for more details. ❗

The NRE Labs Curriculum

This repository houses the curriculum for NRE Labs, a site for teaching next-generation network engineering skills in the browser, powered by on-demand interactive virtual environments and compelling real-world scenarios.

If you are interested in contributing to this curriculum - great! We would love to have you. Head on over to the contribution guide to get started.

nrelabs-curriculum's People

Contributors

arsonistgopher, bakenekonote, cloudtoad, dependabot[bot], dgarros, hellt, ipvsean, jameskellynet, jnpr-raylam, jweidley, lara29, mierdin, mwiget, olberger, pklimai, saimkhan92, shahbhoomi, shrutivpawaskar, skondvilkar, smk4664, sudhiram9

nrelabs-curriculum's Issues

copy+paste in terminal

This is a known difficulty with guacamole. Since I'm implementing the terminal myself, I may be able to do this with some basic keystroke captures.

"curl https://google.com" from linux1 terminal hangs

I just tried the NRE Labs lesson "1. Your first API call".

The very first step ("curl https://google.com") just hangs (I waited for 2+ minutes).

After pressing Enter/Return there is no output at all, and I never get my prompt back.

The terminal is in that state right now (as of Wed 17-Oct-2018 9:20am Pacific time).

I will leave the window open for a few hours.

Syringe pod in CrashLoopBackOff state

After running ./anti-up.sh, the syringe pod is stuck in the CrashLoopBackOff state:

lrajan$ kubectl get pods
NAME                                       READY   STATUS             RESTARTS   AGE
antidote-web                               2/2     Running            0          2h
nginx-ingress-controller-b685f5586-7hj25   1/1     Running            0          2h
syringe-74bd67cf64-fh7z9                   0/1     CrashLoopBackOff   33         2h

lrajan$ kubectl logs syringe-74bd67cf64-fh7z9
time="2018-10-22T20:59:21Z" level=error msg="SYRINGE_TIER is a required variable."
time="2018-10-22T20:59:21Z" level=fatal msg="Invalid configuration. Please re-run Syringe with appropriate env variables"

Create the Lab template

We need a lab template, written by lesson contributors and consumed by Syringe, so that contributors don't need to understand the inner workings of Antidote and its components.

Templates will take certain documented node types like:

  • vJunos (a Junos VM in a container)
  • Linux black-box (contributor configured container without a terminal)
  • Linux box with terminal (guacamole session and tab for the node)
  • Jupyter (the lab's notebook)

...this list can of course be extended in the future.

Templates also need to take in a lesson guide (text) that gets converted to HTML in the right pane of the NRE Labs lab step. We'll need to decide what is optional, what is mandatory, and the max count for each type in this first incarnation of the template.

We will need a way to associate Lab templates into Lessons and into Course topics.

The networking piece is one that requires some additional detail. We need to be able to specify either a) the read-only singleton node, or b) some kind of topology to start in Antidote. In the case of a topology, we need to know the nodes and edges to create and the default config for them.
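To make the "nodes and edges" requirement concrete, here is a minimal sketch of what a parsed lab template might look like inside a validator. Everything here is hypothetical: the type strings (`vjunos`, `linux-terminal`, ...), the field names, and the validation rules are illustrations of the two options above, not Syringe's actual template schema (Syringe itself is written in Go).

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the node types listed in this issue.
NODE_TYPES = {"vjunos", "linux-blackbox", "linux-terminal", "jupyter"}


@dataclass
class Node:
    name: str
    type: str
    config: str = ""  # default config to apply at startup (hypothetical field)


@dataclass
class LabTemplate:
    lesson_guide: str            # text rendered to HTML in the right pane
    use_singleton: bool = False  # option a): the read-only singleton node
    nodes: list = field(default_factory=list)  # option b): on-demand topology
    edges: list = field(default_factory=list)  # (node_name, node_name) pairs

    def validate(self):
        """Return a list of problems; an empty list means the template looks usable."""
        problems = []
        if self.use_singleton and (self.nodes or self.edges):
            problems.append("singleton labs should not define their own topology")
        names = {n.name for n in self.nodes}
        for n in self.nodes:
            if n.type not in NODE_TYPES:
                problems.append(f"unknown node type {n.type!r} on {n.name}")
        for a, b in self.edges:
            if a not in names or b not in names:
                problems.append(f"edge {a}-{b} references an undefined node")
        return problems


lab = LabTemplate(
    lesson_guide="Step 1: log into vqfx1",
    nodes=[Node("vqfx1", "vjunos"), Node("linux1", "linux-terminal")],
    edges=[("vqfx1", "linux1")],
)
print(lab.validate())  # []
```

Returning a list of problems, rather than raising on the first one, lets a contributor see every mistake in their template in a single pass.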

Feedback - NAPALM Lesson

  1. Describe what a Jupyter notebook is, and mention that it’s open source as well. For some reason I can see people confusing Juniper and Jupyter just because they start with the same letters.
  2. The lab already starts on the Jupyter notebook tab, so clicking the button doesn’t really do anything which may be confusing to people too.
  3. Describe what NAPALM is as well.
  4. I don’t really know what I’m supposed to do with this lab. Are we meant to just read it and never actually run anything?

Remove static NRE Labs references

This will require a LOT of parameterization, including URLs and FQDNs. There are two main reasons for this:

  • If the goal is truly to let others run this themselves, hard-coded domains won't work unless they own the domain, which they don't.
  • Parameterizing domains also allows us to easily spin up a second test project in parallel. Useful for A/B testing.
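A minimal sketch of the parameterization idea: every URL is derived from one configurable base domain instead of a hard-coded FQDN. The `ANTIDOTE_DOMAIN` variable name and the `example.com` default are placeholders chosen for illustration, not settings that exist in Antidote.

```python
import os

# Placeholder default; a real deployment would set its own domain.
DEFAULT_DOMAIN = "example.com"


def base_domain(env=None):
    """Read the base domain from a (hypothetical) ANTIDOTE_DOMAIN variable."""
    env = os.environ if env is None else env
    return env.get("ANTIDOTE_DOMAIN", DEFAULT_DOMAIN)


def service_url(service, env=None, scheme="https"):
    """Build a URL such as https://labs.<domain> instead of hard-coding the FQDN."""
    return f"{scheme}://{service}.{base_domain(env)}"


# A parallel test project only needs a different environment.
print(service_url("labs", {"ANTIDOTE_DOMAIN": "test.example.net"}))
```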

New Lesson: Unified VRRP view

Look for the last status change across the VRRP group.

Topology: two network devices. Can be a read-only topology with VRRP already set up, or else spun up on-demand.

Labs:

  1. Get VRRP state from two nodes in parallel
  2. Issue pings between devices in same VRRP group
  3. Show interface stats on VRRP interfaces between participants
  4. Show VRRP state / last state transition
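Labs 1 and 4 could be sketched like this: query both nodes concurrently, then pick the most recent state transition across the group. The device data and field names below are invented stand-ins for whatever the real device call (e.g. NAPALM or PyEZ) would return; `get_vrrp_state` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Stand-in data; field names are hypothetical, not a real device API.
FAKE_VRRP = {
    "vqfx1": {"group": 1, "state": "master", "last_change": "2018-10-22T20:58:00"},
    "vqfx2": {"group": 1, "state": "backup", "last_change": "2018-10-22T20:59:30"},
}


def get_vrrp_state(device):
    """Hypothetical fetch; replace with a real call to the device."""
    return device, FAKE_VRRP[device]


def unified_vrrp_view(devices):
    # Query every device concurrently; network I/O dominates, so threads suffice.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        results = dict(pool.map(get_vrrp_state, devices))
    # The most recent transition anywhere in the group.
    latest = max(
        results.items(),
        key=lambda kv: datetime.fromisoformat(kv[1]["last_change"]),
    )
    return results, latest


results, latest = unified_vrrp_view(["vqfx1", "vqfx2"])
print(latest[0], latest[1]["state"])
```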

Enable LLDP

  • Enable LLDP in the new vqfx image by remounting sysfs as RW and setting the group_fwd_mask
  • Enable LLDP on the host bridge using a custom CNI plugin.
  • Configure LLDP in all network device configurations

New Lesson: Get host properties across an inventory of devices

We can use NAPALM to get all the Junos versions across your inventory of devices and sort the list from oldest to newest. We can also get all the NTP servers and assert that they match the intended ones.

Topology: could be a read-only topology but probably not a singleton, maybe the triangle or any other multi-node topology. Can be an on-demand r/w topology also, but not required.

Labs in this lesson:

  1. Run the end-to-end workflow across multi-node
  2. Get the inventory from netbox and iterate over it, printing it out.
  3. For each device, use the “getter” function for the version, sort the devices by version, and print out the devices+versions in sorted order.
  4. Move on to another host property of interest...
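The sort in lab 3 and the NTP assertion above could look roughly like this. The `facts` dictionary is made-up sample data, shaped loosely after the `os_version` field of NAPALM's `get_facts()` with NTP servers merged in for brevity. The version key is an approximation: it compares numeric fields in order and ignores release-type letters (R, X, F), so it is only a rough stand-in for proper Junos version ordering.

```python
import re


def junos_version_key(version):
    """Rough sort key: compare the numeric fields in order, ignoring letters."""
    return [int(n) for n in re.findall(r"\d+", version)]


# Hypothetical inventory output: device -> facts.
facts = {
    "vqfx1": {"os_version": "18.1R1.9", "ntp": ["10.0.0.1"]},
    "vqfx2": {"os_version": "17.4R1.16", "ntp": ["10.0.0.1"]},
    "vqfx3": {"os_version": "18.1R3-S1", "ntp": ["10.0.0.2"]},
}


def sorted_by_version(facts):
    """Device names from oldest to newest Junos version."""
    return sorted(facts, key=lambda d: junos_version_key(facts[d]["os_version"]))


def ntp_mismatches(facts, intended):
    """Devices whose NTP servers differ from the intended set."""
    return [d for d, f in facts.items() if set(f["ntp"]) != set(intended)]


for device in sorted_by_version(facts):
    print(device, facts[device]["os_version"])
print("NTP mismatches:", ntp_mismatches(facts, ["10.0.0.1"]))
```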

New lesson: Check and set host properties across inventory

Change the motd or other host properties on devices.

Topology: netbox + a few devices. Requires an on-demand r/w topology

Labs in this lesson:

  1. Run the end-to-end workflow across multi-node
  2. Get the inventory from netbox
  3. Update the banner with an off-box push
  4. Update the banner with an on-box pull

New Lesson: Path scrubbing between two points

Collect interface stats between two endpoints in search of errors or to assert there are none.

Topology: Two Linux black boxes, three network devices

Labs:

  1. Run end-to-end workflow
  2. Locate one of the endpoints on the network (reuse of prior lab)
  3. Hop by hop from that edge, record interface/queue stats as you find the path to the other endpoint
  4. Repeat in opposite direction
  5. Create input/output stats report
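The report in step 5 might be as simple as combining both directions and flagging any hop that reports errors. `HopStats` and the counter values are hypothetical; in the lesson, the numbers would come from the interface/queue stats gathered in steps 3 and 4.

```python
from dataclasses import dataclass


@dataclass
class HopStats:
    device: str
    interface: str
    in_errors: int
    out_errors: int


def scrub_report(forward, reverse):
    """Combine both directions and list every interface that reports errors."""
    report = []
    for direction, hops in (("forward", forward), ("reverse", reverse)):
        for h in hops:
            if h.in_errors or h.out_errors:
                report.append(
                    f"{direction}: {h.device} {h.interface} "
                    f"in={h.in_errors} out={h.out_errors}"
                )
    return report


forward = [HopStats("vqfx1", "xe-0/0/0", 0, 0), HopStats("vqfx2", "xe-0/0/1", 3, 0)]
reverse = [HopStats("vqfx2", "xe-0/0/0", 0, 0)]
print(scrub_report(forward, reverse) or "path is clean")
```

An empty report doubles as the "assert there are none" case from the lesson description.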

vqfx image changes

  • Use the qemu monitor savevm command to save a snapshot of a fully booted image with the bootstrap config applied, to speed up boot times

  • Modify the vqfx image to use front-panel ports instead of management ports. This is probably a different PCI index; I just haven't had a chance to play with it yet (should ask Tim Mcarthy)

  • Fix the MAC address issue and stop using the multiple-images workaround

Release mechanisms for all components

Build a basic test and release mechanism for all components. Then, switch ALL image references in all repos to a specific Docker image tag instead of latest.

Should also build a dev namespace, and a prod namespace. The former can use latest, so we can continue to see what the latest looks like, and then when we feel it's stable, we can run the release workflow to tag everything and update the prod namespace deployments.

See nre-learning/antidote-web#5 for a start on doing this in JS.

TODOs

  • Create PTR deployments
  • Create Release Scripts
  • Create CHANGELOG files
  • Automate the update of PTR deployments on merges to master
  • Docs updates on all this
  • Would be nice if the web UI showed the commits for each project
  • Need to move everything out of the default namespace

Release Workflow

  • Tag and push all docker images
  • Rotate CHANGELOG files
  • Tag all git repositories
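Part of the release workflow, switching image references from latest to a release tag, could be automated along these lines. This is a sketch, not an existing release script: the image names are hypothetical, and `pin_manifest` only handles plain `image:` lines (not `- image:` list entries or templated values).

```python
def pin_tag(image, release):
    """Pin an untagged or ':latest' image to `release`; leave other tags alone."""
    name, sep, tag = image.rpartition(":")
    # No colon, or a colon that belongs to a registry port (e.g. host:5000/img).
    if not sep or "/" in tag:
        return f"{image}:{release}"
    if tag == "latest":
        return f"{name}:{release}"
    return image


def pin_manifest(text, release):
    """Rewrite plain 'image:' lines in a Kubernetes manifest."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("image:"):
            indent = line[: len(line) - len(stripped)]
            image = stripped[len("image:"):].strip()
            line = f"{indent}image: {pin_tag(image, release)}"
        out.append(line)
    return "\n".join(out)


print(pin_manifest("spec:\n  image: example/syringe:latest", "v0.1.0"))
```

Images already pinned to another tag are deliberately left untouched, so rerunning the script on a tagged manifest is harmless.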

Reviewers Doc

Need a doc for reviewers, covering not only syntax but also the spirit of the project: focusing on the workflow rather than just "automating the network", and being neat with diagrams and examples. It's not only for reviewers; it also lets contributors know what to expect. It should convey that our goal is to make this the first impression for automators, so reviews have to preserve that.

Ingress tightening

Centralize ingresses and firm up security checks. Rate-limiting for sure, and possibly also client-IP matching.
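Ingress controllers such as ingress-nginx already provide rate-limiting settings, so this would probably be configuration rather than code. The sketch below only illustrates the per-client-IP token-bucket idea behind such limits; the class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Per-client bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}  # client IP -> (tokens, last timestamp)

    def allow(self, client_ip):
        now = self.clock()
        tokens, last = self.buckets.get(client_ip, (self.capacity, now))
        # Refill for the time elapsed since this client's last request.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_ip] = (tokens - 1, now)
            return True
        self.buckets[client_ip] = (tokens, now)
        return False


limiter = TokenBucket(rate=5, capacity=10)
print(limiter.allow("203.0.113.7"))
```

A burst of up to `capacity` requests is allowed; after that, a client gets roughly `rate` requests per second.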

Feedback for Fundamentals Lessons - REST APIs, Python, YAML

1.) I can’t seem to scroll up in the terminal window.
2.) You might want to tell people that they can type the commands in, but if they want to they can just run the snippet. Some people prefer to try it manually…it helps to drive home what you’re doing.
3.) It’s possible people won’t know what libraries are. In stage 2, I know you’re just trying to familiarize folks with stuff, but telling them why you might be importing libraries and what they do for you might be helpful. Otherwise it might seem a little overwhelming. Also, terminology like “python data type” might be intimidating if they don’t know what that is or why we need it. I know there is a link to the Tech Library, and you can’t include EVERYTHING, but the less people have to go back and forth, and the more they know why they’re doing a specific thing, the better their experience will be.
4.) Even words like JSON and XML may be meaningless and overwhelming if they are brand new
5.) After running the “import json” command at the end, it runs “print…blah blah blah” automatically. Then, of course, it prints out “There are 28 interfaces in this device.” It might be helpful to put a screenshot in the lesson and tell people what you’re doing, especially since it’s the first lesson. For example, talk about how you’re using the print command to print to the screen, how the % sign is actually a placeholder, and how the number of interfaces gets queried, which is why “28” shows up at the end.
6.) Finally, it’s a bit confusing that the “Go to the next stage in this lesson!” is just grayed out. Obviously it’s the end of the lesson, but I had to go check in the dropdown box. This is a bit nitpicky, but it would be better if you could just have a button that said “The End” or something. I thought the ultimate goal might be to gamify and have people submit what they’ve done, but I know there’s no signing in either…so that would be hard to do. Perhaps when you click “The End” or whatever, it shows a summary and maybe some more reference material? Doesn’t have to be a Juniper Call To Action or anything.
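The walkthrough suggested in item 5 could be built around a small annotated example like this one. The JSON response is made up (28 placeholder interface names), and the lesson's actual code may be written differently.

```python
import json

# Made-up API response body; the lesson gets the real one from the device.
response_text = json.dumps({"interfaces": [{"name": f"xe-0/0/{i}"} for i in range(28)]})

data = json.loads(response_text)   # JSON text -> Python dictionary
count = len(data["interfaces"])    # query how many interfaces came back

# %s is a placeholder; the % operator substitutes the value of count into it.
message = "There are %s interfaces in this device." % count
print(message)  # There are 28 interfaces in this device.
```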

Finish st2 lesson

Cover the basics in a few stages, then go into workflows with things like the NAPALM pack

Lock down VPC

Especially the firewall configuration, which is pretty much wide open at the moment.

New Lesson: Identify rogue devices

Discover rogue devices like Netgear/D-Link devices plugged into your network.

Topology: several hosts, 1 netbox host, 1 network device

Labs:

  1. Retrieve the IEEE OUI listing(s) via an HTTP call. Show all OUIs for Netgear.
  2. Retrieve inventory from Netbox.
  3. Retrieve ARP/MAC tables from devices
  4. Compare and report any Netgear OUIs in tables
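Steps 1, 3 and 4 boil down to matching the first three octets of each MAC address against a vendor's OUIs. This sketch assumes the "XX-XX-XX   (hex)   Vendor" line layout of the IEEE oui.txt listing; the sample text, the OUI values, and the MAC table below are made up for illustration and are not real vendor assignments.

```python
import re

HEX_LINE = re.compile(r"^([0-9A-F]{2}-[0-9A-F]{2}-[0-9A-F]{2})\s+\(hex\)\s+(.+)$", re.MULTILINE)

# Made-up sample in the oui.txt layout; these values are illustrative only.
SAMPLE_OUI_TEXT = "00-00-01   (hex)\t\tEXAMPLE CORP\nAA-BB-CC   (hex)\t\tNETGEAR\n"


def vendor_ouis(oui_text, vendor):
    """Map OUI prefixes ('AABBCC') to vendor names, keeping matches for `vendor`."""
    return {
        m.group(1).replace("-", ""): m.group(2).strip()
        for m in HEX_LINE.finditer(oui_text)
        if vendor.lower() in m.group(2).lower()
    }


def normalize_mac(mac):
    """'aa:bb:cc:dd:ee:ff', 'aabb.ccdd.eeff' or 'AA-BB-...' -> 'AABBCCDDEEFF'."""
    return re.sub(r"[^0-9A-Fa-f]", "", mac).upper()


def rogue_entries(mac_table, ouis):
    """Return (mac, interface) entries whose first three octets match a flagged OUI."""
    return [(mac, intf) for mac, intf in mac_table if normalize_mac(mac)[:6] in ouis]


ouis = vendor_ouis(SAMPLE_OUI_TEXT, "netgear")
table = [("aa:bb:cc:01:02:03", "ge-0/0/1"), ("00:00:01:00:00:01", "ge-0/0/2")]
print(rogue_entries(table, ouis))
```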

dependency map

Build metadata into each lesson indicating whether it depends on an earlier fundamentals lesson, and pass that through to the UI.

Will need Syringe and Antidote-web changes.
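Once each lesson declares its prerequisites, a topological sort gives a valid study order and catches circular dependencies. This is a sketch using Python's `graphlib` (3.9+); the lesson names and the metadata shape are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical lesson metadata: lesson -> lessons it depends on.
LESSON_DEPS = {
    "Intro to REST APIs": [],
    "Intro to Python": [],
    "NAPALM": ["Intro to Python"],
    "JSNAPy": ["Intro to Python", "Intro to REST APIs"],
}


def lesson_order(deps):
    """Order lessons so every prerequisite comes before the lessons that need it."""
    return list(TopologicalSorter(deps).static_order())


def unknown_prereqs(deps):
    """Prerequisites that reference lessons with no metadata of their own."""
    return sorted({p for prereqs in deps.values() for p in prereqs} - set(deps))


try:
    print(lesson_order(LESSON_DEPS))
except CycleError as err:
    print("circular lesson dependency:", err)
```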

Feedback - SIP Phone Troubleshooting Lesson

I think it might be beneficial to show the scripts somewhere. Or even better, teach people about vi (I’m a nano girl, but nano doesn’t work here) and tell them how to read through the scripts in the terminal window. Maybe talk through some of the stuff in the scripts as well, with screenshots pointing out exactly what you're talking about.

New Lesson: Locate an endpoint on the network

Traverse a path in search of an endpoint where you may not even know the IP address. We use finding a SIP phone endpoint as an example in this lesson.

Topology: SIP IP phones (black-box Linux), Asterisk (black-box Linux), two or three nodes in series

Labs in Lesson:

  1. Run the end-to-end workflow
  2. Get SIP IP <#> (query Asterisk)
  3. Iterate over a route / pathfinding
  4. find source (recursion over show IP route to get leaf node, crawl across LLDP neighbors, show IP arp)
  5. Modify output of “show” command to include information about the SIP endpoint on-box
  6. Get edge devices from netbox
  7. Collect connected-phone information from the list of edge devices
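Step 4 could be sketched as a crawl across LLDP neighbors that stops at the first device where the phone's MAC is learned on a non-uplink (access) port. This swaps the recursion described above for an equivalent breadth-first search, and all of the topology data (LLDP neighbors, MAC tables, uplink sets) is invented for illustration.

```python
from collections import deque

# Hypothetical data; in the lesson this comes from LLDP and MAC-table queries.
LLDP = {"vqfx1": ["vqfx2"], "vqfx2": ["vqfx1", "vqfx3"], "vqfx3": ["vqfx2"]}
MAC_TABLE = {
    "vqfx1": {"00:11:22:33:44:55": "xe-0/0/1"},  # learned via uplink
    "vqfx2": {"00:11:22:33:44:55": "xe-0/0/2"},  # learned via uplink
    "vqfx3": {"00:11:22:33:44:55": "ge-0/0/5"},  # the phone's access port
}
UPLINKS = {"vqfx1": {"xe-0/0/1"}, "vqfx2": {"xe-0/0/1", "xe-0/0/2"}, "vqfx3": {"xe-0/0/0"}}


def locate(mac, start):
    """Breadth-first crawl over LLDP neighbors until `mac` appears on a non-uplink port."""
    seen, queue = {start}, deque([start])
    while queue:
        device = queue.popleft()
        port = MAC_TABLE.get(device, {}).get(mac)
        if port and port not in UPLINKS.get(device, set()):
            return device, port
        for neighbor in LLDP.get(device, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return None


print(locate("00:11:22:33:44:55", "vqfx1"))
```

The `seen` set keeps the crawl from bouncing back and forth between neighbors that list each other.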

Create a configuration file for the NRE Labs courses

The web front end should pull the course list from a configuration file, and Syringe should also use this file to load the courses it knows about, so that when it loads labs associated with lessons, it can also check that a known course has been specified.

This way folks aren't contributing/testing new lessons/labs that aren't in pre-existing courses, and we can easily catch when someone tries to add a course.
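A minimal sketch of the check Syringe could run at load time. The JSON format and the course names are assumptions; the issue doesn't specify the file format, and the real Syringe code is Go.

```python
import json

# Hypothetical shared course config read by both antidote-web and Syringe.
COURSES_JSON = '{"courses": ["fundamentals", "tools", "workflows"]}'


def load_courses(text):
    """Parse the course list into a set for fast membership checks."""
    return set(json.loads(text)["courses"])


def check_lessons(lessons, courses):
    """Return lesson names whose course is not in the known course list."""
    return sorted(name for name, course in lessons.items() if course not in courses)


lessons = {"Intro to YAML": "fundamentals", "Robot Framework": "testing"}
print(check_lessons(lessons, load_courses(COURSES_JSON)))
```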

Cleanup the stale running labs

When a user is done with a lab and hasn't come back to it after some time, we need to kill the resources for that lab on the cluster. @Mierdin had the idea to do this with time-series data about the events inside the lab nodes, so we can determine true activity and the lack thereof.

I don't know if we want to let the inactivity time or half-life of the Syringe-injected lab be configurable in the lab template, but that can be considered with this issue or broken off separately if not implemented.
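Assuming the time-series data can be reduced to "last observed activity per lab", the cleanup decision is a simple comparison, with an optional per-lab override standing in for a timeout set in the lab template. The 30-minute default and the data shapes are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_TIMEOUT = timedelta(minutes=30)  # hypothetical default


def stale_labs(last_activity, now, timeouts=None):
    """Return lab IDs whose last observed activity is older than their timeout.

    last_activity: lab ID -> datetime of the most recent event seen in its nodes
    timeouts: optional per-lab overrides, e.g. read from the lab template
    """
    timeouts = timeouts or {}
    return sorted(
        lab for lab, seen in last_activity.items()
        if now - seen > timeouts.get(lab, DEFAULT_TIMEOUT)
    )


now = datetime(2018, 10, 22, 21, 0)
activity = {
    "lab-a": now - timedelta(minutes=45),
    "lab-b": now - timedelta(minutes=5),
    "lab-c": now - timedelta(minutes=45),
}
print(stale_labs(activity, now, {"lab-c": timedelta(hours=1)}))
```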

The scrollbar for the terminal window does not work

The scrollbar for the terminal window (linux1, vqfx1, vqfx2, ...) does not work
I see the slider on the right-hand side, but I cannot drag it.
I could not discover any way to scroll back beyond the currently visible screen.
This is a major issue because most of the output is larger than one screen's worth of data.

Allow multiple Antidote clusters

Today we have FQDNs hard-coded in the codebase, so the project is always deployed assuming the same DNS config and is reachable at the same place. This needs to be parameterized for a few reasons:

  • We need to be able to stage and test new commits and stress the system outside of prod in a test/stage environment
  • Contributors need to be able to test their lessons
  • We'd like to blue-green deploy clusters to roll back in case of catastrophic outages

Fork bridge CNI plugin and add enhancements

  • Are the networks namespace-aware? Does the weave veth pair also need to have ns in the name?

  • DNS and IP addressing for each namespace. Need to document how this is laid out. DNS in the management network has to work, at a minimum. The Kubernetes DNS docs mostly lay this out, but it would be good to recap them and link to them

  • Do you NEED to specify a subnet? Maybe you don't. Not having to specify one would keep the UX better, and would also prevent subnet collisions when people try to use the same subnet.
    Obviously, for non-network devices you want auto-addressing, so you probably need IPAM; but for network devices we're already setting IP addresses, so we just want a dumb bridge.
    We should definitely test two different bridges using the same subnets, and if that doesn't work, figure out how to make them namespaced.

  • Automatically enable features like allowing LLDP, etc
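If testing shows that two bridges can't safely share a subnet, collisions could at least be caught up front. This sketch uses Python's `ipaddress` module; the bridge names and subnets are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(bridges):
    """bridges: bridge name -> CIDR string. Return pairs of bridges whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in bridges.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2) if nets[a].overlaps(nets[b])]


print(overlapping_subnets({
    "br-lesson1": "10.10.0.0/24",
    "br-lesson2": "10.10.0.128/25",
    "br-lesson3": "10.20.0.0/24",
}))
```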

Error handling and display

  • Need to pop-up an error modal in antidote-web
  • Need to make sure error conditions are surfaced properly to the API in syringe
  • There are a lot of endless loops in Syringe. We need to add limits to these and return an exception up to the API when they're exceeded.
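The usual fix for an endless polling loop is a bounded retry that raises once the limit is hit, so the API can report the failure. Syringe is written in Go; this Python sketch only shows the pattern, and `wait_until` and `RetryLimitExceeded` are hypothetical names.

```python
import time


class RetryLimitExceeded(Exception):
    """Raised so the failure can be surfaced to the API instead of looping forever."""


def wait_until(check, attempts=10, delay=1.0, sleep=time.sleep):
    """Poll `check` up to `attempts` times; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise RetryLimitExceeded(f"condition not met after {attempts} attempts")
```

Passing `sleep` in as a parameter keeps the helper testable without real delays.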

Header change: add icons

Let's add GitHub, Slack, and Twitter icons up top in the NRE Labs page header to encourage more eyes on the project and more Slack engagement.

"Success" testing

Longer term, it would be cool to automatically detect, in the background, whether or not the goal of the tutorial has been achieved, especially for lessons that start incomplete and require the user to do something to finish them. Maybe something with JSNAPy or napalm-verify?

New Lesson - Robot Framework

@lara29 and I are planning to contribute a tutorial (primer) on the Robot Framework, with respect to automated network verification. Please let us know if the tutorial outline seems appropriate.

@Mierdin @cloudtoad @jameskellynet

Robot Framework tutorial tentative outline

Chapter 1
  • Introduction
  • Installation
  • Types of libraries/keywords
  • Robot file format/syntax (introduction)
  • Basic example 1 (without vSRX interaction)
Chapter 2
  • Project organization/directory structure
  • Robot file format/syntax (continued)
  • Basic example 2 (with vSRX interaction)
Chapter 3
  • Tags/setup/teardown
  • Detailed example 3 + explanation (with vSRX interaction)

Security and Scale

As in, nobody should be able to run random stuff in our containers. Shut off access to the internet, authenticate as non-root, etc.

  • Stop authenticating as root, especially to utility containers, but also to others
  • Firm up GCE ACL
  • Centralize ingresses and firm up security checks. Rate-limiting for sure, and possibly also client-IP matching.
  • Convert antidote-web to a deployment with multiple replicas.
  • Create NetworkPolicy for weave to restrict pod internet access. Ideally you could secure all pod networks too, like bridges, etc.

Finish Intro to git

Need to add a GitLab endpoint and add it as an iframe resource. Walk folks through pull requests.

Feedback - JSNAPY Lesson

  1. Can you make the JSNAPy hyperlink in the first paragraph automatically open a new tab?
  2. Also, you might want to explicitly tell people to read the README for more information, because they might not be familiar with GitHub or how projects are set up.
  3. I think you mean “come up to speed” in the last sentence of the first paragraph
  4. This should be apparent, but better to be explicit. You tell people to run things on the Ubuntu host, but you might also specify the “linux1” tab.
  5. I imagine they’ll know what cd and cat mean, but will they know what a .yaml file is? There should definitely be an explanation of that, and why/how it’s showing us information on our vQFXs.
  6. I’m not sure a review is actually needed in part 2, as there are only two parts and the first part only took a couple minutes. It’s actually just a little confusing…am I supposed to run it again? Why is that there, I just ran it (at least that’s what I thought)?
  7. Perhaps in part 2 we can take some time to explain more about how JSNAPy works, maybe even help them set up yaml files for a really simple network….or direct them to open the yaml files and do a bit of a review with screenshots.
  8. When you go back to running commands in part two you need to explicitly say where they should be run. (show bgp summary on vqfx 1 and jsnapy command on linux1). For the jsnapy command I think you should also tell them explicitly which folder to go back into.

More generic endpoint for iframes

No real need to keep things specific to Jupyter. The existing NOTEBOOK endpoint in Syringe needs to be made more generic, supporting a protocol and URI.
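A generic iframe endpoint only needs a few more fields than a notebook endpoint: protocol and URI alongside host and port. This dataclass is a hypothetical illustration of that shape, not Syringe's actual endpoint definition.

```python
from dataclasses import dataclass


@dataclass
class IframeEndpoint:
    """Hypothetical generalization of the NOTEBOOK endpoint."""
    host: str
    port: int
    protocol: str = "http"
    uri: str = "/"

    def url(self):
        """Assemble the URL the front end would load into the iframe."""
        path = self.uri if self.uri.startswith("/") else "/" + self.uri
        return f"{self.protocol}://{self.host}:{self.port}{path}"


print(IframeEndpoint("jupyter", 8888).url())
print(IframeEndpoint("gitlab", 443, protocol="https", uri="explore").url())
```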
