
support's Introduction

Operate First Support

Website: https://www.operate-first.cloud/

GitHub Support Repository: https://github.com/operate-first/support

For any questions, concerns, and/or requests, please use the appropriate Issue Template to open an issue on our GitHub. If no template fits your requirement, then please make a regular GitHub issue here.

End User Support

We have a community slack channel where we post announcements, general information, and more. If you have any questions, feel free to post a message to the #support channel.

Documentation

Operate First documentation can be found here.

Meet Our Clusters

If you would like to learn more about our clusters, click here.

support's People

Contributors

billburnseh, bryanmontalvan, dystewart, fridex, harshad16, hemajv, humairak, larsks, margarethaley, martinpovolny, mightynerderic, oindrillac, rbadagandi, schwesig, tumido, vrutkovs

support's Issues

Latest metrics for Jupyterhub

There seem to be new monitoring metrics supported for Jupyterhub, such as: https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/metrics.py#L38

Some of these new metrics would be useful in monitoring our jupyterhub service such as:

  • RUNNING_SERVERS - the number of user servers currently running
  • TOTAL_USERS - total number of users
  • HUB_STARTUP_DURATION_SECONDS - time taken for hub to start

However, these metrics are supported in the newest Jupyterhub version (1.2.1), whereas the current ODH Jupyterhub version seems to be behind (0.9.4).
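
As a rough illustration of how these values could be read once a newer Jupyterhub is deployed, the sketch below parses Prometheus text-format output and picks out the metrics listed above. The exported metric names (`running_servers`, `total_users`, `hub_startup_duration_seconds_sum`) and the sample payload are assumptions made for illustration; the names a given Jupyterhub version actually exposes should be checked against its `metrics.py`.

```python
# Sample payload in Prometheus text exposition format.
# The metric names and values here are hypothetical.
SAMPLE = """\
# HELP running_servers the number of user servers currently running
# TYPE running_servers gauge
running_servers 4.0
# HELP total_users total number of users
# TYPE total_users gauge
total_users 17.0
hub_startup_duration_seconds_sum 2.5
request_duration_seconds_count{code="200",handler="x"} 12.0
"""


def parse_metrics(text, wanted):
    """Return {metric_name: value} for the wanted metric names.

    Assumes one sample per line with no trailing timestamp.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, raw_value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop any {label="..."} suffix
        if name in wanted:
            values[name] = float(raw_value)
    return values


if __name__ == "__main__":
    wanted = {"running_servers", "total_users", "hub_startup_duration_seconds_sum"}
    print(parse_metrics(SAMPLE, wanted))
```

In practice the text would come from the hub's metrics endpoint rather than a constant; how that endpoint is reached and authenticated on the cluster is not covered here.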

cc @anishasthana @HumairAK

web site http://www.operate-first.cloud/ is down ???

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behaviour
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.
image

[TASK] Request for the Prometheus connection details.

User story
We, thoth-devops, would like to update our gitops related manifest file with correct Prometheus details corresponding to the moc cnv cluster, so that our deployment can send metrics successfully to the Prometheus endpoint and push-gateway endpoint.

Additional context
Project Thoth components require an endpoint to send metrics to and to store them.

Acceptance Criteria

  • Provide Prometheus URL and push-gateway URL.
  • Provide Thanos certs.

Linked epics / issues
Related-To: #16

Ceph storage for hosting publicly available datasets

Is your feature request related to a problem? Please describe.

We currently do not have a place where we can publicly host the datasets required to ensure reproducibility of our data science work. Some smaller datasets are being stored in GitHub repos for the time being, but that is only a temporary and non-ideal solution.

Describe the solution you'd like

A ceph bucket that can be used by the operate first team to host relevant datasets. We should be able to open a PR to request read-write credentials as team members.

There should be a set of publicly available read-only credentials that can be used by anyone to access data for reproducing work in notebooks.

Describe alternatives you've considered
Please see this issue in the ai-coe data science workflow repo where this has been discussed in greater detail.

But in short these are the ideas we have considered:

  • Use a public facing s3 bucket
  • Use git lfs within each repo
  • Host data on kaggle (similar to some thoth datasets)
  • Host data on Backblaze B2

[TASK] Request for the cnv ceph cluster creds

User story
We, thoth-devops, would like to update our gitops related manifest file with correct ceph credentials corresponding to the moc cnv cluster, so that our deployment runs successfully and smoothly in the cnv operate-first cluster.

Additional context
Thoth components require ceph credentials to establish a connection with the ceph component for data storage.

Acceptance Criteria

  • Provide ceph credentials like access-key and access-secret-key

Linked epics / issues
Related-To: #16

Request for kafka on operate-first

Is your feature request related to a problem? Please describe.
The Thoth-station component uses Kafka messaging. It is a required component for the proper functioning of the thoth-station system.
We would like to request a Kafka setup on operate-first so that the component can connect to it via an SSL or SASL connection.

Describe the solution you'd like
Use the amq-stream operator or the strimzi operator to deploy a Kafka instance with 2 brokers and 3 zookeeper nodes.
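
A minimal sketch of what such a request could look like as a manifest, built here as a Python dict so it can be dumped to JSON or YAML. The API version, the listener settings, the ephemeral storage choice, and the names `thoth-kafka` and `opf-kafka` are all assumptions for illustration; the exact schema depends on the operator version installed on the cluster and should be checked against its documentation.

```python
import json


def kafka_manifest(name, namespace, brokers=2, zookeeper_nodes=3):
    """Build a Kafka custom resource as a plain dict."""
    return {
        # Assumed API version; it varies with the operator release.
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "kafka": {
                "replicas": brokers,
                # Assumed: a single TLS-enabled internal listener.
                "listeners": [
                    {"name": "tls", "port": 9093, "type": "internal", "tls": True},
                ],
                # Assumed: ephemeral storage, only suitable for a trial setup.
                "storage": {"type": "ephemeral"},
            },
            "zookeeper": {
                "replicas": zookeeper_nodes,
                "storage": {"type": "ephemeral"},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical resource name and namespace.
    print(json.dumps(kafka_manifest("thoth-kafka", "opf-kafka"), indent=2))
```

SASL authentication, which the request also mentions, would need additional listener configuration that is left out of this sketch.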

Describe alternatives you've considered
Get Kafka via the odh-operator.

rekor as an operate first service

Hello, project rekor is interested in becoming an operate first service to allow argocd deployments to a MOG / GKE cluster. This issue will act as a placeholder and a means to track / reference.

We are working on a k8s operator and plan to explore monitoring integration.

rekor is a golang based project. It provides a web front end (REST API) and has a backend using the project google/trillian. Trillian is already supported to run in a k8s cluster and also has prometheus monitoring files available.

ODH Dashboard shows "Not Installed" for all applications.

Describe the bug
Although the ODH applications are installed and accessible, for some users the dashboard page indicates that they are all unavailable, and the applications show the "application not available" page.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://odh-dashboard-opf-dashboard.apps.cnv.massopen.cloud/

  2. See image below
    image

  3. Go to https://superset-opf-superset.apps.cnv.massopen.cloud/login/

  4. See image below
    image

  5. After clearing the browser cache or using an incognito window, the dashboard shows correctly and applications become available.
    image

Expected behaviour
Should not need to clear cache or use incognito windows to access correct odh dashboard and live applications.

notebook content of an image deployed on JH is incorrect

When deploying a Thoth-built image to OpenShift 4 into JH, the notebooks from the source repository are not present.

This is a placeholder until I figure out where to put the bug report or find a problem on my side.

Describe the bug

While experimenting with: #2 I have successfully built an image.
The image is here: https://quay.io/repository/martin_povolny/colaboratory?tab=history

The image is correctly exposed to JH using an image stream and is available in the drop-down menu when starting a Host in JH.

However in the image I do not see the notebooks:

jh-missing-image-2020-11-10_10-12

When I restart the Host, date-time named directories are being added. When I work in the instance, my stuff is preserved. However the content of the repo that I built from does not appear anywhere.

This is the source repo that I used:

https://github.com/martinpovolny/colaboratory

When I tried running the image locally:
podman run -p 8080:8080 quay.io/martin_povolny/colaboratory:latest start-singleuser.sh --ip="0.0.0.0" --port=8080

Then opened http://127.0.0.1:8080/?token=a.... I got what was expected:

jh-local-2020-11-10_10-16

So the files are present in the image.

To Reproduce

Following the steps in #2 (I am actually trying to write a HOWTO) got me to this situation.

Expected behavior

A running jupyterhub with the correct data from the user's repo is what I expect.

Additional Info

I have tried to delete the PV and PVC and start a new host in JH. It did not help.

FYI: @harshad16, @tumido

MOC JH gives similar names to every selected projects

Is your feature request related to a problem? Please describe.
Currently, we have different projects in MOC JH that we can select from the Spawn screen.

But whichever one I select, I get a generic name with the format notebook-yyyy-mm-dd-hh-mm.
If I am working on multiple projects, it's really difficult to find the project I am interested in. In the following picture I opened four different projects, but all of them have similar names.
image

Describe the solution you'd like
You can give a name such as project_name-yyyy-mm-dd-hh-mm
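
The proposed naming scheme could be sketched as below. The function name `notebook_dir_name` and the example project name are hypothetical; how the spawner actually creates these directories is not shown in this issue.

```python
from datetime import datetime


def notebook_dir_name(project_name, when):
    """Return a directory name of the form project_name-yyyy-mm-dd-hh-mm."""
    return f"{project_name}-{when.strftime('%Y-%m-%d-%H-%M')}"


if __name__ == "__main__":
    # prints "colaboratory-2020-11-10-10-12"
    print(notebook_dir_name("colaboratory", datetime(2020, 11, 10, 10, 12)))
```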

Missing runtimes list in ODH Elyra

As a data scientist,

when I run my Elyra image on ODH and look for available runtimes, I don't find any.

When I create an AI pipeline and try to assign an image to a step in the pipeline, there is a list of default runtimes.

I think this behaviour is a bit confusing. With the upstream Elyra image from https://github.com/elyra-ai/elyra, this behaviour is not present.

On ODH Elyra I expect to have a list of runtimes available.

Data Science Users: How should we be deploying our work to this environment?

Description

@durandom, you mentioned "the DS folks have notebooks and workflows" and also "We should only deploy stuff via argocd or PRs to the gitlab repo"; I'm not sure I understand how to deploy stuff via a PR, should we be getting some of our existing notebooks added by PR? Is there documentation somewhere for how to do this?

Also, on @hemajv 's note, should DS folks be using the services deployed by others? or deploying our own (as we all start to get something running in this environment)? My dashboard shows no projects, so I am only aware that there is jupyterhub instance since Hema announced it via email. Shouldn't we be able to see all the projects on our dashboards?

thanks!

Add thoth-ci files

Similar to the other repositories in this org, we should add the thoth-ci files so we can enable linting / checks on PRs.

Cannot authenticate on JH

Describe the bug
Logging in with Gmail on JH gives an authentication error.

To Reproduce
Steps to reproduce the behavior:

  1. Try logging in with Gmail on JH

Expected behavior
Being logged in to JH after authentication.

Screenshots

{"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.",

Additional context
Add any other context about the problem here.

Create a dashboards directory with README

Add a dashboards folder for placing all the Grafana dashboard JSON files for each monitored application/service. Include a README describing the purpose of these dashboards.

Update Jupyter Hub spawner landing page to reflect GPU availability

Description
Not sure if this is the right place for this question.

We don't have GPUs available yet, but the UI contains the option to request one. If you select "1" GPU or any other number, the pod is prevented from spawning due to insufficient GPUs. Should this option be removed until we have this kind of resource available? My concern is that this could be confusing or off-putting to new users.

Instead of removing the GPU field, could it be made into a drop-down that shows the currently available GPUs? In future, when we do have them, this would also prevent people from requesting a GPU if they are all already allocated elsewhere.
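
One way to picture the drop-down idea is the sketch below, which computes the GPU counts a user could still request. The function `gpu_choices` and its inputs are hypothetical; where the total and allocated counts would come from on the cluster is an open question in this issue.

```python
def gpu_choices(total_gpus, allocated_gpus):
    """Return the GPU counts a user may request right now.

    0 is always offered, so spawning without a GPU stays possible.
    """
    available = max(total_gpus - allocated_gpus, 0)
    return list(range(available + 1))


if __name__ == "__main__":
    print(gpu_choices(0, 0))  # no GPUs in the cluster yet: only 0 is offered
    print(gpu_choices(4, 3))  # one GPU left: 0 or 1
```

With no GPUs in the cluster the only option is 0, so a new user cannot pick a value that blocks the pod from spawning.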

Additional context

image

image

Request to integrate Mesh for Data with ODH

Hello,

IBM Mesh for Data (M4D) is interested in integrating with OpenDataHub. After speaking with @durandom and @tumido, we agree that M4D would be a great extension of features providing data cataloguing and data governance. We hope to specifically integrate with JupyterHub and Elyra in ODH.

This issue will act as a placeholder and means to track / reference.

I have submitted a request to access the MOC CNV cluster here

Request for the cnv cluster ceph details and prometheus details

Description
We would like to request details about ceph and prometheus on the cnv cluster, so that we know how to request the credentials. These details are required for project thoth to be able to create an overlay for the thoth deployment in the cnv cluster.
How and where to request the details for these services?

Additional context
This would enable our application to function on the cnv cluster.

[Docs] Add documentation on how to setup GitHub alertmanager receiver

This follows up on #954, where we configured our Prometheus alerts to be routed to the GitHub alertmanager receiver, which automatically opens an issue in the configured repository for the alerts triggered.

Include documentation highlighting the steps to set up the GitHub alertmanager receiver and configure Prometheus alerts.

Requirements to run AI Pipeline through Elyra on ODH on Op1st (more docs?)

As a data scientist,

I would like to run an AI pipeline using Elyra. A runtime needs to be added to each step, and ODH currently supports Kubeflow.
Some required information is missing for the user.

How do I find out the following?

  • Kubeflow Pipelines API Endpoint
  • Cloud Object Storage Endpoint

Do we get user credentials for Ceph? Running an AI pipeline also requires:

  • Cloud Object Storage Username (required)
  • Cloud Object Storage Password (required)

Also, is there a default bucket assigned to each user?

  • Cloud Object Storage Bucket Name (required)

Reference: https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html#using-elyra-runtimes-user-interfac
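
To make the list of required values concrete, the sketch below checks a runtime configuration for the five items named above and reports which are still missing. The dictionary keys (`api_endpoint`, `cos_endpoint`, and so on) and the example values are assumptions chosen for illustration and are not taken from this issue; the field names Elyra actually uses should be checked in the referenced documentation.

```python
# Assumed key names for the five values asked about in this issue.
REQUIRED_FIELDS = (
    "api_endpoint",   # Kubeflow Pipelines API Endpoint
    "cos_endpoint",   # Cloud Object Storage Endpoint
    "cos_username",   # Cloud Object Storage Username
    "cos_password",   # Cloud Object Storage Password
    "cos_bucket",     # Cloud Object Storage Bucket Name
)


def missing_fields(config):
    """Return the required fields that are absent or blank in config."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Hypothetical, partially filled configuration.
    partial = {"api_endpoint": "https://kfp.example.invalid/pipeline", "cos_bucket": ""}
    print(missing_fields(partial))
```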

User interaction with Jupyterhub

I'm creating this issue to understand and perhaps refine user interaction with JH on Operate first.

Currently, this is the workflow for me if I'm the user:

  1. I read a blog on the website and it takes me to an instance of Jupyter hub
  2. I login using my google account
  3. I select the image corresponding to the blog I read
  4. A container is spawned and I see the notebook directory in my PVC

A few days later, I read another blog on operate first and it takes me to another notebook image.

  1. I login using my google account
  2. I'm no longer shown a spawn screen because the container is already running for my account
  3. I stop my server, start it again and select the new image
  4. Now, I see two notebook directories in my PVC.
  5. I select the directory with the name "notebook-latest-date"

Do we want to refine this workflow?
Some ideas:

  • Instead of naming the directory "notebook-timestamp" we could call them "image_name-timestamp"
  • Do we shut down containers and free up PVC as the user exits? Do we want the user interaction to be temporary or do we want to keep a user profile?

Making this repo a central location for all operate-first docs

Hey guys. I feel as though our documentation is getting a bit disorganized; it's not clear to me what should go in which of the repos: cd, apps, argocd-apps, odh-moc-support, sre, etc.

2 solutions come to mind:

  1. Come up with clear distinctions on what type of documentation goes in which repo.
  2. Rename this repo from odh-moc-support to just support, so it becomes operate-first/support, and move all our docs from the repos above into it. In the remaining repos we keep a README with a brief description of the repo and a link to the appropriate docs in this repo.

Upstream/Downstream:
We also need a way to organize upstream docs vs downstream docs. For example, in operate-first/apps/docs the documentation is getting a bit MOC specific (e.g. how to request kafkatopics for moc kafka). This could be as simple as splitting into subdirectories like docs/upstream and docs/downstream.

What do you guys think? Definitely open to other suggestions/feedback here.

Create a runbook template

Create a runbook template that can be used for outlining/defining the runbook procedures for all the services and include it in the runbooks folder

How to create a Jupyterhub image and deploy it on ODH?

As a data scientist, I want to share my notebooks with others and receive feedback. To make the notebooks more visible, how can I create a Jupyterhub image and deploy it on ODH so that they are available in the JH dropdown?

Is there any documentation regarding this?

Acceptance criteria -

  • Configuration file analysis project available in the drop-down for Jupyterhub images on ODH.

  • Documentation on how to make JH images available on ODH

Missing defaults runtimes in ODH Elyra

As a data scientist,

when I run my Elyra image on ODH and look for available runtimes, I don't find any.

When I create an AI pipeline and try to assign an image to a step in the pipeline, there is a list of default runtimes.

I think this behaviour is a bit confusing. With the upstream Elyra image from https://github.com/elyra-ai/elyra, this behaviour is not present.

On ODH Elyra I expect to have a list of runtimes available.
