m88i / nexus-operator
Sonatype Nexus OSS Kubernetes Operator based on the Operator SDK
Home Page: http://operatorhub.io/operator/nexus-operator-m88i
License: Apache License 2.0
We already have the OLM verify step; we are still missing the test and install steps.
See:
https://github.com/operator-framework/community-operators/blob/master/docs/using-scripts.md
Ingress comparison is not working as expected due to two issues:
1. We use k8s.io/api/extensions/v1beta1.Ingress instead of k8s.io/api/networking/v1beta1.Ingress.
2. The comparator is not using custom comparison functions, which leads to changes in the Ingress (which doesn't have a default comparator) being left unaddressed. We could wrap the comparator with a builder that points to our own comparison functions, as done in the Kogito Operator.
This also increases our control over this part of the reconcile loop, which can prove to be useful in the long run.
OperatorHub needs CSV (ClusterServiceVersion) files and other metadata files that can be generated via the operator-sdk CLI. Those files should be present in the project so we can send a PR to the OLM repo and publish the operator in the OperatorHub.io catalog.
In the status page, the Use Red Hat Image attribute is displaying true even though the property is set to false.
Follow-up for #58.
Since we now have a TOC, it would be nice to automate its generation in the make vet command. Adding markers like <!-- begin toc --> and <!-- end toc --> would ease the automation.
The TOC is currently generated by https://github.com/ekalinin/github-markdown-toc
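A minimal sketch of what that automation could look like (the marker names above are a suggestion; update_toc is a hypothetical helper, and gh-md-toc is the generator linked above):

```shell
#!/bin/sh
# update_toc FILE TOC_TEXT: replaces everything between the
# <!-- begin toc --> and <!-- end toc --> markers of FILE with TOC_TEXT.
update_toc() {
  awk -v toc="$2" '
    /<!-- begin toc -->/ { print; print toc; skip=1; next }
    /<!-- end toc -->/   { skip=0 }
    !skip
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# A make vet step could then run something like:
#   update_toc README.md "$(gh-md-toc README.md)"
```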
Is your feature request related to a problem? Please describe.
Today the health check just makes an HTTP request to the default port, whereas the Nexus server offers a standard health check endpoint at /service/rest/v1/status.
Describe the solution you'd like
The Nexus deployment's probes should check the standard health endpoint instead of only the default port.
Describe alternatives you've considered
None
Additional context
See the documentation: https://help.sonatype.com/repomanager3/rest-and-integration-api/status-api#StatusAPI-Status
Would you be able to assist in testing this feature if implemented?
Yes
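For illustration, the Deployment's probes could then look something like this (the port and timing values are placeholders, not the operator's current defaults):

```yaml
livenessProbe:
  httpGet:
    path: /service/rest/v1/status
    port: 8081
  initialDelaySeconds: 240   # Nexus takes a while to boot; value is illustrative
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /service/rest/v1/status
    port: 8081
  initialDelaySeconds: 240
  periodSeconds: 10
```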
Issue to track our capability level. We should be able to do minor upgrades of the installed Nexus server. This will probably impact the way we handle the application image, since we use the latest tag by default, which is not a good practice. Today, if the user provides an image for the Nexus server, the operator will stick with it unless the user manually changes it later in the CR.
Ideally, users should be able to define the major Nexus version and the operator should upgrade minor versions automatically. We will reach another maturity level with this approach. See: https://sdk.operatorframework.io/docs/operator-capabilities/#level-2---seamless-upgrades
On OpenShift environments, a Route will be created to expose the Nexus service. Kubernetes environments would require a custom Ingress controller; exposure on Kubernetes will be tackled in future versions.
We could adopt the same pattern used in #52 to simplify resource life cycle management. We could introduce an interface along the lines of:
type ResourceManager interface {
GetRequiredResources() (map[reflect.Type][]resource.KubernetesResource, error)
GetDeployedResources() (map[reflect.Type][]resource.KubernetesResource, error)
GetComparator(t reflect.Type) func(deployed resource.KubernetesResource, requested resource.KubernetesResource) bool
}
Objects that implement this interface would be responsible for managing the life cycle of a domain of resources.
It could also simplify the logic contained in pkg/controller/nexus/resource/resources.go by storing a set of ResourceManager objects which could be iterated over. In pseudo-code, for example:
GetDeployedResources() {
for manager in ResourceManagers {
deployedResources += manager.GetDeployedResources()
}
return deployedResources
}
The package naming is not in sync with the OLM specification: https://operator-sdk.netlify.app/docs/olm-integration/user-guide/#creating-a-bundle
When attempting to deploy to an OCP 3.x cluster the following error pops up:
Error creating: pods "nexus3-695b67564f-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{200}: 200 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 200: must be in the ranges: [1000160000, 1000169999]]
As Nexus must run using this UID, the cluster administrator needs to create an SCC to work around this. It would be nice if we could supply this SCC and have it documented.
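A rough sketch of such an SCC (the name, the SELinux strategy, and the bound service account are illustrative placeholders; only the UID 200 requirement comes from the error above):

```yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: nexus-scc              # illustrative name
allowPrivilegedContainer: false
runAsUser:
  type: MustRunAs
  uid: 200                     # the UID Nexus requires
fsGroup:
  type: MustRunAs
  ranges:
    - min: 200
      max: 200
seLinuxContext:
  type: RunAsAny               # placeholder strategy
users:
  - system:serviceaccount:<namespace>:<nexus-service-account>
```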
It would be nice to have e2e tests for the multiple networking scenarios we support.
The Nexus Operator is in an unstable state due to old Kube API access and needs to be updated ASAP.
https://github.com/operator-framework/operator-sdk/releases/tag/v0.17.0
Is your feature request related to a problem? Please describe.
Today, when the user clones the project and just runs make install, the Nexus deployment fails, as the image with tag 0.3.0 doesn't exist on Quay. The user can edit the deployment and change the image tag to run the pod.
Describe the solution you'd like
The user should not have to edit anything; just running make install should get the deployment running.
Describe alternatives you've considered
Would you be able to assist in testing this feature if implemented?
Yes
When creating a new resource manager today we return its interface:
func NewManager(nexus v1alpha1.Nexus, client client.Client) infra.Manager {
While this is fine at the moment, it goes against the "Accept Interfaces Return Struct" rule of thumb. Returning an interface brings no additional benefit, but it can reduce the code's flexibility by removing all of the object's behavior that's not explicitly defined in the interface.
For example, consider the following interface:
type Person interface {
Name() string
Age() int
}
One implementation of that interface could be:
type Gamer struct {
name string
age int
games []Game
}
func (g Gamer) Name() string {
return g.name
}
func (g Gamer) Age() int {
return g.age
}
func (g Gamer) Games() []Game {
return g.games
}
If the function that creates a new gamer returned Person, we could never access the additional behavior defined in Games(). If it returned *Gamer, we'd still be able to use functions which receive a Person as a parameter, since Gamer implements that interface, AND we'd still be able to make use of this type's specific behavior.
When starting the Operator on OpenShift 3.x, the following error occurs:
2020-05-02T19:57:51.816-0300 ERROR cmd Manager exited non-zero {"error": "no matches for kind \"Ingress\" in version \"networking.k8s.io/v1beta1\""}
github.com/go-logr/zapr.(*zapLogger).Error
/home/lcaparel/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
main.main
/home/lcaparel/gitrepos/nexus-operator/cmd/manager/main.go:155
runtime.main
/usr/local/go/src/runtime/proc.go:203
FATA[0012] Failed to run operator locally: failed to run operator locally: failed to exec []string{"build/_output/bin/nexus-operator-local"}: exit status 1
This is due to the fact that we only compare group names when assessing which resources are available in pkg/framework/controller_watcher.go:
for _, object := range watchedObjects {
	// core resources
	if object.AddToScheme == nil {
		desiredObjects = append(desiredObjects, object)
	} else {
		found := false
		for _, serverGroup := range serverGroups.Groups {
			if strings.Contains(serverGroup.Name, object.GroupVersion.Group) {
				addToScheme = append(addToScheme, object.AddToScheme)
				desiredObjects = append(desiredObjects, object)
				found = true
				delete(c.groupsNotWatched, object.GroupVersion.Group)
				break
			}
		}
		// ...
	}
}
The Ingress is part of networking.k8s.io/v1beta1. OpenShift 3.x supports networking.k8s.io/v1, but not v1beta1. As we compare the group name but not the version, found is set to true, which eventually leads to the panic.
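A stricter check would match both the group and the version. Here is a self-contained sketch of the idea (the local structs are simplified stand-ins for the discovery client's metav1.APIGroupList; names are illustrative):

```go
package main

import "fmt"

// GroupVersion and APIGroup mirror, in simplified form, the types the
// discovery client returns.
type GroupVersion struct{ Group, Version string }

type APIGroup struct {
	Name     string
	Versions []string
}

// supported reports whether the server exposes the exact group AND version.
func supported(groups []APIGroup, gv GroupVersion) bool {
	for _, g := range groups {
		if g.Name != gv.Group { // exact match, not strings.Contains
			continue
		}
		for _, v := range g.Versions {
			if v == gv.Version {
				return true
			}
		}
	}
	return false
}

func main() {
	// OCP 3.x exposes networking.k8s.io/v1 but not v1beta1.
	groups := []APIGroup{{Name: "networking.k8s.io", Versions: []string{"v1"}}}
	fmt.Println(supported(groups, GroupVersion{"networking.k8s.io", "v1beta1"})) // false
	fmt.Println(supported(groups, GroupVersion{"networking.k8s.io", "v1"}))      // true
}
```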
Add unit test coverage reports with Coveralls: https://github.com/marketplace/actions/coveralls-github-action
Nexus 3.12 introduces support for S3 blob stores. The Nexus Operator should consider supporting an S3 blob store via an ObjectBucketClaim offered by the lib-bucket-provisioner operator (see https://github.com/kube-object-storage/lib-bucket-provisioner) and its various implementation operators.
When I deploy Nexus without persistence then the Nexus pod fails with "Permission denied" issues.
Example CR:
apiVersion: apps.m88i.io/v1alpha1
kind: Nexus
metadata:
  name: nexus3
spec:
  replicas: 1
  useRedHatImage: false
  resources:
    limits:
      cpu: "2"
      memory: "2Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
  persistence:
    persistent: false
  networking:
    expose: true
In the log I can see:
id: cannot find name for user ID 1000650000
Warning: Cannot open log file: ../sonatype-work/nexus3/log/jvm.log
Warning: Forcing option -XX:LogFile=/tmp/jvm.log
OpenJDK 64-Bit Server VM warning: Cannot open file ../sonatype-work/nexus3/log/jvm.log due to Permission denied
java.io.FileNotFoundException: ../sonatype-work/nexus3/tmp/i4j_tA0O_LqRVFNhWb_IlDQiAGNa5vA=.lock (Permission denied)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at com.install4j.runtime.launcher.util.SingleInstance.check(SingleInstance.java:72)
at com.install4j.runtime.launcher.util.SingleInstance.checkForCurrentLauncher(SingleInstance.java:31)
at com.install4j.runtime.launcher.UnixLauncher.checkSingleInstance(UnixLauncher.java:88)
at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:67)
...
Is your feature request related to a problem? Please describe.
At the moment the Pull Policy is hardcoded as "Always", meaning the user has no choice whether or not to pull an image.
Describe the solution you'd like
It would be nice if the user could select the policy as they see fit.
Describe alternatives you've considered
There could be a new field in the Nexus CRD to control this.
Additional context
N/A
Would you be able to assist in testing this feature if implemented?
Yes 😁
When it’s first created, Nexus 3 writes a file in the /nexus-data directory containing the temporary password for admin.
A pre-hook script that writes a predefined password to this file could be set up, so users could easily grab it in the Operator status screen to do the first login.
See: #63
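One possible shape for such a hook on the Nexus container (a sketch only: whether a postStart hook fires early enough is exactly what this issue needs to verify, and the ADMIN_PASSWORD source is a hypothetical Secret-backed env var; admin.password is the file Nexus writes the temporary password to):

```yaml
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        # ADMIN_PASSWORD would come from a Secret; the name is illustrative
        - echo -n "$ADMIN_PASSWORD" > /nexus-data/admin.password
```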
This is a regression from #41 fix.
In version 0.2.0 users will see the following message in the logs:
E0515 16:51:07.295411 1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:nexus:nexus-operator" cannot list resource "serviceaccounts" in API group "" in the namespace "nexus"
To work around this issue, the nexus-operator service account must have permissions on the serviceAccount object. We control this kind of permission in the roles.yaml file. See #41.
There are some other things to polish before fixing this bug; that's why we opened this issue.
Today the operator only supports Nexus version 3, but version 2 is still in development. It would be nice to support both.
Reference https://github.com/sonatype/docker-nexus
Nexus 3 has support for high availability. This issue is to track this requirement and analyse the possibilities to implement it. We could consider deploying Nexus as a StatefulSet instead of a Deployment in such scenarios.
Nexus has a built-in backup capability. It would be interesting to also have this feature supported by the operator, by providing persistent storage for it. See: https://help.sonatype.com/repomanager3/backup-and-restore/configure-and-run-the-backup-task
The Nexus CRD interface would have a switch to turn backup on/off. If "on", the operator would create a PVC for it and call the internal Nexus API to create this task for the admin, setting the backup path to the volume mount.
Also, the "notification" e-mail should be added to the interface: an attribute describing the adminEmail, and a backup structure with the notificationEmail on it. If the latter is empty, we would fall back to the adminEmail.
Structure suggestion:
apiVersion: apps.m88i.io/v1alpha1
kind: Nexus
metadata:
  name: nexus3
spec:
  (...)
  adminEmail: [email protected]
  backup:
    enabled: true
    notificationEmail: [email protected]
    # ideally greater than the one set for the service
    volumeSize: 10Gi
  (...)
Default proxy repositories created by the Operator:
Those repos should be grouped together in the maven-public group.
This is the basic functionality for the Nexus Operator. Users will have the option to create a new Nexus instance based on a given image (version 3.x).
After deployment, users can see the deployment status based on the Deployment resource, like:
oc describe nexus
If there's no image input, the latest tag will be deployed.
Is your feature request related to a problem? Please describe.
At this moment validations are spread out in the reconcile loop and in resource-generating functions. This makes the code harder to maintain as we need to worry about validation in more than one place. Additionally, projects importing this are not able to send a Nexus CR to the managers without making additional checks themselves.
Describe the solution you'd like
Make the validation take place in a single place behind an exposed API, improving code maintainability and usability.
Describe alternatives you've considered
Shift the validation responsibility to the Resource Managers. The reconcile loop would use them as would any users importing us as a library. The Resource Managers would then be responsible for:
If a Manager finds an error it can't recover from when validating the Nexus CR, it should return an error and die.
Additional context
N/A
Would you be able to assist in testing this feature if implemented?
Absolutely. 😁
Is your feature request related to a problem? Please describe.
In CONTRIBUTING.md, if one clicks on the go.mod file link, it redirects to a link which is not available.
Describe the solution you'd like
Clicking the link should lead to the go.mod file in the repo.
Describe alternatives you've considered
Changing the link's path should fix this issue.
Additional context
N/A
Would you be able to assist in testing this feature if implemented?
Yes
Instead of relying on the image to set a random password and then having the user cat the generated password inside the container, it would be better if we just set a default password on installation. The password can be changed later by the user manually.
Today we had an outage in the Quay repository, which had a great impact on our CI, since we pull the OLM image from it.
It would be better to cache this image with the Docker service and load it every time instead of pulling it. This will decrease our CI processing time and also prevent our workflow from facing problems like this in the future.
Link for the outage: https://status.quay.io/incidents/kw2627bsdwd9
When deploying on OpenShift, it would be better to have support for Image Streams. This way we could create a new stream in the openshift namespace to fetch certified images from the Red Hat Catalog; otherwise, users would have to set their own tokens within the Nexus namespace to use Red Hat certified images.
We are missing basic e2e tests: https://operator-sdk.netlify.app/docs/golang/e2e-tests/
This issue follows up #32
At this moment we have two security issues in our latest image, impacting our users. We should rebuild to take a fresh ubi8 base image that has these fixes.
Also, we will create a tag 0.2 pointing to the latest micro version. This way administrators can choose whether or not to deploy the operator with the latest micro versions.
More information about the security issues:
https://access.redhat.com/errata/RHSA-2020:2637
https://access.redhat.com/errata/RHSA-2020:1998
Thanks @Kaitou786 for reporting it.
It would be safer to align the INSTALL4J_ADD_VM_PARAMS variable with the resource requirements set for the Nexus 3 Deployment.
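For illustration, the env var could be derived from the container's memory limit along these lines (jvmParams and the 2/3 heap fraction are hypothetical choices for this sketch, not Sonatype's recommendation):

```go
package main

import "fmt"

// jvmParams derives a candidate INSTALL4J_ADD_VM_PARAMS value from the
// container's memory limit in MiB, leaving headroom for direct buffers.
func jvmParams(memLimitMiB int64) string {
	heap := memLimitMiB * 2 / 3 // illustrative heuristic
	direct := memLimitMiB - heap
	return fmt.Sprintf("-Xms%dm -Xmx%dm -XX:MaxDirectMemorySize=%dm", heap, heap, direct)
}

func main() {
	fmt.Println(jvmParams(2048)) // -Xms1365m -Xmx1365m -XX:MaxDirectMemorySize=683m
}
```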
Let's try to stay in their release pace: https://github.com/operator-framework/operator-sdk/releases/tag/v0.18.0
Since we are on 0.17.1, this upgrade should be seamless.
Nexus Operator should open the default liveness and readiness probes for users to customize them.
See: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
On OpenShift, we can safely use Routes, but on Kubernetes we should rely on Ingress.
Reference:
https://kubernetes.io/docs/concepts/services-networking/ingress/
In the current master, this is the Nexus spec object, supporting the following parameters:
type NexusSpec struct {
Replicas int32 `json:"replicas"`
Image string `json:"image,omitempty"`
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
Persistence NexusPersistence `json:"persistence"`
UseRedHatImage bool `json:"useRedHatImage"`
Networking NexusNetworking `json:"networking,omitempty"`
}
I would like to suggest an enhancement to this spec, adding all the deployment parameters such as taints, tolerations, liveness, affinity, and pod disruption budgets.
I'll raise a PR for this soon. It would be possible to add this to Projects as an enhancement ticket.
Since we proved that KIND works for OLM checks, we could use the same platform to run our e2e tests.
The container hangs on pod initialization because of lack of permissions on the /nexus-data directory:
mkdir: cannot create directory '../sonatype-work/nexus3/log': Permission denied
mkdir: cannot create directory '../sonatype-work/nexus3/tmp': Permission denied
OpenJDK 64-Bit Server VM warning: Cannot open file ../sonatype-work/nexus3/log/jvm.log due to No such file or directory
Warning: Cannot open log file: ../sonatype-work/nexus3/log/jvm.log
Warning: Forcing option -XX:LogFile=/tmp/jvm.log
java.io.FileNotFoundException: ../sonatype-work/nexus3/tmp/i4j_ZTDnGON8hezynsMX2ZCYAVDtQog=.lock (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at com.install4j.runtime.launcher.util.SingleInstance.check(SingleInstance.java:72)
at com.install4j.runtime.launcher.util.SingleInstance.checkForCurrentLauncher(SingleInstance.java:31)
at com.install4j.runtime.launcher.UnixLauncher.checkSingleInstance(UnixLauncher.java:88)
at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:67)
java.io.FileNotFoundException: /nexus-data/karaf.pid (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
at org.apache.karaf.main.InstanceHelper.writePid(InstanceHelper.java:126)
at org.apache.karaf.main.Main.launch(Main.java:243)
at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:113)
at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:85)
at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:69)
java.lang.RuntimeException: /nexus-data/log/karaf.log (No such file or directory)
at org.apache.karaf.main.util.BootstrapLogManager.getDefaultHandlerInternal(BootstrapLogManager.java:102)
at org.apache.karaf.main.util.BootstrapLogManager.getDefaultHandlersInternal(BootstrapLogManager.java:137)
at org.apache.karaf.main.util.BootstrapLogManager.getDefaultHandlers(BootstrapLogManager.java:70)
at org.apache.karaf.main.util.BootstrapLogManager.configureLogger(BootstrapLogManager.java:75)
at org.apache.karaf.main.Main.launch(Main.java:244)
at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:113)
at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:85)
at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:69)
Caused by: java.io.FileNotFoundException: /nexus-data/log/karaf.log (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.karaf.main.util.BootstrapLogManager$SimpleFileHandler.open(BootstrapLogManager.java:193)
at org.apache.karaf.main.util.BootstrapLogManager$SimpleFileHandler.<init>(BootstrapLogManager.java:182)
at org.apache.karaf.main.util.BootstrapLogManager.getDefaultHandlerInternal(BootstrapLogManager.java:100)
... 12 more
Error creating bundle cache.
Unable to update instance pid: Unable to create directory /nexus-data/instances
It looks like master, as a branch name, is considered a racist term. I'd like to hear more thoughts from you all about renaming it:
https://www.hanselman.com/blog/EasilyRenameYourGitDefaultBranchFromMasterToMain.aspx
There are no unit tests for the Resource Managers; it would be nice to have some.
Users can update the Spec.Image field. In this case, a new pod should be created and deployed to the cluster, preserving its volume, secrets, configMaps and so on.
Data incompatibilities won't be treated for now.
Having the operator's Service Account use a restrictive SCC would improve the operator's security.
I have an initial implementation of this that is failing to build due to some dependency issues. The libraries we're using seem to be somewhat incompatible as they are now; let's keep a close watch so we can continue this as soon as possible.
At the moment the cluster admin must add an SCC to the Service Account in order to be able to start pods correctly in OCP 3.x (#41); if this was implemented, that wouldn't be necessary.
On OpenShift, rely on Routes. For Kubernetes, open a spec field to add a certificate and key to be injected into the Ingress.
Today we have the following implementation for the fake client used in testing, taken from pkg/test/client.go:
// NewFakeClient will create a new fake client with all needed schemas
func NewFakeClient(initObjs ...runtime.Object) client.Client {
return fake.NewFakeClientWithScheme(GetSchema(), initObjs...)
}
// GetSchema gets the needed schema for fake tests
func GetSchema() *runtime.Scheme {
s := scheme.Scheme
s.AddKnownTypes(v1alpha1.SchemeGroupVersion, &v1alpha1.Nexus{})
s.AddKnownTypes(routev1.GroupVersion, &routev1.Route{}, &routev1.RouteList{})
return s
}
While this is nice and really useful, it doesn't allow us to mock specific responses to certain events/actions. That feature would come in pretty handy when testing the resource managers' ability to handle/report errors when the server responds, for example, with a 500 status code.
Digging through some docs, I found that k8s.io/client-go/testing (which is already a dependency of the project) has a nice fake client implementation that allows us to insert "interceptors" (called Reactors there) which define how the fake client should respond to certain actions.
Check out its godoc for all the good stuff and this issue for an example of working usage.
Also noteworthy is that our fake discovery client uses the k8s.io/client-go/discovery/fake package, which is built on top of the same Fake implementation.
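To make the idea concrete, here is a dependency-free sketch of the Reactor pattern (fakeClient, reaction, and errInjected are illustrative names for this sketch; the real implementation lives in k8s.io/client-go/testing):

```go
package main

import (
	"errors"
	"fmt"
)

// reaction inspects an action and decides whether to handle it.
type reaction func(action string) (handled bool, err error)

// fakeClient runs registered reactions before its default behavior,
// mimicking client-go's Fake object with PrependReactor.
type fakeClient struct{ reactors []reaction }

func (f *fakeClient) PrependReactor(r reaction) {
	f.reactors = append([]reaction{r}, f.reactors...)
}

func (f *fakeClient) Do(action string) error {
	for _, r := range f.reactors {
		if handled, err := r(action); handled {
			return err
		}
	}
	return nil // default: succeed silently
}

// errInjected simulates a server-side failure in tests.
var errInjected = errors.New("injected: the server reported an internal error")

func main() {
	fc := &fakeClient{}
	// Simulate a 500 on "list" while leaving other verbs untouched.
	fc.PrependReactor(func(action string) (bool, error) {
		if action == "list" {
			return true, errInjected
		}
		return false, nil
	})
	fmt.Println(fc.Do("list")) // injected: the server reported an internal error
	fmt.Println(fc.Do("get"))  // <nil>
}
```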