danlabici / taskcluster-worker-checker
Script to check if any releng worker is missing in TC
Currently, when we work with Windows and/or Linux machines and need to access a machine via iLO, we always have to take the connection information from the Google Sheet (Moonshot Master Inventory) before continuing our work as usual.
This is inefficient and prone to mistakes. This request will fix that and also give us extra information about a machine when we need it, such as:
ISSUE: doubled entries, only for osx machines
When: this happens when running
python client.py -v -l 2 or python client.py -v -l 1 ... etc
and selecting 11 at the menu.
This doesn't occur on menu selection 14.
We need to add Mozilla's License and QT license to the repository code.
Whenever we load, read and decode the JSON response from TaskCluster, workers sometimes comes back empty.
Currently we wrap this in a try/except, which didn't fix the issue, but at least we get the data after a few retries.
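The retry workaround described above could be made explicit along these lines. This is only a sketch: fetch is a hypothetical zero-argument callable that returns the raw JSON text, and the retry count and delay are assumed values, not what the script currently uses.

```python
import json
import time


def fetch_workers(fetch, retries=5, delay=1.0):
    """Call fetch() until the decoded JSON has a non-empty "workers" list.

    fetch is a hypothetical callable returning the raw JSON text,
    e.g. a thin wrapper around the HTTP request to TaskCluster.
    """
    for attempt in range(1, retries + 1):
        try:
            workers = json.loads(fetch()).get("workers", [])
        except (ValueError, AttributeError):
            # Malformed JSON or an unexpected payload shape counts as a miss.
            workers = []
        if workers:
            return workers
        if attempt < retries:
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
    raise RuntimeError("workers still empty after {} attempts".format(retries))
```

Raising after the last attempt makes an empty response visible instead of silently continuing with no data.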
With TaskCluster Worker Checker growing to be more than just a tool used by CiDuty, we have to rethink how we handle logic, defs, code and many areas of the tool.
We also have a lot of areas that we can now remove and retire. Below will be a rough outline of what we need to do:
All work will be done on branch twc1.0
master
code under version tag 0.9
Currently we are generating a hard-coded list based on ranges that could change at any time!
We should be using Mozilla's ServiceNow API to grab all the machines we are interested in.
What to pay attention to:
We currently don't exclude the loaner/dev machines in the final output.
This should be an easy first fix!
Simply populate ignore_ms_* with the machines we don't need; we could do our look-up/pop before a = set(workersList) happens.
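A minimal sketch of that filtering step, assuming the ignore lists are plain lists of hostnames. The name drop_ignored and the sample hostnames are hypothetical; only ignore_ms_* and workersList come from the issue.

```python
# Hypothetical ignore lists; the real ones would hold the loaner/dev machines.
ignore_ms_linux = ["t-linux64-ms-240"]
ignore_ms_windows = ["t-w1064-ms-016"]


def drop_ignored(workersList, *ignore_lists):
    """Return workersList without any hostname found in the ignore lists."""
    ignored = set().union(*ignore_lists)
    return [name for name in workersList if name not in ignored]


workersList = ["t-linux64-ms-001", "t-linux64-ms-240", "t-w1064-ms-016"]
workersList = drop_ignored(workersList, ignore_ms_linux, ignore_ms_windows)
a = set(workersList)  # the existing set() call now only sees wanted machines
```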
doing this:
python3 client.py -w WORKER_TYPE -u LDAP_USERNAME | cat >> missing.txt
can be replaced with:
python3 client.py -w WORKER_TYPE -u LDAP_USERNAME > missing.txt
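The use of "| cat >>" is a bit over-engineered. Note that ">" overwrites missing.txt on each run; use ">>" instead to keep appending, as the original command did.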
With the latest commit, 5827b3f, I introduced a lot of new variables and added complexity to how we remove the machines that shouldn't be shown in the final output.
My implementation is also ugly and doesn't really offer much information.
Instead of using windows_pxe, linux_pxe, osx_other_problems, etc. as standalone variables, experiment with dictionaries, as they can hold more information:
problem_machines = {
    "linux": {
        "pxe_issues": {
            "t-linux64-ms-001": "BUG ID HERE",
            "t-linux64-ms-002": "BUG ID HERE"
        },
        "hdd_issues": {
            "t-linux64-ms-001": "BUG ID HERE",
            "t-linux64-ms-002": "BUG ID HERE"
        },
        "other_issues": {
            "t-linux64-ms-001": "BUG ID HERE",
            "t-linux64-ms-002": "BUG ID HERE"
        },
        "loaner": {
            "t-linux64-ms-001": "BUG ID HERE",
            "t-linux64-ms-002": "BUG ID HERE"
        },
    },
    "windows": {
        "pxe_issues": {
            "t-w1064-ms-001": "BUG ID HERE",
            "t-w1064-ms-002": "BUG ID HERE"
        },
        "hdd_issues": {
            "t-w1064-ms-001": "BUG ID HERE",
            "t-w1064-ms-002": "BUG ID HERE"
        },
        "other_issues": {
            "t-w1064-ms-001": "BUG ID HERE",
            "t-w1064-ms-002": "BUG ID HERE"
        },
        "loaner": {
            "t-w1064-ms-001": "BUG ID HERE",
            "t-w1064-ms-002": "BUG ID HERE"
        },
    },
}
This will let us use code that is much cleaner and easier to use. One simple dictionary is easier to maintain and can store much more useful information, such as BUG IDs, via keys and values.
This change is up for grabs, but will require quite some changes around the script.
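As a rough illustration of how the rest of the script could consume such a dictionary, the sketch below flattens it into one lookup per hostname. The function name flatten_problems and the shortened sample data are hypothetical.

```python
# Shortened sample in the shape proposed above (placeholder bug IDs).
problem_machines = {
    "linux": {
        "pxe_issues": {"t-linux64-ms-001": "BUG ID HERE"},
        "loaner": {"t-linux64-ms-002": "BUG ID HERE"},
    },
    "windows": {
        "hdd_issues": {"t-w1064-ms-001": "BUG ID HERE"},
    },
}


def flatten_problems(problems):
    """Map each hostname to a (platform, reason, bug) tuple."""
    flat = {}
    for platform, reasons in problems.items():
        for reason, machines in reasons.items():
            for host, bug in machines.items():
                flat[host] = (platform, reason, bug)
    return flat


known = flatten_problems(problem_machines)
missing = ["t-linux64-ms-001", "t-linux64-ms-003", "t-w1064-ms-001"]
# Keep only machines that are missing and not already tracked in a bug.
unexplained = [host for host in missing if host not in known]
```

Note that a hostname listed under several reasons keeps only the last one in this flattened form; a list of tuples per host would preserve all of them.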
In the logic at line 168 we generate the FQDN incorrectly.
Current Output:
ssh [email protected]
Expected Output:
ssh [email protected]
A lot of hardcoded values and default variable values can be moved to a user_settings.json file.
This will give users the ability to customize TWC to accept their own PC configuration.
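One possible shape for this, sketched under assumptions: the keys in DEFAULT_SETTINGS are hypothetical placeholders, and only the file name user_settings.json comes from the issue. Values found in the file override the built-in defaults, so a missing or partial file still yields a complete configuration.

```python
import json
import os

# Hypothetical defaults; the real keys would be the values hardcoded today.
DEFAULT_SETTINGS = {
    "connect_wait_seconds": 10,
    "ldap_username": "",
}


def load_settings(path="user_settings.json", defaults=DEFAULT_SETTINGS):
    """Return the defaults, overridden by whatever the user's file defines."""
    settings = dict(defaults)  # copy so the defaults are never mutated
    if os.path.exists(path):
        with open(path) as handle:
            settings.update(json.load(handle))
    return settings
```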
Using pygui.locateOnScreen("screenshot", confidence=1.0) we can detect the iLO app and then look in the top left corner of the screen for the " _ " list that tells us we are successfully connected to it.
This will remove the "wait 5-10 seconds" step after we click the connect button, where we wait and hope to be connected within the specified time, a part of the code which isn't very pretty.
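The fixed sleep could be replaced by a small polling helper such as the sketch below. The helper wait_until and its default timeout and interval are assumptions; it takes any callable as the check, so the screen lookup itself stays outside the helper.

```python
import time


def wait_until(check, timeout=30.0, interval=0.5):
    """Poll check() until it returns something truthy or the timeout expires.

    Returns the truthy result, or None on timeout. An exception raised by
    check() is treated the same as "not found yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = check()
        except Exception:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With pyautogui imported as pygui, the check could be something like lambda: pygui.locateOnScreen("screenshot", confidence=1.0). The confidence keyword requires OpenCV to be installed, and depending on the pyautogui version a failed lookup either returns None or raises an exception, which is why the helper tolerates both.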
We could add a function that pings hosts that have been idle for more than 6 hours and returns the results. What's your opinion, would this be useful?
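To make the proposal concrete, here is a rough sketch. It assumes the idle time per host is already available as a dictionary of hours, which is a hypothetical input format, and it uses the Linux/macOS form of the ping command (-c for the packet count).

```python
import subprocess


def ping(host, count=1, timeout=5):
    """Return True if host answers a ping (Linux/macOS `ping -c` syntax)."""
    try:
        completed = subprocess.run(
            ["ping", "-c", str(count), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return completed.returncode == 0


def ping_idle_hosts(idle_hours, threshold=6, ping_func=ping):
    """Ping every host idle for more than threshold hours.

    idle_hours maps hostname -> hours idle (hypothetical input format).
    Returns a dict of hostname -> True/False reachability.
    """
    return {
        host: ping_func(host)
        for host, hours in idle_hours.items()
        if hours > threshold
    }
```

Passing the ping function as a parameter keeps the selection logic testable without touching the network.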
At the moment a new batch of ~30 machines has been introduced to the pool (23 machines successfully re-imaged, and new machines are on their way).
This issue will help progress #133
It would be nice to have an integrated script for running commands (like reboot) on known machines that our current script reported as problematic.
I have been working on a piece of code to do so, but I got stuck at the point where we have to provide our DUO for logging in.
I've asked Aki for help. It's good to have it here as well so anybody can come up with ideas.
For now we have the following:
import getpass
import paramiko

ssh = paramiko.SSHClient()
# Ask for the passphrase that protects the private key (input is not echoed).
pass_phrase = getpass.getpass("What's your passphrase for private key? ")
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Locate the private key and unlock it.
privkey = paramiko.RSAKey.from_private_key_file('/home/rolandmutter/.ssh/id_rsa', password=pass_phrase)
# Main call to connect to the host, which is hardcoded for now for test purposes.
ssh.connect('t-yosemite-r7-230.test.releng.mdc2.mozilla.com', username='root', pkey=privkey)
# paramiko.util.log_to_file(os.path.expanduser('/home/rolandmutter/paramiko.log'), logging.DEBUG)
stdin, stdout, stderr = ssh.exec_command('ls')  # command to execute, again hardcoded
print(stdout.readlines())  # prints the output of the command run on the machine
ssh.close()  # closes the connection
New clones need to be run with -v the first time; otherwise we get a "No such file or directory" error.
In the master inventory sheet we have a column, CiDuty CLI # of Actions Taken. This column should show how many times TWC has taken action on a given machine.
For now the only action we can take is an automated reboot, but in the future we are looking at ways to implement automated re-imaging as well.
While trying to modify the script to accept the new machines, I observed that at the moment we are using the "mdc2_range" from linux for windows, thus providing misleading information.
Use case:
print("Total of missing server : {}".format(len(missing_machines) - len(mdc2_range)))
I also tried to correct this, but for some reason the "mdc_range" from the windows generation is not accessible.
This issue is for the GUI branch.
Currently we show all the machines if we don't have IDLE and/or Ignored? selected.
By default the filter logic should go as described below, even if nothing is selected (IDLE/Ignored) in the UI:
Basically we are missing "default values", which will be overwritten by the custom input that the user sets in the UI.
This issue will help us streamline the process of installing TWC.
ISSUE: No entries while checking the status of a single type of machine
When: This occurs only when checking the status of one type of machine, windows or linux.
It doesn't occur with osx machines.