spacetelescope / braindump

A place to collect notes from code discussions

Jupyter Notebook 100.00%

astronomy jwst hst wfirst

braindump's People

Contributors

nden sosey stscieisenhamer jdavies-st kmacdonald-stsci

braindump's Issues

Workflow playground

Do you want to use this repo as a workflow playground? I.e., should we all fork it to our own accounts and update it through pull requests?

Blind leading the blind: Chase the memory leak

Upvote to listen to the mind-numbing story of ~~resolving~~ doing an end-around on the Spec2Pipeline memory leak. Punchline: the cause is still unknown, but one of the beast's heads has been removed.

Git: should we rebase or pull/merge?

Discussion of when pulling/merging is better or worse than rebasing. @eteq can frame the discussion and share his views on it.

tox

I nominated @drdavella to give a brain dump on tox.

Just testing tasklist

https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists

Python asyncio

@jbcurtin volunteered
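As a taste of the topic, here is a minimal sketch of running several coroutines concurrently with `asyncio.gather`. The `fetch` coroutine, its names and its delays are made up for illustration; `asyncio.sleep` stands in for real network or disk I/O.

```python
import asyncio
from typing import List


async def fetch(name: str, delay: float) -> str:
    """Hypothetical I/O-bound task; asyncio.sleep stands in for a network call."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> List[str]:
    # gather schedules all three coroutines at once, so the total wall time is
    # roughly the longest delay (0.3 s) rather than the sum (0.6 s).
    # Results come back in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.3),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Everything runs in one thread; the speedup comes only from overlapping waits, so this pattern helps with I/O-bound work and not with CPU-bound number crunching.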

JWQL demo/talk

@bourque would like to present JWST Quick Look tool and get some perspective from outside of INS.

Moving JWST to github

May 19th, noon, S322

A suggested list of topics to discuss. Please add other topics as you see necessary.

What to move from SVN to GitHub (we don't have to move everything)
How to organise the code

The current organization uses namespaces. IIRC this was necessary because of a requirement that steps be installable as separate packages as well as one package. Is this still a good idea?
What is the current nightly SSBDEV procedure? That is, after the code is moved to GitHub, what should we expect from the build scripts?
What workflow to adopt?

Confluence notes

astropy
Testing
Documentation
Releases

Fun with Assembly

@jhunkeler and @rendinam

Refactoring regression tests

Currently, SSB regression tests live in a private and hard to access SVN repository. Cron jobs are set up on certain machines (Mac OSX, RHEL6, and RHEL5) to update test scripts from that repo once a day and run with pandokia, compare results with "truth" images/tables, and generate HTML reports. Developers then look at the reports and can choose to "okify" failed tests (i.e., make the new results the new "truth").

Pros

It is already working.
It is familiar to us.
pandokia is developed in-house (free, no licensing issues).

Cons

pandokia has limited documentation: some internal wiki instructions, but that's about it.
pandokia lacks Python 3 support, as it was written about 10 years ago.
pandokia is not flexible about letting tests read big data from an arbitrary path. For example, CALWF3 input data are required to be on the current machine in the current directory.
SVN repository not always in sync across machines. For example, a test updated on one machine and pushed to the SVN repository might not get picked up by another machine. This is suspected to be caused by the SVN version being too old on one or more of the machines (but upgrading SVN would cause a different kind of problem).
This system has a single point of failure, in the sense that only Joe or Christine (and perhaps Vicki if she has time) knows how to fix it. Okay, up to three points of failure.
Not all exceptions are caught properly by pandokia. For example, a syntax error will result in the test being omitted from the final report entirely (not to be confused with being reported as missing in the report).
Tests are disconnected from the actual codes that they are testing, in the sense that they are not under the same version control.
A test messing up can affect unrelated tests. That is, the whole regression test system can fail because of a single test.
Regression test codes are invisible to anyone outside SSB.
It is difficult to access the current SVN (e.g., no Trac site).

Proposed Changes

Move test codes out of the hidden and hard-to-access SVN repository back to the respective code repositories. For example, pysynphot tests go back to the pysynphot GitHub repo. An exception can be made for HSTCAL (because it's written in C but its tests are in Python) and legacy codes (e.g., PyRAF, pytools). I am willing to absorb CALACS tests into acstools, but CALWF3 has an opposing view. There is no reason why tests for all the different packages need to be in the same repository. It is trivial to skip/xfail a test requiring big data that is not present.
Replace pandokia with a new test system. For example, a private version of Travis CI or Jenkins CI.
Replace SVN with git, to be consistent with our recent move to GitHub. Also, this way, we do not need a special global account to modify the tests. Ideally, we can open pull requests and merge like we do with "real" codes.
Switch from nose to pytest. As we move forward with more and more codes depending on Astropy or its template, this is unavoidable.
Create a new test workflow. There is no reason for tests to run every night for all the packages. Tests should only run when there are changes relevant to the codes being tested (e.g., a new commit or a new data file). They must also be easy to run manually as needed. For example, when there is a new pull request, we can check out the codes from the pull request and just type py.test packagename [args] or python setup.py test packagename [args].
Figure out a cleaner way to store input and output test data. While smaller data can live in the code repository (see first point above), big data should be somewhere else. Also we should address things like: Do we need to version control input data? Do we need to keep old "truths" once we "okify" new ones? How will the "okify" process work with Travis or Jenkins CI?
While we're at it, we can review the affected tests and discard those that are outdated and do not make sense anymore. This will reduce maintenance costs going forward and lessen total run time.
Replace RHEL5 and RHEL6 with RHEL7. Or at the very least, get rid of RHEL5. Also, if applicable, upgrade Mac OSX test machine.
Do we even want to consider a Windows test machine? Maybe Windows 10?
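The claim that skipping big-data tests is trivial can be sketched with pytest markers. This assumes a hypothetical environment variable `TEST_BIGDATA` pointing at wherever the large inputs are mounted, plus a made-up file name; neither is an existing SSB convention.

```python
import os
from pathlib import Path

import pytest

# Hypothetical: root of the large regression inputs, set per machine.
BIGDATA = os.environ.get("TEST_BIGDATA")


def bigdata_path(*parts: str) -> Path:
    """Build a path under the big-data root (hypothetical layout)."""
    return Path(BIGDATA, *parts)


# Report the test as skipped, instead of erroring, when the data area is absent.
requires_bigdata = pytest.mark.skipif(
    BIGDATA is None, reason="TEST_BIGDATA not set; big input data unavailable"
)


def test_small_data_always_runs():
    # Small inputs can live in the package repository and always run.
    assert sum([1, 2, 3]) == 6


@requires_bigdata
def test_calibrate_big_image():
    infile = bigdata_path("wfc3", "example_raw.fits")  # hypothetical file
    if not infile.exists():
        pytest.skip(f"{infile} not found")
    assert infile.stat().st_size > 0
```

With this in the package's own test suite, `py.test packagename` runs everywhere: machines without the data report the big-data test as skipped rather than failed. `pytest.mark.xfail` could be used instead when a missing input should be visible as an expected failure.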

c/c: @sosey @nden @cdsontag @jhunkeler @vglaidler @stsci-hack @justincely and whoever else in SSB that is interested

p/s: This is how I envision the change -- https://www.youtube.com/watch?v=mZ6_0wGGsuY

Concurrent AWS Lambda calls with bert-etl

When the DSII TESS Lambda project wraps up, if there is enough interest, @jbcurtin and I can jointly demo this one. We should invite @mustaric, as she is the PI of this project.

Refactored (py)synphot using Astropy

@philhodge suggested that I post this here in case it does not make it to SSB meeting slot. Thanks, Phil!

http://synphot.readthedocs.io/en/latest/

https://github.com/spacetelescope/synphot_refactor

Netflix Polynote?

It would be good to have a session on Netflix's Jupyter notebook alternative, Polynote. Maybe just a hack session if no one has yet tried it?

https://towardsdatascience.com/what-you-need-to-know-about-netflixs-jupyter-killer-polynote-dbe7106145f5

Mocking HTTP RESTful services: Testing JWST Engineering DB access

A thrilling topic for sure, but since I spent (way too much) time on this, is there any interest in a discussion?
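For anyone wondering what the technique looks like, here is a minimal sketch that uses the standard library's `unittest.mock` to replace `urllib.request.urlopen`, so a REST client can be tested without the real service. The function `get_mnemonic`, the URL and the JSON layout are hypothetical and are not the actual JWST Engineering DB interface.

```python
import json
import urllib.request
from unittest import mock


def get_mnemonic(base_url: str, name: str) -> list:
    """Hypothetical client: fetch telemetry values for one mnemonic as JSON."""
    with urllib.request.urlopen(f"{base_url}/mnemonic/{name}") as response:
        payload = json.load(response)
    return payload["values"]


def test_get_mnemonic_without_network():
    fake_body = json.dumps({"values": [1.0, 2.5]}).encode()
    fake_response = mock.MagicMock()
    fake_response.read.return_value = fake_body
    # urlopen is used as a context manager, so __enter__ must return the response.
    fake_response.__enter__.return_value = fake_response

    # Patch the module attribute that get_mnemonic looks up at call time.
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        values = get_mnemonic("https://edb.example.invalid", "EXAMPLE_MNEMONIC")

    assert values == [1.0, 2.5]
    fake_urlopen.assert_called_once_with(
        "https://edb.example.invalid/mnemonic/EXAMPLE_MNEMONIC"
    )
```

Patching at the transport call keeps the parsing logic under test; a dedicated HTTP-mocking package would be an alternative when the client uses a higher-level library.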

Celery (not the relatively useless vegetable)

I have been playing with celery (distributed processing Python package) a bit and find it an interesting package. I am very new to it but would be happy to come up with something to show what it can do.

Regular Expressions I Have Known and Loathed

Whenever I mention regular expressions, I get the impression that some people are uncomfortable with them. Perhaps this is too basic, but I can give a talk on regular expressions, using as an example parsing words out of a line of text, building up from a simple regular expression to something that is hideous to contemplate.
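A minimal sketch of that kind of build-up, using Python's `re` module. The sample line and the three patterns are illustrative choices, not the ones from the proposed talk.

```python
import re

line = "Don't panic: HST's WFC3/UVIS has 2 CCDs, e.g. chip-1 and chip-2."

# Step 1: runs of word characters. Splits "Don't" into "Don" and "t".
simple = re.findall(r"\w+", line)

# Step 2: letters with optional internal apostrophes. Keeps "Don't",
# but now drops digits, so "WFC3" becomes "WFC".
apostrophes = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", line)

# Step 3: letters or digits, joined by ' or -, written with re.VERBOSE
# so the growing pattern stays readable.
WORD = re.compile(
    r"""
    [A-Za-z0-9]+              # a run of letters or digits
    (?:['-][A-Za-z0-9]+)*     # optionally joined by ' or - to more of the same
    """,
    re.VERBOSE,
)
words = WORD.findall(line)
```

Each step fixes a failure of the previous one; `words` ends up as `["Don't", "panic", "HST's", "WFC3", "UVIS", "has", "2", "CCDs", "e", "g", "chip-1", "and", "chip-2"]`, and abbreviations like "e.g." would be the next hideous refinement.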

Project boards?

So many choices. Trello, Airtable, Waffle.io, Emacs... Which one to use? There's JIRA too but apparently we can't use it to track arbitrary things.

(Feel free to close this if it is not a good braindump topic.)

c/c @hcferguson

Meld demo

Maybe someone can show us about meld (http://meldmerge.org/) -- @nden ?

p.s. Not to be confused with Vulcan mind meld.

How to make software citable

Zenodo DOIs, etc.

@sosey volunteered

Python 3.8 walrus operator

Koo koo kachu
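For those who have not met it yet, a minimal illustration of the assignment expression (`:=`, PEP 572, Python 3.8+), which binds a name inside an expression. The header-like strings and numbers are made up.

```python
import re

lines = ["EXPTIME = 300.0", "# comment", "FILTER = F606W"]
pattern = re.compile(r"(\w+)\s*=\s*(\S+)")

header = {}
for text in lines:
    # Bind the match object and test it in one expression, instead of
    # calling pattern.match on a separate line before the if.
    if (match := pattern.match(text)) is not None:
        header[match.group(1)] = match.group(2)

# Also handy in comprehensions: compute a value once, filter on it, keep it.
values = [1.0, 4.0, 9.0, 16.0]
big_roots = [root for v in values if (root := v ** 0.5) > 2.5]
```

Here `header` becomes `{"EXPTIME": "300.0", "FILTER": "F606W"}` and `big_roots` becomes `[3.0, 4.0]`.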

JWST Associations

Jonathan will tell us all about them.

Thu, Aug 11, noon
Cafecon

New data types from Google etc

@hcferguson volunteered. Something about 64-bit compute using 32-bit only?

Should we test for code coverage?

Is this off-topic here? Not sure.

Python type hinting discussion

It would be great to have a discussion of the if/how/when/why of using the new Python type hinting. Even if it's the short answer: "nope".

Hubble machine learning

I volunteer @brechmos-stsci to talk about creating HAL... I mean, machine learning using Hubble data.

JWST data models

Any interest in discussing data models?
Any issues that need to be discussed?

Present/talk about code coverage

I've been learning a ton in my Software Testing class about how code coverage is calculated and all the graph math happening in the innards of tools like Coveralls, and it seems like it would make an interesting brain dump. At least, I find it super interesting and fun.

Jenkins

There is interest from INS Software Engineers to learn about how to use Jenkins for their work.

Update the help

Tyler Desjardins mentions that we should consider moving emails from help[at]stsci.edu to point to the web portal where possible and appropriate. For HST (or any non-JWST), it is https://hsthelp.stsci.edu . For JWST, it is https://jwsthelp.stsci.edu . Please update info in setup.py, setup.cfg, documentation, etc., as appropriate.

Please close this issue if it is irrelevant to your repository. This is an automated issue. If this is opened in error, please let pllim know!

xref spacetelescope/hstcal#317

ASDF chunked array demo

A demo of the new zarr chunked array extension for ASDF
https://github.com/asdf-format/asdf-zarr/blob/main/notebooks/ASDF_array_storage_intro.ipynb

Jenkins examples

It'd be nice to hear how to set up Jenkins on a repo and see some examples of converting pandokia tests to Jenkins.

Sara Test

@SaraOgaz

Advanced git workflow

I see this topic as distinct from github workflows, although there will probably be some overlap. I think it would be useful to talk about how git can be used locally to aid a development workflow, and some of the more 'advanced' git features that not all users may be aware of, including:

git gui
git grep
git reset
git rebase
git reflog
git rerere

It might also be useful to talk about git integration with shell environments, including plugins that allow for tab completion of git commands and commits.

What's new in Python and Numpy

Now that build 7 is done (is it?) it looks like there's interest in a session (or two) on the above topic.
Is (Thu) Dec 15 a good day for this (please show 👍 or 👎 )?

Also we need two volunteers to prepare the two topics. Please volunteer here or send me an email.

Diagnosing memory issues in Python extensions

In case it is of interest, I have prepared short demos on the following topics:

detecting memory leaks in Python extensions using valgrind
detecting buffer overruns in Python extensions using address sanitizer

Let me know if there are other topics that would be of interest. These might include using gdb to debug Python extensions, detecting undefined C behavior in extensions, etc.

Development Tools

Things people use to help with their development:

flake8 (pyflakes + pep8)
pylint
autopep8
Valgrind

Some of these can be incorporated into your editor for on-the-fly style checking.

Discuss ASDF

Apr 14, 2016

asdf-standard

Python implementation

ASDF versioning is documented here:

https://github.com/spacetelescope/asdf-standard/blob/master/source/versioning.rst

The #ASDF line refers to the file format -- how blocks are laid out, how offsets are calculated, etc. Basically anything a reader would need to know to separate all of the blocks in the file, but not necessarily the meaning of the tags in the YAML portion.

The #ASDF_STANDARD refers to the ASDF standard, including all of the YAML tags and their meanings. While each YAML tag is individually versioned, the #ASDF_STANDARD groups those up into a single version that can be easily checked such that a reader could say "I don't understand this version of the spec, but I'll do my best to load it anyway...".

Also there is some discussion about versioning in this link:

asdf-format/asdf-standard#90
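To make the two version lines concrete: an ASDF file begins with a `#ASDF <version>` comment line, followed by `#ASDF_STANDARD <version>`, before the `%YAML` directive and the tree. The sketch below is a hypothetical helper that does not use the `asdf` package; it reads both lines so a reader could decide how to proceed. The policy in `check_versions` and the default standard version are illustrative assumptions.

```python
import io
from typing import BinaryIO, Dict


def read_asdf_versions(fd: BinaryIO) -> Dict[str, str]:
    """Hypothetical helper: collect the #ASDF and #ASDF_STANDARD header comments.

    Only the leading comment lines are read; parsing stops at the first line
    that does not start with '#' (normally the %YAML directive).
    """
    versions = {}
    for raw in fd:
        line = raw.decode("ascii").strip()
        if not line.startswith("#"):
            break
        key, _, value = line[1:].partition(" ")
        if key in ("ASDF", "ASDF_STANDARD"):
            versions[key] = value.strip()
    return versions


def check_versions(versions: Dict[str, str], supported_standard: str = "1.5.0") -> str:
    """Illustrative policy: an unknown file format is fatal, an unknown standard only warns."""
    # The file-format version governs how blocks are laid out, so a reader
    # that does not understand it cannot safely split the file.
    if versions.get("ASDF", "").split(".")[0] != "1":
        raise ValueError(f"unsupported ASDF file format {versions.get('ASDF')!r}")
    # The standard version covers tag meanings; a reader can still try its best.
    if versions.get("ASDF_STANDARD") != supported_standard:
        return "warn: unknown ASDF_STANDARD, loading anyway"
    return "ok"


if __name__ == "__main__":
    header = io.BytesIO(b"#ASDF 1.0.0\n#ASDF_STANDARD 1.5.0\n%YAML 1.1\n")
    versions = read_asdf_versions(header)
    print(versions, check_versions(versions))
```

The split mirrors the distinction above: a mismatch in `#ASDF` means the reader may not even find the blocks, while a mismatch in `#ASDF_STANDARD` only means some tags may be unfamiliar.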

Should we test on Windows platform? How?

(I want to see how fast this one is closed as "wont fix".)

New pytest plugins showcase

@drdavella turned some astropy test helpers into actual pytest plugins outside of astropy. I would be interested to learn how to use these for my own projects!

Documentation: State-of-art, expectations, and gittn' it done.

Since the jwst package will need a hefty overhaul of its documentation before the mythical 1.0 release, and some recent work (#8) has helped, a review of the state of the art and a discussion of what we and users can expect seem in order in the near-ish future.

A recent interview with the Matplotlib lead developer, titled "Matplotlib Lead Developer Explains Why He Can’t Fix the Docs—But You Can", brought this to mind.

Python typing

https://docs.python.org/3/library/typing.html and https://github.com/python/mypy

I volunteer @jbcurtin
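A small taste of what annotations plus mypy buy you; the functions are hypothetical. The annotations have no effect at runtime; running `mypy` on the file is what reports the bad call described in the final comment.

```python
from typing import Dict, Optional, Sequence


def mean_exptime(exptimes: Sequence[float], default: Optional[float] = None) -> Optional[float]:
    """Return the mean exposure time, or `default` when there are none."""
    if not exptimes:
        return default
    return sum(exptimes) / len(exptimes)


def summarize(exptimes_by_file: Dict[str, float]) -> str:
    result = mean_exptime(list(exptimes_by_file.values()))
    # The annotation says result may be None; after this check mypy narrows
    # it to float, so the numeric formatting below type-checks.
    if result is None:
        return "no exposures"
    return f"{result:.1f} s"


# mypy flags this call (a str is not a Sequence[float]); without a type checker
# it would only fail at runtime, with a TypeError raised inside sum().
# mean_exptime("300")
```

`summarize({"a": 300.0, "b": 100.0})` returns `"200.0 s"`; the interesting part is that the `Optional` return type forces callers to handle the empty case.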

Temporal Logic of Actions

Since it came up at lunch today, here are a couple of links to the "Temporal Logic of Actions" literature.

Glupyter feedback

I asked about these during the demo (https://github.com/spacetelescope/braindump/tree/master/glupyter_20180101) and was asked to post them as issue, so here they are:

Glue API could use a to_table method (instead of re-indexing)
Expose irregular brush or lines API (e.g., for user to arbitrarily draw a line on the CMD and select stars in/near the line)
Bug: Duplicate tab when linking im to obj with subset already created before

(I hope these still make sense.)

mmap (memory mapping)

@drdavella was volunteered.
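For context, a minimal stdlib `mmap` sketch: the file is mapped into memory and sliced like a bytes object, with the OS paging data in on demand instead of the program reading the whole file up front. The temporary file and its contents are made up; `numpy.memmap` builds on the same mechanism for arrays.

```python
import mmap
import os
import tempfile


def read_slice(path: str, start: int, stop: int) -> bytes:
    """Return bytes [start:stop) of a file through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[start:stop]


def demo() -> bytes:
    # Write a throwaway 1 MiB file, then pull 8 bytes from the middle.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)
        path = tmp.name
    try:
        return read_slice(path, 512 * 1024, 512 * 1024 + 8)
    finally:
        os.remove(path)
```

Only the pages touched by the slice need to be read from disk, which is why memory mapping pays off for large images where a step only needs a small region.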

Configure VS code for Python

It would be good to hear about configuration people use for VS code.

Using custom STScI template locally and on RTD - works!

I got the custom template for our docs working locally and on RTD. Here's an example of what it looks like:

http://wfc3tools.readthedocs.io/

If you want to use it with the repo that you manage, edit your conf.py to include:

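    # Read the Docs theme, with the navigation tree collapsed by default
    html_theme = 'sphinx_rtd_theme'
    html_theme_options = {
        "collapse_navigation": True,
    }

    # STScI logo and custom stylesheet, both copied into docs/_static
    html_logo = '_static/stsci_pri_combo_mark_white.png'
    html_static_path = ['_static']
    # On Sphinx >= 1.8, html_css_files = ['css/custom.css'] is the documented
    # way to add a stylesheet; html_context['css_files'] is the older approach.
    html_context = {
        'css_files': [
            '_static/css/custom.css',
        ],
    }

    html_last_updated_fmt = '%b %d, %Y'
    html_sidebars = {'**': ['globaltoc.html', 'relations.html', 'searchbox.html']}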

You can copy the corresponding logo and custom.css files from the wfc3tools package.

Recent developments on de-blending astronomical sources

Tools like SExtractor and DAOphot have been the industry standard for source detection and image segmentation. Big-survey projects like LSST and WFIRST and Euclid are investigating ways to move beyond these and in particular to fold in multi-wavelength information. A year ago I would have said LSST was barking up the wrong tree on what they were pursuing. But within the last few months they have made real progress on using the color information and Non-negative Matrix Factorization to separate overlapping objects. I can review what I heard at the last LSST all-hands meeting. If we wait a few months, I might get an update on this from a series of telecons that are just starting.
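As a rough illustration of the idea only (generic NMF, not the LSST pipeline's implementation), here is a minimal pure-Python factorization using the Lee and Seung multiplicative updates. A non-negative matrix V (pixels × bands for a toy blend) is approximated as W·H, where the columns of W play the role of per-source spatial profiles and the rows of H their colors. The toy data, rank and iteration count are all assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, rank: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); multiplying by a non-negative ratio keeps H >= 0.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T); Lee and Seung show these updates
        # do not increase the squared reconstruction error.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Toy blend: two overlapping 1-D sources over 6 pixels, observed in 3 bands.
PROFILES = [[4, 3, 1, 0, 0, 0], [0, 0, 1, 2, 4, 3]]
COLORS = [[1.0, 0.5, 0.2], [0.2, 0.6, 1.0]]
V = matmul(transpose(PROFILES), COLORS)

if __name__ == "__main__":
    w, h = nmf(V, rank=2)
    print("residual:", frobenius_error(V, w, h))
```

Pixels 2 and 3 mix both sources, and the color difference between them is what lets the factorization pull them apart; real deblenders add constraints (positivity, symmetry, monotonic profiles) on top of this basic idea.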

Emacs org mode

@stscieisenhamer , please show us the way!