greenelab / xswap-manuscript



Manuscript on XSwap network permutation and hetnet node degrees

Home Page: https://greenelab.github.io/xswap-manuscript/

License: Other

HTML 86.51% Shell 13.49%

xswap networks permutation manuscript manubot hetnets

xswap-manuscript's Introduction

XSwap Manuscript Source

This repository contains the manuscript source for the XSwap project. This manuscript is now published at:

The probability of edge existence due to node degree: a baseline for network-based predictions
Michael Zietz, Daniel Himmelstein, Kyle Kloster, Christopher Williams, Michael Nagle, Casey Greene
GigaScience (2024-02-07) https://doi.org/gtcbks
DOI: 10.1093/gigascience/giae001 · PMID: 38323677 · PMCID: PMC10848215

Manuscript description

This repository is a template manuscript (a.k.a. rootstock). Actual manuscript instances will clone this repository (see SETUP.md) and replace this paragraph with a description of their manuscript.

Manubot

Manubot is a system for writing scholarly manuscripts via GitHub. Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub. An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features. The rootstock repository is a general purpose template for creating new Manubot instances, as detailed in SETUP.md. See USAGE.md for documentation on how to write a manuscript.

Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.

Repository directories & files

The directories are as follows:

content contains the manuscript source, which includes markdown files as well as inputs for citations and references. See USAGE.md for more information.
output contains the outputs (generated files) from Manubot including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
build contains commands and tools for building the manuscript.
ci contains files necessary for deployment via continuous integration.

Local execution

The easiest way to run Manubot is to use continuous integration to rebuild the manuscript when the content changes. If you want to build a Manubot manuscript locally, install the conda environment as described in build. Then, you can build the manuscript on POSIX systems by running the following commands from the repository's root directory.

# Activate the manubot conda environment (assumes conda version >= 4.4)
conda activate manubot

# Build the manuscript, saving outputs to the output directory
bash build/build.sh

# At this point, the HTML & PDF outputs will have been created. The remaining
# commands are for serving the webpage to view the HTML manuscript locally.
# This is required to view local images in the HTML output.

# Configure the webpage directory
manubot webpage

# You can now open the manuscript webpage/index.html in a web browser.
# Alternatively, open a local webserver at http://localhost:8000/ with the
# following commands.
cd webpage
python -m http.server

Sometimes it's helpful to monitor the content directory and automatically rebuild the manuscript when a change is detected. The following command, while running, will trigger both the build.sh script and manubot webpage command upon content changes:

bash build/autobuild.sh
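The rebuild-on-change idea can be illustrated with a simple polling loop. This is a minimal Python sketch assuming a content directory and the two commands above; it is not how build/autobuild.sh is implemented, and the helper names snapshot, has_changed, and watch are hypothetical.

```python
import os
import subprocess
import time


def snapshot(directory):
    """Map every file path under `directory` to its last-modified time."""
    mtimes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mtimes[path] = os.path.getmtime(path)
    return mtimes


def has_changed(before, after):
    """True if any file was added, removed, or modified between two snapshots."""
    return before != after


def watch(directory="content", interval=1.0):
    """Poll `directory` forever and rerun the build whenever it changes.

    Hypothetical helper: it shells out to the same commands listed above.
    """
    previous = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        if has_changed(previous, current):
            subprocess.run(["bash", "build/build.sh"], check=False)
            subprocess.run(["manubot", "webpage"], check=False)
            previous = current
```

Calling watch() from the repository root would block and rebuild after each detected change. Polling is slower to react than filesystem event notifications, but it needs no dependencies beyond the standard library.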

Continuous Integration

Whenever a pull request is opened, CI (continuous integration) will test whether the changes break the build process to generate a formatted manuscript. The build process aims to detect common errors, such as invalid citations. If your pull request build fails, see the CI logs for the cause of failure and revise your pull request accordingly.

When a commit to the main branch occurs (for example, when a pull request is merged), CI builds the manuscript and writes the results to the gh-pages and output branches. The gh-pages branch uses GitHub Pages to host the following URLs:

HTML manuscript at https://greenelab.github.io/xswap-manuscript/
PDF manuscript at https://greenelab.github.io/xswap-manuscript/manuscript.pdf

For continuous integration configuration details, see .github/workflows/manubot.yaml.

License

Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution. Please attribute by linking to https://github.com/greenelab/xswap-manuscript.

Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md). All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:

*.sh
*.py
*.yml / *.yaml
*.json
*.bib
*.tsv
.gitignore

All other files are only available under CC BY 4.0, including:

*.md
*.html
*.pdf
*.docx
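The glob rules above can be expressed as a small lookup. This is a minimal Python sketch assuming patterns are matched against a file's base name only and that any unmatched file falls under CC BY 4.0 alone; the function name licenses_for and the license identifier strings are illustrative, and LICENSE.md and LICENSE-CC0.md remain authoritative.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns listed above as dual licensed under CC BY 4.0 and CC0 1.0.
DUAL_LICENSE_PATTERNS = [
    "*.sh", "*.py", "*.yml", "*.yaml", "*.json", "*.bib", "*.tsv", ".gitignore",
]


def licenses_for(path):
    """Return the license identifiers that cover `path` (assumed base-name matching)."""
    name = PurePosixPath(path).name
    if any(fnmatch(name, pattern) for pattern in DUAL_LICENSE_PATTERNS):
        return ["CC-BY-4.0", "CC0-1.0"]
    return ["CC-BY-4.0"]
```

For example, licenses_for("build/build.sh") returns both identifiers, while a markdown file under content returns only CC-BY-4.0.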

Please open an issue for any question related to licensing.

xswap-manuscript's People

Contributors

Stargazers

Watchers

Forkers

naglem cgreene dhimmel kkloste paulmichaelsmith

xswap-manuscript's Issues

Manuscript feedback from afar

It's hard for me to type, because I'm traveling with no laptop, so I decided to leave a video review of the in-progress manuscript @zietzm printed for me. It's split into three videos available on the following album https://photos.app.goo.gl/VYuWFFMMVCXBU47c8

As always, feel free to disregard any advice that you disagree with. I think the manuscript has improved a lot, and with a bit more focus on driving home the main points, this will be a strong study on a challenging topic!

Edge Prior Appendix questions

Second, we assumed that the XSwap process is stationary. This assumption also does not actually hold, but it was made because it significantly simplifies the problem.

@zietzm Why do we say the xswap process isn't stationary? This is not obvious to me.

Prepare for Gigascience publication

Good news in email from 2023-11-01:

GIGA-D-23-00002R1
The probability of edge existence due to node degree: a baseline for network-based predictions
Michael Zietz; Daniel S. Himmelstein; Kyle Kloster; Christopher Williams; Michael W. Nagle; Casey S. Greene
GigaScience

Dear Prof. Greene,

Appreciate your patience.

Peer review for your manuscript has been completed and your paper is provisionally accepted pending any final minor changes, please work with the curators to prepare a GigaDB DOI for your datasets, as this will need to be cited in the paper before we begin the production process.

Our curators have been informed and they will contact you within a week.

Please note we cannot formally accept your manuscript (and send it to production) until the GigaDB DOI citation is finalized and added to the manuscript.

Warm regards,
Hongfang

I'll note some things here that we'll want to do before the final DOCX submission to GigaScience based on what we learned from the connectivity-search-manuscript proofing process.

Data availability for GigaScience submission

We've submitted this manuscript to GigaScience (GIGA-D-23-00002). Part of the submission process is an editorial check to ensure data availability. Super excited to see a journal taking such proactive measures.

The check identified some data that we need to better document, export, and upload to GigaDB. Quoting from an email from Yannan Fan on 2023-01-11:

Dear Casey S. Greene,

At GigaScience Press we have strict transparency and reproducibility policies; to ensure this is maintained it is imperative that all data and tools required to reproduce your manuscript are available to the reviewers and ultimately the readers. To facilitate this we have developed GigaDB, which is the repository used to host data and tools associated with articles in GigaScience and GigaByte. You can see a complete workflow of the combined submission process here. This email represents the step labeled as "provide authors with private FTP login".

I have had a quick read through your manuscript. To enable full reproducibility and transparency please follow these instructions on how to make your data available:

Please read the guidelines for submission of software data: http://gigadb.org/site/guidesoftware

You should include a list of md5sum values for all files uploaded so that we can confirm file integrity after transfer.

Please also include a readme.txt file listing all files with a 1 line description of each.

Please provide the raw data underlying Figure 4A-B, Figure 5, Figure 6A-C, Figure 7, and Figure 9.

I've opened this issue so we can coordinate and track progress fulfilling this data request.
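The md5sum list and readme.txt requests can both be scripted. This is a minimal Python sketch assuming all upload files sit under one local directory and that a hand-written mapping supplies each one-line description; the function names and the tab-separated readme layout are hypothetical, while the "<md5>  <path>" lines mirror the output format of GNU md5sum.

```python
import hashlib
from pathlib import Path


def list_files(directory):
    """Relative POSIX paths of every file under `directory`, sorted."""
    root = Path(directory)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def md5_of(path, chunk_size=1 << 20):
    """Hex md5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(directory):
    """md5sum-style lines ("<digest>  <relative path>") for every file."""
    root = Path(directory)
    return [f"{md5_of(root / name)}  {name}" for name in list_files(root)]


def readme_lines(directory, descriptions):
    """One "<path>\t<description>" line per file; fail if any file is undocumented."""
    files = list_files(directory)
    missing = [name for name in files if name not in descriptions]
    if missing:
        raise ValueError("no description for: " + ", ".join(missing))
    return [f"{name}\t{descriptions[name]}" for name in files]
```

Raising on undocumented files is a deliberate choice here: it keeps readme.txt from silently drifting out of sync with the files actually uploaded.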

Comments on manuscript

1765_001.pdf

Old-school markup attached, that should probably be broken into multiple issues. Mixture of fine-grained (e.g. typos and concerns about figure legends/clarity) and broader-scale concerns (e.g. lack of definitions of technical terms, organization, and seemingly overly-explanatory language "justifying" the approach/results).

Overall, I have some qualms about the tone of the paper (e.g. what is being explained, where, and at what level of detail), and would be happy to have a call to discuss this more broadly. I also feel like the paper suffers from a lack of technical specificity in much of the narrative. I'm still not 100% sure I know what algorithm you're using when you say "edge reconstruction", which worries me.

Let me know how you'd like to approach things from here. Thanks!

Relevant background literature

This issue is for commenting with relevant literature or prior work. For substantive discussion about a prior work, consider opening a new issue.

Results Paragraph Tuning

Since the edge prior is based only on degree, it is unsurprising that it exhibits weak performance in predicting a network with a different degree distribution.
We have considered the edge prior as a baseline edge predictor, whose performance indicates the utility of degree for a specific prediction task.
The edge prior's low performance in the third task indicates that degree is less helpful for edge prediction tasks in which training and testing networks do not share their degree distributions.
Moreover, we believe such between-distribution prediction may be a relatively common task, with examples given by the networks in Figure {@fig:degree-bias}.

I feel like this paragraph could use some tuning. It would help to give the take-home message more clearly to aid readers. I think that it's there, but it's hidden a bit between the lines.
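Treating the edge prior as a baseline predictor means scoring candidate node pairs by their prior and measuring how well those scores rank true edges above non-edges. AUROC is one common metric for this; below is a dependency-free Python sketch using the rank-sum (Mann-Whitney U) identity with averaged ranks for ties. The function name and inputs are assumptions, and the manuscript's own evaluation code may differ.

```python
def auroc(scores, labels):
    """AUROC of `scores` (e.g. edge-prior values per node pair) against 0/1 `labels`."""
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0])
    # Assign 1-based ranks, giving tied scores the average of their ranks.
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative pair")
    rank_sum = sum(rank for rank, (_, label) in zip(ranks, pairs) if label)
    # Mann-Whitney U for the positives, normalized to [0, 1].
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUROC near 0.5 would mean degree alone carries little signal for the task, which is the kind of result the paragraph above describes for prediction across networks with different degree distributions.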

Update Manubot and fix duplicate CI

Manubot is out of date.

Also seems that we're building on both:

Ideally we should migrate to travis-ci.com and also consider switching to the encrypted variable SSH key method.

Inviting contributors as manuscript coauthors

@zietzm is currently working in the Greene Lab for the summer before he departs to start his PhD at Columbia University in the fall. @zietzm's work from his time in the lab will likely be divided into two manuscripts:

This manuscript in greenelab/xswap-manuscript, viewable here, which covers creating randomized networks via the XSwap permutation algorithm to calculate a prior probability of edge existence based on node degree (termed the "edge prior"). Repositories related to this work include https://github.com/greenelab/xswap-analysis, https://github.com/hetio/xswap, and discussion in greenelab/connectivity-search-analyses#134.
The hetnet connectivity search manuscript regarding the webapp at https://het.io/search/. This is a separate manuscript which we hope to begin writing soon. This will cover work like matrix DWPC computation, DWPC null distributions, etcetera. We will address authorship for this work separately.
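The XSwap permutation and edge prior described above can be sketched for a simple undirected network: repeatedly pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), rejecting any swap that would create a self-loop or a duplicate edge, so every node keeps its degree. The edge prior of a node pair is then estimated as the fraction of permuted networks that contain it. This is a minimal pure-Python sketch under stated assumptions (unipartite graph, independent permutations each started from the original network, arbitrary defaults for n_permutations and swap_multiplier); it is not the API of the hetio/xswap package.

```python
import random
from collections import Counter


def xswap(edges, n_swaps, rng):
    """Return a permuted copy of `edges` (as frozensets) with all node degrees preserved."""
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of recombining the two edges
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create a self-loop or duplicate an existing edge.
        if len(new_1) < 2 or len(new_2) < 2 or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set


def edge_prior(edges, n_permutations=100, swap_multiplier=10, seed=0):
    """Fraction of permuted networks containing each node pair (absent pairs: 0)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_permutations):
        counts.update(xswap(edges, swap_multiplier * len(edges), rng))
    return {pair: count / n_permutations for pair, count in counts.items()}
```

For a star graph, every candidate swap shares the hub node and is rejected, so each existing edge receives a prior of 1.0; in more heterogeneous networks, the prior spreads over many pairs, with high-degree pairs tending to receive larger values.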

@cgreene, @zietzm, and I discussed contributors to this manuscript and have identified the involvement of the following individuals (in provisional authorship order):

@zietzm
@dhimmel
@kkloste -- deriving an analytic method for the edge prior
@chrsunwil -- assisting with a C implementation of xswap
@naglem -- discussions of relevance to applications
@bdsullivan -- participation in site visit and phone discussion
@cgreene

If you are listed above and would like to be a coauthor, please open a pull request to add your information by going here. Adding yourself as an author indicates that you agree to ICMJE author guidelines and will stay involved in reviewing and contributing to the manuscript as needed. Finally, author order and membership is provisional and may be reevaluated at any time to account for the latest contributions.

Language Formality

I have noticed as I'm reading that we use "picked up on" a fair bit. I don't think that is going to be formal enough for academic writing. We should look for instances of this after we merge things and replace them.

Update this paper's thumbnail

Hello Manubot user.

If you were not already aware, your paper is being showcased in the Manubot Catalog.

Previously, we were making our own thumbnails for each paper, and hosting them in the same repository as manubot.org. But now we are moving to a system of having each paper store its thumbnail in its own repository, and linking to it from there.

Here is our suggested thumbnail for your paper that follows the new thumbnail guidelines:

Please place this image in the root of your repository with the name thumbnail.png. Once you have, let us know in this issue and we'll do the rest to update the catalog.