Giter Club home page Giter Club logo

Comments (3)

peekxc avatar peekxc commented on June 15, 2024

Hey @corybrunson,

Thanks, this was a fun problem.

In short, if I'm not mistaken, there's actually a perfectly idiomatic and efficient (in the output-sensitive sense) operation that simplifies doing this. Unfortunately, it involves a C++ function that isn't directly accessible from the R-side, since I never made a traversal for it. It is nonetheless used in computing the nerve and flag complexes.

I never made this operation a traversal because it appeared non-trivial to convert the function to its range/iterator form, from which traversals are based. In C++20, this will be theoretically simplified a great deal with coroutine generators.... so... stay tuned for an R version....

Anyways, here's my attempt at it, which may be off if my interpretation of the problem is wrong. Sampling the vertices is easy.

n <- 25
prob <- c(1.0, 0.8, 0.5, 0.1)
st <- simplex_tree()

## Re-assign n after retaining each w/ probability p0
n <- as.integer(ifelse(length(prob) > 0, rbinom(n = 1L, size = n, prob = prob[1L]), n))
st %>% insert(as.list(seq(n)))

Onto the edges, the paper says to 'connect every pair of retained vertices by an edge with probability p_1'. So like you did, I sample from the binomial distribution to simulate performing n independent trials, retaining edges with some success probability. To actually determine the edges themselves, I'm using a great utility function included in the package called nat_to_sub, which provides an bijection between the set of natural numbers [1, n choose k] and the set of (n choose k) combinations, lexicographically-ordered.

## Sample and insert edges
num_edges <- rbinom(n = 1L, size = choose(n, 2), prob = prob[2])
idx <- sort(sample.int(n = choose(n, 2), size = num_edges))
st$insert_lex(nat_to_sub(idx, n = n, k = 2))

Note this insertion is particularly efficient in this case--nat_to_sub produces a (k x num_edges) matrix. Since each simplex is given column-wise, and R is column major, the insertion procedure involves reading/copying from fixed-size contiguous bytes of memory, which is likely to be vectorized. And, since they are lexicographically ordered, special efficiencies can be made by relaxing the assumptions in the insertion procedure, which is done using the unexported $insert_lex module function.

The difficult part is the higher-order simplices. I originally thought I could sample from the set [1, n choose k] as above, but my interpretation of the paper is that only k-simplices whose proper (k-1)-faces already exist in the complex should be considered. Doing the above approach and filtering candidate simplices can be expensive.

It seems to me what we are really doing is simulating independent Bernoulli trials for each simplex whose proper faces already exist in the complex, right? Going off the "filling in each face" analogy, I believe there is an operation that does just this, for each dimension k: the k-expansion operation. Given a simplicial complex, a k-expansion inserts every simplex in the k-skeleton if all it's proper faces already exist in the complex. Doing so efficiently is described in the original paper, see section 3.1. It's been used in the package to e.g. construct Rips complexes, and also to reduce the number of k-fold intersection tests used in computing the nerve. In fact, to do the latter, I modified the method to accept a lambda, so that one could control what to do with each candidate k-simplex, rather than blindly insert the simplex, which comes in handy here.

So here's what I wrote in an R Notebook (using the Rcpp section type). If you haven't seen the Rcpp boilerplate, see usage with rcpp.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]  
// [[Rcpp::depends(simplextree)]]
#include "simplextree.h"

#include <chrono>
#include <random>

// [[Rcpp::export]]
void sample_complex(SEXP stx, const size_t k, const double p){
  SimplexTree& st = *(Rcpp::XPtr< SimplexTree >(stx));  
  
  // Random number generator
  std::random_device random_device;
  std::mt19937 random_engine(random_device());
  std::uniform_real_distribution< double > bernoulli(0.0, 1.0);
  
  // Perform Bernoulli trials for given k, with success probability p
  st.expansion_f(k, [&](node_ptr parent, idx_t depth, idx_t label){
    double q = bernoulli(random_engine);
    if (q >= p){ // if successful trial
      std::array< idx_t, 1 > int_label = { label };
      st.insert_it(begin(int_label), end(int_label), parent, depth);
    }
  });
}

Sorry no reprex, since it's compiled xD. I could also wrap into a nice function with argument checking at some point if you want. What it's doing is a k-expansion, then it's passing for each invocation of that lambda the parent node representing the proper face of the (non-existing) k-simplex, whose last label is given by label. Said another way, when that lambda passed to the st.expansion_f is invoked, that means that the simplex in bijection with the node parent is a proper (k-1)-face of the k-simplex associated with the current expansion--we can do with that information whatever what we want. Note the first k labels of the proper (k-1)-face and it's corresponding k-simplex are identical--that is, if we are filling in a triangle whose labels are { 1, 2, 3 }, then the parent node represents the simplex { 1, 2 }, and the label is 3. In this case, if the bernoulli trial is a success, we want to insert that k-simplex into the complex; normally this takes O(k log(< maximal out-degree of any node in the trie >)) time, but because of the ordered nature of the expansion traversal, we need only insert that label directly as a child of the parent, which in theory is constant-time if the insertions are performed in lexicographically increasing order.

You can just ignore the fact that I'm make a std::array of length 1; the insertion procedure accepts a pair of iterators and optionally a node to begin the insertion at, so I need to use a container to create iterators.

Edit: forgot to include the last bit of calling code

## "Fill in" all higher-order simplices with some success probability 
sample_complex(st$as_XPtr(), k = 2, p = prob[3])

from simplextree.

corybrunson avatar corybrunson commented on June 15, 2024

Oh, yes! I had not used k-expansions, but it should have occurred to me that this is a generalization of the procedure for computing a Vietoris complex from a 1-skeleton. You understand the definition exactly as i do, and for simplicity and consistency it makes sense to use the modified expansion code throughout. I'll try this out as soon as i can get 1.0.1 running on my laptop! Thank you for taking a careful look.

from simplextree.

corybrunson avatar corybrunson commented on June 15, 2024

I think more samplers are warranted, but this issue is resolved as of be22fb3.

from simplextree.

Related Issues (8)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.