
Divisor Sums for the Riemann Hypothesis


An application that glibly searches for RH counterexamples, while also teaching software engineering principles.

Blog posts:

Development requirements

Requires Python 3.7 and

  • GMP for arbitrary precision arithmetic
  • gmpy2 for Python GMP bindings
  • postgres
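
Once these are installed, a quick smoke test of the GMP bindings (a sketch; run it inside the virtualenv):

import gmpy2
from gmpy2 import mpz

# exact arithmetic on integers far beyond native float range
n = mpz(2) ** 512 - 1
print(gmpy2.num_digits(n))  # 155
print(gmpy2.is_prime(n))    # False: 2**512 - 1 is divisible by 3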

On Mac OS X these can be installed via brew as follows

brew install gmp mpfr libmpc postgresql

Or using apt

apt install -y libgmp3-dev libmpfr-dev libmpc-dev postgresql-12

Then, in a virtualenv,

pip install -r requirements.txt

Local development with PostgreSQL

For postgres, create a new database cluster and start the server.

initdb --locale=C -E UTF-8 /usr/local/var/postgres
pg_ctl -D /usr/local/var/postgres -l /tmp/logfile start

Then create a database like

CREATE DATABASE divisor
    WITH OWNER = jeremy;  -- or whatever your username is

Then install the pgmp extension using pgxn

sudo pip install pgxnclient
pgxn install pgmp
pgxn load -d divisor pgmp

Note you may need to add the location of gmp.h to $C_INCLUDE_PATH so that the build step for pgmp can find it. This appears to be a problem mostly on Mac OS X. See dvarrazzo/pgmp#4 if you run into issues.

# your version may be different than 6.2.0. Find it by running
# brew info gmp
export C_INCLUDE_PATH="/usr/local/Cellar/gmp/6.2.0/include:$C_INCLUDE_PATH"

In this case, you may also want to build pgmp from source,

git clone https://github.com/j2kun/pgmp && cd pgmp
make
sudo make install
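
Once the extension is loaded, a quick check from Python that pgmp's mpz type is available (a sketch; assumes psycopg2 is installed and you can connect to the divisor database as your local user):

import psycopg2

conn = psycopg2.connect("dbname=divisor")
with conn.cursor() as cur:
    cur.execute("SELECT 1")  # basic connectivity
    assert cur.fetchone() == (1,)
    # pgmp provides the mpz type; casting back to text means the
    # driver needs no custom type adapter
    cur.execute("SELECT '12345678901234567890123456789'::mpz::text")
    print(cur.fetchone()[0])
conn.close()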

Running the program

Run some combination of the following three worker jobs

python -m riemann.generate_search_blocks
python -m riemann.process_search_blocks
python -m riemann.cleanup_stale_blocks
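
The generate job creates search blocks, process claims and computes them, and cleanup handles stragglers. As an entirely hypothetical sketch of a processor's cycle (the real logic lives in the riemann package; the names here are illustrative stand-ins):

import time

def run_worker(claim_block, process_block, mark_finished, poll_seconds=5):
    # claim_block, process_block, and mark_finished are stand-ins for the
    # database-backed operations; the states mirror the issues further below
    while True:
        block = claim_block()          # atomically mark a block in_progress
        if block is None:
            time.sleep(poll_seconds)   # no work available; back off
            continue
        result = process_block(block)  # the expensive divisor-sum search
        mark_finished(block, result)   # record end_time and the results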

Deploying with Docker

Running with Docker removes the need to install postgres and the other dependencies locally.

Locally

docker build -t divisordb -f docker/divisordb.Dockerfile .
docker build -t generate -f docker/generate.Dockerfile .
docker build -t process -f docker/process.Dockerfile .
docker build -t cleanup -f docker/cleanup.Dockerfile .

docker volume create pgdata

docker run -d --name divisordb -p 5432:5432 -v pgdata:/var/lib/postgresql/data divisordb:latest
export PGHOST=$(docker inspect -f "{{ .NetworkSettings.IPAddress }}" divisordb)

docker run -d --name generate --env PGHOST="$PGHOST" generate:latest
docker run -d --name cleanup --env PGHOST="$PGHOST" cleanup:latest
docker run -d --name process --env PGHOST="$PGHOST" process:latest

Manual inspection

After the divisordb container is up, you can test whether it's working by

pg_isready -d divisor -h $PGHOST -p 5432 -U docker

or by going into the container and checking the database manually

$ docker exec -it divisordb /bin/bash
# now inside the container
$ psql
divisor=# \d   # \d is postgres for 'describe tables'

              List of relations
 Schema |        Name        | Type  | Owner
--------+--------------------+-------+--------
 public | riemanndivisorsums | table | docker
 public | searchmetadata     | table | docker
(2 rows)

On EC2

# install docker, see get.docker.com
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu

# log out and log back in

git clone https://github.com/j2kun/riemann-divisor-sum && cd riemann-divisor-sum

Updating existing EC2 deployment

Fill out the environment variables from .env.template in .env, then run python deploy.py.

This will only work if the application has been set up initially (docker installed and the repository cloned).

Running the monitoring script

sudo apt install -y python3-pip ssmtp
pip3 install -r alerts/requirements.txt
sudo -E alerts/configure_ssmtp.sh
nohup python3 -m alerts.monitor_docker &

Exporting and plotting data

python -m riemann.export_for_plotting --data_source_name='dbname=divisor' --divisor_sums_filepath=divisor_sums.csv

# sort the csv by log_n using gnu sort
sort -t , -n -k 1 divisor_sums.csv -o divisor_sums_sorted.csv

# convert to hdf5
python -c "import vaex; vaex.from_csv('divisor_sums.csv', convert=True, chunk_size=5_000_000)"

python -m plot.plot_divisor_sums --divisor_sums_hdf5_path=divisor_sums.hdf5


riemann-divisor-sum's Issues

Overflow error

One of the processors failed with this error

File "/divisor/riemann/superabundant.py", line 129, in compute_riemann_divisor_sum
wv = ds / (n * math.log(math.log(n)))
OverflowError: 'mpz' too large to convert to float

I believe it was this search block:

start: 98,150004736
end: 99,56599
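
A minimal reproduction, with one possible fix that stays in gmpy2's arbitrary-precision mpfr type instead of converting to a native float (a sketch; the real computation is in riemann/superabundant.py):

import math
import gmpy2
from gmpy2 import mpz

n = mpz(10) ** 5000   # a huge n, as in a deep search block
ds = 2 * n            # stand-in for the divisor sum sigma(n)

# math.log forces a float conversion, which overflows for an mpz this large:
# wv = ds / (n * math.log(math.log(n)))   # OverflowError

# gmpy2.log returns an mpfr and never touches native floats:
wv = ds / (n * gmpy2.log(gmpy2.log(n)))
print(wv)   # roughly 2 / log(log(n)), as an mpfr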

Start time and end times are the same in production

divisor=# select start_time, end_time from searchmetadata where state = 'FINISHED';
         start_time         |          end_time          
----------------------------+----------------------------
 2021-02-09 04:15:00.315681 | 2021-02-09 04:15:00.315681
 2021-02-09 04:16:22.456979 | 2021-02-09 04:16:22.456979
 2021-02-09 04:17:49.202149 | 2021-02-09 04:17:49.202149
 2021-02-09 04:19:19.648897 | 2021-02-09 04:19:19.648897
 2021-02-09 04:20:52.012555 | 2021-02-09 04:20:52.012555
 2021-02-09 04:22:28.888724 | 2021-02-09 04:22:28.888724
 2021-02-09 04:13:11.032878 | 2021-02-09 04:13:11.032878

Bug with usage of CachedPartitionsOfN

The superabundant search strategy checks

if self.current_level[0] != self.search_index.level:

but self.current_level[0] is a partition, i.e., a list like [4] when the level is 4, while self.search_index.level is the integer 4. So the check always fails and the level is always recomputed. This doesn't harm correctness, but it does harm efficiency.

We should add a special check for the zeroth element, or lift the partitions of n behind an interface with a "level"-like function.
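
For example, a hypothetical helper showing the comparison that was intended (the names mirror the issue, not the repo's code):

def level_is_current(current_level, level):
    # the first partition of n at a given level is the singleton [level],
    # so compare against that one-element list, not the bare integer
    return current_level[0] == [level]   # the buggy version compared == level

print(level_is_current([[4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]], 4))  # True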

Add a simple script to automate deploying updated jobs

I think it would be something like: ssh into each server, run docker stop, docker rm, git pull, docker build, docker start

Stop in the order of processors, then cleanup/generate, then database.

Then apply any database migrations (haven't had one yet, thankfully).

Then pull, build, start in order of database, generate/cleanup, and processors.

Then write about how this is still tedious to manage, and leaves the application "down" for some time while things are being rebuilt. Keeping it up at all times (e.g., if a frontend were serving info from the database) would require quite a bit more engineering...
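
A rough sketch of such a script (hypothetical hostnames; the container names match the Docker section above, and the real docker run commands would also need the PGHOST and volume flags shown there):

import subprocess

SERVERS = ["ec2-worker-1", "ec2-worker-2"]   # hypothetical hosts
STOP_ORDER = ["process", "cleanup", "generate", "divisordb"]

def run(host, cmd):
    subprocess.run(["ssh", host, cmd], check=True)

for host in SERVERS:
    for job in STOP_ORDER:
        run(host, f"docker stop {job} && docker rm {job}")
    run(host, "cd riemann-divisor-sum && git pull")
    for job in reversed(STOP_ORDER):   # database first, processors last
        run(host, f"cd riemann-divisor-sum && "
                  f"docker build -t {job} -f docker/{job}.Dockerfile .")
        run(host, f"docker run -d --name {job} {job}:latest")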

Bug in superabundant -> partitions_of_n function

I was trying to reproduce the blog post and the code here, and I think I have run into a bug.

The partitions_of_n function gives the wrong partitions when n > 5. For example,

partitions_of_n(6) currently gives

list(enumerate([[6], [5, 1], [4, 2], [4, 1, 1],
                [3, 3, 1], [3, 2, 1], [3, 1, 1, 1],
                [2, 2, 2, 1], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]))

but it should be giving

list(enumerate([[6], [5, 1], [4, 2], [4, 1, 1],
                [3, 3], [3, 2, 1], [3, 1, 1, 1],
                [2, 2, 2], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]))

i.e., it produces [3, 3, 1] and [2, 2, 2, 1] (which sum to 7, not 6) instead of [3, 3] and [2, 2, 2].

The partitions_of_n function works correctly up to n = 5, and it always returns the correct number of partitions, so tests that only check the partition count fail to capture this.
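
For comparison, a straightforward reference implementation of integer partitions (not the repo's code) that produces the expected output for n = 6:

def partitions(n, max_part=None):
    # all partitions of n with parts <= max_part, in decreasing
    # lexicographic order
    if max_part is None:
        max_part = n
    if n == 0:
        return [[]]
    result = []
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            result.append([first] + rest)
    return result

print(partitions(6))
# [[6], [5, 1], [4, 2], [4, 1, 1], [3, 3], [3, 2, 1], [3, 1, 1, 1],
#  [2, 2, 2], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]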

Set up automated test runs

I used to use Travis CI, but it looks like the company got bought and everyone was fired, so I will try CircleCI instead.

Add a worker job to mark stale "in_progress" blocks as "failed"

After #8, one more thing remains to allow the application to self-heal when things crash. After a crash, a search block will remain in the "in_progress" state forever, and never be finished or claimed.

We should make a new job in a dedicated container which, every few minutes, loads all the metadata, finds any "in_progress" blocks for which the gap between start_time and now is larger than 10x the median start-to-end gap among completed blocks, and marks them as failed.

It might also help to have a new failure_count field, so we can tell if this scheme goes awry and failure counts get super large.
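
As a sketch of the proposed check (the field names mirror the searchmetadata table shown above; the record type is hypothetical):

from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional

@dataclass
class BlockMetadata:
    state: str
    start_time: datetime
    end_time: Optional[datetime] = None

def find_stale_blocks(metadata: List[BlockMetadata], now: datetime):
    # median works on timedeltas, since they support ordering and averaging
    durations = [m.end_time - m.start_time
                 for m in metadata if m.state == "FINISHED"]
    if not durations:
        return []   # nothing completed yet; no baseline to compare against
    cutoff = 10 * median(durations)
    return [m for m in metadata
            if m.state == "IN_PROGRESS" and now - m.start_time > cutoff]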

Add a type-checking test

This may be difficult to do if the gmp-based types get in the way, but at least we can do it partially.
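
One partial approach: a test that shells out to mypy, side-stepping the untyped gmp bindings (a sketch; assumes mypy is installed):

import subprocess

def test_typecheck():
    # --ignore-missing-imports works around gmpy2's lack of type stubs
    result = subprocess.run(
        ["mypy", "--ignore-missing-imports", "riemann/"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stdout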

Visualize some data!

I have a 60 GiB backup of the database from before we started doing hashes of blocks instead of storing all the data.

It would be nice to produce a visualization of this data.

Some options

Vaex

https://vaex.io/

  • Export the data to an hdf5 file (a memory-mapped numpy array?)
  • Use Vaex to load it lazily, e.g., nyctaxi = vaex.open('s3://vaex/taxi/yellow_taxi_2009_2015_f32.hdf5?anon=true')

We will need to first convert the n values to a log scale (or maybe log log?)...
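
A sketch of that log-scale conversion with Vaex (the column names are assumptions; the export section above sorts the csv by a log_n column):

import numpy as np
import vaex

df = vaex.open("divisor_sums.hdf5")
# n already spans many orders of magnitude, so take another log for plotting
df["log_log_n"] = np.log(df["log_n"])
print(df.minmax("log_log_n"))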
