
Divisor Sums for the Riemann Hypothesis


An application that glibly searches for RH counterexamples, while also teaching software engineering principles.

Blog posts:

Development requirements

Requires Python 3.7 and

  • GMP for arbitrary precision arithmetic
  • gmpy2 for Python GMP bindings
  • postgres
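
Once these are installed, a quick smoke test of the GMP bindings (a sketch; run it inside the virtualenv):

import gmpy2
from gmpy2 import mpz

# exact arithmetic on integers far beyond native float range
n = mpz(2) ** 512 - 1
print(gmpy2.num_digits(n))  # 155
print(gmpy2.is_prime(n))    # False: 2**512 - 1 is divisible by 3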

On Mac OS X these can be installed via brew as follows

brew install gmp mpfr libmpc postgresql

Or using apt

apt install -y libgmp3-dev libmpfr-dev libmpc-dev postgresql-12

Then, in a virtualenv,

pip install -r requirements.txt

Local development with PostgreSQL

For postgres, create a new database cluster and start the server.

initdb --locale=C -E UTF-8 /usr/local/var/postgres
pg_ctl -D /usr/local/var/postgres -l /tmp/logfile start

Then create a database like

CREATE DATABASE divisor
    WITH OWNER = jeremy;  -- or whatever your username is

Then install the pgmp extension using pgxn

sudo pip install pgxnclient
pgxn install pgmp
pgxn load -d divisor pgmp

Note you may need to add the location of gmp.h to $C_INCLUDE_PATH so that the build step for pgmp can find it. This appears to be a problem mostly on Mac OS X. See dvarrazzo/pgmp#4 if you run into issues.

# your version may be different than 6.2.0. Find it by running
# brew info gmp
export C_INCLUDE_PATH="/usr/local/Cellar/gmp/6.2.0/include:$C_INCLUDE_PATH"

In this case, you may also want to build pgmp from source,

git clone https://github.com/j2kun/pgmp && cd pgmp
make
sudo make install
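
Once the extension is loaded, a quick check from Python that pgmp's mpz type is available (a sketch; assumes psycopg2 is installed and you can connect to the divisor database as your local user):

import psycopg2

conn = psycopg2.connect("dbname=divisor")
with conn.cursor() as cur:
    cur.execute("SELECT 1")  # basic connectivity
    assert cur.fetchone() == (1,)
    # pgmp provides the mpz type; casting back to text means the
    # driver needs no custom type adapter
    cur.execute("SELECT '12345678901234567890123456789'::mpz::text")
    print(cur.fetchone()[0])
conn.close()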

Running the program

Run some combination of the following three worker jobs

python -m riemann.generate_search_blocks
python -m riemann.process_search_blocks
python -m riemann.cleanup_stale_blocks
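
The generate job creates search blocks, process claims and computes them, and cleanup handles stragglers. As an entirely hypothetical sketch of a processor's cycle (the real logic lives in the riemann package; the names here are illustrative stand-ins):

import time

def run_worker(claim_block, process_block, mark_finished, poll_seconds=5):
    # claim_block, process_block, and mark_finished are stand-ins for the
    # database-backed operations; the states mirror the issues further below
    while True:
        block = claim_block()          # atomically mark a block in_progress
        if block is None:
            time.sleep(poll_seconds)   # no work available; back off
            continue
        result = process_block(block)  # the expensive divisor-sum search
        mark_finished(block, result)   # record end_time and the results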

Deploying with Docker

Running with Docker removes the need to install postgres and the other dependencies locally.

Locally

docker build -t divisordb -f docker/divisordb.Dockerfile .
docker build -t generate -f docker/generate.Dockerfile .
docker build -t process -f docker/process.Dockerfile .
docker build -t cleanup -f docker/cleanup.Dockerfile .

docker volume create pgdata

docker run -d --name divisordb -p 5432:5432 -v pgdata:/var/lib/postgresql/data divisordb:latest
export PGHOST=$(docker inspect -f "{{ .NetworkSettings.IPAddress }}" divisordb)

docker run -d --name generate --env PGHOST="$PGHOST" generate:latest
docker run -d --name cleanup --env PGHOST="$PGHOST" cleanup:latest
docker run -d --name process --env PGHOST="$PGHOST" process:latest

Manual inspection

After the divisordb container is up, you can test whether it's working by

pg_isready -d divisor -h $PGHOST -p 5432 -U docker

or by going into the container and checking the database manually

$ docker exec -it divisordb /bin/bash
# now inside the container
$ psql
divisor=# \d   # \d is postgres for 'describe tables'

              List of relations
 Schema |        Name        | Type  | Owner
--------+--------------------+-------+--------
 public | riemanndivisorsums | table | docker
 public | searchmetadata     | table | docker
(2 rows)

On EC2

# install docker, see get.docker.com
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu

# log out and log back in

git clone https://github.com/j2kun/riemann-divisor-sum && cd riemann-divisor-sum

Updating existing EC2 deployment

Fill out the environment variables from .env.template in .env, then run python deploy.py.

This will only work if the application has been set up initially (docker installed and the repository cloned).

Running the monitoring script

sudo apt install -y python3-pip ssmtp
pip3 install -r alerts/requirements.txt
sudo -E alerts/configure_ssmtp.sh
nohup python3 -m alerts.monitor_docker &

Exporting and plotting data

python -m riemann.export_for_plotting --data_source_name='dbname=divisor' --divisor_sums_filepath=divisor_sums.csv

# sort the csv by log_n using gnu sort
sort -t , -n -k 1 divisor_sums.csv -o divisor_sums_sorted.csv

# convert to hdf5
python -c "import vaex; vaex.from_csv('divisor_sums.csv', convert=True, chunk_size=5_000_000)"

python -m plot.plot_divisor_sums --divisor_sums_hdf5_path=divisor_sums.hdf5


riemann-divisor-sum's Issues

Overflow error

One of the processors failed with this error

File "/divisor/riemann/superabundant.py", line 129, in compute_riemann_divisor_sum
wv = ds / (n * math.log(math.log(n)))
OverflowError: 'mpz' too large to convert to float

I believe it was this search block:

start: 98,150004736
end: 99,56599
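
A minimal reproduction, with one possible fix that stays in gmpy2's arbitrary-precision mpfr type instead of converting to a native float (a sketch; the real computation is in riemann/superabundant.py):

import math
import gmpy2
from gmpy2 import mpz

n = mpz(10) ** 5000   # a huge n, as in a deep search block
ds = 2 * n            # stand-in for the divisor sum sigma(n)

# math.log forces a float conversion, which overflows for an mpz this large:
# wv = ds / (n * math.log(math.log(n)))   # OverflowError

# gmpy2.log returns an mpfr and never touches native floats:
wv = ds / (n * gmpy2.log(gmpy2.log(n)))
print(wv)   # roughly 2 / log(log(n)), as an mpfr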

Start time and end times are the same in production

divisor=# select start_time, end_time from searchmetadata where state = 'FINISHED';
         start_time         |          end_time          
----------------------------+----------------------------
 2021-02-09 04:15:00.315681 | 2021-02-09 04:15:00.315681
 2021-02-09 04:16:22.456979 | 2021-02-09 04:16:22.456979
 2021-02-09 04:17:49.202149 | 2021-02-09 04:17:49.202149
 2021-02-09 04:19:19.648897 | 2021-02-09 04:19:19.648897
 2021-02-09 04:20:52.012555 | 2021-02-09 04:20:52.012555
 2021-02-09 04:22:28.888724 | 2021-02-09 04:22:28.888724
 2021-02-09 04:13:11.032878 | 2021-02-09 04:13:11.032878

Bug with usage of CachedPartitionsOfN

The superabundant search strategy checks

if self.current_level[0] != self.search_index.level:

but self.current_level[0] is a partition, i.e., a list like [4] when the level is 4, while self.search_index.level is the integer 4. So the check always fails and the level is always recomputed. This doesn't harm correctness, but it does harm efficiency.

We should add a special check for the zeroth element, or lift the partitions of n behind an interface with a "level"-like function.
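
For example, a hypothetical helper showing the comparison that was intended (the names mirror the issue, not the repo's code):

def level_is_current(current_level, level):
    # the first partition of n at a given level is the singleton [level],
    # so compare against that one-element list, not the bare integer
    return current_level[0] == [level]   # the buggy version compared == level

print(level_is_current([[4], [3, 1], [2, 2], [2, 1, 1], [1, 1, 1, 1]], 4))  # True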

Add a simple script to automate deploying updated jobs

I think it would be something like: ssh into each server, run docker stop, docker rm, git pull, docker build, docker start

Stop in the order of processors, then cleanup/generate, then database.

Then apply any database migrations (haven't had one yet, thankfully).

Then pull, build, start in order of database, generate/cleanup, and processors.

Then write about how this is still tedious to manage, and leaves the application "down" for some time while things are being rebuilt. Keeping it up at all times (e.g., if a frontend were serving info from the database) would require quite a bit more engineering...
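
A rough sketch of such a script (hypothetical hostnames; the container names match the Docker section above, and the real docker run commands would also need the PGHOST and volume flags shown there):

import subprocess

SERVERS = ["ec2-worker-1", "ec2-worker-2"]   # hypothetical hosts
STOP_ORDER = ["process", "cleanup", "generate", "divisordb"]

def run(host, cmd):
    subprocess.run(["ssh", host, cmd], check=True)

for host in SERVERS:
    for job in STOP_ORDER:
        run(host, f"docker stop {job} && docker rm {job}")
    run(host, "cd riemann-divisor-sum && git pull")
    for job in reversed(STOP_ORDER):   # database first, processors last
        run(host, f"cd riemann-divisor-sum && "
                  f"docker build -t {job} -f docker/{job}.Dockerfile .")
        run(host, f"docker run -d --name {job} {job}:latest")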

Bug in superabundant -> partitions_of_n function

I was trying to reproduce the blog post and the code here, and I think I have run into a bug.

The partitions_of_n function gives the wrong partitions when n > 5. For example,

partitions_of_n(6) currently gives

list(enumerate([[6], [5, 1], [4, 2], [4, 1, 1],
                [3, 3, 1], [3, 2, 1], [3, 1, 1, 1],
                [2, 2, 2, 1], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]))

but it should be giving

list(enumerate([[6], [5, 1], [4, 2], [4, 1, 1],
                [3, 3], [3, 2, 1], [3, 1, 1, 1],
                [2, 2, 2], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]))

i.e., it produces [3, 3, 1] and [2, 2, 2, 1] (which sum to 7, not 6) instead of [3, 3] and [2, 2, 2].

The partitions_of_n function works correctly up to n = 5, and it always returns the correct number of partitions, so tests that only check the partition count fail to capture this.
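
For comparison, a straightforward reference implementation of integer partitions (not the repo's code) that produces the expected output for n = 6:

def partitions(n, max_part=None):
    # all partitions of n with parts <= max_part, in decreasing
    # lexicographic order
    if max_part is None:
        max_part = n
    if n == 0:
        return [[]]
    result = []
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            result.append([first] + rest)
    return result

print(partitions(6))
# [[6], [5, 1], [4, 2], [4, 1, 1], [3, 3], [3, 2, 1], [3, 1, 1, 1],
#  [2, 2, 2], [2, 2, 1, 1], [2, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]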

Set up automated test runs

I used to use Travis CI, but it looks like the company got bought and everyone was fired, so I will try CircleCI instead.

Add a worker job to mark stale "in_progress" blocks as "failed"

After #8, one more thing remains to allow the application to self-heal when things crash. After a crash, a search block will remain in the "in_progress" state forever, and never be finished or claimed.

We should make a new job in a dedicated container which, every few minutes, loads all the metadata, finds any "in_progress" blocks for which the gap between start_time and now is larger than 10x the median start-to-end gap among completed blocks, and marks them as failed.

It might also help to have a new failure_count field, so we can tell if this scheme goes awry and failure counts get super large.
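
As a sketch of the proposed check (the field names mirror the searchmetadata table shown above; the record type is hypothetical):

from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional

@dataclass
class BlockMetadata:
    state: str
    start_time: datetime
    end_time: Optional[datetime] = None

def find_stale_blocks(metadata: List[BlockMetadata], now: datetime):
    # median works on timedeltas, since they support ordering and averaging
    durations = [m.end_time - m.start_time
                 for m in metadata if m.state == "FINISHED"]
    if not durations:
        return []   # nothing completed yet; no baseline to compare against
    cutoff = 10 * median(durations)
    return [m for m in metadata
            if m.state == "IN_PROGRESS" and now - m.start_time > cutoff]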

Add a type-checking test

This may be difficult to do if the gmp-based types get in the way, but at least we can do it partially.
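
One partial approach: a test that shells out to mypy, side-stepping the untyped gmp bindings (a sketch; assumes mypy is installed):

import subprocess

def test_typecheck():
    # --ignore-missing-imports works around gmpy2's lack of type stubs
    result = subprocess.run(
        ["mypy", "--ignore-missing-imports", "riemann/"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stdout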

Visualize some data!

I have a 60 GiB backup of the database from before we started doing hashes of blocks instead of storing all the data.

It would be nice to produce a visualization of this data.

Some options

Vaex

https://vaex.io/

  • Export the data to an hdf5 file (a memory-mapped numpy array?)
  • Use Vaex to load it lazily, e.g., nyctaxi = vaex.open('s3://vaex/taxi/yellow_taxi_2009_2015_f32.hdf5?anon=true')

We will need to first convert the n values to a log scale (or maybe log log?)...
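
A sketch of that log-scale conversion with Vaex (the column names are assumptions; the export section above sorts the csv by a log_n column):

import numpy as np
import vaex

df = vaex.open("divisor_sums.hdf5")
# n already spans many orders of magnitude, so take another log for plotting
df["log_log_n"] = np.log(df["log_n"])
print(df.minmax("log_log_n"))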
