jellevanderwerff / thebeat

thebeat: Rhythms in Python for science

Home Page: https://thebeat.readthedocs.io

License: GNU General Public License v3.0

Python 100.00%

cognition music rhythm timing

thebeat's Introduction

thebeat: Rhythms in Python for Science

thebeat is a Python package for working with temporal sequences and rhythms in the behavioural and cognitive sciences. It provides functionality for creating stimuli, and for visualizing and analyzing temporal data.

As a collection of accepted methods for use in music and timing research, thebeat will save you time when creating experiments or analyzing data.

thebeat is an open-source, ongoing, and collaborative project, integrating easily with the existing Python ecosystem, and with your own scripts. The package was specifically designed to be useful for both skilled and novice programmers.

Documentation

The package documentation is available from https://thebeat.readthedocs.io. The documentation contains detailed descriptions of all package functionality, as well as a large number of (copyable) examples.

Installation

thebeat is available through PyPI, and can be installed using:

pip install thebeat

Note that if you want to use thebeat's functionality for plotting musical notation, you have to install it using:

pip install 'thebeat[music_notation]'

This will install thebeat with the optional dependencies abjad and LilyPond.

thebeat is actively tested on Linux, macOS, and Windows. We aim to provide support for all supported versions of Python (3.8 and higher).

Try directly via Binder

If you would first like to try thebeat, or if you wish to use it in, for instance, an educational setting, you can use this link to try thebeat in a Binder environment.

Getting started

The code below illustrates how we might create a simple trial for use in an experiment:

from thebeat import Sequence, SoundStimulus, SoundSequence

seq = Sequence.generate_isochronous(n_events=10, ioi=500)
sound = SoundStimulus.generate(freq=440, duration_ms=50, onramp_ms=10, offramp_ms=10)
trial = SoundSequence(sound, seq)

trial.play()  # play sound over loudspeakers
trial.plot_waveform()  # plot as sound waveform
trial.plot_sequence()  # plot as an event plot
trial.write_wav('example_trial.wav')  # save file to disk

Open discussion

One of the reasons for creating thebeat was the lack of a collection of standardized/accepted methods for use in rhythm and timing research. Therefore, an important part of thebeat's merit lies in opening discussions about the methods that are included. As an example, there are different ways of calculating phase differences and integer ratios, and we expect people to have different opinions about which method to use. Where possible, we have included references to the literature in the package documentation. However, we encourage anyone with an opinion to openly question the methods that thebeat provides.

There are two places where you can go with comments and/or questions:

You can click the 'Issues' tab at the top of this GitHub page, and start a thread. Note that this place is mostly for questioning methods, or for reporting bugs.
You can drop by in our Gitter chatroom. This is likely the best place to go to with questions about how thebeat works.

License

thebeat is distributed under the GPL-3 license. You are free to distribute or modify the code, both for non-commercial and commercial use. See here for more info.

Collaborators

The package was developed by the Comparative Bioacoustics Group at the Max Planck Institute for Psycholinguistics, in Nijmegen, the Netherlands.

The collaborators were: Jelle van der Werff, Andrea Ravignani, and Yannick Jadoul.

thebeat's Issues

Handling of matplotlib styles by thebeat

Following up on #66: matplotlib behaves somewhat unpredictably if the figure creation, plotting, and/or showing/saving happen in different style contexts. For example, the current style's figure.figsize seems to be accessed upon the creation of a Figure, colors and other properties of lines during the plotting of those lines, default DPI to save a figure during savefig, etc.

My suggestion would be to kick out all the style= parameters from our plotting functionality and teach users to use plt.style.use(...) and plt.style.context(...) in the docs and examples. It would clean up our code, it would not really change much for the users, and we'd probably be using matplotlib more correctly.

Plus, it would allow the modern seaborn styles to be used with thebeat: seaborn does not have its new styles as matplotlib styles anymore, and provides sns.set_style() or sns.axes_style instead of going through plt.style: https://seaborn.pydata.org/tutorial/aesthetics.html

Furthermore, thebeat's default style (in most places) is now the old seaborn-v0_8 style, and users cannot tell thebeat to not override the style (see #44).

64bit wav files not readable in Praat on Windows

Apparently, the wav files that thebeat saves cannot be opened in Praat on Windows (on Mac it works).

Different return type than given

E.g. for thebeat.stats.get_ugof_isochronous, the expected return type is np.float64, even though the function returns np.float32.

Sequence.quantize() cannot change onsets

stim_q = stim.quantize(to=peak/16)
  File "/Users/jellevanderwerff/npor_analysis/.venv/lib/python3.9/site-packages/thebeat/core/sequence.py", line 783, in quantize
    self.onsets = np.round(self.onsets / to) * to
  File "/Users/jellevanderwerff/npor_analysis/.venv/lib/python3.9/site-packages/thebeat/core/sequence.py", line 134, in onsets
    raise ValueError(
ValueError: Cannot change onsets of sequences that end with an interval. This is because we need to know the final IOI for such sequences. Either reconstruct the sequence, or change the IOIs.

Will change quantization to quantize using IOIs instead of onsets. Also implement tests.
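
One way to quantize through the IOIs rather than the onsets is sketched below in plain Python. This is not thebeat's implementation: `quantize_iois` is a hypothetical helper, and it assumes the grid starts at the first onset, so the first onset itself never moves. Snapping the cumulative interval boundaries, rather than each IOI separately, avoids accumulating rounding drift and also covers the final interval of a sequence that ends with an interval.

```python
from itertools import accumulate


def quantize_iois(iois, to):
    """Snap interval boundaries to a grid of size `to` and return the new IOIs.

    Hypothetical helper for illustration; not part of thebeat.
    """
    # Boundaries relative to the first onset: 0, then the end of each interval
    boundaries = [0] + list(accumulate(iois))
    # round() resolves exact ties to the nearest even integer
    snapped = [round(boundary / to) * to for boundary in boundaries]
    # Differences between consecutive snapped boundaries are the quantized IOIs.
    # Two boundaries that snap to the same grid point would give an IOI of 0,
    # which a real implementation would have to reject or handle.
    return [later - earlier for earlier, later in zip(snapped, snapped[1:])]


print(quantize_iois([498, 503, 251, 748], to=250))  # [500, 500, 250, 750]
```

Because only IOIs are returned, the sketch never needs to assign to the onsets, which is the operation that raises the ValueError above.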

Phase space plot not square

Something is going wrong with the axes tick values for producing phase space plots. As a result, the plot is not necessarily square.

PDF output for plot_rhythm returns full A4

When using Rhythm.plot_rhythm(filepath='test.pdf') we get a full sized A4 PDF, instead of a cropped version.

Add info about running plt.show()

Our plotting functions call fig.show(), but not plt.show(). If running in interactive mode, in a notebook etc., the plot will show. If not, plt.show() must be called.

We need to mention this in the documentation:

In the usage examples
In the docstrings
In the docstrings examples

Figure out what to do with quantization of onsets vs. intervals

What to do for instance in cases where the first onset is not at t=0

LilyPond figures and matplotlib

Following up on #66: how should the LilyPond-generated figures interact with matplotlib?

A couple of discussion points (partially taken from #66 (comment)):

I feel like the dpi argument is somewhat confusingly mixed between lilypond and matplotlib? Is there a point in generating a higher-resolution image from lilypond, but
What's the reason for putting the lilypond-generated figure inside another matplotlib figure? This also runs the risk of users saving the matplotlib plot of the previously-saved-and-read lilypond image?
Moreover, if the idea is that the lilypond image could be plotted in the context of a larger plot, a user can currently not choose at which coordinates and on which scale to put the image in the larger plot?

Return deep copy by default for .copy()
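
To illustrate why the default matters, here is a small sketch with a hypothetical `ToySequence` class (a stand-in, not thebeat's `Sequence`). A shallow copy shares the underlying list of IOIs with the original, so changing one changes the other; a deep copy does not.

```python
import copy


class ToySequence:
    """Hypothetical stand-in for an object that holds a mutable list."""

    def __init__(self, iois):
        self.iois = list(iois)

    def copy(self, deep=True):
        # Deep by default: the new object gets its own list of IOIs
        return copy.deepcopy(self) if deep else copy.copy(self)


original = ToySequence([500, 500, 500])
shallow = original.copy(deep=False)
deep = original.copy()

original.iois[0] = 250
print(shallow.iois[0])  # 250: the shallow copy shares the list
print(deep.iois[0])     # 500: the deep copy is independent
```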

Better type checking using the numbers module

Sometimes input validation is done using e.g. isinstance(input, (int, float, np.float64)) etc. Better change this to use the numbers module, e.g. isinstance(input, Integral)
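
A minimal sketch of what such checks could look like, using only the standard library. The helper names `is_real_number` and `is_integer` are made up for this example, and excluding `bool` is an assumption about what the validation should accept. NumPy registers its scalar types with these abstract base classes, so the same checks should also cover values such as np.float64 without listing them.

```python
from fractions import Fraction
from numbers import Integral, Real


def is_real_number(value):
    # bool is a subclass of int, so it is excluded explicitly (an assumption)
    return isinstance(value, Real) and not isinstance(value, bool)


def is_integer(value):
    return isinstance(value, Integral) and not isinstance(value, bool)


print(is_real_number(500))             # True
print(is_real_number(0.5))             # True
print(is_real_number(Fraction(1, 3)))  # True
print(is_real_number("500"))           # False
print(is_integer(4.0))                 # False
```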

When plotting a Sequence with lots of onsets, not all lines are drawn

Example:

fft_values makes rounding error?

import matplotlib.pyplot as plt
import thebeat

s = thebeat.Sequence([500, 502, 499, 500])
thebeat.stats.fft_plot(s, 1000)
plt.show()

returns:

Traceback (most recent call last):
  File "/Users/jellevanderwerff/thebeat/scratch.py", line 8, in <module>
    thebeat.stats.fft_plot(s, 1000)
  File "/Users/jellevanderwerff/thebeat/thebeat/stats.py", line 750, in fft_plot
    ax.plot(xf, yf)
  File "/Users/jellevanderwerff/thebeat/venv/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
  File "/Users/jellevanderwerff/thebeat/venv/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
  File "/Users/jellevanderwerff/thebeat/venv/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 499, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (1001,) and (1002,)

This is only on the main branch, not on the stable branch, so it has to do with 08f87a4.

Error when calling lilypond

Python version: 3.11.2
thebeat version: 0.1.1.dev3+gb4949b3.d20230731

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 6
      3 time_sign_numerator = seq_q.duration / max_x
      4 r = thebeat.music.Rhythm(seq_q.iois, time_signature=(time_sign_numerator, 4), beat_ms=max_x)
----> 6 r.plot_rhythm()

File ~/thebeat/thebeat/_decorators.py:36, in requires_lilypond.<locals>.requires_lilypond_wrapper(*args, **kwds)
     32     raise ImportError("This function or method requires lilypond for plotting notes. You can install this "
     33                       "opional depencency with pip install thebeat[music_notation].\n"
     34                       "For more details, see https://thebeat.readthedocs.io/en/latest/installation.html.")
     35 orig_path = os.environ["PATH"]
---> 36 os.environ["PATH"] += os.pathsep + os.path.dirname(lilypond.executable())
     37 return_value = f(*args, **kwds)
     38 os.environ["PATH"] = orig_path

AttributeError: 'NoneType' object has no attribute 'executable'

Allow a freeform rhythm

So a rhythm without time signatures, nor measures. This is useful when for instance plotting a sequence of taps as musical notes.

Rhythm.generate_random_rhythm() does depth-first search every time

Another option would be to cache a list of all the combinations of different Rhythms.
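
The caching idea could be sketched with `functools.lru_cache` as below. This is an illustration under assumptions, not the package's search: `all_fillings` is a hypothetical function, note values are given as denominators (4 for a quarter note, 8 for an eighth note), and the bar length is expressed as a fraction of a whole note.

```python
import random
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def all_fillings(remaining, allowed_note_values):
    """Return every ordered way to fill `remaining` with the allowed note values.

    Hypothetical helper; arguments must be hashable, hence the tuple.
    """
    if remaining == 0:
        return ((),)
    fillings = []
    for value in allowed_note_values:
        duration = Fraction(1, value)
        if duration <= remaining:
            # Sub-results are cached, so each remainder is searched only once
            for rest in all_fillings(remaining - duration, allowed_note_values):
                fillings.append((value,) + rest)
    return tuple(fillings)


fillings = all_fillings(Fraction(1, 2), (4, 8))
print(len(fillings))  # 5

# Later draws reuse the cached tuple instead of searching again
rng = random.Random(123)
rhythm = rng.choice(fillings)
```

The trade-off is memory: the number of fillings grows quickly with bar length and with the number of allowed note values, so a cache like this may only be practical for short bars.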

When concatenating sequences it does not allow numbers on the left-hand side

Say we want a Sequence that starts with a silence, we now cannot do:

seq = 5000 + Sequence.generate_isochronous(10, 500)

The resulting Sequence will simply have first_onset=5000
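
In Python, `number + obj` falls back to `obj.__radd__(number)` when the number's own `__add__` does not know the right-hand type. The sketch below shows that mechanism on a hypothetical `ToySequence` class (not thebeat's `Sequence`), assuming the intended result is a sequence whose first onset is shifted by the number.

```python
from numbers import Real


class ToySequence:
    """Hypothetical stand-in; not thebeat's Sequence."""

    def __init__(self, iois, first_onset=0.0):
        self.iois = list(iois)
        self.first_onset = first_onset

    def __radd__(self, other):
        # Called for `number + sequence` after the number's __add__ gives up
        if isinstance(other, Real) and not isinstance(other, bool):
            return ToySequence(self.iois, first_onset=other + self.first_onset)
        return NotImplemented


seq = 5000 + ToySequence([500, 500, 500])
print(seq.first_onset)  # 5000.0
```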

Auto-rerun notebooks on GitHub actions

Create a workflow that will automatically rerun the notebooks in the documentation whenever they get changed

thebeat.stats.fft_plot should not discard first value!

Two things: First, the fft_plot function now discards the first value at x = 0. The reasoning was that here the power/y value is always the highest, and that therefore it only confuses rather than informs your analysis. However, it's a bit strange to get a graph that doesn't have the highest power at x=0, and so this way it confuses rather than informs...

Second, when doing a Fourier transform on a series of IOIs made up of only 8th and 16th notes, the highest power (in most cases) should be at the 1/16th note (i.e. 1/4th of the beat IOI). However, if we follow the example from the docs, this happens:

import numpy as np
import thebeat
rng = np.random.default_rng(123)
r = thebeat.music.Rhythm.generate_random_rhythm(1, 500, allowed_note_values=[8, 16], rng=rng)
seq = r.to_sequence()
fig, ax = thebeat.stats.fft_plot(seq, 1000)
x_data, y_data = ax.lines[0].get_data()
max_y_index = np.argmax(y_data)
max_x = x_data[max_y_index]
print(1000 / max_x)

This way, you get the wrong value (namely 117.1875). However, if you do:

max_x = x_data[max_y_index - 1]

you get the right value (namely 125; i.e. 1/4th of the beat_ms).

Error not raised when constructing a SoundSequence with end_with_interval=True and too long sound

This doesn't raise a helpful error:

import thebeat

seq = thebeat.Sequence([500, 500, 10], end_with_interval=True)
s = thebeat.SoundStimulus.generate(duration_ms=50)
ss = thebeat.SoundSequence(s, seq)

Whereas this does:

seq = thebeat.Sequence([10, 500, 500], end_with_interval=False)
s = thebeat.SoundStimulus.generate(duration_ms=50)
ss = thebeat.SoundSequence(s, seq)
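
In both examples every IOI in the list is preceded by a sound: with end_with_interval=True the last IOI follows the last onset, and with end_with_interval=False the last onset simply has no IOI after it. A check that covers both cases could therefore look at all IOIs, as in this sketch. `check_sound_fits` is a hypothetical helper, not the validation thebeat performs.

```python
def check_sound_fits(iois, sound_duration_ms):
    """Raise if a sound would run past the end of the interval that follows it.

    Hypothetical helper for illustration; not part of thebeat.
    """
    for position, ioi in enumerate(iois):
        if sound_duration_ms > ioi:
            raise ValueError(
                f"A sound of {sound_duration_ms} ms does not fit in "
                f"IOI number {position} ({ioi} ms)."
            )


# Both sequences from the examples above contain a 10 ms IOI
for iois in ([500, 500, 10], [10, 500, 500]):
    try:
        check_sound_fits(iois, sound_duration_ms=50)
    except ValueError as error:
        print(error)
```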

Idea: create function for amplitude-modulated (AM) sounds

Function to produce rhythm schematics

As soon as Rhythm allows multiple layers, we might make plotting functions for creating schematics, such as in Fig. 1 of 10.1371/journal.pone.0097467

Calculating phase differences

Warning: thebeat: The first onset of the test sequence was at t=0.
This would result in a phase difference that is always 0, which is not very informative.
Therefore, the first phase difference was discarded.
If you want the first onset at a different time than zero, use the Sequence.from_onsets() method to create the Sequence object.

`Sequence.generate_random_poisson` does not generate IOIs

Sequence.generate_random_poisson draws random samples of the number of events and uses them as IOIs. This does not seem to make sense, as it does not produce the time until the next event, but just samples integers.
The samples can include 0, which results in an error, because 0 is not a valid IOI length.
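
For reference, the waiting times between events of a Poisson process follow an exponential distribution, so one possible fix is to sample the IOIs from that distribution instead. The sketch below uses the standard library; `poisson_process_iois` and its parameters are hypothetical names, not thebeat's API.

```python
import random


def poisson_process_iois(n_iois, mean_ioi_ms, seed=None):
    """Sample inter-onset intervals of a Poisson process with the given mean.

    Hypothetical helper for illustration; not part of thebeat.
    """
    rng = random.Random(seed)
    # expovariate takes the rate of the distribution, i.e. 1 / mean
    return [rng.expovariate(1.0 / mean_ioi_ms) for _ in range(n_iois)]


iois = poisson_process_iois(n_iois=10, mean_ioi_ms=500, seed=123)
print(len(iois))  # 10
```

The samples are continuous rather than integer, and an exact 0 is vanishingly unlikely, although very short IOIs can still occur and may need a lower bound in practice.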

Create Sequence.from_binary_string() class method

Sometimes, patterns are represented as, for instance, '101011100', where each digit represents a point on a theoretical grid: 0 means silence, and 1 means an onset. So, the example could be represented, in integer numerators, as 2 2 1 1 3.
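
The conversion itself could be sketched as follows. `binary_string_to_integer_ratios` is a hypothetical helper, and it assumes that the pattern starts with an onset and that the last interval runs until the end of the grid.

```python
def binary_string_to_integer_ratios(pattern):
    """Convert a grid pattern such as '101011100' into interval lengths.

    Hypothetical helper for illustration; not part of thebeat.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("The pattern must be a non-empty string of 0s and 1s.")
    if pattern[0] != "1":
        raise ValueError("This sketch assumes the pattern starts with an onset.")
    onsets = [index for index, digit in enumerate(pattern) if digit == "1"]
    # Close the last interval at the end of the grid
    boundaries = onsets + [len(pattern)]
    return [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]


print(binary_string_to_integer_ratios("101011100"))  # [2, 2, 1, 1, 3]
```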

Plot multiple sequences combines sequences with the same name

When two sequences have the same name attribute they are erroneously combined into one event plot. This has to do with using a categorical variable in matplotlib for the y axis.

suppress_display, fig.show(), and plt.show()

In an interactive environment, the plotting functions work as expected, where they are shown when calling e.g. Sequence.plot_sequence(), and not shown when calling Sequence.plot_sequence(suppress_display=True). The problem is that in a non-interactive environment one needs to explicitly call plt.show() in order to see the plot, and also suppress_display does not seem to be working.

Add REPP-like phase/alignment calculations?

See:

Proof of concept with some NumPy broadcasting magic:

import numpy as np

ref_onsets = np.array([500, 1500, 2000, 3000])
ref_iois = np.diff(ref_onsets)
test_onsets = np.array([-100, 100, 500, 600, 1000, 1400, 1500, 1600, 2000, 2500, 3000, 3500, 4000])

onset_offsets = test_onsets[:, None] - ref_onsets[None, :]

prev_ref_iois = np.concatenate([[ref_iois[0]], ref_iois])
next_ref_iois = np.concatenate([ref_iois, [ref_iois[-1]]])

matching_iois = np.where(onset_offsets < 0, prev_ref_iois, next_ref_iois)
phase_offsets = onset_offsets / matching_iois

print([phase for (distance, phase) in zip(np.abs(onset_offsets).ravel(), phase_offsets.ravel()) if -0.4 < phase < 0.4 and distance < 1999])

Fix rounding terminology and consistency

When onsets of a sequence do not exactly match the samples, a warning is given about rounding the onsets, and:

The warning (and documentation) says "[...] were rounded off to the neirest integer ceiling." (which btw now gets underlined as being a typo of "nearest")
The actual code uses start_pos = int(start_pos), which rounds down (i.e. takes the floor, not ceiling)
The suggested round_onsets() function uses np.round, which rounds to the closest integer (i.e., floor or ceiling, depending on the actual value).

This should be made consistent, both in code as well as in the warning's text and docs.
I'm not sure what the most obvious best one is, flooring or rounding; there's probably some argument to be made for both. My intuition says that flooring might have better mathematical properties, but I might be wrong. I do think ceiling makes the least sense of all three options.
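
For comparison, this is how the candidate operations behave in plain Python. Note that `int()` truncates toward zero, which only equals flooring for non-negative values, and that the built-in `round()` resolves exact ties to the nearest even integer (`np.round` documents the same tie rule).

```python
import math

position = 2.7
print(int(position))         # 2
print(math.floor(position))  # 2
print(math.ceil(position))   # 3
print(round(position))       # 3

# int() truncates toward zero, so it differs from floor for negative values
print(int(-2.7), math.floor(-2.7))  # -2 -3

# round() resolves exact ties to the nearest even integer
print(round(2.5), round(3.5))  # 2 4
```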

ugof calculation: correlation between number of onsets and ugof

MWE

import thebeat
import numpy as np

seq_iois = np.random.default_rng(123).normal(500, 50, 100)
n_onsets = range(1, len(seq_iois) + 2)
ugofs = []

# get ugofs for increasing number of onsets
for n in n_onsets:
    seq = thebeat.Sequence(seq_iois[:n])
    ugofs.append(thebeat.stats.get_ugof_isochronous(seq, 500))

# calculate correlation between number of onsets and ugof
corr = np.corrcoef(n_onsets, ugofs)[0, 1]
print(corr)  # output: 0.9352718532998031

add ``plot_multiple_rhythms``

Instead of creating a new class MultiRhythm (or something), start with creating a simple function for plotting multiple rhythms. Reasoning: In principle, it is already possible to overlay SoundSequence objects to create a multi-rhythm.

Turn on nitpicky and sphinx-linkcheck for docs

Small thing, but building docs with nitpicky and running linkcheck should catch potential future issues with outdated/bad links in the docs. In Parselmouth, it already caught several dead/redirected external links.

Change some properties into functions

There are a number of class properties which really make more sense as functions. For instance, Sequence.interval_ratios_from_dyads. Also to have consistency with e.g. pandas.DataFrame.mean().

Sequence duration returns wrong value in case of first_onset != 0

Sequence.duration now returns the sum of the IOIs. However, this is incorrect for sequences that do not have the first onset at t = 0.
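
The two quantities can be told apart with a small sketch. It assumes that the intended duration runs from t = 0 to the end of the last interval, which is one possible reading of this issue; `span_and_total_duration` is a hypothetical helper.

```python
def span_and_total_duration(iois, first_onset=0.0):
    """Return (time from first onset to end, time from t = 0 to end).

    Hypothetical helper for illustration; not part of thebeat.
    """
    span = sum(iois)            # first onset to the end of the last interval
    total = first_onset + span  # t = 0 to the end of the last interval
    return span, total


print(span_and_total_duration([500, 500, 500], first_onset=100))  # (1500, 1600)
```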

reset index of pd dataframe returned by thebeat.utils.get_ioi_df

It seems the index is not reset after concatenating different dataframes

Idea: Make function for calculating rhythmic contours

I.e. for making sequences that are simply 'short-longer-longer-shorter', etc.
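
A minimal sketch of deriving a contour from a list of IOIs follows (the reverse of generating a sequence from a contour, but it shows the representation). The function name and the labels 'longer', 'shorter', and 'equal' are assumptions for this example.

```python
def rhythmic_contour(iois):
    """Label each IOI relative to the one before it.

    Hypothetical helper for illustration; not part of thebeat.
    """
    contour = []
    for previous, current in zip(iois, iois[1:]):
        if current > previous:
            contour.append("longer")
        elif current < previous:
            contour.append("shorter")
        else:
            contour.append("equal")
    return contour


print(rhythmic_contour([250, 500, 750, 500, 500]))
# ['longer', 'longer', 'shorter', 'equal']
```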

SoundStimulus/SoundSequence: Working with different channels

Idea for the capacity to easily produce wav files with sequences of waveforms on one channel and sequences of pulses for later synchronization purposes.

Make fft_values function

Instead of everywhere writing x, y = ax[0].get_data() to find the peaks etc., make a function fft_values, like acf_values.

Return new Sequences for methods

Consider whether we want to have e.g. Sequence.change_tempo() return a new Sequence

Entropy: use resolution argument instead of bin_fraction

Currently, thebeat.stats.get_rhythmic_entropy uses a 'bin_fraction' based on the tempo of the provided sequence. For consistency, and to allow people to choose those values themselves, change this to resolution, similar to the edit distance functions.

thebeat.helpers.sequence_to_binary should return integers

Entropy

Do input validation; I was able to supply something other than a Sequence and got a non-intuitive error.

Don't force a style onto users of plotting functions

The current default style for all plots is some old, deprecated version of seaborn's style, through matplotlib.

This does not seem correct or extendable. If I apply my own style before plotting, I don't want it to be overwritten by seaborn!

For example, the following code does not use seaborn's modern "white" style:

import seaborn as sns
with sns.axes_style("white"):
    seq.plot_sequence()

As far as I can tell, there's also no way to force this (modern, non-v0_8 version of) seaborn style, as seaborn or other libraries do not extend the plt.style.available list.

At a minimum, this should work, by taking None as default parameter, not changing the already set style.

In my opinion, the style kwarg does not really have a place at all in these plotting functions (and this issue is a symptom of that). Rather than

seq1.plot_sequence(style='blahblah')
seq2.plot_sequence(style='blahblah')
...

I think we should teach users/make examples with:

with plt.style.context('blahblah'):  # Or something from sns!
    seq1.plot_sequence()
    seq2.plot_sequence()

(For more plots/sequences, note that this is also less repetitive)
But this is obviously a bigger change and should be discussed further, cause I know you don't fully agree here.

rhythm = thebeat.music.Rhythm.from_integer_ratios([1, 2, 3, 2, 1, 2, 3, 2])