
query_repeat_part_by_audio's Introduction

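Function

Given several audio files, find the segments they have in common and report the start and end time of each shared segment in every file.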

Environment

Python=3.8.3
librosa=0.8.0
numpy=1.19.2
scipy=1.4.1
pandas=1.0.5
soundfile=0.10.3
matplotlib=3.3.0

Audio format

Sample rate: 16000 Hz
Bit depth: 16 bit
Channels: mono
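
A quick way to confirm that a WAV file matches this format before running the tool is to read its header. The sketch below uses only the standard-library wave module; check_wav_format is a hypothetical helper and not part of this repository.

```python
import wave

# Expected input format, as listed above.
EXPECTED_RATE = 16000     # samples per second
EXPECTED_WIDTH = 2        # bytes per sample, i.e. 16 bit
EXPECTED_CHANNELS = 1     # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(
                f"sample rate is {wav.getframerate()}, expected {EXPECTED_RATE}")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {EXPECTED_WIDTH}")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected {EXPECTED_CHANNELS}")
    return problems
```

Files that fail the check would need to be resampled or downmixed first, for example with the librosa and soundfile versions listed under Environment.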

How it works

  1. Inverted index
  2. Shazam audio fingerprint extraction algorithm
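
The two ideas above can be sketched in a few lines. This is a toy illustration of Shazam-style peak pairing and an inverted index under assumed conventions, not the code in this repository: the names pair_peaks, build_inverted_index and candidate_audios, the parameters fan_out and max_dt, and the (f_anchor, f_target, dt) hash layout are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def pair_peaks(peaks, fan_out=3, max_dt=20):
    """Turn spectrogram peaks into hashes by pairing each peak with later ones.

    peaks: list of (time_bin, freq_bin) tuples, assumed sorted by time_bin.
    Returns a list of ((f_anchor, f_target, dt), t_anchor) entries.
    """
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break          # later peaks are even further away
            if dt <= 0:
                continue       # skip peaks in the same time bin
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes


def build_inverted_index(fingerprints_by_audio):
    """Map every hash to the (audio_id, anchor_time) places where it occurs."""
    index = defaultdict(list)
    for audio_id, hashes in fingerprints_by_audio.items():
        for h, t in hashes:
            index[h].append((audio_id, t))
    return index


def candidate_audios(index, query_id, query_hashes):
    """Rank the other audios by how many hash hits they share with the query."""
    votes = Counter()
    for h, _ in query_hashes:
        for audio_id, _ in index.get(h, ()):
            if audio_id != query_id:
                votes[audio_id] += 1
    return [audio_id for audio_id, _ in votes.most_common()]
```

Because each hash depends only on two frequencies and the time gap between them, the same content produces the same hashes wherever it occurs in a file. The inverted index then narrows the search to files that share hashes with the query, so not every pair of files has to be compared in full.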

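Usage

1 Install the required modules with pip install -r requirements.txt, then put the audio files into the audio_data folder.

2 First download the data from Baidu Netdisk and put it into the audio_data folder (link: https://pan.baidu.com/s/1YZWSFYUciFCeF6QyedMCOA, extraction code: anpj).
Then run python main.py to print the results to the screen.

Alternatively, run ./main.sh; the results are written to log.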
The log contains output like the following:
path: ./audio_data/66.wav sr: 16000 duration: 959.0 feature.shape: (34479, 3)
path: ./audio_data/70.wav sr: 16000 duration: 204.0 feature.shape: (6784, 3)
path: ./audio_data/72.wav sr: 16000 duration: 459.0 feature.shape: (17897, 3)
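target_advise_list of 66.wav: ['70', '72'] # With a large number of audio files, the inverted index quickly locates the candidate files
['66', 104.8875, 119.075] ['70', 2.7375, 16.925] Explanation: seconds 104 to 119 of audio 66 and seconds 2 to 16 of audio 70 contain the same content (an advertisement)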
['66', 895.975, 908.9625] ['72', 430.9375, 443.9375]
['66', 90.4875, 104.2625] ['72', 415.45, 429.2375]
['66', 940.5, 952.9375] ['72', 130.475, 142.9]
target_advise_list of 70.wav: ['72', '66']
['70', 2.9375, 16.925] ['72', 355.05, 369.0375]
['70', 2.7375, 16.925] ['66', 104.8875, 119.075]
target_advise_list of 72.wav: ['70', '66']
['72', 355.05, 369.0375] ['70', 2.9375, 16.925]
['72', 430.9375, 443.9375] ['66', 895.975, 908.9625]
['72', 415.45, 429.2375] ['66', 90.4875, 104.2625]
['72', 130.475, 142.9] ['66', 940.5, 952.9375]
To stop a run midway, execute ./stop.sh
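
The result lines in the log can be read back into structured records for further processing. The following is a minimal sketch that assumes each match line contains exactly two bracketed [name, start, end] records, as in the sample output; parse_match_line and describe are hypothetical helpers, not part of this repository.

```python
import ast
import re

# Matches one bracketed record such as ['66', 104.8875, 119.075].
_RECORD = re.compile(r"\[[^\[\]]*\]")


def parse_match_line(line):
    """Parse a match line into two (audio_id, start, end) tuples.

    Returns None for lines that are not match lines, such as the
    "path: ..." and "target_advise_list of ..." lines.
    """
    records = [ast.literal_eval(text) for text in _RECORD.findall(line)]
    if len(records) != 2 or any(len(r) != 3 for r in records):
        return None
    return [(str(name), float(start), float(end)) for name, start, end in records]


def describe(match):
    """Summarize a parsed match: segment lengths and the time offset, in seconds."""
    (a, a_start, a_end), (b, b_start, b_end) = match
    return {
        "pair": (a, b),
        "duration_a": round(a_end - a_start, 4),
        "duration_b": round(b_end - b_start, 4),
        "offset": round(a_start - b_start, 4),
    }
```

Applied to the first match line above, both segments come out 14.1875 seconds long, which is what one would expect when the same clip appears in two files.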

References

[1]. https://www.toptal.com/algorithms/shazam-it-music-processing-fingerprinting-and-recognition
[2]. https://zhuanlan.zhihu.com/p/75360272
[3]. https://github.com/lukemcraig/AudioSearch
[4]. A. Wang, An Industrial-Strength Audio Search Algorithm, ISMIR 2003

query_repeat_part_by_audio's People

Contributors

anpengjin


query_repeat_part_by_audio's Issues

Error when running under Python 3.9

Traceback (most recent call last):
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\main.py", line 233, in <module>
    test()
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\main.py", line 230, in test
    main(audio_path_list)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\main.py", line 213, in main
    audio = Audio(audio_path)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 229, in __init__
    self.get_audio_params(self.audio_path)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 234, in get_audio_params
    self.audio_feature = self.audio_obj.get_audio_feature(self.y, self.sr, 1)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 35, in get_audio_feature
    return self.get_fingerprints(audio_data, audio_sr)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 49, in get_fingerprints
    fingerprints = self._get_fingerprints_from_peaks(len(f) - 1, f_step, peak_locations, len(t) - 1, t_step)
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 103, in _get_fingerprints_from_peaks
    paired_df_peak_locations, n_pairs = self._query_dataframe_for_peaks_in_target_zone_binary_search(
  File "D:\Work_woker\_user\py_class\query_repeat_part_by_audio\delete_repeat_advise\audio_feature.py", line 155, in _query_dataframe_for_peaks_in_target_zone_binary_search
    paired_df_peak_locations = df_peak_locations.loc[t_index & f_index]
  File "C:\Users\chen\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py", line 81, in new_method
    return method(self, other)
  File "C:\Users\chen\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py", line 70, in __and__
    return self._logical_method(other, operator.and_)
  File "C:\Users\chen\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\base.py", line 6791, in _logical_method
    res_values = ops.logical_op(lvalues, rvalues, op)
  File "C:\Users\chen\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py", line 394, in logical_op
    res_values = na_logical_op(lvalues, rvalues, op)
  File "C:\Users\chen\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py", line 304, in na_logical_op
    result = op(x, y)
ValueError: operands could not be broadcast together with shapes (99,) (11,)

Would it be possible to upgrade librosa 0.8.0 and numpy 1.19.2 to librosa 0.9.2 and numpy 1.23.5?
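
One possible reading of the traceback: the failing expression t_index & f_index is dispatched to Index._logical_method, which suggests that both operands are pandas Index objects and that the installed pandas treats & as an element-wise logical operation (the behavior since pandas 2.0) rather than as a set intersection. An element-wise operation needs operands of equal length, which would explain the (99,) versus (11,) broadcast error. Under that assumption, calling Index.intersection explicitly expresses the set operation on any pandas version. The data below is made up for illustration; only the variable names are taken from the traceback.

```python
import pandas as pd

# Toy stand-ins for the two sets of row labels; the real t_index and
# f_index are built in audio_feature.py, which is not shown here.
t_index = pd.Index([0, 1, 2, 3, 4, 5])
f_index = pd.Index([4, 5, 9])

# Toy peak table with the default integer row labels 0..9.
df_peak_locations = pd.DataFrame({"t": range(10), "f": range(10, 20)})

# Explicit set intersection: keeps only labels present in both indexes and
# does not depend on how the & operator is defined for Index objects.
common = t_index.intersection(f_index)
paired_df_peak_locations = df_peak_locations.loc[common]
```

Whether this matches the intent of the original line can only be confirmed against the repository code and the pandas version pinned under Environment (1.0.5).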
