A project from a Digital Signal Processing course.
- Python 3.6
- numpy
- librosa
- pysoundfile
- sounddevice
- matplotlib
- scikit-learn
- tensorflow
- keras
The dataset can be downloaded from Dataverse or GitHub.
I'd recommend using ESC-10 for the sake of convenience.
Example:
├── 001 - Cat
│   ├── cat_1.ogg
│   ├── cat_2.ogg
│   ├── cat_3.ogg
│   ...
...
└── 002 - Dog
    ├── dog_barking_0.ogg
    ├── dog_barking_1.ogg
    ├── dog_barking_2.ogg
    ...
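As a rough illustration of how a layout like the one above can be turned into labeled samples, here is a minimal sketch. It assumes each class lives in its own folder named "NNN - ClassName" and that labels are assigned by sorted folder order. The helper name list_labeled_files is hypothetical, and feat_extract.py may do this differently.

```python
from pathlib import Path


def list_labeled_files(data_dir, extension=".ogg"):
    """Return (path, label_index, class_name) tuples, one per audio file.

    Assumption: every sub-directory of data_dir is one class, and its
    integer label is its position in the sorted list of sub-directories.
    """
    samples = []
    class_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for label_index, class_dir in enumerate(class_dirs):
        # "001 - Cat" -> "Cat"; a name without " - " is kept unchanged.
        class_name = class_dir.name.split(" - ", 1)[-1]
        for audio_path in sorted(class_dir.glob("*" + extension)):
            samples.append((audio_path, label_index, class_name))
    return samples
```

Calling list_labeled_files("data") would then yield one tuple per .ogg file, ready to be passed to a feature extractor.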
Put the audio files (.wav is untested) under the data directory and run the following command:
python feat_extract.py
Features and labels will be generated and saved in the directory (as feat.npy and label.npy).
Make sure scikit-learn is installed and that feat.npy and label.npy are in the same directory. Then run svm.py to see the result.
Install tensorflow and keras first, then run nn.py to train and test the network.
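For readers who want a feel for what an SVM step on saved features can look like, here is a minimal sketch using scikit-learn. It is not the contents of svm.py: the function name train_and_score_svm, the RBF kernel, the feature scaling, and the 75/25 split are all assumptions, and synthetic data stands in for the real features so the example runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_score_svm(features, labels, test_size=0.25, seed=0):
    """Fit an RBF-kernel SVM and return (model, held-out accuracy).

    Assumption: features has shape (n_samples, n_features) and labels
    has shape (n_samples,).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Scaling first, since SVMs are sensitive to feature magnitudes.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in for the real data: two well-separated clusters.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(-3.0, 1.0, (40, 8)), rng.normal(3.0, 1.0, (40, 8))])
labels = np.array([0] * 40 + [1] * 40)

model, accuracy = train_and_score_svm(features, labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

With the real files, the synthetic arrays would presumably be replaced by np.load("feat.npy") and np.load("label.npy").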
- Run cnn.py -t to train and test a CNN. Optionally set how many epochs to train on.
- Predict files by either:
  - Putting target files under the predict/ directory and running cnn.py -p
  - Recording on the fly with cnn.py -P
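The three modes listed above could be parsed with argparse roughly as follows. This is only a sketch of the idea, not the actual cnn.py: the flags -t, -p and -P come from the list above, while the mode names ("train", "predict_files", "predict_live") and the choice to make the flags mutually exclusive are assumptions. The epochs option is left out because its flag name is not given here.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one of -t, -p or -P."""
    parser = argparse.ArgumentParser(prog="cnn.py")
    # Assumption: the three modes cannot be combined in one invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-t", dest="mode", action="store_const", const="train",
                      help="train and test the CNN")
    mode.add_argument("-p", dest="mode", action="store_const", const="predict_files",
                      help="predict the files found under predict/")
    mode.add_argument("-P", dest="mode", action="store_const", const="predict_live",
                      help="record audio on the fly and predict it")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["-t"])
print(args.mode)
```

A script built this way would dispatch on args.mode to the matching training or prediction routine.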