Need a C or Python script to calculate the cumulative time each MIDI note spends in the 'on' state.

We are given only a data.dat or data.csv file with 6 columns and anywhere from a few hundred to 80,000-90,000 rows in the following pattern:

Table 1

1, 759, Note_off_c, 0, 62, 64
1, 759, Note_on_c, 0, 63, 50
1, 765, Note_off_c, 0, 63, 64
1, 765, Note_on_c, 0, 64, 64
1, 778, Note_on_c, 0, 83, 23
1, 816, Note_off_c, 0, 83, 64

The program should ignore the 1st, 4th and 6th columns entirely and use only the 2nd, 3rd and 5th columns of Table 1.

Table 2.

Col1 = time stamp
Col2 = Note status
Col3 = piano key Number (strictly ranging between 0-127) designating the note or the key which was hit on or off:
759, Note_on_c, 63,
765, Note_off_c, 63,
765, Note_on_c, 64,
778, Note_on_c, 83,
816, Note_off_c, 83,
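
The reduction from Table 1 to Table 2 can be sketched as follows. This is only an illustration: the function name `parse_row` is hypothetical, the sample rows are copied from Table 1, and it assumes every row has exactly six comma-separated fields.

```python
import csv
import io

# Three sample rows copied from Table 1
SAMPLE = """1, 759, Note_off_c, 0, 62, 64
1, 759, Note_on_c, 0, 63, 50
1, 765, Note_off_c, 0, 63, 64
"""

def parse_row(row):
    # Keep only columns 2, 3 and 5 (indexes 1, 2 and 4); the others are ignored.
    return int(row[1]), row[2].strip(), int(row[4])

# skipinitialspace drops the blank that follows each comma in the sample data
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
events = [parse_row(row) for row in reader if row]

for event in events:
    print(event)
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an opened data.dat or data.csv.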

The first column of Table 2 is a time stamp; its value can vary from 0 to several tens of thousands.
The 2nd column of Table 2 shows the status of the musical note: either Note_on_c (the note is switched on) or Note_off_c (the note is silenced).
The last column is the key number, from 0 to 127, of the note that was switched on or off.

  1. The program should check the status of each note. If it is off, the program should take the time stamp and subtract from it the time stamp at which the note was LAST switched on, which gives how long the note was on.
  2. The program should add up these time spans per key number (3rd column of Table 2) to produce a simple table (128 x 2) of key numbers (0-127) and the cumulative time each key was on.
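
As a worked example of the two steps above, here is a minimal sketch applied to the rows of Table 2. The function name `cumulative_on_time` is hypothetical. It assumes the events are already in time order, that a repeated 'on' for a key that is already on restarts the interval (following the "LAST switched on" wording), and that a key never switched off contributes nothing.

```python
def cumulative_on_time(events):
    """Sum the 'on' duration per key for (time, status, key) events in time order."""
    total = [0] * 128        # cumulative 'on' time per key number 0-127
    started = [None] * 128   # time each key was last switched on, None if off
    for time, status, key in events:
        if status == 'Note_on_c':
            # Assumption: a repeated 'on' restarts the interval
            started[key] = time
        elif status == 'Note_off_c' and started[key] is not None:
            total[key] += time - started[key]
            started[key] = None
    return total

# Rows of Table 2
events = [
    (759, 'Note_on_c', 63),
    (765, 'Note_off_c', 63),
    (765, 'Note_on_c', 64),
    (778, 'Note_on_c', 83),
    (816, 'Note_off_c', 83),
]
totals = cumulative_on_time(events)
print(totals[63], totals[83], totals[64])  # prints: 6 38 0
```

Key 63 is on from 759 to 765 (6 ticks) and key 83 from 778 to 816 (38 ticks). Key 64 is switched on but never off in this excerpt, so under the stated assumption its total stays 0.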
Is the file ordered by a timestamp?
TheOsch 1 year ago
Hello TheOsch, sorry for my late response. The data file is ordered by the sequence of note on and off events over time. Great job indeed. You got the bounty. Thank you so much for your time and effort, from the bottom of my heart.
Decopper 1 year ago
Thank you too. If the data file is ordered, you can remove data.sort() from the script; it will run faster.
TheOsch 1 year ago
Hi TheOsch I've posted a new bounty at and I thought you might be interested in it. Regards, Decopper
Decopper 1 year ago
awarded to TheOsch


1 Solution

Winning solution

# Cumulative time for each note
time_on = [0] * 128

# Last 'on' time for each note. -1 means that the note isn't on
last_on = [-1] * 128

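def map_line(line):
    # Turn a line from the data file into a (time, note_on, note) tuple
    (_, timestamp, status, _, note, _) = line.split(',')
    return (
        int(timestamp),
        status.strip() == 'Note_on_c',
        int(note),
    )

# Read every non-blank line of the data file
with open('data.dat', 'r') as data_file:
    data = [map_line(line) for line in data_file if line.strip()]

# Sort by time stamp; this can be removed if the file is already ordered
data.sort()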

for (time, note_on, note) in data:
    if note_on and last_on[note] == -1:
        last_on[note] = time
    elif not note_on and last_on[note] != -1:
        time_on[note] += time - last_on[note]
        last_on[note] = -1

for i in range(128):
    print('{}, {}'.format(i, time_on[i]))
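
The comments above note that the sort step can be dropped when the file is already ordered. A small check along these lines could confirm that before skipping it. It is only a sketch: the helper name `is_time_ordered` is hypothetical, and it looks at the time stamps alone.

```python
def is_time_ordered(timestamps):
    # True when no time stamp is smaller than the one before it
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))

# Time stamps from Table 1, in file order
print(is_time_ordered([759, 759, 765, 765, 778, 816]))  # prints: True
# A deliberately shuffled sequence
print(is_time_ordered([759, 778, 765]))                 # prints: False
```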