Commit 2fad9e8

committed
generalize to a portion of recordings
1 parent 899eacd commit 2fad9e8

5 files changed

Lines changed: 65 additions & 49 deletions

README.md

Lines changed: 31 additions & 22 deletions
```diff
@@ -1092,24 +1092,31 @@ file.
 ## Measuring Generalization ##
 
 Up to this point we have validated on a small portion of each recording. Once
-you have annotated many recordings though, it is good to set aside entire WAV
-files to validate on. In this way we measure the classifier's ability to
+you have annotated many recordings though, it is good to set aside entire
+recordings to validate on. In this way we measure the classifier's ability to
 extrapolate to different microphones, individuals, or whatever other
 characteristics that are unique to the withheld recordings.
 
 To train one classifier with a single recording or set of recordings withheld
 for validation, first click on `Generalize` and then `Omit All`. Use the `File
-Browser` to either select (1) specific WAV file(s), (2) a text file
-containing a list of WAV file(s) (either comma separated or one per line), or
-(3) the `Ground Truth` folder or a subdirectory therein. Finally press the
-`Validation Files` button and `DoIt!`.
+Browser` to either select (1) specific WAV file(s), (2) a text file containing
+a list of WAV file(s) (either comma separated or one per line), or (3) a
+subdirectory within the `Ground Truth` folder. Finally press the `Validation
+Files` button and `DoIt!`.
 
 To train multiple classifiers, each of which withholds a single recording in a
 set you specify, click on `Omit One`. Select the set as described above for
 `Omit All`. The `DoIt!` button will then iteratively launch a job for each WAV
 file that has been selected, storing the result in the same `Logs Folder` but in
-separate files and subdirectories that are suffixed with the letter "w". Of
-course, training multiple classifiers is quickest when done simultaneously
+separate files and subdirectories that are suffixed with the letter "w".
+
+To train multiple classifiers, each of which withholds a portion of the
+recordings in the set you specify, click on `Omit Some`. Select the set as
+before, and specify the number of partitions using `k-fold`. For example, if
+`k-fold` is "4", then four models will be trained, each with a different fourth
+of the recordings in the chosen set withheld.
+
+Of course, training multiple classifiers is quickest when done simultaneously
 instead of sequentially. If your model is small, you might be able to fit
 multiple on a single GPU (see the `models_per_job` variable in
 "configuration.py"). Otherwise, you'll need a machine with multiple GPUs,
@@ -1268,20 +1275,22 @@ To perform a simple grid search for the optimal value of a particular
 hyperparameter, first choose how many folds you want to partition your
 ground-truth data into using `k-fold`. More folds permit characterizing the
 variance better, but take longer to train and also result in fewer annotations
-to measure the accuracy. Ensure that you have at least 10 annotations for each
-label in the validation set if using many folds. Then set the hyperparameter
-of interest to the first value you want to optimize and use the name of the
-hyperparameter and it's value as the `Logs Folder` (e.g. "mb64" for a
-mini-batch size of 64). Suffix any additional hyperparameters of interest
-using underscores (e.g. "mb64_ks129_fm64" for a kernel size of 129 and 64
-feature maps). If your model is small, use `models_per_job` in
-"configuration.py" to train multiple folds on a GPU. Click the `X-Validate`
-button and then `DoIt!`. One classifier will be trained for each fold, using
-it as the validation set and the remaining folds for training. Separate files
-and subdirectories are created in the `Logs Folder` that are suffixed by the
-fold number and the letter "k". Plot training curves with the `Accuracy`
-button, as before. Repeat the above procedure for each of remaining
-hyperparameter values you want to try (e.g. "mb128_ks129_fm64",
+to measure the accuracy. Unlike `Omit Some` in [Measuring
+Generalization](#measuring-generalization) above, the partitions here group
+individual annotations, as opposed to entire recordings. Ensure that you have
+at least 10 annotations for each label in the validation set if using many
+folds. Then set the hyperparameter of interest to the first value you want to
+optimize and use the name of the hyperparameter and it's value as the `Logs
+Folder` (e.g. "mb64" for a mini-batch size of 64). Suffix any additional
+hyperparameters of interest using underscores (e.g. "mb64_ks129_fm64" for a
+kernel size of 129 and 64 feature maps). If your model is small, use
+`models_per_job` in "configuration.py" to train multiple folds on a GPU. Click
+the `X-Validate` button and then `DoIt!`. One classifier will be trained for
+each fold, using it as the validation set and the remaining folds for training.
+Separate files and subdirectories are created in the `Logs Folder` that are
+suffixed by the fold number and the letter "k". Plot training curves with the
+`Accuracy` button, as before. Repeat the above procedure for each of remaining
+hyperparameter values you want to try (e.g. "mb128_ks129_fm64",
 "mb256_ks129_fm64", etc.). Then use the `Compare` button to create a figure of
 the cross-validation data over the hyperparameter values, specifying for the
 `Logs Folder` the independent variable (e.g. "mb") suffixed with the fixed
```
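The README change separates two ways of splitting ground truth into folds. `Omit Some` keeps every annotation from a recording in the same fold, while `X-Validate` partitions individual annotations. Below is a minimal Python sketch of that difference; it is not code from this repository. It assumes annotations are `(recording, label)` tuples with made-up file names and labels. It also assumes round-robin assignment for both schemes. This commit does not show how `X-Validate` actually assigns annotations to folds, so that part may differ in practice (for example, it could be randomized).

```python
# Made-up annotations as (recording, label) pairs; not data from this repository.
ANNOTATIONS = [
    ("rec1.wav", "pulse"), ("rec1.wav", "sine"),
    ("rec2.wav", "pulse"), ("rec2.wav", "pulse"),
    ("rec3.wav", "sine"), ("rec4.wav", "pulse"),
]


def folds_by_recording(annotations, k):
    """Omit Some style: all annotations from one recording share a fold."""
    recordings = sorted({wav for wav, _ in annotations})
    # Recording i goes to fold i % k, the same round-robin as files[x::k].
    fold_of = {wav: i % k for i, wav in enumerate(recordings)}
    folds = [[] for _ in range(k)]
    for wav, label in annotations:
        folds[fold_of[wav]].append((wav, label))
    return folds


def folds_by_annotation(annotations, k):
    """X-Validate style (assumed round-robin): annotations dealt out one by one."""
    return [annotations[x::k] for x in range(k)]


def recordings_in(fold):
    """Sorted list of distinct recordings that contribute to a fold."""
    return sorted({wav for wav, _ in fold})


if __name__ == "__main__":
    for name, folds in [("by recording", folds_by_recording(ANNOTATIONS, 2)),
                        ("by annotation", folds_by_annotation(ANNOTATIONS, 2))]:
        print(name, [recordings_in(f) for f in folds])
```

With folds by recording, no recording lands on both the training and the validation side of any fold. That separation is what lets validation accuracy speak to generalization across recordings. With folds by annotation, `rec1.wav` and `rec2.wav` appear in both folds.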

src/gui/controller.py

Lines changed: 8 additions & 3 deletions
```diff
@@ -844,7 +844,7 @@ def remaining_callback():
     V.context_update()
 
 def action_callback(thisaction, thisactuate):
-    M.action=None if M.action is thisaction else thisaction
+    M.action=None if M.action is thisaction and M.action is not V.leaveout else thisaction
     M.function=thisactuate
     V.buttons_update()
 
@@ -1272,11 +1272,16 @@ def generalize_xvalidate_succeeded(kind, logdir, currtime):
         return False
     return True
 
-async def leaveout_actuate(comma):
+async def leaveout_actuate(kind):
     test_files = _validation_test_files(V.test_files.value)[0]
     validation_files = list(filter(
             lambda x: not any([y!='' and y in x for y in test_files.split(',')]),
-            _validation_test_files(V.validation_files.value, comma)))
+            _validation_test_files(V.validation_files.value, False)))
+    if kind=="omit all":
+        validation_files = [','.join(validation_files)]
+    elif kind=="omit some":
+        k = int(V.kfold.value)
+        validation_files = [','.join(validation_files[x::k]) for x in range(k)]
     currtime = time.time()
     jobids = []
     os.makedirs(V.logs_folder.value, exist_ok=True)
```
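The new body of `leaveout_actuate` turns the selected option into a list of comma-separated validation-file strings, one per training job. Below is a standalone sketch of just that grouping step, under three assumptions:

- The WAV paths have already been resolved from the user's selection. That is the job of `_validation_test_files`, whose body is not part of this diff.
- Each returned string becomes one job, as the README describes.
- `partition_validation_files` is a hypothetical name, not a function in the repository.

```python
def partition_validation_files(files, kind, k=None, test_files=""):
    """Group validation WAV files into one comma-separated string per job.

    Hypothetical standalone version of the grouping in leaveout_actuate().
    """
    # Drop files matching any non-empty entry of the comma-separated test list
    # (a substring test, as in the diff's filter/lambda).
    excluded = [y for y in test_files.split(",") if y != ""]
    files = [f for f in files if not any(y in f for y in excluded)]
    if kind == "omit all":
        # a single job that withholds every selected file
        return [",".join(files)]
    if kind == "omit some":
        # k jobs; fold x withholds files x, x+k, x+2k, ...
        return [",".join(files[x::k]) for x in range(k)]
    # "omit one": the list is left as is, i.e. one job per file
    return list(files)


if __name__ == "__main__":
    wavs = ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"]
    for kind in ["omit one", "omit some", "omit all"]:
        print(kind, partition_validation_files(wavs, kind, k=2))
```

Because `files[x::k]` strides through the list, fold assignment is round-robin rather than contiguous blocks. When `k` exceeds the number of files left after filtering, some folds come out as empty strings (e.g. one file and `k=3` gives `["a.wav", "", ""]`). The lines shown here do not guard against that case, and this diff does not show how the downstream job code handles an empty validation set.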

src/gui/main.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -91,7 +91,7 @@
             V.fixfalsenegatives, V.generalize, V.tunehyperparameters,
             V.examineerrors, V.testdensely, V.findnovellabels,
             V.doit, width=M.gui_width_pix),
-        row(V.detect, V.misses, V.train, V.leaveoneout, V.leaveallout, V.xvalidate,
+        row(V.detect, V.misses, V.train, V.leaveout, V.xvalidate,
             V.mistakes, V.activations, V.cluster, V.visualize, V.accuracy, V.freeze,
             V.ensemble, V.classify, V.ethogram, V.compare, V.congruence,
             width=M.gui_width_pix),
```

src/gui/model.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -233,7 +233,7 @@ def next_pow2_sec(x_sec):
     return False, x_sec
 
 def init(_bokeh_document, _configuration_file, _use_aitch):
-    global changed_style
+    global changed_style, default_style, primary_style
     global bokeh_document, configuration_file, bindirs, repodir, srcdir
     global use_aitch, local_current_job, server_current_job
     global audio_tic_rate, audio_nchannels, video_channels
@@ -260,6 +260,8 @@ def init(_bokeh_document, _configuration_file, _use_aitch):
     exec(open(_configuration_file).read(), globals())
 
     changed_style = [".bk-input { background-color: #FFA500; }"]
+    default_style = [".bk-input { background-color: #FFFFFF; color: #000000; font-size: 12px; }"]
+    primary_style = [".bk-input { background-color: #428BCA; color: #FFFFFF; font-size: 12px; }"]
 
     time_units = gui_time_units
     time_scale = gui_time_scale
```

src/gui/view.py

Lines changed: 22 additions & 22 deletions
```diff
@@ -1302,7 +1302,10 @@ def buttons_update():
     for button in wizard_buttons:
         button.button_type="success" if button==M.wizard else "default"
     for button in action_buttons:
-        button.button_type="primary" if button==M.action else "default"
+        if button == leaveout:
+            button.stylesheets = M.primary_style if button==M.action else M.default_style
+        else:
+            button.button_type="primary" if button==M.action else "default"
         button.disabled = button not in wizard2actions[M.wizard]
     if M.action in [detect,classify,ethogram]:
         wavcsv_files_button.label='wav files:'
@@ -1349,6 +1352,8 @@ def buttons_update():
                 textinput.disabled = cluster_parameters[thislogic[0]].value not in thislogic[1]
             else:
                 textinput.disabled = False
+        elif M.action is leaveout and textinput is kfold:
+            textinput.disabled = leaveout.value != "omit some"
         else:
             textinput.disabled=False
         if textinput.disabled==False and textinput.value=='':
@@ -1533,7 +1538,7 @@ def init(_bokeh_document):
     global load_multimedia, play, video_slider, load_multimedia_callback, play_callback, video_slider_callback, video_toggle, video_div
     global undo, redo, remaining
     global recordings
-    global detect, misses, train, leaveoneout, leaveallout, xvalidate, mistakes, activations, cluster, visualize, accuracy, freeze, ensemble, classify, ethogram, compare, congruence
+    global detect, misses, train, leaveout, xvalidate, mistakes, activations, cluster, visualize, accuracy, freeze, ensemble, classify, ethogram, compare, congruence
     global status_ticker, waitfor, deletefailures
     global file_dialog_source, configuration_contents
     global logs_folder_button, logs_folder, model_file_button, model_file, wavcsv_files_button, wavcsv_files, groundtruth_folder_button, groundtruth_folder, validation_files_button, test_files_button, validation_files, test_files, labels_touse_button, labels_touse, kinds_touse_button, kinds_touse, prevalences_button, prevalences, delete_ckpts, copy, labelsounds, makepredictions, fixfalsepositives, fixfalsenegatives, generalize, tunehyperparameters, findnovellabels, examineerrors, testdensely, doit, nsteps, restore_from, save_and_validate_period, validate_percentage, mini_batch, kfold, activations_equalize_ratio, activations_max_sounds, cluster_these_layers, precision_recall_ratios, congruence_portion, congruence_convolve, congruence_measure, context, shiftby, optimizer, loss, learning_rate, nreplicates, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, file_dialog_string, file_dialog_table, readme_contents, model_summary, labelcounts, wizard_buttons, action_buttons, parameter_buttons, parameter_textinputs, wizard2actions, action2parameterbuttons, action2parametertextinputs, status_ticker_update, status_ticker_pre, status_ticker_post
@@ -1952,7 +1957,7 @@ def init(_bokeh_document):
     remaining = Button(label='add remaining', disabled=True)
     remaining.on_click(C.remaining_callback)
 
-    recordings = Select(title="recording:", height=50)
+    recordings = Select(title="recording:", height=48)
     recordings.on_change('value', C.recordings_callback)
 
     detect = Button(label='detect', width_policy='fit')
@@ -1964,13 +1969,11 @@ def init(_bokeh_document):
     train = Button(label='train', width_policy='fit')
     train.on_click(lambda: C.action_callback(train, C.train_actuate))
 
-    leaveoneout = Button(label='omit one', width_policy='fit')
-    leaveoneout.on_click(lambda: C.action_callback(leaveoneout,
-            lambda: C.leaveout_actuate(False)))
-
-    leaveallout = Button(label='omit all', width_policy='fit')
-    leaveallout.on_click(lambda: C.action_callback(leaveallout,
-            lambda: C.leaveout_actuate(True)))
+    leaveout = Select(title="", value="omit one", margin = 4,
+            options=["omit one", "omit some", "omit all"],
+            width_policy='fit', stylesheets=M.default_style)
+    leaveout.on_change('value',
+            lambda a,o,n: C.action_callback(leaveout, lambda: C.leaveout_actuate(n)))
 
     xvalidate = Button(label='x-validate', width_policy='fit')
     xvalidate.on_click(lambda: C.action_callback(xvalidate, C.xvalidate_actuate))
@@ -2146,7 +2149,7 @@ def init(_bokeh_document):
             disabled=False, sizing_mode='stretch_width')
     precision_recall_ratios.on_change('value', lambda a,o,n: C.generic_parameters_callback(n))
 
-    congruence_portion = Select(title="portion", height=50,
+    congruence_portion = Select(title="portion", height=48,
             value=M.state['congruence_portion'],
             options=["union", "intersection"],
             sizing_mode='stretch_width')
@@ -2157,7 +2160,7 @@ def init(_bokeh_document):
             disabled=False, sizing_mode='stretch_width')
     congruence_convolve.on_change('value', lambda a,o,n: C.generic_parameters_callback(n))
 
-    congruence_measure = Select(title="measure", height=50,
+    congruence_measure = Select(title="measure", height=48,
             value=M.state['congruence_measure'],
             options=["label", "tic", "both"],
             sizing_mode='stretch_width')
@@ -2171,12 +2174,12 @@ def init(_bokeh_document):
             disabled=False, sizing_mode='stretch_width')
     shiftby.on_change('value', lambda a,o,n: C.generic_parameters_callback(n))
 
-    optimizer = Select(title="optimizer", height=50, value=M.state['optimizer'],
+    optimizer = Select(title="optimizer", height=48, value=M.state['optimizer'],
             options=["Adadelta", "Adagrad", "Adam", "Adamax", "Ftrl", "Nadam", "RMSProp", "SGD"],
             sizing_mode="stretch_width")
     optimizer.on_change('value', lambda a,o,n: C.generic_parameters_callback(''))
 
-    loss = Select(title="loss", height=50, value=M.state['loss'],
+    loss = Select(title="loss", height=48, value=M.state['loss'],
             options=["exclusive", "overlapped", "autoencoder"],
             sizing_mode="stretch_width")
     loss.on_change('value', lambda a,o,n: C.generic_parameters_callback(''))
@@ -2207,7 +2210,7 @@ def parse_plugin_parameters(Mparameters, width, msu=False):
             thisparameter = Select(value=M.state[parameter[0]],
                                    title=parameter[1],
                                    options=parameter[2],
-                                   height=50,
+                                   height=48,
                                    sizing_mode='stretch_width')
             thisparameter.on_change('value', get_callback(parameter[6], msu))
             parameters[parameter[0]] = thisparameter
@@ -2332,8 +2335,7 @@ def parse_plugin_parameters(Mparameters, width, msu=False):
     action_buttons = set([
         detect,
         train,
-        leaveoneout,
-        leaveallout,
+        leaveout,
         xvalidate,
         mistakes,
         activations,
@@ -2409,7 +2411,7 @@ def parse_plugin_parameters(Mparameters, width, msu=False):
         makepredictions: [train, accuracy, freeze, classify, ethogram, delete_ckpts],
         fixfalsepositives: [activations, cluster, visualize, delete_ckpts],
         fixfalsenegatives: [detect, misses, activations, cluster, visualize, delete_ckpts],
-        generalize: [leaveoneout, leaveallout, accuracy, delete_ckpts],
+        generalize: [leaveout, accuracy, delete_ckpts],
         tunehyperparameters: [xvalidate, accuracy, compare, delete_ckpts],
         findnovellabels: [detect, train, activations, cluster, visualize, delete_ckpts],
         examineerrors: [detect, mistakes, activations, cluster, visualize, delete_ckpts],
@@ -2419,8 +2421,7 @@ def parse_plugin_parameters(Mparameters, width, msu=False):
     action2parameterbuttons = {
         detect: [wavcsv_files_button],
         train: [logs_folder_button, groundtruth_folder_button, labels_touse_button, test_files_button, kinds_touse_button],
-        leaveoneout: [logs_folder_button, groundtruth_folder_button, validation_files_button, test_files_button, labels_touse_button, kinds_touse_button],
-        leaveallout: [logs_folder_button, groundtruth_folder_button, validation_files_button, test_files_button, labels_touse_button, kinds_touse_button],
+        leaveout: [logs_folder_button, groundtruth_folder_button, validation_files_button, test_files_button, labels_touse_button, kinds_touse_button],
         xvalidate: [logs_folder_button, groundtruth_folder_button, test_files_button, labels_touse_button, kinds_touse_button],
         mistakes: [groundtruth_folder_button],
         activations: [logs_folder_button, model_file_button, groundtruth_folder_button, labels_touse_button, kinds_touse_button],
@@ -2440,8 +2441,7 @@ def parse_plugin_parameters(Mparameters, width, msu=False):
     action2parametertextinputs = {
         detect: [wavcsv_files] + list(detect_parameters.values()),
         train: [context, shiftby, optimizer, loss, learning_rate, nreplicates, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, logs_folder, groundtruth_folder, test_files, labels_touse, kinds_touse, nsteps, restore_from, save_and_validate_period, validate_percentage, mini_batch] + list(model_parameters.values()),
-        leaveoneout: [context, shiftby, optimizer, loss, learning_rate, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, logs_folder, groundtruth_folder, validation_files, test_files, labels_touse, kinds_touse, nsteps, restore_from, save_and_validate_period, mini_batch] + list(model_parameters.values()),
-        leaveallout: [context, shiftby, optimizer, loss, learning_rate, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, logs_folder, groundtruth_folder, validation_files, test_files, labels_touse, kinds_touse, nsteps, restore_from, save_and_validate_period, mini_batch] + list(model_parameters.values()),
+        leaveout: [context, shiftby, optimizer, loss, learning_rate, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, logs_folder, groundtruth_folder, validation_files, test_files, labels_touse, kinds_touse, nsteps, restore_from, save_and_validate_period, mini_batch, kfold] + list(model_parameters.values()),
         xvalidate: [context, shiftby, optimizer, loss, learning_rate, batch_seed, weights_seed, augment_volume, augment_noise, augment_dc, augment_reverse, augment_invert, logs_folder, groundtruth_folder, test_files, labels_touse, kinds_touse, nsteps, restore_from, save_and_validate_period, mini_batch, kfold] + list(model_parameters.values()),
         mistakes: [groundtruth_folder],
         activations: [context, shiftby, logs_folder, model_file, groundtruth_folder, labels_touse, kinds_touse, activations_equalize_ratio, activations_max_sounds, mini_batch, batch_seed] + list(model_parameters.values()),
```
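Replacing the two buttons with one `Select` explains why `action_callback` gained the `and M.action is not V.leaveout` clause. A Bokeh `Select` fires its `on_change('value', ...)` callback each time its value changes. Under the old toggle rule, picking a second option while `leaveout` was already the active action would have deselected it. The sketch below replays that logic and the new `k-fold` enabling rule from `buttons_update()`. It uses `SimpleNamespace` stand-ins for the widgets and for the `M` module. The objects and the `on_leaveout_change` helper are hypothetical. The real callback also receives an actuate function that returns a coroutine, not the plain lambda used here.

```python
from types import SimpleNamespace

# Hypothetical stand-ins: in the GUI these are Bokeh widgets in view.py and
# module-level state in model.py.
train = SimpleNamespace(name="train")
leaveout = SimpleNamespace(name="leaveout", value="omit one")
M = SimpleNamespace(action=None, function=None)


def old_toggle(current, clicked):
    """Rule before this commit: clicking the active action again deselects it."""
    return None if current is clicked else clicked


def action_callback(thisaction, thisactuate):
    """Rule after this commit: the leaveout Select never toggles itself off."""
    M.action = None if M.action is thisaction and M.action is not leaveout else thisaction
    M.function = thisactuate


def on_leaveout_change(new):
    """Simulates the Select firing on_change('value', ...) with the new option."""
    leaveout.value = new
    # Each call builds a fresh closure, so `new` is captured per change.
    action_callback(leaveout, lambda: ("leaveout_actuate", new))


def kfold_disabled():
    """Mirrors the new elif in buttons_update() while leaveout is the action."""
    return leaveout.value != "omit some"


if __name__ == "__main__":
    action_callback(train, None)
    action_callback(train, None)           # a second click on a Button deselects it
    print(M.action)
    on_leaveout_change("omit some")
    on_leaveout_change("omit all")         # still selected after another change
    print(M.action.name, M.function(), kfold_disabled())
    print(old_toggle(leaveout, leaveout))  # the old rule would have deselected it
```

Ordinary action buttons keep their click-to-toggle behaviour. The `Select` can only be deselected by choosing a different action, because re-choosing its current option does not change the value and so does not fire the callback.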

0 commit comments
