Evaluation module

Supervised Evaluation

class mdgru.eval.SupervisedEvaluation(modelcls, datacls, kw)[source]

Bases: object

Handler for the evaluation of model defined in modelcls using data coming from datacls.

Parameters:
  • kw (dict) – dictionary containing the following options:
    • dropout_rate [default: 0.5] “keep rate” for weights using dropconnect. The higher the value, the closer the sampled models are to the full model.
    • namespace [default: default] override default model name (if no ckpt is provided). Probably not a good idea!
    • only_save_labels [default: False] save only labels and no probability distributions
    • validate_same [default: True] reuse the same random samples at every validation (the inverted command line flag always picks new random samples for validation)
    • evaluate_uncertainty_times [default: 1] number of times each volume is evaluated. This only makes sense with a keep rate below 1 during evaluation (dropout_during_evaluation < 1); see the instantiation sketch after this parameter list.
    • evaluate_uncertainty_dropout [default: 1.0] keep rate of weights during evaluation. Useful to visualize uncertainty in conjunction with multiple evaluation samples per volume
    • evaluate_uncertainty_saveall [default: False] save each evaluation sample per volume. Without this flag, only the standard deviation and mean over all samples are kept.
    • show_f05 [default: True] report the F0.5 score
    • show_f1 [default: True] report the F1 score
    • show_f2 [default: True] report the F2 score
    • show_l2 [default: True] report the L2 error
    • show_cross_entropy [default: True] report the cross entropy
    • print_each [default: 1] print execution time and losses every # iterations
    • batch_size [default: 1] minibatch size
    • datapath path where the training, validation and testing folders lie. Can also be some other path, as long as the other locations are provided as absolute paths. An experiments folder will be created in this folder, where all runs and checkpoint files will be saved.
    • locationtraining [default: None] location of the training data, as an absolute path or a path relative to datapath. Either a list of paths to the sample folders or one path to a folder in which samples are determined automatically.
    • locationtesting [default: None] location of the testing data, as an absolute path or a path relative to datapath. Either a list of paths to the sample folders or one path to a folder in which samples are determined automatically.
    • locationvalidation [default: None] location of the validation data, as an absolute path or a path relative to datapath. Either a list of paths to the sample folders or one path to a folder in which samples are determined automatically.
    • output_dims number of output channels, i.e. the number of classes the model needs to create a probability distribution over.
    • windowsize window size to be used during training, validation and testing, if not specified otherwise
    • padding [default: [0]] padding to be used during training, validation and testing, if not specified otherwise. During training, the padding specifies how far a patch may reach outside of the image along all dimensions; during testing, it also specifies the amount of overlap needed between patches.
    • windowsizetesting [default: None] override windowsize for testing
    • windowsizevalidation [default: None] override windowsize for validation
    • paddingtesting [default: None] override padding for testing
    • paddingvalidation [default: None] override padding for validation
    • testbatchsize [default: 1] batch size for testing
  • modelcls (cls) – Python class defining the model to evaluate
  • datacls (cls) – Python class implementing the data loading and storing
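For illustration, the following is a minimal sketch of how the modelcls, datacls and kw arguments might fit together. The model and data classes (MyModel, MyDataCollection) and all option values are placeholders chosen for this example, not values prescribed by mdgru.

# Hypothetical setup sketch; MyModel and MyDataCollection stand in for the
# actual model and data collection classes used in a given project.
from mdgru.eval import SupervisedEvaluation

kw = {
    "datapath": "/data/experiment",        # training/validation/testing folders live here
    "locationtraining": ["train"],         # relative to datapath
    "locationvalidation": ["val"],
    "locationtesting": ["test"],
    "output_dims": 2,                      # e.g. background and foreground
    "windowsize": [64, 64, 64],            # patch size for sampling
    "padding": [5],                        # patch overreach / overlap
    "batch_size": 1,
    # uncertainty estimation: evaluate each volume several times with dropconnect
    "evaluate_uncertainty_times": 10,
    "evaluate_uncertainty_dropout": 0.9,
}

evaluation = SupervisedEvaluation(MyModel, MyDataCollection, kw)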
_defaults = {'batch_size': {'value': 1, 'help': 'Minibatchsize', 'type': <class 'int'>, 'name': 'batchsize', 'short': 'b'}, 'datapath': {'help': 'path where training, validation and testing folders lie. Can also be some other path, as long as the other locations are provided as absolute paths. An experimentsfolder will be created in this folder, where all runs and checkpoint files will be saved.'}, 'dropout_rate': {'value': 0.5, 'help': '"keep rate" for weights using dropconnect. The higher the value, the closer the sampled models to the full model.'}, 'evaluate_uncertainty_dropout': {'value': 1.0, 'type': <class 'float'>, 'help': 'Keeprate of weights during evaluation. Useful to visualize uncertainty in conjunction with a number of samples per volume', 'name': 'dropout_during_evaluation'}, 'evaluate_uncertainty_saveall': {'value': False, 'help': 'Save each evaluation sample per volume. Without this flag, only the standard deviation and mean over all samples is kept.', 'name': 'save_individual_evaluations'}, 'evaluate_uncertainty_times': {'value': 1, 'type': <class 'int'>, 'help': 'Number times we want to evaluate one volume. This only makes sense using a keep rate of less than 1 during evaluation (dropout_during_evaluation less than 1)', 'name': 'number_of_evaluation_samples'}, 'locationtesting': {'value': None, 'help': 'absolute or relative path to datapath to the testing data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+'}, 'locationtraining': {'value': None, 'help': 'absolute or relative path to datapath to the training data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+'}, 'locationvalidation': {'value': None, 'help': 'absolute or relative path to datapath to the validation data. Either a list of paths to the sample folders or one path to a folder where samples should be automatically determined.', 'nargs': '+'}, 'namespace': {'value': 'default', 'help': 'override default model name (if no ckpt is provided). Probably not a good idea!', 'alt': ['modelname']}, 'only_save_labels': {'value': False, 'help': 'save only labels and no probability distributions'}, 'output_dims': {'help': 'number of output channels, e.g. number of classes the model needs to create a probability distribution over.', 'type': <class 'int'>, 'alt': ['nclasses']}, 'padding': {'help': 'padding to be used during training, validation and testing, if not specified otherwise. 
During training, the padding specifies the amount a patch is allowed to reach outside of the image along all dimensions, during testing, it specifies also the amount of overlap needed between patches.', 'value': [0], 'nargs': '+', 'short': 'p', 'type': <class 'int'>}, 'paddingtesting': {'value': None, 'help': 'override padding for testing', 'nargs': '+', 'type': <class 'int'>}, 'paddingvalidation': None, 'print_each': {'value': 1, 'help': 'print execution time and losses each # iterations', 'type': <class 'int'>}, 'show_cross_entropy': True, 'show_f05': True, 'show_f1': True, 'show_f2': True, 'show_l2': True, 'testbatchsize': {'value': 1, 'help': 'batchsize for testing'}, 'validate_same': {'value': True, 'help': 'always pick other random samples for validation!', 'invert_meaning': 'dont_'}, 'windowsize': {'type': <class 'int'>, 'short': 'w', 'help': 'window size to be used during training, validation and testing, if not specified otherwise', 'nargs': '+'}, 'windowsizetesting': {'value': None, 'help': 'override windowsize for testing', 'nargs': '+', 'type': <class 'int'>}, 'windowsizevalidation': None}
_load(f)[source]

Load the model stored at f in the current framework.

Parameters: f – location of the stored model
_predict(batch, dropout, testing)[source]

Predict on the given batch using keep rate dropout.

Parameters:
  • batch (ndarray) –
  • dropout (float) – keep rate for dropconnect
  • testing
Returns:

ndarray – prediction based on the data in batch

_predict_with_loss(batch, batchlabs)[source]

Predict on the given batch and return the loss with respect to the labels in batchlabs.

Parameters:
  • batch (image data) –
  • batchlabs (corresponding label data) –
Returns:

tuple of ndarray prediction and losses

_save(f)[source]

Save the model to file f in the current framework.

Parameters: f – location to save the model at
_set_session(sess, cachefolder)[source]
_train()[source]

Performs one training iteration in the respective framework and returns the loss(es).

add_summary_simple_value(text, value)[source]
get_globalstep()[source]

Return the number of iterations this model has been trained for.

Returns: int – iteration count
load(f)[source]

loads model at location f from disk

Parameters:f (str) – location of stored model
save(f)[source]

saves model to disk at location f

Parameters:f (str) – location to save model to
set_session(sess, cachefolder, train=False)[source]
test_all_available(batch_size=None, dc=None, return_results=False, dropout=None, testing=False)[source]

Completely evaluates each full image in the test set tps using grid sampling.

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • return_results (bool) – whether to return the results instead of storing them right away
  • dropout (float) – keep rate of dropconnect for inference
  • testing
Returns:

either a tuple of predictions and errors, or only the errors, depending on the return_results flag
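As a hedged usage sketch (assuming an evaluation instance set up as in the earlier example), a full-volume evaluation with uncertainty sampling might be invoked like this; only the parameters documented above are used, and the concrete values are illustrative.

# Grid-sample every full test image; with return_results=True the predictions
# are returned together with the errors instead of being stored right away.
predictions, errors = evaluation.test_all_available(
    batch_size=1,
    return_results=True,
    dropout=0.9,      # keep rate < 1 is only useful for uncertainty sampling
    testing=True,
)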

test_all_random(batch_size=None, dc=None, resample=True)[source]

Test random samples

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • resample (bool) – indicates whether new samples need to be drawn before evaluating
Returns:

tuple of loss and prediction ndarray

test_scores(pred, ref)[source]

Evaluates all selected scores between reference data ref and prediction pred.

Parameters:
  • pred (ndarray) – prediction, as probability distributions per pixel / voxel
  • ref (ndarray) – reference data, either as probability distributions per pixel / voxel or as a label map
train()[source]

Measures and logs the time spent on data sampling and on the training iteration.
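A rough sketch of how train() and test_all_random() could be combined into an outer loop; the iteration count and validation interval are arbitrary example values.

# Hypothetical outer loop: alternate timed training steps with validation
# on random samples.
for iteration in range(10000):
    evaluation.train()                                   # one sampling + training step
    if iteration % 500 == 0:
        val_loss, val_pred = evaluation.test_all_random(batch_size=1)
        print("iteration", iteration, "validation loss", val_loss)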

TensorFlow Backend

class mdgru.eval.tf.SupervisedEvaluationTensorflow(modelcls, datacls, kw)[source]

Bases: mdgru.eval.SupervisedEvaluation

Evaluation class for the TensorFlow backend

Parameters:
  • kw (dict) – dictionary containing the following options:
    • use_tensorboard [default: True] write summaries to TensorBoard (the inverted command line flag disables this)
    • image_summaries_each [default: 100] store image summaries in TensorBoard every # iterations
    • restore_optimistically [default: False] restore checkpoints optimistically, i.e. only restore variables present in both the checkpoint and the current graph (see _optimistic_restore)
    • only_cpu [default: False] only use the CPU
    • gpubound [default: 1.0] fraction of the GPU memory that may be used
  • modelcls (cls) – Python class defining the model to be evaluated
  • datacls (cls) – Python class defining data loading and saving
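A sketch of a TensorFlow-backend setup, reusing the base options from the earlier example and adding the backend-specific ones listed above; all values are illustrative assumptions.

# Hypothetical TensorFlow-backend setup; kw is the base options dict from the
# earlier sketch, MyModel and MyDataCollection remain placeholders.
from mdgru.eval.tf import SupervisedEvaluationTensorflow

tf_kw = dict(kw)
tf_kw.update({
    "use_tensorboard": True,          # write summaries (inverted CLI flag disables this)
    "image_summaries_each": 100,      # image summaries every 100 iterations
    "only_cpu": False,
    "gpubound": 0.5,                  # use at most half of the GPU memory
})
evaluation = SupervisedEvaluationTensorflow(MyModel, MyDataCollection, tf_kw)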
_defaults = {'gpubound': {'value': 1.0, 'name': 'gpu_bound_fraction', 'help': 'manage how much of the memory of the gpu can be used', 'type': <class 'float'>}, 'image_summaries_each': {'value': 100, 'help': 'Store image summaries in tensorboard every # iterations'}, 'only_cpu': {'value': False, 'help': 'Only use cpu'}, 'restore_optimistically': False, 'use_tensorboard': {'value': True, 'help': 'Dont use tensorboard', 'invert_meaning': 'dont_'}}
_load(f)[source]
_optimistic_restore(session, save_file)[source]
_predict(batch, dropout, testing)[source]
_predict_with_loss(batch, batchlabs)[source]
_save(f)[source]
_set_session(sess, cachefolder)
_train(batch, batchlabs)[source]
add_summary_simple_value(text, value)[source]
get_globalstep()[source]
load(f)

loads model at location f from disk

Parameters:f (str) – location of stored model
save(f)

saves model to disk at location f

Parameters:f (str) – location to save model to
set_session(sess, cachefolder, train=False)[source]
test_all_available(batch_size=None, dc=None, return_results=False, dropout=None, testing=False)

Completely evaluates each full image in the test set tps using grid sampling.

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • return_results (bool) – whether to return the results instead of storing them right away
  • dropout (float) – keep rate of dropconnect for inference
  • testing
Returns:

either a tuple of predictions and errors, or only the errors, depending on the return_results flag

test_all_random(batch_size=None, dc=None, resample=True)

Test random samples

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • resample (bool) – indicates whether new samples need to be drawn before evaluating
Returns:

tuple of loss and prediction ndarray

test_scores(pred, ref)

Evaluates all selected scores between reference data ref and prediction pred.

Parameters:
  • pred (ndarray) – prediction, as probability distributions per pixel / voxel
  • ref (ndarray) – reference data, either as probability distributions per pixel / voxel or as a label map
train()

Measures and logs the time spent on data sampling and on the training iteration.

PyTorch Backend

class mdgru.eval.torch.SupervisedEvaluationTorch(modelcls, datacls, kw)[source]

Bases: mdgru.eval.SupervisedEvaluation

Evaluation class for the PyTorch backend

Parameters:
  • kw (dict) – dictionary containing the options of the base class; this backend adds no further options
  • modelcls (cls) – Python class defining the model to be evaluated
  • datacls (cls) – Python class defining the loading and saving of the data being evaluated here
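Since this backend defines no additional options, a setup sketch only swaps the class; kw, MyModel and MyDataCollection are the placeholders from the earlier examples.

# Hypothetical PyTorch-backend setup; the base options dict is passed through unchanged.
from mdgru.eval.torch import SupervisedEvaluationTorch

evaluation = SupervisedEvaluationTorch(MyModel, MyDataCollection, kw)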
_defaults = {}
_load(f)[source]

Load model

_predict(batch, dropout, testing)[source]

Predict on batch using the model graph. Be careful: this method always returns results in NHWC or NDHWC format.
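Because the result is channels-last (NHWC or NDHWC), code that expects channels-first arrays has to transpose; a small NumPy sketch with a dummy shape:

import numpy as np

pred_nhwc = np.zeros((1, 64, 64, 2))        # dummy NHWC prediction (batch, H, W, classes)
pred_nchw = np.moveaxis(pred_nhwc, -1, 1)   # channels-first layout, shape (1, 2, 64, 64)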

_predict_with_loss(batch, batchlabs)[source]

Run the evaluation and calculate the loss.

_save(f)[source]

Save model

_set_session(sess, cachefolder)
_train(batch, batchlabs)[source]

Set the inputs and run one PyTorch training iteration.

add_summary_simple_value(text, value)
check_input(batch, batchlabs=None)[source]

Checks the correctness of the inputs and converts them to CUDA PyTorch tensors.

Parameters:
  • batch (ndarray) – input data to be moved to PyTorch
  • batchlabs (ndarray) – label information to be moved to PyTorch
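Conceptually, the conversion performed here resembles the following sketch (the exact dtype and device handling inside mdgru may differ):

import numpy as np
import torch

# Wrap numpy batches as float tensors and move them to the GPU if available.
def to_torch(batch, batchlabs=None):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch_t = torch.as_tensor(batch, dtype=torch.float32, device=device)
    labels_t = None
    if batchlabs is not None:
        labels_t = torch.as_tensor(batchlabs, dtype=torch.float32, device=device)
    return batch_t, labels_t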
get_globalstep()[source]
load(f)

loads model at location f from disk

Parameters:f (str) – location of stored model
save(f)

saves model to disk at location f

Parameters:f (str) – location to save model to
set_session(sess, cachefolder, train=False)
test_all_available(batch_size=None, dc=None, return_results=False, dropout=None, testing=False)

Completely evaluates each full image in the test set tps using grid sampling.

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • return_results (bool) – whether to return the results instead of storing them right away
  • dropout (float) – keep rate of dropconnect for inference
  • testing
Returns:

either a tuple of predictions and errors, or only the errors, depending on the return_results flag

test_all_random(batch_size=None, dc=None, resample=True)

Test random samples

Parameters:
  • batch_size (int) – minibatch size to compute on
  • dc (datacollection instance, optional) – datacollection to sample from
  • resample (bool) – indicates whether new samples need to be drawn before evaluating
Returns:

tuple of loss and prediction ndarray

test_scores(pred, ref)

Evaluates all selected scores between reference data ref and prediction pred.

Parameters:
  • pred (ndarray) – prediction, as probability distributions per pixel / voxel
  • ref (ndarray) – reference data, either as probability distributions per pixel / voxel or as a label map
train()

Measures and logs the time spent on data sampling and on the training iteration.