TensorFlow and deep learning without a PhD by Martin Görner

deep-learning notes

My notes from Martin Görner’s YouTube video of his Devoxx talk about neural networks and TensorFlow.

This 3-hour course (video + slides) offers developers a quick introduction to deep-learning fundamentals, with some TensorFlow thrown into the bargain.

TensorFlow and deep learning - without a PhD

Part 1 - slides

  • Softmax - a good activation function for multi-class logistic regression

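    Y = softmax(X · W + b), where · is matrix multiplication
    X: batch of images, one flattened image per row; W: weights; b: biases; Y: predictions (one row of class probabilities per image)
    Softmax is applied line by line, i.e. separately to each image's row of scores
    Hidden layers: ReLU outperforms sigmoid as an activation function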
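
Here is a minimal pure-Python sketch of Y = softmax(X · W + b) and of ReLU. Tiny made-up matrices stand in for real flattened images. In TensorFlow this would be a matrix multiplication followed by `tf.nn.softmax`, but plain lists keep every step visible. The helper name `dense_softmax` and the values of `X`, `W` and `b` are illustrative assumptions.

```python
import math

def softmax(row):
    """Softmax of one row: exponentiate, then divide by the sum of exponentials."""
    m = max(row)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def relu(v):
    """ReLU: keep positive values, replace negative ones with 0."""
    return max(0.0, v)

def dense_softmax(X, W, b):
    """Y = softmax(X * W + b), with softmax applied to each row (one row per image)."""
    Y = []
    for x in X:  # x: one flattened image
        logits = [sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
                  for j in range(len(b))]
        Y.append(softmax(logits))
    return Y

# Toy example (made-up numbers): 2 "images" of 3 pixels each, 2 classes.
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
W = [[0.5, -0.5],
     [1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 0.1]
Y = dense_softmax(X, W, b)  # each row of Y sums to 1
```

Each row of `Y` is a probability distribution over the classes. The predicted class for an image is the index of the largest value in its row.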

  • Loss function

    For classification problems, cross-entropy works a bit better than a distance-based loss such as mean squared error
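
The sketch below computes cross-entropy for a single example, assuming a one-hot label and softmax probabilities. Mean squared error is included for comparison. The helper names are hypothetical.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities: -sum(y_true * log(y_pred))."""
    # eps guards against log(0) when a predicted probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def mean_squared_error(y_true, y_pred):
    """Mean of squared differences, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

label = [0.0, 1.0, 0.0]               # one-hot: the correct class is 1
confident_wrong = [0.98, 0.01, 0.01]  # the model is almost sure of class 0

ce = cross_entropy(label, confident_wrong)        # -log(0.01), about 4.6
mse = mean_squared_error(label, confident_wrong)  # about 0.65
```

Cross-entropy grows without bound as the probability given to the correct class approaches zero. With three classes, the mean squared error can never exceed 2/3. This is one common explanation for why cross-entropy trains classifiers better, but it is an interpretation, not a claim from the talk.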

  • Optimization

    Gradient descent works better with mini-batches: the gradient averaged over a batch of examples points more reliably toward a lower loss than the gradient of a single example
    If the accuracy curve is noisy, jumping up and down by about 1%, the learning rate is probably too high
    Start with a larger learning rate, e.g. 0.003, and decay it later toward 0.0001
    Epoch: one pass over all your training data (all batches)
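
This sketch of the optimization advice above rests on three assumptions:

- The learning rate decays exponentially from 0.003 toward 0.0001. The `decay_speed` of 2000 steps is an assumed value.
- The data is reshuffled into mini-batches every epoch.
- A one-weight linear model fitted with squared error stands in for a real network.

```python
import math
import random

def learning_rate(step, lr_max=0.003, lr_min=0.0001, decay_speed=2000.0):
    """Exponential decay from lr_max toward lr_min; decay_speed is an assumed constant."""
    return lr_min + (lr_max - lr_min) * math.exp(-step / decay_speed)

def minibatches(data, batch_size, rng):
    """Shuffle the data and yield mini-batches; consuming them all is one epoch."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]

def train_linear(xs, ys, epochs=5, batch_size=10, seed=0):
    """Mini-batch gradient descent for y ≈ w * x, a toy stand-in for a real network."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, step = 0.0, 0
    for _ in range(epochs):  # one epoch = every batch seen once
        for batch in minibatches(data, batch_size, rng):
            # gradient of the mean squared error over the batch, with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate(step) * grad
            step += 1
    return w

xs = [i / 10 for i in range(1, 101)]
w = train_linear(xs, [3 * x for x in xs])  # should end up close to 3
```

Because the rate decays smoothly, early steps make fast progress and later steps make small corrections. That is the behaviour the "start high, decrease later" advice is after.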

  • Overfitting

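    Overfitting happens when the network has too many degrees of freedom (too many weights and biases) and stores the training data there in some form instead of learning general patterns
    The model then works great on the training data but fails miserably on test data it has not seen
    If the test cross-entropy loss starts slowly increasing while the training loss keeps going down, the model is probably overfitting
    A good remedy for overfitting is regularization, for example dropout
    Dropout randomly removes a fraction of the neurons at each training step: each neuron is kept with probability pKeep (e.g. pKeep = 0.75), and all neurons are kept (pKeep = 1) when evaluating on test data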
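
Below is a minimal sketch of dropout on one layer's activations. It assumes the common "inverted dropout" variant, which is also what `tf.nn.dropout` does. During training, surviving activations are scaled by 1/pkeep, so nothing needs rescaling at test time. The `dropout` helper and its `training` flag are hypothetical.

```python
import random

def dropout(activations, pkeep, rng, training=True):
    """Inverted dropout: keep each neuron with probability pkeep and scale survivors by 1/pkeep."""
    if not training or pkeep >= 1.0:
        return list(activations)  # at test time every neuron is kept, unscaled
    return [a / pkeep if rng.random() < pkeep else 0.0 for a in activations]

rng = random.Random(42)
layer = [1.0] * 1000
train_out = dropout(layer, pkeep=0.75, rng=rng)                 # about 25% zeroed
test_out = dropout(layer, pkeep=0.75, rng=rng, training=False)  # unchanged
```

The 1/pkeep scaling keeps the expected value of each activation the same in training and at test time.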

  • CNN Convolutional Neural Networks

    Good for 2D data such as images
    In the previous example, each image was flattened into a 1D array of pixels, which loses the 2D shape information
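
This sketch shows how a convolution keeps the 2D structure. It is a single-channel "valid" convolution (stride 1, no padding) in plain Python, with a made-up image and a made-up vertical-edge kernel. Like most deep-learning libraries, it does not flip the kernel, so it is technically a cross-correlation.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution, stride 1, no padding, kernel not flipped (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Each output value looks at a kh x kw neighbourhood of pixels, in 2D.
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1],
                 [-1, 1]]
feature_map = conv2d(image, vertical_edge)
```

The kernel responds (value 2) only where the dark-to-bright vertical edge sits. This neighbourhood pattern is much harder to express once the image is flattened into a 1D array.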

Part 2 - slides

  • Batch normalization

    Batch normalization aims to make network training faster and more stable
    The idea is to normalize each layer's inputs over the current mini-batch so that they have a mean of zero and a standard deviation of one, then apply a learnable scale and offset
    Batch normalization is applied before the activation function
    When you use batch normalization, the bias is no longer needed: subtracting the batch mean cancels any constant bias, and the learnable offset takes over its role
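
The next sketch applies batch normalization to one neuron's pre-activations across a mini-batch, followed by a ReLU. It assumes fixed values for the learnable scale `gamma` and offset `beta`, plus a small `eps` for numerical stability. It also checks the point about the bias: shifting every input by a constant leaves the output unchanged.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's pre-activations over a mini-batch, then scale and shift.

    gamma (scale) and beta (offset) would be learned in a real network; here they are fixed.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(batch)                   # mean 0, standard deviation about 1
with_bias = batch_norm([v + 10.0 for v in batch])  # a constant bias is cancelled out
activated = [max(0.0, v) for v in normalized]      # batch norm first, then ReLU
```

At inference time, real implementations replace the batch statistics with running averages collected during training. That part is omitted here.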

  • RNN Recurrent Neural Network

    Good for sequences, for example predicting the next word in a text
    Unrolled over time, RNNs are very deep: one layer per time step, all sharing the same weights
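
Here is a sketch of a vanilla RNN with a scalar hidden state, unrolled over a five-step sequence. Real cells use vectors and weight matrices. The weights `w_x`, `w_h` and `b` are hypothetical values. Each step applies the same cell, so the unrolled network is as deep as the sequence is long.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step of a vanilla RNN cell: h = tanh(w_x * x + w_h * h_prev + b)."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x, w_h, b, h0=0.0):
    """Unroll the cell over the sequence: one 'layer' per time step, same weights each time."""
    states = []
    h = h0
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero; watch how its trace fades step after step.
states = run_rnn([1.0, 0.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

With these weights, the signal from the first input shrinks at every step. This is a small illustration of how a plain RNN can wash out old information.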

  • LSTM Long Short-Term Memory networks

    Designed to track long-term dependencies, which a plain RNN tends to forget
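
This sketch shows one LSTM time step with a scalar state, following the standard forget/input/output gate equations. The weight dictionary `p` and its values are hypothetical. Setting the forget-gate bias high and the input-gate bias low shows how the memory cell can carry a value unchanged across many steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar state; p holds hypothetical weights per gate."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much old memory to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much new content to write
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much memory to expose
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate new content
    c = f * c_prev + i * g  # additive memory update lets information survive many steps
    h = o * math.tanh(c)
    return h, c

params = {k: 0.0 for k in ["wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug", "bo", "bg"]}
params.update(bf=20.0, bi=-20.0)  # forget gate ≈ 1 (keep), input gate ≈ 0 (don't overwrite)

h, c = 0.0, 1.0
for x in [0.3, -1.2, 0.8] * 30:  # 90 time steps
    h, c = lstm_step(x, h, c, params)
```

After 90 steps the memory cell `c` still holds almost exactly its initial value of 1.0.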

  • GRU Gated Recurrent Unit networks

    GRUs are an improved version of the standard recurrent neural network
    What makes them special is that they can be trained to keep information from long ago without it washing out over time, and to discard information that is irrelevant to the prediction
    The GRU controls the flow of information like the LSTM unit, but without a separate memory cell; it exposes its full hidden state at every step, with no output gate controlling it
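
Below is a sketch of one GRU time step with a scalar state, with hypothetical weights in `p`. It uses the convention h = (1 - z) * h_prev + z * h_candidate. Some references swap the roles of z and 1 - z, which is equivalent up to relabeling the gate. There is no memory cell and no output gate: the new hidden state is returned as-is.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU time step with scalar state; p holds hypothetical weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate: how much to replace
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate: how much past to use
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])  # candidate state
    # No separate memory cell and no output gate: the new state is exposed directly.
    return (1.0 - z) * h_prev + z * h_tilde

p_keep = {k: 0.0 for k in ["wz", "uz", "wr", "ur", "br", "wh", "uh", "bh"]}
p_keep["bz"] = -20.0  # update gate ≈ 0: keep information from long ago

h = 0.9
for x in [0.3, -1.2, 0.8] * 30:  # 90 time steps
    h = gru_step(x, h, p_keep)

p_follow = dict(p_keep, bz=20.0, wh=1.0)  # update gate ≈ 1: replace the state with the candidate
h_new = gru_step(0.5, 0.9, p_follow)
```

The same cell can either hold a value for many steps or overwrite it immediately, depending only on the learned update gate.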