Week 3: Shallow neural networks

Neural network overview

Up until this point, we have used logistic regression as a stand-in for neural networks. The "network" we have been describing looked like:

\(a\) and \(\hat y\) are used interchangeably

A neural network looks something like this:

When talking about neural networks, we typically don't draw \(z\) and \(a\) as separate steps: one neuron performs both the linear computation \(z\) and the activation \(a\).

We will introduce the notation of superscripting values with \(^{[l]}\), where \(l\) refers to the layer of the neural network that we are talking about.

Not to be confused with \(^{(i)}\) which we use to refer to a single input example \(i\) .

The key intuition is that neural networks stack layers of activations, each computed from the previous layer's outputs multiplied by weights.

Similar to the 'backwards' step that we discussed for logistic regression, we will explore the backward steps that make learning in a neural network possible.

Neural network Representation

This is the canonical representation of a neural network

neural_network_basics.png

On the left, we have the input features stacked vertically. This constitutes our input layer . The final layer is called the output layer and it is responsible for generating the predicted value \(\hat y\) . Any layer in between these two layers is known as a hidden layer . This name derives from the fact that the true values of these hidden units are not observed in the training set.

The hidden layers and output layers have parameters associated with them. These parameters are denoted \(W^{[l]}\) and \(b^{[l]}\) for layer \(l\) .

Previously, we were referring to our input examples as \(x^{(i)}\) and organizing them in a design matrix \(X\) . With neural networks, we will introduce the convention of denoting output values of a layer \(l\), as a column vector \(a^{[l]}\), where \(a\) stands for activation . You can also think of these as the values a layer \(l\) passes on to the next layer.

Another note: the network shown above is a 2-layer neural network. We typically do not count the input layer. In light of this, we usually denote the input layer as \(l=0\).

Computing a Neural Network's Output

We will use the example of a single hidden layer neural network to demonstrate the forward propagation of inputs through the network, leading to the network's output.

We can think of each unit in the neural network as performing two steps: the multiplication of inputs by weights plus the addition of a bias , and the activation of the resulting value.

unit_breakdown.png

Recall that we will use a superscript \(^{[l]}\) to denote values belonging to the \(l^{th}\) layer.

So, the \(j^{th}\) node of the \(l^{th}\) layer performs the computation

\[ a_j^{[l]} = \sigma(w_j^{[l]^T}a^{[l-1]} + b_j^{[l]}) \]

Where \(a^{[l-1]}\) are the activation values from the previous layer; for the first hidden layer, \(a^{[0]} = x\) for some input \(x\). With this notation, we can draw our neural network as follows:

new_notation_nn.png

In order to easily vectorize the computations we need to perform, we designate a matrix \(W^{[l]}\) for each layer \(l\), which has dimensions (number of units in the current layer \(\times\) number of units in the previous layer).

We can vectorize the computation of \(z^{[l]}\) as follows:

vectorized_z_nn.png

And the computation of \(a^{[l]}\) just becomes the element-wise application of the sigmoid function:

vectorized_a_nn.png

We can put it all together for our two layer neural network, and outline all the computations using our new notation:

putting_it_all_together_new_notation.png
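As a concrete illustration, here is a minimal numpy sketch of this two-layer forward pass for a single input example (the layer sizes and random weights are made up for the example, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up sizes: 3 input features, 4 hidden units, 1 output unit.
n_x, n_1, n_2 = 3, 4, 1
rng = np.random.default_rng(0)

x = rng.normal(size=(n_x, 1))             # single input example, shape (n_x, 1)
W1 = rng.normal(size=(n_1, n_x)) * 0.01   # (units in layer 1) x (units in layer 0)
b1 = np.zeros((n_1, 1))
W2 = rng.normal(size=(n_2, n_1)) * 0.01
b2 = np.zeros((n_2, 1))

z1 = W1 @ x + b1        # (n_1, 1)
a1 = sigmoid(z1)        # activations of the hidden layer
z2 = W2 @ a1 + b2       # (n_2, 1)
a2 = sigmoid(z2)        # y_hat, the network's prediction
```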

Vectorizing across multiple examples

In the last video, we saw how to compute the prediction for a neural network with a single input example. In this video, we introduce a vectorized approach to compute predictions for many input examples.

We have seen how to take a single input example \(x\) and compute \(a^{[2]} = \hat y\) for a 2-layer neural network. If we have \(m\) training examples, we can use a vectorized approach to compute all \(m\) predictions.

First, let's introduce a new notation. The activation values of layer \(l\) for input example \(i\) are:

\[ a^{[l] (i)} \]

The \(m\) predictions of our 2-layer network are therefore computed in the following way:

m_examples_nn.png

Recall that \(X\) is a \((n_x, m)\) design matrix, where each column is a single input example and \(W^{[l]}\) is a matrix where each row is the transpose of the parameter column vector for layer \(l\).

Thus, we can now compute the activation of a layer in the neural network for all training examples:

\[Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}\] \[A^{[l]} = \sigma(Z^{[l]})\]

Where \(A^{[0]} = X\).

As an example, the result of a matrix multiplication of \(W^{[1]}\) by \(X\) is a matrix with dimensions \((j, m)\) where \(j\) is the number of units in layer \(1\) and \(m\) is the number of input examples

WX_vector.jpg

\(A^{[l]}\) is therefore a matrix of dimensions (size of layer \(l\) \(\times\) \(m\)). The top-leftmost value is the activation of the first unit in layer \(l\) for the first input example, and the bottom-rightmost value is the activation of the last unit in layer \(l\) for the last input example \(m\).

vectorized_activations.png

Activation Functions

So far, we have been using the sigmoid activation function

\[\sigma(z) = \frac{1}{1 + e^{-z}}\]

It turns out there are much better options.

The hyperbolic tangent function is a non-linear activation function that almost always works better than the sigmoid function.

\[\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}\]

The tanh function is really just a shifted and rescaled version of the sigmoid function, so that it crosses through the origin.

The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer.

The single exception where sigmoid outperforms tanh is when it's used in the output layer. In this case, it can be more desirable to scale our outputs from \(0\) to \(1\) (particularly in classification, when we want to output the probability that something belongs to a certain class). Indeed, we often mix activation functions in neural networks, and denote them:

\[g^{[l]}(z)\]

Where \(l\) denotes the layer the activation function belongs to.

If \(z\) is either very large, or very small, the derivative of both the tanh and sigmoid functions becomes very small, and this can slow down learning.

The rectified linear unit (ReLU) activation function avoids the vanishing gradient problem faced by the tanh and sigmoid activation functions. In practice, it also leads to faster learning.

\[\text{ReLU}(z) = \max(0, z)\]

Note: the derivative at exactly \(0\) is not well-defined. In practice, we can simply set it to \(0\) or \(1\); it matters little, since a floating-point value is extremely unlikely to be exactly \(0\).

One disadvantage of ReLU is that the derivative is equal to \(0\) when \(z\) is negative. Leaky ReLUs aim to solve this problem with a small non-zero slope for values of \(z<0\).

\[\text{LeakyReLU}(z) = \max(0.01z, z)\]


Sometimes, the \(0.01\) value is treated as an adaptive parameter of the learning algorithm. Leaky ReLUs solve a more general problem of " dead neurons ". However, they are not used as much in practice.

Rules of thumb for choosing activation functions

  • If your output is a 0/1 value , i.e., you are performing binary classification, the sigmoid activation is a natural choice for the output layer.
  • For all other units , the ReLU is increasingly the default choice of activation function.

Why do you need non-linear activation functions?

We could imagine using some linear activation function, \(g(z) = z\), in place of the non-linear activation functions we have been using so far. Why is this a bad idea? Let's illustrate the explanation using our simple neural network.

For this linear activation function, the activations of our simple network become:

\[z^{[1]} = W^{[1]}x + b^{[1]}\] \[a^{[1]} = z^{[1]}\] \[z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}\] \[a^{[2]} = z^{[2]}\]

From which we can show that,

\[a^{[2]} = (W^{[2]}W^{[1]})x + (W^{[2]}b^{[1]} + b^{[2]})\] \[a^{[2]} = W'x + b' \text{, where } W' = W^{[2]}W^{[1]} \text{ and } b' = W^{[2]}b^{[1]} + b^{[2]}\]

Therefore, in the case of a linear activation function , the neural network is outputting a linear function of the inputs , no matter how many hidden layers!

There are (maybe) two cases in which you may actually want to use a linear activation function.

  • The output layer of a network used to perform regression, where we want \(\hat y\) to be a real-valued number, \(\hat y \in \mathbb R\)
  • Extremely specific cases pertaining to compression.

Derivatives of activation functions

When performing back-propagation on a network, we need to compute the derivatives of the activation functions. Let's take a look at our activation functions and their derivatives.

sigmoid_deriv.png

The derivative of \(g(z)\), \(g'(z)\), is:

\[\frac{d}{dz}g(z) = \frac{1}{1 + e^{-z}}(1 - \frac{1}{1 + e^{-z}})= g(z)(1-g(z)) = a(1-a)\]

We can sanity check this by inputting very large, or very small values of \(z\) into our derivative formula and inspecting the size of the outputs.

Notice that if we have already computed the value of \(a\), we can very cheaply compute the value of \(g'(z)\).

tanh_deriv.png

\[\frac{d}{dz}g(z) = 1 - (\tanh(z))^2\]

Again, we can sanity check this by inspecting that the outputs for different values of \(z\) match our intuition about the activation function.

relu_deriv.png

The derivative of \(g(z)\), \(g'(z)\), is:

\[\frac{d}{dz}g(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z > 0 \\ \text{undefined} & \text{if } z = 0 \end{cases}\]

If \(z = 0\), we typically default to setting \(g'(z)\) to either \(0\) or \(1\). In practice this matters little.
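For reference, here is a minimal numpy sketch of these activation functions and their derivatives (the ReLU derivative at \(z = 0\) is arbitrarily set to \(0\), as discussed above):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    a = sigmoid(z)
    return a * (1 - a)          # g'(z) = a(1 - a)

def tanh_derivative(z):
    a = np.tanh(z)
    return 1 - a ** 2           # g'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    # 0 for z < 0, 1 for z > 0; we arbitrarily use 0 at z == 0
    return (z > 0).astype(float)

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)
```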

Gradient descent for Neural Networks

Let's implement gradient descent for our simple 2-layer neural network.

Recall, our parameters are \(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}\). We have \(n_x = n^{[0]}\) input features, \(n^{[1]}\) hidden units, and \(n^{[2]}\) output units.

Thus our dimensions:

  • \(W^{[1]}\) : (\(n^{[1]}, n^{[0]}\))
  • \(b^{[1]}\) : (\(n^{[1]}, 1\))
  • \(W^{[2]}\) : (\(n^{[2]}, n^{[1]}\))
  • \(b^{[2]}\) : (\(n^{[2]}, 1\))

Our cost function is: \(J(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}) = \frac{1}{m}\sum_{i=1}^m \ell(\hat y^{(i)}, y^{(i)})\)

We are assuming binary classification.

Gradient Descent sketch

  • Initialize parameters randomly
  • compute predictions \(\hat y^{(i)}\) for \(i = 1 ,..., m\)
  • \(dW^{[1]} = \frac{\partial J}{\partial W^{[1]}}, db^{[1]} = \frac{\partial J}{\partial b^{[1]}}, ...\)
  • \(W^{[1]} = W^{[1]} - \alpha dW^{[1]}, ...\)
  • \(b^{[1]} = b^{[1]} - \alpha db^{[1]}, ...\)

The key to gradient descent is the computation of the derivatives \(\frac{\partial J}{\partial W^{[l]}}\) and \(\frac{\partial J}{\partial b^{[l]}}\) for all layers \(l\).

Formulas for computing derivatives

We are going to simply present the formulas you need, and defer their explanation to the next video. Recall the computation graph for our 2-layered neural network:

nn_overview_graph.png

And the vectorized implementation of our computations in our forward propagation

1. \[Z^{[1]} = W^{[1]}X + b^{[1]}\]
2. \[A^{[1]} = g^{[1]}(Z^{[1]})\]
3. \[Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}\]
4. \[A^{[2]} = g^{[2]}(Z^{[2]}) = \sigma(Z^{[2]})\]

Where \(g^{[2]}\) would likely be the sigmoid function if we are doing binary classification.

Now we list the computations for our backward propagation

1. \[ dZ^{[2]} = A^{[2]} - Y \]
2. \[ dW^{[2]} = \frac{1}{m}dZ^{[2]}A^{[1]T} \]

The transpose of \(A^{[1]}\) accounts for the fact that \(W^{[2]}\) is composed of transposed column vectors of parameters.

3. \[db^{[2]} = \frac{1}{m}np.sum(dZ^{[2]}, axis=1, keepdims=True)\]

Where \(Y = [y^{(1)}, ..., y^{(m)}]\). The keepdims argument prevents numpy from returning a rank 1 array of shape \((n,)\).

4. \[dZ^{[1]} = W^{[2]T}dZ^{[2]} \odot g^{[1]\prime}(Z^{[1]})\]

Where \(\odot\) is the element-wise product. Note: this is a collapse of \(dZ\) and \(dA\) computations.

5. \[dW^{[1]} = \frac{1}{m} dZ^{[1]}X^T\]
6. \[db^{[1]} = \frac{1}{m}np.sum(dZ^{[1]}, axis=1, keepdims=True)\]
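Putting the forward and backward formulas together, the following is a minimal numpy sketch of one gradient descent step for this 2-layer network. It assumes a tanh hidden layer and a sigmoid output; the function and variable names are my own, not those of any particular assignment.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent_step(X, Y, W1, b1, W2, b2, alpha=0.01):
    """One step of gradient descent for a 2-layer network (tanh hidden layer, sigmoid output assumed)."""
    m = X.shape[1]

    # Forward propagation
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)                    # g^[1] = tanh (an assumption for this sketch)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)                    # g^[2] = sigmoid

    # Backward propagation
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)  # element-wise product with tanh'(Z1)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # Parameter updates
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2
```

Repeating this step many times, starting from randomly initialized weights (see below), is the whole training loop.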

Random initialization

When you train your neural network, it is important to initialize your parameters randomly . With logistic regression, we were able to initialize our weights to zero because the cost function was convex. We will see that this will not work with neural networks.

Let's take the following network as an example:

super_simple_network.png

Let's say we initialize our parameters as follows:

\(W^{[1]} = \begin{bmatrix}0 & 0 \\ 0 & 0 \end{bmatrix}\), \(b^{[1]} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\), \(W^{[2]} = \begin{bmatrix} 0 & 0 \end{bmatrix}\), \(b^{[2]} = 0\)

It turns out that initializing the bias \(b\) with zeros is OK.

The problem with this initialization is that the two hidden units compute exactly the same function. For any input example, hidden units \(1\) and \(2\) satisfy

\[a^{[1]}_1 = a^{[1]}_2\]

\[dz^{[1]}_1 = dz^{[1]}_2\]

Thus, \(dW^{[1]}\) will be some matrix \(\begin{bmatrix}u & v \\ u & v\end{bmatrix}\) with identical rows, and so the updates to the two rows of \(W^{[1]}\) will be identical.

Note we are referring to our single hidden layer \(^{[1]}\) but this would apply to any hidden layer of any fully-connected network, no matter how large.

Using a proof by induction , it is actually possible to show that after any number of rounds of training the two hidden units are still computing identical functions . This is often called the symmetry breaking problem (zero initialization fails to break the symmetry between units).

The solution to this problem is to initialize parameters randomly . Here's an example of how to do that with numpy:

  • \(W^{[1]}\) = np.random.randn(2,2) * 0.01
  • \(W^{[2]}\) = np.random.randn(1,2) * 0.01
  • \(b^{[1]}\) = np.zeros((2,1))
  • \(b^{[2]}\) = 0

np.random.randn generates small, Gaussian random values. In next week's material, we will talk about how and when you might choose a different factor than \(0.01\) for initialization.

It turns out that \(b\) does not have this symmetry problem, because as long as the hidden units are computing different functions, the network will converge on different values of \(b\), and so it is fine to initialize it to zeros.

Why do we initialize to small values?

For a sigmoid-like activation function, large parameter weights (positive or negative) will make it more likely that \(z\) is very large (positive or negative) and thus \(dz\) will approach \(0\), slowing down learning dramatically .

Note this is less of an issue when using ReLUs; however, many classification problems use sigmoid activations in their output layer.


Week 3: Exploring Overfitting in NLP

Welcome to this assignment! During this week you saw different ways to handle sequence-like data. You saw how some Keras' layers such as GRU , Conv and LSTM can be used to tackle problems in this space. Now you will put this knowledge into practice by creating a model architecture that does not overfit.

For this assignment you will be using a variation of the Sentiment140 dataset , which contains 1.6 million tweets alongside their respective sentiment (0 for negative and 4 for positive).

You will also need to create helper functions, very similar to the ones you coded in previous assignments, to pre-process the data and to tokenize sentences. However, the objective of the assignment is to find a model architecture that will not overfit.

Let's get started!

NOTE: To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this notebook . Please only put your solutions in between the ### START CODE HERE and ### END CODE HERE code comments, and also refrain from adding any new cells. Once you have passed this assignment and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook.

Defining some useful global variables

Next you will define some global variables that will be used throughout the assignment.

EMBEDDING_DIM : Dimension of the dense embedding, will be used in the embedding layer of the model. Defaults to 100.

MAXLEN : Maximum length of all sequences. Defaults to 16.

TRUNCATING : Truncating strategy (truncate either before or after each sequence.). Defaults to 'post'.

PADDING : Padding strategy (pad either before or after each sequence.). Defaults to 'post'.

OOV_TOKEN : Token to replace out-of-vocabulary words during text_to_sequence calls. Defaults to "<OOV>".

MAX_EXAMPLES : Max number of examples to use. Defaults to 160000 (10% of the original number of examples)

TRAINING_SPLIT : Proportion of data used for training. Defaults to 0.9

For now leave them unchanged but after submitting your assignment for grading you are encouraged to come back here and play with these parameters to see the impact they have in the classification process.
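For reference, a minimal sketch of how these globals might be defined in the notebook; the values come straight from the defaults listed above.

```python
EMBEDDING_DIM = 100        # dimension of the dense embedding
MAXLEN = 16                # maximum length of all sequences
TRUNCATING = 'post'        # truncate after each sequence
PADDING = 'post'           # pad after each sequence
OOV_TOKEN = "<OOV>"        # token for out-of-vocabulary words
MAX_EXAMPLES = 160000      # 10% of the original 1.6 million tweets
TRAINING_SPLIT = 0.9       # proportion of data used for training
```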

Explore the dataset

The dataset is provided in a csv file.

Each row of this file contains the following values separated by commas:

target: the polarity of the tweet (0 = negative, 4 = positive)

ids: The id of the tweet

date: the date of the tweet

flag: The query. If there is no query, then this value is NO_QUERY.

user: the user that tweeted

text: the text of the tweet

Take a look at the first two examples:

Notice that this file does not have a header so you won't need to skip the first row when parsing the file.

For the task at hand you will only need the information of the target and the text, which are the first and last element of each row.

Parsing the raw data

Now you need to read the data from the csv file. To do so, complete the parse_data_from_file function.

A couple of things to note:

You should NOT omit the first line as the file does not contain headers.

There is no need to save the data points as numpy arrays; regular lists are fine.

To read from csv files use csv.reader by passing the appropriate arguments.

csv.reader returns an iterable that returns each row in every iteration. So the label can be accessed via row[0] and the text via row[5] .

The labels are originally encoded as strings ('0' representing negative and '4' representing positive). You need to change this so that the labels are integers and 0 is used for representing negative, while 1 should represent positive.
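A hedged sketch of what parse_data_from_file could look like, based purely on the notes above (the exact signature in the notebook may differ):

```python
import csv

def parse_data_from_file(filename):
    """Read tweets and labels from the csv file (there is no header row to skip)."""
    sentences = []
    labels = []
    with open(filename, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            labels.append(0 if row[0] == '0' else 1)  # '0' -> 0 (negative), '4' -> 1 (positive)
            sentences.append(row[5])
    return sentences, labels
```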

Expected Output:

You might have noticed that this dataset contains a lot of examples. In order to keep the execution time of this assignment low, you will be using only 10% of the original data. The next cell does this while also randomizing the data points that will be used:

Training - Validation Split

Now you will code the train_val_split , which given the list of sentences, the list of labels and the proportion of data for the training set, should return the training and validation sentences and labels:
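One possible implementation, assuming the function receives the sentences, the labels, and the training proportion, and splits at the corresponding index:

```python
def train_val_split(sentences, labels, training_split):
    """Split sentences and labels into training and validation sets."""
    train_size = int(len(sentences) * training_split)

    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    val_sentences = sentences[train_size:]
    val_labels = labels[train_size:]

    return train_sentences, val_sentences, train_labels, val_labels
```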

Tokenization - Sequences, truncating and padding

Now that you have sets for training and validation it is time for you to begin the tokenization process.

Begin by completing the fit_tokenizer function below. This function should return a Tokenizer that has been fitted to the training sentences.

Remember that the pad_sequences function returns numpy arrays, so your training and validation sequences are already in this format.

However, the labels are still Python lists. Before going forward you should convert them to numpy arrays as well. You can do this by running the following cell:
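A sketch of this tokenization step under those assumptions: a Keras Tokenizer fitted on the training sentences, sequences padded and truncated with the globals defined earlier, and labels converted to numpy arrays. The helper name seq_pad_and_trunc is an assumption for illustration.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def fit_tokenizer(train_sentences, oov_token):
    """Return a Tokenizer fitted on the training sentences."""
    tokenizer = Tokenizer(oov_token=oov_token)
    tokenizer.fit_on_texts(train_sentences)
    return tokenizer

def seq_pad_and_trunc(sentences, tokenizer, padding, truncating, maxlen):
    """Convert sentences to padded/truncated sequences (pad_sequences returns a numpy array)."""
    sequences = tokenizer.texts_to_sequences(sentences)
    return pad_sequences(sequences, maxlen=maxlen, padding=padding, truncating=truncating)

# The labels are still Python lists, so convert them to numpy arrays as well:
# train_labels = np.array(train_labels)
# val_labels = np.array(val_labels)
```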

Using pre-defined Embeddings

This time you will not be learning embeddings from your data but you will be using pre-trained word vectors.

In particular you will be using the 100 dimension version of GloVe from Stanford.

Now you have access to GloVe's pre-trained word vectors. Isn't that cool?

Let's take a look at the vector for the word dog :

Feel free to change the test_word to see the vector representation of any word you can think of.

Also, notice that the dimension of each vector is 100. You can easily double check this by running the following cell:
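A hedged sketch of loading the pre-trained vectors into a dictionary; the file path glove.6B.100d.txt is an assumption, since the notebook may provide the file differently:

```python
import numpy as np

GLOVE_FILE = './glove.6B.100d.txt'   # assumed location of the 100-dimension GloVe vectors

GLOVE_EMBEDDINGS = {}
with open(GLOVE_FILE) as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        GLOVE_EMBEDDINGS[word] = coefs

test_word = 'dog'
print(GLOVE_EMBEDDINGS[test_word])        # 100-dimensional vector for "dog"
print(len(GLOVE_EMBEDDINGS[test_word]))   # 100
```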

Represent the words in your vocabulary using the embeddings

Save the vector representation of each word in the vocabulary in a numpy array.

A couple of things to notice:

If a word in your vocabulary is not present in GLOVE_EMBEDDINGS the representation for that word is left as a column of zeros.

word_index starts counting at 1; because of this you will need an extra slot at index 0 of the EMBEDDINGS_MATRIX array. This is the reason why you add 1 to VOCAB_SIZE in the cell below:
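A sketch of building EMBEDDINGS_MATRIX, reusing the tokenizer and the GLOVE_EMBEDDINGS dictionary from the sketches above (words missing from GLOVE_EMBEDDINGS keep their row of zeros, and index 0 is the extra unused slot):

```python
import numpy as np

word_index = tokenizer.word_index      # from the fitted Tokenizer
VOCAB_SIZE = len(word_index)

# +1 because word_index starts counting at 1, so index 0 is left unused.
EMBEDDINGS_MATRIX = np.zeros((VOCAB_SIZE + 1, EMBEDDING_DIM))

for word, i in word_index.items():
    embedding_vector = GLOVE_EMBEDDINGS.get(word)
    if embedding_vector is not None:
        EMBEDDINGS_MATRIX[i] = embedding_vector
```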

Now you have the pre-trained embeddings ready to use!

Define a model that does not overfit

Now you need to define a model that will handle the problem at hand while not overfitting.

A couple of things to note / hints:

The first layer is provided so you can see how the Embedding layer is configured when using pre-trained embeddings

You can try different combinations of layers covered in previous ungraded labs such as:

GlobalMaxPooling1D

MaxPooling1D

Bidirectional(LSTM)

The last two layers should be Dense layers.

There are multiple ways of solving this problem. So try an architecture that you think will not overfit.

Try simpler architectures first to avoid long training times. Architectures that are able to solve this problem usually have around 3-4 layers (excluding the last two Dense ones)

Include at least one Dropout layer to mitigate overfitting.
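One architecture sketch that follows these hints; it is not the official solution, and the layer sizes and dropout rate are arbitrary choices:

```python
import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    """Example: frozen pre-trained embeddings, a small conv/pool stack, dropout, two Dense layers."""
    model = tf.keras.Sequential([
        # Provided layer: pre-trained embeddings, not trainable
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, input_length=maxlen,
                                  weights=[embeddings_matrix], trainable=False),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Conv1D(64, 5, activation='relu'),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
```

A small Conv1D plus GlobalMaxPooling1D stack keeps the model compact, which helps with both training time and overfitting.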

To pass this assignment your val_loss (validation loss) should either be flat or decreasing.

Although a flat val_loss and a lowering train_loss (or just loss ) also indicate some overfitting, what you really want to avoid is having a lowering train_loss and an increasing val_loss .

With this in mind, the following three curves will be acceptable solutions:

While the following would not be able to pass the grading:

Run the following cell to check your loss curves:

If you wish so, you can also check the training and validation accuracies of your model:

A more rigorous way of setting the passing threshold of this assignment is to use the slope of your val_loss curve.

To pass this assignment the slope of your val_loss curve should be 0.0005 at maximum.
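A minimal sketch of how that slope could be estimated with numpy, assuming history is the object returned by model.fit:

```python
import numpy as np

val_loss = np.array(history.history['val_loss'])
epochs = np.arange(len(val_loss))

# Fit a straight line to the validation loss curve; the slope must be at most 0.0005 to pass.
slope, intercept = np.polyfit(epochs, val_loss, deg=1)
print(f"slope of val_loss curve: {slope:.5f}")
```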

If your model generated a validation loss curve that meets the criteria above, run the following cell and then submit your assignment for grading. Otherwise, try with a different architecture.

Congratulations on finishing this week's assignment!

You have successfully implemented a neural network capable of classifying sentiment in text data while doing a fairly good job of not overfitting! Nice job!

Keep it up!

Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.

  • On the notebook’s menu, click “View” > “Cell Toolbar” > “Edit Metadata”
  • Hit the “Edit Metadata” button next to the code cell which you want to lock/unlock
  • “true” if you want to unlock it
  • “false” if you want to lock it
  • On the notebook’s menu, click “View” > “Cell Toolbar” > “None”


Deep-Learning-Specialization

Coursera deep learning specialization, neural networks and deep learning.

In this course, you will learn the foundations of deep learning. When you finish this class, you will:

  • Understand the major technology trends driving Deep Learning.
  • Be able to build, train and apply fully connected deep neural networks.
  • Know how to implement efficient (vectorized) neural networks.
  • Understand the key parameters in a neural network’s architecture.

Week 1: Introduction to deep learning

Be able to explain the major trends driving the rise of deep learning, and understand where and how it is applied today.

  • Quiz 1: Introduction to deep learning

Week 2: Neural Networks Basics

Learn to set up a machine learning problem with a neural network mindset. Learn to use vectorization to speed up your models.

  • Quiz 2: Neural Network Basics
  • Programming Assignment: Python Basics With Numpy
  • Programming Assignment: Logistic Regression with a Neural Network mindset

Week 3: Shallow neural networks

Learn to build a neural network with one hidden layer, using forward propagation and backpropagation.

  • Quiz 3: Shallow Neural Networks
  • Programming Assignment: Planar Data Classification with One Hidden Layer

Week 4: Deep Neural Networks

Understand the key computations underlying deep learning, use them to build and train deep neural networks, and apply them to computer vision.

  • Quiz 4: Key concepts on Deep Neural Networks
  • Programming Assignment: Building your Deep Neural Network Step by Step
  • Programming Assignment: Deep Neural Network Application

Course Certificate

Certificate


Coursera: Neural Networks and Deep Learning (Week 3) [Assignment Solution] - deeplearning.ai

▸ Planar data classification with one hidden layer.

I have recently completed the Neural Networks and Deep Learning course from Coursera by deeplearning.ai. While doing the course we have to go through various quizzes and assignments in Python. Here, I am sharing my solutions for the weekly assignments throughout the course. These solutions are for reference only.

> It is recommended that you solve the assignments by yourself honestly; only then does it make sense to complete the course.
> But in case you get stuck in between, feel free to refer to the solutions provided by me.

Don't just copy-paste the code for the sake of completion. Even if you copy the code, make sure you understand it first.

Click here: Coursera: Neural Networks & Deep Learning (Week 2)
Click here: Coursera: Neural Networks & Deep Learning (Week 4A)

Scroll down for Coursera: Neural Networks & Deep Learning (Week 3) assignments.

Recommended machine learning courses: Coursera: Machine Learning, Coursera: Deep Learning Specialization, Coursera: Machine Learning with Python, Coursera: Advanced Machine Learning Specialization, Udemy: Machine Learning, LinkedIn: Machine Learning, Eduonix: Machine Learning, edX: Machine Learning, fast.ai: Introduction to Machine Learning for Coders.

Planar data classification with one hidden layer

  • Implement a 2-class classification neural network with a single hidden layer
  • Use units with a non-linear activation function, such as tanh
  • Compute the cross entropy loss
  • Implement forward and backward propagation

1 - Packages

  • numpy  is the fundamental package for scientific computing with Python.
  • sklearn  provides simple and efficient tools for data mining and data analysis.
  • matplotlib  is a library for plotting graphs in Python.
  • testCases provides some test examples to assess the correctness of your functions
  • planar_utils provide various useful functions used in this assignment

2 - Dataset

3 - Simple Logistic Regression

4 - Neural Network model


4.1 - Defining the neural network structure

4.2 - Initialize the model's parameters

  • Make sure your parameters' sizes are right. Refer to the neural network figure above if needed.
  • Use:  np.random.randn(a,b) * 0.01  to randomly initialize a matrix of shape (a,b).
  • You will initialize the bias vectors as zeros.
  • Use:  np.zeros((a,b))  to initialize a matrix of shape (a,b) with zeros.
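A hedged sketch combining these hints (the argument and dictionary key names are assumptions):

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """Randomly initialize small weights and zero the biases."""
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```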

4.3 - The Loop

  • Look above at the mathematical representation of your classifier.
  • You can use the function  sigmoid() . It is built-in (imported) in the notebook.
  • You can use the function  np.tanh() . It is part of the numpy library.
  • Retrieve each parameter from the dictionary "parameters" (which is the output of  initialize_parameters() ) by using  parameters[".."] .
  • Implement forward propagation. Compute \(Z^{[1]}, A^{[1]}, Z^{[2]}\) and \(A^{[2]}\) (the vector of all your predictions on all the examples in the training set).
  • Values needed in the backpropagation are stored in " cache ". The  cache  will be given as an input to the backpropagation function.
  • There are many ways to implement the cross-entropy loss. To help you, here is how we would have implemented \(-\sum_{i=1}^{m} y^{(i)}\log(a^{[2](i)})\): logprobs = np.multiply(np.log(A2), Y) followed by cost = -np.sum(logprobs) # no need to use a for loop!
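Building on these hints, a sketch of the forward pass and the cost. The cost below uses the full binary cross-entropy (adding the \((1-y)\log(1-a)\) term to the expression above) and averages over the \(m\) examples; treat it as an illustration rather than the official solution.

```python
import numpy as np

def forward_propagation(X, parameters):
    """Forward pass: tanh hidden layer, sigmoid output."""
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))          # sigmoid

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

def compute_cost(A2, Y):
    """Binary cross-entropy cost averaged over the m examples."""
    m = Y.shape[1]
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    return float(np.squeeze(-np.sum(logprobs) / m))
```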


  • To compute dZ1 you'll need to compute \(g^{[1]\prime}(Z^{[1]})\). Since \(g^{[1]}(\cdot)\) is the tanh activation function, if \(a = g^{[1]}(z)\) then \(g^{[1]\prime}(z) = 1 - a^2\). So you can compute \(g^{[1]\prime}(Z^{[1]})\) using (1 - np.power(A1, 2)).

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

4.5 - Predictions
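The prediction step simply thresholds the output activation at \(0.5\); a minimal sketch, reusing the forward_propagation sketch above:

```python
def predict(parameters, X):
    """Predict class 0/1 by thresholding the output activation at 0.5."""
    A2, _ = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(int)
```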


4.6 - Tuning hidden layer size (optional/ungraded exercise)


  • The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
  • The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fit the data well without incurring noticeable overfitting.
  • You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.
  • What happens when you change the tanh activation for a sigmoid activation or a ReLU activation?
  • Play with the learning_rate. What happens?
  • What if we change the dataset? (See part 5 below!)
  • Build a complete neural network with a hidden layer
  • Make a good use of a non-linear unit
  • Implemented forward propagation and backpropagation, and trained a neural network
  • See the impact of varying the hidden layer size, including overfitting.

5) Performance on other datasets


  • http://scs.ryerson.ca/~aharley/neural-networks/
  • http://cs231n.github.io/neural-networks-case-study/
  • coursera.org


You can get the courses at minimal cost. Some of the courses on Coursera are free as well. You can also apply for financial aid or audit the courses on Coursera itself.



Deep Learning Specialization by Andrew Ng on Coursera.

Kulbear/deep-learning-coursera

Deep Learning Specialization on Coursera

Master Deep Learning, and Break into AI

Instructor: Andrew Ng

Introduction

This repo contains all my work for this specialization. All the code base, quiz questions, screenshots, and images are taken, unless specified otherwise, from the Deep Learning Specialization on Coursera .

What I want to say

VERBOSE CONTENT WARNING: YOU CAN JUMP TO THE NEXT SECTION IF YOU WANT

As a CS major student and a long-time self-taught learner, I have completed many CS-related MOOCs on Coursera, Udacity, Udemy, and edX. I do understand the hard time you spend on understanding new concepts and debugging your program. There are discussion forums on most MOOC platforms; however, even a question with a detailed description may need some time to be answered. Here I released these solutions, which are only for your reference purpose . It may help you to save some time. And I hope you don't copy any part of the code (the programming assignments are fairly easy if you read the instructions carefully) or look at the quiz solutions before you start your own adventure. This course is almost the simplest deep learning course I have ever taken, but the simplicity is based on the fabulous course content and structure. It's a treasure given by the deeplearning.ai team.

Currently, this repo has 3 major parts you may be interested in and I will give a list here.

Programming Assignments

Course 1: Neural Networks and Deep Learning

  • Week 2 - PA 1 - Logistic Regression with a Neural Network mindset
  • Week 3 - PA 2 - Planar data classification with one hidden layer
  • Week 4 - PA 3 - Building your Deep Neural Network: Step by Step
  • Week 4 - PA 4 - Deep Neural Network for Image Classification: Application

Course 2: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

  • Week 1 - PA 1 - Initialization
  • Week 1 - PA 2 - Regularization
  • Week 1 - PA 3 - Gradient Checking
  • Week 2 - PA 4 - Optimization Methods
  • Week 3 - PA 5 - TensorFlow Tutorial

Course 3: Structuring Machine Learning Projects

  • There is no PA for this course. But this course comes with very interesting case study quizzes.

Course 4: Convolutional Neural Networks

  • Week 1 - PA 1 - Convolutional Model: step by step
  • Week 1 - PA 2 - Convolutional Model: application
  • Week 2 - PA 1 - Keras - Tutorial - Happy House
  • Week 2 - PA 2 - Residual Networks

Course 5: Sequence Models

  • Week 1 - PA 1 - Building a Recurrent Neural Network - Step by Step
  • Week 1 - PA 2 - Character level language model - Dinosaurus land

Quiz Solutions

There are concerns that some people may use the content here to quickly ace the course so I'll no longer update any quiz solution.

  • Week 1 Quiz - Introduction to deep learning
  • Week 2 Quiz - Neural Network Basics
  • Week 3 Quiz - Shallow Neural Networks
  • Week 4 Quiz - Key concepts on Deep Neural Networks
  • Week 1 Quiz - Practical aspects of deep learning
  • Week 2 Quiz - Optimization algorithms
  • Week 3 Quiz - Hyperparameter tuning, Batch Normalization, Programming Frameworks
  • Week 1 Quiz - Bird recognition in the city of Peacetopia (case study)
  • Week 2 Quiz - Autonomous driving (case study)

- Course 4: Convolutional Neural Networks
- Course 5: Sequence Models

Important Slide Notes

I screenshotted some important slide pages and stored them in GitHub issues. It may not be very helpful for everyone since I only keep those I think may be useful to me.

- Screenshots for Course 1: Neural Networks and Deep Learning

- Screenshots for Course 2: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

- Screenshots for Course 3: Structuring Machine Learning Projects

- Screenshots for Course 4: Convolutional Neural Networks

- Screenshots for Course 5: Sequence Models

  • 2017-08-17 : Finished the first-released 3 courses, YAY! 😈




Need help with Week 3 Coding assignment

Hi all. I am at the week 3 programming assignment of Supervised Machine Learning: Regression and Classification and need assistance, as it requires a minimum of 80% to pass the course. However, I am stuck at 66% and unable to understand what was done wrong for Q3 and Q6. Can anyone please help? Lab ID: qseajjfjtule. Can anyone please point out what is wrong in the image below?

{moderator edit: code removed}

Please do not share your code on the forum. That is not allowed by the Code of Conduct.

Sorry for this. I will keep it in mind.

Please post a copy of the error message or assert that you see when you run the notebook tests. (Not the grader - what do you get when you run all of the cells in the notebook?)

image

That’s the grader report.

Does your code pass all of the tests in the notebook?

Check your personal messages for more information.

I am having exactly same issue for same assignment.

Not sure what is wrong. All tests are passing.

Then what shall we do here? Can anyone please help us with this assignment?

… all except for the tests that the grader uses. The grader uses different test than the notebook. Passing the notebook’s tests does not prove your code is perfect.

Check for issues like use of global variables, or hard-coded index values. Those are the most common mistakes.

The first step is for your code to pass all of the tests in the notebook. If you have a specific error message, you can post it here and we can discuss it.

Ok, I found the problem. The issue is not with my code; the submission requires the filename to be the same as the given one. I created a copy of it and renamed it so that I would not lose the text in my original. I was expecting the running notebook to be submitted as the final submission, but it was submitting the one from disk, which was incomplete.

That’s a bummer, if I am allowed to create new notebooks, I would expect be able to submit any of the notebook as my submission.

The grader always looks for the default name for that notebook. It’s a limitation of Coursera’s platform.

This is the exact same issue I have been dealing with for the last two weeks. I have been working over 12 hours on this lab and all answers are correct, but I still get an error message on the same exact sections. There is no place in the code I am supposed to be writing that addresses the topic of global variables or hard-coded index values. I have entered over ten variations of the code, including the EXACT code Coursera hints at.

Please check your personal messages for instructions.

IMO - instead of using the nested for loops given in the notebook, you could do everything in one line using the following functions: np.dot, np.sum, np.log and functions we have defined already (sigmoid and the compute_gradient).

But I think for a better understanding of how parameters are updated iteratively we see the nested for loops.
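For illustration, a hedged sketch of the kind of one-line vectorized computation the poster means, here for the logistic regression cost; it assumes X has shape (m, n), y has shape (m,), and a sigmoid helper, and the actual assignment's variable names may differ:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost_vectorized(X, y, w, b):
    """Logistic regression cost without the nested for loops."""
    m = X.shape[0]
    f = sigmoid(np.dot(X, w) + b)                               # predictions for all m examples
    return -np.sum(y * np.log(f) + (1 - y) * np.log(1 - f)) / m
```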
