
Transfer Learning Guide: A Practical Tutorial With Examples for Images and Text in Keras

It can take weeks to train a neural network on large datasets. Luckily, this time can be shortened thanks to model weights from pre-trained models – in other words, by applying transfer learning.

Transfer learning is a technique that works in image classification tasks and natural language processing tasks. In this article, you’ll dive into:

  • what transfer learning is,
  • how to implement transfer learning (in Keras),
  • transfer learning for image classification,
  • transfer learning for natural language processing

Well then, let’s start learning! (no pun intended… ok, maybe a little) 

What is transfer learning?

Transfer learning is about leveraging feature representations from a pre-trained model, so you don’t have to train a new model from scratch.

Pre-trained models are usually trained on massive datasets that serve as standard benchmarks in computer vision. The weights obtained from these models can be reused in other computer vision tasks.

These models can be used directly in making predictions on new tasks or integrated into the process of training a new model. Including the pre-trained models in a new model leads to lower training time and lower generalization error.  

Transfer learning is particularly useful when you have a small training dataset. In this case, you can, for example, use the weights from the pre-trained models to initialize the weights of the new model. As you will see later, transfer learning can also be applied to natural language processing problems.


The advantage of pre-trained models is that they are generic enough for use in other real-world applications. For example:

  • models trained on ImageNet can be used in real-world image classification problems. This is because the dataset covers 1,000 diverse classes. Let’s say you are an insect researcher. You can use these models and fine-tune them to classify insects. 
  • classifying text requires knowledge of word representations in some vector space. You can train vector representations yourself. The challenge here is that you might not have enough data to train the embeddings. Furthermore, training will take a long time. In this case, you can use a pre-trained word embedding like GloVe to hasten your development process.  

You will explore these use cases in a moment.

What is the difference between transfer learning and fine-tuning?

Fine-tuning is an optional step in transfer learning. Fine-tuning will usually improve the performance of the model. However, since you have to retrain the entire model, you’ll likely overfit. 


Overfitting is avoidable. Just retrain the model, or part of it, using a low learning rate. This is important because it prevents large weight updates from destroying the representations the model has already learned, which would hurt performance. Using a callback to stop the training process when the model has stopped improving is also helpful.

Why use transfer learning?

Assume you have 100 images of cats and 100 of dogs and want to build a model to classify them. How would you train a model using this small dataset? You could train it from scratch, but it would most likely overfit horribly. Enter transfer learning. Generally speaking, there are two big reasons why you want to use transfer learning:

  • training models with high accuracy requires a lot of data. For example, the ImageNet dataset contains over 1 million images. In the real world, you are unlikely to have such a large dataset. 
  • assuming that you had that kind of dataset, you might still not have the resources required to train a model on such a large dataset. Hence transfer learning makes a lot of sense if you don’t have the compute resources needed to train models on huge datasets. 
  • even if you had the compute resources at your disposal, you still have to wait for days or weeks to train such a model . Therefore using a pre-trained model will save you precious time. 

When does transfer learning not work?

Transfer learning will not work when the high-level features learned in the later layers of the pre-trained model are not sufficient to differentiate the classes in your problem. For example, a pre-trained model may be very good at identifying a door but not at telling whether a door is closed or open. In that case, you can rely on the lower-level features (edges, textures, shapes) instead of the task-specific high-level ones, which means retraining more layers of the model or using features from earlier layers.

When datasets are not similar, features transfer poorly. This paper investigates the similarity of datasets in more detail. That said, as shown in the paper, initializing the network with pre-trained weights results in better performance than using random weights. 

You might find yourself in a situation where you consider removing some layers from the pre-trained model. Transfer learning is unlikely to work well in such an event. Removing layers changes the architecture and the number of trainable parameters, and determining the right number of layers to remove without hurting performance is a cumbersome, time-consuming, trial-and-error process.


How to implement transfer learning?

Let’s now take a moment and look at how you can implement transfer learning. 

Transfer learning in 6 steps

You can implement transfer learning in these six general steps. 


Obtain the pre-trained model

The first step is to get the pre-trained model that you would like to use for your problem. The various sources of pre-trained models are covered in a separate section. 

Create a base model

Usually, the first step is to instantiate the base model using one of the architectures such as ResNet or Xception. You can also optionally download the pre-trained weights. If you don’t download the weights, you will have to use the architecture to train your model from scratch. Recall that the base model will usually have more units in the final output layer than you require. When creating the base model, you therefore have to remove the final output layer. Later on, you will add a final output layer that is compatible with your problem.


Freeze layers so they don’t change during training

Freezing the layers from the pre-trained model is vital. This is because you don’t want the weights in those layers to be modified during training. If they are, you will lose all the learning that has already taken place, and it will be no different from training the model from scratch.


Add new trainable layers 

The next step is to add new trainable layers that will turn old features into predictions on the new dataset. This is important because the pre-trained model is loaded without the final output layer. 


Train the new layers on the dataset

Remember that the pre-trained model’s final output will most likely be different from the output that you want for your model. For example, pre-trained models trained on the ImageNet dataset will output 1000 classes. However, your model might just have two classes. In this case, you have to train the model with a new output layer in place. 

Therefore, you will add some new dense layers as you please, but most importantly, a final dense layer with units corresponding to the number of outputs expected by your model.

Improve the model via fine-tuning

Once you have done the previous step, you will have a model that can make predictions on your dataset. Optionally, you can improve its performance through fine-tuning. Fine-tuning is done by unfreezing the base model or part of it and training the entire model again on the whole dataset at a very low learning rate. The low learning rate will increase the performance of the model on the new dataset while preventing overfitting.

The learning rate has to be low because the model is quite large while the dataset is small. This is a recipe for overfitting, hence the low learning rate. Recompile the model once you have made these changes so that they can take effect. This is because the behavior of a model is frozen whenever you call the compile function. That means that you have to call the compile function again whenever you want to change the model’s behavior. The next step will be to train the model again while monitoring it via callbacks to ensure it does not overfit. 


Pretty straightforward, eh?

Where to find pre-trained models?

Let’s now talk about where you can find pre-trained models to use in your applications. 

Keras pre-trained models

There are more than two dozen pre-trained models available from Keras. They’re served via Keras Applications. You get pre-trained weights alongside each model; when you download a model, the weights are downloaded automatically and stored in `~/.keras/models/`. All the Keras Applications are used for image tasks. For instance, here is how you can initialize the MobileNet architecture trained on ImageNet.
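Below is a minimal sketch of that initialization; everything beyond the architecture name and the ImageNet weights is up to you.

```python
# Minimal sketch: load MobileNet with its pre-trained ImageNet weights.
import tensorflow as tf

model = tf.keras.applications.MobileNet(weights="imagenet")
model.summary()  # the weights are downloaded to ~/.keras/models/ on first use
```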

Transfer learning using TensorFlow Hub

It’s worth mentioning that Keras applications are not your only option for transfer learning tasks. You can also use models from TensorFlow Hub .
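As a rough sketch, a TensorFlow Hub feature extractor can be dropped into a Keras model as shown below. The module handle is one example image feature vector; any compatible handle from tfhub.dev works the same way.

```python
# Sketch: use a TensorFlow Hub feature extractor as a frozen base for a new classifier.
import tensorflow as tf
import tensorflow_hub as hub

feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    input_shape=(224, 224, 3),
    trainable=False,  # keep the pre-trained weights frozen
)

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. two classes such as cats vs. dogs
])
```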


Pretrained word embeddings

Word embeddings are usually used for text classification problems. Although you can train your own word embeddings, using pre-trained ones is much quicker. Here are a couple of word embeddings that you can consider for your natural language processing problems:

  • GloVe(Global Vectors for Word Representation) by Stanford
  • Google’s Word2vec, trained on around 100 billion words from Google News
  • Fasttext English vectors 


Hugging Face

Hugging Face provides thousands of pre-trained models for performing tasks on texts. Some of the supported functions include:

  • question answering 
  • summarization 
  • translation and 
  • text generation, to mention a few

Over 100 languages are supported by Hugging Face.

Here’s an example of how you can use Hugging face to classify negative and positive sentences. 
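A minimal sketch using the `transformers` library’s sentiment-analysis pipeline (the model it downloads is whatever the library ships as the default):

```python
# Sketch: classify positive and negative sentences with a Hugging Face pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

print(classifier("I love how quickly transfer learning gets me to a working model!"))
print(classifier("This was a terrible experience."))
# Each result contains a 'label' (POSITIVE/NEGATIVE) and a confidence 'score'.
```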

How you can use pre-trained models

There are three ways to use a pre-trained model:

  • prediction ,
  • feature extraction ,
  • fine-tuning .

Prediction

Here, you download the model and immediately use it to classify new images. Here is an example of ResNet50 used to classify ImageNet classes.

ImageNet is an extensive collection of images that have been used to train models, including ResNet50. There are over 1 million images and 1000 classes in this dataset.
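Here is a minimal sketch of that prediction workflow; `elephant.jpg` is a placeholder path for any local image you want to classify.

```python
# Sketch: use ResNet50 pre-trained on ImageNet directly for prediction.
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions,
)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")

img = image.load_img("elephant.jpg", target_size=(224, 224))  # placeholder image path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 ImageNet classes with probabilities
```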

Feature extraction

In this case, the output of the layer before the final layer is fed as input to a new model. The goal is to use the pre-trained model, or a part of it, to pre-process images and get essential features. 

Then, you pass these features to a new classifier—no need to retrain the base model. The pre-trained convolutional neural network already has features that are important to the task at hand. 


However, the pre-trained model’s final part doesn’t transfer over because it’s specific to its dataset. So, you have to build the last part of your model to fit your dataset.

In the natural language processing realm, pre-trained word embeddings can be used for feature extraction. The embeddings place words in a vector space where similar words sit close together, so they provide relevant, contextual information to a model. Because their main objective is capturing semantic relationships between words, these embeddings are largely task-agnostic for natural language problems.

Fine-tuning

When your new classifier is ready, you can use fine-tuning to improve its accuracy. To do this, you unfreeze the base model, or part of it, and retrain the whole model on the new data with a low learning rate. Fine-tuning is critical if you want to make the feature representations from the base model (obtained from the pre-trained model) more relevant to your specific task.

You can also use weights from the pre-trained model to initialize weights in a new model. The best choice here depends on your problem, and you might need to experiment a bit before you get it right. 

Still, there is a standard workflow you can use to apply transfer learning. 

Let’s check it out. 

Example of transfer learning for images with Keras 

With that background in place, let’s look at how you can use pre-trained models to solve image and text problems. Whereas there are many steps involved in training a model, the focus will be on those six steps specific to transfer learning. 



Transfer learning with image data

In this illustration, let’s take a look at how you can use a pre-trained model to build and fine-tune an image classifier. Let’s assume that you are a pet lover and you would like to create a machine learning model to classify your favorite pets: cats and dogs. Unfortunately, you don’t have enough data to do this. Fortunately, you are familiar with Kaggle and can get a small dataset. With that in place, you can select a pre-trained model and start training with Keras. To illustrate, let’s use the Xception architecture, trained on the ImageNet dataset.

If you’re coding along, follow this section step-by-step to apply transfer learning properly.

Getting the dataset

I recommend using Google Colab because you get free GPU computing. 

First, download the dataset into Colab’s virtual machine. 

After that, unzip the dataset and set the path to the training and validation set. 
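A minimal sketch of those two steps, assuming the small cats-and-dogs archive that Google hosts for its TensorFlow tutorials (a Kaggle download of the Dogs vs. Cats dataset can be swapped in):

```python
# Sketch: download and extract a small cats-vs-dogs dataset, then set the paths.
import os
import tensorflow as tf

zip_path = tf.keras.utils.get_file(
    "cats_and_dogs_filtered.zip",
    origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip",
    extract=True,
)

base_dir = os.path.join(os.path.dirname(zip_path), "cats_and_dogs_filtered")
train_dir = os.path.join(base_dir, "train")
validation_dir = os.path.join(base_dir, "validation")
```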

Loading the dataset from a directory

Let’s now load the images from their location. The `image_dataset_from_directory` function can be used because it can infer class labels.

The function will create a `tf.data.Dataset` from the directory. Note that for this to work, the directory structure should look like this:

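The expected layout has one sub-directory per class (the file names below are just placeholders):

```
train/
├── cats/
│   ├── cat.1.jpg
│   └── ...
└── dogs/
    ├── dog.1.jpg
    └── ...
```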

Import the required modules and load the training and validation set. 
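A sketch of the loading step; the image size and batch size are assumptions you can change.

```python
# Sketch: create training and validation tf.data.Dataset objects from the directories.
IMG_SIZE = (150, 150)  # assumed image size
BATCH_SIZE = 32

training_set = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="binary",  # two classes: cats and dogs
)

validation_set = tf.keras.preprocessing.image_dataset_from_directory(
    validation_dir,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="binary",
)
```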

Data pre-processing

Whereas data pre-processing isn’t a specific step in transfer learning, it is an important step in training machine learning models in general. Let’s, therefore, apply some augmentation to the images. Augmenting the training set helps prevent overfitting, because it exposes the model to different variations of each image.

You especially want to augment the data when there’s not a lot of data for training. You can augment it using various transformations, like:

  • random rotations,
  • horizontal flipping,

You can apply these transformations when loading the data. Alternatively, as you can see below, you can augment by introducing unique layers. 
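A sketch of such augmentation layers, covering the two transformations listed above (the rotation factor is an assumption; recent TensorFlow versions expose these layers directly under `tf.keras.layers`, older ones under `tf.keras.layers.experimental.preprocessing`):

```python
# Sketch: data augmentation expressed as Keras layers.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),  # rotation factor is an assumption
])
```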

These layers will only be applied during the training process.

You can see the result of the above transformations by applying the layers to the same image. Here’s the code:
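A sketch of that visualization, plotting nine augmented copies of one training image:

```python
# Sketch: apply the augmentation layers to the same image several times and plot the results.
import matplotlib.pyplot as plt

for images, _ in training_set.take(1):
    first_image = images[0]
    plt.figure(figsize=(10, 10))
    for i in range(9):
        augmented = data_augmentation(tf.expand_dims(first_image, 0), training=True)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented[0].numpy().astype("uint8"))
        plt.axis("off")
```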

And here’s how the result might look (since the images are shuffled, you might get a different result): 

[Image: a grid of augmented versions of one sample image from the training set]

Create a base model from the pre-trained Xception model

Let’s load the model with the weights trained on ImageNet. When that’s done, the desired input shape is defined.
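A minimal sketch of that step; the input shape matches the image size assumed earlier.

```python
# Sketch: instantiate Xception with ImageNet weights and without its classification head.
base_model = tf.keras.applications.Xception(
    weights="imagenet",
    input_shape=(150, 150, 3),  # assumed input shape
    include_top=False,          # drop the ImageNet-specific output layer
)
```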

`include_top=False` means that you’re not interested in the last layer of the model. Since models are visualized from bottom to top, that layer is referred to as the top layer. Excluding the top layers is important for feature extraction.

Next, freeze the base model layers so that they’re not updated during the training process. 

Since many pre-trained models have a `tf.keras.layers.BatchNormalization` layer, it’s important to freeze those layers. Otherwise, the layer mean and variance will be updated, which will destroy what the model has already learned. Let’s freeze all the layers in this case.
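Freezing the whole base model is a one-liner:

```python
# Freeze the base model so its weights (and BatchNormalization statistics)
# are not updated while the new head is trained.
base_model.trainable = False
```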

Create the final dense layer

When loading the model, you used `include_top=False`, meaning that the final dense layer of the pre-trained model wasn’t included. Now it’s time to define a final output layer for this model.

Let’s start by standardizing the size of the input images.

After this, apply the data augmentation. 

This model expects pixel values in the range of (-1, 1), not (0, 255). So, you have to process the data. 

Luckily, most pre-trained models provide a function for doing that. 

Let’s now define the model as follows:

  • ensure that the base model is running in inference mode so that batch normalization layers are not updated during the fine-tuning stage (set `training=False`);
  • convert features from the base model to vectors , using `GlobalAveragePooling2D`;
  • apply dropout regularization;
  • add a final dense layer (when you used `include_top=False`, the final output layer was not included, so you have to define your own).
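Here is a sketch that puts those four points together; the dropout rate and the single sigmoid output unit for the binary cats-vs-dogs problem are assumptions.

```python
# Sketch: assemble the full model around the frozen Xception base.
inputs = tf.keras.Input(shape=(150, 150, 3))

x = data_augmentation(inputs)                           # augmentation (active only in training)
x = tf.keras.applications.xception.preprocess_input(x)  # scale pixels to the (-1, 1) range
x = base_model(x, training=False)                       # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)         # convert feature maps to vectors
x = tf.keras.layers.Dropout(0.2)(x)                     # dropout rate is an assumption
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
```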

Train the model

You can now train the top layer. Notice that since you’re using a pretrained model, validation accuracy starts at an already high value.
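A sketch of training the new head; the optimizer, loss, and number of epochs are assumptions.

```python
# Sketch: compile and train only the new classification head.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(training_set, validation_data=validation_set, epochs=20)
```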


Fine-tuning the model

The model can be improved by unfreezing the base model and retraining it at a very low learning rate.

You need to monitor this step because the wrong implementation can lead to overfitting. First, unfreeze the base model. 

After updating the trainable attribute, the model has to be compiled again to implement the change.

To prevent overfitting, let’s monitor training loss via a callback. Keras will stop training when the model doesn’t improve for five consecutive epochs. Let’s also use TensorBoard to monitor loss and accuracy. 
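A sketch of those steps; the learning rate and log directory are assumptions, while the patience of five epochs follows the description above.

```python
# Sketch: unfreeze the base model, recompile with a very low learning rate,
# and define callbacks for early stopping and TensorBoard logging.
base_model.trainable = True

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # very low learning rate
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    tf.keras.callbacks.TensorBoard(log_dir="./logs"),
]
```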


OK, time to retrain the model. When it’s finished, you’ll notice a slight improvement from the previous model.
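A sketch of the fine-tuning run (the number of epochs is an assumption; early stopping will cut it short if needed):

```python
# Sketch: fine-tune the whole model with the callbacks defined above.
fine_tune_history = model.fit(
    training_set,
    validation_data=validation_set,
    epochs=10,
    callbacks=callbacks,
)
```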

At this point, you have a working model for the cats and dogs classification dataset. 

If you were tracking this using an experimentation platform, you can now save the model and send it to your model registry. 

Example of transfer learning with natural language processing

In the natural language processing realm, you can use pre-trained word embeddings to solve text classification problems . Let’s take an example. 

A word embedding is a dense vector representation of a word. In the embedding space, words with similar meanings appear closer together. You can use the embedding layer in Keras to learn word embeddings for your task, but training them takes a lot of time, especially on large datasets, so let’s use word embeddings that have already been trained.

A couple of popular pre-trained word embeddings are Word2vec and GloVe.


Let’s walk through a complete example using GloVe word embeddings in transfer learning. 

Loading the dataset

A sentiment analysis dataset will be used for this illustration. Before loading it, let’s import all the modules that are needed for this task. 

Next, download the dataset and load it in using Pandas.
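A sketch of the imports and the loading step; `sentiment.csv` and its `text`/`sentiment` columns are placeholders for whichever sentiment dataset you download.

```python
# Sketch: imports for the NLP example and loading the dataset with pandas.
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

df = pd.read_csv("sentiment.csv")  # placeholder file name
df.head()
```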

[Output: the first rows of the dataset, showing the text and sentiment columns]

The goal is to predict the sentiment column above. Since this is text data, it has to be converted into numerical form because that’s what the deep learning model expects. 

Select the features and the target, then split the data into a training and testing set. 
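A sketch, assuming the placeholder column names above and a binary (0/1) sentiment label:

```python
# Sketch: select the feature and target, then create the train/test split.
X = df["text"]
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```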

Data Pre-processing

Since this is text data, it has to be processed to make it ready for the models. This is not specific to transfer learning in text classification, but to machine learning models in general. 

Tokenizing the words

To convert sentences into numerical representations, use `Tokenizer`. Tokenizer removes punctuation marks and special characters and converts the sentence to lowercase. 

Just create an instance of `Tokenizer` and fit it to the training set. You have to define the size of the vocabulary you want. An out-of-vocabulary (OOV) token is also defined to represent words in the testing set that aren’t found in the vocabulary.
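A sketch of that step; the vocabulary size and the OOV token string are assumptions.

```python
# Sketch: fit a tokenizer on the training text.
vocab_size = 10000   # assumed vocabulary size
oov_token = "<OOV>"  # token for words not seen during fitting

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(X_train)

word_index = tokenizer.word_index
```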

You can use the word index to see how words are mapped to numbers.


Let’s convert the words to sequences so that a complete sequence of numbers can represent every sentence. This is done using `texts_to_sequences` from the tokenizer.
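A minimal sketch of that conversion:

```python
# Sketch: turn each sentence into a sequence of integers.
train_sequences = tokenizer.texts_to_sequences(X_train)
test_sequences = tokenizer.texts_to_sequences(X_test)
```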


Since the sentences have different lengths, the sequences will also have different lengths. But, the sequences need to have an equal length for the machine learning model. This can be achieved by truncating longer sentences and padding shorter ones with zeros. 

Using `post` for padding will add the zeros at the end of the sequences, and `post` for the truncation type will cut off sentences longer than 100 tokens at the end.
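A sketch of the padding step, using the maximum length of 100 tokens mentioned above:

```python
# Sketch: pad and truncate all sequences to a fixed length of 100.
max_length = 100

train_padded = pad_sequences(
    train_sequences, maxlen=max_length, padding="post", truncating="post"
)
test_padded = pad_sequences(
    test_sequences, maxlen=max_length, padding="post", truncating="post"
)
```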


Using GloVe Embeddings

Now, this is specific to transfer learning in natural language processing . First, let’s download the pre-trained word embeddings. 

Next, extract them into a temporary folder.

Now, use these word embeddings to create your own embedding layer. Load the GloVe embeddings into a dictionary that maps each word to its vector.
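A sketch of the download-and-load step, assuming the 100-dimensional vectors from the standard `glove.6B.zip` archive hosted by Stanford NLP:

```python
# Sketch: download the GloVe archive, extract it, and load the 100-d vectors
# into a dictionary that maps each word to its embedding vector.
import zipfile

glove_zip = tf.keras.utils.get_file(
    "glove.6B.zip", origin="http://nlp.stanford.edu/data/glove.6B.zip"
)
with zipfile.ZipFile(glove_zip, "r") as z:
    z.extractall("glove")

embeddings_index = {}
with open("glove/glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *coefs = line.split()
        embeddings_index[word] = np.asarray(coefs, dtype="float32")
```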

Use this dictionary to create an embedding matrix for each word in the training set. To do this, get the embedding vector for each word using `embedding_index`.
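A sketch of the embedding matrix construction, matching the 100-dimensional vectors loaded above:

```python
# Sketch: build an embedding matrix with one row per word in the vocabulary.
embedding_dim = 100
num_words = len(word_index) + 1  # +1 because index 0 is reserved for padding

embedding_matrix = np.zeros((num_words, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector  # words missing from GloVe stay all-zero
```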

If a word isn’t found in GloVe, a vector of zeros represents it. For example, you can look up the embedding vector for the word “bakery” in the dictionary you just built.


Create the embedding layer

At this point, you can create the embedding layer. Here are a couple of things to note:

  • setting `trainable` to false is crucial because you want to make sure that this layer isn’t re-trained;
  • weights are set to the embedding matrix you just created ;
  • `len(word_index) + 1` is the size of the vocabulary with one added because zero is reserved for padding;
  • `input_length` is the length of input sequences.
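Putting those points together, a sketch of the layer looks like this (newer Keras versions may prefer `embeddings_initializer` and omit `input_length`):

```python
# Sketch: a frozen embedding layer initialized with the GloVe matrix.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    input_length=max_length,
    trainable=False,  # keep the pre-trained embeddings fixed
)
```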

Create the model 

You can now create the model using this embedding layer. Bidirectional LSTMs are used to ensure that information is passed backward and forward. 
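A sketch of such a model; the layer sizes are assumptions, and the single sigmoid unit assumes a binary sentiment label.

```python
# Sketch: a small classifier on top of the frozen GloVe embeddings.
model = tf.keras.Sequential([
    embedding_layer,
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```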

Training the model 

You can now compile and train the model. 

The early stopping callback can be used to stop the training process when the model training stops improving. You can monitor model loss and accuracy using the TensorBoard callback. 
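A sketch of the training step; the optimizer, patience, number of epochs, and log directory are assumptions.

```python
# Sketch: compile and train the model with early stopping and TensorBoard callbacks.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
    tf.keras.callbacks.TensorBoard(log_dir="./logs_nlp"),
]

history = model.fit(
    train_padded, y_train,
    validation_data=(test_padded, y_test),
    epochs=20,
    callbacks=callbacks,
)
```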

The performance of the model can be evaluated using the `evaluate` function.
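For example:

```python
# Sketch: evaluate the trained model on the held-out test set.
loss, accuracy = model.evaluate(test_padded, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```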

Nice! You have trained and tested a natural language processing model using pre-trained word embeddings. 

That’s all, folks!

In this article, you explored transfer learning, with examples of how to use it to develop models faster. You used pre-trained models in image classification and natural language processing tasks. I hope you enjoyed it, thank you for reading!

If you want to read more about Transfer Learning feel free to check other sources:

  • https://keras.io/guides/transfer_learning/
  • https://builtin.com/data-science/transfer-learning
  • https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
  • https://www.tensorflow.org/tutorials/images/transfer_learning
  • https://machinelearningmastery.com/transfer-learning-for-deep-learning/
  • https://machinelearningmastery.com/how-to-use-transfer-learning-when-developing-convolutional-neural-network-models/
  • https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
  • https://www.researchgate.net/post/What-is-the-difference-between-Transfer-Learning-vs-Fine-Tuning-vs-Learning-from-scratch
  • https://arxiv.org/pdf/1411.1792.pdf




  • Survey paper
  • Open access
  • Published: 22 October 2022

Transfer learning: a friendly introduction

  • Asmaul Hosna,
  • Ethel Merry,
  • Jigmey Gyalmo,
  • Zulfikar Alom,
  • Zeyar Aung &
  • Mohammad Abdul Azim (ORCID: orcid.org/0000-0001-5529-9482)

Journal of Big Data, volume 9, Article number: 102 (2022)


Abstract

Countless real-world applications use Machine Learning (ML) techniques to make the best possible use of the data available to users. Transfer learning (TL), one of the categories under ML, has received much attention from the research communities in the past few years. Traditional ML algorithms operate under the assumption that training and testing samples come from the same, limited data distribution; such conventional methods predict target tasks in a straightforward manner and apply only to small data distributions. TL addresses this limitation: it is valued for connecting additional testing and training samples, resulting in faster output and efficient results. This paper covers the domain and scope of TL, citing situational uses over time and a few of its applications. The paper provides an in-depth focus on the techniques of Inductive TL, Transductive TL, and Unsupervised TL, including sample selection and domain adaptation, followed by contributions and future directions.

Introduction

In today’s world, people can hardly afford the luxury of investing resources in data gathering, since data are often scarce, inaccessible, expensive, and difficult to compile. As a result, a better means of acquiring knowledge is to transfer it between tasks [ 1 ]. This philosophy has inspired Transfer Learning (TL): to improve learning in machine learning (ML) by using data and knowledge compiled before the current task was introduced. Most ML algorithms are built to predict future outcomes and traditionally address tasks in isolation [ 2 ]. TL does the opposite: it bridges data from the source task and the target task to find a solution, perhaps a better one.

TL aims to improve understanding of the current task by relating it to other tasks performed at different times but through a related source domain. Figure 1 illustrates the improvement brought by using the TL strategy in ML. TL enhances learning by creating a relation between previous tasks and the target task, providing logical, faster, and better solutions. It attempts to provide an efficient manner of learning and communication between the source task and the target task [ 3 ]. In addition, TL is most applicable when there is a limited supply of target training data. The strategic value of TL lies not only within the task being performed but also across other tasks [ 4 ]. However, the source and target tasks are sometimes not compatible: if transferring the testing and training samples decreases the target task’s performance, the situation is called negative transfer, and vice versa.

Figure 1: Traditional/Classical ML vs. TL [ 3 ]

This paper introduces the traditional approach to TL, improvements in the modern approach, techniques, applications of TL, data gathering, challenges, and the future scope of TL. Although TL is used in numerous areas and varieties, this paper focuses on a few areas in depth to provide brief insights and appreciation. The remainder of this paper is organized as follows. The "Related work" section provides background information about TL, definitions, and notations. The "Techniques of TL" section describes the three settings of TL strategies: inductive TL (with case studies on multi-task learning and self-taught learning), transductive TL, and unsupervised TL, including sample selection, its applications, and domain adaptation in TL. The "Domain adaptation" section describes numerous TL applications in different domains. The "Contributions of TL" section addresses some of the contributions made by TL in medical and related fields. The "Future directions of TL" section provides the future directions of TL techniques and the conclusions, respectively.

Related work

To date, the disciplines of traditional ML and data mining have been extensively applied in many areas, such as retrieving patterns from existing records obtained from labeled or unlabeled data sets (for instance, training data) to predict future occurrences [ 5 ]. Traditional ML uses training and testing data with similar data distributions and input features. If the distributions of the training and the testing sets differ, the outcome or prediction can deteriorate [ 6 ]. In some cases, acquiring training data that fits the testing data’s input feature set, as well as the anticipated distribution of the outputs, can be quite challenging and very costly [ 2 ]. As a result, a top-level learner is required for any target domain, one that has previously learned and improved from a related field. This need drives how TL is being adopted today.

TL focuses on broad domains, tasks, and patterns in both training and testing datasets [ 3 ]. Multiple instances of TL can be seen in the real world, such as the ability to distinguish between objects like cars and bikes. Another real-life example is two individuals learning how to ride a bike: one person has no prior experience, while the other has some practice riding a bicycle. The person with the bicycle background will learn to ride the bike comparably faster, since prior understanding of riding a bicycle aids in learning the new task effectively. Likewise, TL operates on the premise of storing information from a previously learned task and applying it to a new one. The idea of TL is driven by the fact that humans can effectively reuse previously acquired skills to solve contemporary challenges more quickly and accurately [ 3 ].

Since 1995, a variety of terms have been used to describe TL studies, including “learning to learn,” “transfer of knowledge,” “multi-task learning,” “inductive transfer,” “knowledge integration,” “knowledge-based inductive bias learning,” “supervised learning,” “meta-learning,” and “semi-supervised learning” [ 3 , 7 ]. Among these, multi-task learning is a strong learning strategy similar to TL, because both models strive to learn multiple tasks at the same time, even though the two approaches differ [ 8 ]. A detailed and insightful treatment of multi-task learning is explored under the TL techniques in the latter part of this study. Figure 1 depicts the differences between traditional ML and modern TL strategies in effective learning. As shown, classical ML only attempts to learn from scratch, whereas TL aims to transfer information from the primary tasks to a new task with a top-quality training data set.

Additionally, as mentioned above, TL is required when the target training set is scarce. This can happen because data are rare, costly to acquire and evaluate, or simply unavailable. However, TL strategies have become more appealing as large-scale data sources grow increasingly accessible. Utilizing available datasets that are related but not identical is what makes this learning approach such a viable method [ 2 ]. Some ML applications where TL has proved successful, and which are discussed in this paper, are TL in real-world simulation, sentiment classification, gaming, image classification, and zero-shot translation.

Overall, the paper aims to provide a general appreciation of TL, an ML technique that enhances performance on the target training dataset by making use of previously trained datasets. Unlike other articles, this paper brings together the background, ongoing work, and future scope of TL, emphasizing multi-task learning, sample selection, and domain adaptation.

Definitions of TL

According to Matt, TL, a category under ML, is the reuse of pre-existing models to solve current challenges. He also acknowledges that TL is a technique employed to train models together: the knowledge in pre-existing training data is utilized to enhance performance on the ongoing challenge, so the solution does not have to be developed from scratch [ 9 ]. Similarly, Daipanja aligns with the above definition of TL. He further contrasts it with the traditional ML approach, in which data were isolated by task and each challenge was developed from scratch with little knowledge shared between tasks; with TL, the use of previous data and trained models for the current training models is comparatively enhanced and emphasized [ 10 ]. An article by Yoshua et al. defines TL as the technique that trains current models with trained models of previous, similar, related tasks. The explanations are wide and varied; however, most of them align with one another. Lastly, Jason writes that TL is an optimization tool that escalates the performance of modeling a second task [ 11 ].

Relationship between ML and TL

The relationship between TL and ML can be understood in how TL improves model development through pre-trained models and increases their efficiency. A few of the benefits of TL are as follows: not having to start every task or current challenge by building a new model to train and test from scratch; improving the efficiency of ML techniques and model development; understanding the relations between datasets from different points of view rather than in isolation; and training models on the required simulations rather than only on natural-world environments. When resources are limited and observations of the models are required, TL is one of the tools that helps in learning and generating more accurate results so that the assigned target domain functions [ 9 ].

Techniques of TL

This section covers the techniques used in TL’s core approaches to three major research questions: What, How and When to transfer.

The question of “what to transfer” entails determining which aspects of information or knowledge will be shared or transferred between domains or tasks. Sometimes, some information may be specific to certain domains, while others are exchanged across common domains, which may increase performance in the target domain.

Following the discovery of which information may be transmitted, learning algorithms to transmit the information must be devised, which now correlates to the questions of “How to transfer.”

The “when to transfer” question asks when it is appropriate to transfer the available information. In some cases, when the source and target domains are unrelated, the transfer may fail outright. Alternatively, in a worst-case scenario, it may even harm the learning performance of the target domain, a circumstance termed “negative transfer.” For this reason, avoiding negative transfer remains a critical open question.

Based on distinct conditions between the source domain, target domain, and the tasks, there are three sub-settings of TL strategies, categorized as: inductive TL, transductive TL and unsupervised TL (see Table  1 ).

Inductive TL

In this case, the target task differs from the source task, even though the source and target domains are similar. With traditional learning, the focus is usually on the target domain or task; however, in multi-task learning (a subset of inductive TL), the goal is to excel at every task available [ 3 ]. Inductive TL is further classified into two cases based on whether labeled data is available in the source domain:

Case 1—Multi-task learning

Here, the source and the target domain are the same, and a large amount of labeled information in the source domain is accessible. In this situation, the inductive TL setting is similar to multi-task learning since the source and target are the same. However, inductive TL only targets high performance on the objective task by transferring information from the primary source, whereas multi-task learning attempts to learn the source and target tasks simultaneously.

Case 2—Self-taught learning

Here, the source and target domains are different but somehow related, and no labeled information in the source domain is accessible. In this situation, inductive TL is similar to a self-taught learning setting, which implies that the feature spaces of the source and target domains turn out to be distinct yet related (first reported by Raina et al. [ 12 ]). Note that here the labeled information in the source domain is inaccessible [ 3 ].

Transductive TL

Here, both tasks (source and target) are identical; however, the domains are distinct. As shown in Table 1 , no labeled data is available in the target domain, although a lot of labeled data is available in the source domain.

Unsupervised TL

In this TL scenario, the target and the source task are different but somehow related, similar to the inductive TL. Unsupervised TL, on the other hand, focuses more on completing unsupervised tasks, such as clustering and dimension reduction [ 13 , 14 ]. In this situation, both the domains, i.e., source and target, have no labeled data.

  • Sample selection

Sample selection in TL is one of the most critical parts of the model-building workflow. It is where variables and source tasks are selected with respect to the target task’s requirements. There are primarily two considerations when beginning sample selection. First, common sense: the user should have an intuition that the metrics of the source variables match, or are similar to, those of the target task. The metrics that add value to and quantify the target task are then chosen to begin the problem-solving task.

Second, caution: although there might be many promising source variables and domains available in the source task, the user should be aware of which metrics the target task values. While selecting the most relatable and efficient data for the target task, an important issue is not adding too many parameters from the source task, which would eventually cause overfitting. When overfitting occurs and sample selection negatively affects the target task, a penalty over the data is introduced. Penalty criteria narrow the parameters to incorporate only the most helpful information the target requires from the source task [ 3 ].

Sample selection bias

Sample selection bias has been acknowledged as one of the most complicated issues in practical applications. The future data and the current training data differ in constraints and distribution [ 15 , 16 ], causing the side effects of sample selection [ 17 ]. In some cases, small sampled groups duplicate or set a pattern for bigger groups of the sample, eventually leading the datasets to suffer from sample selection bias. Nevertheless, some measures have been applied to reduce sample selection bias, especially between the target and source tasks. Figure 2 demonstrates the changes brought by sample selection in TL.

Figure 2: Changes brought by sample selection in TL

Many ML algorithms implicitly assume that the training data and the test data, which will soon be used for prediction, are almost the same. This fails to recognize that practical applications use data that sometimes differ from one another, creating variation between the testing and training data. Conventionally, the training data follow a distribution \(Q(x, y)\) and the testing data follow a distribution \(P(x, y)\), under the assumption \(P(x, y) = Q(x, y)\). In the last few years, strategies to improve the sample selection area have been constantly proposed, and several articles have been published on this matter. Note that a compact overview of the TL strategies and their settings is given in Table 1 .

Regardless of interventions in this area, sample selection has been susceptible to bias, including choices that inappropriately select control groups in case–control studies, bias in loss-to-follow-up cases, and others. Like other TL features, sample selection has received massive attention from communities such as ML, statistics, economics, bioinformatics, epidemiology, and medicine. Sample selection uses source data to build predictions and to alter the source data; this crosses the limits of a single data distribution and gives the user a broader scope to build efficient models. It is one of the TL areas that has received much attention from ML and research groups in the past few years.

Brief analysis of sample selection: the kernel mean matching algorithm

Estimation models for \(\beta (x)\), such as kernel density estimation (a naive approach), are used to weight the training sample and minimize selection bias from the external data [ 18 ]. Despite being used on the labeled data distribution, some of these models have been inferior or less effective, for example when estimating data with high density or data carrying heavy information. Also, if the model makes a small estimation error, especially when the testing density \(P(x \mid s = 1)\) and the training density \(P(x)\) take small values, it disturbs the performance of the whole pipeline, causing relatively worse performance on the target task compared to the source task. This has been noticed in several performance evaluations. Estimating \(\beta (x)\) directly is therefore considered more logical than evaluating these models when the target task has high data density and few training and testing samples (Fig. 3 ).

Figure 3: Symbols and abbreviations [ 16 ]

Gradual improvements have been made to the estimation models for \(\beta (x)\) under covariate shift; two suggested algorithms are kernel mean matching (KMM) and unconstrained least-squares importance fitting (uLSIF) [ 1 ]. Only the KMM model is elaborated in this paper, although we acknowledge uLSIF and its uses.

Both are better measures than, and improved versions of, kernel density estimation. The KMM model takes the classical statistics perspective of a parametric model \(P^{\theta }(y \mid x)\) that organizes the label data distribution, mainly for logistic regression models but applicable to other models as well. It estimates the prediction loss from the source data to the target data and reduces its overfitting. A form of penalty criterion narrows the parameters to incorporate only the most helpful information the target requires from the source task.

The KMM also contradicts the primary assumption of ML, which states that the testing and training data come from one data distribution. The model relates the testing samples \(\{(x_i, y_i)\}_{i=1}^{n_{te}} \sim P_{te}(x, y) = P(x, y)\) and the training samples \(\{(x_i, y_i)\}_{i=1}^{n_{tr}} \sim P_{tr}(x, y) = P(x, y \mid s = 1)\) drawn from multiple sources, and ultimately predicts how the variables \(X\) map to the output \(Y\).

Data sampling falls under transductive TL, which helps learn an optimal model for the target domain and task by minimizing the expected risk. Empirical risk minimization (ERM) is one of the measures that helps estimate this risk for the target problem. The optimal parameters are \(\theta ^{*} = \arg \min _{\theta \in \Theta } \mathbb {E}_{(x, y) \sim P}\,[l(x, y, \theta )]\), where \(l(x, y, \theta )\) is the loss function of the parameters. Since estimating the probability distribution of the data directly is difficult, the ERM concept is utilized to train the optimal model: \(\theta ^{*} = \arg \min _{\theta \in \Theta } \frac{1}{n} \sum _{i=1}^{n} l(x_i, y_i, \theta )\), where \(n\) is the size of the training data [ 19 ].

The models above are used in source data selection, but the target domain is what ultimately matters. The optimal parameters for the target domain are \(\theta ^{*} = \arg \min _{\theta \in \Theta } \sum _{(x, y) \in D_T} P(D_T)\, l(x, y, \theta )\). However, the training dataset has to be obtained from the source domain to learn a model for the target domain. In the case of \(P(D_S) = P(D_T)\), we can simply solve the optimization problem on the source domain instead: \(\theta ^{*} = \arg \min _{\theta \in \Theta } \sum _{(x, y) \in D_S} P(D_S)\, l(x, y, \theta )\).

In contrast, if \(P(D_S)\) is not equal to \(P(D_T)\), the above optimization problem is modified by reweighting each source instance with the ratio \(P(x_{T_i}) / P(x_{S_i})\), so that the model learns high generalization ability for the target domain. Several methods are available to estimate the values of \(P(x_{S_i})\) and \(P(x_{T_i})\). Zandrozny [ 20 ] proposed estimating the values of \(P(x_{S_i})\) and \(P(x_{T_i})\) independently, without depending on other classification problems. Similarly, an article by Fan et al. [ 21 ] elaborated on this idea of solving selection problems by using multiple classifiers to predict the probability ratio.
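For reference, the importance-weighted objective commonly written for this covariate-shift setting is sketched below; the notation is a standard formulation and may differ slightly from the paper’s own.

\[
\theta^{*} = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n_S} \frac{P\left(x_{T_i}\right)}{P\left(x_{S_i}\right)}\, l\left(x_{S_i}, y_{S_i}, \theta\right)
\]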

Lastly, the covariate shift or sample selection bias offers a considerable advantage and exponentially increases data quality to process the target data.

Applications of sample selection

In TL, sample selection has been used in several areas and models. It has been used to study drugs in medical clinics, to choose patients from the general population of a given demographic, and for many other purposes; the selected data thus differ from the general population in terms of gender, race, and patient characteristics. It has also been used to improve the predictive methods and capabilities of detection systems built many years ago.

Such older detection systems struggle to keep up with modern attack or spam patterns, and sample selection has helped address these challenges. Similarly, surveys undertaken on particular religious groups did not generalize to other groups because of their differences; sample selection has been used to link and bridge such surveys so that findings from one group can be related to other categories of belief. None of these examples follow the primary assumption of ML that training and testing data come from one data distribution: the data come from more than one distribution, with testing and training occurring in different domains. Such settings require sample selection users to thoughtfully choose the features common to the previous source data and the current target data.

Considering the above applications and the usage of sample selection, it is clear that many datasets in real-world applications are potentially biased. Further research has been done on sample selection bias; the proposed approaches do not assume an exact type of bias or a formal model quantifying the distribution of the bias to be rectified [ 20 ]. Reducing sample selection bias remains an open problem.

  • Domain adaptation

Domain adaptation is a type of TL in which the task remains the same but the source and target have different domains or distributions. Consider a model that has been trained on X-rays of many patients to determine whether or not a patient is infected with COVID-19 [ 22 ]. The best-generalized systems depend on appropriate datasets [ 3 ]: if the data is biased, the system cannot generalize to accurate outputs, and this is the kind of problem domain adaptation addresses. Domain adaptation applies an algorithm trained on one or more source domains to improve the target domain, trying to alter the source domain so that its distribution is brought closer to that of the target domain [ 23 ].

Mathematical explanation of the domain adaptation

Let us denote the domain as \(D\), which contains two components: a feature space \(X\) and a marginal probability distribution \(P(x)\). The feature space \(X\) can contain instances \(X_1, X_2, \ldots , X_n, \ldots\)

So a supervised learning task on a specific domain is defined over \(D = \{X, P(x)\}\). A task consists of a label space \(Y\) and an objective predictive function \(f(\cdot )\), denoted \(T = \{Y, P(y \mid x)\}\). The predictive function \(f(\cdot )\) is learned from training pairs \((x_i, y_i)\), where \(y_i \in \{y_1, y_2, \ldots \}\), and is used to predict the label of a new instance \(x\). In domain adaptation, however, there are two domains and two tasks. Given a source domain [ 4 ] \(D_S = \{X_S, P(x_S)\}\) with \(X_S = \{x_{S_1}, \ldots , x_{S_n}\}\), a task on the source domain \(T_S = \{Y_S, P(y_S \mid x_S)\}\) with \(Y_S = \{y_{S_1}, \ldots , y_{S_n}\}\), a target domain \(D_T = \{X_T, P(x_T)\}\) with \(X_T = \{x_{T_1}, \ldots , x_{T_n}\}\), and a task on the target domain \(T_T = \{Y_T, P(y_T \mid x_T)\}\) with \(Y_T = \{y_{T_1}, \ldots , y_{T_n}\}\) [ 24 ], the goal is to improve the learning of the target predictive function \(f_T(\cdot )\) using the information in the source domain and the task on the source domain. A predictive model \(f_{ST}(\cdot )\) is trained on the source domain \(D_S\) so that it can be adapted to the target domain \(D_T\) [ 23 ].

TL applications

Real-world simulations.

Large production companies face enormous challenges to be more flexible for any production fluctuation while providing higher product quality with significantly lower costs in manufacturing and processing in a dynamic workforce [ 25 ]. The central objective of this process’s underlying setup is to distinguish optimal parameters that fulfill high product quality and efficiency.

One approach to overcoming these difficulties is exploiting the techniques of artificial intelligence (AI), mainly supervised learning models. Supervised learning trains models using appropriately categorized or labeled information. Any ML or AI application that depends on gathering information or preparing a real-world model can be costly, tedious, or even hazardous for users or the environment. Therefore, robots are being trained with simulation results to advance the technology and limit development costs. Consequently, with these advancements, the systems become more practical and ideal. Furthermore, one can train, test, recreate, and program robots to train themselves so that real-world robots can transfer and use everything learned in the process. These kinds of transfers are done using progressive networks, an ideal platform for real-world robot simulations. Conversely, not all simulation features are effectively reproduced when applied in the actual world because of their complex interactions.

Considering the enhancement of performance, TL techniques have also been emphasized and used on real-world simulation datasets. Datasets and research articles based on real-world simulation include: policy transfer from simulations to the real world by transfer component analysis [ 26 ]; simulations, learning and real-world capabilities [ 27 ]; real-world reinforcement learning via multifidelity simulators [ 28 ]; knowledge-aided convolutional neural networks for small organ segmentation [ 29 ]; adaptive fusion and category-level dictionary learning for multi-view human action recognition [ 30 ]; and stimulus-driven and concept-driven analysis for image caption generation [ 31 ], to name a few.

  • Gaming

The rise in ML and other AI applications has made an enormous impact on gaming so far. One of the finest examples today is AlphaGo, one of the first ML programs to defeat an expert human Go player, developed with DeepMind’s neural networks. AlphaGo is an ace at this game; however, it is incompetent at other games and fails to win when asked to play them. This failure happens because it is only programmed, designed, and fitted to play Go, which illustrates the ultimate drawback of utilizing artificial neural networks (ANN) in gaming: they can neither be as fast as nor master all games like a human brain. Therefore, in order to play and win other games, AlphaGo would need to thoroughly forget the algorithms of Go and learn an entirely new program [ 32 ].

Consequently, with the help of TL, new games can now be played by re-applying the strategies learned in a previous game, just as the definition of TL states. For instance, the application of TL in gaming can be seen in MadRTS [ 33 ], a real-time strategy game with ongoing competing players. Here the application uses CARL (case-based reinforcement learning) [ 34 ], a multi-layered design that joins case-based reasoning (CBR) and reinforcement learning (RL) and permits the keys and strategies of tasks to be acquired and separated, applying the particular idea of TL to them [ 33 ].

  • Image classification

Multiple image classification models have been developed to address the pressing issues of identification and recognition accuracy [ 35 ]. Image classification is a significant subject in computer vision, with many applications: object identification for robotic handling and human or object tracking for autonomous cars are a few examples [ 35 ]. Today, convolutional neural networks (CNN) show reliable outcomes on image or object detection and recognition that are helpful in real-world applications [ 36 ]. The architecture of CNN models performs training and prediction at a high level of abstraction. One of the strengths of neural networks is the ability to perceive things inside an image once they have been trained on labeled pictures from massive datasets, which is very challenging in terms of time. Several computer vision and ML problems have demonstrated that the CNN framework performs effectively in terms of accuracy.

Convolutional neural networks (CNN) have influenced and dominated the ML vision field. A CNN comprises an input layer, an output layer, and several hidden layers, which include convolutional layers, pooling layers, fully connected layers, and normalization layers (ReLU). For example, the VGG-16 [ 37 ] ConvNet [ 38 ] illustrated in Fig. 4 consists of different layers, each containing a unique collection of image features. To perceive images inside a dataset, the network is first pre-trained using ImageNet; it is then prepared layer-wise, beginning from the SoftMax layer, followed by the dense layers. However, these models rapidly strain battery power, limiting smaller gadgets, storage devices, and inexpensive phones [ 36 , 39 ]. To reduce such burdens, TL helps prepare the models through pre-training on ImageNet, which consists of many pictures from various sources, saving time. As another example, if a facial image is fed as input into a CNN structure, the system starts to learn basic properties in its training stage, such as the lines and edges of the face, bright and dark areas, contours, and so on [ 35 ].

Figure 4: The architecture of VGG-16 ConvNet [ 39 ]

  • Zero shot translation

One of the popular procedures for machine translation is neural machine translation (NMT) [ 40 ], which is achieved with a colossal artificial neural network. It has displayed promising results and has indicated tremendous potential in machine interpretation and translation. Machine translation into a language with only a small amount of training data can be exercised using zero-shot translation [ 41 ]. In this sense, zero-shot learning is considered one of the most promising learning strategies, where the input sources and the classes we intend to portray are disjoint. Accordingly, zero-shot learning is connected to using supervised learning (similar to the applications in gaming) to assess its accuracy and training data. One famous example is Google’s Neural Machine Translation model (GNMT), which enables powerful cross-lingual interpretation. For instance, to translate between two languages such as Korean and Spanish, we traditionally need a pivot (intermediary) language bridging the two: Korean is first translated into English and later into Spanish, with English acting as the pivot language.

Therefore, to avoid all these detours from one language to another, zero-shot translation can utilize all the available data to learn the translational mapping directly and decipher text into a new target language [ 42 ].

  • Sentiment classification

Understanding hidden or visible feelings conveyed online or in social media is essential to customers and users [ 43 ]. Sentiment classification is acknowledged as perhaps the most significant area in natural language processing (NLP) research. Social media has become the primary source of opinion data, and because of its domain diversity, applying sentiment classification to social media has a great deal of potential, but it also faces many challenges [ 44 ]. One of the most common sub-areas of sentiment classification is interpreting an individual's feelings conveyed via media content. Sentiment classification of social media data is unquestionably a big data problem. Earlier research on sentiment classification analyzed texts within a single linguistic expression.

Sentiment classification is another helpful tool that allows a user or business organization to identify their clients' choices and reviews by understanding their sentiment as negative, positive, or neutral (as in Fig.  5 ), which may also be labeled as good/bad or satisfactory/not satisfactory. Building an entirely new collection of texts to analyze sentiments is tough, since it is not easy to prepare models to identify feelings from scratch. TL offers a solution to this problem. For instance, suppose x is the input text and y is the feeling we need to predict for a film review. A deep learning model is first trained on the x input via sentiment analysis of a text corpus, identifying every statement's polarity. Once the model has learned to understand feelings through the polarity of the x data, its underlying language model and learned knowledge are transferred onto a model allotted the task of examining the sentiment of y , i.e., film reviews. Additionally, techniques such as embeddings are also used for various sentiment analysis jobs by transferring information from one source ( x ) and re-applying the same algorithms in the targeted area ( y ) to fulfill the prediction task [ 45 ].
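As a hedged illustration of the embedding-transfer idea just described (not code from the cited work), the sketch below builds a small Keras sentiment classifier whose embedding layer is initialized from pre-trained word vectors such as GloVe and kept frozen; the vocabulary size, sequence length, and layer sizes are assumptions.

```python
# Sketch: transfer pre-trained word vectors into a sentiment classifier.
import numpy as np
from tensorflow.keras import layers, models, initializers

vocab_size, embed_dim, max_len = 20000, 100, 200        # hypothetical sizes
embedding_matrix = np.zeros((vocab_size, embed_dim))    # filled from a GloVe file in practice

model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False),                   # transferred knowledge stays frozen
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),               # positive vs. negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```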

Figure 5 shows the polarity scale of sentiment analysis, ranging from "unhappy with the service", through "neither happy nor sad" and "happy with the service", to "very happy and totally in love with the service".

Figure 5. Polarities of sentiment analysis [ 44 ]

Contributions of TL

Contributions in medical sciences

Medical imaging and MRI play essential roles in routine clinical diagnosis and treatment. In MRI, contrast variation shows the difference between normal and diseased tissue. With ML and TL, medical scientists can detect disease more easily. However, the vast training data required can be expensive to obtain. Convolutional neural networks (CNNs) have been a great success in analyzing medical images and handling variations in medical imaging protocols, and TL makes this success available when data are scarce. TL shows outstanding performance in white matter hyperintensity (WMH) segmentation (lesions of vascular origin, which can be found on brain MRI or computed tomography), tumor segmentation (detecting the location of tumor cells), and microbleed segmentation (detecting traumatic brain injury). In many cases, a network trained on MRI data acquired under one protocol does not work efficiently on other protocols [ 46 ]. Domain adaptation helps to ensure the usability of trained models in actual practice: this limit can be overcome by adapting models trained on large annotated legacy datasets to new datasets from different domains, as the clinical setting requires [ 47 ]. MRI shows high variation among soft tissues and contrasts. For comparison, the image database ImageNet contains more than fourteen million annotated images in more than twenty thousand categories, and fine-tuning is then based on the instances from the target domain [ 48 ].

Contributions in bioinformatics

In the bioinformatics field, analyzing biological sequences plays an important role. To understand an organism's function, biologists must analyze the gene sequences of particular microorganisms. TL and domain adaptation show outstanding performance in bioinformatics tasks such as gene expression analysis, sequence classification, and network reconstruction. Domains correspond to different model organisms or different data-collection settings used to maintain the predicted sequences. If there is an error while predicting the sequence, prognosis systems raise an alarm to replace the component, and the system changes its properties; the alarm continues until the system is changed or replaced by a new setting and component [ 46 ]. However, bioinformatics applications face many distribution problems. For example, two organisms' functions can be the same while their composition differs, leading to marginal distribution differences. On the other hand, if two species share a common ancestor with a long evolutionary history, the conditional distribution difference becomes significant. TL can be used to predict mRNA splicing: the source domain can be the organism C. elegans, and the target domains C. remanei and P. pacificus. Several TL approaches have been compared, and methods such as FAM and the variant KMM can improve classification performance. Besides, gene expression analysis predicts the association between genotypes and phenotypes. However, data sparsity problems can arise here (not enough observed datasets), as the nucleotide sequences provide minimal data. To ensure outstanding performance, TL can be used to bring in additional information [ 48 ].

There have been numerous uses of TL in bioinformatics, with each dataset applying TL algorithms to showcase the contribution of TL in its respective area. A few sample works include: TL for biomedical named entity recognition (BNER, Bioinformatics 2018) [ 49 ]; exploiting TL for the reconstruction of the human gene regulatory network [ 50 ]; parasitologist-level classification of apicomplexan parasites and host cells with deep cycle TL (DCTL) [ 51 ]; AITL, adversarial inductive TL with input and output space adaptation for pharmacogenomics [ 52 ]; improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling, and two-dimensional TL, where the computational prediction of RNA secondary structure is performed with the help of TL [ 53 ]; optimized hybrid investigative-based dimensionality reduction methods for the malaria vector using a KNN classifier [ 54 ]; a survey of dimension reduction and classification methods for RNA-Seq data on the malaria vector [ 55 ]; a hybrid heuristic dimensionality reduction method for classifying malaria vector gene expression data [ 56 ]; and others.

Contributions in transportation

TL has transportation applications, for example in understanding traffic scenes and target driver behavior, where images are taken from specific locations [ 57 ]. However, the outputs can be incorrect under varying weather conditions. TL offers an outstanding solution by using pictures taken at the exact location under different weather conditions [ 23 ]. First, the system trains the network to extract features from the pictures. Second, new features are built by a feature transformation strategy. Then, a dimension reduction algorithm generates a low-dimensional feature. In the last stage, the best-suiting image is recovered among the tested images, and a Markov model transfers the cross-domain sets with the best-matching image to the test image [ 58 ]. TL can also be applied to target driver behavior given sufficient personalized data on each target driver's behavior. TL can demonstrate results even when the target domains are limited and the data are very small or very large [ 48 ].

Contributions in recommendation systems

In the contemporary world, one of the most heated topics for many industries is building an automatic question-answering system that works much like a recommendation system. Recommender systems are widely used in the e-commerce field, helping people answer questions related to merchandise [ 59 ]. E-commerce has become an essential part of everyday life, and people are familiar with the field. Different websites sell different products or services; e-commerce can be divided into vertical e-commerce websites, where people shop for the same sort of product, and integrated e-commerce websites, which sell multiple kinds of products, including food, clothing, research services, etc. E-commerce recommender systems use three techniques to recommend products: collaborative filtering, content-based filtering, and hybrid recommendation. Drawing on information retrieval, content-based filtering first analyzes items to obtain a set of features and builds product feature vectors, then calculates the similarity between users and products and makes recommendations; in ML, clustering is used for content-based filtering. Collaborative filtering follows two techniques: memory-based filtering and model-based filtering. Memory-based algorithms work on users' ratings of and preferences for a particular product to predict a user's target product; both memory-based and model-based algorithms learn from the rating records as well as the target user's ratings. Hybrid recommendation works much like content-based recommendation but can perform better than collaborative filtering. Although recommendation applications have achieved a lot in e-commerce, they face many data sparsity problems, leading to poor recommendations [ 60 ]. However, TL yields a recommendation system that combines collaborative filtering proposals to alleviate sparsity problems. This method improves the accuracy of recommendations by transferring the knowledge learned from dense datasets to sparse ones. TL provides a new framework for e-commerce recommender systems in which knowledge is transferred from the source domain and source task to the target domain and target task. With the help of TL, people can reuse knowledge and solve their problems faster [ 61 ].

Future directions of TL

A great future awaits further advancements in TL research. Many modern visual learning algorithms rely on data from the desired object categories; for instance, the object-oriented paradigm algorithmically detects, recognizes, and describes unseen images [ 48 ]. Executing these modern visual learning algorithms requires new data collections containing precise labels. However, many pre-existing large datasets, such as ImageNet with roughly fourteen million images, already provide a massive pool of information.

TL aims to use previous knowledge from a related source task and leverage extra source data to boost a poorly resourced target set [ 2 ]. Besides, TL problems can be addressed with dynamic settings for online learning and self-labeling data. Most often, pre-existing resources are ignored because they do not overlap with the target task; yet this pre-existing material can be used for classification and localization, and past knowledge and datasets are not useless. We can therefore take steps to ensure the best use of pre-existing data. First, we can revisit past knowledge and generalize it, which may open research potential and practical relevance [ 46 ]. Secondly, we may improve previous TL methods for the case where only a few annotated samples of the new target domain are available. Zero-shot classification is an advanced step in this direction because it obtains classifiers for novel categories from an arbitrary basis even though little data is available. Besides, zero-shot learning is reliable for textual embeddings of image datasets, and it is faster, more accurate, and more economical. Finally, we may combine zero-shot and active learning in support vector machines with optimal query conditions. Additionally, future study in the domain of TL can go in a variety of directions:

To begin, TL techniques can be investigated further and applied to a broader range of applications. New methods are required to overcome knowledge-transfer challenges in more complicated circumstances. For example, in real-world settings, the source-domain data may come from a different organization. In this scenario, the question of how to transfer knowledge from the source domain while maintaining user privacy is crucial.

Secondly, determining ways to quantify the transfer of information across domains while avoiding negative transfer is crucial. Although few studies have been done on negative transfers, more systematic research is still needed [ 62 ].

Thirdly, the validity of TL requires further research [ 63 ].

In terms of challenges and gaps, the figure below depicts them (Fig. 6).

Figure 6. Challenges and gaps in the TL literature

Furthermore, theoretical research can be performed to establish evidence for TL's potency and validity. As a prominent and promising field of ML, TL has several advantages over classical/traditional ML, including a reduced data requirement and less reliance on labels.

TL is based on transferring knowledge learned from the data distribution of one task to another. It can reuse older data while relating the source task to the target task, following specific strategies based on data-level and model-level interpretation. This paper discussed the goals and strategies of TL by introducing its objectives and some of its learning approaches. In addition, we briefly covered TL techniques at the model level, along with their applications. Several TL applications have been presented, such as in medicine, bioinformatics, transportation, recommendation, e-commerce, etc. The application of TL in numerous fields indicates that it is an essential research topic and can pave the way for the future technological era, even though it may prove difficult in practice.

Availability of data and materials

N/A (no data used).

Torrey L, Shavlik J. Transfer learning. In: Handbook of research on machine learning applications and trends: algorithms, methods, and techniques, 2010; 242–264. IGI global.

Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big Data. 2016;3(1):1–40.


Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.

Taylor ME, Stone P. Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 2009;10(7).

Witten IH, Frank E. Data mining: practical machine learning tools and techniques with java implementations. Acm Sigmod Record. 2002;31(1):76–7.

Shimodaira H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J Stat Plan Inference. 2000;90(2):227–44.


Thrun S, Pratt L. Learning to learn: introduction and overview. In: Learning to Learn. Springer, 1998;p. 3–17

Caruana R. Multitask learning. Mach Learn. 1997;28(1):41–75.

Trotter M. Machine learning deployment for enterprise 2021. https://www.seldon.io/transfer-learning/

Sarkar DD. A comprehensive hands-on guide to transfer learning with real-world applications in deep learning. Towards Data Sci. 2018;20:2020.


Brownlee J. A Gentle introduction to transfer learning for deep learning 2019. https://machinelearningmastery.com/transfer-learning-for-deep-learning/ .

Raina R, Battle A, Lee H, Packer B, Ng AY. Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766 2007.

Dai W, Yang Q, Xue G-R, Yu Y. Self-taught clustering. In: Proceedings of the 25th International Conference on Machine Learning, 2008;200–207.

Wang Z, Song Y, Zhang C. Transferred dimensionality reduction. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008:550–565. Springer

Ren J, Shi X, Fan W, Yu PS. Type independent correction of sample selection bias via structural discovery and re-balancing. In: Proceedings of the 2008 SIAM International Conference on Data Mining, 2008;565–576. SIAM

Huang J, Gretton A, Borgwardt K, Schölkopf B, Smola A. Correcting sample selection bias by unlabeled data. Adv Neural Inf Process Syst. 2006;19:601–8.

Tran V-T. Selection bias correction in supervised learning with importance weight. PhD thesis 2017.

Liu A, Ziebart B. Robust classification under sample selection bias. In: Advances in neural information processing systems, 2014;37–45.

Liu Z, Yang J-A, Liu H, Wang W. Transfer learning by sample selection bias correction and its application in communication specific emitter identification. JCM. 2016;11(4):417–27.

Zadrozny B. Learning and evaluating classifiers under sample selection bias. In: Proceedings of the Twenty-first International Conference on Machine Learning, p. 114 2004.

Fan W, Davidson I, Zadrozny B, Philip SY. An improved categorization of classifier’s sensitivity on sample selection bias. In: ICDM, 2005;5:605–608. Citeseer

Kamath U, Liu J, Whitaker J. Transfer learning: Domain adaptation. In: Deep Learning for NLP and Speech Recognition. Springer; 2019, p. 495–535.

Ghafoorian M, Mehrtash A, Kapur T, Karssemeijer N, Marchiori E, Pesteie M, Guttmann CR, de Leeuw F-E, Tempany CM, Van Ginneken B. Transfer learning for domain adaptation in mri: Application in brain lesion segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, 2017;516–524. Springer

Steinwart I. On the influence of the kernel on the consistency of support vector machines. J Mach Learn Res. 2001;2:67–93.


Tercan H, Guajardo A, Heinisch J, Thiele T, Hopmann C, Meisen T. Transfer-learning: bridging the gap between real and simulation data for machine learning in injection molding. Procedia CIRP. 2018;72:185–90.

Matsubara T, Norinaga Y, Ozawa Y, Cui Y. Policy transfer from simulations to real world by transfer component analysis. In: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), 2018;264–269. IEEE

Wood RE, Beckmann JF, Birney DP. Simulations, learning and real world capabilities. Education+ Training 2009.

Cutler M, Walsh TJ, How JP. Real-world reinforcement learning via multifidelity simulators. IEEE Trans Robot. 2015;31(3):655–71.

Zhao Y, Li H, Wan S, Sekuboyina A, Hu X, Tetteh G, Piraud M, Menze B. Knowledge-aided convolutional neural network for small organ segmentation. IEEE J Biomed Health Inform. 2019;23(4):1363–73.

Gao Z, Xuan H-Z, Zhang H, Wan S, Choo K-KR. Adaptive fusion and category-level dictionary learning model for multiview human action recognition. IEEE Internet Things J. 2019;6(6):9280–93.

Ding S, Qu S, Xi Y, Wan S. Stimulus-driven and concept-driven analysis for image caption generation. Neurocomputing. 2020;398:520–30.

Chen JX. The evolution of computing: Alphago. Comput Sci Eng. 2016;18(4):4–7.

Sharma M, Holmes MP, Santamaría JC, Irani A, Isbell Jr CL, Ram A. Transfer learning in real-time strategy games using hybrid cbr/rl. In: IJCAI, 2007;7:1041–1046.

Jiang C, Sheng Z. Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system. Expert Syst Appl. 2009;36(3):6520–6.

Hussain M, Bird JJ, Faria DR. A study on cnn transfer learning for image classification. In: UK Workshop on Computational Intelligence, 2018;191–202. Springer

Rastegari M, Ordonez V, Redmon J, Farhadi A. Xnor-net: Imagenet classification using binary convolutional neural networks. In: European Conference on Computer Vision, 2016;525–542. Springer

Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.

Neurohive: Convolutional Network for Classification and Detection (2018). https://neurohive.io/en/popular-networks/vgg16/ .

Chu C, Wang R. A survey of domain adaptation for neural machine translation. arXiv preprint arXiv:1806.00258 2018.

Kumar R, Jha P, Sahula V. An augmented translation technique for low resource language pair: Sanskrit to hindi translation. In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, pp. 377–383 2019.

Lobo S. 5 cool ways transfer learning is being used today. 2018. https://hub.packtpub.com/5-cool-ways-transfer-learning-used-today/

Tao J, Fang X. Toward multi-label sentiment analysis: a transfer learning based approach. J Big Data. 2020;7(1):1–26.

Fang X, Zhan J. Sentiment analysis using product review data. J Big Data. 2015;2(1):1–14.

Nabi J. Machine learning: word embedding & sentiment classification using Keras. Towards Data Science; 2018. https://towardsdatascience.com/machine-learning-word-embedding-sentiment-classification-using-keras-b83c28087456

Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, He Q. A comprehensive survey on transfer learning. Proc IEEE. 2020;109(1):43–76.

Shao L, Zhu F, Li X. Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learn Syst. 2014;26(5):1019–34.

Kouw WM, Loog M. A review of domain adaptation without target labels. IEEE Trans Pattern Anal Mach Intell. 2019;43(3):766–85.

BaderLab: BaderLab/Transfer-Learning-BNER-Bioinformatics-2018: This repository contains supplementary data, and links to the model and corpora used for the paper: Transfer learning for biomedical named entity recognition with neural networks.

Mignone P, Pio G, D’Elia D, Ceci M. Exploiting transfer learning for the reconstruction of the human gene regulatory network. Bioinformatics. 2020;36(5):1553–61.

Li S, Yang Q, Jiang H, Cortés-Vecino JA, Zhang Y. Parasitologist-level classification of apicomplexan parasites and host cell with deep cycle transfer learning (dctl). Bioinformatics. 2020;36(16):4498–505.

Sharifi-Noghabi H, Peng S, Zolotareva O, Collins CC, Ester M. Aitl: adversarial inductive transfer learning with input and output space adaptation for pharmacogenomics. Bioinformatics. 2020;36(Supplement–1):380–8.

Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning. Bioinformatics. 2021;37:2589–600.

Arowolo MO, Adebiyi MO, Adebiyi AA, Olugbara O. Optimized hybrid investigative based dimensionality reduction methods for malaria vector using knn classifier. J Big Data. 2021;8(1):1–14.

Arowolo MO, Adebiyi MO, Aremu C, Adebiyi AA. A survey of dimension reduction and classification methods for RNA-seq data on malaria vector. J Big Data. 2021;8(1):1–17.

Arowolo MO, Adebiyi MO, Adebiyi AA, Okesola OJ. A hybrid heuristic dimensionality reduction methods for classifying malaria vector gene expression data. IEEE Access. 2020;8:182422–30.

Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):1–48.

Wang M, Deng W. Deep visual domain adaptation: a survey. Neurocomputing. 2018;312:135–53.

Adão T, Hruška J, Pádua L, Bessa J, Peres E, Morais R, Sousa JJ. Hyperspectral imaging: a review on uav-based sensors, data processing and applications for agriculture and forestry. Remote Sensing. 2017;9(11):1110.

Gao Y, Mosalam KM. Deep transfer learning for image-based structural damage recognition. Comput Aided Civil Infrastruct Eng. 2018;33(9):748–68.

Tang J, Zhao Z, Bei J, Wang W. The application of transfer learning on e-commerce recommender systems. In: 2013 10th Web Information System and Application Conference, 2013;479–482. IEEE

Wang Z, Dai Z, Poczos B, Carbonell J. Characterizing and avoiding negative transfer. in 2019 IEEE. In: CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019;11285–11294.

Lipton ZC. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue. 2018;16(3):31–57.


Acknowledgements

Partially supported by Khalifa University, UAE.

Author information

Asmaul Hosna, Ethel Merry, and Jigmey Gyalmo contributed equally.

Authors and Affiliations

Department of Computer Science, Asian University for Women, 20/A M. M. Ali Road, Chittogram, Bangladesh

Asmaul Hosna, Ethel Merry, Jigmey Gyalmo, Zulfikar Alom & Mohammad Abdul Azim

Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates


Contributions

Equally contributed. Research led by MAA. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammad Abdul Azim .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.



About this article

Cite this article.

Hosna, A., Merry, E., Gyalmo, J. et al. Transfer learning: a friendly introduction. J Big Data 9 , 102 (2022). https://doi.org/10.1186/s40537-022-00652-w


Received : 04 September 2021

Accepted : 19 September 2022

Published : 22 October 2022

DOI : https://doi.org/10.1186/s40537-022-00652-w


Keywords: Machine learning, Transfer learning, Multi-task learning


Transfer Learning: Leveraging Trained Models on Novel Tasks

  • First Online: 24 February 2022


  • Riyad Bin Rafiq
  • Mark V. Albert

Part of the book series: Educational Communications and Technology: Issues and Innovations ((ECTII))


This chapter provides a brief introduction to transfer learning, its history, and its importance. As data collection and labeling in a new domain are challenging, transfer learning can play a vital role in building a reusable model. After explaining the fundamentals of transfer learning, the strategies are presented, followed by different pre-trained models in the fields of computer vision and natural language processing, including prominent models like VGG-16, Inception, ULMFiT, and BERT. Finally, applications and limitations of transfer learning are discussed.




Author information

Authors and affiliations.

Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA

Riyad Bin Rafiq & Mark V. Albert


Corresponding author

Correspondence to Riyad Bin Rafiq .

Editor information

Editors and affiliations.

Computer Science and Engineering, University of North Texas, Denton, TX, USA

Mark V. Albert

Department of Learning Technologies, University of North Texas, Denton, TX, USA

Lin Lin, Michael J. Spector & Lemoyne S. Dunn


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Rafiq, R.B., Albert, M.V. (2022). Transfer Learning: Leveraging Trained Models on Novel Tasks. In: Albert, M.V., Lin, L., Spector, M.J., Dunn, L.S. (eds) Bridging Human Intelligence and Artificial Intelligence. Educational Communications and Technology: Issues and Innovations. Springer, Cham. https://doi.org/10.1007/978-3-030-84729-6_4


DOI : https://doi.org/10.1007/978-3-030-84729-6_4

Published : 24 February 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-84728-9

Online ISBN : 978-3-030-84729-6




Digital Discovery

Tackling data scarcity with transfer learning: a case study of thickness characterization from optical spectra of perovskite thin films†


a Low Energy Electronic Systems (LEES), Singapore-MIT Alliance for Research and Technology (SMART), 1 Create Way, Singapore 138602, Singapore E-mail: [email protected] , [email protected]

b Solar Energy Research Institute of Singapore (SERIS), National University of Singapore, 7 Engineering Drive, Singapore 117574, Singapore

c Institute of Materials Research and Engineering (IMRE), Agency for Science, Technology and Research (A*STAR), 2 Fusionopolis Way, Singapore 138634, Singapore

d Department of Mechanical Engineering, Massachusetts Institute of Technology (MIT), 77 Massachusetts Ave., Cambridge, MA 02139, USA

e Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, Singapore 138632, Singapore

f Institute of Sustainability for Chemicals, Energy and Environment, Agency for Science, Technology and Research (A*STAR), 1 Pesek Rd, Singapore 627833, Singapore

g Department of Chemistry, The University of British Columbia (UBC), 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada

h Department of Mathematics, National University of Singapore (NUS), 21 Lower Kent Ridge Rd, Singapore 119077, Singapore

Transfer learning (TL) is increasingly becoming an important tool in handling data scarcity, especially when applying machine learning (ML) to novel materials science problems. In autonomous workflows to optimize optoelectronic thin films, high-throughput thickness characterization is often required as a downstream process. To surmount data scarcity and enable high-throughput thickness characterization, we propose a transfer learning workflow centering an ML model called thicknessML that predicts thickness from UV-Vis spectrophotometry. We demonstrate the transfer learning workflow from a generic source domain (of materials with various bandgaps) to a specific target domain (of perovskite materials), where the target-domain data are from just 18 refractive indices from the literature. While featuring perovskite materials in this study, the target domain easily extends to other material classes with a few corresponding literature refractive indices. With accuracy defined as being within-10%, the accuracy rate of perovskite thickness prediction reaches 92.2 ± 3.6% (mean ± standard deviation) with TL compared to 81.8 ± 11.7% without. As an experimental validation, thicknessML with TL yields a 10.5% mean absolute percentage error (MAPE) for six deposited perovskite films.


Associated articles

  • Correction: Tackling data scarcity with transfer learning: a case study of thick…

Supplementary files

  • Supplementary information PDF (348K)

Article information



S. I. P. Tian, Z. Ren, S. Venkataraj, Y. Cheng, D. Bash, F. Oviedo, J. Senthilnath, V. Chellappan, Y. Lim, A. G. Aberle, B. P. MacLeod, F. G. L. Parlane, C. P. Berlinguette, Q. Li, T. Buonassisi and Z. Liu, Digital Discovery , 2023,  2 , 1334 DOI: 10.1039/D2DD00149G


A case study on transfer learning in convolutional neural networks


Computer Vision: A Case Study - Transfer Learning

The conclusion to the series on computer vision talks about the benefits of transfer learning and how anyone can train networks with reasonable accuracy. Usually, articles and tutorials on the web don’t include methods and hacks to improve accuracy. The aim of this article is to help you get the most information from one source. Stick on till the end to build your own classifier. 

The ImageNet moment was remarkable in computer vision and deep learning, as it created opportunities for people to reuse the knowledge procured through several hours or days of training with high-end GPUs. The different architectures can recognise over 20,000 classes of various objects and have achieved better accuracy than humans. How do we use this knowledge that scientists across the globe have gathered? The solution is transfer learning. Just as a teacher teaches us class 8 mathematics built upon concepts learnt in classes 1-7, we can use existing knowledge to suit our own needs. In this article, we will discuss transfer learning in its entirety and some common hacks that are required to increase the accuracy of outputs. Also, check out this computer vision essentials course and equip yourself with a hands-on set of skills.

We will take an experimental approach with data, hyper-parameters, and loss functions. Through this process of experimentation, we will discover the various techniques, concepts, and hacks that are helpful during transfer learning. We will work with the food-101 dataset, which has 1000 images per class and comprises 101 classes of food.

We performed a series of experiments at every step of training to identify the ideal loss and ideal hyper-parameters for better results. The role of experimentation is to find out what works best for the dataset at hand, since not all datasets have the same features and type of data. A common approach is to split the dataset into training, testing, and validation sets. The model is trained on the training set and then evaluated on the validation set to ensure overfitting/underfitting has not occurred. Only once we have a good score on both the training and validation sets do we expose our model to the test set. Thus, the validation set can be thought of as the part of the dataset used to find the optimal conditions for best performance.

Before we understand the parameters that need to be adjusted, let’s dive deep into transfer learning. Revise your concepts with Introduction to Transfer Learning .

  • Freeze Convolutional Base Model
  • Train selected top layers in the base model
  • Combination of steps a and b.

The convolutional base model refers to the original model architecture that we will use. It is a choice between using the entire model along with its weights, or partially freezing the model. In the first case, the initial weights are the model's pre-trained weights, and we fine-tune all the layers on our dataset. In the latter case, although the initial weights are still the model's pre-trained weights, the initial layers of the model are frozen. By freezing a layer, we mean that its weights are not updated during training. This keeps the number of trainable parameters small. We freeze the initial layers because they identify low-level features such as edges and corners, and these features are largely independent of the dataset.
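As a hedged sketch of the freezing idea (the article's own code listing is not reproduced here), the snippet below loads InceptionV3, the backbone chosen later in the article, with ImageNet weights and freezes its earlier layers; the cut-off index of 249 is illustrative, not the article's value.

```python
# Sketch: freeze the lower layers of a pre-trained base, fine-tune the rest.
from tensorflow.keras.applications import InceptionV3

base_model = InceptionV3(weights="imagenet", include_top=False,
                         input_shape=(96, 96, 3))  # image size used later in the article

for layer in base_model.layers[:249]:  # low-level, dataset-independent features
    layer.trainable = False
for layer in base_model.layers[249:]:  # top of the base stays trainable for fine-tuning
    layer.trainable = True
```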

  • Learning Rate
  • Model Architecture
  • Type of transfer learning
  • Optimisation technique
  • Regularisation 

We will consider a variety of experiments regarding the choice of optimiser, learning rate values, etc. We encourage readers to think of more ways to understand and implement them. The experiments that have been performed are as follows:

1. Choice of optimiser

  • SGD with momentum update

  • SGD with Nesterov Momentum update

2. Learning Rate Scheduling

  • Same learning rate
  • Polynomial Decay #works well initially
  • Cyclical Learning Rate # used this finally

3. Model Selection

  • Resnet50 – Tried, but took massive amounts of time per epoch, hence didn’t proceed further
  • InceptionV3 – Stuck with this model and decreased image size to 96*96*3

4. Transfer Learning Type

  • Train selected top layers in the base model
  • Combination of steps a and b. # This model worked well in increasing validation accuracy

5. Number of neurons and Dropout values

  • 128 – number of neurons + 0.5 – probability
  • 128 – number of neurons +0.25 – probability # Used this combination, as others increased the number of parameters massively.
  • 256 – number of neurons + 0.25 – probability
  • 256 – number of neurons + 0.5 – probability
  • 512 – number of neurons + 0.5 – probability
  • 512 – number of neurons + 0.25 – probability

6. GlobalAveragePooling2D vs GlobalMaxPooling2D

GlobalMaxPooling2D works better as a regularisation agent and also improves training accuracy when compared to GlobalAveragePooling2D. We compared the two pooling techniques to study the role of pooling as a regularisation agent; a sketch of the resulting classification head follows.
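The following is a hedged sketch of the head these experiments converge on: GlobalMaxPooling2D over the base features, 128 neurons with a 0.25 dropout, and a 101-way softmax for food-101. The base and layer choices mirror the text; this is not the article's exact listing.

```python
# Sketch: classification head on top of a pre-trained InceptionV3 base.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(96, 96, 3))

x = layers.GlobalMaxPooling2D()(base.output)   # the pooling choice that acted as a regulariser
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.25)(x)
outputs = layers.Dense(101, activation="softmax")(x)  # food-101 has 101 classes

model = models.Model(inputs=base.input, outputs=outputs)
```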

Before starting a project, we should come up with an outline of the project deliverables and the expected outcomes. Based on those conclusions, we list the logical steps needed to complete the task.

  • Define a model
  • Find ideal initial learning rate
  • Create a module for scheduling the learning rate

  • Augment the images

  • Apply the transformation(mean subtraction) for better fine-tuning
  • Test on a smaller set
  • Fit the model
  • Test the model on random images
  • Visualise the kernels to validate if the training has been successful.

We will begin coding right away. We suggest you open your text editor or IDE and start coding as you read the blog. You can download the dataset from the official website, which can be found via a simple Google search: Food-101 dataset.

In the lines 1-32, we have imported all the libraries that will be required.

In lines 33-37, we define the parameters that will be used frequently within the article.

Line 38 loads the Inception model with ImageNet weights to begin with; the include_top argument excludes the final layers, because the original model predicts 1000 classes and we only have 101.

Line 52 creates an ImageDataGenerator object, which is used to obtain images directly from a directory. It performs various operations on all the images in the directory mentioned. The operation used here is normalisation, specified by the argument rescale = 1.0/255.0. Augmentation is done because CNNs, while robust to translation, are not invariant to transformations such as rotation: if we rotate an image and send it to the network for prediction, the chances of misclassification are high, as the network hasn't seen such images during the training phase. Hence, augmentation leads to better generalisation in learning.

Line 53 and 54 similarly create ImageDataGenerator objects for loading images from test and validation directories, respectively. 

In lines 55-57, we specify the mean for the model, which is used for the pre-processing of images; mean subtraction ensures that the model learns better. In lines 58-61, we load the data into the respective variables. The next step is to find the ideal learning rate.
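Since the original listing is not reproduced above, here is a hedged reconstruction of the data pipeline just described; the augmentation settings, mean values (ImageNet RGB means rescaled to [0, 1]), directory names, and batch size are assumptions rather than the article's exact constants.

```python
# Sketch: directory-based generators with normalisation, augmentation and mean subtraction.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255.0,       # normalisation, as in the text
    featurewise_center=True,   # enables the mean subtraction set below
    rotation_range=25,         # light augmentation; exact settings are assumptions
    horizontal_flip=True,
)
val_datagen = ImageDataGenerator(rescale=1.0 / 255.0, featurewise_center=True)

mean = np.array([0.485, 0.456, 0.406], dtype="float32")  # RGB mean on the [0, 1] scale
train_datagen.mean = mean
val_datagen.mean = mean

train_gen = train_datagen.flow_from_directory(
    "food-101/train",          # hypothetical directory layout
    target_size=(96, 96), batch_size=64, class_mode="categorical")
val_gen = val_datagen.flow_from_directory(
    "food-101/val",
    target_size=(96, 96), batch_size=64, class_mode="categorical")
```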

Let’s find the initial learning rate 

Model checkpointing refers to saving the model after each round of training.

Early stopping is a technique to stop training if the decrease in loss value is negligible. We wait for a certain patience period, and then if the loss doesn’t decrease, we stop the training process.
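A hedged sketch of these two callbacks in Keras follows; the file name, monitored metric, and patience value are assumptions.

```python
# Sketch: checkpointing and early stopping callbacks.
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

callbacks = [
    ModelCheckpoint("food101_inception.h5",        # hypothetical output path
                    monitor="val_loss", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=5,  # the "patience period" from the text
                  restore_best_weights=True),
]
```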

The above snippet of code deals with the learning rate scheduling. Let’s talk about Learning Rate Scheduling:

Learning Rate Scheduling

Learning rate scheduling refers to making the learning rate adapt to the change in the loss values. Usually, the loss decreases until a certain epoch, after which it stagnates. This is because the learning rate at that point is comparatively large, so the optimisation cannot reach the global optimum and the learning rate needs to be decreased. This tuning of the learning rate is necessary to get the lowest error percentage.

We have experimented with three types of learning rate scheduling techniques:

  • Polynomial decay
  • Step decay
  • Cyclical learning rate scheduler

Polynomial decay, as the name suggests, decays the learning rate (step size) polynomially, while step decay reduces it uniformly in steps. A cyclical learning rate scheduler varies the learning rate between a minimum and a maximum value during training, which helps avoid getting stuck in local minima: cost functions are usually non-convex, and we want to reach the global minimum.
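Hedged sketches of the three strategies follow. PolynomialDecay is built into tf.keras and would be passed to the optimizer, while the step and cyclical (triangular) schedules below are hand-rolled approximations driven by a LearningRateScheduler callback; none of the constants are the article's exact values.

```python
# Sketch: polynomial decay, step decay and a triangular cyclical schedule.
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler

# Passed directly to an optimizer, e.g. SGD(learning_rate=poly_decay).
poly_decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=1e-2, decay_steps=10000,
    end_learning_rate=1e-4, power=2.0)

def step_decay(epoch, lr):
    base_lr, drop, epochs_per_drop = 1e-2, 0.5, 20
    return base_lr * (drop ** (epoch // epochs_per_drop))

def cyclical_lr(epoch, lr):
    min_lr, max_lr, cycle_len = 1e-5, 1e-2, 10
    cycle_pos = epoch % cycle_len
    scale = 1.0 - abs(2.0 * cycle_pos / cycle_len - 1.0)  # triangle wave in [0, 1]
    return min_lr + (max_lr - min_lr) * scale

step_cb = LearningRateScheduler(step_decay)
clr_cb = LearningRateScheduler(cyclical_lr)
```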

We perform the same in Lines 62-88. To find the initial learning rate, we have used Adrian Rosebrock’s module from his tutorial on learning rate scheduling. For further insights into the topic, we suggest going through his blog on the same. 

Sanity Checks:

Overfit a tiny subset of the data to make sure the model can fit it, and check that the loss after the first epoch is around -ln(1/n), the expected loss of a random classifier, as a safety metric. In this case n = 101, so the initial loss should be roughly 4.62.
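As a quick check of that number:

```python
import math

# Expected cross-entropy loss of an untrained classifier that assigns
# uniform probability 1/n to each of the n = 101 Food-101 classes
print(-math.log(1.0 / 101))   # ≈ 4.615
```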

Since the loss value is nearly zero for the validation set without any regularisation, the model is suitable for fitting to the larger dataset. Overfitting occurs in the latter case, which can be addressed by using dropout and regularisers in the final and penultimate layers.

As mentioned earlier, we freeze the first few layers to reduce the number of trainable parameters.

fit_generator refers to training the model by fitting it to the dataset supplied by the generators.
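Putting the pieces together, here is a sketch of freezing the pre-trained base (all of it, for simplicity), attaching a new classification head for the 101 classes, and training with the generators and callbacks sketched above; the head architecture and optimiser settings are assumptions:

```python
from tensorflow.keras import layers, models, optimizers

# Freeze the pre-trained convolutional base so its weights are not updated
for layer in base_model.layers:
    layer.trainable = False

# Attach a small classification head for the 101 Food-101 classes
x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs=base_model.input, outputs=outputs)

model.compile(optimizer=optimizers.SGD(learning_rate=INIT_LR, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Recent Keras versions accept generators in fit() directly;
# fit_generator is the older, now-deprecated equivalent
history = model.fit(train_gen,
                    validation_data=valid_gen,
                    epochs=MAX_EPOCHS,
                    callbacks=[checkpoint, early_stop, poly_schedule])
```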

In lines 110-130, we redefine the model, because this time we freeze the first few layers before proceeding with training. Lines 131-141 check whether the model is overfitting.

The figure shows that the training accuracy is high while the validation accuracy is low, so regularisation is needed to avoid overfitting. We apply dropout to address this.
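For example, dropout (and, optionally, L2 regularisation) can be added to the last layers of the head. The values below are illustrative, although the article later settles on 128 neurons with a dropout probability of 0.25:

```python
from tensorflow.keras import layers, models, regularizers

# Rebuild the head with dropout and L2 regularisation on the final layers
x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)
x = layers.Dropout(0.25)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax",
                       kernel_regularizer=regularizers.l2(1e-4))(x)
model = models.Model(inputs=base_model.input, outputs=outputs)
```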

Type 3 refers to the combination of both types of transfer learning: initially fine-tuning the entire network for a few epochs, and then freezing the top layers for the next N epochs.

Cyclical Learning Rate

During training, the validation loss did not decrease irrespective of the variation in the initial learning rate. The logical assumption is that the cost function had hit a local minimum; to escape it, we used a cyclical learning rate, which performed much better than before.

Type of Transfer Learning Used

  • Type 1: 58.07% accuracy after 180 epochs
  • Type 2: 58.62% accuracy after 100 epochs
  • Type 3: 58.05% accuracy after 150 epochs

Thus, Type 2 is the most suitable type of transfer learning for this problem.

Optimiser Used

  • Polynomial decay (worked well initially)

Model Selection

InceptionV3 – used this model, with the image size decreased to 96×96×3

Transfer Learning Type

Combining the Type 1 and Type 2 approaches to transfer learning increases the validation accuracy. The way to experiment with this is to train the model with Type 1 for 50 epochs and then re-train it with Type 2 transfer learning.
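A sketch of that experiment, assuming Type 1 means fine-tuning the whole network and Type 2 means training only the new head with the base frozen (as the description of Type 3 above suggests); the epoch counts and learning rates are illustrative:

```python
from tensorflow.keras import optimizers

# Phase 1 (Type 1): fine-tune the entire network with a small learning rate
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=valid_gen, epochs=50)

# Phase 2 (Type 2): freeze the pre-trained base and re-train only the head
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=valid_gen, epochs=50)
```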

Number of neurons and Dropout values

  • 128 neurons with a dropout probability of 0.25 – used this combination, as the others increased the number of parameters massively.

An additional experiment you can try is adding noise to the images during the data augmentation phase, to make the model robust to noise.

We suggest reading the entire article at least twice to get a thorough understanding of deep learning and computer vision, and of how they are implemented and used. We can go a step further and visualise the kernels to understand what is happening at a basic level: how are the networks learning? Kernels tend to be smooth when the network has learned the classification correctly, and noisy or blurry when it has learned it wrong. We suggest you figure out ways to visualise the kernels; it will add credibility and competence to your work.
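As a rough sketch of one way to do that, we can pull the kernels of the first convolutional layer out of the trained model and plot them; the layer lookup and grid shape assume the InceptionV3-based model built above, whose first convolution has 32 filters of size 3×3×3:

```python
import matplotlib.pyplot as plt

# Find the first convolutional layer and pull out its kernel tensor
first_conv = next(l for l in model.layers if "conv" in l.name.lower())
kernels = first_conv.get_weights()[0]            # shape: (3, 3, 3, 32)

# Normalise to [0, 1] so the filters can be displayed as RGB images
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 8, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[:, :, :, i])               # one 3x3x3 filter per panel
    ax.axis("off")
plt.show()
```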

Please go through the entire series once and then come back to this article; it will give you a head start in computer vision, and we hope it helps you understand and comprehend research papers in the field.

If you wish to learn more about transfer learning and other computer vision concepts, upskill with Great Learning's PG program in Artificial Intelligence and Machine Learning. If you want to study only machine learning concepts in a shorter course, join Great Learning's PG program in Machine Learning.


Towards Transfer Learning Techniques – BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study


The Internet of Things is a paradigm that interconnects smart devices through the internet to provide ubiquitous services to users. This paradigm, together with Web 2.0 platforms, generates countless amounts of textual data, so a significant challenge in this context is performing text classification automatically. State-of-the-art outcomes have recently been obtained by employing language models trained from scratch on corpora made up of online news to handle text classification better. A language model we can highlight is BERT (Bidirectional Encoder Representations from Transformers); DistilBERT is a smaller pre-trained general-purpose language representation model. In this context, through a case study, we propose performing the text classification task with the two previously mentioned models for two languages (English and Brazilian Portuguese) on different datasets. The results show that DistilBERT's training time for English and Brazilian Portuguese was about 45% faster than its larger counterpart; it was also 40% smaller and preserved about 96% of the language comprehension skills for balanced datasets.

Keywords: BERT; BERTimbau; DistilBERT; DistilBERTimbau; big data; pre-trained model; transformer-based machine learning.


A Cross-City Federated Transfer Learning Framework: A Case Study on Urban Region Profiling

Abstract: Data insufficiency problems (i.e., data missing and label scarcity) caused by inadequate services and infrastructures or imbalanced development levels of cities have seriously affected urban computing tasks in real scenarios. Prior transfer learning methods inspire an elegant solution to the data insufficiency, but are only concerned with one kind of insufficiency issue and fail to give consideration to both sides. In addition, most previous cross-city transfer methods overlook inter-city data privacy, which is a public concern in practical applications. To address the above challenging problems, we propose a novel Cross-city Federated Transfer Learning framework (CcFTL) to cope with the data insufficiency and privacy problems. Concretely, CcFTL transfers the relational knowledge from multiple rich-data source cities to the target city. Besides, the model parameters specific to the target task are first trained on the source data and then fine-tuned to the target city by parameter transfer. With our adaptation of federated training and homomorphic encryption settings, CcFTL can effectively deal with the data privacy problem among cities. We take urban region profiling as an application of smart cities and evaluate the proposed method with a real-world study. The experiments demonstrate the notable superiority of our framework over several competitive state-of-the-art methods.



