ISSN: 1735-188X


Volume 15, No 1, 2018

Strategies For Migrating Large, Mission-Critical Database Workloads To The Cloud

Sandeep Reddy Narani, Madan Mohan Tito Ayyalasomayajula, Sathishkumar Chintala

This comprehensive research paper explores strategies for migrating large, mission-critical database workloads to cloud environments. By examining various aspects of the migration process, including assessment and planning, data migration strategies, security considerations, performance optimization, and post-migration management, this study provides valuable insights into effective methodologies for organizations undertaking complex database migrations to cloud platforms. The paper analyzes current industry practices and emerging trends (Zhang et al., 2010), offering a holistic view of the challenges and opportunities presented by cloud database migration. Through an in-depth examination of cloud database offerings, migration strategies, and real-world case studies, this research aims to guide IT professionals and decision-makers in successfully transitioning their critical database workloads to the cloud while minimizing risks and maximizing benefits.
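For very large tables, one common way to limit risk during data migration is to copy rows in bounded batches in primary-key order and to record a checkpoint after each committed batch, so that an interrupted run can resume where it stopped. The sketch below illustrates that idea under stated assumptions and is not the paper's method: two in-memory SQLite databases stand in for the on-premises source and the cloud target, and the `orders` table, its columns, and the function `copy_in_batches` are hypothetical.

```python
import sqlite3

def copy_in_batches(src, dst, batch_size=1000, last_id=0):
    """Copy rows of the hypothetical `orders` table from src to dst in
    primary-key order, committing each batch. Returns the last copied id
    (the checkpoint) so that an interrupted run can resume from it."""
    while True:
        # Keyset pagination: fetch the next batch strictly after the checkpoint.
        rows = src.execute(
            "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return last_id
        with dst:  # commits the batch, or rolls it back on error
            dst.executemany(
                "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)", rows
            )
        last_id = rows[-1][0]  # checkpoint: highest id written so far

if __name__ == "__main__":
    schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    src.execute(schema)
    dst.execute(schema)
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, f"customer-{i}", i * 1.5) for i in range(1, 2501)],
    )
    src.commit()
    checkpoint = copy_in_batches(src, dst, batch_size=1000)
    count = dst.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    print(checkpoint, count)  # 2500 2500
```

Keyset pagination (`WHERE id > ?`) is used instead of `OFFSET` because it stays cheap on large tables and makes the checkpoint a single value.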

Pages: 298–317

Keywords: Cloud migration, database workloads, mission-critical systems, data integrity, performance optimization, hybrid cloud, multi-cloud, lift and shift, re-platforming, re-architecting, cloud database services, data security, disaster recovery

Metamodels to support database migration between heterogeneous data stores

Index terms: Information systems > Data management systems > Database design and models

Recommendations

Journey of Database Migration from RDBMS to NoSQL Data Stores

Migration is a complex process involving many challenges while migrating from an existing system to a new one. Database migration involves schema transformation, migration of data, complex query support, and indexing. This paper presents a) ...

Migration of Relational Database to Document-Oriented Database: Structure Denormalization and Data Transformation

Relational databases remain the leading data storage technology. Nevertheless, many companies want to reduce operating expenses, to make scalable applications that use cloud computing technologies. Use of NoSQL database is one of the possible solutions, ...

An Approach to Heterogeneous Database Migration

The core problems of database migration are the data integrity, data accuracy and business continuity. We discussed these problems during heterogeneous database migration in this article. We designed and implemented a migration project for Tsinghua ...
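The second recommendation above concerns structure denormalization when moving from a relational to a document-oriented database. As a minimal sketch of that general idea (not the cited paper's algorithm), one-to-many child rows can be embedded into their parent row to form one document per parent; the `customers` and `orders` data and the `customer_id` foreign key below are hypothetical.

```python
import json

def denormalize(parents, children, fk, key="id", field="items"):
    """Embed child rows into their parent rows, producing one document per
    parent. `fk` names the child column that references the parent's `key`."""
    docs = {p[key]: {**p, field: []} for p in parents}
    for child in children:
        parent_doc = docs.get(child[fk])
        if parent_doc is None:
            # An orphan row signals a referential-integrity problem in the source.
            raise ValueError(f"orphan row, no parent with {key}={child[fk]!r}")
        # Drop the foreign key: the nesting now expresses the relationship.
        parent_doc[field].append({k: v for k, v in child.items() if k != fk})
    return list(docs.values())

customers = [{"id": 1, "name": "Meier"}, {"id": 2, "name": "Schulz"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 12.5},
    {"order_id": 12, "customer_id": 2, "total": 7.0},
]
docs = denormalize(customers, orders, fk="customer_id", field="orders")
print(json.dumps(docs[0]))
# {"id": 1, "name": "Meier", "orders": [{"order_id": 10, "total": 25.0}, {"order_id": 11, "total": 12.5}]}
```

Raising on orphan rows, rather than silently dropping them, surfaces integrity problems before the data reach the target store.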

Information

Conference chairs: Karlsruhe Institute of Technology, Germany; University of Montréal, Canada
Sponsor: SIGSOFT: ACM Special Interest Group on Software Engineering
In cooperation with: University of Montreal
Publisher: Association for Computing Machinery, New York, NY, United States

Author tags: data migration, database migration, schema migration
Article type: research article

Funding Sources

  • Deutsche Forschungsgemeinschaft

Article metrics

  • 0 Total Citations
  • 149 Total Downloads
  • Downloads (Last 12 months) 44
  • Downloads (Last 6 weeks) 1



Without Data Quality, There Is No Data Migration

1. Introduction

2. Concept of Data Migration

2.1. Definition of Data Migration

2.2. Requirements

2.3. Goals of Data Migration

  • Analyze and clean up the existing data and documents (by the project and the core organization),
  • Carry out correct automated, semi-automated, and manual migrations of the relevant data and documents, including linking the business objects with the documents,
  • Understand the migration and validate the results obtained, while observing the data protection requirements.
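Validating the results can be partly automated by reconciling source and target record by record. The sketch below assumes both sides can be read into dictionaries keyed by a business key; it reports keys missing from the target, unexpected keys in the target, and records whose contents differ, compared through a SHA-256 digest of a canonical JSON serialization. The function names and the record layout are hypothetical, not taken from the article.

```python
import hashlib
import json

def record_digest(record):
    """Hash a record's canonical JSON form (sorted keys) so that equal
    contents give equal digests regardless of field order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def reconcile(source, target):
    """Compare two {key: record} mappings and report the differences."""
    src_keys, tgt_keys = set(source), set(target)
    changed = sorted(
        k for k in src_keys & tgt_keys
        if record_digest(source[k]) != record_digest(target[k])
    )
    return {
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
        "changed": changed,
    }

source = {
    "C1": {"name": "Meier", "city": "Berlin"},
    "C2": {"name": "Schulz", "city": "Bonn"},
    "C3": {"name": "Weber", "city": "Jena"},
}
target = {
    "C1": {"city": "Berlin", "name": "Meier"},   # same content, different field order
    "C2": {"name": "Schulz", "city": "Köln"},    # value changed during migration
    "C4": {"name": "Fischer", "city": "Kiel"},   # not present in the source
}
print(reconcile(source, target))
# {'missing_in_target': ['C3'], 'unexpected_in_target': ['C4'], 'changed': ['C2']}
```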

2.4. Types of Data Migration

2.5. Strategies of Data Migration

  • The new system offers full functionality but is available only to a limited group of users, while the new and old systems run in parallel. The group of users is expanded at each stage. The problem here is the parallel use of the old and new systems and, in particular, maintaining data consistency between them.
  • Another approach is to provide partial functionality to all users, who work in parallel on the new and old systems. With each step, the functionality of the new system is expanded until the old system has been completely replaced.
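The first strategy, enlarging the user group stage by stage while the old and new systems run in parallel, is often implemented by routing each user deterministically to one system. A minimal sketch, assuming users have stable string IDs: the ID is hashed into one of 100 buckets, and the user goes to the new system when the bucket is below the current rollout percentage. The function names and stage percentages are illustrative assumptions, not part of the article.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def uses_new_system(user_id: str, rollout_percent: int) -> bool:
    """True if the user belongs to the group already moved to the new system."""
    return bucket(user_id) < rollout_percent

users = [f"user-{i}" for i in range(1000)]
for stage in (10, 50, 100):  # each stage enlarges the user group
    migrated = sum(uses_new_system(u, stage) for u in users)
    print(f"stage {stage}%: {migrated} of {len(users)} users on the new system")
```

Because a user's bucket never changes, raising the percentage only adds users to the new system and never moves anyone back. The sketch does not address the data consistency problem of parallel operation, which still has to be handled separately.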

3. Data Quality and Its Impact on Data Migration

  • Consolidation and quality improvement take place before the project that introduces the new software. Separating these two projects is an important success factor.
  • As part of the analysis of the existing data landscape, requirements for the new system are identified, which feed into the selection or the initial configuration of this system.
  • Data consolidation and data quality improvement form a project whose duration cannot be precisely planned. Separating it from the actual migration project makes the migration easier to plan and more likely to succeed.
  • Since the time pressure of the implementation project is largely removed, the data can be prepared more thoroughly and brought to a significantly higher quality level.
  • Data quality methods and tools rank first the errors with the greatest impact on the overall result, so the time available for error correction can be used more efficiently.
  • The time span between retiring the old documentation and using the new system is minimized. The target system is filled from one data source, and errors caused by differing versions or broken links have been made transparent and cleared up in advance.
  • Good data quality in data migration projects saves direct and indirect costs (e.g., wasted budgets, costs of wrong decisions, and lost sales).
  • Data collection: Data collection is often the greatest source of data quality errors. This includes incorrect use of input masks, both by internal employees and by customers who enter information in the wrong input fields (e.g., confusing the first and last name fields). Typing errors, phonetically similar spellings (e.g., ai and ei in Maier or Meier), and inadequate questioning by service employees are also potential sources of error. Many of these errors occur only because of poor input-mask design or missing mandatory-field and plausibility checks. The import of inadequate external data, such as purchased address or customer data, can also degrade data quality.
  • Processes: Processes cause poor data quality if they are incorrect or incomplete (e.g., incorrect processing of existing data or missing check routines).
  • Data architecture: Data architecture describes the data processing technologies (e.g., various application software) and the data flow between them. Many of these programs require their own special data representation, such as a particular formatting or order of input and output arguments. Data therefore often have to be converted, which can lead to inconsistencies and thus poor data quality.
  • Data definitions: For large companies to work effectively, there must be a common understanding of frequently used terms. For example, there is often no uniform scheme for calculating sales, or there are differing views on which data should be used. These heterogeneous interpretations can lead to inconsistent data descriptions, table definitions, and field formats.
  • Use of data: Errors in application programs can give users the impression of poor data quality even when the underlying operational system provides almost perfect data in terms of content. Such an impression can also arise when the user, or the developer of the application program, misinterprets the data from the source system. A supposed correction based on such a misinterpretation could, contrary to its intention, introduce new errors into the information system. To prevent this, fixed rules should be defined and followed for the correction process, for example, that data corrections are always made in the source systems and not in the application programs.
  • Data expiration: Data expiration occurs naturally in some areas, as certain data lose their validity after a certain period. This mainly affects address and telephone data, but bank details, price lists, and many other kinds of data also deteriorate over time, which limits data quality.
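Several of the error sources above, typing errors and phonetically similar spellings such as Maier and Meier in particular, tend to surface as near-duplicate records. One way to find candidates before migration is pairwise string similarity. The sketch uses Python's standard `difflib.SequenceMatcher`; the 0.8 threshold and the sample names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(name.lower().split())

def similar_pairs(names, threshold=0.8):
    """Return pairs of names whose similarity ratio reaches the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

names = ["Maier", "Meier", "Schulz", "Schultz", "Weber"]
print(similar_pairs(names))
# [('Maier', 'Meier', 0.8), ('Schulz', 'Schultz', 0.92)]
```

Pairwise comparison grows quadratically with the number of records, so on large datasets it is usually restricted to records that already share some attribute (for example, a postal code) before the similarity check is applied. Flagged pairs are candidates for review, not automatic merges.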

4. Relationship between Data Quality and Data Migration

  • Have you already worked on a data migration project? Or are you currently on a data migration project?
  • How important is the data quality in the context of the data migration project?
  • Was it a goal to improve and increase the data quality in the course of the data migration?
  • Which data quality criteria [correctness, completeness, consistency and timeliness] are taken into account in the data migration project? How do you rate the degree of fulfillment of the following criteria in relation to the quality of the data at the time or after the data migration project is completed?
  • Which data quality criteria do you take into account as part of the data migration project in order to control and improve the quality?
  • What methods and tools do you use in the data migration project to clean up the dirty data?
  • How do you rate the degree of fulfillment of the following success criteria [project budget, timing, top management, communication and involvement with the end user, training of employees, and employee satisfaction] with regard to the data migration project? Which success criteria do you also consider in the context of the data migration project?
  • Correctness: The data must match the reality.
  • Completeness: Attributes must contain all the necessary data.
  • Consistency: A data record must not have any contradictions in itself or with other data records.
  • Timeliness: All data records must correspond to the current state of the depicted reality.
  • A migration project always has a budget, which covers all the cost-relevant resources necessary to achieve the goals.
  • A migration project always has an end and is often carried out under great time pressure.
  • An efficient and successful migration is difficult without the support of top management. Migration often changes processes and behavior, so top management must commit to the change and be ready to take risks. If the decision to migrate to a new system is not made by top management, this does little to motivate the people involved. Replacing a system is highly complex, and its advantages often become apparent only after the new system has been introduced.
  • Open communication with and involvement of end users is important right from the start, because the new system must be accepted in order to succeed. The decision to replace the old system may not be easy for everyone to understand, and end users need to understand why it is being replaced so that they do not develop an aversion to the new system. The involvement of end users in the migration project is therefore an important factor.
  • For end users to be able to use the new software from the start and to feel confident doing so, employees must be trained on the new system at an early stage.
  • After the introduction of a new system, employee satisfaction can serve as a criterion for the success of the migration project, since the goal is to give employees a system that makes their work easier.
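The data quality criteria defined above can be turned into simple ratios measured before and after migration. The sketch below is one possible operationalization, not the survey instrument used in the article: completeness is the share of required fields that are filled, consistency is the share of records passing a hypothetical cross-field rule (a German postal code must have five digits), and timeliness is the share of records verified within a hypothetical 365-day window. Correctness is left out because checking data against reality needs an external reference source. Field names and thresholds are assumptions.

```python
from datetime import date, timedelta

REQUIRED = ("name", "street", "postal_code", "country")  # hypothetical schema

def completeness(records):
    """Share of required fields that are non-empty (assumes a non-empty list)."""
    filled = sum(1 for r in records for f in REQUIRED if r.get(f))
    return filled / (len(records) * len(REQUIRED))

def consistency(records):
    """Share of records satisfying a hypothetical cross-field rule."""
    def ok(r):
        if r.get("country") == "DE":
            code = r.get("postal_code") or ""
            return len(code) == 5 and code.isdigit()
        return True
    return sum(ok(r) for r in records) / len(records)

def timeliness(records, today, max_age_days=365):
    """Share of records whose last verification is recent enough."""
    limit = today - timedelta(days=max_age_days)
    return sum(r["verified_on"] >= limit for r in records) / len(records)

records = [
    {"name": "Meier", "street": "Hauptstr. 1", "postal_code": "10115",
     "country": "DE", "verified_on": date(2021, 3, 1)},
    {"name": "Schulz", "street": "", "postal_code": "5300",
     "country": "DE", "verified_on": date(2018, 6, 1)},
]
today = date(2021, 5, 17)
print(completeness(records), consistency(records), timeliness(records, today))
# 0.875 0.5 0.5
```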

5. Conclusions


Indicator | Cronbach's Alpha | Factor Loading | Eigenvalue | % of Variance
Correctness | 0.9105 | 0.7228 | 6.997 | 36.751
Completeness | 0.9244 | 0.7210 | |
Consistency | 0.9160 | 0.7111 | |
Timeliness | 0.9090 | 0.7129 | |
Project Budget | 0.9174 | 0.5853 | 11.012 | 63.249
Timing | 0.9064 | 0.8190 | |
Top Management | 0.9183 | 0.8480 | |
Communication and Engagement of End Users | 0.9152 | 0.7819 | |
Training of Employees | 0.9111 | 0.8100 | |
Employee Satisfaction | 0.9044 | 0.5211 | |
Total Cronbach's Alpha: 0.9120; Kaiser–Meyer–Olkin criterion (KMO): 1.000
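The table reports Cronbach's alpha per indicator and for the whole scale. For reference, for a scale of k items with item variances σᵢ² and total-score variance σ_X², alpha = k/(k − 1) · (1 − Σσᵢ²/σ_X²). The sketch computes it from a respondents × items score matrix; the responses are invented for illustration and do not reproduce the article's data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert answers: 4 respondents x 3 items.
responses = [
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 3],
]
print(round(cronbach_alpha(responses), 3))  # 0.916
```

Population variances are used throughout; using sample variances instead gives the same alpha, because the n/(n − 1) factor cancels in the ratio.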
How to cite:

Azeroual, O.; Jha, M. Without Data Quality, There Is No Data Migration. Big Data Cogn. Comput. 2021 , 5 , 24. https://doi.org/10.3390/bdcc5020024


Lessons learned: on the challenges of migrating a research data repository from a research institution to a university library

  • Original Paper
  • Published: 20 September 2019
  • Volume 55, pages 191–207 (2021)

  • Thorsten Trippel (ORCID: orcid.org/0000-0002-7211-7393)
  • Claus Zinn (ORCID: orcid.org/0000-0002-6067-5451)

431 accesses, 2 citations

The transfer of research data management from one institution to another infrastructure partner is far from trivial, but it can be required, for instance, when an institution faces reorganization or closure. In a case study, we describe the migration of all research data, identify the challenges we encountered, and discuss how we addressed them. The case study shows that moving research data management to another institution is a feasible but potentially costly undertaking. Being able to demonstrate that research data migration is feasible supports the stance of data archives that users can expect high levels of trust and reliability with respect to data safety and sustainability.


Similar content being viewed by others:

  • How to Design a Research Data Management Platform? Technical, Organizational and Individual Perspectives and Their Relations
  • Advancing research data publishing practices for the social sciences: from archive activity to empowering researchers
  • A metadata-driven approach to data repository design

As a starting point, depositing agreements can draw upon templates prepared by infrastructure providers. The legal evaluation of an instantiated template, however, often depends on the specific instantiation, that is, on criteria such as the character of the data, the legal status of depositor and depositee, and any third parties involved. To avoid any form of liability, templates are rarely shared across institutions. If templates are shared, it is with an explicit disclaimer ("do not use it as is") and the strong suggestion to seek independent professional legal advice.

It is possible that the new archive may move the resources at a much later point in time to yet another location, so the capability to manipulate PID-URL mappings should be transferred from the giving archive to the receiving archive.

In the Fedora repository, the deletion of a digital object yields a “tombstone”. The PID associated with this object then points to a tombstone notifying users that the resource has been deleted. Note that tombstones still require migration, meaning that PIDs still need to resolve to inform users that their associated digital objects have been removed.

Usually, researchers working in the same organization that also hosts the archive do not need depositing agreements.

See https://wiki.duraspace.org/display/FF/Training+-+Migrating+from+Fedora+3+to+Fedora+4 .

The ontology has, for instance, the concept 'MediaObject', which can be described with properties such as 'encodingFormat', 'bitrate', and 'duration', among many others.
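The notes above imply two requirements for moving persistent identifiers: the ability to update PID-to-URL mappings must pass to the receiving archive, and PIDs of deleted objects must keep resolving to a tombstone. A minimal sketch of the remapping step, using plain dictionaries rather than the actual API of the Handle system; the PIDs, URL layout, and function name are hypothetical.

```python
def remap_pids(pid_to_url, old_prefix, new_prefix, deleted, tombstone_url):
    """Build the PID-to-URL table for the receiving archive.

    Every PID is kept, so no identifier stops resolving: live objects are
    rewritten to the new base URL, deleted objects point to a tombstone page.
    """
    new_table = {}
    for pid, url in pid_to_url.items():
        if pid in deleted:
            new_table[pid] = f"{tombstone_url}?pid={pid}"
        elif url.startswith(old_prefix):
            new_table[pid] = new_prefix + url[len(old_prefix):]
        else:
            raise ValueError(f"{pid}: URL {url} is outside the migrated archive")
    return new_table

old_table = {
    "example/0001": "https://archive.old.example/objects/0001",
    "example/0002": "https://archive.old.example/objects/0002",
}
new_table = remap_pids(
    old_table,
    old_prefix="https://archive.old.example/",
    new_prefix="https://library.new.example/repository/",
    deleted={"example/0002"},
    tombstone_url="https://library.new.example/tombstone",
)
for pid, url in new_table.items():
    print(pid, "->", url)
# example/0001 -> https://library.new.example/repository/objects/0001
# example/0002 -> https://library.new.example/tombstone?pid=example/0002
```

Failing on URLs outside the migrated archive, instead of copying them unchanged, makes unexpected mappings visible before the resolver is switched over.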


Acknowledgements

This work has been supported by the German Research Foundation (DFG reference no. 88614379), and the SFB 833 data management project INF (DFG reference no. 75650358). The data centre cooperates closely with the CLARIN-D centre in Tübingen which is funded by the German Federal Ministry of Education and Research (BMBF).

Author information

Thorsten Trippel & Claus Zinn, University of Tübingen, Wilhelmstrasse 19, 72074 Tübingen, Germany

Corresponding author: Claus Zinn


About this article

Trippel, T., Zinn, C. Lessons learned: on the challenges of migrating a research data repository from a research institution to a university library. Lang Resources & Evaluation 55 , 191–207 (2021). https://doi.org/10.1007/s10579-019-09474-4


Published : 20 September 2019

Issue Date : March 2021

DOI : https://doi.org/10.1007/s10579-019-09474-4


  • Research data management
  • Data repositories
  • Data migration

Research Article

Database Migration on Premises to AWS RDS

  • @ARTICLE{10.4108/eai.11-4-2018.154463, author={Lakshmi Narasimhan G.}, title={Database Migration on Premises to AWS RDS}, journal={EAI Endorsed Transactions on Cloud Systems}, volume={3}, number={e11}, publisher={EAI}, journal_a={CS}, year={2018}, month={4}, keywords={Data migration; AWS RDS, cloud database, information accountability, database activity monitoring}, doi={10.4108/eai.11-4-2018.154463} }
  • Lakshmi Narasimhan G. Year: 2018 Database Migration on Premises to AWS RDS CS EAI DOI: 10.4108/eai.11-4-2018.154463
  • 1: M.E CSE II YEAR, Department of Computer Science and Engineering, Sri Ramanujar Engineering College, Kolapakkam, Chennai 600127

Traditional relational databases have been used in the Information Technology industry for the past four decades. In recent years, the IT industry has gone through a phenomenal transformation in its commercial applications. Applications that used to run on a single server in an organization's IT infrastructure have been replaced by, or migrated to, e-apps, and dedicated storage has been replaced with system storage. The pay-per-use model, flexibility, and lower cost are the main reasons cloud computing has become a reality. Cloud databases such as Amazon SimpleDB and Amazon RDS are becoming more familiar to practitioners because they have exposed the limitations of current relational databases in usability, flexibility, and provisioning. Cloud databases are now considered a solution for programmers, designers, and architects who need to store their applications' data in a backend that is scalable and highly available when required. These Database-as-a-Service (DBaaS) offerings are cloud-based data storage services that fall into two main categories: services that support conventional relational databases (RDB) (e.g., Amazon RDS, Google Cloud SQL, Microsoft Azure), and key/value pair data storage services (e.g., Amazon SimpleDB, Google Datastore), which are also known as NoSQL databases [Harrison John Bhatti and Babak Bashari Rad 2017]. In this paper, we analyze and perform one such migration, from on-premises to AWS RDS, to show how cloud migration can help users with performance, cost, and scalability.
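The abstract does not describe the migration procedure itself. One step common to such migrations is copying a table in batches and then verifying that source and target agree. The minimal sketch below shows that step. Two in-memory SQLite databases stand in for the on-premises source and the RDS target; this is an assumption, and a real migration would use each engine's own driver or a managed migration tool.

```python
import hashlib
import sqlite3


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, batch_size: int = 1000) -> None:
    """Copy every row of `table` from src to dst in fixed-size batches.

    Assumes the target schema already exists. The table name is interpolated
    into SQL, so it must come from trusted configuration, not user input.
    """
    cursor = src.execute(f"SELECT * FROM {table}")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        placeholders = ",".join("?" * len(batch[0]))  # e.g. "?,?" for 2 columns
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
    dst.commit()


def table_fingerprint(conn: sqlite3.Connection, table: str, key: str) -> tuple[int, str]:
    """Return (row count, SHA-256 over all rows ordered by `key`)."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()
```

If the fingerprints on both sides match after `copy_table`, the data probably arrived intact. Across two different engines, the values would first need to be normalized (types, encodings, timestamp formats) before hashing, because the two drivers' row representations can differ.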

Copyright © 2018 Lakshmi Narasimhan G., licensed to EAI. This is an open-access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.



Migrating a research data warehouse to a public cloud: challenges and opportunities


Michael G Kahn, Joyce Y Mui, Michael J Ames, Anoop K Yamsani, Nikita Pozdeyev, Nicholas Rafaels, Ian M Brooks, Migrating a research data warehouse to a public cloud: challenges and opportunities, Journal of the American Medical Informatics Association, Volume 29, Issue 4, April 2022, Pages 592–600, https://doi.org/10.1093/jamia/ocab278


Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, complex data-driven discoveries. The computing and storage needs of these research environments may quickly exceed the capacity of on-premises systems. New RDWs are migrating to cloud platforms for the scalability and flexibility needed to meet these challenges. We describe our experience in migrating a multi-institutional RDW to a public cloud.

This study is descriptive. Primary materials included internal and public presentations before and after the transition, analysis documents, and actual billing records. Findings were aggregated into topical categories.

Eight categories of migration issues were identified. Unanticipated challenges included legacy system limitations; designing network, computing, and storage architectures that realize performance and cost benefits in the face of hyper-innovation; complex security reviews and approvals; and limited cloud consulting expertise.

Cloud architectures enable previously unavailable capabilities, but numerous pitfalls can prevent an organization from realizing the full benefits of a cloud environment. Rapid changes in cloud capabilities can quickly render existing architectures and associated institutional policies obsolete. Touchpoints with on-premises networks and systems can add unforeseen complexity. Governance, resource management, and cost oversight are critical to allow rapid innovation while minimizing wasted resources and unnecessary costs.

Migrating our RDW to the cloud has enabled capabilities and innovations that would not have been possible with an on-premises environment. Notwithstanding the challenges of managing cloud resources, the resulting RDW capabilities have been highly positive to our institution, research community, and partners.

Data-intensive programs in personalized medicine, learning health systems, and data-driven research have driven explosive growth in clinical research databases. Research data warehouses (RDWs) now store data from electronic health records; clinical images, videos, and physiological signals; genomic panels expanding to whole-exome/whole-genome sequences; and patient-generated data from mobile apps, home monitors, and wearable devices. 1 , 2 Many RDWs also integrate nonclinical data such as social media postings and public data sets (census, environmental, traffic, crime). 3–8 The volume of relevant electronic data and the computational requirements to perform advanced analytics using these data easily overwhelm the computing resources of large and small research organizations. Many healthcare organizations are investigating migrating research computing systems from on-premises, locally managed environments to public clouds. We describe the University of Colorado Anschutz Medical Campus’s (CU-AMC) experience implementing a large RDW combining administrative, clinical, genomic, and population-level data from 4 organizations plus commercial, governmental, and public third-party data sources into Google Cloud Platform (GCP). Public cloud providers offer different, rapidly evolving technologies; however, lessons learned from our implementation should apply to organizations considering transitioning to any public cloud provider.

In August 2013, CU-AMC established a new research data warehouse called Health Data Compass (HDC: www.healthdatacompass.org ) in partnership with 2 affiliated health systems and an independent faculty practice plan. After an extensive market search, in July 2014, HDC began implementing an on-premises, vendor-supported, healthcare-specific hardware and software stack that included a specialized database engine and extraction-transform-load (ETL) pipeline. Full-scale go-live commenced March 2015.

By summer 2015, despite being just over 1 year into implementation and 3 months into deployment, substantial HDC personnel time and consulting costs were being consumed reacting to unexpected system failures due to computational and storage constraints. Frequent upgrades and patches would take the entire HDC environment offline for more than 24 h. Loading data from hospital sources and running the vendor-provided master person index also overloaded on-premises resources, causing frequent delays in loading new data. In addition, HDC faced unplanned costs to implement redundant hardware and software to support system reliability and disaster recovery. Because of these challenges, HDC initiated 2 formal pilot studies using GCP from April through October 2016. Study 1 targeted high-throughput data processing; study 2 focused on computational scalability. Both pilots also provided insights into technical effort, data security, regulatory compliance, and estimated immediate and long-term costs. Based on these findings, in November 2016, the HDC executive sponsors approved reimplementing the HDC environment within GCP. The transition to a cloud-only infrastructure was completed by February 2017. In late 2020, CU-AMC’s on-premises high-performance compute cluster (HPCC), used for large-scale genomic analyses, reached end-of-life. All next-generation bioinformatics computing and storage needs were migrated to Compass GCP.

Much literature, mostly from marketing or consulting sources, highlights the benefits of cloud-based infrastructures. 9–11 Generalized guidance for on-premises to in-cloud transitions is harder to find. Given the nascent state of clinical data warehousing in the cloud in late 2016, there was minimal literature and hands-on experience with cloud implementations. Our journey as early cloud adopters provides useful insight into developing a cloud healthcare ecosystem, with emphasis on the additional requirements of a clinical research environment.

We present the key objectives that HDC articulated at the beginning of the migration from an existing on-premises RDW to a de novo cloud-based reimplementation. We share insights that may be useful to others considering a cloud-based research data warehouse. We also provide usage and cost metrics and describe examples where design decisions can significantly impact the overall costs of a cloud-based deployment.

This is a descriptive study. Primary materials date from early 2016 to early 2021, including internal and public presentations before and after the transition, analysis documents, and historical billing records. Costs were aggregated over time and by GCP service. Not included were internal personnel and external consulting costs, although staffing considerations are discussed. Findings were aggregated into topical categories such as networking and security, computation, storage, and staffing. Key performance indicators were generated in June 2021.

Figure  1 is the graphic that summarized the findings from the 2016 GCP pilot studies. Findings were separated into “met expectations,” “lower than expectations,” and “exceeded expectations.” Assessments were qualitative except for financial projections. Through the intervening 5 years, these 2016 findings have been re-confirmed. Additional findings have emerged as HDC’s size and functionality expanded from pilot to enterprise-scale.

Key findings from 2016 pilot studies comparing Google Cloud Platform with existing on-premises systems as presented to nontechnical executive sponsors. Superlatives were used to emphasize particularly distinctive findings that supported the migration proposal.


Figure  2 (top) is the high-level description of HDC’s current capabilities as presented to nontechnical executive audiences. Figure  2 (bottom) is a technical overview that illustrates data flows, network interfaces, and key GCP technologies currently used.

Top, The executive view of Health Data Compass highlighting data inputs, outputs and key GCP technologies for nontechnical audiences. Bottom, Technical view of data flows, network boundaries, and internal GCP technologies used in the current Health Data Compass research data warehouse. Google Cloud icon labels available at https://docs.google.com/presentation/d/1aGOTpNdCoO4GXZ2es38ZFO5qPGEAjTtDSVeHaDpwsas/edit#slide=id.g5e923c6224_190_56 . Abbreviations: APCD: Colorado All Payers Claims Database; CDPHE: State death registry; GCP: Google Cloud Platform; Melissa: Melissa Inc.


In the lower right corner of Figure  2 is an application created by HDC named Eureka. Eureka is a secure, scalable cloud computing and storage platform designed to enable advanced analytics on large or sensitive data sets without data leaving HDC’s secure cloud environment. Eureka instances can be created with a wide range of CPUs/TPUs, RAM, and persistent storage. Unlike standard cloud virtual machines (VMs), Eureka images have strict access controls designed to prevent data egress while allowing restricted access to core Internet software libraries and repositories. Extensive logging and auditing controls are embedded in the Eureka image. More details about Eureka are available at https://www.healthdatacompass.org/cloud-analytics-infrastructure .

In August 2020, HDC’s parent organization, the Colorado Center for Personalized Medicine (CCPM), established the Translational Informatics Services (TIS). TIS provides computational services for large-scale genomic data, processes genotypes and genomic data into data sets useful for research and clinical use, implements standard and one-off bioinformatics pipelines, and supports partnerships between academia and industry. TIS migrated to a fully cloud-native infrastructure using HDC’s secure environment to store large files with raw and processed genetic data, accommodate diverse file formats, and leverage on-demand HPCCs to perform genome-wide association studies (GWAS), phenome-wide association studies (PheWAS), and other analyses. TIS core cloud components and data flows are shown in Figure  3 .

Data flows and key Google Cloud Platform (GCP) technologies used by the Translational Informatics Service (TIS). Although TIS uses fewer GCP technologies, TIS deploys more “forward-facing” (App Engine GUI, R Studio), high-performance computing (Eureka HPC), and cloud storage resources than does the RDW.


Table  1 lists common RDW key performance measures illustrating the magnitude of data flows into and within HDC, current storage, data sources, and data requests/data delivery volumes. Also included are row counts from key clinical tables.

Health Data Compass key performance indicators as of June 30, 2021

  • Tables: 790
  • Storage, clinical: 16 TB
  • Storage, genomic: 55 TB
  • Extraction-transform-load jobs: 2000+
  • Data sources: 6 (3 internal; 3 external)
  • Unique persons: 7.3M
  • Visits (all types): 51M
  • Conditions/Diagnoses (all types): 171M
  • Medications (ordered, administered, dispensed): 240M
  • Measurements (laboratory test): 1.3B
  • Observations (includes flowsheets): 6.6B
  • Clinical notes (all types): 210M
  • Custom data sets delivered: 1286
  • Custom data marts/registries (local, national): 15
  • End-user applications: 9

Figure  4 (top) displays growth in HDC’s total spend across all GCP products from July 2017 to March 2021. Figure  4 (center) is a cost breakdown by GCP services, and Figure  4 (bottom) illustrates the relative spend by GCP service from January to March 2021.

Top, Growth in Google Cloud Platform (GCP) total spend across all GCP services from July 2017. Middle, Growth of GCP monthly costs by specific GCP service October 2020–March 2021. Bottom, Proportion of charges across GCP services January–March 2021.


It is daunting to move complex data flows, computations, and applications that support a wide range of translational research to any new environment. Differences in features and cost structures between on-premises and cloud infrastructures allow for unique opportunities and unexpected pitfalls. A simple “lift-and-shift” model that replicates on-premises hardware and software directly to cloud-hosted VMs may be most straightforward and comfortable for current teams to execute. But overlooking the cloud’s capability to instantly assign, alter, or release storage, computing, and networking resources or to provide entirely new services via simple application programming interface (API) changes can result in missed opportunities for cost savings or new innovations. Conversely, overlooking how new cloud products integrate into the existing cloud architecture or how usage charges accrue can quickly result in wasted resources, unexpected costs, unanticipated security and compliance risks, or conflicts with security controls, policies, or procedures.

Table  2 categorizes key areas of discovery during our RDW migration to GCP (details follow). Some issues are not unique to cloud-based implementations but are accentuated by the hyperdynamic technical advances in the public cloud marketplace. Other issues reflect platform limitations that existed when HDC architectural decisions were made. Given the continuous expansion of cloud capabilities, RDW teams responsible for cloud implementations must create processes and policies that anticipate a state of ongoing architectural and technical redesign while simultaneously supporting a large, complex, and heavily used operational RDW.

Categories of underappreciated challenges that emerged during migration from on-premises to cloud data warehouse

  • Networking/Network security
  • Data engineering: performance mismatch between source and cloud-based environments
  • Computation
  • Storage
  • Secure analytics
  • Sandboxes/Public data
  • Innovation/Consulting services
  • Costs/utilization

Networking and network security

Cloud tools and infrastructure listed by GCP as HIPAA compliant meet or exceed HIPAA security and privacy requirements. Institutional on-premises network security manages access control, threat detection, real-time alerting, compliance, and auditing. Existing on-premises security configurations have evolved over many years into deeply embedded infrastructure with approved policies tied to auditing and compliance procedures. HDC significantly underestimated the work necessary to translate the on-premises security environment (firewall, intrusion detection, logging, monitoring, alerting) using unfamiliar cloud-native security tools and the effort to integrate the new GCP security capabilities with the deployed institutional network design and security tools.

The security officers of our stakeholder institutions agreed upon NIST 800-53a as our target security compliance framework, a decision that significantly affected costs, resources, and timelines. 12 As early adopters, Compass faced significant concerns about institutional risks associated with large-scale, fully identified patient data in the cloud. Because of internal experience with NIST 800-53a from participation in the National Children’s Study, the decision to implement NIST 800-53a controls helped accelerate acceptance. However, NIST 800-53a is a complicated, costly compliance framework to both implement and maintain, and it is not strictly required to achieve sufficient technical security for HIPAA compliance. Specific security threats, the systems and processes that address each threat, and monitoring procedures to ensure compliance with the proposed solutions are contained in a Systems Security Plan (SSP). Security guidance documents for HIPAA, NIST 800-53, and HITRUST list hundreds of mandatory or recommended system and network security threats that require explicitly implemented controls and compliance oversight. 13–15 HDC’s current SSP consists of approximately 140 “moderate” NIST 800-53 controls, approved by our stakeholders’ security officers. Changes to the SSP require high-level institutional technical, regulatory, and legal engagement and approval. Thus, the long list of GCP HIPAA-compliant products belies the enormous amount of additional work needed to ensure that each product is deployed in compliance with institutional security policies.

In retrospect, CU-AMC and HDC jointly and significantly understaffed this activity. For the initial years, we allocated only 0.5 FTE across all tasks associated with creating a new SSP, policies, implementation, auditing procedures, and tools. We now estimate that a combined effort of 2.5 FTEs across numerous institutional groups (HDC, network security, regulatory, compliance, legal) is more realistic in a multi-institutional environment managing highly confidential clinical and genomic data.

Another early decision was to limit network access to high-security VMs that performed critical ETL functions (ETL VMs). ETL VMs have network access only to institutional source systems (eg, hospital electronic medical records systems) and HDC-specific GCP networks. However, limited network access conflicts with a fundamental design assumption incorporated into many GCP products. These products are designed to pull the most recent version of software or containers from Google-managed repositories at the time the tool is activated—code repositories that were not accessible to the ETL VMs. Therefore, GCP tools failed with standard deployment designs. While not an ideal solution, hard-coding firewall rules to allow access to specific IP addresses was required for these tools to work.
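The ETL VM restriction works as a default-deny egress policy: traffic is allowed only to explicitly listed networks. The sketch below models that rule. All CIDR ranges in it are hypothetical placeholders (private and documentation ranges), not HDC's actual addresses, and the single pinned repository address stands in for the hard-coded firewall exceptions the text mentions.

```python
import ipaddress

# Hypothetical default-deny egress allowlist for an ETL VM.
# Placeholder ranges only; not real institutional or Google addresses.
ALLOWED_EGRESS = [
    ipaddress.ip_network("10.20.0.0/16"),     # institutional source systems (e.g., EHR database)
    ipaddress.ip_network("10.128.0.0/20"),    # project-internal cloud subnet
    ipaddress.ip_network("203.0.113.17/32"),  # one pinned repository IP (hard-coded exception)
]


def egress_allowed(destination: str) -> bool:
    """Permit traffic only to destinations inside an explicitly allowed network."""
    addr = ipaddress.ip_address(destination)
    # Membership tests against a network of the other IP version simply return False.
    return any(addr in net for net in ALLOWED_EGRESS)
```

The `/32` exception shows why this workaround is fragile. If the repository moves to a neighboring address such as `203.0.113.18`, `egress_allowed` returns `False` and the tool fails again until someone updates the rule.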

Data engineering: source versus cloud performance mismatch

HDC’s primary clinical data sources are electronic health record systems that house data in traditional relational database management systems (RDBMS). These databases are also used by operational reporting units, which compete with HDC for the same resources. Resource limitation policies control access to these high-demand databases. Thus, despite HDC’s access to scalable cloud computing resources, the speed of the initial extraction and transfer into the cloud is wholly determined by the on-premises RDBMS resources allocated to HDC. HDC has devised multiple optimization strategies so that extractions complete within the allowed restrictions. Once data are within HDC’s environment, resource constraints are nonexistent.
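The article does not say which optimization strategies HDC uses. One common way to stay within a per-query resource budget on a shared source database is keyset pagination: the extract is pulled in small, index-friendly batches, optionally with a pause between them. The sketch below shows that technique under stated assumptions. It uses SQLite as a stand-in source, assumes the key column is unique and is the first column, and treats `batch_size` and `pause_seconds` as hypothetical tuning knobs.

```python
import sqlite3
import time
from typing import Iterator


def extract_in_batches(conn: sqlite3.Connection, table: str, key: str,
                       batch_size: int = 500,
                       pause_seconds: float = 0.0) -> Iterator[list]:
    """Yield rows in bounded batches using keyset pagination on `key`.

    Each query returns at most `batch_size` rows with key > the last key seen,
    so no single statement returns the whole table. Assumes `key` is unique
    and is the first column of `table`.
    """
    last_key = None
    while True:
        if last_key is None:
            rows = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (batch_size,)
            ).fetchall()
        else:
            rows = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last_key, batch_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1][0]  # first column holds the key (assumption above)
        if pause_seconds:
            time.sleep(pause_seconds)  # give capacity back to the shared source DB
```

Keyset pagination is used here instead of `OFFSET` because each batch query can seek directly via an index on the key and does not rescan the rows already skipped.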

A second performance issue was handling increased network volumes. Due to source data model limitations, full table pulls rather than incremental loads are required. For very large tables, existing routers became network bottlenecks, requiring upgrades to the network infrastructure. A redesigned network architecture moved more network functions and traffic to scalable cloud-based routers, minimizing the amount of traffic between on-premises and cloud servers. ETL redesigns using incremental data extraction based on transaction logs may greatly decrease the amount of data moving across networks.
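The paragraph suggests that incremental extraction based on transaction logs could replace full table pulls. The hypothetical in-memory model below shows the core idea: apply only the change records newer than a stored watermark, then advance the watermark. Real systems read the database's transaction log through vendor-specific tooling. The `Change` record and its fields are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    lsn: int             # log sequence number, increases monotonically
    op: str              # "insert", "update", or "delete"
    key: int             # primary key of the affected row
    row: Optional[dict]  # new row contents; None for deletes


def apply_incremental(target: dict, log: list[Change], watermark: int) -> int:
    """Apply changes with lsn > watermark to `target`; return the new watermark."""
    for change in sorted(log, key=lambda c: c.lsn):
        if change.lsn <= watermark:
            continue  # already applied in a previous run
        if change.op == "delete":
            target.pop(change.key, None)
        elif change.op in ("insert", "update"):
            target[change.key] = change.row
        else:
            raise ValueError(f"unknown op {change.op!r}")
        watermark = change.lsn
    return watermark
```

Only the rows that changed since the last watermark cross the network. Rerunning with the same log is a no-op, which makes retries after a failed transfer safe.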

Computation: virtual machines and managed services

Modern IT architectures use virtual machines or containers to enable allocating resources dynamically. With fixed hardware, adding new VMs is a zero-sum competition addressed either by restricting resources or purchasing more hardware. Public cloud vendors remove this resource competition, replacing the fixed upfront costs of acquiring new hardware with the variable costs of using more cloud resources.

To comply with our SSP, HDC must configure security settings for each VM or container and manage patches and upgrades to the operating system and hosted applications. The alternative to VMs is managed services, which encompass software-as-a-service (SaaS), platform-as-a-service (PaaS), and infrastructure-as-a-service (IaaS). A managed service provides capabilities to a customer on an as-needed basis. SaaS requires the least amount of HDC management; IaaS requires the most. All current HDC design decisions prioritize SaaS over PaaS and PaaS over IaaS.
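The SaaS-over-PaaS-over-IaaS rule can be read as a simple selection policy. Among the options that are acceptable under the security plan, pick the one that needs the least in-house management. The sketch below encodes that reading. The `Option` fields, including the single `meets_security_plan` flag (for example, vendor signs a BAA and the product fits the SSP), are simplifying assumptions.

```python
from dataclasses import dataclass

# Lower rank = less management burden for the in-house team.
TIER_RANK = {"SaaS": 0, "PaaS": 1, "IaaS": 2}


@dataclass(frozen=True)
class Option:
    name: str
    tier: str                  # "SaaS", "PaaS", or "IaaS"
    meets_security_plan: bool  # e.g., BAA signed and approved under the SSP


def choose_service(options: list[Option]) -> Option:
    """Return the most fully managed option that is acceptable under the SSP."""
    eligible = [o for o in options if o.meets_security_plan]
    if not eligible:
        raise LookupError("no option satisfies the security plan")
    return min(eligible, key=lambda o: TIER_RANK[o.tier])
```

If a SaaS product is not eligible because its vendor will not sign a BAA, this rule falls back to PaaS or to self-hosting on IaaS. The next paragraphs describe this same fallback as one reason VM costs remain high.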

In practice, minimizing security and management overhead through higher-level managed services has had mixed results. Figure  4 (bottom) shows that Google BigQuery (GBQ), GCP’s large-scale SaaS database, is the highest GCP cost. By using BigQuery, HDC personnel no longer spend time fine-tuning DBMS parameters or scheduling activities around resource constraints. Similarly, technical personnel no longer spend time optimizing one-time queries. HDC does not employ a database administrator (DBA) despite the size of its data. Technical staff time is instead focused on higher-value work. Managed services costs are offset by increased programmer productivity and more end-user services.

Managed services can also scale according to needs. For example, the TIS team generated genome-wide association studies (GWAS) summary statistics for more than 1000 phenotypes to support phenome-wide exploration of genetic associations (PheWAS). Fifty-four billion summary statistics for genetic variant/phenotype associations were stored in GBQ. It was possible to establish this 5.3 TB repository without competing with other HDC resources.

The dynamic HPCC hosted within HDC’s HIPAA-compliant cloud, called Eureka HPC, enables genomic analytics performed by TIS to be used with fully identified biobank and clinical phenotype data. Eureka HPCC uses inexpensive preemptible VMs, which are standard VMs with the caveat that Google can deallocate them with a few minutes’ notice. This extremely cost-effective model is now used to run 2 production GWAS pipelines that utilize between 4 and 60 CPUs. Eureka HPCC allows large one-time jobs to be executed using the same HPCC infrastructure. For example, TIS deployed a large Eureka HPCC to compute 1260 GWAS analyses for ∼34 000 genotyped Biobank participants. This ephemeral HPCC cost $8730, or $6.93 per phenotype.
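Preemptible VMs are cheap because the provider can reclaim them. Work running on them must therefore tolerate being killed and resubmitted. The sketch below simulates that pattern with a simple requeue loop and reproduces the per-phenotype arithmetic from the text. It is not how Eureka HPCC schedules jobs, which the article does not describe. Preemption is simulated deterministically, and each job is assumed to be idempotent, meaning it is safe to rerun from scratch or from a checkpoint.

```python
from collections import deque


def run_with_preemption(jobs: list[str],
                        preempted_attempts: set[tuple[str, int]]) -> dict[str, int]:
    """Run jobs on simulated preemptible workers, resubmitting preempted attempts.

    `preempted_attempts` lists (job, attempt) pairs that the simulated provider
    reclaims before completion; real preemption is not predictable.
    Returns how many attempts each job needed to finish.
    """
    queue = deque((job, 1) for job in jobs)
    attempts_used: dict[str, int] = {}
    while queue:
        job, attempt = queue.popleft()
        if (job, attempt) in preempted_attempts:
            # The VM was reclaimed: put the job back at the end of the queue.
            queue.append((job, attempt + 1))
            continue
        attempts_used[job] = attempt
    return attempts_used


def cost_per_unit(total_usd: float, units: int) -> float:
    """Average cost per unit of work, rounded to cents."""
    return round(total_usd / units, 2)
```

With the figures above, `cost_per_unit(8730, 1260)` returns `6.93`, matching the reported cost per phenotype.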

Despite a strong preference for SaaS or PaaS managed services over IaaS virtual machines, Figure  4 shows that GCP VMs (Compute Engine) are HDC’s second-largest cost. HDC hosts approximately 300 VMs. The majority of VM images are Eureka analytic engines, which end-users create and terminate as needed. Additionally, most data engineering development projects require 3 environments (development, test, and production), which multiplies the number of VMs. Other VMs host applications that only run on dedicated VMs, such as OHDSI ATLAS 16 and sandbox projects (described below). Still others exist because some GCP SaaS vendors are reluctant to sign Business Associate Agreements (BAAs), forcing HDC to host the application itself.

Access to essentially limitless storage eliminates a zero-sum competition for disk space. Cloud storage can use multiple geographical regions to ensure “5-9s” (99.999%) availability, a performance level that would be cost-prohibitive for a single institution. Backups are automatic with multiregion designs.
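The "5-9s" figure is easier to appreciate as allowed downtime. The arithmetic below converts an availability fraction into the maximum downtime per year. It is plain arithmetic on the availability level and makes no statement about any provider's actual SLA terms.

```python
def max_downtime_minutes(availability: float,
                         period_minutes: float = 365 * 24 * 60) -> float:
    """Allowed downtime in a period for an availability fraction (e.g., 0.99999)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes
```

At 99.999% the result is about 5.3 minutes per (365-day) year. At 99.9% it is about 525.6 minutes, or roughly 8.8 hours.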

Because storage is inexpensive, HDC tends to keep everything. However, limitless storage has downsides. An infrastructure with hundreds of users often accumulates duplicates of the same or very similar data, with little ability to reconstruct the chain of transformations. Archives and refreshes of these duplicates can accumulate significant storage costs. It is difficult to know which data sets are in active use and which can be archived or deleted. Even users have trouble keeping abreast of their various data resources.
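Exact duplicates, at least, can be found mechanically by hashing contents. The sketch below groups data sets whose bytes are identical. It is a hypothetical illustration; data sets are modeled as in-memory byte strings. It does not catch "very similar" data or recover the chain of transformations, which would need lineage metadata captured at creation time.

```python
import hashlib
from collections import defaultdict


def find_duplicates(datasets: dict[str, bytes]) -> list[list[str]]:
    """Group data set names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in datasets.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only digests shared by two or more data sets, in a stable order.
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)
```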

Since launching TIS, storage of large files with raw and processed genetic data in multiple file formats has highlighted the need to implement tiered storage to reduce costs. However, defining a tiered storage strategy that maximizes data availability while minimizing storage costs has been more challenging than envisioned. Data stored in high-latency tiers can be very cost-effective. However, the cost of migrating data from high-latency tiers back into online storage is expensive. Moving data between storage tiers only once or twice can obliterate the original cost savings. Uncertainty about data reuse has caused HDC to be cautious about using high-latency storage. Given the tendency of investigators to reanalyze old data with new hypotheses or tools, the amount of data deemed truly safe to put into cold storage has been surprisingly small.
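The observation that one or two moves between tiers can erase the savings follows from simple break-even arithmetic. The sketch below compares keeping data in a hot tier with keeping it in a cold tier and paying a retrieval charge each time it is read back. The prices are hypothetical round numbers chosen for illustration, not GCP list prices. Minimum storage durations and early-deletion fees are ignored.

```python
def tiering_net_savings(size_gb: float, months: int, hot_price: float,
                        cold_price: float, retrieval_price: float,
                        retrievals: int) -> float:
    """Savings (positive = cold tier is cheaper) over `months`.

    Storage prices are per GB-month; retrieval_price is per GB read back,
    and each retrieval is assumed to read the full data set.
    """
    hot_cost = size_gb * hot_price * months
    cold_cost = (size_gb * cold_price * months
                 + size_gb * retrieval_price * retrievals)
    return hot_cost - cold_cost


def breakeven_retrievals(months: int, hot_price: float, cold_price: float,
                         retrieval_price: float) -> float:
    """Number of full retrievals at which cold storage stops saving money."""
    return months * (hot_price - cold_price) / retrieval_price
```

With hypothetical prices of $0.020 (hot) and $0.004 (cold) per GB-month, a $0.10/GB retrieval fee, and 10,000 GB held for 12 months, the cold tier saves $1,920 if the data are never read. It saves $920 after one retrieval and loses $80 after two, so break-even is about 1.92 retrievals.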

Secure analytics

Eureka is an analytics platform based on a high-security version of CentOS (Linux) designed to enable advanced analytics on large, PHI-containing data sets within HDC’s HIPAA compliant environment. Eureka enables users to scale both CPU and storage capacity to meet their analytic needs. Eureka costs are charged to the user based on cloud resources consumed. End-users control costs by “right-sizing” resources and turning off Eureka instances when not in use. Eureka instances can be deleted when no longer needed. Currently at Version 3.0, HDC has deployed approximately 100 Eureka instances.
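"Right-sizing" and turning instances off are the two levers end-users have over consumption-based charges. The sketch below illustrates both. It picks the cheapest machine type that meets the stated CPU and RAM needs and then prices only the hours the instance actually runs. The machine catalog and hourly prices are hypothetical, not Eureka's or GCP's actual offerings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    vcpus: int
    ram_gb: int
    hourly_usd: float


# Hypothetical catalog and prices, for illustration only.
CATALOG = [
    MachineType("small", 4, 16, 0.20),
    MachineType("medium", 16, 64, 0.80),
    MachineType("large", 64, 256, 3.20),
]


def right_size(vcpus_needed: int, ram_gb_needed: int) -> MachineType:
    """Return the cheapest machine type that satisfies both CPU and RAM needs."""
    fits = [m for m in CATALOG
            if m.vcpus >= vcpus_needed and m.ram_gb >= ram_gb_needed]
    if not fits:
        raise ValueError("no machine type is large enough")
    return min(fits, key=lambda m: m.hourly_usd)


def instance_cost(machine: MachineType, hours_running: float) -> float:
    """Charges accrue only while the instance is running."""
    return machine.hourly_usd * hours_running
```

Under these assumed prices, a job needing 8 vCPUs and 32 GB gets the "medium" type. Left on around the clock for 30 days it costs $576. Run 10 hours a day on 22 working days, it costs $176.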

Due to network security concerns, Eureka Version 1.0 blocked direct internet access. Eureka users were unable to pull directly from software repositories, like GitHub, to assemble packages or to update software. To offer more flexibility, Eureka 3.0 contains a growing safelist of public web resources to which a user can request time-limited access (60 minutes). The safelist includes 9 major repository sites for data science, such as CRAN, Anaconda, PyPI, GitHub, and Bioconductor. Eureka continues to grow and evolve in response to user feedback.
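The safelist mechanism combines two checks: the destination must be on the list, and the user must hold an unexpired grant. The sketch below models this with a 60-minute grant window. The hostnames are assumed stand-ins for five of the nine sites the text names, and the `AccessBroker` class is hypothetical. In Eureka, enforcement presumably happens at the network layer rather than in application code.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumed hostnames for some of the safelisted repositories named in the text.
SAFELIST = {"cran.r-project.org", "repo.anaconda.com", "pypi.org",
            "github.com", "bioconductor.org"}
GRANT_DURATION = timedelta(minutes=60)


class AccessBroker:
    """Hypothetical broker issuing time-limited access grants per user and host."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], datetime] = {}  # (user, host) -> expiry

    def request(self, user: str, url: str, now: datetime) -> bool:
        """Grant 60 minutes of access if the URL's host is safelisted."""
        host = urlparse(url).hostname
        if host not in SAFELIST:
            return False
        self._grants[(user, host)] = now + GRANT_DURATION
        return True

    def is_allowed(self, user: str, url: str, now: datetime) -> bool:
        """True only while this user's grant for this host has not expired."""
        expiry = self._grants.get((user, urlparse(url).hostname))
        return expiry is not None and now < expiry
```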

Sandboxes/public data sets

To enable broader access to GCP resources, HDC established lower security sandboxes where de-identified or synthetic data, such as Synthea 17 and MIMIC 18 can be made available and accessed directly along with the tools and capabilities of GCP. Sandboxes are used to “kick the tires” of new tools or standard VM- or container-based applications, such as OHDSI ATLAS 16 and the University of Washington Leaf, 19 to explore functionality and determine the value and effort required to integrate into a more secure environment. Cloud-based sandboxes do not compete with computational or storage resources used by existing projects.

However, ensuring that users do not misuse sandbox environments by uploading sensitive data obtained outside HDC oversight is a challenge. Newer tools, such as GCP’s Data Loss Prevention, which scans data sets for sensitive information, may help detect sensitive data in sandbox databases. In addition, there are no automated tools to determine when a sandbox is no longer needed, other than examining when it was last accessed.

All public cloud vendors make a wide range of public and commercial data sets available for querying on their platforms. Google Marketplace currently lists 216 data sets available in BigQuery (https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&pli=1), including 43 data sets labeled as healthcare specific. The richness of readily available data resources has been a double-edged sword. Given scalable resources and easy availability, accessing these data sets is trivial within the HDC platform. Our current challenges are understanding the strengths and weaknesses of each data source, which types of problems each resource best addresses, and how to query its data tables. These gaps limit our ability to leverage these resources. Thus, zero or minimal access costs have not translated into high or novel utilization.

Innovation/consulting services

The speed and volume of new functionality in the cloud marketplace make it daunting for HDC cloud architects to evaluate the utility of new offerings. In today’s cloud ecosystems, implementations become legacy designs almost instantaneously. The hyper-innovation of the cloud makes previously unattainable capabilities available simply through a new set of APIs. Determining whether an offering is a distraction or a transformative opportunity takes time. It also requires carefully planned tests within sandboxes that replicate the existing architecture for head-to-head comparisons.

Once a new technology is deemed promising enough to incorporate into production pipelines, extensive institutional review processes must be completed to comply with HDC’s SSP. These include creating design documentation, performing risk analyses and assessments, and updating the SSP system boundaries. This requirement is no different from the approval processes required for implementing new on-premises systems. However, the substantial personnel time required across multiple institutional entities before production implementation lengthens the interval between a new innovation and its availability in HDC. In the meantime, new product releases continue, creating a sense of always falling behind rapid-fire innovation.

Similarly, the rapid evolution of cloud-based technologies makes it difficult for internal architects and external consultants to gain deep experience with the leading-edge tools and technologies needed to leverage new capabilities. Overall, our experiences with consultants, who tend to bring previous experience with “out-of-the-box” designs, have been disappointing. Few have worked in healthcare settings; none have implemented complex GCP-based solutions in a large-scale clinical research environment. The anticipated efficiencies of outside cloud expertise have been negated by prolonged knowledge transfer: the time and effort local staff spend educating consulting personnel on the nuances of our environment. When we have skipped extensive technical on-boarding by GCP technical staff, initial implementations have not worked. HDC has learned how to better engage with external experts to ensure that local architectural features are highlighted from the beginning.

Costs/utilization

Cloud-based resources are usually charged on a pay-as-you-use basis. Services can be turned on as needed, but they can also be inadvertently left running when not in use. Different services use different metrics to determine usage charges:
  • BigQuery on-demand query charges are based on the amount of data processed.
  • Google Cloud Storage charges are based on size, access tier, and region.
  • Compute Engine charges are based on availability (pre-emptible or not), CPU, persistent storage needs, operating system, and uptime.
  • Other services charge per API call, per licensed user, or as a percentage of other system charges.
As the number of cloud services used by HDC has grown, aggregating and summarizing cloud charges has required more internal financial resources than anticipated. However, without careful oversight, unnecessary consumption-based costs can grow insidiously. For example, HDC conducted a comprehensive inventory of unused BigQuery data sets, virtual machines, and cloud storage; the resulting purge reduced monthly charges by approximately 20%.
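The inventory-and-purge exercise can be approximated with a simple idle-resource check. In this hypothetical Python sketch, the `CloudResource` record, the 90-day idle threshold, and the cost figures are all assumptions. In practice, last-used dates and costs would have to be assembled from the provider's billing and audit data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class CloudResource:
    name: str
    kind: str             # e.g. "bigquery_dataset", "vm", "bucket"
    monthly_cost: float   # dollars per month (assumed to be known)
    last_used: date       # most recent access (assumed to be known)


def find_idle(resources: List[CloudResource], today: date,
              max_idle_days: int = 90) -> Tuple[List[CloudResource], float]:
    """Return resources idle for more than `max_idle_days` and the fraction
    of total monthly cost they represent (the potential purge savings)."""
    idle = [r for r in resources if (today - r.last_used).days > max_idle_days]
    total = sum(r.monthly_cost for r in resources)
    idle_cost = sum(r.monthly_cost for r in idle)
    return idle, (idle_cost / total if total else 0.0)
```

A fixed threshold is only a heuristic: a data set idle for months may still be needed for a reanalysis. Candidates should therefore be reviewed before deletion rather than purged automatically.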

New cloud capabilities also open new cost-saving strategies, provided these opportunities are recognized, incorporated into daily practice, and used to displace more expensive practices. For example, because HDC lacked a data orchestration tool, its ETL pipelines were initially built on a large Windows-based VM. The per-minute charge for this dedicated VM was high, and it was used continuously for 3–4 days. A recent redesign uses a new managed data orchestration service that dynamically instantiates an array of inexpensive pre-emptible virtual machines, which terminate within hours. The cost difference between the 2 ETL designs is significant. However, the savings were realized only after the new design was analyzed, approved, and implemented and the old design was retired.

RDWs have become mission-critical strategic assets for advancing data-driven discoveries and next-generation clinical care. Given the explosive size and diversity of data in RDWs and the complexity of the data science now being applied to these data, traditional architectural designs are being displaced by cloud-based solutions. But the migration from traditional on-premises hardware and software is not as simple as moving the same tools and processes into a cloud-based environment. Public cloud vendors offer a tremendous array of new capabilities and access to resources on an as-needed basis, enabling innovation at scales and speeds not previously possible. At the same time, leveraging and managing this dynamic environment raises unique issues or accentuates similar issues seen in traditional settings.

HDC made an early decision to move to a fully cloud-based RDW. At the time, it was the first significant foray into patient data management on a public cloud for HDC’s participating institutions. It was also the first enterprise-scale health data warehouse on GCP. HDC has never regretted this decision.

This work was supported by the National Center for Advancing Translational Sciences (NCATS) grant number UL1 TR002535 to the Colorado Clinical and Translational Sciences Institute. Contents are the authors’ sole responsibility and do not necessarily represent official NIH views. Funds were also provided by UCHealth, Children’s Hospital Colorado, and the University of Colorado.

MGK, MJA, and NP developed the initial drafts of the manuscript. JYM, AKY, and NR provided detailed content related to governance, architecture, and infrastructure, respectively. All authors reviewed and approved the submitted manuscript and have agreed to be accountable for its contents.

Health Data Compass is grateful to its 4 data partners, the University of Colorado School of Medicine, UC Health, Children’s Hospital Colorado, and CU Medicine, for entrusting Health Data Compass as a data steward of fully identified patient clinical and genomic data. We also thank our partners at Google Cloud for Higher Education and Research for their unfailing support and expert guidance.

CONFLICT OF INTEREST STATEMENT

MJA currently is a full-time employee of Sada. Sada provides consulting services for Google Cloud Platform. MJA was a full-time employee of the University of Colorado during the development of Health Data Compass. The remaining authors have no competing interests to declare.

DATA AVAILABILITY

The financial data underlying Figure 4 cannot be shared publicly because these data are considered proprietary commercial information.

Kohane IS. Ten things we have to do to achieve precision medicine. Science 2015;349(6243):37–8.

Campion TR, Craven CK, Dorr DA, Knosp BM. Understanding enterprise data warehouses to support clinical and translational research. J Am Med Inform Assoc 2020;27(9):1352–8.

National Research Council. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease [Internet]. Washington, DC: National Academies Press; 2011. https://doi.org/10.17226/13284. Accessed December 11, 2021.

Choi IY, Kim T-M, Kim MS, Mun SK, Chung Y-J. Perspectives on clinical informatics: integrating large-scale clinical, genomic, and health information for clinical care. Genomics Inform 2013;11(4):186–90.

Wade TD. Traits and types of health data repositories. Health Inf Sci Syst 2014;2(1):4.

Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA 2014;311(24):2479–80.

Cantor MN, Chandras R, Pulgarin C. FACETS: using open data to measure community social determinants of health. J Am Med Inform Assoc 2018;25(4):419–22.

Chen M, Tan X, Padman R. Social determinants of health in electronic health records and their impact on analysis and risk prediction: a systematic review. J Am Med Inform Assoc 2020;27(11):1764–73.

Top Cloud Trends for 2021 and Beyond | Accenture [Internet]. WordPressBlog. https://www.accenture.com/nl-en/blogs/insights/cloud-trends. Accessed December 11, 2021.

Afgan E, Baker D, Coraor N, Chapman B, Nekrutenko A, Taylor J. Galaxy CloudMan: delivering cloud compute clusters. BMC Bioinform 2010;11(S12):S4.

Cloud Computing Market Size, Share & Growth [2021–2028] [Internet]. https://www.fortunebusinessinsights.com/cloud-computing-market-102697. Accessed December 11, 2021.

NIST. NIST SP 800-53A [Internet]. NIST Privacy Framework; 2020. https://www.nist.gov/privacy-framework/nist-sp-800-53a. Accessed December 11, 2021.

Office for Civil Rights (OCR). HIPAA Security Rule Guidance Material [Internet]. HHS.gov; 2009. https://www.hhs.gov/hipaa/for-professionals/security/guidance/index.html. Accessed December 11, 2021.

Rafaels R. Guide to Understanding Security Controls: NIST SP 800-53 Rev 5 [Internet]. Independently published by Amazon Digital Services; 2019. https://www.amazon.com/Guide-Understanding-Security-Controls-800-53/dp/1094901040. Accessed December 11, 2021.

HITRUST Alliance: General Documents Archives [Internet]. HITRUST Alliance. https://hitrustalliance.net/download-center/general-documents/. Accessed December 11, 2021.

Observational Health Data Science and Informatics. ATLAS – A unified interface for the OHDSI tools – OHDSI [Internet]. https://www.ohdsi.org/atlas-a-unified-interface-for-the-ohdsi-tools/. Accessed December 11, 2021.

Walonoski J, Kramer M, Nichols J, et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc 2018;25(3):230–8.

Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc 2018;25(1):32–9.

Dobbins NJ, Spital CH, Black RA, et al. Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research. J Am Med Inform Assoc 2020;27(1):109–18.


  • Online ISSN 1527-974X
  • Copyright © 2024 American Medical Informatics Association

Database migration: Concepts and principles (Part 1)

This document introduces concepts, principles, terminology, and architecture of near-zero downtime database migration for cloud architects who are migrating databases to Google Cloud from on-premises or other cloud environments.

This document is part 1 of two parts. Part 2 discusses setting up and executing the migration process, including failure scenarios.

Database migration is the process of migrating data from one or more source databases to one or more target databases by using a database migration service. When a migration is finished, the dataset in the source databases resides fully, though possibly restructured, in the target databases. Clients that accessed the source databases are then switched over to the target databases, and the source databases are turned down.

The following diagram illustrates this database migration process.

Flow of data from source to target databases through the migration service.

This document describes database migration from an architectural standpoint:

  • The services and technologies involved in database migration.
  • The differences between homogeneous and heterogeneous database migration.
  • The tradeoffs and selection of a migration downtime tolerance.
  • A setup architecture that supports a fallback if unforeseen errors occur during a migration.

This document does not describe how to set up a particular database migration technology. Rather, it introduces database migration in fundamental and conceptual terms and sets out its principles.

Architecture

The following diagram shows a generic database migration architecture.

Architecture of migration service accessing source and target databases.

A database migration service runs within Google Cloud and accesses both source and target databases. Two variants are represented: (a) shows the migration from a source database in an on-premises data center or a remote cloud to a managed database like Spanner; (b) shows a migration to a database on Compute Engine.

Even though the target databases differ in type (managed and unmanaged) and setup, the database migration architecture and configuration are the same for both cases.

Terminology

The most important data migration terms for these documents are defined as follows:

source database: A database that contains data to be migrated to one or more target databases.

target database: A database that receives data migrated from one or more source databases.

database migration: A migration of data from source databases to target databases with the goal of turning down the source database systems after the migration completes. The entire dataset, or a subset, is migrated.

homogeneous migration: A migration from source databases to target databases where the source and target databases are of the same database management system from the same provider.

heterogeneous migration: A migration from source databases to target databases where the source and target databases are of different database management systems from different providers.

database migration system: A software system or service that connects to source databases and target databases and performs data migrations from source to target databases.

data migration process: A configured or implemented process executed by the data migration system to transfer data from source to target databases, possibly transforming the data during the transfer.

database replication: A continuous transfer of data from source databases to target databases without the goal of turning down the source databases. Database replication (sometimes called database streaming) is an ongoing process.

Classification of database migrations

There are different types of database migrations that belong to different classes. This section describes the criteria that define those classes.

Replication versus migration

In a database migration, you move data from source databases to target databases. After the data is completely migrated, you redirect client access to the target databases and delete the source databases. Sometimes you keep the source databases as a fallback measure if you encounter unforeseen issues with the target databases. However, after the target databases are reliably operating, you eventually delete the source databases.

With database replication, in contrast, you continuously transfer data from the source databases to the target databases without deleting the source databases. Sometimes database replication is referred to as database streaming. While there is a defined starting time, there is typically no defined completion time. The replication might be stopped or become a migration.

This document discusses only database migration.

Partial versus complete migration

Database migration is understood to be a complete and consistent transfer of data. You define the initial dataset to be transferred as either a complete database or a partial database (a subset of the data in a database) plus every change committed on the source database system thereafter.

Heterogeneous migration versus homogeneous migration

A homogeneous database migration is a migration between source and target databases of the same database technology. Examples include migrating from a MySQL database to a MySQL database, or from an Oracle® database to an Oracle database. Homogeneous migrations also include migrations from a self-hosted database system such as PostgreSQL to a managed version of it such as Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL.

In a homogeneous database migration, the schemas for the source and target databases are likely identical. If the schemas are different, the data from the source databases must be transformed during migration.

Heterogeneous database migration is a migration between source and target databases of different database technologies, for example, from an Oracle database to Spanner. Heterogeneous database migration can be between the same data models (for example, from relational to relational), or between different data models (for example, from relational to key-value).

Migrating between different database technologies doesn't necessarily involve different data models. For example, Oracle, MySQL, PostgreSQL, and Spanner all support the relational data model. However, multi-model databases like Oracle, MySQL, or PostgreSQL support different data models. Data stored as JSON documents in a multi-model database can be migrated to MongoDB with little or no transformation necessary, as the data model is the same in the source and the target database.

Although the distinction between homogeneous and heterogeneous migration is based on database technologies, an alternative categorization is based on database models involved. For example, a migration from an Oracle database to Spanner is homogeneous when both use the relational data model; a migration is heterogeneous if, for example, data stored as JSON objects in Oracle is migrated to a relational model in Spanner.

Categorizing migrations by data model more accurately expresses the complexity and effort required to migrate the data than basing the categorization on the database system involved. However, because the commonly used categorization in the industry is based on the database systems involved, the remaining sections are based on that distinction.

Migration downtime: zero versus minimal versus significant

After you successfully migrate a dataset from the source to the target database, you then switch client access over to the target database and delete the source database.

Switching clients from the source databases to the target databases involves several processes:

  • To continue processing, clients must close existing connections to the source databases and create new connections to the target databases. Ideally, closing connections is graceful, meaning that you don't unnecessarily roll back ongoing transactions.
  • After closing connections on the source databases, you must migrate remaining changes from the source databases to the target databases (called draining) to ensure that all changes are captured.
  • You might need to test target databases to ensure that these databases are functional and that clients are functional and operate within their defined service level objectives (SLOs).

In a migration, achieving truly zero downtime for clients is impossible; there are times when clients cannot process requests. However, you can minimize the duration that clients are unable to process requests in several ways (near-zero downtime):

  • You can start your test clients in read-only mode against the target databases long before you switch the clients over. With this approach, testing is concurrent with the migration.
  • You can configure the amount of data being migrated (that is, in flight between the source and target databases) to be as small as possible when the switch over period approaches. This step reduces the time for draining because there are fewer differences between the source databases and the target databases.
  • If new clients operating on the target databases can be started concurrently with existing clients operating on the source databases, you can shorten the switch over time because the new clients are ready to execute as soon as all data is drained.

While it's unrealistic to achieve zero downtime during a switch over, you can minimize the downtime by starting activities concurrently with the ongoing data migration when possible.
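The switchover steps and concurrency techniques above can be illustrated with a toy, in-memory Python model. The `Database` and `Migration` classes are hypothetical stand-ins for real databases and a migration service. The point is the ordering: stop writes on the source, drain the changes still in flight, verify, and only then route clients to the target.

```python
class Database:
    """Toy database: a key-value map that can refuse client writes."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.accepting_writes = True

    def write(self, key, value):
        if not self.accepting_writes:
            raise RuntimeError(f"{self.name} is not accepting writes")
        self.rows[key] = value


class Migration:
    """Toy migration service replaying the source's changes on the target."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.log = []     # changes captured from the source, in order
        self.applied = 0  # how many log entries the target has received

    def client_write(self, key, value):
        self.source.write(key, value)  # fails once the source is closed
        self.log.append((key, value))

    def replicate(self, batch=None):
        """Apply up to `batch` pending changes (all of them if None)."""
        pending = len(self.log) - self.applied
        n = pending if batch is None else min(batch, pending)
        for key, value in self.log[self.applied:self.applied + n]:
            self.target.rows[key] = value
        self.applied += n

    def switch_over(self):
        self.source.accepting_writes = False      # 1. close source to clients
        self.replicate()                          # 2. drain in-flight changes
        if self.target.rows != self.source.rows:  # 3. verify the target
            raise RuntimeError("target differs from source; not switching")
        return self.target                        # 4. clients now use target
```

Because `replicate` can run continuously before the switchover, little remains to drain when `switch_over` is called. This is what keeps the window in which clients cannot write short.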

In some database migration scenarios, significant downtime is acceptable. Typically, this allowance is a result of business requirements. In such cases, you can simplify your approach. For example, with a homogeneous database migration, you might not require data modification; export and import, or backup and restore, are well-suited approaches. With heterogeneous migrations, the database migration system does not have to deal with updates of source database systems during the migration.

However, you need to establish that the acceptable downtime is long enough for the database migration and follow-up testing to occur. If this downtime cannot be clearly established or is unacceptably long, you need to plan a migration that involves minimal downtime.

Database migration cardinality

In many situations, database migration takes place between a single source database and a single target database. In such situations, the cardinality is 1:1 (direct mapping). That is, a source database is migrated without changes to a target database.

A direct mapping, however, is not the only possibility. Other cardinalities include the following:

  • Consolidation (n:1). In a consolidation, you migrate data from several source databases to a smaller number of target databases (or even one target). You might use this approach to simplify database management or employ a target database that can scale.
  • Distribution (1:n). In a distribution, you migrate data from one source database to several target databases. For example, you might use this approach when you need to migrate a large centralized database containing regional data to several regional target databases.
  • Re-distribution (n:m). In a re-distribution, you migrate data from several source databases to several target databases. You might use this approach when you have sharded source databases with shards of very different sizes. The re-distribution evenly distributes the sharded data over several target databases that represent the shards.
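A re-distribution is often implemented by hashing each row's key to pick a target shard, so that rows from unevenly sized source shards spread roughly evenly. The Python sketch below shows one such approach under assumed conditions: string primary keys, and in-memory dictionaries standing in for shards. The source does not prescribe a particular placement scheme.

```python
import hashlib
from typing import Dict, List


def target_shard(key: str, m: int) -> int:
    """Stable target index for `key` among `m` shards. SHA-256 is used
    because Python's built-in hash() of str is salted per process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def redistribute(source_shards: List[Dict[str, object]], m: int
                 ) -> List[Dict[str, object]]:
    """n:m re-distribution: route every row from n unevenly sized source
    shards to one of m target shards by hashing its primary key."""
    targets: List[Dict[str, object]] = [dict() for _ in range(m)]
    for shard in source_shards:
        for key, row in shard.items():
            targets[target_shard(key, m)][key] = row
    return targets
```

Simple modulo placement reassigns most keys if the number of targets later changes. Consistent hashing or range partitioning avoids that at the cost of more bookkeeping.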

Database migration provides an opportunity to redesign and implement your database architecture in addition to merely migrating data.

Migration consistency

The expectation is that a database migration is consistent. In the context of migration, consistent means the following:

  • Complete. All data that is specified to be migrated is actually migrated. The specified data could be all data in a source database or a subset of the data.
  • Duplicate free. Each piece of data is migrated once, and only once. No duplicate data is introduced into the target database.
  • Ordered. The data changes in the source database are applied to the target database in the same order as the changes occurred in the source database. This aspect is essential to ensure data consistency.
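The three consistency properties map directly onto how a target applies incoming changes. In this minimal Python sketch, each change is assumed to carry a source-assigned sequence number that increases by exactly 1 per change. Duplicates are skipped (duplicate free), gaps are rejected (complete), and changes are applied strictly in sequence (ordered). Real log positions, such as MySQL binary log offsets, are not contiguous, so a production system would track positions differently.

```python
class OrderedApplier:
    """Applies source changes to a target exactly once and in source order.

    Assumes each change has a sequence number that starts at 1 and
    increases by 1 per change (a simplification of real log positions).
    """

    def __init__(self):
        self.target = {}
        self.last_seq = 0  # highest sequence number applied so far

    def apply(self, seq, op, key, value=None):
        if seq <= self.last_seq:
            return False  # duplicate delivery: skip it (duplicate free)
        if seq != self.last_seq + 1:
            # a missing change would make the target incomplete
            raise ValueError(f"gap: expected {self.last_seq + 1}, got {seq}")
        if op == "delete":
            self.target.pop(key, None)
        else:  # "insert" or "update"
            self.target[key] = value
        self.last_seq = seq  # advance only after applying (ordered)
        return True
```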

An alternative way to describe migration consistency is that after a migration completes, the data state between the source and the target databases is equivalent. For example, in a homogeneous migration that involves the direct mapping of a relational database, the same tables and rows must exist in the source and the target databases.

This alternative way of describing migration consistency is important because not all data migrations are based on sequentially applying transactions in the source database to the target database. For example, you might back up the source database and use the backup to restore the source database content into the target database (when significant downtime is possible).
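Some migrations do not replay changes one by one, as with backup and restore. In those cases, consistency can still be checked by comparing the resulting state. This Python sketch uses the standard-library `sqlite3` module as a stand-in for both databases. It compares a row count and a SHA-256 digest of each table, read in primary-key order. It assumes a homogeneous, direct-mapped migration; a heterogeneous migration would first need to normalize types and formats.

```python
import hashlib
import sqlite3


def table_fingerprint(conn: sqlite3.Connection, table: str, key_column: str):
    """Row count and SHA-256 digest of a table's contents.

    Rows are read in primary-key order, so the result does not depend on
    physical row order. `table` and `key_column` are interpolated into the
    SQL text, so they must be trusted identifiers, not user input.
    """
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()
```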

Active-passive versus active-active migration

An important distinction is whether the source and target databases both accept modifications (writes) during the migration. In an active-passive database migration, the source databases can be modified during the migration, while the target databases allow only read access.

An active-active migration supports clients writing into both the source and the target databases during the migration. In this type of migration, conflicts can occur. For example, the same data item might be modified in both the source and the target database in ways that conflict semantically. In that case, you might need to run conflict resolution rules to resolve the conflict.

In an active-active migration, you must be able to resolve all data conflicts by using conflict resolution rules. If you cannot, you might experience data inconsistency.
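Last-writer-wins is one common conflict resolution rule, shown here as a minimal Python sketch. It is not the only option, and the text does not name it. The sketch assumes each write carries a comparable commit timestamp, and it breaks ties deterministically so that both databases converge on the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: object
    timestamp: float  # commit time reported by the writing database (assumed comparable)
    origin: str       # "source" or "target"


def last_writer_wins(a: Version, b: Version) -> Version:
    """Resolve conflicting writes to the same item: the later commit wins.

    Ties on timestamp are broken by origin name, so both databases pick
    the same winner no matter which side evaluates the rule.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.origin))
```

Last-writer-wins silently discards the losing write and depends on reasonably synchronized clocks. Semantically meaningful conflicts, such as two independent adjustments to the same balance, usually need application-specific rules instead.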

Database migration architecture

A database migration architecture describes the various components required for executing a database migration. This section introduces a generic deployment architecture and treats the database migration system as a separate component. It also discusses the features of a database management system that support data migration as well as non-functional properties that are important for many use cases.

Deployment architecture

A database migration can occur between source and target databases located in any environment, such as on-premises data centers or different clouds. Each source and target database can be in a different environment; they do not all need to be collocated in the same environment.

The following diagram shows an example of a deployment architecture involving several environments.

Migration architecture involving cloud and on-premises data centers.

DB1 and DB2 are two source databases, and DB3 and Spanner are the target databases. Two clouds and two on-premises data centers are involved in this database migration. The arrows represent the invocation relationships: the database migration service invokes interfaces of all source and target databases.

A special case not discussed here is the migration of data from a database into the same database. This special case uses the database migration system for data transformation only, not for migrating data between different systems across different environments.

Fundamentally, there are three approaches to database migration, which this section discusses:

  • Using a database migration system
  • Using database management system replication functionality
  • Using custom database migration functionality

Database migration system

The database migration system is at the core of database migration. The system executes the actual data extraction from the source databases, transports the data to the target databases, and optionally modifies the data during transit. This section discusses the basic database migration system functionality in general. Examples of database migration systems include Database Migration Service, Striim, Debezium, tcVision, and Cloud Data Fusion.

Data migration process

The core technical building block of a database migration system is the data migration process. The data migration process is specified by a developer and defines the source databases from which data is extracted, the target databases into which data is migrated, and any data modification logic applied to the data during the migration.

You can specify one or more data migration processes and execute them sequentially or concurrently depending on the needs of the migration. For example, if you migrate independent databases, the corresponding data migration processes can run in parallel.
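A data migration process can be modeled as a small specification of source, target, and modification logic. In the hypothetical Python sketch below, the `DataMigrationProcess` class is an assumption, and plain dictionaries stand in for databases. The sketch runs processes for independent databases concurrently with `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataMigrationProcess:
    """Hypothetical spec: where data comes from, where it goes, and how it
    is modified on the way."""
    name: str
    source: Dict[str, dict]
    target: Dict[str, dict]
    transform: Callable[[dict], dict] = lambda row: row  # default: unmodified

    def run(self) -> int:
        """Copy every row through `transform`; return the number migrated."""
        for key, row in self.source.items():
            self.target[key] = self.transform(row)
        return len(self.source)


def run_independent(processes: List[DataMigrationProcess]) -> Dict[str, int]:
    """Run processes over independent databases in parallel."""
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda p: p.run(), processes)
        return dict(zip((p.name for p in processes), counts))
```

Processes whose data depend on each other, for example a target that needs reference data loaded first, would instead be run sequentially in dependency order.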

Data extraction and insertion

You can detect changes (insertions, updates, deletions) in a database system in two ways: database-supported change data capture (CDC) based on a transaction log, and differential querying of data itself using the query interface of a database management system.

CDC based on a transaction log

Database-supported CDC is based on database management features that are separate from the query interface. One approach is based on transaction logs (for example, the binary log in MySQL). A transaction log contains the changes made to data in the correct order. The transaction log is continuously read, and so every change can be observed. For database migration, this logging is extremely useful, as CDC ensures that each change is visible and is subsequently migrated to the target database without loss and in the correct order.

CDC is the preferred approach for capturing changes in a database management system. CDC is built into the database itself and has the least load impact on the system.
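The following Python sketch shows why log-based CDC sees every change. It uses a hypothetical in-memory table with an append-only log. A reader that resumes from its last checkpointed log position observes an insert, an update, and a delete, even though none of them is visible in the table's final state.

```python
class LoggedTable:
    """Hypothetical table whose every committed change is appended to a log,
    standing in for a transaction log such as the MySQL binary log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # (position, op, key, value) in commit order

    def commit(self, op, key, value=None):
        if op == "delete":
            self.rows.pop(key, None)
        else:  # "insert" or "update"
            self.rows[key] = value
        self.log.append((len(self.log) + 1, op, key, value))


class LogReader:
    """CDC consumer that resumes from the last log position it processed."""

    def __init__(self, table):
        self.table = table
        self.checkpoint = 0  # last consumed log position (positions start at 1)

    def read_new(self):
        new = self.table.log[self.checkpoint:]
        if new:
            self.checkpoint = new[-1][0]
        return new
```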

Differential querying

If no database management system feature exists that supports observing all changes in the correct order, you can use differential querying as an alternative. In this approach, each data item in a database gets an additional attribute that contains a timestamp or a sequence number. Every time the data item is changed, the timestamp is updated or the sequence number is incremented. A polling algorithm reads all data items changed since the last time it executed or since the last sequence number it used. Once the polling algorithm determines the changes, it records the current time or sequence number in its internal state and then passes on the changes to the target database.

While this approach works without problems for inserts and updates, you need to design deletes carefully, because a delete removes a data item from the database. After the data is deleted, the poller cannot detect that a deletion occurred. One solution is to implement a deletion with an additional status field (a logical delete flag) that indicates the data is deleted. Alternatively, deleted data items can be collected into one or more tables, and the poller accesses those tables to determine whether a deletion occurred.
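The sketch below illustrates differential querying with a logical delete flag, using the standard-library `sqlite3` module. The `items` table, its `seq` and `deleted` columns, and the in-process sequence counter are assumptions made for illustration. A real system would more likely maintain the change counter with a database sequence or trigger.

```python
import sqlite3


class SourceTable:
    """Hypothetical source table with a change sequence number per row."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("""
            CREATE TABLE items (
                id      INTEGER PRIMARY KEY,
                name    TEXT,
                seq     INTEGER NOT NULL,           -- change sequence number
                deleted INTEGER NOT NULL DEFAULT 0  -- logical delete flag
            )""")
        self._next_seq = 0

    def _bump(self):
        self._next_seq += 1
        return self._next_seq

    def upsert(self, row_id, name):
        self.conn.execute(
            "INSERT OR REPLACE INTO items (id, name, seq, deleted) "
            "VALUES (?, ?, ?, 0)", (row_id, name, self._bump()))

    def delete(self, row_id):
        # A physical DELETE would leave nothing for the poller to find,
        # so the row is flagged as deleted and its sequence number bumped.
        self.conn.execute("UPDATE items SET deleted = 1, seq = ? WHERE id = ?",
                          (self._bump(), row_id))


class Poller:
    def __init__(self, conn):
        self.conn = conn
        self.last_seq = 0  # internal state: highest sequence number seen

    def poll(self):
        rows = self.conn.execute(
            "SELECT id, name, deleted, seq FROM items "
            "WHERE seq > ? ORDER BY seq", (self.last_seq,)).fetchall()
        if rows:
            self.last_seq = rows[-1][3]
        return [("delete" if deleted else "upsert", row_id, name)
                for row_id, name, deleted, _ in rows]
```

Unlike log-based CDC, a poller sees only the latest state of each row. If a row is updated twice between polls, the poller receives a single change.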

For variants of differential querying, see Change data capture.

Differential querying is the least preferred approach because it requires schema and functionality changes. Querying the database also adds query load that is unrelated to executing client logic.

Adapter and agent

The database migration system requires access to the source and target database systems. Adapters are the abstraction that encapsulates this access functionality. In the simplest form, an adapter can be a JDBC driver for inserting data into a target database that supports JDBC. In a more complex case, an adapter runs in the environment of the database it accesses (and is sometimes called an agent), using a built-in database interface such as log files. In an even more complex case, an adapter or agent interfaces with yet another software system, which in turn accesses the database. For example, an agent accesses Oracle GoldenGate, which in turn accesses an Oracle database.

The adapter or agent that accesses a source database implements the CDC interface or the differential querying interface, depending on the design of the database system. In both cases, the adapter or agent provides changes to the database migration system, and the database migration system is unaware if the changes were captured by CDC or differential querying.
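
One way to express this separation is a common adapter interface that yields change records regardless of how they were captured. In the following sketch, `Change`, `SourceAdapter`, `LogCdcAdapter`, and `PollingAdapter` are hypothetical names; the point is that `migrate` depends only on the interface, not on whether an adapter uses CDC or differential querying.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass(frozen=True)
class Change:
    op: str                      # "upsert" or "delete"
    key: int
    value: Optional[str] = None


class SourceAdapter(Protocol):
    def changes(self) -> Iterator[Change]: ...


class LogCdcAdapter:
    """Hypothetical adapter that would read an engine's transaction log."""

    def __init__(self, log_entries: list[tuple[str, int, Optional[str]]]) -> None:
        self._log = log_entries

    def changes(self) -> Iterator[Change]:
        for op, key, value in self._log:
            yield Change("delete" if op == "delete" else "upsert", key, value)


class PollingAdapter:
    """Hypothetical adapter that would run differential queries; rows carry a deleted flag."""

    def __init__(self, rows: list[tuple[int, str, bool]]) -> None:
        self._rows = rows

    def changes(self) -> Iterator[Change]:
        for key, value, deleted in self._rows:
            yield Change("delete", key) if deleted else Change("upsert", key, value)


def migrate(source: SourceAdapter, target: dict) -> None:
    """Apply changes without knowing how the adapter captured them."""
    for change in source.changes():
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = change.value


a: dict = {}
b: dict = {}
migrate(LogCdcAdapter([("insert", 1, "x"), ("update", 1, "y"), ("delete", 2, None)]), a)
migrate(PollingAdapter([(1, "y", False), (2, "z", True)]), b)
print(a == b == {1: "y"})  # True
```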

Data modification

In some use cases, data is migrated from source databases to target databases unmodified. These straight-through migrations are typically homogeneous.

Many use cases, however, require data to be modified during the migration process. Typically, modification is required when there are differences in schema, differences in data values, or opportunities to clean up data while it is in transition.

The following sections discuss several types of modifications that can be required in a data migration: data transformation, data enrichment or correlation, data reduction or filtering, and data combination or recombination.

Data transformation

Data transformation transforms some or all data values from the source database. Some examples include the following:

  • Data type transformation. Sometimes data types between the source and target databases are not equivalent. In these cases, data type transformation casts the source value into the target value based on type transformation rules. For example, a timestamp type from the source might be transformed into a string in the target.
  • Data structure transformation. Data structure transformation modifies the structure within the same database model or between different database models. For example, in a relational system, one source table might be split into two target tables, or several source tables might be denormalized into one target table by using a join. A 1:n relationship in the source database might be transformed into a parent-child relationship in Spanner. Documents from a source document database system might be decomposed into a set of relational rows in a target system.
  • Data value transformation. Data value transformation is separate from data type transformation: it changes the value without changing the data type. For example, a timestamp in a local time zone is converted to Coordinated Universal Time (UTC). Or a short zip code (five digits) represented as a string is converted to a long zip code (five digits followed by a dash and four digits, also known as ZIP+4).
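
The three kinds of transformation can be illustrated on a single hypothetical source row. In this sketch, the field names, the source time zone (a fixed UTC-5 offset chosen for simplicity), and the split into a customer record and an address record are assumptions for illustration only. ZIP+4 conversion is omitted because it needs an external lookup rather than a pure function.

```python
from datetime import datetime, timedelta, timezone

SOURCE_TZ = timezone(timedelta(hours=-5))  # assumed fixed offset of the source system


def transform(row: dict) -> tuple[dict, dict]:
    # Data value transformation: local time to UTC (the type stays datetime).
    created_utc = row["created"].replace(tzinfo=SOURCE_TZ).astimezone(timezone.utc)
    # Data type transformation: datetime to an ISO 8601 string for the target.
    created_str = created_utc.isoformat()
    # Data structure transformation: split one source row into two target records.
    customer = {"id": row["id"], "name": row["name"], "created": created_str}
    address = {"customer_id": row["id"], "zip": row["zip"]}
    return customer, address


customer, address = transform(
    {"id": 7, "name": "Ana", "zip": "94043", "created": datetime(2024, 3, 7, 9, 30)}
)
print(customer["created"])  # 2024-03-07T14:30:00+00:00
```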

Data enrichment and correlation

Data transformation is applied to the existing data without reference to additional, related reference data. With data enrichment, additional data is queried to enrich the source data before it's stored in the target database.

  • Data correlation. Source data can be correlated; for example, you can combine data from two tables in two source databases. In a single target database, you might relate a customer to all of that customer's open, fulfilled, and canceled orders, even though the customer data and the order data originate from two different source databases.
  • Data enrichment. Data enrichment adds reference data. For example, you might enrich records that only contain a zip code by adding the city name corresponding to the zip code. A reference table containing zip codes and the corresponding city names is a static dataset accessed for this use case. Reference data can be dynamic as well. For example, you might use a list of all known customers as reference data.
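
A minimal sketch of both operations, assuming in-memory dictionaries and lists stand in for the reference table and for the two source databases. `ZIP_TO_CITY` and the sample records are illustrative data, not a real reference dataset.

```python
# Hypothetical static reference data: zip code -> city name.
ZIP_TO_CITY = {"94043": "Mountain View", "10001": "New York"}


def enrich(record: dict) -> dict:
    """Data enrichment: add the city for the record's zip code (None if unknown)."""
    return {**record, "city": ZIP_TO_CITY.get(record["zip"])}


def correlate(customers: list[dict], orders: list[dict]) -> dict[int, dict]:
    """Data correlation: attach orders from one source to customers from another."""
    result = {c["id"]: {**c, "orders": []} for c in customers}
    for order in orders:
        # Orders without a matching customer are skipped in this sketch.
        if order["customer_id"] in result:
            result[order["customer_id"]]["orders"].append(order["status"])
    return result


customers_db = [{"id": 1, "zip": "94043"}]         # first source database
orders_db = [{"customer_id": 1, "status": "open"},  # second source database
             {"customer_id": 1, "status": "canceled"}]

merged = correlate([enrich(c) for c in customers_db], orders_db)
print(merged[1])
```

A real migration would probably record orders without a matching customer rather than skip them silently, so that they can be accounted for during verification.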

Data reduction and filtering

Another type of data transformation is reducing or filtering the source data before migrating it to a target database.

  • Data reduction. Data reduction removes attributes from a data item. For example, if a zip code is present in a data item, the corresponding city name might not be required and is dropped, because it can be recalculated or because it is not needed anymore. Sometimes this information is kept for historical reasons, to record the name of the city as entered by the user, even if the city name changes over time.
  • Data filtering. Data filtering removes a data item altogether. For example, all canceled orders might be removed and not transferred to the target database.
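
Both operations can be sketched in one pass over the data. The field names and the rule "drop canceled orders" follow the examples above; returning the identifiers of filtered items is a design choice assumed here so that the items can later be accounted for when checking completeness.

```python
def reduce_and_filter(orders: list[dict], drop_fields: set[str]) -> tuple[list[dict], list[int]]:
    """Drop canceled orders (filtering) and remove unneeded attributes (reduction)."""
    kept, filtered_ids = [], []
    for order in orders:
        if order["status"] == "canceled":
            filtered_ids.append(order["id"])  # data filtering: remember what was left out
            continue
        # Data reduction: keep every attribute except the dropped ones.
        kept.append({k: v for k, v in order.items() if k not in drop_fields})
    return kept, filtered_ids


orders = [
    {"id": 1, "status": "open", "zip": "94043", "city": "Mountain View"},
    {"id": 2, "status": "canceled", "zip": "10001", "city": "New York"},
]
kept, filtered = reduce_and_filter(orders, drop_fields={"city"})
print(kept, filtered)  # [{'id': 1, 'status': 'open', 'zip': '94043'}] [2]
```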

Data combination or recombination

If data is migrated from different source databases to different target databases, it can be necessary to combine data differently between source and target databases.

Suppose that customers and orders are stored in two different source databases: one contains all orders, and the other contains all customers. After migration, customers and their orders are stored in a 1:n relationship within a single target database schema. The data is not stored in a single target database, however, but in several target databases, each of which contains a partition of the data. Each target database represents a region and contains all customers located in that region, together with their orders.
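
The recombination in this example can be sketched as follows: customers from one source and orders from another are joined in memory and partitioned by a `region` attribute, producing one list per target database. The field names and region codes are hypothetical.

```python
from collections import defaultdict


def recombine(customers: list[dict], orders: list[dict]) -> dict[str, list[dict]]:
    """Join customers with their orders (1:n) and partition the result by region."""
    orders_by_customer: dict[int, list[int]] = defaultdict(list)
    for order in orders:
        orders_by_customer[order["customer_id"]].append(order["id"])

    partitions: dict[str, list[dict]] = defaultdict(list)
    for customer in customers:
        partitions[customer["region"]].append(
            {"customer_id": customer["id"], "orders": orders_by_customer[customer["id"]]}
        )
    return dict(partitions)


customers = [{"id": 1, "region": "latam"}, {"id": 2, "region": "emea"}, {"id": 3, "region": "latam"}]
orders = [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 3}, {"id": 102, "customer_id": 1}]
result = recombine(customers, orders)
# result["latam"] holds customers 1 and 3 with their orders; result["emea"] holds customer 2.
```

Each value in the result would then be written to the target database for that region.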

Target database addressing

Unless there is only one target database, each data item that is migrated needs to be sent to the correct target database. A couple of approaches to addressing the target database include the following:

  • Schema-based addressing. Schema-based addressing determines the target database based on the schema. For example, all data items of a customer collection or all rows of a customer table are migrated to the same target database storing customer information, even though this information was distributed in several source databases.
  • Content-based routing. Content-based routing (using a content-based router, for example) determines the target database based on data values. For example, all customers located in the Latin America region are migrated to a specific target database that represents that region.

You can use both types of addressing at the same time in a database migration. Regardless of the addressing type used, the target database must have the correct schema in place so that the data items can be stored.
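
Both addressing styles can be combined in one routing function. In this sketch, the table-to-database mapping, the region names, and the target database names are hypothetical. Schema-based routing is tried first, and content-based routing then decides among regional customer databases. An unknown table or region raises an error instead of silently dropping the item.

```python
# Hypothetical routing tables.
SCHEMA_ROUTES = {"product": "catalog-db"}         # schema-based: whole table -> one target
REGION_ROUTES = {"latam": "customers-latam-db",   # content-based: data value -> target
                 "emea": "customers-emea-db"}


def route(table: str, item: dict) -> str:
    """Return the name of the target database for a data item."""
    if table in SCHEMA_ROUTES:
        return SCHEMA_ROUTES[table]
    if table == "customer":
        try:
            return REGION_ROUTES[item["region"]]
        except KeyError:
            raise ValueError(f"no target database for item {item!r}") from None
    raise ValueError(f"no routing rule for table {table!r}")


print(route("product", {"sku": "A1"}))         # catalog-db
print(route("customer", {"region": "latam"}))  # customers-latam-db
```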

Persistence of in-transit data

Database migration systems, or the environments on which they run, can fail during a migration, and in-transit data can be lost. When failures occur, you need to restart the database migration system and ensure that the data stored in the source database is consistently and completely migrated to the target databases.

As part of the recovery, the database migration system needs to identify the last successfully migrated data item to determine where to begin extracting from the source databases. To resume at the point of failure, the system needs to keep an internal state on the migration progress.

You can maintain state in several ways:

  • You can store all extracted data items within the database migration system before any data modification, and then remove each data item once its modified version is successfully stored in the target database. This approach ensures that the database migration system can determine exactly what is extracted and stored.
  • You can maintain a list of references to the data items in transit. One possibility is to store the primary keys or other unique identifiers of each data item together with a status attribute. After a failure, this state is the basis for recovering the system consistently.
  • You can query the source and target databases after a failure to determine the difference between the source and target database systems. The next data item to be extracted is determined based on the difference.
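
The second option, a list of references with a status attribute, can be sketched as a small state store keyed by primary key. The status values and the `InTransitState` class are illustrative assumptions. This sketch keeps the state in memory; a real system would persist it durably (for example, in a database table) so that it survives the failure it is meant to recover from.

```python
from enum import Enum


class Status(Enum):
    EXTRACTED = "extracted"  # read from the source, not yet confirmed in the target
    STORED = "stored"        # confirmed written to the target


class InTransitState:
    """Tracks the migration status of each data item by primary key."""

    def __init__(self) -> None:
        self._status: dict[int, Status] = {}

    def mark_extracted(self, key: int) -> None:
        self._status[key] = Status.EXTRACTED

    def mark_stored(self, key: int) -> None:
        self._status[key] = Status.STORED

    def to_retry(self) -> list[int]:
        """After a failure: items extracted but not confirmed stored must be reprocessed."""
        return sorted(k for k, s in self._status.items() if s is Status.EXTRACTED)


state = InTransitState()
for key in (1, 2, 3):
    state.mark_extracted(key)
state.mark_stored(1)
state.mark_stored(3)
# Simulated crash here: item 2 was extracted but never confirmed in the target.
print(state.to_retry())  # [2]
```

An item may have been written to the target even though its confirmation was lost, so reprocessing works best when target writes are idempotent.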

Other approaches to maintaining state can depend on the specific source databases. For example, a database migration system can keep track of which transaction log entries are fetched from the source database and which are inserted into the target database. If a failure occurs, the migration can be restarted from the last successfully inserted entry.

Persistence of in-transit data is also important for reasons other than errors or failures. For example, it might not be possible to query the source database to determine its state. If, for instance, the source database contained a queue, the messages in that queue might have been removed at some point.

Yet another use case for persistence of in-transit data is large window processing of the data. During data modification, data items can be transformed independently of each other. However, sometimes the data modification depends on several data items (for example, numbering the data items processed per day, starting at zero every day).
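
The per-day numbering example shows why such state must be kept: the number assigned to an item depends on how many items from the same day were processed before it. This sketch keeps the counters in a dictionary; the `id` and `created` field names are assumptions. In a real system, the counters would need to be persisted together with the migration progress so that a restart does not renumber items.

```python
from datetime import datetime


def number_per_day(items: list[dict], counters: dict) -> list[tuple[str, int]]:
    """Assign each item a sequence number that restarts at zero for each calendar day."""
    numbered = []
    for item in items:
        day = item["created"].date()
        n = counters.get(day, 0)
        counters[day] = n + 1  # window state that must survive restarts
        numbered.append((item["id"], n))
    return numbered


counters: dict = {}
batch = [
    {"id": "a", "created": datetime(2024, 3, 6, 23, 59)},
    {"id": "b", "created": datetime(2024, 3, 7, 0, 1)},
    {"id": "c", "created": datetime(2024, 3, 7, 8, 0)},
]
print(number_per_day(batch, counters))  # [('a', 0), ('b', 0), ('c', 1)]
```

Passing the same `counters` to the next batch continues the numbering where the previous batch stopped.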

A final use case for persistence of in-transit data is to provide repeatability of the data during data modification when the database migration system cannot access the source databases again. For example, you might need to re-execute the data modifications with different modification rules and then verify and compare the results with the initial data modifications. This approach might be necessary if you need to track down inconsistencies in the target database caused by an incorrect data modification.

Completeness and consistency verification

You need to verify that your database migration is complete and consistent. This check ensures that each data item is migrated exactly once, that the datasets in the source and target databases are identical, and that the migration is complete.

Depending on the data modification rules, it is possible that a data item is extracted but not inserted into a target database. For this reason, directly comparing the source and target databases is not a solid approach for verifying completeness and consistency. However, if the database migration system tracks the items that are filtered out, you can then compare the source and target databases along with the filtered items.
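
A sketch of that comparison on primary keys: the migration counts as complete when every source key is either in the target or in the recorded set of filtered keys, and as consistent when no key is duplicated in the target and no filtered key appears there. Comparing keys only is a simplifying assumption; a fuller check would also compare values, for example per-row checksums computed after applying the same modification rules.

```python
from collections import Counter


def verify(
    source_keys: list[int], target_keys: list[int], filtered_keys: set[int]
) -> dict[str, set[int]]:
    """Return problem keys by category; all sets empty means complete and consistent."""
    target_counts = Counter(target_keys)
    target_set = set(target_counts)
    source_set = set(source_keys)
    return {
        "missing": source_set - target_set - filtered_keys,            # lost in transit
        "duplicated": {k for k, n in target_counts.items() if n > 1},  # migrated more than once
        "unexpected": target_set - source_set,                         # not in the source at all
        "filtered_but_present": target_set & filtered_keys,            # filter not applied
    }


report = verify(source_keys=[1, 2, 3, 4], target_keys=[1, 3, 3], filtered_keys={2})
print(report)
# {'missing': {4}, 'duplicated': {3}, 'unexpected': set(), 'filtered_but_present': set()}
```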

Replication feature of the database management system

A special use case in a homogeneous migration is where the target database is a copy of the source database. Specifically, the schemas in the source and target databases are the same, the data values are the same, and each source database is a direct mapping (1:1) to a target database.

In this case, you can use the built-in replication feature that comes with most database management systems to replicate one database to another.

There are two types of data replication: logical and physical.

Logical replication: In the case of logical replication, changes in database objects are transferred based on their replication identifiers (usually primary keys). The advantages of logical replication are that it is flexible, granular, and customizable. In some cases, logical replication lets you replicate changes between different database engine versions. Many database engines support logical replication filters, where you can define the set of data to be replicated. The main disadvantages are that logical replication might introduce some performance overhead and that its latency is usually higher than that of physical replication.

Physical replication: In contrast, physical replication works on the disk block level and offers better performance with lower replication latency. For large datasets, physical replication can be more straightforward and efficient, especially in the case of non-relational data structures. However, it is not customizable and depends highly on the database engine version.

Examples are MySQL replication, PostgreSQL replication (see also pglogical), and Microsoft SQL Server replication.

However, if data modification is required, or you have any cardinality other than a direct mapping, a database migration system's capabilities are needed to address such a use case.

Custom database migration functionality

Some reasons for building database migration functionality instead of using a database migration system or database management system include the following:

  • You need full control over every detail.
  • You want to reuse the database migration capabilities.
  • You want to reduce costs or simplify your technological footprint.

Building blocks for building migration functionality include the following:

  • Export and import: If downtime is not a factor, you can use database export and database import to migrate data in homogeneous database migrations. Export and import, however, require that you quiesce the source database to prevent updates before you export the data. Otherwise, changes might not be captured in the export, and the target database won't be an exact copy of the source database.
  • Backup and restore: As with export and import, backup and restore incurs downtime, because you need to quiesce the source database so that the backup contains all data and the latest changes. The downtime lasts until the restore is completed successfully on the target database.
  • Differential querying: If changing the database schema is an option, you can extend the schema so that database changes can be queried at the query interface. An additional timestamp attribute is added, indicating the time of the last change. An additional delete flag can be added, indicating whether the data item is deleted (logical delete). With these two changes, a poller executing at a regular interval can query all changes since its last execution. The changes are applied to the target database. Additional approaches are discussed in Change data capture.

These are only a few of the possible options to build a custom database migration. Although a custom solution provides the most flexibility and control over implementation, it also requires constant maintenance to address bugs, scalability limitations, and other issues that might arise during a database migration.

Additional considerations of database migration

The following sections briefly discuss non-functional aspects that are important in the context of database migration. These aspects include error handling, scalability, high availability, and disaster recovery.

Error handling

Failures during database migration must not cause data loss or the processing of database changes out of order. Data integrity must be preserved regardless of what caused the failure (such as a bug in the system, a network interruption, a VM crash, or a zone failure).

Data loss occurs when a migration system retrieves data from the source databases but, because of some error, does not store it in the target databases. When data is lost, the target databases don't match the source databases and are thus inconsistent and incomplete. The completeness and consistency verification functionality flags this state (see Completeness and consistency verification).
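
One common way to avoid both lost and duplicated changes is to skip any change at or below a checkpoint and to write the data and the checkpoint in the same target transaction. A crash then leaves either both or neither committed. This sketch assumes that the checkpoint table lives in the target database and that the source can replay changes in order; the table names and the `apply_change` helper are hypothetical, and SQLite stands in for the target.

```python
import sqlite3

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
target.execute("CREATE TABLE checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1), last_seq INTEGER)")
target.execute("INSERT INTO checkpoint VALUES (1, 0)")
target.commit()


def last_seq() -> int:
    """Sequence number of the last change applied to the target."""
    return target.execute("SELECT last_seq FROM checkpoint WHERE id = 1").fetchone()[0]


def apply_change(seq: int, key: int, name: str) -> bool:
    """Apply one change exactly once; return False if it was already applied."""
    with target:  # commits both statements together, or rolls both back on error
        if seq <= last_seq():
            return False  # applied before a failure: skip instead of duplicating
        target.execute("INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", (key, name))
        target.execute("UPDATE checkpoint SET last_seq = ? WHERE id = 1", (seq,))
    return True


changes = [(1, 10, "alice"), (2, 11, "bob"), (3, 10, "alicia")]
applied = [apply_change(*c) for c in changes]
# Simulate a restart that replays the source from the beginning.
replayed = [apply_change(*c) for c in changes]

print(applied, replayed)  # [True, True, True] [False, False, False]
print(last_seq(), target.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
# prints: 3 [(10, 'alicia'), (11, 'bob')]
```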

Scalability

In a database migration, time-to-migrate is an important metric. In a zero downtime migration (in the sense of minimal downtime), the migration of the data occurs while the source databases continue to change. To migrate in a reasonable timeframe, the rate of data transfer must be significantly faster than the rate of updates of the source database systems, especially when the source database system is large. The higher the transfer rate, the faster the database migration can be completed.
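
A simplified back-of-the-envelope model makes this concrete. Assume that the migration must transfer an initial dataset of size S, that the source keeps producing changes at a constant rate u, and that the migration transfers data at a constant rate r, all in the same units (for example, GB and GB per hour). The backlog shrinks at the net rate r - u, so the target catches up only if r > u, after roughly t = S / (r - u). The constant rates and the equal cost of transferring data and applying changes are assumptions; real workloads are burstier.

```python
def catch_up_hours(size_gb: float, transfer_gb_per_h: float, change_gb_per_h: float) -> float:
    """Hours until the target catches up, under the constant-rate model t = S / (r - u)."""
    net_rate = transfer_gb_per_h - change_gb_per_h
    if net_rate <= 0:
        raise ValueError("transfer rate must exceed change rate, or the migration never catches up")
    return size_gb / net_rate


print(catch_up_hours(1000, 100, 20))  # 12.5
print(catch_up_hours(1000, 100, 90))  # 100.0
```

As the change rate approaches the transfer rate, the time to catch up grows sharply, which is why the transfer rate must be significantly faster than the update rate.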

When the source database systems are quiesced and are not being modified, the migration might be faster because there are no changes to incorporate. In a homogeneous database migration, the time-to-migrate might be quite short, because you can use backup and restore or export and import features, and the transfer of files scales.

High availability and disaster recovery

In general, source and target databases are configured for high availability. A primary database has a corresponding read replica that is promoted to be the primary database when a failure occurs.

When a zone fails, the source or target databases fail over to a different zone to be continuously available. If a zone failure occurs during a database migration, the migration system itself is impacted because several of the source or target databases it accesses become inaccessible. The migration system must reconnect to the newly promoted primary databases that are running after a failure. Once the database migration system is reconnected, it must recover the migration itself to ensure the completeness and consistency of the data in the target databases. The migration system must determine the last consistent transfer to establish where to resume.

If the database migration system itself fails (for example, the zone it runs in becomes inaccessible), then it must be recovered. One recovery approach is a cold restart. In this approach, the database migration system is installed in an operational zone and restarted. The biggest issue to address is that the migration system must be able to determine the last consistent data transfer before the failure and continue from that point to ensure data completeness and consistency in the target databases.

If the database migration system is enabled for high availability, it can fail over and continue processing afterwards. If limited downtime of the database migration system is important, you need to select a database migration system that supports high availability or implement high availability yourself.

In terms of recovering the database migration, disaster recovery is very similar to high availability. Instead of reconnecting to newly promoted primary databases in a different zone, the database migration system must reconnect to databases in a different region (a failover region). The same holds true for the database migration system itself. If the region where the database migration system runs becomes inaccessible, the database migration system must fail over to a different region and continue from the last consistent data transfer.

Several pitfalls can cause inconsistent data in the target databases. Some common ones to avoid are the following:

  • Order violation. If scalability of the migration system is achieved by scaling out, then several data transfer processes are running concurrently (in parallel). Changes in a source database system are ordered according to committed transactions. If changes are picked up from the transaction log, the order must be maintained throughout the migration. Parallel data transfer can change the order because of varying speed between the underlying processes. It is necessary to ensure that the data is inserted into the target databases in the same order as it is received from the source databases.
  • Consistency violation. With differential queries, the source databases have additional data attributes that contain, for example, commit timestamps. The target databases won't have commit timestamps, because the commit timestamps are only put in place to establish change management in the source databases. It is important that inserts into the target databases are timestamp consistent, which means all changes with the same timestamp must be in the same insert, update, or upsert transaction. Otherwise, the target database might (temporarily) be in an inconsistent state if some changes are inserted and others with the same timestamp are not. This temporary inconsistent state does not matter if the target databases are not accessed for processing. However, if they are used for testing, consistency is paramount. Another aspect is how the timestamp values are created in the source database and how they relate to the commit time of the transaction in which they are set. Because of transaction commit dependencies, a transaction with an earlier timestamp might become visible after a transaction with a later timestamp. If the differential query is executed between the two commits, it won't see the transaction with the earlier timestamp, resulting in an inconsistency in the target database.
  • Missing or duplicate data. When a failover occurs, a careful recovery is required if some data is not replicated between the primary and the failover replica. For example, a source database fails over before all data is replicated to the failover replica, while that data has already been migrated to the target database. After failover, the newly promoted primary database is behind the target database in terms of data changes (called flashback). A migration system needs to recognize this situation and recover from it in such a way that the target database and the source database get back into a consistent state.
  • Local transactions. To have the source and target database receive the same changes, a common approach is to have clients write to both the source and target databases instead of using a data migration system. This approach has several pitfalls. One pitfall is that two database writes are two separate transactions; you might encounter a failure after the first finishes and before the second finishes. This scenario causes inconsistent data from which you must recover. Also, there are several clients in general, and they are not coordinated. The clients do not know the source database transaction commit order and therefore cannot write to the target databases implementing that transaction order. The clients might change the order, which can lead to data inconsistency. Unless all access goes through coordinated clients, and all clients ensure the target transaction order, this approach can lead to an inconsistent state with the target database.
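
Two of these safeguards can be sketched in a few lines. The reorder buffer accepts changes that parallel workers finish out of order, each tagged with the source sequence number assigned at capture time, and releases them only in contiguous sequence order. The second function groups changes by commit timestamp so that each group can be written in a single target transaction. The `(seq, payload)` and `(timestamp, payload)` tuple shapes are assumptions, and neither sketch addresses the commit-visibility gap described under consistency violation, which has to be handled in how the source timestamps are assigned or queried.

```python
import heapq
from itertools import groupby


class ReorderBuffer:
    """Releases (seq, payload) items in strictly increasing seq order, starting at first_seq."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next = first_seq
        self._heap: list[tuple[int, str]] = []

    def add(self, seq: int, payload: str) -> list[str]:
        """Accept one finished item; return every payload that is now ready, in order."""
        heapq.heappush(self._heap, (seq, payload))
        ready = []
        while self._heap and self._heap[0][0] == self._next:
            ready.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return ready


def group_by_timestamp(changes: list[tuple[int, str]]) -> list[list[str]]:
    """Group timestamp-ordered changes so that each group becomes one target transaction."""
    # groupby only merges adjacent items, so the input must already be in timestamp order.
    return [[payload for _, payload in grp] for _, grp in groupby(changes, key=lambda c: c[0])]


buf = ReorderBuffer()
released = []
for seq, payload in [(2, "b"), (3, "c"), (1, "a"), (4, "d")]:  # order in which workers finish
    released.extend(buf.add(seq, payload))
print(released)  # ['a', 'b', 'c', 'd']

print(group_by_timestamp([(100, "x"), (100, "y"), (105, "z")]))  # [['x', 'y'], ['z']]
```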

In general, there are other pitfalls to watch out for. The best way to find problems that might lead to data inconsistency is to do a complete failure analysis that iterates through all possible failure scenarios. If concurrency is implemented in the database migration system, all possible data migration process execution orders must be examined to ensure that data consistency is preserved. If high availability or disaster recovery (or both) is implemented, all possible failure combinations must be examined.

What's next

  • Read Database migrations: Concepts and principles (Part 2).
  • Migrating from PostgreSQL to Spanner
  • Migrating from an Oracle® OLTP system to Spanner
  • See Database migration for more database migration guides.
  • Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-03-07 UTC.


A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.


Immigration & Migration

The religious composition of the world’s migrants.

The globe’s 280 million immigrants shape countries’ religious composition. Christians make up the largest share, but Jews are most likely to have migrated.

FEATURE: Religious composition of the world’s migrants, 1990-2020

How Mexicans and Americans view each other and their governments’ handling of the border

What we know about unauthorized immigrants living in the U.S.

How the origins of America’s immigrants have changed since 1850

In Tight U.S. Presidential Race, Latino Voters’ Preferences Mirror 2020

More Latino registered voters back Kamala Harris (57%) than Donald Trump (39%), and supporters of each candidate prioritize different issues.

1 in 10 eligible voters in the U.S. are naturalized citizens

Naturalized citizens make up a record number of eligible voters in 2022, most of whom have lived here more than 20 years.


Religious composition of the world’s migrants, 1990-2020

Explore our interactive table showing the religious composition of immigrants around the globe and how it’s changed from 1990 to 2020.

Mexicans hold generally positive views of the United States, while Americans hold generally negative views of Mexico – a reversal from 2017.

The unauthorized immigrant population in the U.S. grew to 11 million in 2022, but remained below the peak of 12.2 million in 2007.

In 2022, the number of immigrants living in the U.S. reached a high of 46.1 million, accounting for 13.8% of the population.

What the data says about immigrants in the U.S.

In 2022, roughly 10.6 million immigrants living in the U.S. were born in Mexico, making up 23% of all U.S. immigrants.

In some countries, immigration accounted for all population growth between 2000 and 2020

In 14 countries and territories, immigration accounted for more than 100% of population growth during this period.

Cultural Issues and the 2024 Election

Voters who support Biden and Trump have starkly different opinions on many issues, and these two groups are divided internally as well.


ABOUT PEW RESEARCH CENTER  Pew Research Center is a nonpartisan, nonadvocacy fact tank that informs the public about the issues, attitudes and trends shaping the world. It does not take policy positions. The Center conducts public opinion polling, demographic research, computational social science research and other data-driven research. Pew Research Center is a subsidiary of The Pew Charitable Trusts , its primary funder.

© 2024 Pew Research Center

  • Original Article
  • Open access
  • Published: 17 September 2024

Analysis and mapping of global research publications on migrant domestic workers

  • Waleed M. Sweileh (ORCID: orcid.org/0000-0002-9460-5144)

Comparative Migration Studies, volume 12, Article number 38 (2024)


Recognizing the importance of evidence-based research in informing migration policies and empowering migrant domestic workers (MDWs), this study aims to provide a comprehensive analysis of MDW research patterns and trends. In this descriptive cross-sectional study, research articles on MDWs were retrieved from the Scopus database. The findings reveal a substantial increase in research output in recent years, with notable contributions from journals in the fields of social sciences and humanities. Key contributors include scholars from the United States and the United Kingdom, and institutions such as the National University of Singapore and the Chinese University of Hong Kong. Journals in the field of migration have a prominent role in publishing research on MDWs. At the author level, Yeoh, B.S.A., at the National University of Singapore was the most prolific author. Academic activities were the main driver of research, and funding was suboptimal in this field. Highly cited articles focused on topics such as transnational motherhood, the international division of reproductive labor, and the negotiation of citizenship rights. Major research hotspots in the retrieved articles included mental health aspects, caregiving (especially of the elderly), and struggles for legal rights. Specific nationalities, such as Filipina/o and Indonesian MDWs, have been the focus of numerous studies, shedding light on their narratives, challenges, and agency within transnational contexts. Overall, this study underscores the urgency of addressing the needs and rights of MDWs, advocating for human rights, and enhancing understanding of occupational health and safety in the unique context of domestic work.

Introduction

Globalization has spurred an influx of individuals seeking improved livelihoods, with migrant workers, defined as those who relocate within or beyond their country in pursuit of employment, becoming increasingly prevalent (Douglas et al., 2019). Among these migrants, particular attention has recently focused on migrant domestic workers (MDWs), predominantly women who migrate to engage in household tasks such as cooking, cleaning, and caregiving in foreign countries or regions (International Labour Organization (ILO), 2022; Yeoh & Huang, 2000). The majority of MDWs originate from low- and middle-income nations, migrating to host countries across the Arab Gulf, Europe, the Western Pacific, and Northern America. Currently, there are approximately 67.1 million domestic workers worldwide, with 11.5 million classified as MDWs, comprising 17.2% of global domestic workers and 7.7% of total migrant laborers (United Nations, 2016); the majority of them are female. This surge in female migrant workers over the past two decades has been termed the “feminization of migration” (Gabaccia, 2016).

MDWs occupy a unique employment status characterized by several factors: they are typically bound to a single employer, work within the employer’s private residence, may lack official work documentation due to illegal entry facilitated by human trafficking or smuggling, and in some instances may be minors, rendering their employment a form of slavery (Basnyat & Chang, 2017). Gender disparities, language barriers, cultural differences, inadequate regulation in host countries, social discrimination, and limited access to healthcare services further endanger the physical and mental well-being of MDWs (Hall, Garabiles et al., 2019a; Hall et al., 2019), rendering them susceptible to various forms of abuse and human rights violations (Hargreaves et al., 2019). Despite these challenges, the allure of earning abroad remains a compelling aspiration for millions of women, driven by poverty, adversity, and a desire for international experience to enhance social standing and empowerment.

Addressing the needs of MDWs and empowering them requires an understanding of the existing literature on this migrant group. Future migration policies, in both sending and receiving nations, must be informed by evidence-based research. Similarly, funding agencies rely on comprehensive literature reviews to identify research gaps and allocate financial resources accordingly. Despite the significant body of research on migration (Gao & Wang, 2022; Sweileh, 2018; Sweileh et al., 2018), studies on research trends and patterns specifically focused on MDWs are scarce (Malhotra et al., 2013). Thus, this study aims to provide academics, researchers, and policymakers in the field of migration, particularly labor migration, with a detailed analysis of MDW research patterns and trends, in order to identify existing research gaps and avenues for future exploration. The research is aligned with the United Nations Sustainable Development Goals (SDGs), particularly SDG 8, which advocates for sustained economic growth while safeguarding labor rights and promoting decent work conditions, and it supports the UN’s recognition of migration and mobility as integral components of national and global development (Aniche & order, 2020; Halisçelik & Soytas, 2019). Furthermore, it echoes the call from the International Organization for Migration for a comprehensive assessment of existing research, mapping of literature, and identification of focal points and deficiencies to inform a cohesive global research agenda on migration health (International Organization for Migration, 2017). Lastly, given the escalating numbers of MDWs and the gender disparities inherent in this field, there is an urgent need to spotlight research focal points to empower women, advocate for human rights and equality in labor laws, and enhance understanding of occupational health and safety, particularly in the unique context where the home serves as the workplace for MDWs.

Study design

The current study was an observational, descriptive cross-sectional study.

All research papers on MDWs available in the Scopus database were retrieved and analyzed. Scopus has the advantage of including PubMed records and indexing about twice as many journals as Web of Science (Elsevier, 2023). Therefore, using the Scopus database alone is justifiable, since it is expected to retrieve the largest number of publications.

Selection of research papers

We followed the recommendations of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to include quality research papers in our review. Supplement 1 is a PRISMA-adapted flow diagram showing the number of documents retrieved at each step of the search strategy (Page et al., 2021), as well as the keywords used in the search query. The retrieved documents were refined using several inclusion and exclusion criteria. The search covered all years up to December 31, 2023. Only journal research articles written in English were included in the analysis; other document types, such as editorials, notes, letters, conference abstracts, and review articles, were excluded.

Investigated variables

The following variables were investigated in the current study:

The annual number of publications.

Key contributors to the retrieved publications such as core journals, countries, institutions, and authors.

Citation analysis and content analysis of the top 10 most-cited articles.

Research hotspots identified through the most frequently encountered terms in the retrieved publications.

Content analysis of publications on MDWs of various nationalities.

Data management and visualization

The retrieved publications were exported from Scopus to Microsoft Excel and to the VOSviewer program for analysis and mapping. The VOSviewer map of research hotspots presents terms as nodes (circles) (van Eck & Waltman, 2010). Larger nodes represent terms that occur more frequently in the retrieved articles. Closely related terms share a node color and appear in the same cluster, and each cluster represents a research hotspot. The worldwide geographic distribution of publications was mapped using Microsoft Excel. The data that support the findings of this study are available from the author upon reasonable request.
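VOSviewer's term map is built from term occurrence and co-occurrence counts. The sketch below illustrates only that counting step, under stated assumptions: each article has already been reduced to a list of candidate terms (term extraction is not shown), an occurrence is counted at most once per article (binary counting), and terms are kept only if they reach a minimum occurrence threshold, such as the threshold of five used for the term map. The function name `term_cooccurrence` and the toy terms are hypothetical, and VOSviewer's own normalization, layout, and clustering steps are not reproduced.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(docs, min_occurrences=5):
    """Illustrative counting step only (not VOSviewer's implementation).

    docs: one list of pre-extracted terms per article (hypothetical input).
    Returns (occurrences, links):
      occurrences: kept term -> number of articles containing it
      links: (term_a, term_b) -> number of articles containing both terms
    """
    # Binary counting: a term counts at most once per article.
    occurrences = Counter()
    for terms in docs:
        occurrences.update(set(terms))

    # Keep only terms that reach the minimum occurrence threshold.
    kept = {t for t, n in occurrences.items() if n >= min_occurrences}

    # Count co-occurrences of kept terms within the same article.
    links = Counter()
    for terms in docs:
        present = sorted(set(terms) & kept)  # sorted so each pair has one key order
        for a, b in combinations(present, 2):
            links[(a, b)] += 1

    return {t: occurrences[t] for t in kept}, dict(links)


# Toy example with a lower threshold so the output stays small.
docs = [
    ["mental health", "stress", "caregiving"],
    ["mental health", "stress"],
    ["citizenship", "caregiving"],
    ["caregiving", "mental health"],
]
occ, links = term_cooccurrence(docs, min_occurrences=2)
print(sorted(occ.items()))  # [('caregiving', 3), ('mental health', 3), ('stress', 2)]
print(links)  # {('caregiving', 'mental health'): 2, ('caregiving', 'stress'): 1, ('mental health', 'stress'): 2}
```

In a map, node size would scale with the occurrence counts and link thickness with the pair counts; with the study's threshold, `min_occurrences=5` would apply, and "citizenship" (one occurrence) is dropped here for the same reason a rare term would be dropped from the real map.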

To check for bias, the retrieved articles were sorted by the number of citations received, and the top 30 cited articles were reviewed to confirm that none fell outside the scope of MDWs. Furthermore, the number of articles retrieved for each of the top authors was compared with their actual contribution to the field and tested for correlation (Sweileh et al., 2018). Both checks indicated no evidence of bias from false-positive or false-negative results.

Statistical methods

Descriptive statistics and graphics were produced using Microsoft Excel and the Statistical Package for the Social Sciences (SPSS). For citation analysis, the mean number of citations and the Hirsch index (H-index) (Hirsch, 2005) were used. For mapping the content of the retrieved literature, VOSviewer was used, in which node size, node color, and the thickness of connecting lines reflect frequency of occurrence, relatedness, and strength of relatedness, respectively (van Eck & Waltman, 2010).
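The two citation indicators can be made concrete with a short sketch. Following Hirsch's definition, the H-index is the largest number h such that h articles have each received at least h citations, and the mean is total citations divided by the number of articles. The function names and the citation counts below are hypothetical and illustrate the definitions only; they are not the study's data or tooling (the study used Excel and SPSS).

```python
def h_index(citations):
    """Largest h such that h articles each have at least h citations (Hirsch, 2005)."""
    ranked = sorted(citations, reverse=True)  # most-cited article first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` articles all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


def mean_citations(citations):
    """Average number of citations per article (0.0 for an empty list)."""
    return sum(citations) / len(citations) if citations else 0.0


# Hypothetical citation counts for five articles.
sample = [10, 8, 5, 4, 3]
print(h_index(sample))         # 4: four articles have at least 4 citations each
print(mean_citations(sample))  # 6.0
```

Applied to the totals reported for this study, the mean would be 13,522 citations / 705 articles, or about 19.2 citations per article. The H-index of 53 cannot be recomputed from the totals alone, because it depends on the full per-article citation distribution.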

Annual number of publications and growth pattern

Based on the search query applied, 705 research articles were found. The earliest publication about MDWs appeared in 1988 and concerned Filipina domestic workers in Hong Kong (French & Lam, 1988). From 1988 to 1996, only 8 (1.1%) articles were published. More than half (n = 359, 50.9%) of the articles were published between 2018 and 2023 (Fig. 1). The annual number of publications peaked in 2021 with 70 (9.9%) articles. Of the total retrieved articles, 579 (82.1%) were published in journals classified under social sciences, 167 (23.7%) under humanities, 101 (14.3%) under medicine, and 51 (7.2%) under psychology; these subject areas overlap because a journal can be assigned to more than one area.

Figure 1: Linear graphic presentation of the annual growth of research publications on migrant domestic workers
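The percentages in this section are shares of the 705 retrieved articles. The short sketch below reproduces them from the counts given in the text and shows why the subject-area shares add up to more than 100%: because journals can be assigned to several subject areas, one article may be counted in more than one category. The variable and function names are illustrative only.

```python
TOTAL_ARTICLES = 705  # research articles retrieved by the search query

# Counts as reported in the text.
period_counts = {"1988-1996": 8, "2018-2023": 359, "2021 (peak year)": 70}
subject_counts = {"social sciences": 579, "humanities": 167, "medicine": 101, "psychology": 51}


def share(n, total=TOTAL_ARTICLES):
    """Percentage of all retrieved articles, rounded to one decimal place."""
    return round(100 * n / total, 1)


for label, n in {**period_counts, **subject_counts}.items():
    print(f"{label}: {n} ({share(n)}%)")

# Subject areas overlap, so their counts exceed the number of articles.
subject_total = sum(subject_counts.values())
print(subject_total, subject_total > TOTAL_ARTICLES)  # 898 True
```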

Key contributors

The Journal of Ethnic and Migration Studies was the leading journal, with 22 (3.1%) articles, followed by the Asian and Pacific Migration Journal (21; 3.0%) (Table 1). The 10 most active countries in publishing articles on MDWs are shown in Table 2. Scholars from the United States made the largest contribution to the field, with 107 (15.2%) articles, followed by scholars from the United Kingdom (UK) (n = 102, 14.5%). Figure 2 shows the worldwide distribution of the retrieved articles. The South-East Asian region, the Western Pacific region, North America, and Latin America made noticeable contributions. In the African region, the contributions of South Africa and Ethiopia were visible.

Figure 2: Worldwide distribution of research publications on migrant domestic workers. The map was created using Microsoft Excel.

The National University of Singapore (n = 42, 6.0%) and the Chinese University of Hong Kong (n = 23, 3.3%) were the most active institutions in publishing articles on MDWs. The University of Toronto and the University of Hong Kong were also among the most active institutions, and all of the top 10 most active institutions were academic institutions. At the author level, Yeoh, B.S.A., of the National University of Singapore, was the most prolific author in the field, with 25 (3.5%) articles. Regarding funding, the National University of Singapore was the most active funding sponsor in the field, funding 13 (1.8%) articles. Of the total retrieved articles, only 291 (41.3%) reported receiving funding.

Citation analysis and content analysis of top 10 cited articles on MDWs

The retrieved articles received 13,522 citations and had an H-index of 53. The top 10 cited articles on MDWs cover a range of topics related to the experiences, challenges, and impacts of migrant women engaged in domestic work. “I’m here, but I’m there: The meanings of Latina transnational motherhood” explores transnational motherhood among Latina immigrant women in Los Angeles, focusing on how they redefine motherhood because of spatial separation from their children (Hondagneu-Sotelo & Avila, 1997a). “Migrant Filipina domestic workers and the international division of reproductive labor” discusses the global politics of reproductive labor, particularly the three-tier transfer of labor involving migrant Filipina domestic workers (Salazar Parreñas, 2000). “Negotiating public space: Strategies and styles of migrant female domestic workers in Singapore” examines the marginalized position of migrant domestic workers in Singapore and their negotiation of public space (Yeoh & Huang, 1998). “From registered nurse to registered nanny: Discursive geographies of Filipina domestic workers in Vancouver, B.C.” explores the occupational limitations faced by Filipina domestic workers in Vancouver, highlighting discursive constructions of identity (Pratt, 1999). “Migrant female domestic workers: Debating the economic, social and political impacts in Singapore” analyzes the socioeconomic and political implications of migrant domestic workers in Singapore (Yeoh & Huang, 1999). “A very private business: Exploring the demand for migrant domestic workers” investigates the demand for migrant domestic workers in the UK and the role of immigration status in recruitment and retention (Anderson, 2007). “Care workers, care drain, and care chains: Reflections on care, migration, and citizenship” discusses the care chain phenomenon and its impact on sending and receiving countries (Lutz & Palenga-Möllenbeck, 2012). “‘Home’ and ‘away’: Foreign domestic workers and negotiations of diasporic identity in Singapore” explores the construction of diasporic identities among migrant domestic workers in Singapore. “Consuming the transnational family: Indonesian migrant domestic workers to Saudi Arabia” examines the narratives of Indonesian migrant domestic workers in Saudi Arabia and their consumption desires (Silvey, 2006). Finally, “Negotiating citizenship: The case of foreign domestic workers in Canada” argues for reconceptualizing citizenship as a negotiated relationship, focusing on how foreign domestic workers in Canada navigate citizenship rights within restrictive national and international contexts (Stasiulis & Bakan, 1997).

Research hotspots in the retrieved articles on MDWs

Visualization of the most frequently encountered terms in the titles/abstracts of the retrieved articles revealed three main research hotspots (Fig. 3): (1) the first cluster focuses on the health aspects of MDWs; (2) the second (blue) cluster focuses on caregiving by MDWs, especially for the elderly; and (3) the third cluster focuses on social aspects, specifically the struggle of MDWs for citizenship.

Figure 3: Network visualization map of terms in the titles/abstracts with a minimum occurrence of five times. Each cluster of grouped nodes represents a research hotspot. The map was created using VOSviewer.

The health-related research hotspot explored (1) the mental health challenges and psychosocial impact experienced by MDWs, including stress, depression, anxiety, and coping strategies [43]; (2) the working conditions, occupational hazards, safety concerns, and experiences of aggression faced by MDWs in various countries [13, 44]; (3) social support and coping mechanisms, including the role of social support networks and their impact on the mental well-being of MDWs [10, 45]; (4) caregiving and elderly care, focusing on the caregiving responsibilities, challenges, and relationships between MDWs and elderly care recipients [46, 47]; and (5) healthcare access and health behavior, examining healthcare access and utilization, health behaviors, and the impact of stress on the health and quality of life of MDWs [48, 49]. Publications in the caregiving research hotspot collectively contributed to understanding the complex dynamics of caregiving and highlighted the interconnectedness of MDWs, caregiving, and employment conditions (Akalin, 2007; Asis & Carandang, 2020; Ayalon et al., 2013; Basnyat & Chang, 2017; Tay & Kong, 2020). Articles in the final research hotspot, on citizenship and legal issues, collectively deepened understanding of the legal challenges and citizenship issues confronting MDWs, highlighting areas for policy reform and advocacy (Cheng & Choo, 2015; Gabriel & Macdonald, 2014; Henderson, 2021; Husni & Suryani, 2018; Ito, 2016; Karachurina et al., 2019; Kontos, 2013; Lutz & Palenga-Möllenbeck, 2012; Stasiulis & Bakan, 1997; Tan, 2010).

Nationalities of investigated MDWs

Analysis showed that 153 (21.7%) articles were about Filipino/Filipina MDWs. This body of work examines the experiences of Filipina/o workers from various angles, such as the international division of reproductive labor, the discursive constructions shaping labor market experiences, representations of workers in receiving countries, negotiations of class identity, and determinants of health and well-being (Chin, 1997; Hall, Garabiles et al., 2019b; Holroyd et al., 2001; Lan, 2003; Pratt, 1999; Salazar Parreñas, 2000). These studies reveal the complexities and challenges faced by Filipina domestic workers, including their intermediary role within the global division of labor, marginalization within host societies, and adverse working conditions affecting their physical and mental health. They underscore the need for greater awareness, advocacy, and policy interventions to address the rights, well-being, and empowerment of migrant domestic workers, particularly those from the Philippines.

There were 87 (12.4%) articles about Indonesian MDWs. These articles offer insights into various aspects of Indonesian MDWs’ experiences and challenges. One article focused on women migrants’ narratives of transnational migration to Saudi Arabia, highlighting their consumption desires and practices, religious influences on mothering practices, and the larger struggles within Indonesian state relations with women and Islam (Silvey, 2006). Another article analyzed representations of foreign female domestic workers in Malaysia, revealing how state officials, employers, and agencies depict these workers in ways that obscure their status as protected laborers (Chin, 1997). A separate study examined the gender politics of scale in activist approaches to MDWs’ rights, showing how activists construct and deploy different scales in advocating for these workers (Silvey, 2004). Another article explored migration governance in Asia, particularly the role of private recruitment agencies in shaping migration flows and circumventing formal cooperation with origin countries (Goh et al., 2017). Additionally, an investigation of grassroots movements and networks in Hong Kong shed light on Indonesian MDWs’ agency, gender roles, and class formation within transnational political spaces (Rother, 2017). Finally, research on technology use among Indonesian domestic workers in Singapore revealed how they negotiate social relations and exercise agency within the constraints of Singapore’s migration regime (Platt et al., 2016). Together, these studies offer nuanced understandings of Indonesian MDWs’ lives, challenges, and agency within transnational contexts. In addition, 19 (2.7%) articles were about Sri Lankan MDWs, 18 (2.6%) about Latino/Latina MDWs, and 17 (2.4%) about Ethiopian MDWs.

In the current study, research trends and patterns were investigated using the Scopus database. Although domestic work is an old phenomenon, research on MDWs started in the late 1980s and was not visible until after 2010. The increased research activity on MDWs in the last decade is likely attributable to the work of human rights advocacy groups and feminist activist groups (Brigham, 2004; Figueiredo et al., 2018; Huang & Yeoh, 2007; Menjívar & Salcido, 2002; Momsen, 2003), the growing number of MDWs in the context of labor migration (Bastia et al., 2019; Gallotti, 2015; Marchetti et al., 2021), and the feminization of migration, which increased the number of women in domestic work (Pande, 2021).

The current study indicated that research on MDWs was published mainly in journals in the field of migration, whereas journals in the fields of human rights, law, and occupational health were underrepresented. The study of MDWs is a multidisciplinary field that involves social, legal, health, and human rights researchers. The underrepresentation of law and health journals may leave the legal and health problems of MDWs largely unexamined and unseen.

The current study showed that research output from the Middle East was relatively low, given the high number of MDWs in the Arab Gulf countries. Approximately 27.4% of the world’s MDWs are found in the Arab States, particularly the Arab Gulf (Gallotti, 2015). However, only 13 (2.0%) articles on MDWs were contributed by researchers in Arab countries. Research articles on MDWs by scholars in Arab countries discussed issues such as human trafficking (de Regt, 2010; Demetriou, 2019; Jureidini, 2010; Shah et al., 2002), health (Abder-Rahman et al., 2021; Shah et al., 2012; Zahreddine et al., 2014), and abuse and discrimination (Amin, 2022). Therefore, more research is needed on the health, human rights, cultural adaptation, and psychology of MDWs in Arab countries. Researchers should also adequately address male MDWs in the Arab States, given that about half of all male migrant domestic workers are employed there, working as gardeners, drivers, and security guards (Gallotti, 2015). The Asia-Pacific region is another region with a high percentage of MDWs, a relatively high proportion of whom are undocumented and financially exploited by employers (GLOBE, 2022; International Labour Organization, 2022). Most Asian MDWs travel from Indonesia, the Philippines, Sri Lanka, and Thailand to the Arab Gulf States (particularly Saudi Arabia and Kuwait), Hong Kong, Japan, Taiwan, Singapore, Malaysia, and Brunei for domestic work (Lim & Oishi, 1996).

The article on transnational motherhood among Latina migrant domestic workers received an exceptionally high number of citations, suggesting the significance of this topic (Hondagneu-Sotelo & Avila, 1997a). The challenges and narratives surrounding working migrant mothers and their families back home, particularly the children left behind, are complex and deeply impactful. These mothers often experience significant emotional strain because of the separation, grappling with feelings of guilt, sadness, and longing as they are unable to participate actively in their children’s daily lives and milestones (Pineros-Leano et al., 2021). Moreover, transnational motherhood involves reimagining traditional notions of maternal roles within the context of migration, as migrant mothers must find ways to maintain connections with their children across geographical distances, often through frequent communication and financial support. Financial pressures also weigh heavily on migrant mothers, who undertake domestic work abroad to provide better financial support for their families but must balance financial obligations with personal well-being (Gamburd, 2000). They may also face social stigma in both their host countries and their home communities, being perceived as neglectful mothers or inadequate workers (Parreñas, 2001). Children left behind by migrant mothers may experience feelings of abandonment and loneliness, struggling with the absence of parental figures in their lives. Despite these challenges, many migrant mothers demonstrate resilience and adaptability, developing creative coping strategies to maintain strong bonds with their children, such as sending audio and video messages and organizing periodic visits whenever possible (Qu et al., 2020). Understanding the complexities of transnational motherhood among migrant domestic workers is crucial for developing policies and support systems that address their unique needs and challenges while recognizing their agency and contributions to both their host and home communities.

Research articles on MDWs working in Singapore received a relatively high number of citations. In 2019, there were approximately 262,000 MDWs in Singapore, compared with 215,000 in 2013 (Statista, 2022). Most MDWs working in Singapore originate from neighboring Asian countries such as the Philippines, Indonesia, Sri Lanka, Myanmar, India, Thailand, and Bangladesh (Dutta et al., 2018). Reports of abuse of MDWs in Singapore have appeared in the national news, with MDWs reporting emotional abuse, violence, intimidation, and invasion of privacy (Strangio, 2022). The relatively large number of MDWs and the absence of protective regulations may have placed Singaporean scholars and academic institutions in leading positions in the field. The highly cited articles emphasize the vulnerability of Asian MDWs, particularly Filipino and Indonesian workers. Although not all MDWs are legally employed, none of the top-cited articles discussed issues related to undocumented MDWs or the smuggling and trafficking of MDWs (Raijman et al., 2003; Siruno et al., 2022). These issues may be difficult to research because most MDWs live in their employers’ homes and are hard for researchers to reach. It is also possible that MDWs are unwilling to speak with researchers, since most of them come from poor communities. Of particular interest is the top-cited article, which discussed Latina mothers working as MDWs in the United States and how they adapt and practice motherhood for children left behind in their countries of origin (Hondagneu-Sotelo & Avila, 1997a). Family separation is a serious social and health problem for migrant workers who leave family members behind in their countries of origin (Cortes, 2015; Peng & Wong, 2015). The price paid by family members left behind may be greater than the benefits of the remittances sent by the migrant parent(s).

The findings of the current study revealed several research gaps in the field. First, although most MDWs are women, research on women’s health issues was limited. The health of female MDWs should be the responsibility of both sending and receiving countries (Malhotra et al., 2013). Working conditions negatively affect the general and sexual health of female MDWs (Anjara et al., 2017; Yi et al., 2020). Occupational stress and distance from family can generate emotional distress and depression among MDWs with limited or no access to mental healthcare services. MDWs should receive pre-departure training on coping with stress and building resilience to stressful events at work (Regmi et al., 2020). A second research gap concerns the relationship and communication of MDWs with family members left behind, especially children. Information about family members left behind needs to be collected to understand and evaluate the advantages and disadvantages of migration and of leaving family behind (Hugo & Ukwatta, 2010). A third research gap concerns MDWs in the Middle East. Reports of abuse and human rights violations against MDWs in the Middle East have been published (Al Rifai et al., 2015; Fernandez, 2018; Shewamene et al., 2022; Wickramage et al., 2017), but more publications are needed to inform appropriate protective national regulations. Human Rights Watch has reported thousands of human rights violations against female MDWs by their employers in Lebanon and other Arab countries, including long working hours with no breaks, below-average monthly wages, sexual and verbal abuse, and a lack of protection under the Lebanese labor code (Blog, 2019). The kafala (sponsorship) system contributes to the mistreatment of MDWs in several Arab countries (Malit Jr. & Naufal, 2016; Parreñas & Silvey, 2021). In many cases, the MDW’s passport is confiscated, and the MDW’s movement is restricted by the employer and the recruiting agency (Parreñas, 2021; Rak, 2021).

The finding that the National University of Singapore was the most active funding sponsor in the field of MDWs, supporting 1.8% of the articles, sheds light on the dynamics of research funding in this area. It suggests that academia, particularly institutions in Singapore, plays a notable role in driving MDW research, consistent with the academic community’s commitment to addressing pressing social issues and advancing knowledge. However, the fact that only about 41% of publications reported receiving funding points to a potential funding gap. While academic institutions such as the National University of Singapore may provide some financial support for research projects, the overall level of funding appears insufficient to meet the needs of researchers in the field, which suggests that research on MDWs is not primarily driven by governmental bodies or external funding agencies. Inadequate funding raises concerns about the scope and depth of studies in this area: it may limit researchers’ ability to conduct comprehensive investigations, gather robust data, and address the complex issues facing MDWs, and it may hinder efforts to explore innovative solutions, develop effective interventions, and influence policy decisions. Overall, although academia plays a crucial role in supporting research on MDWs, increased funding from governmental bodies, non-governmental organizations, and other external sources appears to be needed. Adequate funding is essential to ensure that researchers have the resources to conduct high-quality studies, generate meaningful insights, and contribute to positive outcomes for MDWs and the communities they serve.
The finding that several migration journals were among the most active journals publishing research on migrant domestic workers suggests a strong alignment between the research focus and the editorial scope of these journals, and editorial decisions may have played a role in driving publications in this area. Journals that specialize in a particular field or topic are more likely to attract submissions on that subject. Researchers studying migrant domestic workers may have chosen migration-focused journals because these outlets offer a targeted audience of scholars, policymakers, and practitioners interested in migration issues. In addition, the editorial boards of migration journals may actively seek out and encourage submissions on topics such as labor migration, gender dynamics, and human rights, which are central to the study of MDWs. The prominence of migration-focused journals also suggests that these outlets provided a platform for researchers to disseminate their findings and engage with the academic community working on migration-related issues. Overall, while other factors such as funding priorities and academic mentorship may have shaped the research agenda on migrant domestic workers, the prominence of migration-focused journals suggests that editorial decisions likely contributed to research output in this area.

The prevalence of articles focusing on Filipina and Indonesian MDWs in the bibliometric analysis likely arises from several interconnected factors. Firstly, both Filipina and Indonesian migrant workers constitute substantial proportions of the global domestic worker population, owing to the significant numbers of individuals from these countries engaged in labor migration. As major sending countries for migrant workers, particularly in the domestic work sector, their large presence in destination countries across various regions makes them prominent subjects of study. Additionally, Southeast Asia has a robust tradition of academic research on migration, driven by the region’s historical, economic, and social dynamics. Consequently, academic institutions and journals in Southeast Asia may prioritize research on MDWs from these countries, contributing to the higher number of articles focusing on them. Moreover, Filipina and Indonesian migrant domestic workers are employed in a diverse array of destination countries across different continents, including Asia, the Middle East, Europe, and North America. Their global presence attracts attention from researchers worldwide, resulting in a greater volume of literature examining their experiences, challenges, and contributions to host societies. Issues related to Filipina and Indonesian migrant domestic workers, such as labor rights violations, social integration, and health concerns, have gained international visibility due to advocacy efforts, media coverage, and policy debates. This visibility may drive academic interest and research funding, leading to a higher number of publications focusing on these specific migrant groups. Finally, the availability of researchers proficient in the languages and cultures of the Philippines and Indonesia may also contribute to the abundance of literature on Filipina and Indonesian MDWs. Scholars from these countries or with connections to these regions may be more inclined to study and publish research on migrant workers from their respective countries of origin.

While this bibliometric analysis provides valuable insights into the research landscape concerning migrant domestic workers (MDWs), several limitations should be recognized. First, there is potential for database bias: although Scopus is a comprehensive database, some publications on MDWs may not be indexed in it, owing to variations in indexing practices across disciplines and regions. Second, the analysis included only articles written in English, which may have excluded significant research published in other languages and led to an incomplete representation of the global research landscape on MDWs. Third, the inclusion criteria were limited to journal research articles, excluding other potentially relevant document types such as editorials and conference abstracts, which may have resulted in the omission of important perspectives or findings. In addition, there may be temporal bias: only articles published up to December 31, 2023 were included, so more recent publications and emerging research developments were not captured. Furthermore, despite efforts to minimize bias through rigorous methodology, interpreting the data and identifying research hotspots involved subjective judgment, so different analysts could have reached somewhat different results. Lastly, the analysis revealed regional imbalances in research output, with certain geographic areas underrepresented, which may skew the overall understanding of MDW research trends and priorities.

Conclusions

In conclusion, the analysis reveals several significant findings regarding research on MDWs. First, research activity on MDWs has increased notably over the past decade, propelled by factors such as human rights advocacy and the feminization of migration. Second, while journals in the field of migration have been the primary outlets for MDW research, areas such as human rights and health are clearly underrepresented, suggesting avenues for further exploration. Third, despite the global scope of MDW research, certain regions, notably the Middle East, remain underrepresented despite hosting a significant population of MDWs, indicating the need for region-specific research to address context-specific challenges. Additionally, articles focusing on MDWs in specific countries, such as Singapore, have received relatively high citation counts, suggesting that localized research can generate considerable scholarly impact. However, despite the growing body of literature, significant research gaps persist, particularly concerning women’s health, family communication dynamics, and the experiences of MDWs in the Middle East. Addressing these gaps is crucial for informing policy and advocacy efforts aimed at improving the rights, well-being, and working conditions of MDWs globally. Future research should prioritize the physical and mental health challenges faced by female MDWs; the impact of migration on family dynamics, especially communication and relationships between MDWs and the family members they leave behind; and the Middle East, where MDWs are numerous yet underrepresented in research. Encouraging interdisciplinary collaboration among researchers in migration, health, law, and human rights can provide more comprehensive insights into the multifaceted challenges facing MDWs, ultimately informing evidence-based policies and interventions aimed at advancing the well-being and rights of this vulnerable migrant population.

Data availability

Not Applicable.

Abder-Rahman, H. A., Al-Soleiti, M., Habash, I. H., Al-Abdallat, I. M., & Al-Abdallat, L. I. (2021). Patterns of death among migrant domestic workers in Jordan: Retrospective analysis of 63 cases in a tertiary hospital. Egyptian Journal of Forensic Sciences, 11(1). https://doi.org/10.1186/s41935-021-00240-8

Akalin, A. (2007). Hired as a caregiver, demanded as a housewife: Becoming a migrant domestic worker in Turkey. European Journal of Women’s Studies, 14(3), 209–225.

Al Rifai, R., Nakamura, K., Seino, K., Kizuki, M., & Morita, A. (2015). Unsafe sexual behaviour in domestic and foreign migrant male workers in multinational workplaces in Jordan: Occupational-based and behavioural assessment survey. BMJ Open, 5(6), e007703. https://doi.org/10.1136/bmjopen-2015-007703

Amin, M. E. K. (2022). Addressing cultural competence and bias in treating migrant workers in pharmacies: Pharmacy students learning and changing norms. Research in Social and Administrative Pharmacy, 18(8), 3362–3368. https://doi.org/10.1016/j.sapharm.2021.11.012

Anderson, B. (2007). A very private business: Exploring the demand for migrant domestic workers. European Journal of Women’s Studies, 14(3), 247–264. https://doi.org/10.1177/1350506807079013

Aniche, E. T. (2020). Migration and sustainable development: Challenges and opportunities.

Anjara, S. G., Nellums, L. B., Bonetto, C., & Van Bortel, T. (2017). Stress, health and quality of life of female migrant domestic workers in Singapore: A cross-sectional study. BMC Women’s Health, 17(1), 98. https://doi.org/10.1186/s12905-017-0442-7

Asis, E., & Carandang, R. R. (2020). The plight of migrant care workers in Japan: A qualitative study of their stressors on caregiving. J Migr Health, 1–2, 100001. https://doi.org/10.1016/j.jmh.2020.100001

Ayalon, L., Halevy-Levin, S., Ben-Yizhak, Z., & Friedman, G. (2013). Family caregiving at the intersection of private care by migrant home care workers and public care by nursing staff. International Psychogeriatrics, 25(9), 1463–1473. https://doi.org/10.1017/s1041610213000628

Basnyat, I., & Chang, L. (2017). Examining live-in foreign domestic helpers as a coping resource for family caregivers of people with dementia in Singapore. Health Communication, 32(9), 1171–1179. https://doi.org/10.1080/10410236.2016.1220346

Bastia, T., & Piper, N. (2019). Women migrants in the global economy: A global overview (and regional perspectives). Gender & Development, 27(1), 15–30.

LSE International Development Blog (2019). Migrant domestic workers in the Middle East: Between state ignorance and obsolete laws. https://blogs.lse.ac.uk/internationaldevelopment/2019/07/04/migrant-domestic-workers-in-the-middle-east-between-state-ignorance-and-obsolete-laws/

Brigham, S. M. (2004). Women migrant workers in the global economy: The role of critical feminist pedagogy for Filipino domestic workers.

Cheng, C. M. C., & Choo, H. Y. (2015). Women’s migration for domestic work and cross-border marriage in East and Southeast Asia: Reproducing domesticity, contesting citizenship. Sociology Compass, 9(8), 654–667. https://doi.org/10.1111/soc4.12289

Chin, C. B. N. (1997). Walls of silence and late twentieth century representations of the foreign female domestic worker: The case of Filipina and Indonesian female servants in Malaysia. International Migration Review, 31(2), 353–385. https://doi.org/10.2307/2547224

Cortes, P. (2015). The feminization of international migration and its effects on the children left behind: Evidence from the Philippines. World Development, 65, 62–78. https://doi.org/10.1016/j.worlddev.2013.10.021

de Regt, M. (2010). Ways to come, ways to leave: Gender, mobility, and il/legality among Ethiopian domestic workers in Yemen. Gender and Society, 24(2), 237–260. https://doi.org/10.1177/0891243209360358

Demetriou, D. (2019). The mens rea of human trafficking: The case of migrant domestic workers. International Criminal Justice Review, 29(3), 262–283. https://doi.org/10.1177/1057567718788931

Douglas, P., Cetron, M., & Spiegel, P. (2019). Definitions matter: Migrants, immigrants, asylum seekers and refugees. Journal of Travel Medicine, 26(2). https://doi.org/10.1093/jtm/taz005

Dutta, M. J., Comer, S., Teo, D., Luk, P., Lee, M., Zapata, D., Krishnaswamy, A., & Kaur, S. (2018). Health meanings among foreign domestic workers in Singapore: A culture-centered approach. Health Communication, 33(5), 643–652. https://doi.org/10.1080/10410236.2017.1292576

Elsevier (2023). Scopus. Retrieved March 31 from https://www.elsevier.com/solutions/scopus

Fernandez, B. (2018). Health inequities faced by Ethiopian migrant domestic workers in Lebanon. Health & Place, 50, 154–161. https://doi.org/10.1016/j.healthplace.2018.01.008

Figueiredo, M. C., Suleman, F., & Botelho, M. C. (2018). Workplace abuse and harassment: The vulnerability of informal and migrant domestic workers in Portugal. Social Policy and Society, 17(1), 65–85. https://doi.org/10.1017/S1474746416000579

French, C., & Lam, Y. M. (1988). Migration and job satisfaction: A logistic regression analysis of satisfaction of Filipina domestic workers in Hong Kong. Social Indicators Research, 20(1), 79–90. https://doi.org/10.1007/BF00384219

Gabaccia, D. R. (2016). Feminization of migration. In The Wiley Blackwell Encyclopedia of Gender and Sexuality Studies (pp. 1–3). https://doi.org/10.1002/9781118663219.wbegss732

Gabriel, C., & Macdonald, L. (2014). ‘Domestic transnationalism’: Legal advocacy for Mexican migrant workers’ rights in Canada. Citizenship Studies, 18(3–4), 243–258. https://doi.org/10.1080/13621025.2014.905264

Gallotti, M., & Branch, I. (2015). Migrant domestic workers across the world: Global and regional estimates. International Labour Organization, Geneva, Switzerland.

Gamburd, M. R. (2000). The kitchen spoon’s handle: Transnationalism and Sri Lanka’s migrant housemaids. Cornell University Press.

Gao, H., & Wang, S. (2022). The intellectual structure of research on rural-to-urban migrants: A bibliometric analysis. International Journal of Environmental Research and Public Health, 19(15). https://doi.org/10.3390/ijerph19159729

Southeast Asia Globe (2022). Workers’ rights: Under-protected abroad, domestic workers find ways to resist. Retrieved April 01 from https://southeastasiaglobe.com/underprotected-abroad-domestic-workers/

Goh, C., Wee, K., & Yeoh, B. S. A. (2017). Migration governance and the migration industry in Asia: Moving domestic workers from Indonesia to Singapore. International Relations of the Asia-Pacific, 17(3), 401–433. https://doi.org/10.1093/irap/lcx010

Halisçelik, E., & Soytas, M. A. (2019). Sustainable development from millennium 2015 to Sustainable Development Goals 2030. Sustainable Development, 27(4), 545–572.

Hall, B. J., Garabiles, M. R., & Latkin, C. A. (2019a). Work life, relationship, and policy determinants of health and well-being among Filipino domestic workers in China: A qualitative study. BMC Public Health, 19(1), 229. https://doi.org/10.1186/s12889-019-6552-4

Hall, B. J., Pangan, C. A. C., Chan, E. W. W., & Huang, R. L. (2019c). The effect of discrimination on depression and anxiety symptoms and the buffering role of social capital among female domestic workers in Macao, China. Psychiatry Research, 271, 200–207. https://doi.org/10.1016/j.psychres.2018.11.050

Hargreaves, S., Rustage, K., Nellums, L. B., McAlpine, A., Pocock, N., Devakumar, D., Aldridge, R. W., Abubakar, I., Kristensen, K. L., Himmels, J. W., Friedland, J. S., & Zimmerman, C. (2019). Occupational health outcomes among international migrant workers: A systematic review and meta-analysis. Lancet Glob Health, 7(7), e872–e882. https://doi.org/10.1016/s2214-109x(19)30204-9

Henderson, S. (2021). The legal protection of women migrant domestic workers from the Philippines and Sri Lanka: An intersectional rights-based approach. International Journal of Care and Caring, 5(1), 65–83. https://doi.org/10.1332/239788220X15976836167721

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proc Natl Acad Sci U S A, 102(46), 16569–16572. https://doi.org/10.1073/pnas.0507655102

Holroyd, E. A., Molassiotis, A., & Taylor-Pilliae, R. E. (2001). Filipino domestic workers in Hong Kong: Health related behaviors, health locus of control and social support. Women and Health, 33(1–2), 181–205. https://doi.org/10.1300/J013v33n01_11

Hondagneu-Sotelo, P., & Avila, E. (1997). I’m here, but I’m there: The meanings of Latina transnational motherhood. Gender and Society, 11(5), 548–571. https://doi.org/10.1177/089124397011005003

Hondagneu-Sotelo, P., & Avila, E. (1997a). I’m here, but I’m there: The meanings of Latina transnational motherhood. Gender and Society, 11(5), 548–571. http://www.jstor.org/stable/190339

Huang, S., & Yeoh, B. S. A. (2007). Emotional labour and transnational domestic work: The moving geographies of ‘maid abuse’ in Singapore. Mobilities, 2(2), 195–217. https://doi.org/10.1080/17450100701381557

Hugo, G., & Ukwatta, S. (2010). Sri Lankan female domestic workers overseas — the impact on their children. Asian and Pacific Migration Journal, 19(2), 237–263. https://doi.org/10.1177/011719681001900203

Husni, L., & Suryani, A. (2018). Legal protection for woman domestic workers based on the international convention. Journal of Legal, Ethical and Regulatory Issues, 21(2). https://www.scopus.com/inward/record.uri?eid=2-s2.0-85050271491&partnerID=40&md5=c751d3461a396058b230ad7020ef8338

International Labour Organization (2022). Domestic workers. Retrieved April 01 from https://www.ilo.org/asia/areas/domestic-workers/lang--en/index.htm

International Labour Organization (ILO) (2022). Child labour and domestic work. Retrieved March 4 from https://www.ilo.org/ipec/areas/Childdomesticlabour/lang--en/index.htm

International Organization for Migration (2017). Report of the 2nd Global Consultation on Migrant Health: Resetting the Agenda. https://www.iom.int/sites/g/files/tmzbdl486/files/our_work/DMM/Migration-Health/GC2_SriLanka_Report_2017_FINAL_22.09.2017_Internet.pdf

Ito, R. (2016). Negotiating partial citizenship under neoliberalism: Regularization struggles among Filipino domestic workers in France (2008–2012). International Journal of Japanese Sociology, 25(1), 69–84. https://doi.org/10.1111/ijjs.12046

Jureidini, R. (2010). Trafficking and contract migrant workers in the Middle East. International Migration, 48(4), 142–163. https://doi.org/10.1111/j.1468-2435.2010.00614.x

Karachurina, L., Florinskaya, Y., & Prokhorova, A. (2019). Higher wages vs. social and legal insecurity: Migrant domestic workers in Russia and Kazakhstan. Journal of International Migration and Integration, 20(3), 639–658. https://doi.org/10.1007/s12134-018-0625-6

Kontos, M. (2013). Negotiating the social citizenship rights of migrant domestic workers: The right to family reunification and a family life in policies and debates. Journal of Ethnic and Migration Studies, 39(3), 409–424. https://doi.org/10.1080/1369183X.2013.733861

Lan, P. C. (2003). They have more money but I speak better English! Transnational encounters between Filipina domestics and Taiwanese employers. Identities, 10(2), 133–161. https://doi.org/10.1080/10702890304325

Lim, L. L., & Oishi, N. (1996). International labor migration of Asian women: Distinctive characteristics and policy concerns. Asian and Pacific Migration Journal, 5(1), 85–116. https://doi.org/10.1177/011719689600500105

Lutz, H., & Palenga-Möllenbeck, E. (2012). Care workers, care drain, and care chains: Reflections on care, migration, and citizenship. Social Politics, 19(1), 15–37. https://doi.org/10.1093/sp/jxr026

Malhotra, R., Arambepola, C., Tarun, S., de Silva, V., Kishore, J., & Østbye, T. (2013). Health issues of female foreign domestic workers: A systematic review of the scientific and gray literature. International Journal of Occupational and Environmental Health, 19(4), 261–277. https://doi.org/10.1179/2049396713y.0000000041

Malit, F. T., Jr., & Naufal, G. (2016). Asymmetric information under the Kafala sponsorship system: Impacts on foreign domestic workers’ income and employment status in the GCC countries. International Migration, 54(5), 76–90. https://doi.org/10.1111/imig.12269

Marchetti, S., Cherubini, D., & Geymonat, G. G. (2021). Global domestic workers. Bristol University Press.

Menjívar, C., & Salcido, O. (2002). Immigrant women and domestic violence: Common experiences in different countries. Gender & Society, 16(6), 898–920. https://doi.org/10.1177/089124302237894

Momsen, J. H. (2003). Gender, migration and domestic service. Routledge.

Page, M. J., Moher, D., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., & McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160

Pande, A. (2021). Feminization of Indian migration: Patterns and prospects. Journal of Asian and African Studies, 57(6), 1249–1266. https://doi.org/10.1177/00219096211049568

Parreñas, R. S. (2001). Mothering from a distance: Emotions, gender, and intergenerational relations in Filipino transnational families. Feminist Studies, 27(2), 361–390.

Parreñas, R. S., & Silvey, R. (2021). The governance of the Kafala system and the punitive control of migrant domestic workers. Population, Space and Place, 27(5), e2487. https://doi.org/10.1002/psp.2487

Peng, Y., & Wong, O. M. H. (2015). Who takes care of my left-behind children? Migrant mothers and caregivers in transnational child care. Journal of Family Issues, 37(14), 2021–2044. https://doi.org/10.1177/0192513X15578006

Pineros-Leano, M., Yao, L., Yousuf, A., & Oliveira, G. (2021). Depressive symptoms and emotional distress of transnational mothers: A scoping review. Frontiers in Psychiatry, 12, 574100. https://doi.org/10.3389/fpsyt.2021.574100

Platt, M., Yeoh, B. S. A., Acedera, K. A., Yen, K. C., Baey, G., & Lam, T. (2016). Renegotiating migration experiences: Indonesian domestic workers in Singapore and use of information communication technologies. New Media and Society, 18(10), 2207–2223. https://doi.org/10.1177/1461444816655614

Pratt, G. (1999). From registered nurse to registered nanny: Discursive geographies of Filipina domestic workers in Vancouver, B.C. Economic Geography, 75(3), 215–236. https://doi.org/10.1111/j.1944-8287.1999.tb00077.x

Qu, X., Wang, X., Huang, X., Ashish, K. C., Yang, Y., Huang, Y., Chen, C., Gao, Y., Wang, Y., & Zhou, H. (2020). Socio-emotional challenges and development of children left behind by migrant mothers. J Glob Health, 10(1), 010806. https://doi.org/10.7189/jogh.10.010806

Raijman, R., Schammah-Gesser, S., & Kemp, A. (2003). International migration, domestic work, and care work: Undocumented Latina migrants in Israel. Gender and Society, 17(5), 727–749. http://www.jstor.org/stable/3594707

Rak, P. (2021). Modern day slavery: The Kafala system in Lebanon. Harvard International Review, 42(1), 57–61.

Regmi, P. R., Aryal, N., van Teijlingen, E., Simkhada, P., & Adhikary, P. (2020). Nepali migrant workers and the need for pre-departure training on mental health: A qualitative study. Journal of Immigrant and Minority Health, 22(5), 973–981. https://doi.org/10.1007/s10903-019-00960-z

Rother, S. (2017). Indonesian migrant domestic workers in transnational political spaces: Agency, gender roles and social class formation. Journal of Ethnic and Migration Studies, 43(6), 956–973. https://doi.org/10.1080/1369183X.2016.1274567

Salazar Parreñas, R. (2000). Migrant Filipina domestic workers and the international division of reproductive labor. Gender and Society, 14(4), 560–580. https://doi.org/10.1177/089124300014004005

Shah, N. M., Shah, M. A., Chowdhury, R. I., & Menon, I. (2002). Foreign domestic workers in Kuwait: Who employs how many. Asian and Pacific Migration Journal, 11(2), 247–269. https://doi.org/10.1177/011719680201100204

Shah, N., Badr, H., & Shah, M. (2012). Foreign live-in domestic workers as caretakers of older Kuwaiti men and women: Socio-demographic and health correlates. Ageing and Society, 32(6), 1008–1029. https://doi.org/10.1017/S0144686X11000778

Shewamene, Z., Zimmerman, C., Hailu, E., Negeri, L., Erulkar, A., Anderson, E., Lo, Y., Jackson, O., & Busza, J. (2022). Migrant women’s health and safety: Why do Ethiopian women choose irregular migration to the Middle East for domestic work? International Journal of Environmental Research and Public Health, 19(20). https://doi.org/10.3390/ijerph192013085

Silvey, R. (2004). Transnational migration and the gender politics of scale: Indonesian domestic workers in Saudi Arabia. Singapore Journal of Tropical Geography, 25(2), 141–155. https://doi.org/10.1111/j.0129-7619.2004.00179.x

Silvey, R. (2006). Consuming the transnational family: Indonesian migrant domestic workers to Saudi Arabia. Global Networks, 6(1), 23–40. https://doi.org/10.1111/j.1471-0374.2006.00131.x

Siruno, L., Swerts, T., & Leerkes, A. (2022). Personal recognition strategies of undocumented migrant domestic workers in the Netherlands. Journal of Immigrant & Refugee Studies, 1–14. https://doi.org/10.1080/15562948.2022.2077503

Stasiulis, D., & Bakan, A. B. (1997). Negotiating citizenship: The case of foreign domestic workers in Canada. Feminist Review, 57(1), 112–139. https://doi.org/10.1080/014177897339687

Statista (2022). Number of migrant domestic workers employed in Singapore from 2013 to 2021. Retrieved March 25 from https://www.statista.com/statistics/953137/singapore-foreign-domestic-workers-employed/

Strangio, S. (2022). ‘Just a maid’: Report highlights emotional abuse of migrant domestic workers in Singapore. The Diplomat. https://thediplomat.com/2022/06/just-a-maid-report-highlights-emotional-abuse-of-migrant-domestic-workers-in-singapore/

Sweileh, W. M. (2018). Global research output in the health of international Arab migrants (1988–2017). BMC Public Health, 18(1), 755. https://doi.org/10.1186/s12889-018-5690-4

Sweileh, W. M., Wickramage, K., Pottie, K., Hui, C., Roberts, B., Sawalha, A. F., & Zyoud, S. H. (2018). Bibliometric analysis of global migration health research in peer-reviewed literature (2000–2016). BMC Public Health, 18(1), 777. https://doi.org/10.1186/s12889-018-5689-x

Tan, E. K. (2010). Managing female foreign domestic workers in Singapore: Economic pragmatism, coercive legal regulation, or human rights? Israel Law Review, 43(1), 99–125. https://doi.org/10.1017/S0021223700000066

Tay, M., & Kong, K. H. (2020). Caregiver burden in familial caregivers and foreign domestic workers of patients with traumatic brain injury in a multi-ethnic Asian population. Brain Inj, 34(11), 1513–1517. https://doi.org/10.1080/02699052.2020.1809709

UN Women (2016). Infographic: Migrant domestic workers - Facts everyone should know. Retrieved March 5 from https://www.unwomen.org/en/digital-library/multimedia/2016/9/infographic-migrant-domestic-workers

van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538. https://doi.org/10.1007/s11192-009-0146-3

Wickramage, K., De Silva, M., & Peiris, S. (2017). Patterns of abuse amongst Sri Lankan women returning home after working as domestic maids in the Middle East: An exploratory study of medico-legal referrals. Journal of Forensic and Legal Medicine, 45, 1–6. https://doi.org/10.1016/j.jflm.2016.11.001

Yeoh, B. S. A., & Huang, S. (1998). Negotiating public space: Strategies and styles of migrant female domestic workers in Singapore. Urban Studies, 35(3), 583–602. https://doi.org/10.1080/0042098984925

Yeoh, B. S. A., & Huang, S. (1999). Migrant female domestic workers: Debating the economic, social and political impacts in Singapore. International Migration Review, 33(1), 114–136. https://doi.org/10.2307/2547324

Yeoh, B. S. A., & Huang, S. (2000). Home and away: Foreign domestic workers and negotiations of diasporic identity in Singapore. Women’s Studies International Forum, 23(4), 413–429. https://doi.org/10.1016/S0277-5395(00)00105-9

Yi, G., Liu, L., Manio, M., Latkin, C., & Hall, B. J. (2020). The influence of housing on sexual and reproductive health status and service utilization among Filipina migrant domestic workers in Macao (SAR), China: A population survey. J Migr Health, 1–2, 100007. https://doi.org/10.1016/j.jmh.2020.100007

Zahreddine, N., Hady, R. T., Chammai, R., Kazour, F., Hachem, D., & Richa, S. (2014). Psychiatric morbidity, phenomenology and management in hospitalized female foreign domestic workers in Lebanon. Community Mental Health Journal, 50(5), 619–628. https://doi.org/10.1007/s10597-013-9682-7

Acknowledgements

The author would like to thank S.Z. and A.A. for helping to validate the search strategy.

Author information

Authors and affiliations

Department of Physiology, Pharmacology/Toxicology, Division of Biomedical Sciences, College of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine

Waleed M. Sweileh

Contributions

W.S. conceived the idea, performed the analysis, and wrote and submitted the manuscript.

Corresponding author

Correspondence to Waleed M. Sweileh .

Ethics declarations

Competing interests

The author declares that he has no financial or non-financial competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Sweileh, W.M. Analysis and mapping of global research publications on migrant domestic workers. CMS 12, 38 (2024). https://doi.org/10.1186/s40878-024-00401-3

Received : 12 April 2023

Accepted : 09 September 2024

Published : 17 September 2024

DOI : https://doi.org/10.1186/s40878-024-00401-3

  • Migrant domestic workers
  • Research analysis

RT 325: Radiation Therapy Readings, Writing, and Research, Weege, FA 2024

  • Journals & Databases

Use this shortcut to the Journals & Newspapers Locator in Search@UW to find Murphy Library holdings of periodicals (journals, magazines, newsletters), newspapers, annual reports and yearbooks.

Use this shortcut to the Murphy Library Catalog to check if the library provides access to a specific periodical title (whether online, in print, or on microform), and if so, what issues are available and how to access them. 

(Note that this is not a tool to search for journal articles!)

Browse Journals

Access funded by the University of Wisconsin System

The following are some of the core journal titles in radiation therapy. View Section 1.2 of the AMA guide for a list of more journal titles.

  • International Journal of Radiation Oncology, Biology, Physics
  • Medical Dosimetry
  • Medical Physics
  • Practical Radiation Oncology
  • Radiation Therapist
  • Radiologic Technology

On the Health Professions library guide, you can find a page listing DOS Core Journals .

  • AMA Guide - UWL Radiation Therapy Program - 2024: View Section 1.2 (Researching) of the document to learn the specific instructions your program expects you to follow for differentiating and selecting sources.
  • Advanced Search Tips - from Brown University Library: Visit this link to learn search techniques and advanced database functions.
  • Types of Research - from Kent Library at Southeast Missouri State University: Visit this link to read general definitions of qualitative vs. quantitative research and to find videos explaining them further.
  • Types of Studies in Medicine - from NIH NCBI: Visit this page from the National Center for Biotechnology Information to read a brief overview of types of scientific experiments and analyses.
  • What is a Scholarly Source? - from UWGB Libraries: Visit this link to learn what scholarly sources are, how to distinguish them from non-scholarly sources, and why peer review matters.
  • Last Updated: Sep 26, 2024 10:33 AM
  • URL: https://libguides.uwlax.edu/rt325

Published: 05 August 2024

Rare and highly destructive wildfires drive human migration in the U.S.

  • Kathryn McConnell (ORCID: orcid.org/0000-0003-4395-5483; affiliations 1, 2)
  • Elizabeth Fussell (ORCID: orcid.org/0000-0003-2812-7719; affiliations 1, 3)
  • Jack DeWaard (ORCID: orcid.org/0000-0002-9436-3069; affiliations 4, 5)
  • Stephan Whitaker (ORCID: orcid.org/0000-0002-8057-9184; affiliation 6)
  • Katherine J. Curtis (ORCID: orcid.org/0000-0002-7003-7381; affiliation 7)
  • Lise St. Denis (ORCID: orcid.org/0000-0001-7010-9632; affiliation 8)
  • Jennifer Balch (ORCID: orcid.org/0000-0002-3983-7970; affiliation 8)
  • Kobie Price (affiliation 9)

Nature Communications volume 15, Article number: 6631 (2024)

  • Climate-change impacts
  • Environmental impact

The scale of wildfire impacts to the built environment is growing and will likely continue under rising average global temperatures. We investigate whether and at what destruction threshold wildfires have influenced human mobility patterns by examining the migration effects of the most destructive wildfires in the contiguous U.S. between 1999 and 2020. We find that only the most extreme wildfires (258+ structures destroyed) influenced migration patterns. In contrast, the majority of wildfires examined were less destructive and did not cause significant changes to out- or in-migration. These findings suggest that, for the past two decades, the influence of wildfire on population mobility was rare and operated primarily through destruction of the built environment.

Introduction

In recent decades, wildfire destruction of the built environment has increased dramatically, posing a mounting threat to human settlements across the U.S. 1 , 2 . This trend is driven in part by changes in wildfire patterns, with records showing increases in total acres burned, the number of large fires, and the length of the fire weather season 3 , 4 , 5 , 6 . Models project that, under climate change, the potential for very large fires will increase in the coming decades 7 . Concurrent with the rise in wildfire frequency and severity, the number of people living in high fire risk regions has increased, with substantial population and housing growth in areas close to or intermixed with wildlands 2 , 8 . Consequently, an increasing number of dwellings and their residents are exposed to wildfires.

The growing scale of wildfire destruction to buildings has the potential to impact human mobility patterns, yet little is known about the relationship between wildfire destruction and human migration. Wildfire-related mobility is notably absent in systematic reviews of environmental migration literature 9 , 10 , 11 , 12 , a gap highlighted by the Intergovernmental Panel on Climate Change 13 .

Previous studies of other environmental hazards indicate that climate-migration relationships vary widely in their direction and magnitude, differing between hazard types, as well as by the geographic, social, and economic contexts of affected populations 9 , 11 , 12 , 13 . While in some studies, weather and climate extremes are associated with heightened out-migration, in others, hazards cause minimal impact and relative immobility 14 , 15 , 16 . Given this heterogeneity of hazard-mobility relationships, researchers working in this field do not expect consistent or simple “push” effects, in which residents necessarily move away from hazardous places 9 . Instead, environmental migration scholarship investigates a range of different migration and non-migration responses to environmental change, with special attention to distinct hazards and the thresholds at which migratory effects occur 11 , 12 .

Within existing environmental migration research, the most relevant studies for comparison to wildfire are those that examine sudden-onset hazards, such as floods, tsunamis, and hurricanes. Whereas slow-onset environmental changes such as drought or precipitation anomalies tend to have stronger migratory effects, sudden-onset events are more often found to have null or even negative effects on migration. Prior research has suggested that this relative immobility results from financial liquidity constraints: the event destroys household wealth, leaving less money available to move 10 , 11 , 12 , 17 . Some describe those experiencing this form of involuntary immobility as “trapped populations” 18 . While certain households and populations may have more limited capability to migrate, others may be able to move but prefer to remain in place. Such voluntary immobility in the face of intensifying environmental hazards can be due to a range of factors, such as the strength of place-embedded social and economic networks, the draw of local environmental amenities, and residents’ ability to mitigate localized hazard exposure 19 , 20 , 21 .

In the context of wildfire, immobility dynamics may play out in a number of ways. Recent research has linked rising housing costs in urban cores of California to the expansion of less costly housing development in exurban and rural wildfire-prone places 22 . This trend suggests that some residents of fire-prone places may have limited ability to move away from hazards due to regional housing affordability constraints. Other researchers have emphasized the pull of environmental amenities, which draws residents to live voluntarily in fire-prone places 8 , 20 . This research indicates that immobility may be a prevalent response to wildfire, and one that deserves as much attention as environmentally linked mobility 14 .

While immobility is often documented in response to sudden-onset hazards, select studies of very extreme events, such as Hurricane Katrina in the U.S. Gulf Coast, Hurricane Maria in Puerto Rico, and the Indian Ocean Tsunami in Indonesia, have also illustrated clear patterns of heightened post-disaster out-migration, or displacement 23 , 24 , 25 , 26 . Collectively, these findings illustrate that sudden-onset environmental shocks may cause a continuum of migratory effects, ranging from immobility to large-scale out-migration. Such variability speaks to the importance of investigating the impacts of hazards across a spectrum of severity levels, from the most extreme events to less severe but more common hazards.

Scholars have recently begun studying wildfire-related mobility, for instance through investigation of migration intentions related to wildfire and wildfire smoke 27 , 28 and household decisions to remain in place after wildfire 29 . Several quantitative studies have documented patterns of temporary evacuation after major wildfires and long-term migration following subsets of disaster-level fires. These studies report heterogeneous effects across different events, in some cases documenting minimal changes to migration patterns, while in others showing heightened out-migration and reduced in-migration 20 , 30 , 31 , 32 .

To provide greater insight and more generalizable knowledge of wildfire-mobility dynamics, we investigate patterns of out- and in-migration following highly destructive wildfires that occurred in the contiguous U.S. over more than two decades. Building on Hoffman et al.’s distinction between direct and indirect environmental migration drivers 12 , our study tests two hypotheses on the relationship between wildfires and human mobility, positing that wildfires influence migration patterns through two broad pathways: (1) through direct damage to the built environment, and (2) through indirect mechanisms other than impacts to the built environment.

In the first proposed pathway, wildfires drive migration through their effects on the built environment, whereby destroyed structures result in out-migration via housing and other infrastructure loss. We define “structures” broadly to include residential, commercial, outbuilding, and mixed-use buildings 33 . We interpret heightened out-migration following highly destructive wildfires as evidence of damage-driven migration effects, akin to hazard-driven displacement. Our data show that the number of structures destroyed per wildfire has a long right skew, in which a small number of fires caused an outsized proportion of damage 33 . Given this distribution, we anticipate that wildfire effects on migration via the built environment are likely non-linear: as the number of structures destroyed grows, the number of out-migrants will increase at an increasing rate, because local areas become unable to accommodate residents whose homes were destroyed. Such non-linear effects have been documented in the cases of extreme temperature variations 34 and rainfall 17 , among others. Thus, our first hypothesis is that damage-driven out-migration will be greatest in areas experiencing high levels of fire-related destruction in the event period.

In the second proposed pathway of wildfire-driven mobility, we hypothesize that wildfire may influence migration indirectly through a range of other mechanisms that are distinct from direct displacement via structure loss. These indirect mechanisms can be broadly characterized as changes in residential preferences of where to live and/or residents’ capabilities to realize these preferences 12 , 21 , 35 . For instance, residential preferences may be influenced by wildfire-related changes to natural amenities, air quality, local economic conditions, and perceptions of future fire risk and potential losses. Residents’ mobility capabilities may also change, for instance through impacts to household finances or reduced access to homeowner’s insurance 19 . While we are unable to parse individuals’ migration motives with our data, we broadly test for the presence of indirect wildfire effects by examining migratory responses to wildfires in places experiencing lesser impacts to the built environment and at later time periods relative to the event.

If indirect mechanisms are driving wildfire-related migration, we would expect to observe the following changes. First, out-migration will increase in areas that experience lower levels of wildfire destruction, in particular following events in which too few structures are destroyed to directly displace a large number of residents. Second, out-migration will be elevated in burned areas during the temporal period beyond the disaster event, for example several years afterward. In this period, structure loss is unlikely to be the motivation, but other changes caused by the fire may influence residential preferences and/or capabilities, resulting in migration. Third, in-migration to fire-affected places will decline as potential in-migrants seek to avoid the fire-affected destination. Any of these changes would support the hypothesis that indirect mechanisms based on changes in residential preferences and/or mobility capabilities drive wildfire-related migration.

The null hypothesis to the direct and indirect hypotheses of wildfire-induced mobility is, conversely, immobility: migration flows into and out of fire-affected areas will not change in response to wildfire events. Immobility is informed by residents’ desires to remain in place or to move, their capabilities to realize those aspirations 14 , 16 , 19 , and protections, such as home hardening or firefighting resources, that allow people to remain in place. As such, immobility in the face of wildfire destruction could reflect voluntary immobility, arising from residents’ desire to keep living in fire-prone places, but it may also reflect certain populations being involuntarily “trapped,” that is, lacking sufficient capability or resources to move away 16 . Observing immobility would be in line with prior studies of comparable sudden-onset hazards 11 , 12 , 17 , and would correspond with the expectation of housing affordability constraints on out-migration 22 as well as environmental amenity pulls to remain in fire-affected places 8 , 20 . The null hypothesis, therefore, is that no change in out- or in-migration will be observed in wildfire-affected areas relative to neighboring, unaffected areas.

In this work, we analyze the migration effects of the top 10% most destructive wildfires in the contiguous U.S. ( N  = 519) between 1999 and 2020. We construct a temporally and spatially harmonized dataset that combines data on wildfire-related structure loss with data on migration at the census tract scale, the spatial unit most comparable to neighborhoods 36 . The structure loss data are from the U.S. National Incident Command System/Incident Status Summary Forms (hereafter “ICS”) 33 , linked to two sets of wildfire spatial burn footprints 37 , 38 . Together, these data constitute one of the most comprehensive and detailed sources of information on wildfires and their impacts within the U.S. The migration data are based on the Federal Reserve Bank of New York/Equifax Consumer Credit Panel (CCP) and estimate the number of credit-visible residents whose address changes tracts between two adjacent quarters. The CCP is an anonymous random sample from the Equifax credit files which can be used to calculate quarterly estimates of the probability of in-migration to and out-migration from census tracts 39 , 40 . These data allow us to investigate the effects of wildfire destruction on human migration, stratified by level of fire severity.
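To make the quarterly migration probabilities concrete, the sketch below computes tract-level out- and in-migration probabilities from a panel that records each person's tract in two adjacent quarters. It is a minimal illustration under stated assumptions, not the CCP estimator of refs 39 and 40: the `(person_id, quarter, tract)` record layout and the function name are hypothetical, the out-migration denominator is assumed to be residents of the tract at the start quarter and the in-migration denominator residents at the end quarter, and sampling weights and credit-visibility adjustments are ignored.

```python
from collections import defaultdict


def quarterly_migration_probabilities(records, q0, q1):
    """Tract-level out- and in-migration probabilities between adjacent quarters q0 and q1.

    records: iterable of (person_id, quarter, tract) tuples. This layout is a
    hypothetical stand-in for a person-quarter address panel, not the CCP schema.
    """
    location = defaultdict(dict)  # person_id -> {quarter: tract}
    for person_id, quarter, tract in records:
        location[person_id][quarter] = tract

    residents_start = defaultdict(int)  # people in each tract at q0 (also observed at q1)
    residents_end = defaultdict(int)    # people in each tract at q1 (also observed at q0)
    out_movers = defaultdict(int)
    in_movers = defaultdict(int)

    for quarters in location.values():
        if q0 not in quarters or q1 not in quarters:
            continue  # skip people not observed in both quarters
        origin, destination = quarters[q0], quarters[q1]
        residents_start[origin] += 1
        residents_end[destination] += 1
        if origin != destination:  # the address changed tracts between quarters
            out_movers[origin] += 1
            in_movers[destination] += 1

    # Assumed denominators: start-quarter residents (out), end-quarter residents (in).
    out_prob = {t: out_movers[t] / n for t, n in residents_start.items()}
    in_prob = {t: in_movers[t] / n for t, n in residents_end.items()}
    return out_prob, in_prob


# Invented toy panel: person "a" moves from tract T1 to tract T2 between quarters 1 and 2.
panel = [
    ("a", 1, "T1"), ("a", 2, "T2"),
    ("b", 1, "T1"), ("b", 2, "T1"),
    ("c", 1, "T2"), ("c", 2, "T2"),
    ("d", 1, "T1"), ("d", 2, "T1"),
]
out_p, in_p = quarterly_migration_probabilities(panel, 1, 2)
print(out_p, in_p)
```

In this toy panel, one of three T1 residents leaves, so T1's out-migration probability is 1/3, and one of two end-quarter T2 residents is a newcomer, so T2's in-migration probability is 0.5.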

Extreme, outlying wildfire events drive the majority of structure loss

The majority of wildfires ignited between 1999 and 2020 caused no damage to the built environment ( N  = 29,216, 84.4% of all incidents), with a relatively small proportion destroying one or more structures ( N  = 5406, 15.6%) (Fig.  1c ). Within this subset of “destructive wildfires,” levels of destruction were non-linear across events; a small number of wildfires destroyed a disproportionately large portion of all structures (Fig.  1a ). During the period examined, the top ten most destructive fires caused 39.5% of all wildfire-related structure loss, and the single largest event, the 2018 Camp Fire, was responsible for 17.2% of all wildfire structure loss over more than two decades. We focus our analysis on the top decile of most destructive wildfires, further stratifying this subset of events by severity (Fig.  1b ).
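The concentration statistics reported above (the share of incidents with no damage, the share of all loss caused by the largest few fires, and the cutoff for the top decile of destructive fires) can be computed with a few lines of code. The sketch below is illustrative only: the function name and the counts are invented, not drawn from the ICS data, and it assumes the top decile is the ceiling of 10% of destructive fires ranked by structures destroyed, with ties broken by sort order.

```python
def loss_concentration(structures_destroyed, top_k=10):
    """Summarize how concentrated structure loss is across fire incidents.

    structures_destroyed: one count per incident (zero means no damage).
    Hypothetical helper; the decile rule below is an assumption.
    """
    n_all = len(structures_destroyed)
    # Destructive fires only, largest loss first.
    destructive = sorted((s for s in structures_destroyed if s > 0), reverse=True)
    total_loss = sum(destructive)
    # ceil(10% of destructive fires) using integer math to avoid float rounding.
    n_decile = (len(destructive) + 9) // 10
    return {
        "share_no_damage": (n_all - len(destructive)) / n_all,
        "share_top_k": sum(destructive[:top_k]) / total_loss,
        "top_decile_size": n_decile,
        # Smallest loss that still falls inside the top decile.
        "top_decile_min_loss": destructive[n_decile - 1],
    }


# Invented counts for 20 incidents (not the ICS data).
fires = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 1000]
summary = loss_concentration(fires, top_k=1)
print(summary)
```

Even in this small invented example, one fire accounts for 1000 of 1609 destroyed structures, mirroring the long right skew described in the text.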

Figure 1. a Boxplots show the distribution of annual structure damage per destructive wildfire among all wildfires reported by the ICS dataset in the U.S. between 1999 and 2020 that destroyed 1 or more structures ( N  = 5406). The left whisker extends from the minimum value to the 25th percentile, the right whisker extends from the 75th percentile to the maximum value, the left and right sides of the box indicate the 25th and 75th percentiles, and the line within the box indicates the median. Red dots indicate extreme events that destroyed more structures than the maximum marked by the right whisker. The dotted blue line indicates the global median of structure damage across all years (2 structures destroyed). While the majority of destructive wildfires affected a relatively small number of structures (90% destroyed fewer than 14), a small number of events contributed an outsized share of the total number of structures destroyed. b Probability distribution of structures destroyed per wildfire event among the top decile of most destructive wildfires with spatial details in the ICS dataset in the contiguous U.S. from 1999 to 2020 ( N  = 529). Within this top decile (wildfires that destroyed between 14 and 18,804 structures), the count of structures destroyed per event is highly right-skewed. The panel shows how we stratified events for subsequent analysis into the less destructive portion of the decile distribution (green line), the more destructive portion of the decile distribution (gold line), and the single most destructive event in the distribution, the Camp Fire (red point). We also analyzed the full decile of events (blue line). c Geographic distribution of wildfires with destruction levels and points of origin reported in the ICS dataset from 1999 to 2020 in the contiguous U.S. ( N  = 32,296). Each point on the map represents a wildfire point of origin, colored by the level of structure loss caused by the fire: blue dots indicate fires that caused no structure loss; yellow dots indicate the majority of destructive wildfires (90%), which destroyed 1–13 structures; and red dots indicate the most destructive wildfires (top 10%), which destroyed between 14 and 18,804 structures. We focused our analysis on the latter group, analyzing only the most destructive wildfires. Sources: Wildfire data are from the U.S. National Incident Management System/Incident Command System 33 and state boundaries are from the U.S. Census Bureau.

Wildfire structure loss drives increased out-migration only at highest severity levels

Our results indicate that, in the rare cases in which wildfires influenced migration, they did so through our first hypothesized pathway: direct impacts to the built environment (Table  1 , Fig.  2 ). Wildfires were only associated with heightened out-migration in tracts that experienced the highest levels of structure loss, indicating that wildfire effects on migration were non-linear and only observed beyond a certain destruction severity threshold. Furthermore, migratory effects were primarily constrained to the first year following the event, and, in most cases, did not extend beyond this initial time period.

Figure 2. a Evolving, unweighted out-migration probabilities (left) and in-migration probabilities (right) among three subsets of destructive wildfires: (1) the full top decile distribution (14–18,804 structures destroyed, N = 519 wildfires), (2) the less destructive portion of the top decile (14–257 structures destroyed, N  = 463 wildfires), and (3) the more destructive portion of the top decile (258–7010 structures destroyed, N = 55 wildfires). Control tract migration probabilities are shown in blue, purple, and green. The vertical dashed line indicates the quarter in which the wildfire occurred. b Evolving, unweighted out-migration probabilities (left) and in-migration probabilities (right) before and after the 2018 Camp Fire. Control tract migration probabilities are shown in blue, purple, and green. The vertical dashed line indicates the quarter in which the wildfire occurred. c For each wildfire event, we selected three rings of control tracts around each cluster of burned tracts (shown in red). The panel shows control selection for the 2000 Cerro Grande Fire in New Mexico. The buffer from the outer edge of burned tracts to 5 miles away is shown in blue, the buffer from 5 to 25 miles in purple, and the buffer from 25 to 50 miles away from the edge of treated tracts in green (left). These buffers are then intersected with spatially overlapping tracts (right). Sources: Migration data are from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel, wildfire data are from the U.S. National Incident Management System/Incident Command System 33 , and tract boundaries are from the U.S. Census Bureau.

Our analysis presents results for the full decile of most destructive wildfires ( N  = 519), as well as for three subsets of these events stratified by destruction severity: the less destructive portion of the decile ( N  = 463), the more destructive portion of the decile ( N  = 55), and the most destructive event, the Camp Fire ( N  = 1). Each subset of wildfires is presented in its own row in Table  1 . To address the potential for spatial spillover and ensure the robustness of our findings, we report regression coefficients derived from comparison to three distinct sets of control groups. Each set of controls was selected from a different distance away from the burned tracts (0–5 miles, 5–25 miles, and 25–50 miles, shown in Fig.  2c ) to reflect both heterogeneity in and uncertainty about the extent of spatial spillover.
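The stratification and control-ring scheme described above can be encoded as two small lookup functions. This is a hypothetical sketch, not the authors' code: the function names are invented, each tract is assumed to carry a precomputed distance in miles from the outer edge of the burned cluster (a simplification of the buffer-and-intersect procedure in Fig. 2c), each tract is assigned to the innermost ring it reaches, and the handling of boundary values (exactly 5 or 25 miles) is an assumption. The "Camp Fire" label works only because that fire is the sole event above 7010 structures in this dataset.

```python
def control_ring(distance_miles):
    """Assign an unburned tract to a control set by distance from the burned cluster's edge.

    Assumes burned (treated) tracts were removed beforehand, so a distance of 0
    means an adjacent control tract. Boundary handling is an assumption.
    """
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    if distance_miles <= 5:
        return "5-mile"
    if distance_miles <= 25:
        return "25-mile"
    if distance_miles <= 50:
        return "50-mile"
    return None  # beyond 50 miles: not used as a control


def severity_stratum(structures_destroyed):
    """Map a fire's structure loss to the analysis subsets used in the text."""
    if structures_destroyed < 14:
        return None  # outside the top decile of destructive fires
    if structures_destroyed <= 257:
        return "less destructive"
    if structures_destroyed <= 7010:
        return "more destructive"
    return "Camp Fire"  # only event above 7010 in this dataset


print(control_ring(3.2), control_ring(12.0), control_ring(40.0), control_ring(75.0))
print(severity_stratum(20), severity_stratum(1500), severity_stratum(18804))
```

Keeping the three rings as separate labels, rather than pooling all tracts within 50 miles, is what allows the same estimate to be repeated against each control set as a robustness check.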

When analyzing the full top decile of destructive wildfires (between 14 and 18,804 structures destroyed per event), we observed significant and positive out-migration effects during the event quarter when using the 5-mile and 50-mile control sets (Table  1 , columns A and C). This migratory effect became larger in magnitude during the first year after the event, with estimates ranging from 0.0044 when using the 50-mile control set to 0.0048 when using the 5-mile control set (Table  1 , columns A–C). Put differently, burned tracts experienced 4–5 additional movers per thousand residents, on average, in the year after the fire compared to unburned tracts.
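The per-thousand interpretation follows from multiplying a difference in migration probability by 1,000 (0.0044 × 1000 = 4.4 and 0.0048 × 1000 = 4.8 movers per thousand residents). The sketch below pairs that conversion with a simplified two-by-two difference-in-differences on mean quarterly probabilities, repeated for each control ring. It is a deliberately reduced stand-in for the paper's regression models, whose full specification (event-quarter and post-event-year interaction terms and any additional controls) is not reproduced here; the function names and all probabilities are invented.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on mean quarterly migration probabilities.

    Simplified illustration only; not the regression specification used in the paper.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def per_thousand(prob_difference):
    """Convert a difference in migration probability to movers per 1,000 residents."""
    return 1000 * prob_difference


# Invented quarterly out-migration probabilities (not CCP estimates).
treated_pre, treated_post = [0.030, 0.032], [0.036, 0.038]
control_sets = {
    "5-mile": ([0.031, 0.031], [0.032, 0.032]),
    "25-mile": ([0.030, 0.030], [0.0305, 0.0305]),
    "50-mile": ([0.029, 0.029], [0.029, 0.029]),
}
estimates = {
    ring: did_estimate(treated_pre, treated_post, pre, post)
    for ring, (pre, post) in control_sets.items()
}
for ring, est in estimates.items():
    print(ring, round(est, 4), round(per_thousand(est), 1))
```

Because the control trend is subtracted separately for each ring, a ring whose tracts absorb spillover from the burned area will show a smaller estimate than more distant rings, which is the pattern the text later discusses for Camp Fire in-migration.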

We subsequently analyzed different components of the full decile and observed that wildfires in the less destructive portion of the decile (between 14 and 257 structures destroyed) caused almost no significant changes to out-migration probability. There was only a slight decrease in out-migration probability in the event quarter when using the 25-mile control set (Table  1 , column B); however, this effect was not evident with either alternative control set. Because these less destructive events produced no migratory effects in either the disaster or the post-disaster period, we did not observe population-level migration changes that would indicate that wildfires, absent high levels of structure loss, spurred changes in residential preferences or capabilities and, in turn, migration decisions.

When we next examined wildfires in the more destructive portion of the decile (between 258 and 7010 structures destroyed), a migratory effect associated with structure loss was clearly evident. Among these events, there were four additional out-migrants per thousand residents, on average, during the event quarter; however, this effect was only observed when using the 50-mile control set (Table  1 , column C). The effect was more pronounced in the first year following the event, when we observed five and four additional out-migrants per thousand residents using the 5-mile and 50-mile control sets, respectively (Table  1 , columns A and C). We did not observe any significant differences in out-migration in the second year following the event, indicating that, during this period, migration trends returned to a trajectory similar to that of neighboring control tracts.

Turning to the fourth subset, the single most destructive fire, we found that the out-migration effect of the Camp Fire (18,804 structures destroyed) was larger in magnitude and longer in duration than that of any other subset of destructive wildfires. This suggests that both migration driven directly by structure loss and indirect wildfire-related migration occurred. During the event quarter, models indicate that burned tracts experienced between 53 and 69 additional out-migrants per thousand residents compared to unburned control tracts (Table  1 , columns A–C). This substantial increase in out-migration immediately following the event indicates that the large scale of the Camp Fire’s destruction led to initial displacement through structure loss.

Following the event period, the migratory effect grew in magnitude during the first post-fire year, when burned tracts experienced between 68 and 83 additional out-migrants per thousand residents per quarter compared to unburned control tracts. This translates to a more than threefold increase in out-migration probability among burned tracts from the two years prior to the fire to the first year following the event quarter. Compared to the more destructive portion of the top decile (between 258 and 7010 structures destroyed), the migratory effect of the Camp Fire during the first year after the event was between 14 and 20 times as large. Unlike any other subset of destructive wildfires, models indicate that the Camp Fire’s out-migration effect was still significant in the second year after the event. Burned tracts experienced between 19 and 26 additional out-migrants per thousand residents when using the 5- and 25-mile control sets, respectively (Table  1 , columns A and C). This elevated out-migration in the two full years following the Camp Fire provides evidence supporting our hypothesis of indirect wildfire effects on migration, which we theorize are driven by changing residential preferences and capabilities rather than by destruction of the built environment. After the initial spike in out-migration driven by rapid structure loss, residents continued to leave the area.

Wildfire structure loss has minimal impact on in-migration trends

Finally, we examined trends in in-migration, hypothesizing that indirect effects of wildfires would reduce in-migration during and after the event period, as potential in-migrants avoid fire-affected places. Across the full top decile and the less destructive and more destructive portions of the top decile, there were no significant differences in post-fire in-migration among burned tracts relative to any set of control tracts (Table  1 , columns D–F). Only following the Camp Fire did we observe a significant increase in in-migration probability, starting in the event quarter, when there were 30 additional in-migrants per thousand residents relative to the 50-mile control set (Table  1 , column F). This positive effect on in-migration continued during the first year when using both the 25-mile and 50-mile control sets, and again during the second year when using the 5- and 25-mile control sets (Table  1 , columns D–F). We interpret this increase in in-migration as evidence of what is known as “recovery migration,” wherein returning and new residents arrive in a disaster-affected area following an initial displacement event 41 , 42 .

When examining parallel trend plots for Camp Fire in-migration (Fig.  2b ), we observed some evidence of spatial spillovers in the nearest set of control tracts, those between zero and five miles from burned tracts. As the red line indicating mean in-migration probability in burned tracts rises and remains elevated during the event and post-event quarters, so too does the mean in-migration probability for the 5-mile ring, shown in blue. The two trends evolve along very similar trajectories, whereas in-migration among 25-mile and 50-mile control tracts remains relatively flat in the post-event period. This spatial spillover is reflected in the non-significance of coefficients for the 5-mile ring comparison in the event and post-event year interaction terms (Table  1 ). Given how large the effect of the Camp Fire was on out-migration, this in-migration spillover may reflect residents leaving the immediately burned area and moving into nearby tracts.

Despite the robust growth of climate migration research over the past decade 11 , 12 , wildfires remain understudied in this field 13 . Existing research on the effects of comparable sudden-onset hazards indicates that a spectrum of migratory responses is possible. On one hand, many past studies have shown that such events result in relative immobility 11 , 12 , 17 . On the other hand, studies focused on extremely destructive hurricanes and tsunamis have documented heightened out-migration and subsequent recovery migration 23 , 24 , 25 , 26 , 41 , 42 . Our analysis of wildfires across a range of destruction levels reflects this heterogeneity of effects, illustrating the severity thresholds at which wildfires influence migration in the U.S. We show that immobility was the most common response to destructive wildfires; however, for the smaller number of highly destructive fires, we observed increased out-migration.

Our study draws on wildfire data that document exact wildfire structure loss counts 33 , allowing us to stratify our analysis by severity level and to subsequently test for different types of wildfire-related migration. We paired these data with migration estimates from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel, which has been minimally used for migration research but offers improved spatial resolution over traditional migration data. Together, these data sources make possible analysis at the census tract scale, which approximates neighborhoods, offering a level of spatial granularity that has not been previously possible in most multi-decadal environmental migration studies. By analyzing a large number of events, our analysis further provides generalizable findings on an understudied hazard within environmental migration scholarship.

We investigated the 519 most destructive wildfires in the contiguous U.S. between 1999 and 2020, examining direct and indirect pathways of wildfire-driven impacts on human migration. We first tested for migration effects through direct damage to the built environment, wherein heightened out-migration occurs following high levels of structure loss. Second, we examined whether wildfires influenced migration indirectly, through mechanisms apart from structure loss. Through this pathway, residential preferences to remain in place or migrate as well as residents’ capabilities to realize such preferences may change as a result of a fire, in turn affecting population-level migration trends.

Our findings support our first hypothesis, that wildfires affect migration patterns non-linearly at high levels of structure loss, as housing and other infrastructure are destroyed, and residents subsequently relocate. We found that only a small portion of destructive wildfires caused a migratory response, and such rare events influenced mobility primarily through destruction to the built environment. Even among the top ten percent of the most destructive wildfires in the contiguous U.S., it was only the most extreme among these events that caused an increase in out-migration. This was reflected in significant wildfire effects on out-migration among wildfires in the most destructive portion of the top decile (258–7010 structures destroyed), and the largest magnitude of out-migration effects observed after the single most destructive event, the 2018 Camp Fire. Migration following highly destructive events is in keeping with prior research that has documented direct displacement following extreme sudden-onset disasters 23 , 24 , 25 , 26 and non-linear relationships between migration and hazard severity 17 , 34 . It is also in line with emerging literature on wildfire-related mobility, which has documented temporary population displacement following two highly destructive events, the Mendocino Complex and Woolsey Fires in California 30 . However, our research design ultimately does not allow us to distinguish between residents moving away because their own dwellings were destroyed, because their local environment experienced high levels of destruction, or a combination of both. These possible pathways should be investigated in future research with qualitative methods focused on migration decision-making.

We further hypothesized that, separate from direct destruction to structures, wildfire impacts on the biophysical, economic, and social dynamics of a place would influence residents’ desires to remain living there and/or their ability to do so. However, in most cases, we did not find evidence indicating that wildfires influenced migration patterns through this indirect pathway via residents’ changing mobility preferences or capabilities. While wildfires can influence human migration at high levels of structure loss, the majority of wildfire events between 1999 and 2020 did not reach this destruction threshold and, consequently, did not change existing migration trends. Following the majority of destructive wildfires in this study (14–257 structures destroyed, 89% of the 519 wildfires examined), we observed no significant increase in out-migration, indicating that immobility is a common response to wildfire, as it is to other hazards 11 , 12 , 14 , 15 , 16 , 17 . Furthermore, the rare spikes in out-migration following the most destructive events were almost all temporally constrained to the disaster period and first year following the event, and did not remain elevated in the second post-event year. The only exception to this trend was the Camp Fire, after which out-migration from burned tracts remained elevated for the entire period examined. Finally, we observed no declines in in-migration following wildfire events. Rather than being deterred by substantial wildfire destruction, in-migrants arrived in fire-affected tracts at the same rate as they did prior to the fire and relative to neighboring, unburned tracts. Together, these findings suggest that, during the study period, wildfires that did not cause very high levels of structure loss also did not influence residential mobility preferences and/or capabilities sufficiently to affect population-level migration trends.

Prior environmental migration scholarship conducted across hazard types finds broad variability in the direction and magnitude of migration responses. We further show that the migration response to a single hazard can vary widely across severity levels, increasing non-linearly at the highest levels of impact. Our findings speak to the outsized effects of the most extreme environmental events on human migration. Fires are a common environmental phenomenon occurring across many parts of the U.S. (Fig.  1 ); only a much smaller subset of rare but extremely destructive wildfires has directly affected migration through structure loss. This finding is important for situating a general understanding of wildfire-related mobility in the contiguous U.S.: immobility is the most common response to destructive wildfires. Climate mobility scholars have recently begun emphasizing such findings, which have historically been treated as null results of lesser interest, arguing for the importance of studying immobility, especially in the context of intensifying environmental hazards 14 , 16 , 19 . Future research should investigate how both individual aspirations and macro-level structural conditions collectively inform the mobility of residents living in fire-prone places.

While we observed immobility as the most common response to destructive wildfires, we also know that the rate of wildfire-driven structure loss in the U.S. has been increasing over time 1 , with a substantial number of outlying extreme events occurring in recent years (Fig.  1 ). Absent major adaptation efforts, if the recent intensification of wildfire destructiveness continues, our findings suggest that we may expect to observe more direct displacement caused by extreme wildfires in the future. Although we did not observe substantial evidence of indirect wildfire-mobility effects, in which residents began leaving or avoided moving into fire-affected regions absent major structure loss, these effects may yet emerge in the future as the wildfire regime continues to change. Future research should examine how these pathways of wildfire-related migration evolve. Additionally, research in this area could examine whether more recent extreme events and events outside of the contiguous U.S., such as the 2023 Maui Fire in Hawaii, have similar migration effects as those found in this analysis.

Our research design provides a number of important advances to the emerging study of wildfire-related mobility. First, because our wildfire data measure exact counts of structures destroyed at a fine spatial scale, we were able to stratify our analyses by level of wildfire severity. This is a distinct approach from previous wildfire migration studies, which have either investigated a very small selection of events 30 , 31 , 32 , or have made minimal distinctions in event severity among many events 20 . By stratifying our analysis across levels of wildfire destruction, we are able to examine thresholds in wildfire-migration relationships, which is an important area of investigation given prior research suggesting non-linear migration responses to other environmental hazards 11 , 12 , 17 , 34 . Second, our data allow us to examine wildfire-related migration at the census tract scale, the spatial unit that most closely approximates neighborhoods 36 . This spatial scale is critical conceptually, given that prior migration research generally documents short-distance moves in response to environmental changes 15 . It is also technically important for the study of wildfires, because their area of direct impact tends to be small relative to the land area of counties, the coarser spatial unit used in prior studies most similar to ours 20 , 32 (see Supplementary Information  2 for a more detailed discussion of wildfires and spatial scale). Finally, compared to past studies, our quasi-experimental design comparing burned tracts to counterfactual unburned tracts offers improved causal identification of wildfire effects on migration. This approach has not previously been used to study wildfire-migration relationships and is especially important for research on environmental hazard impacts, given the potential for confounding events 43 . 
Our use of three distinct sets of control groups further allows us to ensure the robustness of our findings and identify spatial spillovers outside of immediately burned regions. Together, these elements of our research design allow us to comprehensively identify nonlinear effects of wildfire destruction at a local scale.

Our study has several limitations that we anticipate can be addressed as future research continues to expand knowledge on wildfire-mobility dynamics. First, our study design did not identify residential moves within tracts. As a result, it is possible that wildfire destruction may cause changes to population mobility at a finer spatial scale than we were able to observe. Such a pattern would be in keeping with findings from a Colorado-based survey, in which residents in a fire-affected region who desired to move preferred nearby destinations 27 . However, even if such within-tract residential mobility were taking place, it would still affirm our study’s broader conclusion: residents by and large did not migrate out of fire-prone areas after less destructive events. Additionally, our aggregated census tract-level approach is not able to distinguish between individual residents whose dwellings were located within a burned tract but not within the burn footprint, and those whose dwellings were located directly within the burn footprint. Because some wildfires fall within a census tract but do not burn that tract’s entire area, our approach necessarily included some unexposed residents in the treated condition. This may mean that our results underestimate the magnitude of migratory effects.

A second limitation of our approach is that the CCP migration data generally cannot be demographically decomposed 40 . Using these data, we are limited in our ability to analyze potentially different migration trends across axes of difference such as race, ethnicity, or nativity. While our approach provides a broad picture, we cannot determine whether particular demographic groups are more or less likely to migrate in response to wildfire destruction. This limitation is not unique to the CCP migration data, and we are aware of no publicly available migration data source that has extensive spatial and temporal coverage, fine-grained spatial and temporal units, and demographic decomposability. Future work creating such data would greatly expand the scope of environmental migration research, enabling lines of inquiry focused more explicitly on disproportionate impacts and questions of equity.

Finally, it is important to note that the CCP migration data only include residents with a Social Security Number (SSN) and a credit history, and therefore under-represent relatively younger and financially disadvantaged people 44 , 45 . As such, the CCP sample is not necessarily representative of the full U.S. population in all places. This challenge is endemic to many commonly used forms of migration data; for instance, the Internal Revenue Service's county-to-county migration data include only residents who file taxes 46 , and mobility data derived from mobile phones sample only residents who use a cell phone 47 . Migration data such as these offer broad geographic and temporal coverage at the cost of not fully capturing all subpopulations that may be especially vulnerable to wildfire impacts. For example, a case study of the Camp Fire found that the residential structure types most likely to house lower-income residents, mobile home residents, and renters had a higher probability of being destroyed in the fire, suggesting that these populations were more susceptible to housing loss due to characteristics of the built environment 48 . While we clearly detected a significant migratory effect from the Camp Fire, the CCP's under-representation of financially disadvantaged residents means that we may have underestimated the overall effect size for this particular event. Similarly, in a case study of the 2017 Thomas Fire in California, researchers highlighted the ways that undocumented immigrants who worked in affected areas were highly impacted by the fire yet not visible in official census statistics 49 . Focused attention on the experiences of vulnerable subpopulations with wildfire is needed, and such work must be conducted with tailored data that can overcome the limitations of existing national-level datasets.
Yet, such analyses would need to address considerable privacy concerns that arise when studying demographically identified groups at small spatial scales.

The heightened out-migration observed after relatively rare but highly destructive wildfires invites further study focused more closely on patterns of in-migration in the years following the event. The concept of “recovery migration” adopted in scholarship on hurricanes encompasses both returning residents and new in-migrants 41 , 42 , and others have further highlighted the temporary in-migration of individuals drawn by disaster cleanup employment 50 . This area of research is generally understudied relative to other aspects of environmental migration 10 , and future research should analyze these distinct forms of in-migration after destructive wildfires. Existing studies suggest that several possible dynamics may be at play, including post-wildfire gentrification 48 , as well as the continued push of residents into more affordable but also more fire-prone places 22 .

In this study, we present a broad examination of wildfire's impacts on human migration patterns in the contiguous U.S., addressing a scarcity of wildfire-focused research in environmental migration scholarship 13 . Emerging scholarship on this topic to date has been geographically focused on North America, yet wildfires are a global phenomenon 5 , 6 . Prior research conducted in countries with substantial agricultural sectors generally finds more pronounced environmental migration effects, as environmental changes alter agricultural productivity, thereby influencing household income and subsequent migration 10 , 12 , 17 . This pattern suggests that, in different geographic contexts, wildfires may influence migration differently, with potentially stronger effects in agriculturally dependent regions. Future research should investigate wildfire impacts on migration across the broad geography of fire-prone places, with special attention to the different causal pathways through which fire may influence mobility.

Data construction

We constructed a longitudinal dataset of wildfire destruction and quarterly out- and in-migration probabilities at the census tract scale. Wildfire destruction metrics were adapted from administrative records collected in the U.S. National Incident Management System/Incident Command System, archived by the interagency National Wildfire Coordinating Group, and subsequently processed by St. Denis et al. ("ICS") 33 . The ICS records encompass all documented wildfires in the U.S. that require the establishment of an incident management team. Drawing on St. Denis et al.'s procedure to create a spatiotemporal version of the data, we used the ICS's linkage to wildfire perimeters from the Monitoring Trends in Burn Severity database 38 and the Fire Events Delineation (FIRED) database 37 to produce census tract- and quarter-level wildfire data based on 2010 tract boundaries. We selected census tracts as our unit of analysis because they approximate neighborhoods, generally containing between 1,200 and 8,000 residents 36 . No single unit of analysis corresponds perfectly to the wide range of wildfire burn footprint sizes in our data. However, the granularity of census tracts better matches the spatial scale of burn footprints, which are generally much smaller than counties, the next-largest administrative unit and the one used in prior wildfire research 20 , 32 (see Supplementary Information 2 for additional details on spatial unit selection). We obtained tract and state boundaries from the U.S. Census Bureau through the National Historical Geographic Information System (NHGIS) and the R tigris package, respectively 51 .

The ICS dataset is one of the most comprehensive longitudinal sources of wildfire data for the U.S. For each wildfire event, the ICS reports the total number of structures destroyed, a measure that includes residential, commercial, outbuilding, and mixed-use structures. We use data from the full temporal period available in the most recent publication of the ICS, 1999 through 2020. The period begins in 1999, the first year for which the National Wildfire Coordinating Group provided the raw data from which the ICS is produced, and ends in 2020, the most recent year through which the ICS has been cleaned 33 .

A major benefit of the ICS dataset is that it reports direct measures of hazard impact rather than the dollar value of damaged property. The latter approach to disaster data reporting, while commonly used, cannot distinguish between the destruction of a small number of high-value structures and that of a large number of low-value structures. Conflating the number of structures damaged or destroyed with the estimated monetary value of the damage distorts damage estimates, overstating damages in areas with high property values and understating them in areas with low property values. The ICS counts of destroyed or damaged structures thus provide a more direct measure of hazard impact 33 . However, these counts do not account for wildfire impacts on wildlands, agricultural lands, or livestock, which could influence migration through effects on environment-dependent livelihoods such as forestry, farming, or environmental amenity-based tourism.

Migration measures come from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel (CCP). The CCP is an anonymous five percent random sample drawn from the credit histories maintained by Equifax. It contains panel data on over 10 million individuals. The consumer credit histories are built from the monthly reports Equifax receives from mortgage lenders, credit card issuers, student loan servicers, and other debt holders. Equifax uses an algorithm to identify each individual’s most likely current address from the addresses reported by all of a borrower’s creditors. Equifax provides the census tract containing the selected address in the CCP data. The street addresses themselves are withheld for anonymity, as are all names and Social Security Numbers. A unique anonymous identifier is assigned to each borrower, allowing researchers to build individual-level quarterly histories 40 . To account for differences in population size between tracts, we used the proportion of individuals in a tract who moved into or out of the tract as the dependent variable for our analysis. Unfortunately, the CCP does not contain demographic information on the borrowers, such as sex, race, ethnicity, or nativity.
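The dependent variable can be illustrated with a small sketch. It assumes a hypothetical long-format panel of (person_id, quarter, tract) records with consecutive integer quarters, which is not the CCP's actual schema; movers between quarter q-1 and q are divided by the tract's population in q-1, following the "population at the start of the period" definition given under Analytical strategy.

```python
from collections import Counter


def migration_probabilities(records):
    """Quarterly out- and in-migration probabilities by tract.

    records: iterable of (person_id, quarter, tract) tuples, one per person-quarter,
    with quarters as consecutive integers (a hypothetical layout, not the CCP schema).
    Returns {(tract, quarter): (out_prob, in_prob)}, where movers between quarter - 1
    and quarter are divided by the tract's population in quarter - 1.
    """
    location = {(pid, q): tract for pid, q, tract in records}
    observed_quarters = {q for _, q in location}
    population = Counter((tract, q) for (_, q), tract in location.items())
    out_movers, in_movers = Counter(), Counter()
    for (pid, q), tract_now in location.items():
        tract_before = location.get((pid, q - 1))
        if tract_before is None or tract_before == tract_now:
            continue  # not observed last quarter, or stayed in the same tract
        out_movers[(tract_before, q)] += 1
        in_movers[(tract_now, q)] += 1
    # Only report transitions whose end quarter is observed; tracts with no
    # start-of-period population are skipped, which avoids dividing by zero
    return {
        (tract, q + 1): (out_movers[(tract, q + 1)] / pop, in_movers[(tract, q + 1)] / pop)
        for (tract, q), pop in population.items()
        if q + 1 in observed_quarters
    }
```

Using a proportion rather than a raw count is what makes tracts of very different sizes comparable; people who drop out of the panel remain in the denominator but are counted as neither movers nor stayers in this simplified version.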

The CCP has several advantages over other sources of data on residential migration. Compared to U.S. Census Bureau surveys that measure migration, such as the American Community Survey or the Current Population Survey, the CCP's large sample size provides the statistical power necessary for analyses at smaller spatial and temporal scales 45 . Compared to the widely used Internal Revenue Service county-to-county migration estimates (IRS), which report total counts of migrants between county pairs, the CCP provides individual-level records of residential location each quarter rather than each year, strengthening temporal inference. Further, the individual-level records can be aggregated to a chosen spatial unit, such as a state, county, or census tract. The finer temporal and spatial scales possible with the CCP make it highly attractive for the study of environmental shocks and migration responses 32 .

The CCP also has several limitations. The data represent only those U.S. adults who have a Social Security Number (SSN) and a credit history. Therefore, coverage excludes the estimated 10–11% of adults who do not have a formal credit history and those without an SSN 44 . This means that younger and financially disadvantaged people are under-represented in the data 45 . As mentioned above, the CCP does not contain demographic information on the borrowers, such as sex, race, ethnicity, or nativity. These limitations mean that the dataset cannot be used to examine individual-level sociodemographic disparities in hazard-related migration.

Finally, to conduct our matching procedure, we processed tract-level landscape and population variables associated with wildfire risk 52 , 53 . These include elevation and slope derived from NASA's 90 m SRTM digital elevation model 54 , and the percent of land in each tract belonging to specific land cover classes, derived from the 2019 National Land Cover Database 55 . Of the available land cover classes, we used the percentage of a tract covered by forest, shrub/scrub, and developed land, which are associated with flammability 53 . We processed these variables using Google Earth Engine's cloud computing platform 56 . In addition to these landscape characteristics, we also included a tract's total land area (where smaller indicates a more urbanized tract) and 2010 county-level population estimates 57 .
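The land-cover covariates amount to a per-tract share of pixels in groups of NLCD classes. The sketch below is a stand-in for the Google Earth Engine processing, not a reproduction of it: it assumes an in-memory NLCD class raster and a same-shaped raster of tract ids, equal-area pixels, and a denominator of all pixels in the tract (including water), which may differ from the "percent of land" used in the paper. The class groupings are the standard NLCD legend codes.

```python
import numpy as np

# NLCD class codes grouped into the three land-cover covariates named above
LAND_COVER_GROUPS = {
    "developed": (21, 22, 23, 24),   # developed open space through high intensity
    "forest": (41, 42, 43),          # deciduous, evergreen, mixed forest
    "shrub_scrub": (51, 52),         # dwarf scrub (Alaska only) and shrub/scrub
}


def land_cover_percentages(nlcd, tract_ids):
    """Percent of each tract's pixels in each land-cover group.

    nlcd and tract_ids are same-shaped rasters holding, for each pixel, the NLCD
    class code and the id of the tract that contains it (equal-area pixels assumed).
    """
    nlcd = np.asarray(nlcd)
    tract_ids = np.asarray(tract_ids)
    result = {}
    for tract in np.unique(tract_ids):
        classes = nlcd[tract_ids == tract]
        result[int(tract)] = {
            name: 100.0 * np.isin(classes, codes).mean()
            for name, codes in LAND_COVER_GROUPS.items()
        }
    return result
```

At national scale a zonal-statistics tool (such as the Earth Engine approach the authors used) is far more practical; the loop above only shows what the covariate measures.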

Stratification of wildfires by severity

Given the right skew of the wildfire destruction distribution, we chose to analyze only the most destructive decile of wildfires that destroyed structures (hereafter, “top decile”). The top decile encompasses events ranging from fourteen structures destroyed at the least destructive to 18,804 structures destroyed at the most destructive. However, even this most destructive top decile itself is right skewed, with the majority of events causing a lower level of destruction. For this reason, we subsequently stratified the top decile into four sets for analysis: (1) the full decile distribution ( n  = 519), (2) the less destructive portion of the decile distribution ( n  = 463), (3) the more destructive portion of the decile distribution excluding the Camp Fire ( n  = 55), and (4) the most destructive event in the decile distribution, the Camp Fire ( n  = 1) (shown in Fig.  1b ). We subdivided the top decile into these groups with the aid of Jenks natural breaks classification, which is a data classification method that minimizes variation within groups 58 .
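Jenks natural breaks chooses class boundaries that minimize the total within-class sum of squared deviations. The sketch below implements the exact (Fisher-Jenks) dynamic-programming form in Python. The paper does not state which implementation it used or how the four analysis sets were derived from the classification, so the number of classes and the input values here are assumptions for illustration.

```python
def jenks_breaks(values, n_classes):
    """Exact natural-breaks (Fisher-Jenks) classification by dynamic programming.

    Returns the upper bound of each class, chosen to minimize the total
    within-class sum of squared deviations.
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give the sum of squared deviations of any run x[i:j] in O(1)
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best SSD for the first j sorted values in k classes
    # cut[k][j]: index where the last of those k classes starts
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back through the cut points to recover each class's upper bound
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(x[j - 1])
        j = cut[k][j]
    return breaks[::-1]
```

For example, `jenks_breaks([1, 2, 3, 10, 11, 12, 50], 3)` separates the three visible clusters. The exact search is O(k·n²), which is fast for a few hundred events such as the 519 in the top decile.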

Analytical strategy

We used a difference-in-differences (DID) strategy to model out-migration and in-migration probabilities in wildfire "treated" tracts (i.e., tracts containing the burn footprint), comparing them to unburned "control" tracts. We compared migration probabilities during the eight quarters preceding the event with those in the event quarter and the eight quarters after the event. A separate DID model was fitted for each subset of events and each ring of control tracts (see Control selection). The model takes the form:

\[{{mp}}_{{it}}={\beta }_{0}+{\beta }_{1}{{Treat}}_{i}+\sum _{k=1}^{3}{\gamma }_{k}\,{\mathbb{1}}[{{Period}}_{t}=k]+\sum _{k=1}^{3}{\delta }_{k}\,{{Treat}}_{i}\times {\mathbb{1}}[{{Period}}_{t}=k]+{\varepsilon }_{{it}}\]

where \({{mp}}_{{it}}\) is a measure of migration probability in tract i in time period t , defined as the total number of movers into or out of a tract divided by the total population of the tract at the start of the period. \({{Treat}}_{i}\) indicates whether a tract was burned in a wildfire event ("1") or is an unburned control tract ("0"). \({{Period}}_{t}\) indicates whether the time period was pre-fire ("0", the reference category), the event quarter ("1"), the first year after the event quarter ("2"), or the second year after the event quarter ("3"), and \({\mathbb{1}}[\cdot ]\) is an indicator function. We modeled multiple post-event periods rather than a binary post-fire period to test whether migratory effects differed over time. The interaction terms between \({{Treat}}_{i}\) and each of the three event and post-event \({{Period}}_{t}\) terms (coefficients \({\delta }_{k}\)) are the primary DID coefficients of interest. They indicate whether the change in migration probability between the pre-fire period and each subsequent period differed significantly between burned and unburned tracts. \({\varepsilon }_{{it}}\) represents residual error. We report robust standard errors clustered at the tract level and use a Bonferroni-adjusted p value threshold of 0.0021. For ease of interpretation, we rescaled the interaction coefficients to report the number of migrants per 10,000 residents. We conducted analyses using the estimatr package 59 in R statistical software versions 4.3.3 and 4.4.0 and report two-sided p values.
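A minimal Python sketch of the DID specification follows, fitted by OLS on a simulated panel. Everything in `simulate_panel` (tract counts, baseline rates, effect sizes, noise) is invented for illustration. The standard errors use a basic CR0 cluster-robust sandwich by tract; estimatr's `lm_robust` defaults to CR2 when clusters are supplied, so this is a simplified stand-in rather than the authors' exact estimator.

```python
import numpy as np


def simulate_panel(n_tracts=200, seed=0):
    """Hypothetical tract-by-quarter panel: 8 pre-fire quarters, event quarter, 8 post-fire quarters."""
    rng = np.random.default_rng(seed)
    quarters = np.arange(-8, 9)
    tract = np.repeat(np.arange(n_tracts), quarters.size)
    q = np.tile(quarters, n_tracts)
    treat = (tract < n_tracts // 2).astype(float)  # first half of tracts are "burned"
    # Period coding from the text: 0 = pre-fire, 1 = event quarter,
    # 2 = first year after the event quarter, 3 = second year after it
    period = np.select([q < 0, q == 0, q <= 4], [0, 1, 2], default=3)
    true_delta = np.array([0.0, 0.010, 0.004, 0.001])  # invented effects by period
    mp = 0.03 + 0.002 * treat + 0.001 * period + true_delta[period] * treat
    mp = mp + rng.normal(0.0, 0.002, mp.size)
    return mp, treat, period, tract


def did_fit(mp, treat, period, cluster):
    """OLS with Treat, period dummies, and Treat x period interactions; CR0 clustered SEs."""
    dummies = np.column_stack([(period == k).astype(float) for k in (1, 2, 3)])
    X = np.column_stack([np.ones_like(mp), treat, dummies, dummies * treat[:, None]])
    beta, *_ = np.linalg.lstsq(X, mp, rcond=None)
    resid = mp - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        score = X[cluster == g].T @ resid[cluster == g]  # summed score for one tract
        meat += np.outer(score, score)
    se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se


mp, treat, period, tract = simulate_panel()
beta, se = did_fit(mp, treat, period, tract)
did, did_se = beta[5:8], se[5:8]  # interaction (DID) coefficients for periods 1-3
print("DID per 10,000 residents:", np.round(did * 10_000, 1))
print("clustered SE per 10,000:", np.round(did_se * 10_000, 1))
```

Multiplying a coefficient on the probability scale by 10,000 gives the "migrants per 10,000 residents" reporting unit described above.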

Applied research analyzing longitudinal data has often used fixed effects (FE) to address concerns about omitted variables. However, recent methodological research suggests that this approach is inappropriate for certain causal research questions and does not yield readily interpreted, nonparametric causal estimators 60 , 61 , 62 . For this reason, we did not include fixed effects in our models and instead addressed potential omitted variable bias through a matching procedure (described below). Matching treatment and control groups on observed covariate values has recently been advanced as an alternative to FE models 63 . We nevertheless conducted sensitivity tests comparing our primary non-FE models to models with tract FE, quarter FE, and two-way FE. We performed these tests on the top decile of wildfires using 0–5 mile controls and found no substantive differences in the direction, magnitude, or statistical significance of the DID estimates. We also ran the same set of models with a single pre-fire and a single post-fire period, rather than the three event and post-event periods. These models followed the same patterns as our primary specification, with significant increases in out-migration and no significant changes in in-migration in the post-fire period. These tests indicate that our findings are robust across a range of alternative specifications.
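With a single pre-fire and a single post-fire period, the DID estimate reduces to a difference of four group means. In an OLS model containing only Treat, a binary post-fire indicator, and their interaction, the interaction coefficient equals exactly this quantity. The sketch below uses invented migration probabilities.

```python
from statistics import fmean


def did_2x2(pre_treated, post_treated, pre_control, post_control):
    """Two-period DID: change in treated mean minus change in control mean."""
    treated_change = fmean(post_treated) - fmean(pre_treated)
    control_change = fmean(post_control) - fmean(pre_control)
    return treated_change - control_change


# Invented quarterly out-migration probabilities for illustration only
estimate = did_2x2([0.030, 0.032], [0.050, 0.046], [0.031, 0.029], [0.033, 0.031])
print(f"{estimate * 10_000:.0f} additional out-migrants per 10,000 residents")
```

Here the treated change is 0.017 and the control change is 0.002, so the DID is 0.015, or 150 per 10,000 residents.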

Control selection

We matched control tracts to each treated tract through a two-step procedure. First, for each burned tract or cluster of tracts that correspond to a single wildfire, we calculated three rings of distance-based neighboring tracts. We did so by drawing buffers from the outer edge of burned tracts to 5 miles (“5-mile ring”), from 5 to 25 miles (“25-mile ring”), and from 25 to 50 miles (“50-mile ring”) (Fig.  2c , left). We then intersected these buffers with spatially overlapping tracts (Fig.  2c , right) to create the final control tract selection for a given wildfire.
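The ring construction can be sketched with Shapely buffers. The assumptions are that geometries are already in a projected coordinate system measured in miles, that "overlap" means a positive-area intersection (so tracts that merely touch a ring edge are ignored), and that a tract overlapping more than one ring goes to the nearest ring. That last rule is hypothetical, since the text does not say how such tracts were handled.

```python
from shapely.geometry import box
from shapely.ops import unary_union

RINGS = ((5, 0), (25, 5), (50, 25))  # (outer, inner) buffer distances in miles


def control_rings(burned_tracts, candidate_tracts):
    """Assign unburned candidate tracts to the 5-, 25-, or 50-mile ring of one wildfire.

    burned_tracts: list of polygons for the burned tract(s) of the wildfire.
    candidate_tracts: dict of tract id -> polygon (projected CRS, miles assumed).
    Returns {tract id: 5 | 25 | 50 | None}; burned tracts are left out entirely.
    """
    burned = unary_union(burned_tracts)
    rings = {}
    for outer, inner in RINGS:
        inner_geom = burned if inner == 0 else burned.buffer(inner)
        rings[outer] = burned.buffer(outer).difference(inner_geom)
    assignment = {}
    for tract_id, geom in candidate_tracts.items():
        if geom.intersection(burned).area > 0:
            continue  # treated tracts are never controls
        # Hypothetical rule: nearest ring with a positive-area overlap wins
        assignment[tract_id] = next(
            (outer for outer, ring in rings.items() if geom.intersection(ring).area > 0),
            None,
        )
    return assignment


# Toy example: a row of 10 x 10 mile "tracts" with tract 0 burned
tracts = {i: box(10 * i, 0, 10 * i + 10, 10) for i in range(8)}
print(control_rings([tracts[0]], tracts))
```

Buffering from the union of burned tracts, rather than from the fire perimeter, mirrors the text's "outer edge of burned tracts".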

We selected three distinct sets of controls to address the potential for spatial spillover, in which the effects of a destructive wildfire travel beyond the immediate area in which the incident occurred. A recent study of Australia’s Black Summer fires suggests that such spillovers can occur up to 5 km away from a directly burned area 43 . Because there is not sufficient empirical research from which to establish whether such spillovers are common across different wildfire events, we conducted analyses for each wildfire subset three times, each with a different ring of control tracts. Building these sensitivity tests into our analysis allowed us to rule out spatial spillovers for most wildfire subsets, and to identify a modest spatial spillover in the case of in-migration following the Camp Fire.

In cases in which a control tract also experienced a destructive wildfire within the seventeen-quarter observation window, the tract was removed from consideration as a control. This step ensured that treated units were not compared to control units that themselves were treated within the observation period. If a tract quarter was defined as a control for multiple fire-affected tracts, it was only counted once within a given pooled model. In- and out-migration probabilities vary more widely in tracts with small populations, which is in part due to the data’s small sample size within these tracts. To minimize the influence of these outliers, observations with an in-migration probability greater than two standard deviations above the full dataset’s mean in-migration were removed and observations with an out-migration probability greater than the maximum quarterly out-migration observed following the 2018 Camp Fire were removed.
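The two data-cleaning rules above can be written as simple filters. In this sketch the observation layout (dicts with `in_prob`, `out_prob`, `tract`, `quarter` keys) is hypothetical, the Camp Fire maximum is passed in rather than computed, and the use of the population standard deviation for the two-SD cutoff is an assumption, since the text does not specify which estimator was used.

```python
from statistics import fmean, pstdev


def trim_outliers(observations, camp_fire_max_out):
    """Drop tract-quarters with implausibly high migration probabilities.

    observations: list of dicts with 'in_prob' and 'out_prob' keys.
    camp_fire_max_out: largest quarterly out-migration probability observed after
    the Camp Fire (supplied by the analyst; not reproduced here).
    """
    in_probs = [o["in_prob"] for o in observations]
    in_cutoff = fmean(in_probs) + 2 * pstdev(in_probs)  # population SD is an assumption
    return [
        o for o in observations
        if o["in_prob"] <= in_cutoff and o["out_prob"] <= camp_fire_max_out
    ]


def dedupe_controls(control_rows):
    """Keep each control (tract, quarter) once, even if it serves several burned tracts."""
    seen, unique = set(), []
    for row in control_rows:
        key = (row["tract"], row["quarter"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Both filters operate within a single pooled model, which matches the text's statement that a shared control tract-quarter is counted only once per model.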

After selecting three rings of potential control tracts for each wildfire, we next conducted coarsened exact matching (CEM) in order to balance covariates between treatment and control groups 64 . CEM has been used in prior disaster research to strengthen causal inference in quasi-experimental research designs 65 . We matched treatment and control tracts using a set of covariates that we selected based on their expected association with the treatment condition (experiencing a destructive wildfire) 66 . Matching was conducted separately within each subset of wildfire events using the MatchIt package in R 67 . While ICS wildfire data are available for Hawaii and Alaska, certain covariates did not include coverage in these states. We therefore constrained our analysis to the contiguous U.S.
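Coarsened exact matching bins each covariate, matches units exactly on their vector of bin codes, discards strata that lack either treated or control units, and weights the remaining controls. The sketch below shows that logic with the Iacus, King and Porro weights for the average treatment effect on the treated. It uses a fixed number of equal-width bins as a simplifying assumption; MatchIt's `method = "cem"` chooses cut points automatically by default, so results will differ from the authors' setup.

```python
import numpy as np


def cem_weights(X, treated, n_bins=5):
    """Coarsened exact matching sketch returning ATT weights (0 = unmatched).

    X: (n, p) covariate matrix; treated: boolean array of length n.
    """
    X = np.asarray(X, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Coarsen each covariate into equal-width bins labelled 0..n_bins-1
    codes = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        codes[:, j] = np.digitize(X[:, j], edges[1:-1])
    # Each distinct vector of bin codes defines one stratum
    _, stratum = np.unique(codes, axis=0, return_inverse=True)
    stratum = stratum.ravel()
    n_strata = stratum.max() + 1
    n_t = np.bincount(stratum[treated], minlength=n_strata)
    n_c = np.bincount(stratum[~treated], minlength=n_strata)
    matched = (n_t[stratum] > 0) & (n_c[stratum] > 0)
    m_t = np.sum(matched & treated)
    m_c = np.sum(matched & ~treated)
    weights = np.zeros(len(treated))
    weights[matched & treated] = 1.0
    ctrl = matched & ~treated
    # Control weight: (treated / controls in stratum) * (matched controls / matched treated)
    weights[ctrl] = (n_t[stratum[ctrl]] / n_c[stratum[ctrl]]) * (m_c / m_t)
    return weights
```

With these weights the matched controls sum to the number of matched controls, and within each stratum they reproduce the treated units' share, which is what lets a weighted regression estimate the effect on burned tracts.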

To evaluate covariate balance before and after matching, we examined the standardized mean difference between treatment and control groups for each covariate (Supplementary Information Figs. S.I.3 – S.I.6 ). After matching, standardized mean differences were nearly all at or below 0.1, a commonly used threshold for considering covariates well balanced. The Camp Fire model was the primary exception: there we used a smaller set of covariates (tract size and percent developed, forest, and shrub or scrub), and although matching improved balance, not all standardized mean differences fell below the preferred 0.1 threshold. This limitation reflects the small number of treated and control tracts available for analyzing a single event, in contrast to the much larger N available for pooled event analyses. Overall, CEM substantially improved covariate balance between treated and control groups, with minimal reduction in the number of observations used for analysis (usually <10–15%).
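The balance diagnostic can be sketched as a weighted standardized mean difference. This version scales by the treated group's standard deviation, one common convention for effect-on-the-treated balance checks; other tools pool both groups' variances, and the paper does not state which convention its figures use. The data and weights below are invented.

```python
import numpy as np


def standardized_mean_difference(x, treated, weights=None):
    """SMD of one covariate: (treated mean - weighted control mean) / treated SD."""
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(x[treated], weights=w[treated])
    mean_c = np.average(x[~treated], weights=w[~treated])
    return (mean_t - mean_c) / np.std(x[treated], ddof=1)


x = [1, 2, 3, 2, 2, 6]
treated = [True, True, True, False, False, False]
before = standardized_mean_difference(x, treated)
after = standardized_mean_difference(x, treated, weights=[1, 1, 1, 1, 1, 0])
print(f"SMD before: {before:.2f}, after: {after:.2f}, balanced: {abs(after) <= 0.1}")
```

Passing the matching weights (for example, those produced by a CEM step) as `weights` gives the "after matching" value compared against the 0.1 threshold.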

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The wildfire data and covariates used for coarsened exact matching are publicly available at refs. 33 , 54 , 55 , 57 . Source data for figures are provided with this paper. The raw migration data from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel (CCP) are available under restricted access to Federal Reserve System employees and cannot be shared due to Data Use Agreement terms.

Code availability

Code developed to process the publicly available data listed above is available through OSF at https://osf.io/xa39e/ .

Higuera, P. E., Cook, M. C., Balch, J. K., Stavros, E. N. & St. Denis, L. Shifting social-ecological fire regimes explain increasing structure loss from Western wildfires. PNAS Nexus 2 , pgad005 (2023).

Radeloff, V. C. et al. Rising wildfire risk to houses in the United States, especially in grasslands and shrublands. Science 382 , 702–707 (2023).

Abatzoglou, J. T. & Williams, A. P. Impact of anthropogenic climate change on wildfire across western US forests. Proc. Natl Acad. Sci. USA 113 , 11770–11775 (2016).

Dennison, P. E., Brewer, S. C., Arnold, J. D. & Moritz, M. A. Large wildfire trends in the western United States, 1984–2011. Geophys. Res. Lett. 41 , 2928–2933 (2014).

Jolly, W. M. et al. Climate-induced variations in global wildfire danger from 1979 to 2013. Nat. Commun. 6 , 7537 (2015).

Senande-Rivera, M., Insua-Costa, D. & Miguez-Macho, G. Spatial and temporal expansion of global wildland fire activity in response to climate change. Nat. Commun. 13 , 1208 (2022).

Barbero, R., Abatzoglou, J. T., Larkin, N. K., Kolden, C. A. & Stocks, B. Climate change presents increased potential for very large fires in the contiguous United States. Int. J. Wildland Fire 24 , 892–899 (2015).

Hammer, R. B., Stewart, S. I. & Radeloff, V. C. Demographic trends, the wildland–urban interface, and wildfire management. Soc. Nat. Resour. 22 , 777–782 (2009).

Hunter, L. M., Luna, J. K. & Norton, R. M. Environmental dimensions of migration. Annu. Rev. Sociol. 41 , 377–397 (2015).

Millock, K. Migration and environment. Annu. Rev. Resour. Econ. 7 , 35–60 (2015).

Kaczan, D. J. & Orgill-Meyer, J. The impact of climate change on migration: a synthesis of recent empirical insights. Clim. Change 158 , 281–300 (2020).

Hoffmann, R., Dimitrova, A., Muttarak, R., Crespo Cuaresma, J. & Peisker, J. A meta-analysis of country-level studies on environmental change and migration. Nat. Clim. Chang. 10 , 904–912 (2020).

Pörtner, H.-O. et al. Climate Change 2022: Impacts, Adaptation and Vulnerability. IPCC Sixth Assessment Report (IPCC, 2022).

Cundill, G. et al. Toward a climate mobilities research agenda: intersectionality, immobility, and policy responses. Glob. Environ. Change 69 , 102315 (2021).

Findlay, A. M. Migrant destinations in an era of environmental change. Glob. Environ. Change 21 , S50–S58 (2011).

Zickgraf, C. Theorizing (im)mobility in the face of environmental change. Reg. Environ. Chang. 21 (2021).

Bohra-Mishra, P., Oppenheimer, M. & Hsiang, S. M. Nonlinear permanent migration response to climatic variations but minimal response to disasters. PNAS 111 , 9780–9785 (2014).

DeWaard, J. et al. Operationalizing and empirically identifying populations trapped in place by climate and environmental stressors in Mexico. Reg. Environ. Chang. 22 , 29 (2022).

Schewel, K. Understanding immobility: moving beyond the mobility bias in migration studies. Int. Migr. Rev. 54 , 328–355 (2020).

Winkler, R. L. & Rouleau, M. D. Amenities or disamenities? Estimating the impacts of extreme heat and wildfire on domestic US migration. Popul. Environ. (2020). https://doi.org/10.1007/s11111-020-00364-4.

Hunter, L. M. Migration and environmental hazards. Popul. Environ. 26 , 273–302 (2005).

Greenberg, M., Angelo, H., Losada, E. & Wilmers, C. C. Relational geographies of urban unsustainability: the entanglement of California’s housing crisis with WUI growth and climate change. Proc. Natl Acad. Sci. (Accepted for publication) (2024).

Fussell, E., Sastry, N. & Van Landingham, M. Race, socioeconomic status, and return migration to New Orleans after Hurricane Katrina. Popul. Environ. 31 , 20–42 (2010).

Sastry, N. & Gregory, J. The location of displaced New Orleans residents in the year after Hurricane Katrina. Demography 51 , 753–775 (2014).

DeWaard, J., Johnson, J. E. & Whitaker, S. D. Out-migration from and return migration to Puerto Rico after Hurricane Maria: evidence from the consumer credit panel. Popul. Environ. 42 , 28–42 (2020).

Gray, C., Frankenberg, E., Gillespie, T., Sumantri, C. & Thomas, D. Studying displacement after a disaster using large-scale survey methods: Sumatra after the 2004 Tsunami. Ann. Assoc. Am. Geogr. 104 , 594–612 (2014).

Nawrotzki, R. J., Brenkert-Smith, H., Hunter, L. M. & Champ, P. A. Wildfire-migration dynamics: lessons from Colorado’s Fourmile Canyon fire. Soc. Nat. Resour. 27 , 215–225 (2014).

Berlin Rubin, N. & Wong-Parodi, G. As California burns: the psychology of wildfire- and wildfire smoke-related migration intentions. Popul. Environ. 44 , 15–45 (2022).

Tinoco, N. Post-disaster (im)mobility aspiration and capability formation: case study of Southern California wildfire. Popul. Environ. 45 , 4 (2023).

Jia, S., Kim, S. H., Nghiem, S. V., Doherty, P. & Kafatos, M. C. Patterns of population displacement during mega-fires in California detected using Facebook Disaster Maps. Environ. Res. Lett. 15 , 074029 (2020).

Sharygin, E. Estimating migration impacts of wildfire: California's 2017 North Bay Fires. In The Demography of Disasters: Impacts for Population and Place (eds Karácsonyi, D., Taylor, A. & Bird, D.) 49–70 (Springer International Publishing, Cham, 2021). https://doi.org/10.1007/978-3-030-49920-4_3.

DeWaard, J. et al. Migration as a vector of economic losses from disaster-affected areas in the United States. Demography 60 , 173–199 (2023).

St. Denis, L. A. et al. All-hazards dataset mined from the US National Incident Management System 1999–2020. Sci. Data 10 , 112 (2023).

Mueller, V., Gray, C. & Kosec, K. Heat stress increases long-term human migration in rural Pakistan. Nat. Clim. Chang. 4 , 182–185 (2014).

Black, R. et al. The effect of environmental change on human migration. Glob. Environ. Chang. 21 , S3–S11 (2011).

U.S. Census Bureau. Glossary. https://www.census.gov/programs-surveys/geography/about/glossary.html (2022).

Balch, J. K. et al. FIRED (Fire Events Delineation): an open, flexible algorithm and database of US fire events derived from the MODIS burned area product (2001–2019). Remote Sens. 12 , 3498 (2020).

Eidenshink, J. et al. A project for monitoring trends in burn severity. Fire Ecol. 3 , 3–21 (2007).

Lee, D. & van der Klaauw, W. An Introduction to the New York Fed Consumer Credit Panel. https://www.newyorkfed.org/research/staff_reports/sr479.html (2010).

Whitaker, S. D. Big Data versus a survey. Q. Rev. Econ. Financ. 67 , 285–296 (2018).

Curtis, K. J., Fussell, E. & DeWaard, J. Recovery migration after Hurricanes Katrina and Rita: spatial concentration and intensification in the migration system. Demography 52 , 1269–1293 (2015).

Fussell, E., Curtis, K. J. & DeWaard, J. Recovery migration to the City of New Orleans after Hurricane Katrina: a migration systems approach. Popul. Environ. 35 , 305–322 (2014).

Akter, S. Australia’s Black Summer wildfires recovery: a difference-in-differences analysis using nightlights. Glob. Environ. Chang. 83 , 102743 (2023).

Brevoort, K. P., Grimm, P. & Kambara, M. Credit Invisibles and the unscored. Cityscape 18 , 9–34 (2016).

DeWaard, J., Johnson, J. & Whitaker, S. Internal migration in the United States: a comprehensive comparative assessment of the Consumer Credit Panel. Demogr. Res. 41, 953–1006 (2019).

Gross, E. Internal Revenue Service area-to-area migration data: strengths, limitations, and current uses. Stat. Income, SOI Bull. 25, 159 (2005).

Kang, Y. et al. Multiscale dynamic human mobility flow dataset in the U.S. during the COVID-19 epidemic. Sci. Data 7, 1–13 (2020).

McConnell, K. & Braneon, C. V. Post-wildfire neighborhood change: evidence from the 2018 Camp Fire. Landsc. Urban Plan. 247, 104997 (2024).

Méndez, M., Flores-Haro, G. & Zucker, L. The (in)visible victims of disaster: understanding the vulnerability of undocumented Latino/a and indigenous immigrants. Geoforum 116, 50–62 (2020).

Fussell, E. Hurricane chasers in New Orleans: Latino immigrants as a source of a rapid response labor force. Hispanic J. Behav. Sci. 31, 375–394 (2009).

Walker, K. E. tigris: an R package to access and work with geographic data from the US Census Bureau (2016).

Syphard, A. D. et al. The relative influence of climate and housing development on current and projected future fire patterns and structure loss across three California landscapes. Glob. Environ. Chang. 56, 41–55 (2019).

Alexandre, P. M. et al. The relative impacts of vegetation, topography and spatial arrangement on building loss to wildfires in case studies of California and Colorado. Landsc. Ecol. 31, 415–430 (2016).

Jarvis, A., Reuter, H. I., Nelson, A. & Guevara, E. Hole-filled SRTM for the globe Version 4. Available from the CGIAR-CSI SRTM 90m database (http://srtm.csi.cgiar.org) 15, 5 (2008).

Dewitz, J. National Land Cover Database (NLCD) 2016 Products: US Geological Survey data release (2019).

Gorelick, N. et al. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).

United States Department of Agriculture Economic Research Service. Rural-Urban Commuting Area Codes. https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes.aspx (2020).

Chen, J., Yang, S. T., Li, H. W., Zhang, B. & Lv, J. R. Research on geographical environment unit division based on the method of natural breaks (Jenks). Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XL-4-W3, 47–50 (2013).

Blair, G. et al. Package ‘estimatr’. Stat 7, 295–318 (2018).

Imai, K. & Kim, I. S. On the use of two-way fixed effects regression models for causal inference with panel data. Political Anal. 29, 405–415 (2021).

Kropko, J. & Kubinec, R. Interpretation and identification of within-unit and cross-sectional variation in panel data models. PLoS ONE 15, e0231349 (2020).

Hill, T. D., Davis, A. P., Roos, J. M. & French, M. T. Limitations of fixed-effects models for panel data. Soc. Perspect. 63, 357–369 (2020).

Imai, K., Kim, I. S. & Wang, E. H. Matching methods for causal inference with time-series cross-sectional data. Am. J. Political Sci. 67, 587–605 (2023).

Iacus, S. M., King, G. & Porro, G. Causal inference without balance checking: coarsened exact matching. Political Anal. 20, 1–24 (2012).

Raker, E. J. Natural hazards, disasters, and demographic change: the case of severe tornadoes in the United States, 1980–2010. Demography 57, 653–674 (2020).

Imbens, G. W. & Rubin, D. B. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (Cambridge University Press, 2015). https://doi.org/10.1017/CBO9781139025751.

Ho, D. E., Imai, K., King, G. & Stuart, E. A. MatchIt: nonparametric preprocessing for parametric causal inference. J. Stat. Softw. 42, 1–28 (2011). https://doi.org/10.18637/jss.v042.i08.

Acknowledgements

This work was supported by the U.S. National Science Foundation grant numbers 2001261 (K.M.), 2117405 (E.F., J.D., K.J.C.), and 1850871 (J.D., E.F., K.J.C.). K.M. and E.F. were supported by the Population Studies and Training Center at Brown University through the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P2C HD041020). K.J.C. was supported by the Center for Demography and Ecology at the University of Wisconsin-Madison through the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P2C HD047873) and the Wisconsin Agricultural Experiment Station. Further funding was provided by Earth Lab through the University of Colorado Boulder’s Grand Challenge Initiative and the Cooperative Institute for Research in Environmental Science (CIRES). The views expressed in this report are those of the authors and are not necessarily those of the Federal Reserve Bank of Cleveland, the Board of Governors of the Federal Reserve System, Equifax, the NSF, or the NIH. Thank you to Justin Farrell, Emily Sellars, and Karen Seto for feedback on this research.

Author information

Authors and affiliations.

Population Studies and Training Center, Brown University, Providence, RI, USA

Kathryn McConnell & Elizabeth Fussell

Department of Sociology, The University of British Columbia, Vancouver, BC, Canada

Kathryn McConnell

Institute at Brown for Environment and Society, Brown University, Providence, RI, USA

Elizabeth Fussell

Population Council, New York, NY, USA

Jack DeWaard

Center for Studies in Demography and Ecology, University of Washington, Seattle, WA, USA

Federal Reserve Bank of Cleveland, Cleveland, OH, USA

Stephan Whitaker

University of Wisconsin—Madison, Madison, WI, USA

Katherine J. Curtis

Earth Lab, University of Colorado Boulder, Boulder, CO, USA

Lise St. Denis & Jennifer Balch

University of Minnesota—Twin Cities, Minneapolis, MN, USA

Kobie Price

Contributions

K.M., E.F., J.D., and S.W. conceptualized and designed the study. K.M., S.W., and L.S. curated data. K.M. and S.W. wrote software and performed formal analysis. K.M. and E.F. wrote the original draft, and K.M., E.F., J.D., S.W., K.J.C., J.B., and K.P. contributed to review and editing of manuscript drafts. K.M. created visualizations. K.M., E.F., and J.D. supported funding acquisition.

Corresponding author

Correspondence to Kathryn McConnell .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Toddi Steelman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Peer Review File, Reporting Summary, Source Data, Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

McConnell, K., Fussell, E., DeWaard, J. et al. Rare and highly destructive wildfires drive human migration in the U.S. Nat. Commun. 15, 6631 (2024). https://doi.org/10.1038/s41467-024-50630-4

Received : 26 January 2024

Accepted : 15 July 2024

Published : 05 August 2024

DOI : https://doi.org/10.1038/s41467-024-50630-4

Overview data on migration flows and migrant populations

The database contains data on foreign and foreign-born population, migration flows, naturalisations and labour market outcomes.

This database provides tables with recent annual series on migration flows and stocks of foreign-born or foreigners in OECD countries as well as on acquisitions of nationality.

OECD International Migration Database

This series of tables relates to stocks and flows of immigrants for the period 2011-21 or 2012-22.

  • Inflows of foreign population
  • Outflows of foreign population
  • Inflows of asylum seekers
  • Stocks of foreign-born population
  • Stocks of foreign population
  • Acquisition of nationality

Metadata and more data by country of origin are available.

Labour market outcomes of immigrants

  • Employment rates by place of birth, 2002-2022
  • Evolution of the gender gap in employment rates, by place of birth, 2002-2022
  • More countries available

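Data Formulator: Easy data visualization with prompts

Editor's note: When using AI-driven data visualization tools, have you run into problems like these? To design a chart, you have to describe all of your visualization requirements at once, which is verbose and tedious; to change the chart design, you have to re-enter your entire text prompt from scratch, and the AI may still get it wrong. The Deep Learning Group at Microsoft Research Redmond has now released Data Formulator. By combining a graphical user interface with natural-language input, the tool handles visualization requests more intelligently: it supports generating charts by simple drag-and-drop, can create data concepts that do not exist in the original data, and lets you iterate on charts at every stage through prompts.

Data Formulator is now open source. Try it out and create a visualization with ease!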

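Data visualization is an important way to analyze data and spark insight. Turning data into a chart usually involves preprocessing the data with data-processing tools, mapping the data to graphics with a visualization tool, and then adjusting or refining the design based on the resulting chart. With advances in large language models, AI-driven visualization tools have made this authoring process simpler. For example, when a user describes a visualization design in natural language, the model can automatically generate code to carry it out, greatly reducing the work of data transformation and visualization.

However, using AI effectively to help data analysts author visualizations still poses many challenges. In particular, visualization authoring is an iterative process that rarely succeeds on the first attempt: every design update requires reprocessing the data and redrawing the chart. Most current AI visualization tools require authors to describe their complete visualization requirements in a single text prompt, which is not only long-winded but also makes it hard to convey rich visual information precisely. Moreover, each iteration on the design requires describing the requirements again from scratch, which is time-consuming and laborious, and the AI may not complete the task correctly on the first try.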

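Data Formulator: combining GUI interaction with natural-language descriptions so users can communicate visualization designs to AI more effectively

To address the challenge of iterative visualization design, researchers at Microsoft Research Redmond released Data Formulator, an AI-powered open-source visualization tool, on GitHub. By combining a graphical user interface with natural-language input, Data Formulator greatly improves users' ability to convey visualization designs to the AI during iteration, enabling the AI to create and update complex visualizations step by step according to the user's instructions.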

Data Formulator

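GitHub: https://github.com/microsoft/data-formulator

Try it: https://github.com/microsoft/data-formulator/blob/main/CODESPACES.md

Paper: https://arxiv.org/abs/2408.16119

As shown in Figure 1, the Data Formulator interface is thoughtfully designed. When conceiving a visualization, users first describe their design goal in the Concept Encoding Shelf on the right. During iteration, users can review earlier visualizations in the Data Threads panel on the left, choose a suitable path to follow up on, and then describe a new visualization goal or make fine-grained adjustments.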

Figure 1: The Data Formulator user interface

• Describing visualization designs with the Concept Encoding Shelf

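The Concept Encoding Shelf combines the shelf configuration UI of traditional graphical visualization tools with the natural-language input of AI tools, letting users describe visualization goals more intuitively. After selecting a chart type (such as a bar chart or line chart), users can drag data columns onto the corresponding visual channels (such as the x-axis, y-axis, or color) to map data directly to graphics. Compared with a lengthy text description, this conveys the chart's design intent more intuitively and precisely.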
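To make the shelf idea concrete, the sketch below shows how a chart type plus a channel-to-field mapping could be turned into a Vega-Lite specification. It is an illustration under stated assumptions, not Data Formulator's code: the function `instantiate_template`, the `MARKS` table, and the field names `year`, `sustainable_pct`, and `country` are hypothetical; only the Vega-Lite keys (`$schema`, `mark`, `encoding`, `field`, `type`) follow the public Vega-Lite format.

```python
import json

# Hypothetical chart-type -> Vega-Lite mark table (an assumption, not the
# template table Data Formulator actually uses).
MARKS = {"bar": "bar", "line": "line", "scatter": "point"}

def instantiate_template(chart_type, encodings, field_types):
    """Build a Vega-Lite spec from a chart type and a channel -> field mapping.

    encodings:   e.g. {"x": "year", "y": "sustainable_pct", "color": "country"}
    field_types: Vega-Lite measurement types per field, e.g. {"year": "ordinal"}
    """
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": MARKS[chart_type],
        "encoding": {},
    }
    for channel, field in encodings.items():
        spec["encoding"][channel] = {
            "field": field,
            # Fall back to "nominal" when no type was declared for the field.
            "type": field_types.get(field, "nominal"),
        }
    return spec

spec = instantiate_template(
    "line",
    {"x": "year", "y": "sustainable_pct", "color": "country"},
    {"year": "ordinal", "sustainable_pct": "quantitative"},
)
print(json.dumps(spec, indent=2))
```

Dragging a column onto a shelf corresponds to adding one `channel: field` pair; the field need not exist in the data yet, which is what lets the tool accept new data concepts.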

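In addition, what sets Data Formulator apart is that it lets users add, through natural language, data concepts that do not exist in the original data to the visual mapping. This removes the constraints of the existing data format and enables deeper, richer visualization designs.

As shown in Figure 2, a user can place the data concept "sustainable energy percentage" on the y-axis (even though the original data contains only consumption values for each energy source, not percentages), and Data Formulator will automatically decide how to transform the original data and compute the column needed for the visualization. Similarly, to see how countries rank by sustainable energy percentage, the user can add a "rank" field to the y-axis along with the additional natural-language description "compute the rank of different countries" to guide Data Formulator through the corresponding visualization.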
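The "sustainable energy percentage" and "rank" concepts are not columns in the input, so some transformation has to derive them. The sketch below shows one plausible derivation in plain Python, assuming a long-format table with hypothetical columns `country`, `year`, `source`, and `value`, and assuming that hydro, solar, and wind count as sustainable. In Data Formulator such code would be generated by the model; these functions are illustrative only.

```python
from collections import defaultdict

# Assumed definition of "sustainable"; the real definition is up to the user.
SUSTAINABLE = {"hydro", "solar", "wind"}

def sustainable_percentage(rows):
    """rows: dicts with country, year, source, value (consumption).
    Returns {(country, year): percent of consumption from sustainable sources}."""
    total = defaultdict(float)
    green = defaultdict(float)
    for r in rows:
        key = (r["country"], r["year"])
        total[key] += r["value"]
        if r["source"] in SUSTAINABLE:
            green[key] += r["value"]
    return {k: 100.0 * green[k] / total[k] for k in total if total[k] > 0}

def rank_by_year(pct):
    """Rank countries within each year; 1 = highest sustainable share."""
    by_year = defaultdict(list)
    for (country, year), p in pct.items():
        by_year[year].append((p, country))
    ranks = {}
    for year, items in by_year.items():
        # Sort by descending percentage; ties are broken alphabetically.
        items.sort(key=lambda t: (-t[0], t[1]))
        for i, (_, country) in enumerate(items, start=1):
            ranks[(country, year)] = i
    return ranks

rows = [
    {"country": "A", "year": 2020, "source": "coal", "value": 60.0},
    {"country": "A", "year": 2020, "source": "wind", "value": 40.0},
    {"country": "B", "year": 2020, "source": "gas", "value": 30.0},
    {"country": "B", "year": 2020, "source": "hydro", "value": 70.0},
]
pct = sustainable_percentage(rows)
ranks = rank_by_year(pct)
print(pct)    # {('A', 2020): 40.0, ('B', 2020): 70.0}
print(ranks)  # {('B', 2020): 1, ('A', 2020): 2}
```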

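Figure 2: Data Formulator lets users add data concepts that do not exist in the data and performs the transformation itself

• Iterating on visualizations with Data Threads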

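To iterate on an existing chart, users can convey the iteration instruction in natural language directly in the Concept Encoding Shelf (or modify the earlier data mapping). For example, given the input "show the sustainable energy percentage for only the top five countries," Data Formulator processes the data further and carries out the iteration without the user having to re-describe the whole workflow. This significantly reduces the user's input burden.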
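One way to picture why an iteration such as "only the top five countries" needs no restatement is to treat the derivation as a list of steps and append one new step to the steps that produced the current chart. This is a hypothetical model for illustration (the names `sort_by_rank`, `top_n`, and `run` are invented here), not how Data Formulator represents its transformations internally.

```python
def sort_by_rank(rows):
    """Existing step: order rows by their precomputed rank."""
    return sorted(rows, key=lambda r: r["rank"])

def top_n(n):
    """New step for the instruction 'only the top n countries'."""
    def step(rows):
        return [r for r in rows if r["rank"] <= n]
    return step

def run(pipeline, rows):
    """Apply each step in order to the table."""
    for step in pipeline:
        rows = step(rows)
    return rows

ranks = {"C": 3, "G": 7, "A": 1, "E": 5, "B": 2, "F": 6, "D": 4}
ranked = [{"country": c, "rank": r} for c, r in ranks.items()]

previous_pipeline = [sort_by_rank]               # steps behind the current chart
new_pipeline = previous_pipeline + [top_n(5)]    # iteration appends, never restates
result = run(new_pipeline, ranked)
print([r["country"] for r in result])  # ['A', 'B', 'C', 'D', 'E']
```

Building `new_pipeline` with `+` leaves `previous_pipeline` intact, so the earlier chart can still be revisited, which is the behavior Data Threads exposes in the UI.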

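Figure 3: Data Formulator supports iterating on an existing chart through natural-language instructions

To go back to an earlier visualization for a new analysis, users can browse the history of previous visualizations with Data Threads and pick a suitable node from which to continue. For example, if a user wants a bar chart showing how every country's rank changed from 2000 to 2020, they can return to the "sustainable energy percentage rank" chart and use the natural-language instruction "compare the change in rank of different countries between 2000 and 2020" to direct Data Formulator to analyze the historical data further and generate the desired chart.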
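The rank-change chart in this example needs a per-country difference between two years. A minimal sketch, assuming ranks are already available as a mapping from hypothetical `(country, year)` keys to rank numbers (for example, the output of a "sustainable energy percentage rank" derivation), with a positive value meaning the country moved up:

```python
def rank_change(ranks, start, end):
    """ranks: {(country, year): rank}, where 1 is the top rank.

    Returns {country: rank_at_start - rank_at_end} for countries ranked in
    both years, so a positive number means the country climbed.
    """
    in_start = {c for (c, y) in ranks if y == start}
    in_end = {c for (c, y) in ranks if y == end}
    return {c: ranks[(c, start)] - ranks[(c, end)] for c in sorted(in_start & in_end)}

ranks = {
    ("A", 2000): 3, ("A", 2020): 1,
    ("B", 2000): 1, ("B", 2020): 2,
    ("C", 2000): 2, ("C", 2020): 3,
}
print(rank_change(ranks, 2000, 2020))  # {'A': 2, 'B': -1, 'C': -1}
```

A bar chart would then map the country to the x-axis and this difference to the y-axis.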

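Figure 4: Data Formulator supports iterating on earlier charts through natural-language instructions

LLM code generation connects the visualization and data transformation modules

Data Formulator's design strategy separates data transformation from visualization to improve the accuracy with which the large model carries out tasks, and uses the model's code-generation ability to bridge the two stages. As shown in Figure 5, once the user sets a visualization goal, Data Formulator first instantiates a chart template from the GUI inputs and generates a Vega-Lite specification. Because the user's input contains new data concepts, Data Formulator must transform the data to create the visualization. To do so, it turns the user's input into a prompt for the large model, instructing it to generate Python code that transforms the data to meet the needs of the Vega-Lite specification.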
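The step where new data concepts trigger code generation can be sketched as follows: compare the fields referenced in the Vega-Lite encoding with the columns of the input table and, if any are missing, assemble a prompt asking the model for a transformation. The prompt wording and the names `missing_concepts` and `build_prompt` are assumptions for illustration; Data Formulator's actual prompts are not reproduced here.

```python
def missing_concepts(spec, columns):
    """Fields referenced in the spec's encoding that are not input columns."""
    fields = [enc["field"] for enc in spec["encoding"].values()]
    return [f for f in fields if f not in columns]

def build_prompt(spec, columns, instruction=""):
    """Return a prompt for the model, or None if no transformation is needed."""
    new_fields = missing_concepts(spec, columns)
    if not new_fields:
        return None  # every encoded field already exists; plot the data as-is
    fields = [enc["field"] for enc in spec["encoding"].values()]
    parts = [
        "Input table columns: " + ", ".join(columns) + ".",
        "Write Python code that returns a table with fields: " + ", ".join(fields) + ".",
        "Fields to derive: " + ", ".join(new_fields) + ".",
    ]
    if instruction:
        parts.append("User instruction: " + instruction)
    return "\n".join(parts)

spec = {"mark": "bar", "encoding": {"x": {"field": "country", "type": "nominal"},
                                    "y": {"field": "rank", "type": "ordinal"}}}
columns = ["country", "year", "source", "value"]
prompt = build_prompt(spec, columns, "compute the rank of different countries")
print(prompt)
```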

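Figure 5: Data Formulator architecture

Once the data transformation is complete, Data Formulator combines the processed data with the visualization code to produce the final visualization. When the user chooses to iterate on an earlier visualization, Data Formulator generates new code from the existing code, which reduces uncertainty in code generation and completes the user's task more effectively.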
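The final assembly step can be illustrated by inlining the transformed rows into the specification through Vega-Lite's `data.values` property, after checking that every encoded field is present. `attach_data` is a hypothetical helper written for this sketch; Data Formulator may supply data to the chart differently.

```python
import copy

def attach_data(spec, rows):
    """Return a copy of a Vega-Lite spec with transformed rows inlined as data."""
    fields = {enc["field"] for enc in spec["encoding"].values()}
    for row in rows:
        missing = fields - set(row)
        if missing:
            # The generated transformation did not produce what the chart needs.
            raise ValueError(f"transformed data lacks fields: {sorted(missing)}")
    full = copy.deepcopy(spec)        # leave the template spec untouched
    full["data"] = {"values": rows}   # Vega-Lite inline data
    return full

spec = {"mark": "bar", "encoding": {"x": {"field": "country", "type": "nominal"},
                                    "y": {"field": "rank", "type": "ordinal"}}}
chart = attach_data(spec, [{"country": "A", "rank": 1}, {"country": "B", "rank": 2}])
print(chart["data"]["values"][0])  # {'country': 'A', 'rank': 1}
```

Validating the fields before rendering gives a clear point at which a faulty generated transformation can be detected and regenerated.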

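In their Data Formulator research, the researchers set out to combine graphical interaction with AI natural-language input, hoping to help users express their visualization goals more effectively. As large models continue to advance, however, enabling users to convey their intent concisely and precisely becomes especially critical, and this is one of the directions the researchers plan to explore in the future.

You are welcome to try creating visualizations with Data Formulator in GitHub Codespaces!

Link: https://github.com/microsoft/data-formulator/blob/main/CODESPACES.md

Developers can build new features on top of the Data Formulator codebase.

Data Formulator codebase: https://github.com/microsoft/data-formulator

To learn more about the design ideas behind Data Formulator, see the paper: https://arxiv.org/abs/2408.16119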

COMMENTS

  1. Key Opportunities and Challenges of Data Migration in Cloud: Results

    Choosing the right vendor: Data management and data migration are essential research challenges, and it is never as simple as moving information from legacy to cloud [20]. Even after the SWOT (Strength, Weakness, Opportunities, and Threats) analysis, it is not trivial for an organization to choose a suitable cloud provider. ...

  2. Strategies For Migrating Large, Mission-Critical Database Workloads To

    Abstract. This comprehensive research paper explores strategies for migrating large, mission-critical database workloads to cloud environments. By examining various aspects of the migration process, including assessment and planning, data migration strategies, security considerations, performance optimization, and post-migration management ...

  3. (PDF) Data Migration Need, Strategy, Challenges, Methodology

    Data Migration is a multi-step process that begins with analyzing old data and culminates in data uploading and reconciliation in new...

  4. Evaluating cloud database migration options using workload models

    A key challenge in porting enterprise software systems to the cloud is the migration of their database. Choosing a cloud provider and service option (e.g., a database-as-a-service or a manually configured set of virtual machines) typically requires the estimation of the cost and migration duration for each considered option. Many organisations also require this information for budgeting and ...

  5. (PDF) Key Opportunities and Challenges of Data Migration in Cloud

    Cloud data migration is the procedure of moving information, localhost applications, services, and data to the distributed cloud computing...

  6. A Review on Database Migration Strategies, Techniques and Tools

    In a paper, reviewing database migration strategies, tools and techniques, Elamparithi, M and Anuratha, V singled out relational database migration (RDM) as an example. The authors stated that ...

  7. Evidences from the Literature on Database Migration to the Cloud

    The evidences identified in the selected papers provide a panoramic view of six strategies reported to migrate databases from legacy systems to the cloud. These strategies have different goals depending on the migration needs, constraints and resources available (Table 65.1). Table 65.1 Selected papers.

  8. Cloud migration process—A survey, evaluation framework, and open

    This paper is structured as follows: In Section 2, we give a general review of terms related to the cloud migration, key challenges that need to be addressed in a typical migration process, and the related work to this paper. Section 3 describes proposed evaluation framework designed for the purpose of this paper.

  9. Metamodels to support database migration between heterogeneous data

    The core problems of database migration are the data integrity, data accuracy and business continuity. We discussed these problems during heterogeneous database migration in this article. We designed and implemented a migration project for Tsinghua ...

  10. Challenges in migrating legacy software systems to the cloud

    Cloud migration research. In the current research, ... If the migration is from an SQL-based database for example to a SQL based service database, then migration is easy. If migration involves the alteration of the data model (e.g. to a NoSQL solution), then significant modifications must be performed and more importantly to check the ...

  11. PDF Opportunities and Challenges of Data Migration in Cloud

    The above research observes cloud migration as a 'black box' by not narrowing its core interest down on key operational challenges engaged with the migration cycle. Moreover, our examination is a response to experimentally explore how the migration process of data to the cloud is conducted; and subsequently, the challenges and possible

  12. Without Data Quality, There Is No Data Migration

    Data migration is required to run data-intensive applications. Legacy data storage systems are not capable of accommodating the changing nature of data. In many companies, data migration projects fail because their importance and complexity are not taken seriously enough. Data migration strategies include storage migration, database migration, application migration, and business process ...

  13. Lessons learned: on the challenges of migrating a research data

    The transfer of research data management from one institution to another infrastructural partner is all but trivial, but can be required, for instance, when an institution faces reorganization or closure. In a case study, we describe the migration of all research data, identify the challenges we encountered, and discuss how we addressed them. It shows that the moving of research data ...

  14. Database Migration on Premises to AWS RDS

    In this paper, we analyze and perform one such migration, from on-premises to AWS RDS, to support cloud migration that helps users with performance, cost, and scalability. For the past four decades, the traditional relational databases have been in use in Information Technology industry. There was a phenomenal conversion in the IT industry in ...

  15. Migrating a research data warehouse to a public cloud: challenges and

    Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, com ... Categories of underappreciated challenges that emerged during migration from on-premises to cloud data warehouse. Networking/Network security: Integration with enterprise networking. System security plan ...

  16. Key opportunities and challenges for the use of big data in migration

    Title: Key opportunities and challenges for the use of big data in migration research and policy. Authors: Lydia H.V. Franklinos, Rebecca Parrish, Rachel Burns, Andrea Caflisch, Bishawjit Mallick, Taifur Rahman, Vasileios Routsis, Ana Sebastián López, Andrew J. Tatem, Robert Trigwell. -- We thank the editors and the three reviewers for their ...

  17. PDF Data Migration Research Study

    Data Migration Research Study - Experian

  18. Database migration: Concepts and principles (Part 1)

    The core technical building block of a database migration system is the data migration process. The data migration process is specified by a developer and defines the source databases from which data is extracted, the target databases into which data is migrated, and any data modification logic applied to the data during the migration.

  19. Computational approaches to migration and integration research

    A brief history of computational approaches to migration and integration research. Computational social science is a dynamic field. While its rise to prominence has accelerated in the last decade following landmark articles that aimed to define and further establish the field (Lazer et al. 2009; Conte et al. 2012), it actually builds on a longer tradition of research that ...

  20. Research and Practice of university database Migration

    We make an investigation into the approach and technique used in database Migration in this paper. Different database migrations such as Oracle, SqlServer are discussed. As an example, we design and implement a migration project for Tsinghua University. This project migrates Oracle RAC into a different platform completely and successfully. At the same time, it will shorten the switch time and ...

  21. (PDF) Data Migration

    Abstract. This document gives an overview of all the processes involved in Data Migration. Data Migration is a multi-step process that begins with an analysis of the legacy data and culminates in ...

  22. Immigration & Migration

    How the origins of America's immigrants have changed since 1850. In 2022, the number of immigrants living in the U.S. reached a high of 46.1 million, accounting for 13.8% of the population. Short reads, Jul 22, 2024.

  23. Analysis and mapping of global research publications on migrant

    Recognizing the importance of evidence-based research in informing migration policies and empowering migrant domestic workers (MDWs), this study aims to provide a comprehensive analysis of MDW research patterns and trends. Using a descriptive cross-sectional study design, research articles on MDWs were retrieved from the Scopus database. The findings reveal a substantial increase in research ...

  24. Journals & Databases

    The Cochrane Library is a regularly updated collection of evidence-based medicine databases that brings together relevant research on the effectiveness of healthcare treatments and interventions. It can be used to inform healthcare decision-making for hundreds of medical conditions, plus related topics such as injury prevention.

  25. Rare and highly destructive wildfires drive human migration in ...

    We paired these data with migration estimates from the Federal Reserve Bank of New York/Equifax Consumer Credit Panel, which has been minimally used for migration research but offers improved ...

  26. Overview data on migration flows and migrant populations

    The database contains data on foreign and foreign-born population, migration flows, naturalisations and labour market outcomes. ... Reports and research papers. Research and working papers with deep dives and findings. Policy papers and briefs. Policy recommendations and case studies. Featured publications

  27. REVIEW OF CLOUD DATABASE BENEFITS AND CHALLENGES

    Amazon Web Services (AWS), Microsoft Azure and Google Cloud are the top cloud computing providers (Bajpai, 2023). In Q1 2023, AWS revenue increased by 20% year over year to $21.4B, Intelligent ...

  28. Full article: At risk or resilient? Examining the effects of having a

    Previous research also indicates that boys and girls with a migration background face different challenges (Stevens et al., 2015), and that the impact of migration can vary between early and late adolescence (Tartakovsky, 2009). Consequently, this study will empirically examine whether differences based on migration background ...

  29. FBI Releases 2023 Crime in the Nation Statistics

    The FBI released detailed data on over 14 million criminal offenses for 2023 reported to the Uniform Crime Reporting (UCR) Program by participating law enforcement agencies. More than 16,000 state ...

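  30. Data Formulator: Easy data visualization with prompts

    Figure 4: Data Formulator supports iterating on earlier charts through natural-language instructions. LLM code generation connects the visualization and data transformation modules. Data Formulator's design strategy separates data transformation from visualization to improve the accuracy with which the large model carries out tasks, and uses the model's code-generation ability to bridge the two stages.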
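Item 18 describes a database migration process as the specification of source databases, target databases, and any data modification logic applied in between, and item 9 names data integrity as a core problem of migration. A minimal sketch of that process is shown below, assuming two SQLite databases with identical, already-created schemas and an initially empty target table. `migrate` and the `customers` table are hypothetical, and the row-count comparison is only a basic integrity check, not a full reconciliation.

```python
import sqlite3

def migrate(source, target, table, columns, transform=lambda row: row):
    """Copy `columns` of `table` from source to target, applying `transform` per row.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {cols} FROM {table}").fetchall()
    with target:  # commits on success, rolls back if an insert fails
        target.executemany(
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
            (transform(r) for r in rows),
        )
    # Basic integrity check; assumes the target table started out empty.
    (n_src,) = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    (n_dst,) = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if n_src != n_dst:
        raise RuntimeError(f"row count mismatch: {n_src} != {n_dst}")
    return n_dst

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "A@X.COM"), (2, "b@y.org")])
dst.execute("CREATE TABLE customers (id INTEGER, email TEXT)")

# Modification logic: normalize e-mail addresses to lowercase during the move.
n = migrate(src, dst, "customers", ["id", "email"], transform=lambda r: (r[0], r[1].lower()))
print(n)  # 2
print(dst.execute("SELECT email FROM customers ORDER BY id").fetchall())
```

Keeping the modification logic in a separate `transform` function mirrors the separation in item 18 between where data comes from, where it goes, and how it is changed on the way.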