
Peer Review Assignment - Data Engineer - ETL ¶

Objectives ¶

The goal of this project is to generate a list of the largest banks, ranked by market capitalization, and represented in British Pounds.

In this Data Engineering Python project I will:

  • Extract currency exchange rates and bank market cap data from disparate sources
  • Transform the market cap currency using the exchange rate data
  • Load the transformed data into a separate CSV

1. Preparation ¶

For this lab, we are going to be using Python and several Python libraries. The following are lines of code that will install the necessary libraries. They are commented out so that no time or resources are spent reinstalling libraries that are already present.

a. Imports ¶

Import the libraries needed to complete the project.

2. Functions ¶

a. JSON Extract Function ¶

This function extracts data from a JSON file. Its argument is the JSON file that we want to convert, and its output is a Pandas DataFrame holding the file's contents.
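A minimal sketch of such a function, assuming the file is laid out so that pandas.read_json can parse it directly (for example, a list of records). The name extract_from_json is illustrative and is not taken from the notebook.

```python
import pandas as pd


def extract_from_json(file_to_process):
    """Read a JSON file and return its contents as a DataFrame."""
    # read_json accepts a file path and infers columns from the record keys.
    return pd.read_json(file_to_process)
```

Calling extract_from_json("bank_market_cap.json") would then return one row per record, provided a file with that name exists in the working directory.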

b. Extract Functions ¶

I am creating two extract functions. The first, extract_table(), will use web scraping to extract a table from a Wikipedia webpage. The desired table is a list of the largest banks by market capitalization. The figures come from company annual reports and financial statements, as provided by Relbanks.com and dated 1 July 2019. After the function executes, the data is returned as a Pandas DataFrame.
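One way to sketch the scraping step with only the standard library and Pandas is shown here. It is an illustration under stated assumptions, not the notebook's own code: it takes the first table on the page, treats the first row as the header, and does not handle nested tables. FirstTableParser and table_from_html are hypothetical helper names, and the page URL is left as a parameter because the source does not give it.

```python
from html.parser import HTMLParser
import urllib.request

import pandas as pd


class FirstTableParser(HTMLParser):
    """Collect the cell text of the first <table> element in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row being built, or None outside a <tr>
        self._cell = None       # text fragments of the current cell
        self._in_table = False
        self._done = False      # set once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True


def table_from_html(html):
    """Parse the first table in an HTML string into a DataFrame."""
    parser = FirstTableParser()
    parser.feed(html)
    header, *body = parser.rows   # assumes the first row holds the column names
    return pd.DataFrame(body, columns=header)


def extract_table(url):
    """Download a page and return its first table as a DataFrame."""
    # The User-Agent value is an arbitrary placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "etl-notebook-example"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8")
    return table_from_html(html)
```

Every cell comes back as a string, so a numeric column such as the market cap would need pd.to_numeric before any arithmetic.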

The second extract function, extract_rate(), requests exchange rates from a public API and saves them as a .csv file in the notebook directory. The file lists the conversion factors for converting from Euros to other currencies. Each row is indexed by the three-letter code of the target currency, and its value is the conversion factor.
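A hedged sketch of the API step follows. The source does not name the API or its response format, so two things here are assumptions: the endpoint is passed in as a url parameter, and the JSON response is assumed to carry a "rates" object that maps currency codes to factors. The helper rates_to_frame and the column name Rates are also illustrative.

```python
import json
import urllib.request

import pandas as pd


def rates_to_frame(payload):
    """Turn {"rates": {"USD": ..., "GBP": ...}} into a one-column DataFrame."""
    # The index holds the three-letter currency codes.
    return pd.Series(payload["rates"], name="Rates").to_frame()


def extract_rate(url, csv_path="exchange_rates.csv"):
    """Fetch the rates from the API, save them as a CSV, and return them."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    rates = rates_to_frame(payload)
    rates.to_csv(csv_path)   # the index is kept so each row is labelled by currency
    return rates
```

The saved file can later be read back with pd.read_csv("exchange_rates.csv", index_col=0), which restores the currency codes as the index.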

1. Extract with Web Scraping ¶

2. Extract with API ¶

c. Transform Function ¶

Using exchange_rate and the exchange_rates.csv file, find the exchange rate from USD to GBP and transform the web-scraped table. Write a transform function that:

  • Changes the Market Cap (US$ Billion) column from USD to GBP
  • Rounds the Market Cap (US$ Billion) column to 3 decimal places
  • Renames Market Cap (US$ Billion) to Market Cap (GBP$ Billion)

(Note: The base currency for the exchange rates is Euros, since that is what the free API provides. To complete the conversion, the USD column of the web-scraped table must first be converted from USD to Euros, then from Euros to British Pounds.)
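The two-step conversion in the note reduces to a single factor: dividing by the Euro-to-USD rate gives Euros, and multiplying by the Euro-to-GBP rate gives Pounds. A minimal sketch, assuming the rates are held in a DataFrame indexed by currency code with a single column that I call Rates here (a hypothetical name):

```python
import pandas as pd

USD_COL = "Market Cap (US$ Billion)"
GBP_COL = "Market Cap (GBP$ Billion)"


def transform(df, rates):
    """Convert the market cap column from USD to GBP via the Euro base rates."""
    # USD -> EUR is a division by the EUR->USD rate,
    # EUR -> GBP is a multiplication by the EUR->GBP rate.
    usd_to_gbp = rates.loc["GBP", "Rates"] / rates.loc["USD", "Rates"]
    out = df.copy()   # leave the extracted table untouched
    out[USD_COL] = (out[USD_COL] * usd_to_gbp).round(3)
    return out.rename(columns={USD_COL: GBP_COL})
```

With made-up rates of 1.25 USD and 0.75 GBP per Euro, the factor is 0.6, so a market cap of 100 in USD becomes 60 in GBP.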

d. Load Function ¶

Create a function that takes a DataFrame and loads it to a CSV file named bank_market_cap_gbp.csv. Make sure to set index to False.

bank_market_cap_gbp.csv will be passed as the argument for targetfile when the ETL process is run later.
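A minimal sketch of the load step. The parameter name targetfile comes from the text above, while data_to_load is my own illustrative name.

```python
import pandas as pd


def load(targetfile, data_to_load):
    """Write the DataFrame to a CSV file without the row index."""
    data_to_load.to_csv(targetfile, index=False)
```

Setting index=False keeps the 0, 1, 2, ... row labels out of the file, so the CSV contains only the bank name and market cap columns.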

e. Logging Function ¶

Write the logging function log to log your data.

At the beginning and end of each phase in the ETL process, a log entry will be generated and appended to the log file. If we were to run into an error in the future, we could check the log file to verify whether all phases were executed. If not, we would have a starting point for debugging the pipeline.
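A minimal sketch of such a logging function. The file name logfile.txt is an assumption, and the timestamp format is chosen to match entries of the form 2022-Jan-16 21:31:08.

```python
from datetime import datetime


def log(message, logfile="logfile.txt"):
    """Append a timestamped entry to the log file."""
    timestamp = datetime.now().strftime("%Y-%b-%d %H:%M:%S")
    # Mode "a" appends, so earlier entries are preserved between runs.
    with open(logfile, "a") as f:
        f.write(f"{timestamp}, {message}\n")
```

Because the file is opened in append mode, repeated runs accumulate in the same log, which is what makes it useful for checking which phases completed.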

3. Running the ETL Process ¶

Here is where the ETL process will begin execution, calling functions that are defined above.

Log the start of the process with the entries "ETL Job Started" and "Extract phase Started".
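The order of calls and log entries for the whole run can be sketched as a single driver function. This is an illustration rather than the notebook's own code: run_etl is a hypothetical name, and the phase functions are passed in as arguments so that the sketch stays self-contained.

```python
def run_etl(extract, transform, load, log):
    """Run the three phases in order, logging the start and end of each."""
    log("ETL Job Started")

    log("Extract phase Started")
    data = extract()
    log("Extract phase Ended")

    log("Transform phase Started")
    data = transform(data)
    log("Transform phase Ended")

    log("Load phase Started")
    load(data)
    log("Load phase Ended")

    log("ETL Job Completed")
    return data
```

If a phase raises an exception, its "Ended" entry is never written, so the last line of the log points to the phase that failed.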

a. Extract ¶

Question 2: Use the function extract and print the first 5 rows, then take a screenshot:

                                      Name  Market Cap (US$ Billion)
0                           JPMorgan Chase                   488.470
1                          Bank of America                    401.75
2  Industrial and Commercial Bank of China                   250.383
3                              Wells Fargo                    224.87
4                  China Construction Bank                   257.399

Log the data as "Extract phase Ended"

b. Transform ¶

In this phase the data will be converted from USD to GBP and rounded to 3 decimal places, then the column will be renamed to reflect the new currency.

Log the following "Transform phase Started"

Question 3: Use the function transform and print the first 5 rows of the output, then take a screenshot:

                                      Name  Market Cap (GBP$ Billion)
0                           JPMorgan Chase                    357.199
1                          Bank of America                    293.784
2  Industrial and Commercial Bank of China                    183.095
3                              Wells Fargo                    164.439
4                  China Construction Bank                    188.226

Log your data "Transform phase Ended"

c. Load ¶

The cleaned data will be saved as a .csv file that can be used by Data Analysts or Data Scientists in the future. The data is now formatted in a way that will provide value to downstream users.

Log the following "Load phase Started".

Call the load function.

Log the following "Load phase Ended".

Here I will check the log file to verify that the phases were executed and logged correctly.

2022-Jan-16 21:31:08, ----- Document Header -----
0 2022-Jan-16 21:31:08, ETL Job Started
1 2022-Jan-16 21:31:08, Extract Phase Started
2 2022-Jan-16 21:31:11, Extract Phase Ended
3 2022-Jan-16 21:31:12, Transform Phase Started
4 2022-Jan-16 21:31:13, Transform Phase Ended
5 2022-Jan-16 21:31:14, Load Phase Started
6 2022-Jan-16 21:31:16, Load Phase Ended
7 2022-Jan-16 21:31:16, ETL Job Completed
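A check like this can be done by reading the log back into a DataFrame. The sketch below assumes each entry is one line of the form "timestamp, message" with no further commas in the message. The function name read_log and the column names Timestamp and Message are illustrative.

```python
import pandas as pd


def read_log(logfile):
    """Load the log file into a DataFrame with one row per entry."""
    return pd.read_csv(
        logfile,
        header=None,                      # the file has no header row
        names=["Timestamp", "Message"],
        skipinitialspace=True,            # drop the space after the comma
    )
```

Passing header=None with explicit names keeps the first log entry as data instead of letting it be consumed as the column header.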

There is now a bank_market_cap_gbp.csv file in this notebook directory that holds the transformed data in a structured format. The first step was to extract the list of the largest banks by market cap by web scraping a table from a Wikipedia page. Then, exchange rate data was extracted from an API and saved as a CSV file. The base data was transformed by converting the market cap values from USD to GBP. With the data in the correct currency, it was loaded to a new CSV file.

Now that the data is structured and contains valuable information, it can be processed by downstream users like Data Analysts, Data Scientists, or various company departments.

Author ¶

Giovanni Harold



