Peer Review Assignment - Data Engineer - ETL
Objectives
The goal of this project is to generate a list of the largest banks, ranked by market capitalization, and represented in British Pounds.
In this Data Engineering Python project I will:
- Extract currency exchange rates and bank market cap data from disparate sources
- Transform the market cap currency using the exchange rate data
- Load the transformed data into a separate CSV
1. Preparation
For this lab, we use Python and several Python libraries. The following lines of code install the necessary libraries. They are commented out to avoid spending resources reinstalling libraries that are already installed.
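As an alternative to commenting the install lines out, a minimal sketch (an assumption, not the notebook's own code) can check which packages are already importable, so that only the missing ones need installing. The package list below is a hypothetical example; note that an import name can differ from its pip name (for example, bs4 is installed as beautifulsoup4).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when no module with that name is importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list of packages this notebook might need.
    required = ["pandas", "requests"]
    for name in missing_packages(required):
        print(f"Not installed: {name} (try: pip install {name})")
```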
a. Imports
Import the libraries needed to complete the project.
2. Functions
a. JSON Extract Function
This function reads a JSON file into a Pandas DataFrame. Its argument is the path to the JSON file, and it returns the resulting DataFrame.
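A minimal sketch of such a function, assuming pandas is installed. The name extract_from_json and its parameter name are hypothetical; pd.read_json accepts a file path and infers the JSON layout (for example, a list of records becomes one row per record).

```python
import pandas as pd


def extract_from_json(file_to_process):
    """Read a JSON file and return its contents as a pandas DataFrame."""
    # read_json infers the orientation (list of records, dict of columns, ...)
    dataframe = pd.read_json(file_to_process)
    return dataframe
```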
b. Extract Functions
I am creating two extract functions. The first, extract_table(), uses web scraping to extract a table from a Wikipedia page. The desired table is a list of the largest banks by market capitalization. The data comes from company annual reports and financial statements compiled by Relbanks.com, dated 1 July 2019. The function returns the table as a Pandas DataFrame.
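The sketch below shows one way extract_table() could work, using only the standard library's html.parser to collect table cells and pandas to build the DataFrame. The helper parse_table, the table_index parameter and the User-Agent header are assumptions. The parser is deliberately simple: it expects explicit closing td/th/tr tags and does not handle nested tables or merged cells. Cell values come back as strings, so numeric columns need converting (for example with pd.to_numeric) before any arithmetic. pandas.read_html is a common one-line alternative, but it needs an extra parser library such as lxml.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

import pandas as pd


class _TableParser(HTMLParser):
    """Collect the text of every th/td cell, grouped by tr, for each table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html, table_index=0):
    """Turn the table at `table_index` into a DataFrame (first row = header)."""
    parser = _TableParser()
    parser.feed(html)
    rows = parser.tables[table_index]
    return pd.DataFrame(rows[1:], columns=rows[0])


def extract_table(url, table_index=0):
    """Download `url` and return one of its HTML tables as a DataFrame."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8")
    return parse_table(html, table_index)
```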
The second extract function, extract_rate(), pulls exchange rates from a public API and saves them as a .csv file in the working directory. The file lists conversion factors from Euros to other currencies: each row is indexed by the target currency's 3-letter code (for example, GBP), and its value is the conversion factor.
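A hedged sketch of extract_rate(). The API URL is not given here, so it is left as a parameter, and the response is assumed to be JSON shaped like {"base": "EUR", "rates": {"GBP": ..., "USD": ...}}; adjust the key names to whatever the chosen API actually returns. The helper rates_to_csv and the column name Rates are assumptions; the file name exchange_rates.csv comes from the transform step.

```python
import json
from urllib.request import urlopen

import pandas as pd


def rates_to_csv(payload, csv_path="exchange_rates.csv"):
    """Save the 'rates' mapping of an exchange-rate API response as a CSV.

    Assumes payload["rates"] maps 3-letter currency codes to the factor for
    converting 1 EUR into that currency.
    """
    rates = pd.Series(payload["rates"], name="Rates").to_frame()
    rates.index.name = "Currency"  # index = currency code
    rates.to_csv(csv_path)
    return rates


def extract_rate(url, csv_path="exchange_rates.csv"):
    """Fetch EUR-based exchange rates from a JSON API at `url` and save them."""
    with urlopen(url) as response:
        payload = json.load(response)
    return rates_to_csv(payload, csv_path)
```

Reading the file back with pd.read_csv("exchange_rates.csv", index_col=0) restores the currency codes as the index.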
1. Extract with Web Scraping
2. Extract with API
c. Transform Function
Using exchange_rate and the exchange_rates.csv file, find the USD-to-GBP exchange rate and transform the web-scraped table. Write a transform function that:
- Converts the Market Cap (US$ Billion) column from USD to GBP
- Rounds the Market Cap (US$ Billion) column to 3 decimal places
- Renames Market Cap (US$ Billion) to Market Cap (GBP$ Billion)
(Note: The exchange rates use the Euro as their base currency, because that is what the free API provides. The USD values in the web-scraped table therefore have to be converted from USD to Euros, and then from Euros to British Pounds.)
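A minimal sketch of transform(), assuming the rates CSV has currency codes as its index and a single column of EUR-based factors. Because both factors are quoted per Euro, dividing by the USD factor converts USD to EUR and multiplying by the GBP factor converts EUR to GBP, so the combined USD-to-GBP factor is r_GBP / r_USD. The rates_csv parameter is hypothetical.

```python
import pandas as pd

USD_COL = "Market Cap (US$ Billion)"
GBP_COL = "Market Cap (GBP$ Billion)"


def transform(df, rates_csv="exchange_rates.csv"):
    """Convert the USD market cap column to GBP via EUR-based exchange rates."""
    rates = pd.read_csv(rates_csv, index_col=0)
    usd_per_eur = float(rates.loc["USD"].iloc[0])
    gbp_per_eur = float(rates.loc["GBP"].iloc[0])
    # USD -> EUR (divide by usd_per_eur), then EUR -> GBP (multiply by gbp_per_eur)
    usd_to_gbp = gbp_per_eur / usd_per_eur

    out = df.copy()
    market_cap = pd.to_numeric(out[USD_COL])  # scraped cells may be strings
    out[USD_COL] = (market_cap * usd_to_gbp).round(3)
    return out.rename(columns={USD_COL: GBP_COL})
```

As a sanity check, the sample output in section 3 implies a USD-to-GBP factor of roughly 0.7313 (357.199 / 488.470).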
d. Load Function
Create a function that takes a DataFrame and loads it into a CSV file named bank_market_cap_gbp.csv. Make sure to set index to False.
bank_market_cap_gbp.csv will be passed as the argument for targetfile when the ETL process is run later.
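A minimal sketch of the load function. The parameter order (targetfile first, then the DataFrame) is an assumption; index=False keeps pandas from writing the row index as an extra unnamed column.

```python
import pandas as pd


def load(targetfile, data_to_load):
    """Write `data_to_load` to `targetfile` as CSV, without the index column."""
    data_to_load.to_csv(targetfile, index=False)
```

Usage would be load("bank_market_cap_gbp.csv", transformed_df), where transformed_df is the output of the transform step.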
e. Logging Function
Write a logging function, log, that records the progress of the ETL job.
At the beginning and end of each phase of the ETL process, a log entry is generated and appended to the log file. If the pipeline fails later, the log file shows which phases were executed, which gives a starting point for debugging.
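One way to write log, assuming a plain-text log file named logfile.txt (a hypothetical name). The timestamp format %Y-%b-%d %H:%M:%S reproduces the style of the log output in section 3 (for example 2022-Jan-16 21:31:08); %b is the locale's abbreviated month name.

```python
from datetime import datetime


def log(message, logfile="logfile.txt"):
    """Append '<timestamp>, <message>' as a new line of `logfile`."""
    timestamp_format = "%Y-%b-%d %H:%M:%S"  # Year-MonthAbbrev-Day Hour:Minute:Second
    timestamp = datetime.now().strftime(timestamp_format)
    with open(logfile, "a") as f:  # "a" appends, so earlier entries are kept
        f.write(f"{timestamp}, {message}\n")
```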
3. Running the ETL Process
This section runs the ETL process by calling the functions defined above.
Log the start of the job with the messages "ETL Job Started" and "Extract Phase Started".
a. Extract
Question 2: Use the extract function and print the first 5 rows. Take a screenshot:
| | Name | Market Cap (US$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 488.470 |
| 1 | Bank of America | 401.75 |
| 2 | Industrial and Commercial Bank of China | 250.383 |
| 3 | Wells Fargo | 224.87 |
| 4 | China Construction Bank | 257.399 |
Log the message "Extract Phase Ended".
b. Transform
In this phase the data is converted from USD to GBP and rounded to 3 decimal places, and then the column is renamed to reflect the correct currency.
Log the message "Transform Phase Started".
Question 3: Use the transform function and print the first 5 rows of the output. Take a screenshot:
| | Name | Market Cap (GBP$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 357.199 |
| 1 | Bank of America | 293.784 |
| 2 | Industrial and Commercial Bank of China | 183.095 |
| 3 | Wells Fargo | 164.439 |
| 4 | China Construction Bank | 188.226 |
Log the message "Transform Phase Ended".
c. Load
The cleaned data is saved as a .csv file that Data Analysts or Data Scientists can use in the future. The data is now formatted in a way that provides value to downstream users.
Log the message "Load Phase Started".
Call the load function.
Log the message "Load Phase Ended".
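The phase-by-phase steps above can be tied together in a small driver. This is a sketch under assumptions: the phase functions are passed in as arguments (so the driver does not depend on how they are named), extract is assumed to take no arguments, and the log messages mirror those recorded in the log file below.

```python
def run_etl(extract, transform, load, log, targetfile="bank_market_cap_gbp.csv"):
    """Run extract -> transform -> load, logging the start and end of each phase."""
    log("ETL Job Started")

    log("Extract Phase Started")
    extracted = extract()
    log("Extract Phase Ended")

    log("Transform Phase Started")
    transformed = transform(extracted)
    log("Transform Phase Ended")

    log("Load Phase Started")
    load(targetfile, transformed)
    log("Load Phase Ended")

    log("ETL Job Completed")
    return transformed
```

For Questions 2 and 3, print(extracted.head()) and print(transformed.head()) could be added after the extract and transform calls; extract could be a lambda that calls extract_table() with whatever Wikipedia URL the notebook scrapes.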
Here I will check the log file to verify that the phases were executed and logged correctly.
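Checking the log can also be automated. The sketch below assumes the "<timestamp>, <message>" line format and the message names used in this notebook; it returns every expected phase message that is missing from the log file, so an empty list means all phases were logged. The file name logfile.txt is hypothetical.

```python
EXPECTED_MESSAGES = (
    "ETL Job Started",
    "Extract Phase Started",
    "Extract Phase Ended",
    "Transform Phase Started",
    "Transform Phase Ended",
    "Load Phase Started",
    "Load Phase Ended",
    "ETL Job Completed",
)


def missing_log_entries(logfile="logfile.txt", expected=EXPECTED_MESSAGES):
    """Return the expected phase messages that do not appear in `logfile`."""
    with open(logfile) as f:
        # Each line is '<timestamp>, <message>'; keep only the message part.
        logged = {line.rstrip("\n").split(", ", 1)[-1] for line in f}
    return [message for message in expected if message not in logged]
```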
| | 2022-Jan-16 21:31:08, ----- Document Header ----- |
|---|---|
| 0 | 2022-Jan-16 21:31:08, ETL Job Started |
| 1 | 2022-Jan-16 21:31:08, Extract Phase Started |
| 2 | 2022-Jan-16 21:31:11, Extract Phase Ended |
| 3 | 2022-Jan-16 21:31:12, Transform Phase Started |
| 4 | 2022-Jan-16 21:31:13, Transform Phase Ended |
| 5 | 2022-Jan-16 21:31:14, Load Phase Started |
| 6 | 2022-Jan-16 21:31:16, Load Phase Ended |
| 7 | 2022-Jan-16 21:31:16, ETL Job Completed |
There is now a bank_market_cap_gbp.csv file in this notebook's directory that holds the transformed data in a structured format. The first step was to extract the list of the largest banks by market cap by web scraping a table from a Wikipedia page. Next, exchange rate data was extracted from an API and saved as a CSV. The base data was then transformed by converting the market cap values from USD to GBP. With the values in the correct currency, the data was loaded into a new CSV.
Now that the data is structured and contains valuable information, it can be processed by downstream users such as Data Analysts, Data Scientists, or various company departments.
Author
Giovanni Harold