30 Days DevOps Challenge - NBA Player Stats
#Week1-Day3 #DevOpsAllStarsChallenge
Automating the deployment of Azure Data Factory using Python, and creating pipelines for Data Factory
In this blog post, we'll walk through an exciting DevOps challenge from Week 1 - Day 3, where we automate the creation and configuration of an Azure Storage Account and Blob Container. The setup enables public access and runs an additional script (adf.py) to create an Azure Data Factory. The Data Factory pulls information from sportsdata.io, transforms the data by removing unnecessary details, and then stores it in the blob container.
Prerequisites
Before we dive in, ensure you have the following:
Python 3.x
Azure SDK for Python
The dotenv package for loading environment variables
An Azure subscription with appropriate permissions
Installation
Clone the repository:
```
git clone https://github.com/annoyedalien/week1-day3.git
cd week1-day3
```
Create a virtual environment and activate it:
```
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
```
Install the required Python packages:
```
pip install -r requirements.txt
```
Create a .env file in the root directory of the project and add the following environment variables:

```
AZURE_SUBSCRIPTION_ID=your_subscription_id
RESOURCE_GROUP_NAME=your_resource_group_name
STORAGE_ACCOUNT_NAME=your_storage_account_name
LOCATION=your_location
CONTAINER_NAME=your_container_name
DATA_FACTORY_NAME=your_datafactory_name
REST_API_URL=https://api.sportsdata.io/v3/nba/scores/json/Players
SUBSCRIPTION_KEY=your_api_key
LS_REST_NAME=linked_service_rest_name
LS_BLOB_NAME=linked_service_blob_name
```
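For reference, here is a minimal sketch of how these variables are typically loaded with the dotenv package; the exact loading code in main.py may differ in detail:

```python
import os
from dotenv import load_dotenv

# Pull the key=value pairs from the .env file into the process environment
load_dotenv()

subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
resource_group = os.getenv("RESOURCE_GROUP_NAME")
storage_account = os.getenv("STORAGE_ACCOUNT_NAME")
```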
Usage
Run the script:
```
python main.py
```
The script will:
Check if the specified resource group exists and create it if it doesn't.
Check if the specified storage account name is available and create the storage account if it doesn't exist.
Enable public access on the storage account.
Create a blob container with anonymous access if it doesn't exist.
Run the adf.py script as a subprocess.
The adf.py script will:
Create an Azure Data Factory, linked services, datasets, and pipelines.
Use config.py to define the properties of the resources to be created.
Script Details
Resource Management: Uses ResourceManagementClient to manage Azure resource groups.
Storage Management: Uses StorageManagementClient to manage Azure storage accounts.
Blob Service: Uses BlobServiceClient to manage blob containers and enable public access.
Environment Variables: Loads configuration from a .env file using the dotenv package.
Subprocess: Runs an additional script (adf.py) after setting up the storage account and container.
adf.py and config.py
adf.py: Contains the creation of the Azure Data Factory along with its linked services, datasets, and pipelines.
config.py: Contains the properties of the linked services, datasets, and pipelines.
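As an illustration of what adf.py does, here is a hedged sketch of creating the factory and the REST linked service with azure-mgmt-datafactory. The property values stand in for what config.py defines and are assumptions, not the project's exact definitions:

```python
import os
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    Factory,
    LinkedServiceResource,
    RestServiceLinkedService,
)
from dotenv import load_dotenv

load_dotenv()
sub_id = os.getenv("AZURE_SUBSCRIPTION_ID")
rg = os.getenv("RESOURCE_GROUP_NAME")
df_name = os.getenv("DATA_FACTORY_NAME")
location = os.getenv("LOCATION")
rest_url = os.getenv("REST_API_URL")
ls_rest_name = os.getenv("LS_REST_NAME")

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), sub_id)

# Create (or update) the Data Factory itself
adf_client.factories.create_or_update(rg, df_name, Factory(location=location))

# Linked service pointing at the sportsdata.io REST endpoint; the API key
# is assumed to be supplied later (e.g. as a query parameter on the dataset)
rest_ls = LinkedServiceResource(
    properties=RestServiceLinkedService(
        url=rest_url,
        authentication_type="Anonymous",
    )
)
adf_client.linked_services.create_or_update(rg, df_name, ls_rest_name, rest_ls)
```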
Launch Data Factory Studio
After running the script, launch Data Factory Studio in Azure and run the pipeline. Then navigate to the storage account and container to check the blob received from the Data Factory.
Click on Author
Choose Pipeline
Run Debug
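Alternatively, if you'd rather skip the Studio clicks, a published pipeline can also be triggered from Python. This is a hedged sketch: the pipeline name "CopyPlayersPipeline" is a placeholder, not necessarily what adf.py names it:

```python
import os
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from dotenv import load_dotenv

load_dotenv()
sub_id = os.getenv("AZURE_SUBSCRIPTION_ID")
rg = os.getenv("RESOURCE_GROUP_NAME")
df_name = os.getenv("DATA_FACTORY_NAME")

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), sub_id)

# "CopyPlayersPipeline" is a placeholder; use the name adf.py actually creates
run = adf_client.pipelines.create_run(rg, df_name, "CopyPlayersPipeline")

# Poll the run status once; in practice you might loop until it finishes
status = adf_client.pipeline_runs.get(rg, df_name, run.run_id)
print(status.status)  # Queued / InProgress / Succeeded / Failed
```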
Azure Data Factory
Let's analyze what happened after manually running the pipeline.
The REST dataset sends a GET request to sportsdata.io with the help of the linked service we created.
By clicking on Preview Data, we can see all the information requested.
In a scenario where some of the gathered information is not required, we need to transform the data before it goes to the sink.
With the help of Mapping, we can import the schema and remove the unnecessary fields.
The mapping schema is included in config.py, which adf.py reads when creating the pipeline.
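For illustration, a mapping of this kind in config.py might look like the following TabularTranslator definition (the field names here are hypothetical examples, not the project's exact schema):

```python
# Hypothetical excerpt from config.py: only the listed fields survive
# the copy; everything else in the API response is dropped.
MAPPING_SCHEMA = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"path": "$['PlayerID']"}, "sink": {"path": "PlayerID"}},
        {"source": {"path": "$['FirstName']"}, "sink": {"path": "FirstName"}},
        {"source": {"path": "$['LastName']"}, "sink": {"path": "LastName"}},
        {"source": {"path": "$['Team']"}, "sink": {"path": "Team"}},
        {"source": {"path": "$['Position']"}, "sink": {"path": "Position"}},
    ],
}
```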
The output is then sent to the Storage Account container as a blob named player.json.
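Since the container allows anonymous read access, a quick way to confirm the blob arrived is a plain HTTP GET against its public URL. A hedged sketch, with placeholder account and container names to fill in from your .env:

```python
import requests

account = "your_storage_account_name"  # STORAGE_ACCOUNT_NAME from .env
container = "your_container_name"      # CONTAINER_NAME from .env

# Anonymous access is enabled on the container, so no credential is needed
url = f"https://{account}.blob.core.windows.net/{container}/player.json"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
players = resp.json()
print(f"Downloaded {len(players)} player records")
```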
By following this guide, you'll be able to automate the process of pulling NBA player profile stats from sportsdata.io, transforming the data, and storing it in an Azure Blob Container. Happy coding!
Clean Up
Clean up the resources by deleting the resource groups created:

```
az group list
az group delete --name [Resource Group Name]
```
Special thanks to Alicia Ahl for the project. Check out her video here: https://www.youtube.com/watch?v=RAkMac2QgjM&t=0s