diff --git a/notebooks/iceberg.ipynb b/notebooks/iceberg.ipynb new file mode 100644 index 0000000..849e088 --- /dev/null +++ b/notebooks/iceberg.ipynb @@ -0,0 +1,138 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Writing to Apache Iceberg\n", + "\n", + "1. Ensure you have a Google BigQuery service account key on disk\n", + "2. Set the environment variable **BQ_KEY** to the location of the service key\n", + "3. The dataset is created automatically within the project associated with the service key\n", + "\n", + "The cell below creates a dataframe that will be stored within Google BigQuery" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['data transport version ', '2.4.0']\n" + ] + } + ], + "source": [ + "#\n", + "# Writing to Google BigQuery database\n", + "#\n", + "import transport\n", + "from transport import providers\n", + "import pandas as pd\n", + "import os\n", + "\n", + "PRIVATE_KEY = os.environ['BQ_KEY'] #-- location of the service key\n", + "DATASET = 'demo'\n", + "_data = pd.DataFrame({\"name\":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})\n", + "# bqw = transport.get.writer(provider=providers.ICEBERG,catalog='mz',database='edw.mz',table='friends')\n", + "bqw = transport.get.writer(provider=providers.ICEBERG,table='edw.mz.friends')\n", + "bqw.write(_data,if_exists='replace') #-- default is append\n", + "print (['data transport version ', transport.__version__])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Reading from Google BigQuery\n", + "\n", + "The cell below reads the data written by the cell above and computes the average age with a simple query against Google BigQuery. 
\n", + "\n", + "- Basic read of the designated table (friends) created above\n", + "- Execute an aggregate SQL query against the table\n", + "\n", + "**NOTE**\n", + "\n", + "By design, **read** objects are separated from **write** objects to avoid accidental writes to the database.\n", + "Read objects are created with **transport.get.reader**, whereas write objects are created with **transport.get.writer**" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " name age\n", + "0 James Bond 55\n", + "1 Steve Rogers 150\n", + "2 Steve Nyemba 44\n", + "--------- STATISTICS ------------\n" + ] + } + ], + "source": [ + "\n", + "import transport\n", + "from transport import providers\n", + "import os\n", + "PRIVATE_KEY=os.environ['BQ_KEY']\n", + "pgr = transport.get.reader(provider=providers.ICEBERG,database='edw.mz')\n", + "_df = pgr.read(table='friends')\n", + "_query = 'SELECT COUNT(*) _counts, AVG(age) from friends'\n", + "_sdf = pgr.read(sql=_query)\n", + "print (_df)\n", + "print ('--------- STATISTICS ------------')\n", + "# print (_sdf)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An **auth-file** is a file that contains the database parameters used to access the database. \n", + "For code in shared environments, we recommend \n", + "\n", + "1. Having the **auth-file** stored on disk \n", + "2. 
Setting the location of the file in an environment variable.\n", + "\n", + "To generate a template of the **auth-file**, open the **file generator wizard** at https://healthcareio.the-phi.com/data-transport" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}