Writing to Google BigQuery
- Ensure you have a Google BigQuery service account key on disk
- The location of the service key is set in the environment variable BQ_KEY (see the sketch after this list)
- The dataset will be automatically created within the project associated with the service key
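The key location can be exported before starting the notebook, or set from within Python. A minimal sketch follows; the path is a placeholder, not an actual key location:

import os

# Point BQ_KEY at your service account key file (placeholder path, replace with your own)
os.environ['BQ_KEY'] = '/path/to/service-account-key.json'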
The cell below creates a dataframe that will be stored in Google BigQuery.
In [1]:
#
# Writing to Google Bigquery database
#
import transport
from transport import providers
import pandas as pd
import os

PRIVATE_KEY = os.environ['BQ_KEY'] #-- location of the service key
DATASET = 'demo'
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
bqw = transport.factory.instance(provider=providers.BIGQUERY,dataset=DATASET,table='friends',context='write',private_key=PRIVATE_KEY)
bqw.write(_data,if_exists='replace') #-- default is append
print (['data transport version ', transport.__version__])
100%|██████████| 1/1 [00:00<00:00, 5440.08it/s]
['data transport version ', '2.0.0']
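Since the default behavior of write is to append (per the comment in the cell above), additional rows can be written to the same table without passing if_exists. A minimal sketch, reusing bqw and the pandas import from the cell above, with a hypothetical extra row:

# Append a hypothetical extra row to demo.friends (append is the default when if_exists is omitted)
_more = pd.DataFrame({"name":['Jane Doe'],'age':[30]})
bqw.write(_more)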
Reading from Google BigQuery
The cell below reads back the data written by the cell above and computes the average age with a simple aggregate query run in Google BigQuery.
- Basic read of the designated table (friends) created above
- Execute an aggregate SQL query against the table
NOTE
transport.factory.instance and transport.instance are interchangeable; the factory naming lets the maintainers signal that a factory design pattern is used.
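A minimal sketch of that equivalence (assuming transport, providers and PRIVATE_KEY are defined as in the cells above); both calls return the same kind of reader:

# transport.instance is simply the shorter alias for transport.factory.instance
reader_a = transport.factory.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)
reader_b = transport.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)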
In [2]:
import transport
from transport import providers
import os

PRIVATE_KEY=os.environ['BQ_KEY']
pgr = transport.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)
_df = pgr.read()
_query = 'SELECT COUNT(*) _counts, AVG(age) from demo.friends'
_sdf = pgr.read(sql=_query)
print (_df)
print ('--------- STATISTICS ------------')
print (_sdf)
Downloading: 100%|██████████|
Downloading: 100%|██████████|
           name  age
0    James Bond   55
1  Steve Rogers  150
2  Steve Nyemba   44
--------- STATISTICS ------------
   _counts   f0_
0        3  83.0
The cell below shows the content of an auth_file: if the dataset/table parameters are not meant to appear in the code itself, they can be supplied through an auth_file instead (a usage sketch follows the cell below).
NOTE:
The auth_file is intended to be JSON formatted
In [3]:
{ "dataset":"demo","table":"friends" }
Out[3]:
{'dataset': 'demo', 'table': 'friends'}
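A minimal usage sketch, assuming the content above is saved to a JSON file and that the factory accepts an auth_file keyword (the path and the exact keyword are assumptions, not verified against the API); transport, providers and PRIVATE_KEY are as defined above:

# Hypothetical: dataset/table come from the JSON auth_file rather than inline keyword arguments
# (the path and the auth_file keyword are illustrative assumptions)
pgr = transport.instance(provider=providers.BIGQUERY,auth_file='/path/to/auth.json',private_key=PRIVATE_KEY)
_df = pgr.read()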