Writing to Google BigQuery
- Ensure you have a Google BigQuery service account key on disk
- The location of the service key is set in the environment variable BQ_KEY (see the sketch after this list)
- The dataset will be automatically created within the project associated with the service key
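The key location can be exported before starting the notebook, or set from within Python. A minimal sketch follows; the path is a placeholder, not an actual key location:

import os

# Point BQ_KEY at your service account key file (placeholder path, replace with your own)
os.environ['BQ_KEY'] = '/path/to/service-account-key.json'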
The cell below creates a dataframe that will be stored in Google BigQuery.
In [1]:
#
# Writing to Google Bigquery database
#
import transport
from transport import providers
import pandas as pd
import os

PRIVATE_KEY = os.environ['BQ_KEY'] #-- location of the service key
DATASET = 'demo'
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
bqw = transport.factory.instance(provider=providers.BIGQUERY,dataset=DATASET,table='friends',context='write',private_key=PRIVATE_KEY)
bqw.write(_data,if_exists='replace') #-- default is append
print (['data transport version ', transport.__version__])
100%|██████████| 1/1 [00:00<00:00, 5440.08it/s]
['data transport version ', '2.0.0']
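Since the default behavior of write is to append (per the comment in the cell above), additional rows can be written to the same table without passing if_exists. A minimal sketch, reusing bqw and the pandas import from the cell above, with a hypothetical extra row:

# Append a hypothetical extra row to demo.friends (append is the default when if_exists is omitted)
_more = pd.DataFrame({"name":['Jane Doe'],'age':[30]})
bqw.write(_more)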
Reading from Google BigQuery
The cell below reads back the data written by the cell above and computes the average age with a simple aggregate query run in Google BigQuery.
- Basic read of the designated table (friends) created above
- Execute an aggregate SQL query against the table
NOTE
transport.factory.instance and transport.instance are interchangeable; the factory naming lets the maintainers signal that a factory design pattern is used.
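A minimal sketch of that equivalence (assuming transport, providers and PRIVATE_KEY are defined as in the cells above); both calls return the same kind of reader:

# transport.instance is simply the shorter alias for transport.factory.instance
reader_a = transport.factory.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)
reader_b = transport.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)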
In [2]:
import transport
from transport import providers
import os

PRIVATE_KEY=os.environ['BQ_KEY']
pgr = transport.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)
_df = pgr.read()
_query = 'SELECT COUNT(*) _counts, AVG(age) from demo.friends'
_sdf = pgr.read(sql=_query)
print (_df)
print ('--------- STATISTICS ------------')
print (_sdf)
Downloading: 100%|██████████|
Downloading: 100%|██████████|
           name  age
0    James Bond   55
1  Steve Rogers  150
2  Steve Nyemba   44
--------- STATISTICS ------------
   _counts   f0_
0        3  83.0
The cell below shows the content of an auth_file: if the dataset/table parameters are not meant to appear in the code itself, they can be supplied through an auth_file instead (a usage sketch follows the cell below).
NOTE:
The auth_file is intended to be JSON formatted
In [3]:
{ "dataset":"demo","table":"friends" }
Out[3]:
{'dataset': 'demo', 'table': 'friends'}
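A minimal usage sketch, assuming the content above is saved to a JSON file and that the factory accepts an auth_file keyword (the path and the exact keyword are assumptions, not verified against the API); transport, providers and PRIVATE_KEY are as defined above:

# Hypothetical: dataset/table come from the JSON auth_file rather than inline keyword arguments
# (the path and the auth_file keyword are illustrative assumptions)
pgr = transport.instance(provider=providers.BIGQUERY,auth_file='/path/to/auth.json',private_key=PRIVATE_KEY)
_df = pgr.read()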