
Commit 8a2d81b

chore: complete migration of sqlalchemy-bigquery to librarian (#16750)
The README.rst change just uses a full copy of the parent README.rst instead of an include, which is consistent with all other packages.
1 parent 403bcf5 commit 8a2d81b

3 files changed

Lines changed: 370 additions & 11 deletions

File tree

librarian.yaml

Lines changed: 0 additions & 1 deletion
@@ -4626,7 +4626,6 @@ libraries:
     skip_readme_copy: true
 - name: sqlalchemy-bigquery
   version: 1.16.0
-  skip_generate: true
   python:
     library_type: INTEGRATION
     name_pretty_override: SQLAlchemy dialect for BigQuery
Lines changed: 7 additions & 9 deletions
@@ -1,13 +1,11 @@
 {
-  "name": "sqlalchemy-bigquery",
-  "name_pretty": "SQLAlchemy dialect for BigQuery",
+  "api_id": "bigquery.googleapis.com",
   "client_documentation": "https://googleapis.dev/python/sqlalchemy-bigquery/latest/index.html",
-  "release_level": "preview",
+  "distribution_name": "sqlalchemy-bigquery",
   "language": "python",
   "library_type": "INTEGRATION",
-  "repo": "googleapis/google-cloud-python",
-  "distribution_name": "sqlalchemy-bigquery",
-  "api_id": "bigquery.googleapis.com",
-  "default_version": "",
-  "codeowner_team": "@googleapis/python-core-client-libraries"
-}
+  "name": "sqlalchemy-bigquery",
+  "name_pretty": "SQLAlchemy dialect for BigQuery",
+  "release_level": "stable",
+  "repo": "googleapis/google-cloud-python"
+}

packages/sqlalchemy-bigquery/docs/README.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 363 additions & 0 deletions
@@ -0,0 +1,363 @@
SQLAlchemy Dialect for BigQuery
===============================

|GA| |pypi| |versions|

`SQLAlchemy Dialects`_

- `Dialect Documentation`_
- `Product Documentation`_

.. |GA| image:: https://img.shields.io/badge/support-GA-gold.svg
   :target: https://github.com/googleapis/google-cloud-python/blob/main/README.rst#general-availability
.. |pypi| image:: https://img.shields.io/pypi/v/sqlalchemy-bigquery.svg
   :target: https://pypi.org/project/sqlalchemy-bigquery/
.. |versions| image:: https://img.shields.io/pypi/pyversions/sqlalchemy-bigquery.svg
   :target: https://pypi.org/project/sqlalchemy-bigquery/
.. _SQLAlchemy Dialects: https://docs.sqlalchemy.org/en/14/dialects/
.. _Dialect Documentation: https://googleapis.dev/python/sqlalchemy-bigquery/latest
.. _Product Documentation: https://cloud.google.com/bigquery/docs/

Quick Start
-----------

In order to use this library, you first need to go through the following steps:

1. `Select or create a Cloud Platform project.`_
2. [Optional] `Enable billing for your project.`_
3. `Enable the BigQuery Storage API.`_
4. `Setup Authentication.`_

.. _Select or create a Cloud Platform project.: https://console.cloud.google.com/project
.. _Enable billing for your project.: https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project
.. _Enable the BigQuery Storage API.: https://console.cloud.google.com/apis/library/bigquery.googleapis.com
.. _Setup Authentication.: https://googleapis.dev/python/google-api-core/latest/auth.html

Installation
------------

Install this library in a `virtualenv`_ using pip. `virtualenv`_ is a tool to
create isolated Python environments. The basic problem it addresses is one of
dependencies and versions, and indirectly permissions.

With `virtualenv`_, it's possible to install this library without needing system
install permissions, and without clashing with the installed system
dependencies.

.. _`virtualenv`: https://virtualenv.pypa.io/en/latest/


Supported Python Versions
^^^^^^^^^^^^^^^^^^^^^^^^^
Python >= 3.9, <3.14

Unsupported Python Versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Python <= 3.8.

Mac/Linux
^^^^^^^^^

.. code-block:: console

    pip install virtualenv
    virtualenv <your-env>
    source <your-env>/bin/activate
    <your-env>/bin/pip install sqlalchemy-bigquery


Windows
^^^^^^^

.. code-block:: console

    pip install virtualenv
    virtualenv <your-env>
    <your-env>\Scripts\activate
    <your-env>\Scripts\pip.exe install sqlalchemy-bigquery


Installations when processing large datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When handling large datasets, you may see speed increases by also installing the
``bqstorage`` dependencies. See the instructions above about creating a virtual
environment, and then install ``sqlalchemy-bigquery`` using the ``bqstorage`` extras:

.. code-block:: console

    source <your-env>/bin/activate
    <your-env>/bin/pip install sqlalchemy-bigquery[bqstorage]


Usage
-----

SQLAlchemy
^^^^^^^^^^

.. code-block:: python

    from sqlalchemy import *
    from sqlalchemy.engine import create_engine
    from sqlalchemy.schema import *

    engine = create_engine('bigquery://project')
    table = Table('dataset.table', MetaData(bind=engine), autoload=True)
    print(select([func.count('*')], from_obj=table).scalar())

Project
^^^^^^^

``project`` in ``bigquery://project`` is used to instantiate the BigQuery client with the specified project ID. To infer the project from the environment, use ``bigquery://`` without ``project``.
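As a rough sketch of how the pieces of the URL line up (using only the standard library's ``urllib.parse``; the project and dataset names here are made up, and the dialect's real parsing goes through SQLAlchemy's URL machinery):

```python
from urllib.parse import urlparse

url = urlparse('bigquery://my-project/my_dataset')
print(url.netloc)            # the "host" slot carries the project ID
print(url.path.lstrip('/'))  # the "database" slot carries an optional default dataset

# With 'bigquery://' both slots are empty, so the client falls back to
# the project configured in the environment.
print(urlparse('bigquery://').netloc)
```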

Authentication
^^^^^^^^^^^^^^

Follow the `Google Cloud library guide <https://google-cloud-python.readthedocs.io/en/latest/core/auth.html>`_ for authentication.

Alternatively, you can choose either of the following approaches:

* provide the path to a service account JSON file in ``create_engine()`` using the ``credentials_path`` parameter:

  .. code-block:: python

      # provide the path to a service account JSON file
      engine = create_engine('bigquery://', credentials_path='/path/to/keyfile.json')

* pass the credentials in ``create_engine()`` as a Python dictionary using the ``credentials_info`` parameter:

  .. code-block:: python

      # provide credentials as a Python dictionary
      credentials_info = {
          "type": "service_account",
          "project_id": "your-service-account-project-id"
      }
      engine = create_engine('bigquery://', credentials_info=credentials_info)

Location
^^^^^^^^

To specify the location of your datasets, pass ``location`` to ``create_engine()``:

.. code-block:: python

    engine = create_engine('bigquery://project', location="asia-northeast1")

Table names
^^^^^^^^^^^

To query tables from non-default projects or datasets, use the following format for the SQLAlchemy schema name: ``[project.]dataset``, e.g.:

.. code-block:: python

    # If neither dataset nor project are the default
    sample_table_1 = Table('natality', schema='bigquery-public-data.samples')
    # If just dataset is not the default
    sample_table_2 = Table('natality', schema='samples')

Batch size
^^^^^^^^^^

By default, ``arraysize`` is set to ``5000``. ``arraysize`` is used to set the batch size for fetching results. To change it, pass ``arraysize`` to ``create_engine()``:

.. code-block:: python

    engine = create_engine('bigquery://project', arraysize=1000)

Page size for dataset.list_tables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, ``list_tables_page_size`` is set to ``1000``. ``list_tables_page_size`` is used to set the ``max_results`` for the `dataset.list_tables`_ operation. To change it, pass ``list_tables_page_size`` to ``create_engine()``:

.. _`dataset.list_tables`: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list

.. code-block:: python

    engine = create_engine('bigquery://project', list_tables_page_size=100)

Adding a Default Dataset
^^^^^^^^^^^^^^^^^^^^^^^^

If you want to have the ``Client`` use a default dataset, specify it as the "database" portion of the connection string.

.. code-block:: python

    engine = create_engine('bigquery://project/dataset')

When using a default dataset, don't include the dataset name in the table name, e.g.:

.. code-block:: python

    table = Table('table_name')

Note that specifying a default dataset doesn't restrict execution of queries to that particular dataset when using raw queries, e.g.:

.. code-block:: python

    # Set default dataset to dataset_a
    engine = create_engine('bigquery://project/dataset_a')

    # This will still execute and return rows from dataset_b
    engine.execute('SELECT * FROM dataset_b.table').fetchall()


Connection String Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are many situations where you can't call ``create_engine`` directly, such as when using tools like `Flask SQLAlchemy <http://flask-sqlalchemy.pocoo.org/2.3/>`_. For situations like these, or for situations where you want the ``Client`` to have a `default_query_job_config <https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client>`_, you can pass many arguments in the query of the connection string.

The ``credentials_path``, ``credentials_info``, ``credentials_base64``, ``location``, ``arraysize`` and ``list_tables_page_size`` parameters are used by this library, and the rest are used to create a `QueryJobConfig <https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.QueryJobConfig.html#google.cloud.bigquery.job.QueryJobConfig>`_.

Note that if you want to use query strings, it will be more reliable if you use three slashes, so ``'bigquery:///?a=b'`` will work reliably, but ``'bigquery://?a=b'`` might be interpreted as having a "database" of ``?a=b``, depending on the system being used to parse the connection string.
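The three-slash form can be sketched with the standard library's ``urllib.parse`` (illustrative only; real connection strings are parsed by SQLAlchemy, and the parameter values here are arbitrary):

```python
from urllib.parse import urlparse, parse_qs

# Three slashes: the "host" (project) and "database" slots are explicitly
# empty, so the query string is unambiguous for any parser.
url = urlparse('bigquery:///?arraysize=1000&location=us-west1')
print(url.netloc, repr(url.path))  # '' '/'
print(parse_qs(url.query))         # {'arraysize': ['1000'], 'location': ['us-west1']}
```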

Here are examples of all the supported arguments. Any arguments not present are either for legacy SQL (which isn't supported by this library) or are too complex and are not implemented.

.. code-block:: python

    engine = create_engine(
        'bigquery://some-project/some-dataset' '?'
        'credentials_path=/some/path/to.json' '&'
        'location=some-location' '&'
        'arraysize=1000' '&'
        'list_tables_page_size=100' '&'
        'clustering_fields=a,b,c' '&'
        'create_disposition=CREATE_IF_NEEDED' '&'
        'destination=different-project.different-dataset.table' '&'
        'destination_encryption_configuration=some-configuration' '&'
        'dry_run=true' '&'
        'labels=a:b,c:d' '&'
        'maximum_bytes_billed=1000' '&'
        'priority=INTERACTIVE' '&'
        'schema_update_options=ALLOW_FIELD_ADDITION,ALLOW_FIELD_RELAXATION' '&'
        'use_query_cache=true' '&'
        'write_disposition=WRITE_APPEND'
    )

In cases where you wish to include the full credentials in the connection URI, you can base64-encode the credentials JSON file and supply the encoded string to the ``credentials_base64`` parameter.

.. code-block:: python

    engine = create_engine(
        'bigquery://some-project/some-dataset' '?'
        'credentials_base64=eyJrZXkiOiJ2YWx1ZSJ9Cg==' '&'
        'location=some-location' '&'
        'arraysize=1000' '&'
        'list_tables_page_size=100' '&'
        'clustering_fields=a,b,c' '&'
        'create_disposition=CREATE_IF_NEEDED' '&'
        'destination=different-project.different-dataset.table' '&'
        'destination_encryption_configuration=some-configuration' '&'
        'dry_run=true' '&'
        'labels=a:b,c:d' '&'
        'maximum_bytes_billed=1000' '&'
        'priority=INTERACTIVE' '&'
        'schema_update_options=ALLOW_FIELD_ADDITION,ALLOW_FIELD_RELAXATION' '&'
        'use_query_cache=true' '&'
        'write_disposition=WRITE_APPEND'
    )

To create the base64-encoded string, you can use the command line tool ``base64``, or ``openssl base64``, or ``python -m base64``.
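Equivalently from Python itself (stdlib only; the tiny dict below is just a stand-in for a real credentials JSON file):

```python
import base64
import json

# Stand-in for the contents of a service-account keyfile.
credentials_json = json.dumps({"key": "value"}).encode('utf-8')
encoded = base64.b64encode(credentials_json).decode('ascii')
print(encoded)  # supply this value as the credentials_base64 parameter
```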

Alternatively, you can use an online generator like `www.base64encode.org <https://www.base64encode.org>`_ to paste your credentials JSON file to be encoded.


Supplying Your Own BigQuery Client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above connection string parameters allow you to influence how the BigQuery client used to execute your queries will be instantiated.
If you need additional control, you can supply a BigQuery client of your own:

.. code-block:: python

    from google.cloud import bigquery

    custom_bq_client = bigquery.Client(...)

    engine = create_engine(
        'bigquery://some-project/some-dataset?user_supplied_client=True',
        connect_args={'client': custom_bq_client},
    )


Creating tables
^^^^^^^^^^^^^^^

To add metadata to a table:

.. code-block:: python

    import datetime

    table = Table('mytable', ...,
        bigquery_description='my table description',
        bigquery_friendly_name='my table friendly name',
        bigquery_default_rounding_mode="ROUND_HALF_EVEN",
        bigquery_expiration_timestamp=datetime.datetime.fromisoformat("2038-01-01T00:00:00+00:00"),
    )

To add metadata to a column:

.. code-block:: python

    Column('mycolumn', doc='my column description')

To create a clustered table:

.. code-block:: python

    table = Table('mytable', ..., bigquery_clustering_fields=["a", "b", "c"])

To create a time-unit column-partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_time_partitioning=bigquery.TimePartitioning(
            field="mytimestamp",
            type_="MONTH",
            expiration_ms=1000 * 60 * 60 * 24 * 30 * 6,  # 6 months
        ),
        bigquery_require_partition_filter=True,
    )

To create an ingestion-time partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_time_partitioning=bigquery.TimePartitioning(),
        bigquery_require_partition_filter=True,
    )

To create an integer-range partitioned table:

.. code-block:: python

    from google.cloud import bigquery

    table = Table('mytable', ...,
        bigquery_range_partitioning=bigquery.RangePartitioning(
            field="zipcode",
            range_=bigquery.PartitionRange(start=0, end=100000, interval=10),
        ),
        bigquery_require_partition_filter=True,
    )


Threading and Multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because this client uses the ``grpc`` library, it's safe to
share instances across threads.

In multiprocessing scenarios, the best
practice is to create client instances *after* the invocation of
``os.fork`` by ``multiprocessing.pool.Pool`` or
``multiprocessing.Process``.
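The create-after-fork pattern can be sketched with the standard library alone (the string built in ``init_worker`` stands in for a real ``create_engine('bigquery://...')`` call, which would likewise be made inside each worker process):

```python
import multiprocessing
import os

_engine = None  # per-process "engine", created only after the fork

def init_worker():
    # Runs inside each worker process (i.e. after os.fork on Linux),
    # so no client/gRPC state is inherited from the parent process.
    global _engine
    _engine = f"engine-in-pid-{os.getpid()}"  # stand-in for create_engine(...)

def run_query(query):
    # Each worker uses the engine it created itself.
    return (_engine, query)

if __name__ == '__main__':
    with multiprocessing.Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(run_query, ['SELECT 1', 'SELECT 2']))
```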
