| Option | Description |
| --- | --- |
| `url` | The JDBC URL to connect to. Source-specific connection properties may be specified in the URL; for example: `jdbc:postgresql://localhost/analytics?user=pg&password=secret`. See chapter 8 for examples. |
| `dbtable` | The JDBC table to read. Anything that is valid in a `FROM` clause of a SQL query can be used; instead of a table, you could also use a subquery in parentheses. See chapter 8 for examples. |
| `driver` | The class name of the JDBC driver to use to connect to this URL. See chapter 8 for examples. |
| `partitionColumn`, `lowerBound`, `upperBound` | These options must all be specified if any of them is specified, and `numPartitions` must be specified as well. They describe how to partition the table when reading in parallel from multiple workers. `partitionColumn` must be a numeric column from the table in question. Note that `lowerBound` and `upperBound` are used to decide the partition stride, not to filter rows in the table; all rows in the table will be partitioned and returned. These options apply only to reading. See chapter 8 for examples. |
| `numPartitions` | The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, Spark decreases it to this limit by calling `coalesce(numPartitions)` before writing. See chapter 8 for a reading example. |
| `fetchsize` | The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers that default to a low fetch size (for example, Oracle defaults to 10 rows). This option applies only to reading. |
| `batchsize` | The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. This option applies only to writing. It defaults to `1000`. |
| `isolationLevel` | The transaction isolation level, which applies to the current connection. It can be `NONE`, `READ_COMMITTED`, `READ_UNCOMMITTED`, `REPEATABLE_READ`, or `SERIALIZABLE`, corresponding to the standard transaction isolation levels defined by JDBC's `Connection` object. It defaults to `READ_UNCOMMITTED`. This option applies only to writing. Refer to the documentation for `java.sql.Connection`. |
| `sessionInitStatement` | A custom SQL statement (or PL/SQL block) that is executed after each database session is opened to the remote database and before reading data begins. Use this to implement session initialization code. Example: `option("sessionInitStatement", """BEGIN execute immediate 'alter session set "_serial_direct_read"=true'; END;""")` |
| `truncate` | When `SaveMode.Overwrite` is enabled, this option causes Spark to truncate an existing table instead of dropping and re-creating it. This can be more efficient, and it prevents the table metadata (for example, indices) from being removed. However, it will not work in some cases, such as when the new data has a different schema. It defaults to `false`. This option applies only to writing. |
| `createTableOptions` | If specified, this option allows setting database-specific table and partition options when creating a table. This option applies only to writing. Example: `CREATE TABLE t (name string) ENGINE=InnoDB` |
| `createTableColumnTypes` | The database column data types to use, instead of the defaults, when creating the table. Data type information should be specified in the same format as `CREATE TABLE` column syntax; for example: `name CHAR(64), comments VARCHAR(1024)`. The specified types should be valid Spark SQL data types. This option applies only to writing. |
| `customSchema` | The custom schema to use when reading data from JDBC connectors; for example, `id DECIMAL(38, 0), name STRING`. You can also specify partial fields, and the others use the default type mapping; for example: `id DECIMAL(38, 0)`. The column names should be identical to the corresponding column names of the JDBC table. You can specify the corresponding Spark SQL data types instead of using the defaults. This option applies only to reading. |
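To make the read-side options concrete, here is a minimal Scala sketch of a partitioned JDBC read. The PostgreSQL URL, the `events` table, and its numeric `id` column are hypothetical stand-ins, not names from this book's examples:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read-sketch").getOrCreate()

// Hypothetical PostgreSQL source: database "analytics", table "events"
// with a numeric "id" column.
val events = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/analytics?user=pg&password=secret")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "events")
  // All four options below must be set together for a parallel read.
  // lowerBound/upperBound only decide the partition stride over the id
  // column; rows outside [0, 1000000] are still read, not filtered out.
  .option("partitionColumn", "id")
  .option("lowerBound", "0")
  .option("upperBound", "1000000")
  .option("numPartitions", "10")
  // Raise the fetch size for drivers that default low (e.g., Oracle's 10).
  .option("fetchsize", "1000")
  .load()
```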
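Because `dbtable` accepts anything valid in a `FROM` clause, a parenthesized subquery works too. This sketch also overrides the default type mapping for the listed columns with `customSchema`; the table and column names are again assumptions:

```scala
// A parenthesized subquery needs an alias on most databases. customSchema
// overrides the default Spark SQL type mapping only for the listed columns.
val bigIds = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/analytics?user=pg&password=secret")
  .option("dbtable", "(SELECT id, name FROM events WHERE id > 500000) AS big_ids")
  .option("customSchema", "id DECIMAL(38, 0), name STRING")
  .load()
```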
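On the write side, a sketch combining the write-only options from the table above; the target table name and option values are illustrative assumptions, not recommendations:

```scala
import org.apache.spark.sql.SaveMode

events.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/analytics?user=pg&password=secret")
  .option("dbtable", "events_copy")
  .option("batchsize", "10000")               // rows inserted per round trip
  .option("isolationLevel", "READ_COMMITTED") // instead of the READ_UNCOMMITTED default
  // With Overwrite, truncate the existing table rather than drop and
  // re-create it, keeping indices and other metadata intact.
  .option("truncate", "true")
  // Applies only if Spark has to create the table (i.e., it does not exist yet).
  .option("createTableColumnTypes", "name VARCHAR(1024)")
  .mode(SaveMode.Overwrite)
  .save()
```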