Working with the Data Warehouse
The Data Warehouse in PowerView is a database that gathers data from various sources and stores it in a format that is optimized for reporting. Some of the data within the Data Warehouse is updated on a regular basis, while other data can be generated withing the Data Warehouse itself (such as KPIs).
The Data Warehouse is a Microsoft SQL database. To interface with the Data from python, we use the `pyodbc`_ library. This library allows us to connect to the Data Warehouse and execute SQL queries, but this means that you can use the functionality provided in this library to connect to any SQL database, e.g. the “Model Results” database.
## pyodbc and drivers
The `pyodbc`_ library is a python library that allows you to connect to SQL databases. However, it does not provide the drivers that are needed to connect to the databases. The drivers are provided by the database vendors and must be installed separately. For example, to connect to the Data Warehouse, you need to install the Microsoft SQL Server ODBC Driver. This driver is available for download from the Microsoft website as descibed (here)[https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-ver16&tabs=macos].
When you initiate the DWH class, it will automatically try to connect using the first driver that it finds. If you have multiple drivers installed, you can specify which driver to use by passing the driver argument to the DWH as an input argument (integer). If you are not certain of the driver number, you can use the list_drivers method to list all the drivers that are available on your system, but only after you have initiated the DWH class.
## Connecting to the Data Warehouse
To connect to the Data Warehouse, you need to know the name of the server and the database, as well as your username and password. These are all provided to you by the PowerView team. Once you have this information, you can connect by initating the DWH class like this:
from pyprediktormapclient.dwh import DWH
dwh = DWH(server='server_name', database='database_name',
username='username', password='password')
The DWH class has a connect method that will automatically connect to the database when you initiate the class and the other functions within the class will reconnect if the connection is lost. The DWH class also has a disconnect method that will close the connection to the database, also automatically run with the __exit__ method.
The object returned by the DWH class is a pyodbc.Connection object, so you can use any of the methods provided by that class. For example, you can use the cursor method to create a cursor object that can be used to execute SQL or anythign else as (documented in the official project)[https://github.com/mkleehammer/pyodbc/wiki].
cursor = dwh.cursor()
cursor.execute('SELECT * FROM dbo.DimDate')
cursor.fetchall()
## Getting Data from the Data Warehouse
The DWH class has a number of methods that can be used to read and write data from/to the Data Warehouse. These methods are documented in the API reference, but here are some examples of how to use them.
### Reading Data
The DWH class has a read method that can be used to read data from the Data Warehouse. This method takes a SQL query as a string and returns the result as a list with all the results. If you want to read the data into a pandas DataFrame, you can use the read_to_dataframe method, which will return a pandas DataFrame. Also, if you want to read the result line by line, you can use the call the cursor method directly and iterate over the results with fetchone from pyodbc.
# Read data into a list
data = dwh.read('SELECT * FROM dbo.DimDate')
# Read data into a pandas DataFrame
df = dwh.read_to_dataframe('SELECT * FROM dbo.DimDate')
# Read data line by line
cursor = dwh.cursor()
cursor.execute('SELECT * FROM dbo.DimDate')
while True:
row = cursor.fetchone()
if row == None:
break
print(row)
### Writing Data
The DWH class has a write method that can be used to write data to the Data Warehouse. This method takes a SQL query as a string and executes it. If you add the commit=True argument, the changes will be committed to the database or rolled back if an error occurs. Alternatively, you can use the commit method to commit the changes to the database. Failing to commit the changes will result in the changes being lost.
# Write data to the database and commit the changes
dwh.write('INSERT INTO dbo.DimDate VALUES (\'2018-01-01\')', commit=True)
# or in two steps that would potentially allow you to read the data before
# committing the changes
dwh.write('INSERT INTO dbo.DimDate VALUES (\'2018-01-01\')')
dwh.commit()