Monday, August 20, 2012

Task 13: OLAP vs OLTP


I have read a lot about the differences between Transaction Processing and Analytical Processing, but today I finally clarified the concepts for myself. The idea is a simple one:

We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it. 

  • Transaction processing is handled by the concept called OLTP. 
  • Analytical processing is handled by the concept called OLAP. 
Online Transaction Processing is driven by operational needs, whereas Online Analytical Processing is driven by strategic needs. Taken together, OLAP and the tools around it are usually referred to as one concept known as BI. Typical OLTP applications include reservation systems, point of sale, inventory control, automated tellers and manufacturing systems. Their requirements are current data, rapid data updates and high availability. They usually run as client/server systems on databases such as DB2, Oracle, Access or SQL Server, hold departmental or small-business data, and use a normalized design.
You need to know that OLTP systems are not easy to query for analysis, because ad-hoc queries hurt the performance of transactional systems. The problem is solved by the ETL process and OLAP: data is extracted from various sources, transformed into information suitable for advanced BI processing, and finally loaded into destinations such as tables or materialized views.
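As a rough illustration of the load step only, here is a minimal T-SQL sketch that extracts rows from an operational table, applies a simple transformation (a monthly aggregation) and loads the result into a reporting table. The objects dbo.Orders and dbo.MonthlySales are hypothetical, not part of any real sample database.

-- Minimal ETL-style step: extract from an operational table, transform, load into a reporting table.
-- dbo.Orders and dbo.MonthlySales are hypothetical objects used only for this example.
INSERT INTO dbo.MonthlySales (CustomerID, SalesMonth, TotalAmount)
SELECT o.CustomerID,
       DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1) AS SalesMonth,
       SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20120101'   -- extract only the slice being loaded
GROUP BY o.CustomerID,
         DATEFROMPARTS(YEAR(o.OrderDate), MONTH(o.OrderDate), 1);
GO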

These materialized views are database objects that contain the result of a query. Basically, the necessary data is collected from various remote tables and stored in one local area. The original data in the remote system is safe from changes because the materialized views are read-only; they can also be synchronized periodically with the main data, thus keeping them up to date. 
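In SQL Server the closest counterpart to a materialized view is an indexed view, which the engine maintains automatically instead of refreshing it on a schedule. A minimal sketch, assuming a hypothetical dbo.SalesDetail table whose ProductID and Amount columns are NOT NULL:

-- The view must be schema-bound, and an aggregating indexed view needs COUNT_BIG(*).
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Amount) AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.SalesDetail
GROUP BY ProductID;
GO
-- The unique clustered index is what actually materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct ON dbo.vSalesByProduct (ProductID);
GO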

OLAP starts where the ETL process ends. It is a group of technologies and applications that collect, manage, process and present multidimensional data for analysis and management purposes, making use of the output of the ETL process, which mostly consists of materialized views. 

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is very fast query processing and maintaining data integrity in multi-access environments, with effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store it is the entity model (usually 3NF). 

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP data is also widely used by data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema). 
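To make the contrast concrete, here is a sketch of a typical OLAP-style query against a hypothetical star schema (a FactSales fact table joined to DimDate and DimProduct dimensions); an OLTP query, by contrast, would usually touch just a handful of rows by key.

-- Typical OLAP query: scan and aggregate many fact rows across dimensions.
-- FactSales, DimDate and DimProduct are hypothetical star-schema tables.
SELECT d.CalendarYear,
       p.Category,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category
ORDER BY d.CalendarYear, p.Category;
GO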



Sunday, August 12, 2012

Task 16: Replication



Replication is the mechanism for creating and maintaining multiple copies of the same data. Replication allows you to:
• Keep multiple copies of data in sync.
• Bring data closer to the users, improving performance.
• Let a branch office keep working when it is not connected to the main office.
• Separate processing and offload work from production servers.

What are the Components of Replication?

The following are the important components in a replication setup:
• A publication contains articles.
• The publisher either distributes the magazine directly or uses a distributor.
• Subscribers receive publications to which they have subscribed.

What are the Server Roles Involved in Replication?

SQL Servers can be assigned three different roles in a replication topology:
• Publishers create and modify data. Publishers have data to offer to other servers. Any given replication scheme may have one or more publishers.
• Distributors store replication-specific data associated with one or more publishers.
• A subscriber holds a replica copy of the data. Subscribers are database servers that wish to receive updates from the Publisher when data is modified.

What are the types of replication available in SQL Server?

SQL Server supports three types of replication:

Snapshot replication sends the entire data set to the subscriber. The publisher simply takes a snapshot of the entire replicated database and shares it with the subscribers. You don't use this replication type for databases that change frequently. 
Transactional replication only sends modifications to the data. The replication agent monitors the publisher for changes to the database and transmits those changes to the subscribers.
Merge replication allows items to be modified at both the publisher and the subscribers; both entities can make changes to the database independently and can work without an active network connection. When they are reconnected, the merge replication agent checks for changes on both sets of data and modifies each database accordingly.

Here you can find how to configure data replication in SQL Server 2008.
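As a rough sketch only (not a full walkthrough), a transactional replication setup driven from T-SQL might look roughly like the following, assuming a distributor is already configured; the publication, server and database names below are placeholders, and the snapshot and log reader agents still have to be set up separately.

-- Enable the database for publishing (run on the publisher).
EXEC sp_replicationdboption
     @dbname = N'AdventureWorks2012',
     @optname = N'publish',
     @value = N'true';
GO
-- Create a transactional publication (name is a placeholder).
EXEC sp_addpublication
     @publication = N'AdvWorks_Employee_Pub',
     @description = N'Transactional publication of HumanResources.Employee',
     @sync_method = N'concurrent',
     @repl_freq = N'continuous',
     @status = N'active',
     @allow_push = N'true';
GO
-- Add an article (the table to replicate) to the publication.
EXEC sp_addarticle
     @publication = N'AdvWorks_Employee_Pub',
     @article = N'Employee',
     @source_owner = N'HumanResources',
     @source_object = N'Employee',
     @type = N'logbased';
GO
-- Add a push subscription for a subscriber server (server and database names are placeholders).
EXEC sp_addsubscription
     @publication = N'AdvWorks_Employee_Pub',
     @subscriber = N'SUBSCRIBERSRV',
     @destination_db = N'AdventureWorks2012_Replica',
     @subscription_type = N'Push';
GO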

You may wish to implement replication in your organization for a number of reasons, such as:
  • Load balancing. Replication allows you to disseminate your data to a number of servers and then distribute the query load among those servers.
  • Offline processing. You may wish to manipulate data from your database on a machine that is not always connected to the network.
  • Redundancy. Replication allows you to build a fail-over database server that’s ready to pick up the processing load at a moment’s notice.

Thursday, August 9, 2012

Task 15: Maintenance Plan



Creating a Maintenance Plan is pretty easy because there is a wizard and the tasks are already defined.


Example of a Maintenance Plan created on the project I work for: 




Maintenance plans can be created to perform the following tasks:
  • Reorganize the data on the data and index pages by rebuilding indexes with a new fill factor. Rebuilding indexes with a new fill factor makes sure that database pages contain an equally distributed amount of data and free space. It also enables faster growth in the future. For more information, see Specify Fill Factor for an Index.
  • Compress data files by removing empty database pages.
  • Update index statistics to make sure the query optimizer has current information about the distribution of data values in the tables. This enables the query optimizer to make better judgments about the best way to access data, because it has more information about the data stored in the database. Although index statistics are automatically updated by SQL Server periodically, this option can force the statistics to update immediately.
  • Perform internal consistency checks of the data and data pages within the database to make sure that a system or software problem has not damaged data.
  • Back up the database and transaction log files. Database and log backups can be retained for a specified period. This lets you create a history of backups to be used if you have to restore the database to a time earlier than the last database backup. You can also perform differential backups.
  • Run SQL Server Agent jobs. This can be used to create jobs that perform a variety of actions and the maintenance plans to run those jobs.
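Behind the wizard, these tasks are scheduled as SQL Server Agent jobs. Purely as an illustration of what several of the tasks above boil down to in T-SQL, here is a hedged sketch; the database, table and backup file names are placeholders.

-- Rebuild indexes with a new fill factor (hypothetical table).
ALTER INDEX ALL ON dbo.MyTable REBUILD WITH (FILLFACTOR = 80);
GO
-- Update statistics for the current database.
EXEC sp_updatestats;
GO
-- Internal consistency checks of data and data pages.
DBCC CHECKDB (N'MyDatabase');
GO
-- Full database backup plus a transaction log backup.
BACKUP DATABASE MyDatabase TO DISK = N'D:\Backups\MyDatabase.bak' WITH INIT;
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';
GO
-- Remove empty pages to compress the data files (use sparingly).
DBCC SHRINKDATABASE (N'MyDatabase');
GO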

Task 14: Indexes



   Here you can find a link that provides detailed information about indexes:

Index Overview, Index Architecture, Clustered Indexes, Nonclustered Indexes, Covering Indexes, How Indexes Improve Performance, Index Uniqueness, Data Pages & Extents, How Indexes Are Stored, The Relationship Between Index Size and Page Size, Other Things Stored in the Index, Retrieving Database Page Metadata, Index Statistics, Best Practices on How to Design Database Indexes, Maintaining Indexes for Top Performance

Index fragmentation is a subject that I would like to cover next. 

The SQL Server Database Engine automatically maintains indexes whenever insert, update, or delete operations are made to the underlying data. Over time these modifications can cause the information in the index to become scattered in the database (fragmented). Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file. Heavily fragmented indexes can degrade query performance and cause your application to respond slowly.

Detecting Fragmentation

The first step in deciding which defragmentation method to use is to analyze the index to determine the degree of fragmentation. By using the system function sys.dm_db_index_physical_stats, you can detect fragmentation in a specific index, all indexes on a table or indexed view, all indexes in a database, or all indexes in all databases. For partitioned indexes, sys.dm_db_index_physical_stats also provides fragmentation information for each partition.
The result set returned by the sys.dm_db_index_physical_stats function includes the avg_fragmentation_in_percent column, which you can use to choose the corrective statement:

• avg_fragmentation_in_percent > 5% and <= 30%: ALTER INDEX REORGANIZE

• avg_fragmentation_in_percent > 30%: ALTER INDEX REBUILD WITH (ONLINE = ON)



USE AdventureWorks2012;
GO
-- Find the average fragmentation percentage of all indexes
-- in the HumanResources.Employee table.
SELECT a.index_id, name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(N'AdventureWorks2012'), OBJECT_ID(N'HumanResources.Employee'), NULL, NULL, NULL) AS a
JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id;
GO
-- Reorganize the IX_Employee_OrganizationalLevel_OrganizationalNode index on the HumanResources.Employee table.
ALTER INDEX IX_Employee_OrganizationalLevel_OrganizationalNode ON HumanResources.Employee
REORGANIZE ;
GO
-- Reorganize all indexes on the HumanResources.Employee table.
ALTER INDEX ALL ON HumanResources.Employee
REORGANIZE ;
GO
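-- Rebuild the PK_Employee_BusinessEntityID index on the HumanResources.Employee table.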
ALTER INDEX PK_Employee_BusinessEntityID ON HumanResources.Employee
REBUILD;
GO


B-tree structure of a SQL Server index




Wednesday, August 8, 2012

Task 12: Report in Power View



I found a cool SharePoint tutorial which teaches you how to Create a Sample Report in Power View:
http://technet.microsoft.com/en-us/library/hh759325%28v=sql.110%29.aspx

First of all, I got many access problems with the SharePoint site:
1. Trying to create a new folder in Shared Documents, I got the following error:

2. In New Document I could not find the Report Data Source option. (If there is no Report Data Source option on the New Document tab, then someone with adequate permissions on the SharePoint site needs to add that content type to the site. For more information, see Add Report Server Content Types to a Library (Reporting Services in SharePoint Integrated Mode).) 


The issues were resolved by our Configuration Engineer.
You can use this tutorial anytime you need to create a simple Power View report from a data source, saving the Excel file to a SharePoint document library or PowerPivot Gallery. 

Task 11: Administering Microsoft SQL Server 2012 Databases certification


I passed the Querying Microsoft SQL Server 2012 certification two days ago and I still dream about all 13 questions I didn't answer correctly or completely. I'll continue with the next 2 certifications:


For 70-462 there is a Training Kit available for free download.