Channel: Informatics @ Northwestern Weblog » EDW

Monitoring Oracle GoldenGate Replicats on Windows


Table of Contents


Goals
Overview
Monitoring Replicat Status
Extracting Replicat Configuration
Monitoring Status by Table
Conclusion
Credits & References:

Goals


We want to set up some processes to help us better monitor Oracle GoldenGate replicats on Windows and also let us relate replicat status to a specific table object.

  • Push replicat status to a table in Microsoft SQL Server (MSSQL)
  • Parse replicat configuration information and store that in MSSQL
  • Create a view that will help us communicate both status and lag per table

Overview:


If you’ve played with Oracle GoldenGate you’re already familiar with the fact that it’s very “command line-y” and “config file-y” in nature. That’s fine – it’s simple, portable, and probably just easier for Oracle. It poses a few challenges when you want to do a couple of things, however:

  • Review replicat status – Are they running? Abended? Stopped?
  • Review the configuration of your replicats – Which replicat is handling specific tables?
  • Answer questions like “Hey, do we have the latest demographic data from the HRMS?”

I should probably point out we’re in sort of a strange place when it comes to GoldenGate. We don’t own the source database (Oracle) and we don’t own the origin extract/pump. We only manage one of the two destination replica targets – which is a Microsoft SQL Server 2008R2 setup. That means we don’t get access to the Oracle-provided monitoring tools – we can only see our own replicats.

Our team tends to be very Type-A. We like to know what’s what and it’s not acceptable to us that the configuration and status of a mission-critical system be a black box or require manual intervention for basic monitoring.

We identified a few key needs:

Review Replicat Status

  • List out each replicat – “PROD_001,” “PROD_002,” etc.
  • Determine status of the replicat – RUNNING, ABENDED, or STOPPED
  • Determine the lag – 10 seconds behind? 10 minutes behind? 10 hours behind?
  • Determine the timestamp of the status – Updated 10 minutes ago?

Review Replicat Configuration

  • Replicat name
  • MAP table directives – MAP someschema.* or MAP someschema.thistable
  • TABLEEXCLUDE directives – TABLEEXCLUDE someschema.thisothertable or TABLEEXCLUDE someschema.dontinclude*

So – going back to GoldenGate being “command line-y” and “config file-y” – we learned we had to do three things:

  • Review replicat status – use the ggsci command line and review the output from the “info replicat *” command
  • Review replicat configuration – read the configuration files (*.prm) located in your GoldenGate installation’s “dirprm” directory (EX: D:\ggs_trgt\dirprm\ )
  • Break down the replicat MAP and TABLEEXCLUDE directives from the configuration so we can determine which tables we’re importing – then link those tables to a replicat in order to determine table load state

We’re going to use PowerShell, regular expressions, and SQL Server to help build a list of our replicats and tables, then set up a monitoring process to inform us of both availability and lag. The status and configuration will then be query-able via SQL Server.

Caveat Emptor, Greg

There’s a major “YMMV” disclaimer on this. The code and examples below suited our specific needs, but it may not be appropriate for your organization. Also, I’m not particularly good at either Windows PowerShell (this was my first project using it) or regular expressions, so you’re definitely going to find situations where there are issues (particularly with the regex – I know there are bugs). We’ve taken liberties with the code below because we can – not because we should. You may also ask good questions like “why did you use a varchar for something that’s clearly a time?” – Good question. My response: “I’m not convinced I’ve seen all of the possible output from the ggsci command yet, so I’d rather not make the assumption that what I understand as a time will always be represented as a time.” Stuff like that. Plus, this is a version 0.0001 we’re looking at here and has been tested by one person (me) for a whopping 1 hour or so.

Take-away: use at your own risk and don’t assume I actually know what I’m doing. Northwestern is sharing this as a courtesy to the community and is not accountable for the quality of the code or impact to you, your projects, your organization, or your customers.

First, Set up your Target Monitoring Database

We’re going to create a new database and a new schema, both conveniently called “goldengate”

Create Your Monitoring Database

CREATE DATABASE goldengate

* Please make sure you talk to your DBA first. Ours would kill me if I just did the above without talking to him about filegroups, permissions, and the other usual stuff.

Create Your Monitoring Schema

USE goldengate
GO
--CREATE SCHEMA has to be the first statement in its batch, hence the GO above
CREATE SCHEMA goldengate

* Again – avoiding DBA rage here is a good idea

Monitoring Replicat Status


Create Your Replicat Monitoring Table

We’re going to use some basic columns here to represent the replicat name, status, checkpoint lag, and date of last update. We’re also going to tag on a separate column for the date of when we last ran the status update process. Note that we’re (currently) using varchars here for things that clearly represent time (checkpoint_lag and last_updated) – that’s mainly because I’m not always 100% sure what GoldenGate will emit. It should be a time, but… Also note I’m mixing naming styles for the columns. We call one “last_updated” and one “status_dts” – that’s not just us being lazy. We tend to persist the names of our origin data source “columns,” but then apply our own naming convention to our custom EDW-specific columns. In GoldenGate, the output of “info replicat *” produces something that looks like “last updated,” so we get “last_updated,” whereas our internal status date column follows our EDW-specific convention of “_dts” for datetime stamps.

--DROP TABLE [goldengate].[replicat_status]
CREATE TABLE [goldengate].[replicat_status](
    [rep_name] [VARCHAR](8) NOT NULL,
    [rep_status] VARCHAR(50) NULL,
    [checkpoint_lag] VARCHAR(20) NULL,
    [last_updated] VARCHAR(20) NULL,
    [status_dts] datetime2(2) NULL,
CONSTRAINT [pk_gg_replicat_status_rep_name] PRIMARY KEY CLUSTERED
(
    [rep_name] ASC
)WITH (DATA_COMPRESSION = ROW) ON [PRIMARY]
) ON [PRIMARY]
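
Since checkpoint_lag is stored as text, anything that wants to alert on it later has to parse it defensively. Here’s a minimal sketch of that idea. It assumes the lag looks like HH:MM:SS (which is all we’ve seen so far), and Convert-LagToSeconds is a made-up helper name for illustration, not something from GoldenGate or from the scripts in this post. Anything that doesn’t match comes back as $null instead of blowing up.

```powershell
# Hypothetical helper (not part of the monitoring scripts): turn a lag string
# such as "00:01:23" into a number of seconds.
# Assumption: ggsci reports lag as H+:MM:SS; any other shape yields $null.
function Convert-LagToSeconds([string]$lag) {
    if ($lag -match '^(?<h>\d+):(?<m>\d{2}):(?<s>\d{2})$') {
        return ([int]$Matches['h'] * 3600) + ([int]$Matches['m'] * 60) + [int]$Matches['s']
    }
    return $null
}

Convert-LagToSeconds '00:01:23'   # 83
Convert-LagToSeconds '10:00:00'   # 36000
Convert-LagToSeconds 'unknown'    # nothing ($null)
```

The regex is used instead of a TimeSpan parse on purpose: I don’t know whether the hours part can exceed 23, and this version doesn’t care.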

Extracting Replicat Status from GGSCI

To get the status of the replicats from the Windows shell we need to call into ggsci, passing in the command “info replicat *”. You can do that by echoing the command and piping it into ggsci (redirect the output to a file if you want to keep it), à la

echo info replicat * | e:\ggs_trgt\ggsci

This produces output similar to the following:

REPLICAT   PROD_001  Last Started 2012-05-23 11:45   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
Log Read Checkpoint  File E:\ggs_trgt\dirdat\ed002713
                     2012-06-15 09:08:27.266907  RBA 1234567
 
REPLICAT   PROD_002  Last Started 2012-05-23 11:45   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:01:23 ago)
Log Read Checkpoint  File E:\ggs_trgt\dirdat\ed002700
                     2012-06-15 09:08:27.266907  RBA 2345678
 
REPLICAT   PROD_003  Last Started 2012-05-23 11:45   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
Log Read Checkpoint  File E:\ggs_trgt\dirdat\ed002713
                     2012-06-15 09:08:27.266907  RBA 4567890

Great – so we now have a way to call into ggsci and get the status of our replicats – but it’s a giant wad of text. Regular Expressions to the rescue.

Digging into the output, we’re looking for the following patterns:

[Figure: GoldenGate Replicat Status Key Values for RegEx Extraction]

We’re going to try the following regular expression (YMMV)

REPLICAT\s*(?<replicat>[a-zA-Z_0-9]+).+?Status\s(?<status>[a-zA-Z]+).+?Checkpoint Lag(\s|\t)+(?<checkpoint_lag>((\d)+:\d{2}:\d{2})+).+?updated\s+(?<last_updated>((\d)+:\d{2}:\d{2})+)

This will let us pull out key elements from the replicat status

Replicat Name  Status   Checkpoint Lag  Last Updated
PROD_001       RUNNING  00:00:00        00:00:00
PROD_002       ABENDED  00:00:00        00:01:23
PROD_003       RUNNING  00:00:00        00:00:00
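
To make the pattern concrete, here’s a small sketch that runs it against a hard-coded sample instead of live ggsci output. The named groups (replicat, status, checkpoint_lag, last_updated) are inferred from the property names the full script reads later on, and the sample lines are joined with a space because, as far as I can tell, that is what PowerShell does when it turns the array of output lines into a single string for the regex.

```powershell
# Sample text standing in for the output of "info replicat *"
$sample = @(
    'REPLICAT   PROD_001  Last Started 2012-05-23 11:45   Status RUNNING',
    'Checkpoint Lag       00:00:00 (updated 00:00:00 ago)',
    'REPLICAT   PROD_002  Last Started 2012-05-23 11:45   Status ABENDED',
    'Checkpoint Lag       00:00:00 (updated 00:01:23 ago)'
) -join ' '

$pattern = 'REPLICAT\s*(?<replicat>[a-zA-Z_0-9]+).+?Status\s(?<status>[a-zA-Z]+).+?Checkpoint Lag(\s|\t)+(?<checkpoint_lag>((\d)+:\d{2}:\d{2})+).+?updated\s+(?<last_updated>((\d)+:\d{2}:\d{2})+)'

# One object per replicat, with one property per named group
$rows = [regex]::Matches($sample, $pattern) | ForEach-Object {
    New-Object PSObject -Property @{
        rep_name       = $_.Groups['replicat'].Value
        rep_status     = $_.Groups['status'].Value
        checkpoint_lag = $_.Groups['checkpoint_lag'].Value
        last_updated   = $_.Groups['last_updated'].Value
    }
}

$rows | Select-Object rep_name, rep_status, checkpoint_lag, last_updated | Format-Table -AutoSize
```

Reading the groups by name straight off each match is a shorter route than looping over GetGroupNames(), and it behaves the same for a pattern this small.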

Awesome! Now that we can process the replicat status text and produce discrete data it’s time to stick that status information into SQL Server. For that we’re going to abuse PowerShell and ADO.NET (Note: I didn’t use LINQ here because I have approximately 4 hours of experience with PowerShell and I found a great example of how to use ADO.NET – feel free to direct me to a LINQ example).

I’m going to skip over the details, but the gist of what the script does is…

  1. Retrieves the text from our “info replicat *” message to GGSCI
  2. Parses that text using regex in order to produce discrete status data
  3. Connects to our database
  4. Retrieves our existing status data and returns it as a datatable
  5. Creates a new datatable with the same column definition as our “real” table in the database
  6. Populates our new datatable with the discrete data from our GGSCI command
  7. Abuses the ADO.NET datatable’s “merge” command to merge our new data with our existing data from the database

This then posts to our database and behold – our replicat status is now stored in a relational form.

rep_name  rep_status  checkpoint_lag  last_updated  status_dts
PROD_001  RUNNING     00:00:00        00:00:00      2012-06-17 12:52:00
PROD_002  ABENDED     00:00:00        00:01:23      2012-06-17 12:52:00
PROD_003  RUNNING     00:00:00        00:00:00      2012-06-17 12:52:00

So there’s our replicat status – extracted from ggsci.
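
The merge in step 7 is the least obvious part, so here’s a tiny in-memory sketch of it with no database involved. The two tables and their rows are made up for illustration; the point is only to show what DataTable.Merge does when both tables share a primary key.

```powershell
# "Existing" rows, standing in for what the data adapter would load from the database
$existing = New-Object System.Data.DataTable
$pkExisting = $existing.Columns.Add('rep_name', [string])
[void]$existing.Columns.Add('rep_status', [string])
$existing.PrimaryKey = @($pkExisting)
[void]$existing.Rows.Add('PROD_001', 'RUNNING')
[void]$existing.Rows.Add('PROD_002', 'RUNNING')
$existing.AcceptChanges()   # mark rows as unchanged, like a freshly filled table

# "Fresh" rows, standing in for what we just parsed out of ggsci
$fresh = New-Object System.Data.DataTable
$pkFresh = $fresh.Columns.Add('rep_name', [string])
[void]$fresh.Columns.Add('rep_status', [string])
$fresh.PrimaryKey = @($pkFresh)
[void]$fresh.Rows.Add('PROD_002', 'ABENDED')   # key already exists in $existing
[void]$fresh.Rows.Add('PROD_003', 'RUNNING')   # key is new

# Rows are matched on the primary key: matches get overwritten, the rest get appended
$existing.Merge($fresh)

$existing.Rows | ForEach-Object { '{0}  {1}  {2}' -f $_.rep_name, $_.rep_status, $_.RowState }
```

The RowState column is the interesting part: it’s what the data adapter’s Update call looks at when it chooses between the insert and update commands, which is why the primary key has to be set on both tables.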

$replicat_list = New-Object System.Collections.Generic.List[System.Object]
 
#=========================================================================================
# VARIABLE DEFINITIONS
#=========================================================================================
# Command to extract the replicat status from ggsci
#  - you want the output of the command "info replicat *" when run from within the ggsci console
#  - in order to do this from the command line you have to echo the command (info replicat *), piping it into ggsci
#  - we then take this output and send it into the variable "ggsci_replicat_status"
$ggsci_replicat_status = CMD /C "echo info replicat * | e:\ggs_trgt\ggsci"
#$ggsci_replicat_status = Get-Content -Path C:\Temp\goldengate\replicat_status.txt
 
# Connection string for our target database - where we want to store the replicat status information
$gg_db_conn_string = "server=PRODDB;database=goldengate;User Id=gg_user;Password=GGPASSWORDHERE;Persist Security Info=True;"
#$gg_db_conn_string = "server=localhost;database=goldengate;Integrated Security = sspi;" 
 
# Insert, Update, "Delete" commands for our
$gg_sql_select = "select * from goldengate.goldengate.replicat_status"
$gg_sql_update = "UPDATE goldengate.goldengate.replicat_status SET rep_status=@rep_status, checkpoint_lag = @checkpoint_lag, last_updated = @last_updated, status_dts = @status_dts WHERE rep_name=@rep_name;"
$gg_sql_insert = "INSERT INTO goldengate.goldengate.replicat_status (rep_name, rep_status, checkpoint_lag, last_updated, status_dts) values (@rep_name, @rep_status, @checkpoint_lag, @last_updated, @status_dts)" 
 
#=========================================================================================
# EXTRACT REPLICAT STATUS FROM RESPONSE TEXT (from info replicat *)
#=========================================================================================
#http://msgoodies.blogspot.com/2008/12/matching-multi-line-text-and-converting.html
$regex=[regex] "REPLICAT\s*(?<replicat>[a-zA-Z_0-9]+).+?Status\s(?<status>[a-zA-Z]+).+?Checkpoint Lag(\s|\t)+(?<checkpoint_lag>((\d)+:\d{2}:\d{2})+).+?updated\s+(?<last_updated>((\d)+:\d{2}:\d{2})+)" 
 
$regex.matches($ggsci_replicat_status) | Foreach-Object {
    # Save current pipeline object, so it is available from inside the next foreach-object
    $match=$_
    # Construct a new, empty object. Always return objects as output whenever possible. It
    # makes using the output much easier
    $obj=new-object object
    # Get all the group names defined in the pattern - ignore the numeric, auto ones
    $regex.GetGroupNames() | Where-Object {$_ -notmatch '^\d+$'} | Foreach-Object {
        # And add each match as a property. When multiple results are returned, the
        # value must be picked up using an index number hence the GroupNumberFromName call
        add-member -inputobject $obj NoteProperty $_ $match.groups[$regex.GroupNumberFromName($_)].value
    }
    # emit the object to the pipeline
 
    $rep_name = $obj.replicat
    $rep_status = $obj.status
    $checkpoint_lag = $obj.checkpoint_lag
    $last_updated = $obj.last_updated
 
    $new_replicat = New-Object PSObject -Property @{rep_name = $rep_name; rep_status = $rep_status; checkpoint_lag = $checkpoint_lag; last_updated = $last_updated}
    $replicat_list.Add( $new_replicat )
 
}
 
#=========================================================================================
# GET REPLICAT STATUS INFO FROM DATABASE - for updates, etc.
#=========================================================================================
#Great example taken from http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=272
  $sqlConnection = new-object System.Data.SqlClient.SqlConnection $gg_db_conn_string
  $sqlConnection.Open()
  $sqlCommand = New-object system.data.sqlclient.SqlCommand
  $sqlCommand.CommandTimeout = 30
  $sqlCommand.Connection = $sqlConnection
  $sqlCommand.CommandText = $gg_sql_select  #get-content c:\temp\SQLText.txt
  $sqlDataAdapter = new-object System.Data.SqlClient.SQLDataAdapter($sqlCommand) 
 
  #update
  $sqlDataAdapter.UpdateCommand = New-object system.data.sqlclient.SqlCommand($gg_sql_update,  $sqlConnection)
  $sqlDataAdapter.UpdateCommand.Parameters.Add("@rep_name", [System.Data.SqlDbType]::VarChar, 8, "rep_name")
  $sqlDataAdapter.UpdateCommand.Parameters.Add("@rep_status", [System.Data.SqlDbType]::VarChar, 50, "rep_status")
  $sqlDataAdapter.UpdateCommand.Parameters.Add("@checkpoint_lag", [System.Data.SqlDbType]::VarChar, 20, "checkpoint_lag")
  $sqlDataAdapter.UpdateCommand.Parameters.Add("@last_updated", [System.Data.SqlDbType]::VarChar, 20, "last_updated")
  $sqlDataAdapter.UpdateCommand.Parameters.Add("@status_dts", [System.Data.SqlDbType]::DateTime, 2, "status_dts")
 
  #insert
  $sqlDataAdapter.InsertCommand = New-object system.data.sqlclient.SqlCommand($gg_sql_insert,  $sqlConnection)
  $sqlDataAdapter.InsertCommand.Parameters.Add("@rep_name", [System.Data.SqlDbType]::VarChar, 8, "rep_name")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@rep_status", [System.Data.SqlDbType]::VarChar, 50, "rep_status")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@checkpoint_lag", [System.Data.SqlDbType]::VarChar, 20, "checkpoint_lag")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@last_updated", [System.Data.SqlDbType]::VarChar, 20, "last_updated")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@status_dts", [System.Data.SqlDbType]::DateTime, 2, "status_dts")
 
  $sqlDataSet = new-object System.Data.dataset
  $sqlDataAdapter.fill($sqlDataSet)
  $sqlDataSet.tables[0].select()
 
#=========================================================================================
# DEFINE REPLICAT STATUS TABLE - for our new replicat status information
#=========================================================================================
# define our replicat status output columns
$replicatStatusTable = New-Object System.Data.DataTable
$pk = new-object System.Data.DataColumn("rep_name", [string])
$replicatStatusTable.Columns.Add($pk)
$replicatStatusTable.Columns.Add("rep_status", [string])
$replicatStatusTable.Columns.Add("checkpoint_lag", [string])
$replicatStatusTable.Columns.Add("last_updated", [string])
$replicatStatusTable.Columns.Add("status_dts", [DateTime])
# we need to identify a primary key for our table so our merge works
$replicatStatusTable.PrimaryKey = $pk
 
# Build our replicat status table based on the replicat status data we extracted
#  from our regex/text output from ggsci
Foreach ($replicat in $replicat_list) {
    if($replicat.rep_name){
        #Write-Host  $replicat.rep_name
        #Write-Host  $replicat.rep_status
 
        $AddRow = $replicatStatusTable.newrow()
        $AddRow.rep_name = $replicat.rep_name
        $AddRow.rep_status = $replicat.rep_status
        $AddRow.checkpoint_lag = $replicat.checkpoint_lag
        $AddRow.last_updated = $replicat.last_updated
 
        $AddRow.status_dts = [System.DateTime]::Now
        $replicatStatusTable.Rows.Add($AddRow)
    }
}
 
#$replicatStatusTable
 
#=========================================================================================
# MERGE TABLES AND UPDATE DATABASE
#=========================================================================================
# merge our db table data with our new replicat status table
# PK is critical here, so make sure it's set above - that's why we're using replicat name as the PK
$sqlDataSet.tables[0].Merge($replicatStatusTable)
#$sqlDataSet.tables[0]
 
#database cleanup
$sqlDataAdapter.Update($sqlDataSet.tables[0])
$sqlConnection.Close()

Scheduling the GoldenGate Monitoring PowerShell Task

Now that we have our GoldenGate monitoring PowerShell script set up we should schedule it so it polls GoldenGate and updates the database at a regular interval.

Create a new scheduled task:

  • Name: GoldenGate Replicat Status
  • Description: GoldenGate monitoring – messages ggsci to retrieve replicat status and then updates EDW GoldenGate replicat status table
  • Settings: Set “Run whether or not the user is logged in” and “Do not save password”
    • Command: powershell
    • Add Arguments: -command &E:\gg_monitoring\gg_status.ps1

Set this to run on a schedule you feel is appropriate. We have it running every 10 minutes – mainly because we’re monitoring about 5,000 tables of mission-critical data and we really need to know if something abends or lags.

Extracting Replicat Configuration


GoldenGate replicat configuration information is stored in parameter files (.prm) and these are typically stored in a single folder. In our case for this article that will be

E:\ggs_trgt\dirprm\

There should be one configuration file per replicat.

E:\ggs_trgt\dirprm\prod_001.prm
E:\ggs_trgt\dirprm\prod_002.prm
E:\ggs_trgt\dirprm\prod_003.prm

These configuration files store a few key bits of information we want to extract, store, and interpret:

  • Replicat Name
  • MAP statements
  • TABLEEXCLUDE statements

Here’s a quick example (note this is just a subset of the configuration)

REPLICAT prod_001
...
MAP HR.*, TARGET HR.*;
MAP INVENTORY.*, TARGET INVENTORY.*;
MAP SALES.*, TARGET SALES.*;
MAP INSURANCE.*, TARGET INSURANCE.*;
...
TABLEEXCLUDE HR.PAYROLL;
TABLEEXCLUDE HR.DEMO*;
TABLEEXCLUDE *.MEDICAL;

Once we get ahold of the MAP and TABLEEXCLUDE statements we’re going to dig into the details of which schemas / tables are either mapped or excluded. MAP and TABLEEXCLUDE support the notion of matching by patterns, so if we read the above we can see we’re including a range of schemas, then excluding specific tables that match patterns. We’ll use this later to match our destination tables / schemas by name pattern. Note that for our purposes we’re mapping origin schema to destination schema so the code incorporates some assumptions around those lines. Just take that into account if you use any of this.
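
As a rough illustration of the kind of parsing we’re after, here’s a line-by-line sketch that pulls the operator, schema mask, and table mask out of a few sample directives. It is not the parser used in the full script: the sample lines and the pattern are made up for this example, and it assumes each directive sits on a single line and starts with MAP or TABLEEXCLUDE.

```powershell
# Sample lines standing in for a .prm file (the commented-out MAP is there on purpose)
$sampleLines = @(
    'REPLICAT prod_001',
    '-- MAP OLD.*, TARGET OLD.*;',
    'MAP HR.*, TARGET HR.*;',
    'TABLEEXCLUDE HR.DEMO*;',
    'TABLEEXCLUDE *.MEDICAL;'
)

# Anchoring at the start of the line means "--" comment lines never match
$directive = '^\s*(?<table_operator>TABLEEXCLUDE|MAP)\s+(?<table_schema_mask>[^.\s]+)\.(?<table_name_mask>[^\s,;]+)'

$parsed = foreach ($line in $sampleLines) {
    if ($line -match $directive) {
        New-Object PSObject -Property @{
            table_operator    = $Matches['table_operator']
            table_schema_mask = $Matches['table_schema_mask']
            table_name_mask   = $Matches['table_name_mask']
        }
    }
}

$parsed | Select-Object table_operator, table_schema_mask, table_name_mask | Format-Table -AutoSize
```

Working one line at a time keeps commented-out directives from sneaking in, at the cost of missing any directive that is split across several lines.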

Creating a Replicat Config Table

We’re going to create another table to house our configuration information. We’re going to have a few basic columns that will persist the basic mapping directives, as well as a few for our table / schema naming patterns so we can link back to our tables.

--DROP TABLE [goldengate].[replicat_config]
CREATE TABLE [goldengate].[replicat_config](
    config_id INT IDENTITY(1,1) NOT NULL,
    [rep_name] [VARCHAR](8) NOT NULL,
    [table_operator] VARCHAR(25) NULL,
    [table_mask] VARCHAR(100) NULL,
    [table_schema_mask] VARCHAR(100) NULL,
    [table_name_mask] VARCHAR(100) NULL,
    [updt_dts] datetime2(2) NULL,
CONSTRAINT [pk_gg_replicat_config_config_id] PRIMARY KEY CLUSTERED 
(
    [config_id] ASC
)WITH (DATA_COMPRESSION = ROW) ON [PRIMARY]
) ON [PRIMARY]

Creating a Replicat Config Parser

We’re going to do something similar to the other process, but the approach is a bit different. We’re going to once again abuse regular expressions in order to pull apart the configuration information, but we’re going to flush and reload the full set of configuration data each time – as opposed to the “insert if new, update if existing” process we used for the replicat status processing.

  1. Create an empty list object
  2. Get a list of *.prm files from our GoldenGate parameter file directory
  3. Open each file and …
    • Parse replicat name
    • Parse MAP directives – try to pull out schema and table patterns as well
    • Parse TABLEEXCLUDE directives – try to pull out schema and table patterns as well
    • Add a new object to our list – containing the discrete elements we just retrieved
  4. Truncate the destination table (we’re flushing all contents each time we reload)
  5. Create an empty datatable that matches our destination [goldengate].[replicat_config] definition
  6. Insert the new config rows into [goldengate].[replicat_config]

$replicat_list = New-Object System.Collections.Generic.List[System.Object]
 
#==================================================================================================================
# VARIABLE DEFINITIONS
#==================================================================================================================
 
# Where are the goldengate parameter files stored? We're looking for a fully-qualified Windows path here
# EX: C:\temp\
$gg_param_path = "C:\Temp\goldengate\dirprm\" 
 
# Connection string for our target database - where we want to store the replicat status information
$gg_db_conn_string = "server=PRODDB;database=goldengate;User Id=gg_user;Password=GGPASSWORDHERE;Persist Security Info=True;" 
#$gg_db_conn_string = "server=localhost;database=goldengate;Integrated Security = sspi;" 
 
# Insert, Update, "Delete" commands for our 
$gg_sql_select = "select * from goldengate.goldengate.replicat_config" 
$gg_sql_insert = "INSERT INTO goldengate.goldengate.replicat_config (rep_name, table_operator, table_mask, table_schema_mask, table_name_mask, updt_dts) values (@rep_name, @table_operator, @table_mask, @table_schema_mask, @table_name_mask, @updt_dts)" 
$gg_sql_truncate = "truncate table goldengate.[goldengate].[replicat_config]" 
 
#==================================================================================================================
# GET LIST OF PARAMETER FILES FROM PATH
#==================================================================================================================
$file_list = Get-ChildItem -Path $gg_param_path -filter *.prm*
 
#==================================================================================================================
# EXTRACT REPLICAT CONFIG INFORMATION FROM FILE(S)
#  We're looking for...
#   - the name of the replicat (EX: prod_ed1)
#   - the MAP / TABLEEXCLUDE operator
#   - the map mask / rule / directive (EX: MYSCHEMA.*)
#  NOTE: the script runs top to bottom, so this function has to be defined before the loop that calls it
#==================================================================================================================
Function GetReplicatConfig($file_contents){
 
    #http://msgoodies.blogspot.com/2008/12/matching-multi-line-text-and-converting.html
    $regex_rep_name=[regex] "^REPLICAT\s*(?<replicat>[a-zA-Z_0-9]+).+?" 
    $regex_rep_config=[regex] "[^(--)](?<table_operator>(TABLEEXCLUDE|MAP){1})\s+(?<table_mask>([a-zA-Z0-9]|\.|\*|_)+).+?" 
    #the second regex is flawed - it will incorrectly include items after comments in some cases
 
    $regex_rep_table_schema_name_mask=[regex]"(?<table_schema_mask>([^\.].*))\.(?<table_name_mask>([^\.].*))" 
 
    $rep_name = $regex_rep_name.match($file_contents).Groups["replicat"].value
 
    $regex_rep_config.matches($file_contents) | Foreach-Object {
        # Save current pipeline object, so it is available from inside the next foreach-object
        $match=$_
        # Construct a new, empty object. Always return objects as output whenever possible. It
        # makes using the output much easier
        $obj=new-object object
        # Get all the group names defined in the pattern - ignore the numeric, auto ones
        $regex_rep_config.GetGroupNames() | Where-Object {$_ -notmatch '^\d+$'} | Foreach-Object {
            # And add each match as a property. When multiple results are returned, the
            # value must be picked up using an index number hence the GroupNumberFromName call
            add-member -inputobject $obj NoteProperty $_ $match.groups[$regex_rep_config.GroupNumberFromName($_)].value
        }
 
        # emit the object to the pipeline
        #$obj
 
        $table_operator = $obj.table_operator
        $table_mask = $obj.table_mask
 
        $table_schema_mask = $regex_rep_table_schema_name_mask.match($table_mask).Groups["table_schema_mask"].value
        $table_name_mask = $regex_rep_table_schema_name_mask.match($table_mask).Groups["table_name_mask"].value
 
        $new_replicat = New-Object PSObject -Property @{rep_name = $rep_name; table_operator = $table_operator; table_mask = $table_mask; table_schema_mask = $table_schema_mask; table_name_mask = $table_name_mask;}
        $replicat_list.Add( $new_replicat )
 
    }
 
    #sort and unique-ify our list of replicats (since my regexes aren't very good we have opportunity for dupes)
    #$replicat_list | sort rep_name, table_operator, table_mask, table_schema_mask, table_name_mask -Unique
}
 
#==================================================================================================================
# PROCESS PARAMETER FILES
#==================================================================================================================
Foreach ($file in $file_list) {
    $file_contents = Get-Content -Path $file.fullname
    GetReplicatConfig($file_contents)
    #$file.fullname
}
 
#==================================================================================================================
# GET REPLICAT CONFIG INFO FROM DATABASE - for updates, etc. 
#==================================================================================================================
 
#Great example taken from http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=272
  $sqlConnection = new-object System.Data.SqlClient.SqlConnection $gg_db_conn_string
  $sqlConnection.Open()
  $sqlCommand = New-object system.data.sqlclient.SqlCommand   
  $sqlCommand.CommandTimeout = 30 
  $sqlCommand.Connection = $sqlConnection
  $sqlCommand.CommandText = $gg_sql_select  #get-content c:\temp\SQLText.txt
  $sqlDataAdapter = new-object System.Data.SqlClient.SQLDataAdapter($sqlCommand) 
 
#truncate our destination object... yeah, I know.
$sqlTruncateCommand = $sqlConnection.CreateCommand()
$sqlTruncateCommand.CommandText = $gg_sql_truncate
$sqlReader = $sqlTruncateCommand.ExecuteNonQuery()
 
  #insert
  $sqlDataAdapter.InsertCommand = New-object system.data.sqlclient.SqlCommand($gg_sql_insert,  $sqlConnection)
  $sqlDataAdapter.InsertCommand.Parameters.Add("@rep_name", [System.Data.SqlDbType]::VarChar, 8, "rep_name")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@table_operator", [System.Data.SqlDbType]::VarChar, 25, "table_operator")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@table_mask", [System.Data.SqlDbType]::VarChar, 100, "table_mask")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@table_schema_mask", [System.Data.SqlDbType]::VarChar, 100, "table_schema_mask")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@table_name_mask", [System.Data.SqlDbType]::VarChar, 100, "table_name_mask")
  $sqlDataAdapter.InsertCommand.Parameters.Add("@updt_dts", [System.Data.SqlDbType]::DateTime, 2, "updt_dts")
 
    foreach ($Parameter in $sqlDataAdapter.InsertCommand.Parameters)
    {
        #Parameter.Value = DBNull.Value;
        $Parameter.IsNullable = $true;
    }
 
  $sqlDataSet = new-object System.Data.dataset 
  $sqlDataAdapter.fill($sqlDataSet)
  $sqlDataSet.tables[0].select()
  $sqlDataSet.tables[0].clear()
  $sqlDataAdapter.Update($sqlDataSet.tables[0])
 
#==================================================================================================================
# DEFINE REPLICAT CONFIG TABLE - for our new replicat status information 
#==================================================================================================================
 
# define out replicat status output columns
#$replicatStatusTable = New-Object System.Data.DataTable
#$pk = new-object System.Data.DataColumn("rep_name", [string])
#$replicatStatusTable.Columns.Add("table_operator", [string])
#$replicatStatusTable.Columns.Add("table_mask", [string])
#$replicatStatusTable.Columns.Add("updt_dts", [DateTime])
 
# Build our replicat status table based on the replicat status data we extracted
#  from our regex/text output from ggsci
 
Foreach ($replicat in $replicat_list | sort rep_name, table_operator, table_mask, table_schema_mask, table_name_mask -Unique) {
    if($replicat.rep_name){
        #Write-Host  $replicat.rep_name
        #Write-Host  $replicat.rep_status
        #Write-Host  $replicat.table_schema_mask + " - " + $replicat.table_name_mask
 
        $AddRow = $sqlDataSet.tables[0].newrow()
        $AddRow.rep_name = $replicat.rep_name
        $AddRow.table_operator = $replicat.table_operator
        $AddRow.table_mask = $replicat.table_mask
        $AddRow.table_schema_mask = $replicat.table_schema_mask
        $AddRow.table_name_mask = $replicat.table_name_mask
 
        $AddRow.updt_dts = [System.DateTime]::Now
 
        #$replicatStatusTable.Rows.Add($AddRow)
        $sqlDataSet.tables[0].Rows.Add($AddRow)
 
    }
}
#$sqlDataSet.tables[0]
 
#$replicatStatusTable
 
#==================================================================================================================
# UPDATE DATABASE 
#==================================================================================================================
 
#database cleanup
$sqlDataAdapter.Update($sqlDataSet.tables[0])
$sqlConnection.Close()

In the case of our example we’ll wind up with the following in [goldengate].[replicat_config]

rep_name  table_operator  table_mask   table_schema_mask  table_name_mask  updt_dts
prod_001  MAP             HR.*         HR                 *                2012-06-17 12:52:00
prod_001  MAP             INVENTORY.*  INVENTORY          *                2012-06-17 12:52:00
prod_001  MAP             SALES.*      SALES              *                2012-06-17 12:52:00
prod_001  MAP             INSURANCE.*  INSURANCE          *                2012-06-17 12:52:00
prod_001  TABLEEXCLUDE    HR.PAYROLL   HR                 PAYROLL          2012-06-17 12:52:00
prod_001  TABLEEXCLUDE    HR.DEMO*     HR                 DEMO*            2012-06-17 12:52:00
prod_001  TABLEEXCLUDE    *.MEDICAL    *                  MEDICAL          2012-06-17 12:52:00

Monitoring Status by Table


Alright! Now we have some information we can use. Both the replicat status and the replicat configuration tables are handy just by themselves, but we can now combine them to determine the replication state and lag for a given table.

Determining Which Tables Are in Which Replicat (SQL)

Now that we have the information in [goldengate].[replicat_config] we can try to join the schema and table masks to INFORMATION_SCHEMA in our destination database. We need to take some extra care because we’re mapping both include and exclude statements. That means we need to intersect both sets of objects and exclude that intersection.

[Figure: GoldenGate Intersect MAP and TABLEEXCLUDE]

To do that we’re going to use a SQL “except” statement.

  • Get our list of included (MAPped) objects
  • “EXCEPT”
  • those TABLEEXCLUDE objects

Now, let’s pull back only the tables we’re replicating. For the match we’re going to do something we generally try to avoid – inline “REPLACE” statements – never a good idea when it comes to performance. The rationale here is that we want to leave the original MAP and TABLEEXCLUDE directives in their native form, and since we’re doing a “LIKE” search anyway… Plus, the configuration tables should be small, so we can probably get away with it.

SELECT
    conf.rep_name,
    t.TABLE_SCHEMA, t.TABLE_NAME
FROM
    goldengate.goldengate.replicat_config conf
    INNER JOIN goldengate.INFORMATION_SCHEMA.TABLES t
        --matching on the destination schema only works because we're using the same schema for both source/dest
        --LIKE (not =) so a wildcard schema mask such as the '*' in *.MEDICAL also matches
        ON t.TABLE_SCHEMA LIKE (REPLACE(conf.table_schema_mask, '*', '%'))
        AND t.TABLE_NAME LIKE (REPLACE(conf.table_name_mask, '*', '%'))
        AND conf.table_operator = 'MAP'
 
EXCEPT
 
SELECT
    conf.rep_name,
    t.TABLE_SCHEMA, t.TABLE_NAME
FROM
    goldengate.goldengate.replicat_config conf
    INNER JOIN goldengate.INFORMATION_SCHEMA.TABLES t
        ON t.TABLE_SCHEMA LIKE (REPLACE(conf.table_schema_mask, '*', '%'))
        AND t.TABLE_NAME LIKE (REPLACE(conf.table_name_mask, '*', '%'))
        AND conf.table_operator = 'TABLEEXCLUDE'
 
--ORDER BY after EXCEPT has to use the output column names, not the table aliases
ORDER BY rep_name, TABLE_SCHEMA, TABLE_NAME

We get something like…

rep_name  schema_name  table_name
prod_001  hr           people
prod_001  hr           reporting_rel
prod_001  inventory    parts
prod_001  inventory    part_location
prod_001  inventory    locations
prod_001  sales        region
prod_001  sales        volume
prod_001  insurance    plans
prod_001  insurance    plan_members

See that? We’re not pulling back “HR.PAYROLL” or “HR.MEDICAL” since those are excluded per our TABLEEXCLUDE directives.
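If you want to sanity-check a set of masks before pointing them at the database, the include-minus-exclude logic can be sketched in a few lines of Python. This is only an illustration, not part of our setup: the config rows are copied from the example above, the table list is a made-up stand-in for INFORMATION_SCHEMA.TABLES, and it assumes the masks only ever use “*” as a wildcard. It also treats the schema mask as a wildcard, so a row such as *.MEDICAL applies to every schema.

```python
from fnmatch import fnmatchcase

# Rows as stored in [goldengate].[replicat_config]:
# (rep_name, table_operator, table_schema_mask, table_name_mask)
CONFIG = [
    ("prod_001", "MAP", "HR", "*"),
    ("prod_001", "MAP", "INVENTORY", "*"),
    ("prod_001", "MAP", "SALES", "*"),
    ("prod_001", "MAP", "INSURANCE", "*"),
    ("prod_001", "TABLEEXCLUDE", "HR", "PAYROLL"),
    ("prod_001", "TABLEEXCLUDE", "HR", "DEMO*"),
    ("prod_001", "TABLEEXCLUDE", "*", "MEDICAL*"),
]

# Hypothetical destination tables (stand-in for INFORMATION_SCHEMA.TABLES)
TABLES = [
    ("HR", "PEOPLE"), ("HR", "PAYROLL"), ("HR", "DEMOGRAPHICS"), ("HR", "MEDICAL"),
    ("SALES", "REGION"), ("INVENTORY", "PARTS"),
]


def matches(operator):
    """Return {(rep_name, schema, table)} matched by config rows with this operator."""
    return {
        (rep, schema, table)
        for rep, op, schema_mask, table_mask in CONFIG if op == operator
        for schema, table in TABLES
        if fnmatchcase(schema, schema_mask) and fnmatchcase(table, table_mask)
    }


def replicated_tables():
    """MAPped tables minus TABLEEXCLUDEd tables; set difference stands in for EXCEPT."""
    return sorted(matches("MAP") - matches("TABLEEXCLUDE"))


for row in replicated_tables():
    print(row)
```

With the made-up table list above, only HR.PEOPLE, INVENTORY.PARTS and SALES.REGION survive; the payroll, demographic and medical tables drop out.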

Create a Replicat Table Status View

Now let’s take that general query and just make a convenience view so we can query that – less mess and hassle for everyone. Plus, no repetition of logic.

USE [goldengate]
GO
CREATE VIEW [goldengate].[replicat_table_status] AS
(
    SELECT
        rep_info.rep_name, rep_info.TABLE_SCHEMA, rep_info.TABLE_NAME
        , stat.rep_status, stat.checkpoint_lag, stat.last_updated, stat.status_dts
    FROM (
        --include our MAPped tables
        SELECT
            conf.rep_name,
            t.TABLE_SCHEMA, t.TABLE_NAME
        FROM
            goldengate.goldengate.replicat_config conf WITH(nolock)
            INNER JOIN goldengate.INFORMATION_SCHEMA.TABLES t WITH(nolock)
                --matching on the destination schema only works because we're using the same schema for both source/dest
                --LIKE (not =) so a wildcard schema mask such as the '*' in *.MEDICAL also matches
                ON t.TABLE_SCHEMA LIKE (REPLACE(conf.table_schema_mask, '*', '%'))
                AND t.TABLE_NAME LIKE (REPLACE(conf.table_name_mask, '*', '%'))
                AND conf.table_operator = 'MAP'
 
        EXCEPT
 
        --exclude our TABLEEXCLUDEd tables
        SELECT
            conf.rep_name,
            t.TABLE_SCHEMA, t.TABLE_NAME
        FROM
            goldengate.goldengate.replicat_config conf WITH(nolock)
            INNER JOIN goldengate.INFORMATION_SCHEMA.TABLES t WITH(nolock)
                ON t.TABLE_SCHEMA LIKE (REPLACE(conf.table_schema_mask, '*', '%'))
                AND t.TABLE_NAME LIKE (REPLACE(conf.table_name_mask, '*', '%'))
                AND conf.table_operator = 'TABLEEXCLUDE'
        ) rep_info
    INNER JOIN goldengate.[goldengate].[replicat_status] stat
        ON rep_info.rep_name = stat.rep_name
);
 
GO

Now we can see each table’s replication state (note that I’ve removed some columns and rows for brevity):

SELECT 
     rts.[rep_name]
     , rts.[TABLE_SCHEMA] AS [schema_name]
     , rts.[TABLE_NAME]
     , rts.[rep_status]
     , rts.[checkpoint_lag]
FROM 
     [goldengate].[replicat_table_status] rts
WHERE 
     rts.[rep_name] = 'prod_001'

rep_name  schema_name  table_name     rep_status  checkpoint_lag
prod_001  hr           people         RUNNING     00:00:00
prod_001  hr           reporting_rel  RUNNING     00:00:00
prod_001  inventory    parts          RUNNING     00:00:00
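
Once the rows come back in that shape, flagging the tables somebody should worry about is straightforward. Here’s a rough Python sketch of the idea, assuming the lag always arrives as an “HH:MM:SS” string like the ones above. The function names, the five-minute threshold, and the second and third sample rows are all invented for illustration; they are not output from our system.

```python
from datetime import timedelta


def lag_seconds(checkpoint_lag):
    """Convert an 'HH:MM:SS' lag string into whole seconds."""
    hours, minutes, seconds = (int(part) for part in checkpoint_lag.split(":"))
    return int(timedelta(hours=hours, minutes=minutes, seconds=seconds).total_seconds())


def stale_tables(rows, max_lag_seconds=300):
    """Return 'schema.table' for rows that are not RUNNING or lag beyond the limit."""
    return [
        f"{schema}.{table}"
        for _rep, schema, table, status, lag in rows
        if status != "RUNNING" or lag_seconds(lag) > max_lag_seconds
    ]


# (rep_name, schema_name, table_name, rep_status, checkpoint_lag)
# Only the first row mirrors the sample output; the other two are hypothetical.
rows = [
    ("prod_001", "hr", "people", "RUNNING", "00:00:00"),
    ("prod_001", "hr", "reporting_rel", "RUNNING", "00:12:30"),
    ("prod_001", "inventory", "parts", "ABENDED", "00:00:00"),
]

print(stale_tables(rows))
```

Here hr.reporting_rel is flagged for lag and inventory.parts for its status, while hr.people passes. Pick whatever threshold makes sense for how fresh your users expect the data to be.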

Conclusion


I hope this is of some use to you as you tinker with GoldenGate. While GoldenGate on Windows does provide you a way of alarming on REPLICAT abends via the Windows event monitoring stack, it’s hard to determine the impact of such an event. Also, for anyone who needs to communicate table loading state / lag to an end-user community it can be a bit of a challenge. This was our solution – what’s yours? Do you have a better, faster, cheaper, easier approach? If so, please do share – we’re all ears.

Credits & References:


I want to give credit to a few articles and people who were instrumental in getting this to work. I’ve credited them in the code where applicable, but I want to make sure they’re explicitly called out.

