
Troubleshooting Indexing Performance Using SPChange (March 5, 2010)

Filed under: EventCache,Index,Log Parser,Performance,Powershell,Search,SPChange — sladescross @ 5:29 pm

Ever wondered what is happening in your SharePoint content crawls? The details are contained within your content database (and the search database).

Use the EventType enumeration below to understand the values in the EventType column of the EventCache table in the content database; a small lookup sketch follows the table.

EventType Enumeration    EventType Database Value
AssignmentAdd            524288
AssignmentDelete         33554432
Update                   8192
Add                      4097
Delete                   16388
MemberAdd                2097152
MemberDelete             4194304
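
If you are translating raw EventType values by hand, a small PowerShell lookup built straight from the table above can save some time (this is just a convenience sketch; values not listed in the table will simply come back empty).

# Map the raw EventType database values from the table above to friendly names.
$eventTypeNames = @{
    524288   = "AssignmentAdd";
    33554432 = "AssignmentDelete";
    8192     = "Update";
    4097     = "Add";
    16388    = "Delete";
    2097152  = "MemberAdd";
    4194304  = "MemberDelete";
}

# Example: translate a value read from the EventType column.
$eventTypeNames[8192]   # Update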

Extract SPChanges from the content database using PowerShell.

function Send_ChangeTokens_All__For_ContentDatabase([string] $ServerName, [string] $SiteCollectionName, [string] $ChangeTSVFile)
{
    # Load the SharePoint object model if it is not already available.
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint");

    $spsite = new-object Microsoft.SharePoint.SPSite($ServerName + $SiteCollectionName);

    # Use reflection to read the content database ID and to call GetChanges(SPChangeToken).
    $spcontentdatabaseidmethod = [Microsoft.SharePoint.Administration.SPContentDatabase].GetMethod("get_ID");
    $spcontentdatabaseguid = $spcontentdatabaseidmethod.Invoke($spsite.ContentDatabase, "instance,public", $null, $null, $null);

    "Content Database is " + $spcontentdatabaseguid.ToString();

    $spcontentdatabasegetchangesmethod = [Microsoft.SharePoint.Administration.SPContentDatabase].GetMethod("GetChanges", [Type[]]@([Microsoft.SharePoint.SPChangeToken]));
    $targetparameters = @($null);   # a null change token returns changes from the start of the change log
    $total = 0;

    do
    {
        $spcontentdatabasechanges = $spcontentdatabasegetchangesmethod.Invoke($spsite.ContentDatabase, "instance,public", $null, $targetparameters, $null);
        $total += $spcontentdatabasechanges.Count;

        # Write each change as a tab-separated line: change token, change type, site ID, time.
        $spcontentdatabasechanges | foreach-object {

            $_.ChangeToken.ToString() + "`t" + $_.ChangeType.ToString() + "`t" + $_.SiteId.ToString() + "`t" + $_.Time.ToString() | add-content -encoding ascii $ChangeTSVFile;

        }

        # Continue the next batch from the last change token returned by this one.
        $targetparameters[0] = $spcontentdatabasechanges.LastChangeToken;

        if ($targetparameters[0] -ne $null) {
            "Change Token " + $targetparameters[0].ToString(); }

    } while ($spcontentdatabasechanges.Count -gt 0)

    "Total " + $total;

    # Dispose of the site to release the SPSite resources.
    $spsite.Dispose();

}
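
Calling it looks something like this; the web application URL and site collection path are placeholders, and the output file matches the one the LogParser queries below read from.

# Placeholder server and site collection - substitute your own.
Send_ChangeTokens_All__For_ContentDatabase "http://myserver" "/sites/teamsite" "c:\powershell scripts\change.txt"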

Break out the SPChange data output from PowerShell by using LogParser.

LogParser -iTsFormat:"dd/MM/yyyy hh:mm:ss" -o:TSV -i:TSV -HeaderRow:Off -iSeparator:tab "Select EXTRACT_TOKEN(field1,2,';') AS ContentDatabaseID,EXTRACT_TOKEN(field1,4,';') As EventCacheID,field2 As EventType,field3 As SiteID,field4 As EventTime INTO 'c:\powershell scripts\logparserchange.txt' From 'c:\powershell scripts\change.txt' WHERE EventTime > TIMESTAMP('2010-01-28 00:00:00','yyyy-mm-dd hh:mm:ss') ORDER BY EventCacheID"

And do a time analysis of events using QUANTIZE.

LogParser -o:TSV -i:TSV -HeaderRow:On -iSeparator:tab "Select QUANTIZE(EventTime,1800) As EventTimePeriod,COUNT(EventCacheID) As EventCount INTO 'c:\powershell scripts\logparserquantizedchange.txt' From 'c:\powershell scripts\logparserchange.txt' GROUP BY EventTimePeriod"

And check the types of events during the time analysis.

LogParser -o:TSV -i:TSV -HeaderRow:On -iSeparator:tab "Select QUANTIZE(EventTime,1800) As EventTimePeriod,EventType,COUNT(EventCacheID) As EventCount INTO 'c:\powershell scripts\logparserquantizedchange.txt' From 'c:\powershell scripts\logparserchange.txt' GROUP BY EventTimePeriod,EventType"

And investigate the ULS log files using queries like this.

LogParser -o:TSV -i:TSV "SELECT * INTO 'C:\LOGS\ulslog.txt' FROM 'C:\LOGS\Copy.log' WHERE Process Like '%OWSTIMER.EXE%'"

The fields in the ULS log file are as follows.

Timestamp Process TID Area Category EventID Level Message Correlation

In general, fields can be identified as follows.

logparser -h -i:csv C:\perflogs\test.csv -iTsFormat:"mm/dd/yyyy hh:mm:ss"

This is particularly useful for identifying timestamp formats.

At which point you should be able to determine whether you have security crawl issues, as detailed in this blog post: http://blogs.msdn.com/russmax/archive/2009/02/09/troubleshooting-security-only-crawl.aspx.

If a security crawl is not the explanation for long-running indexing, then the next step focuses on the PerfMon counters, and in particular on gathering evidence of retries or idle threads.

PerfMon provides the following counter objects that are useful for drilling down on indexing performance.

Office Server Search Archival Plugin
Office Server Search Gatherer
Office Server Search Gatherer Projects
Office Server Search Indexer Catalogs
Office Server Search Schema Plugin

As a first cut at explaining poor indexer performance, I am interested in file retries, idle threads, and the overall document add rate; the counters below cover these, and a sketch for sampling them live follows the list.

Performance Object: Office Server Search Gatherer
  Documents Filtered Rate
  Document Successfully Filtered Rate
  Delayed Documents
  Documents Delayed Retry
  Filtering Threads
  Idle Threads
  Threads blocked due to back off event

Performance Object: Office Server Search Gatherer Projects
  Accessed File Rate
  Accessed HTTP Rate
  Delayed Documents
  Document Add Rate
  Retries Rate
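
For a quick live look before setting up a counter log, a Get-Counter sketch along these lines works on PowerShell 2.0; the counter paths are my assumptions based on the object and counter names above, so verify the exact names on your index server first.

# Verify the exact counter paths first, e.g. Get-Counter -ListSet "*Search*"
$counters = @(
    "\Office Server Search Gatherer\Idle Threads",
    "\Office Server Search Gatherer\Documents Delayed Retry",
    "\Office Server Search Gatherer Projects(*)\Document Add Rate",
    "\Office Server Search Gatherer Projects(*)\Retries Rate"
);

# Sample every 15 seconds for 5 minutes and print the cooked values.
Get-Counter -Counter $counters -SampleInterval 15 -MaxSamples 20 |
    ForEach-Object { $_.CounterSamples | Format-Table Path, CookedValue -AutoSize }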

A PerfMon counter log can be saved to disk and then converted from BLG format to CSV format for use with LogParser, or for import into SQL.

relog -f csv inputfile.blg -o outputFile.csv

logparser -i:csv "SELECT [(PDH-CSV 4.0) (GMT Standard Time)(0)],[\\WINSMOORIS\Office Server Search Gatherer\Threads In Plug-ins] INTO C:\perflogs\LogParseroutputFile.csv FROM C:\perflogs\outputFile.csv" -iTsFormat:"mm/dd/yyyy hh:mm:ss.lll"

Web site crawls can be viewed in the dbo.MSSCrawlUrl table in the search database; a minimal query sketch follows.
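
A quick way to peek at that table from PowerShell is a plain SqlClient query; the server and database names below are placeholders for your search database, and this is read-only poking, not something to write back to.

# Placeholder connection string - point it at your search database.
$connection = New-Object System.Data.SqlClient.SqlConnection("Server=.;Database=SharedServices_Search_DB;Integrated Security=SSPI");
$connection.Open();

$command = $connection.CreateCommand();
$command.CommandText = "SELECT TOP 20 * FROM dbo.MSSCrawlUrl";

# Print the first two columns of each row, whatever they happen to be.
$reader = $command.ExecuteReader();
while ($reader.Read())
{
    $reader.GetValue(0).ToString() + "`t" + $reader.GetValue(1).ToString();
}

$reader.Close();
$connection.Close();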
