Ever wondered what is happening in your Sharepoint content crawls? The details are contained within your content database (and the search database).
The EventType enumeration to understand the values in the EventType column in the EventCache table in the Content Database.
| EventType Enumeration Description | EventType Database Value |
| AssignmentAdd | 524288 |
| AssignmentDelete | 33554432 |
| Update | 8192 |
| Add | 4097 |
| Delete | 16388 |
| MemberAdd | 2097152 |
| MemberDelete | 4194304 |
Extract SPChanges from the Content Database using Powershell.
function Send_ChangeTokens_All__For_ContentDatabase([string] $ServerName, [string] $SiteCollectionName, [string] $ChangeTSVFile)
{
$spsite = new-object Microsoft.Sharepoint.SPSite($ServerName + $SiteCollectionName);
$spcontentdatabaseidmethod = [Microsoft.SharePoint.Administration.SPContentDatabase].getmethod(“get_ID”);
$spcontentdatabaseguid = $spcontentdatabaseidmethod.Invoke($spsite.ContentDatabase, “instance,public”, $null, $null, $null);
“Content Database is ” + $spcontentdatabaseguid.ToString() | select;
$spcontentdatabasegetchangesmethod = [Microsoft.SharePoint.Administration.SPContentDatabase].getmethod(“GetChanges”,[Microsoft.SharePoint.SPChangeToken]);
$targetparameters = @($null);
$total = 0;
do
{
$spcontentdatabasechanges = $spcontentdatabasegetchangesmethod.Invoke($spsite.ContentDatabase, “instance,public”, $null, $targetparameters, $null);
$total += $spcontentdatabasechanges.Count;
$spcontentdatabasechanges | foreach-object {
$_.ChangeToken.ToString() + “`t” + $_.ChangeType.ToString() + “`t” + $_.SiteId.ToString() + “`t” + $_.Time.ToString() | add-content -encoding ascii $ChangeTSVFile;
}
$targetparameters[0] = $spcontentdatabasechanges.LastChangeToken;
if($targetparameters[0] -ne $null) {
“Change Token ” + $targetparameters[0].ToString() | select; }
} while ($spcontentdatabasechanges.Count -gt 0)
“Total ” + $total | select;
# Dispose of Site
$spsite.Dispose();
}
Break out the SPChange data output from Powershell by using LogParser.
LogParser -iTsFormat:”dd/MM/yyyy hh:mm:ss” -o:TSV -i:TSV -HeaderRow:Off -iSeparator:tab “Select EXTRACT_TOKEN(field1,2,’;') AS ContentDatabaseID,EXTRACT_TOKEN(field1,4,’;') As EventCacheID,field2 As EventType,field3 As SiteID,field4 As EventTime INTO ‘c:\powershell scripts\logparserchange.txt’ From ‘c:\powershell scripts\change.txt’ WHERE EventTime > TIMESTAMP(’2010-01-28 00:00:00′,’yyyy-mm-dd hh:mm:ss’) ORDER BY EventCacheId”
And do a time analysis of events using QUANTIZE.
LogParser -o:TSV -i:TSV -HeaderRow:On -iSeparator:tab “Select QUANTIZE(EventTime,1800) As EventTimePeriod,COUNT(EventCacheId) As EventCount INTO ‘c:\powershell scripts\logparserquantizedchange.txt’ From ‘c:\powershell scripts\logparserchange.txt’ GROUP BY EventTimePeriod”
And check the types of events during the time analysis.
LogParser -o:TSV -i:TSV -HeaderRow:On -iSeparator:tab “Select QUANTIZE(EventTime,1800) As EventTimePeriod,EventType,COUNT(EventCacheId) As EventCount INTO ‘c:\powershell scripts\logparserquantizedchange.txt’ From ‘c:\powershell scripts\logparserchange.txt’ GROUP BY EventTimePeriod,EventType”
And investigate the Uls Log files using queries like this.
LogParser -o:TSV -i:TSV “SELECT * INTO ‘C:\LOGS\ulslog.txt’ FROM ‘C:\LOGS\Copy.log’ WHERE Process Like ‘%OWSTIMER.EXE%’”
Where the fields in the Uls Log file are as follows.
Timestamp Process TID Area Category EventID Level Message Correlation
In general fields can be identified as follows.
logparser -h -i:csv C:\perflogs\test.csv -iTsFormat:”mm/dd/yyyy hh:mm:ss”
and this is particularly useful for identifying timestamp formats.
At which point you should be able to determine whether you have security crawl issues as detailed in this blog http://blogs.msdn.com/russmax/archive/2009/02/09/troubleshooting-security-only-crawl.aspx.
If a security crawl is not the explanation for long running indexing then the next step focuses on the perfmon counters and in particular gathering evidence of retries or idle threads.
PerfMon comes with the following counters that are useful to drilldown on indexing performance.
Office Search Server Archival Plugin
Office Search Server Gatherer
Office Search Gatherer Projects
Office Server Search Indexer Catalogs
Office Search Schema Plugin
And as an explanation of poor indexer performance I am interested in file retries, idle threads and the overall document added rate as a first cut.
| Performance Object | Performance Counter |
| Office Search Server Gatherer | |
| Documents Filtered Rate | |
| Document Successfully Filtered Rate | |
| Delayed Documents | |
| Documents Delayed Retry | |
| Filtering Threads | |
| Idle Threads | |
| Threads blocked due to back off event | |
| Office Search Gatherer Projects | |
| Accessed File Rate | |
| Accessed HTTP Rate | |
| Delayed Documents | |
| Document Add Rate | |
| Retries Rate |
A PerfMon counter log file saved to disk and then converted from the BLG format to CSV format to use with LogParser or to import to SQL.
relog -f csv inputfile.blg -o outputFile.csv
logparser -i:csv “SELECT [(PDH-CSV 4.0) (GMT Standard Time)(0)],[\\WINSMOORIS\Office Server Search Gatherer\Threads In Plug-ins] FROM C:\perflogs\outputFile.csv TO C:\perflogs\LogParseroutputFile.csv” -iTsFormat:”mm/dd/yyyy hh:mm:ss.lll”
Web site crawls can be viewed in the table dbo.MSSCrawlUrl.









