Parse and import custom ASP.NET log files with SSIS

Importing text files into a database is a common scenario. It gets more interesting though when you are required to import log files generated by ASP.NET’s health monitoring.

We recently wrote a web application that receives and logs information via HTTP handlers. The logging was achieved using ASP.NET’s health monitoring. We wanted additional information to be included in the log entries however, so we implemented a custom event type by extending the WebRequestEvent base class. This allowed us to append the additional data we needed to the initial ASP.NET health monitoring log entry in a structured way.

Latterly we needed to import this data into a SQL Server 2008 database for analysis purposes. While, no specific effort was made to allow the file format to be parsable, we thought that we would try to use an out of the box tool. But not with much hope.

We decided to try to achieve this with a scheduled SSIS task that would run once a day. While we could have made the handlers log directly to the database, a conscious decision was made to keep these services fast, lightweight and single-purpose. Since real-time log analysis was not required, this also made it the more logical decision.

The challenge here was to use SSIS to parse the health monitoring log format. What makes this an interesting challenge is that it isn’t in a format that uses standard row and column delimiters throughout. This is in addition to the fact that the log format has a significant amount of noise in it (white space and fields of no value to us) that we aren’t actually interested in capturing.

Example:

Event code: 100005
Event message: Error event received from Device
Event time: 2/10/2010 1:01:03 AM
Event time (UTC): 2/10/2010 1:01:03 AM
Event ID: e5152f4d83d9481cba6a2dd7946e3b44
Event sequence: 275
Event occurrence: 2
Event detail code: 0

Application information:
Application domain: /AA/W3SVC/1/ROOT-1-123456789012345678
Trust level: Full
Application Virtual Path: /
Application Path: c:\inetpub\wwwroot\
Machine name: 12345-WEB

Process information:
Process ID: 3620
Process name: w3wp.exe
Account name: NT AUTHORITY\NETWORK SERVICE

Request information:
Request URL: https://web.company.net:443/diagnostics.ashx
Request path: /diagnostics.ashx
User host address: 127.0.0.1
User:
Is authenticated: False
Authentication Type:
Thread account name: NT AUTHORITY\NETWORK SERVICE

Even though we were interested in all of this data, we were not concerned with extracting each individual value.

SQL Server Integrated Services (SSIS)

We used the Flat File Source task to specify which columns we want to pass down the pipeline to subsequent tasks. This task utilises the Flat File Connection Manager which opens the file and extracts data based on various format settings. This is the crux of the whole extraction process and where the power of the Flat File Connection Manager becomes apparent.

Looking at the previous example of a health monitoring event you can see that most fields are prefixed by a colon (:) and so you may be tempted to try parsing the file by using a colon as a delimiter. However this will yield results bad results like so:

clip_image001

There are several problems here, first of all the colon only denotes the beginning of a new field, it doesn’t help us identify where that field ends – hence in the above example ‘Column 1’ contains the event code plus the phrase ‘Event message ‘ suffixed onto the end of it. Equally challenging; though less obvious from the picture; is that each column is prefixed with a white space character. This is going to pose all kinds of problems when we want to convert these to the correct data-types in SQL Server.

The answer to all of this is to leverage the advanced features of the SSIS Flat File Connection Manager, found in the Advanced tab. In the advanced tab you can specify individual columns and a delimiter for each column. Better still each delimiter can contain multiple characters. Now you have everything you need to parse the message.

Using the above log entry as an example the first thing we need to do is get rid of the noise at the beginning of the message so in the advanced tab I create a column called ‘Noise 1’ that has a Column Delimiter of “code: ” (Note the trailing white space at the end). This will match everything up to the beginning of the event code value.

We then create a second column called ‘Event Code’ this will actually store the event code value. The delimiter on this column should be set to “{LF}Event message : ” – this will read in the value up to the end of the line and set the cursor for the beginning of the next column to be just after the Event Message title.

If you rinse and repeat this technique of creating columns with unique delimiters and where necessary additional columns to just swallow up the noise that you don’t actually need you can quite easily parse the log file messages like so:

clip_image003

While you are tweaking these settings, Preview will allow you to see if everything is being extracted correctly. Using these settings, the Flat File Connection Manager was able to extract all the rows of data using unique Column Delimiters. Once this has been setup correctly, the Flat File Source task allows us to select the columns we were interested in for import into our database.

Date Formats and SSIS

We did run across an issue where the date format logged by the ASP.NET health monitoring was not recognised by SSIS as a valid format that can be converted to a SQL DateTime data type.

To get around this, we created a small Script Component task to handle this scenario. In the Script Component editor, we selected the specific column we wanted to convert from the Available Input Columns. Since we were only reading the value from this column, we set it to “ReadOnly”. Under Inputs and Outputs, we created an Output with a single column called ParsedDateTime that would be the column we would write to and use in the database insertion. Under Script, clicking the Edit Script button opens the script editor. At this point, the Input and Output columns are generated as properties on an object that represents the data buffer that the task interacts with. In the ProcessInputRow override for the input buffer, we converted the date format to a format that is known by SSIS:

var original = Row.ExtractedDateTime.Trim();
DateTime date =
    DateTime.ParseExact(original, "M/d/yyyy h:m:s tt", null);
Row.ParsedDateTime = date.ToString("yyyy-MM-ddTHH:mm:ss");

As this code snippet shows, we retrieve the value from the generated input column and convert it to the supported date format. We then set the output column’s value to the new date format. This output column is then added to the data pipeline and is available to all subsequent tasks.

Conclusion

This example really demonstrates the raw power that you get out of the box with SSIS. The most common use of a Flat File Connection Manager is to import simple comma (,) separated variable text files; a job it does admirably; but its capabilities are far greater than that.

We have demonstrated how we can use this SSIS component to parse text files with non-uniform delimited field values navigating our way through noise and white space. All too often a developer would probably choose to solve this problem using a custom script task (and regular expressions) which I’m sure you would agree would be a lot more effort!

We managed to import these log files with the very minimum of effort and custom code (3 lines!) once you really get to know SSIS you soon realise just how much you can achieve without the need for custom development.