
Custom TSV input format


I am new to LogParser, but have to say it looks great...except for some really annoying things with log inputs.

I have a variety of logs I need to parse for one-time analysis and statistics. They will be in any number of formats, mostly proxy logs. Some are ISA and IIS, but some are SQUID, BlueCoat, SmartFilter, Apache and any number of other formats. The order of log fields is unpredictable, and most use space delimiters, but sometimes commas or tabs. And there could be 20-30 GB of logs in any processing batch.

Each LogParser input format creates its own challenge with any of these logs. Some logs have a #Fields: header, others don't; some have quoted strings; some have timestamps like this: [29/May/2010:19:48:21 -0400] (which parses to 2 fields with a space delimiter).
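To make the splitting problem concrete, here is a rough, regex-free sketch in Python of the behavior described above: split on a delimiter, but keep "quoted strings" and [bracketed timestamps] together as one field. This is not LogParser code; the function name `split_fields` and the sample line are made up for illustration, and the sketch assumes quotes and brackets only open a protected span at the start of a field.

```python
def split_fields(line, delimiter=" "):
    """Split a log line on `delimiter`, treating "quoted" and [bracketed]
    spans as single fields. Hypothetical helper, not part of LogParser."""
    closers = {'"': '"', "[": "]"}
    fields = []
    current = []
    closer = None  # character that ends the current protected span, if any
    for ch in line.rstrip("\r\n"):
        if closer is not None:
            if ch == closer:
                closer = None          # protected span ends; drop the closer
            else:
                current.append(ch)     # delimiters inside the span are kept
        elif ch in closers and not current:
            closer = closers[ch]       # span may only open at start of a field
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Made-up sample line in a common proxy/web log layout.
sample = ('192.168.0.1 - - [29/May/2010:19:48:21 -0400] '
          '"GET /index.html HTTP/1.1" 200 512 "Mozilla/5.0 (Windows NT 6.1)"')
naive = sample.split(" ")      # timestamp and User-Agent are broken apart
fields = split_fields(sample)  # timestamp and User-Agent stay whole
```

With the naive split, the timestamp lands in two fields and the User-Agent in several; the quote-aware version keeps each as a single field.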

I'm torn between the TSV and W3C formats. Each has _almost_ what I need for everything, but not quite all.

TSV has: headerRow/iHeaderFile, nSkipLines & lineFilter.

W3C has: dQuotes, #Fields: header row parsing.

Neither has both. So, is there a way to extend either of these formats so it has the other's features? I would settle for just dQuotes in the TSV format, but I will take what I can get. (There are lots of "User-Agent" strings that really throw off parsing.)

I understand there is some COM extensibility. I've looked at some sample code, but it just isn't clicking yet how to put it all together in C#. I need to study more examples, but haven't found what I need to make the pieces fall together. Any suggestions for examples for this task specifically? (Note that I would prefer not to use regex. I already have a Perl tool that does it that way, and I am trying to replace it rather than have people hand-edit regexes all day.)

My desired end-state is to have a C# front-end that can load the first N lines of any file format into a datagrid, and assign fieldnames from a selection so I can proceed with some of the statistics I want to gather.
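As a sketch of that preview step (in Python rather than C#, and not using LogParser at all), something like the following reads the first N data rows and attaches field names. The function name `preview`, the `FieldN` placeholder names, and the sample data are assumptions for illustration; the only format detail relied on is a W3C-style `#Fields:` line listing space-separated names.

```python
import io


def preview(stream, n=5, delimiter="\t", field_names=None):
    """Return (names, rows) for the first `n` data lines of `stream`.
    Hypothetical helper: names come from `field_names`, else from a
    '#Fields:' line if present, else generated placeholders."""
    rows = []
    for line in stream:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            if field_names is None:
                field_names = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directive/comment lines
        rows.append(line.split(delimiter))
        if len(rows) == n:
            break
    width = max((len(r) for r in rows), default=0)
    names = list(field_names or [])
    # Pad with placeholder names so every column can be shown and renamed.
    names += [f"Field{i + 1}" for i in range(len(names), width)]
    return names, [dict(zip(names, r)) for r in rows]


# Made-up sample data with a W3C-style header.
log = io.StringIO(
    "#Software: example\n"
    "#Fields: date time cs-uri\n"
    "2010-05-29\t19:48:21\t/a\n"
    "2010-05-29\t19:48:22\t/b\n"
    "2010-05-29\t19:48:23\t/c\n"
)
names, rows = preview(log, n=2)
```

The returned list of dicts maps directly onto a grid of named columns; user-selected names would be passed in through `field_names`.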

If there is not something already canned for the COM plugin, can you guide me on how to write one to get really close to how TSV works?

(I guess the LogParser source itself isn't available, huh? )

Thanks.

