Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

SQL Query Help: Need to remove consecutive items and just keep the earliest

1 view
Skip to first unread message

Arthur Dent

unread,
Sep 13, 2007, 5:27:55 PM9/13/07
to
I could really use some help with this problem. Think of a help desk
ticket database scenario. There is an audit/event log that captures
changes and timestamps when ticket records are updated. One of those
changes is when a ticket is reassigned/dispatched to another group.
For example, say a ticket originates at "Helpdesk" at 10:00 today.
There would be a timestamp in the log saying "Helpdesk" and 10:00.
Then say 30 minutes later it's assigned to Field Services. There
would be another entry made in the log that shows "Field Services" and
10:30... What I'm trying to do is calculate how long a ticket is
assigned to a group before being dispatched to the next group (30
minutes in this case). However read on for some complications to this
scenario...

Take a look at the following:

CREATE TABLE #AssignmentLog (
[case_id] [varchar] (6) NOT NULL,
[event_timestamp] [smalldatetime] NOT NULL,
[group_assigned] [varchar] (15) NOT NULL
PRIMARY KEY CLUSTERED (case_id, event_timestamp)
)

This table represents what the audit log looks like after I've
filtered it to display the relevant data. Here's some sample data to
put in it:

INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 15:30:00', 'Helpdesk')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 15:42:00', 'Server Admin')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 15:50:00', 'Helpdesk')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 15:52:00', 'Helpdesk')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 16:01:00', 'Helpdesk')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/10/2007 16:22:00', 'Server Admin')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/11/2007 8:30:00', 'Server Admin')
INSERT INTO #AssignmentLog (ticket_id, event_timestamp,
group_assigned)
VALUES ('ABC001', '9/11/2007 9:05:00', 'Field Services')

So after inserting the data, we'd have something like this:

SELECT ticket_id, event_timestamp, group_assigned
FROM #AssignmentLog
ORDER BY ticket_id, event_timestamp

ticket_idevent_timestamp group_assigned
--------- ------------------- ---------------
ABC001 09/10/2007 15:30:00 Helpdesk
ABC001 09/10/2007 15:42:00 Server Admin
ABC001 09/10/2007 15:50:00 Helpdesk
ABC001 09/10/2007 15:52:00 Helpdesk
ABC001 09/10/2007 16:01:00 Helpdesk
ABC001 09/10/2007 16:22:00 Server Admin
ABC001 09/11/2007 08:30:00 Server Admin
ABC001 09/11/2007 09:05:00 Field Services

So what this data shows us is ticket ABC001 originated at the Helpdesk
on 9/10/07 15:30, was dispatched to the Server Admins, came back to
the helpdesk, floated around there a few times, went back to the
Server Admins, and finally to Field Services.

Notice rows 3 - 5 have multiple consecutive helpdesk entries, and rows
6 and 7 have two back to back Server Admin entries. This is where my
problem lies.

First, this is not a data error. The term "Helpdesk" is actually a
parent label that encompasses multiple sub-groups within the
Helpdesk. Same thing for Server Admin and others. But since I'm only
interested in looking at dispatches between these parent groups, ahead
of time I lumped those sub-groups into their parent name. In other
words, looking at rows 3 - 6: Row three: the ticket came back to the
helpdesk from Server Admin. Row four: the helpdesk dispatched the
ticket to another team within the helpdesk. Row five: the helpdesk
dispatched it again internally: Row six: the helpdesk finally sent it
back to the Server Admins...

I'm not interested in these internal dispatches... I just want to know
when the ticket left the helpdesk to go to an outside group like
Server Admins, Field Services, etc. So when there are consecutive
entries within the same group, I just need the earliest one of the
series. Here is the output I'm trying to generate:

ticket_idevent_timestamp group_assigned
--------- ------------------- ---------------
ABC001 09/10/2007 15:30:00 Helpdesk
ABC001 09/10/2007 15:42:00 Server Admin
ABC001 09/10/2007 15:50:00 Helpdesk
ABC001 09/10/2007 16:22:00 Server Admin
ABC001 09/11/2007 09:05:00 Field Services

But how can I do that in a query? I can't simply use a GROUP BY
clause, something like this:

SELECT ticket_id, MIN(event_timestamp) AS event_timestamp,
group_assigned
FROM #AssignmentLog
GROUP BY ticket_id, group_assigned
ORDER BY ticket_id, MIN(event_timestamp)

Because that will lump EVERYTHING to the earliest occurance of each
group_assigned entry. In other words that doesn't account for the
items in a consecutive series. The output looks like:

ticket_id event_timestamp group_assigned
--------- ------------------- ---------------
ABC001 09/10/2007 15:30:00 Helpdesk
ABC001 09/10/2007 15:42:00 Server Admin
ABC001 09/11/2007 09:05:00 Field Services

This shows the first time the helpdesk had the ticket, but doesn't
show when the ticket came BACK to the helpdesk.

I've tried multiple variations of sub-queries with unequal joins (<,
>, etc) but keep running into similar results... can't get the query
to notice consecutive entries.

I would really appreciate any direction or ideas someone may have
about how best to accomplish this. For reference I'm using MS SQL
2000.

Thanks!

Tom Cooper

unread,
Sep 13, 2007, 6:21:07 PM9/13/07
to
First, thanks for providing the DDL and sample data.

SELECT a.ticket_id, a.event_timestamp, a.group_assigned
FROM #AssignmentLog a
WHERE NOT EXISTS (SELECT * FROM #AssignmentLog a1
WHERE a1.ticket_id = a.ticket_id
AND a1.event_timestamp = (SELECT MAX(event_timestamp)
FROM #AssignmentLog a2
WHERE a2.event_timestamp < a.event_timestamp
AND a2.ticket_id = a.ticket_id)
AND a1.group_assigned = a.group_assigned)
ORDER BY ticket_id, event_timestamp

Tom

"Arthur Dent" <dwt1...@gmail.com> wrote in message
news:1189718875.4...@19g2000hsx.googlegroups.com...

Arthur Dent

unread,
Sep 14, 2007, 11:29:10 AM9/14/07
to
On Sep 13, 5:21 pm, "Tom Cooper"

<tomcoo...@comcast.no.spam.please.net> wrote:
> First, thanks for providing the DDL and sample data.
>
> SELECT a.ticket_id, a.event_timestamp, a.group_assigned
> FROM #AssignmentLog a
> WHERE NOT EXISTS (SELECT * FROM #AssignmentLog a1
> WHERE a1.ticket_id = a.ticket_id
> AND a1.event_timestamp = (SELECT MAX(event_timestamp)
> FROM #AssignmentLog a2
> WHERE a2.event_timestamp < a.event_timestamp
> AND a2.ticket_id = a.ticket_id)
> AND a1.group_assigned = a.group_assigned)
> ORDER BY ticket_id, event_timestamp
>
> Tom
>
> "Arthur Dent" <dwt12...@gmail.com> wrote in message
> > Thanks!- Hide quoted text -
>
> - Show quoted text -

This is outstanding. Works perfectly, and now I just have to stare at
it and figure out WHY it works! Thank you!

Steve Dassin

unread,
Sep 20, 2007, 5:42:47 AM9/20/07
to
What you really need is a dense rank 'super function'. Just plug and play
:-)

http://beyondsql.blogspot.com/2007/09/dataphor-super-function-ii.html


0 new messages