index rebuilding...

Glen A Stromquist

unread,

Jan 29, 2003, 8:40:52 PM1/29/03

to

I want to rebuild the indexes on one of our databases, this particular
one has some indexes that have what seems to be some really out-to-lunch
storage settings, so I want to set new ones in the rebuild.

I want to write a dynamic script to build the rebuild statement with the
optimum storage params for each index, either that or find a script
that does such a thing, but my initial searches have proven fruitless
for what I want to do. Which dd tables/views should I be hitting for
this query to build the rebuild statement?

suggestions?

TIA

Sybrand Bakker

unread,

Jan 29, 2003, 9:04:52 PM1/29/03

to

"Glen A Stromquist" <glen_st...@no.spam.yahoo.com> wrote in message
news:o%WZ9.61302$c41.1...@news2.telusplanet.net...

alter index rebuild can rebuild the index without knowing it's column
definition.
That said you would only need dba_indexes and dba_segments (the segment_name
equals the index_name)
I saw a script today, but unfortunately I don't remember the author.
The commercial version of Toad has a separate module which allows you to
point and shoot at the problem.

Hth

--
Sybrand Bakker
Senior Oracle DBA

to reply remove '-verwijderdit' from my e-mail address

Ryan

unread,

Jan 29, 2003, 9:24:15 PM1/29/03

to

"Sybrand Bakker" <pos...@sybrandb.demon.nl> wrote in message
news:v3ghe8q...@corp.supernews.com...

I could have swarn I read something either in Tom Kyte's book or on his site
stating that rebuilding indexes is a bad idea? I saw something about how an
index will eventually get to a state of equilibrium and you may experience
slowdowns while it reestablishes that equilibrium.

you know any documentation on this? about when to rebuild indices and such?

Glen A Stromquist

unread,

Jan 29, 2003, 10:48:05 PM1/29/03

to

well, I also discovered a few indexes that were also in the data
tablespace, so I ran the script that this script generated:

select 'alter index <OWNER>.'||segment_name||
' rebuild tablespace INDEX storage(initial '||round(bytes/1024)||
'K next '||round(next_extent/1024)||'K pctincrease 0);'
from dba_segments
where owner = '<OWNER>'
and tablespace_name not like '%INDEX%'
and segment_type = 'INDEX'

This moved the index to the correct tablespace, and set the initial
extent to the size of the index in K, and next_extent to what it already
was set at, next I will find all of the indexes that have a pctincrease
> 0 and adjust accordingly, and I think I should be able to use this
same script to modify the storage for all the indexes with out-to-lunch
storage settings by modifying the next_extent clause to a standard size
of 1M or 5M.

Still like to hear opinions if there is a better way to do this....

TIA

Frank

unread,

Jan 30, 2003, 5:42:23 AM1/30/03

to

Think about fragmentation of your tablespace, if you change only the storage
params and not the tablespace, you will fagment your tablespace.
On this way you can move the INDEX into an other Tablespace (why not
local-managed).
It is better to create three local-managed tablespaces (index_large,
index_middle and index_small) with large, middle and small extent size.

Frank (Germany)

Glen A Stromquist <glen_st...@no.spam.yahoo.com> schrieb in im
Newsbeitrag: FSYZ9.61355$c41.1...@news2.telusplanet.net...

Howard J. Rogers

unread,

Jan 30, 2003, 7:35:15 PM1/30/03

to

"Ryan" <rgaf...@cox.net> wrote in message
news:3EXZ9.53855$GX4.2...@news2.east.cox.net...

> I could have swarn I read something either in Tom Kyte's book or on his
site
> stating that rebuilding indexes is a bad idea?

You would have done. Tom claims (or, I should say, once told me) that he's
only ever rebuilt an index about 8 times in his working life.

>I saw something about how an
> index will eventually get to a state of equilibrium and you may experience
> slowdowns while it reestablishes that equilibrium.
>
> you know any documentation on this? about when to rebuild indices and
such?

You might also find a snippet on this subject in 'Beginning Oracle
Programming' (Wrox) -end of shameless plug- in the quite brilliantly-written
chapter on Indexes. Jonathan Lewis also has some advice on the matter in his
Practical 8i book.

The point Tom was making, and one I did some testing on, is that an index
may grow and grow and grow, and in the process acquire a rather large number
of extents, many of which may contain bits of empty space as a result of
updates and deletes on the associated table. If you rebuild the index, then
sure enough the index will be compacted down to a smaller size (and hence
fewer extents). Great. Except that presumably DML on the table doesn't
suddenly come grinding to a halt, so that the very next bit of DML that
happens on the table will cause the index to start getting bloated again.
And (here's Tom's particular point) as the index starts getting bloated
again, it has to *re*acquire the extents which it had already gone to the
trouble of acquiring before the rebuild. In dictionary-managed tablespaces,
extent acquisition can be an expensive process (especially if there's
contention for the UET$ and FET$ dictionary tables which are involved).
Therefore, performance can actually slow down after an index rebuild,
because you have extent allocations to deal with. The index itself also has
to re-organise itself as DML takes place on the table, with block splits and
so on. Before the rebuild, there was plenty of empty space into which new
entries could be fitted without inducing heaps of cascading re-organisation.
With a freshly-compacted index, re-organisation can be expensive. I confess
that most of my testing indicated not much performance degradation after a
rebuild, but some of it did. So it can happen.

Jonathan makes much the same point in his book: it's daft to constantly
rebuild indexes in pursuit of a mythical 100% efficiency, because continuing
DML on the table means that the nanosecond you achieve 100% efficiency, you
start losing it again (fresh block splits etc etc etc).

There's also the fact that a hoary old myth still exists that Oracle never
re-uses deleted space in an index. But that's just not true. If you delete
from EMP where name='KELLY', and then insert into emp (name) values 'KIM'
then you will indeed (probably) re-use the index entry space for the dead
KELLY entry. Meaning that your second insert does NOT have to induce a block
split, and that an analysis of the index would reveal zero deleted leaf
rows. In other words, Oracle uses index space rather more efficiently than
many people think, and as a result the initial *need* for an index rebuild
is rather lower than many people think.

Tom went so far as to say that 'only truly degenerate indexes' should ever
need to be rebuilt. I wasn't so much of a true believer, I confess, so I
watered that down to 'an occasional rebuild' *might* be appropriate.

Bearing in mind that even online rebuilds take exclusive table locks at the
beginning and end of the rebuild (and an offline rebuild takes an exclusive
lock for the duration of the rebuild, of course) then the costs of rebuilds
can be considered large, and so you are weighing a small benefit of
increased efficiency (which, as explained above, might not be much of a
benefit in any case) versus a big drawback of exclusive locking, lots of I/O
and so on.

Net result: rebuilds should be rare.

Regards
HJR

Glen A Stromquist

unread,

Jan 30, 2003, 8:17:11 PM1/30/03

to

Frank wrote:
> Think about fragmentation of your tablespace, if you change only the storage
> params and not the tablespace, you will fagment your tablespace.
> On this way you can move the INDEX into an other Tablespace (why not
> local-managed).
> It is better to create three local-managed tablespaces (index_large,
> index_middle and index_small) with large, middle and small extent size.
>
> Frank (Germany)
>

I think I may just do that, but the database compatible is set a 8.0.5
so I am waiting to hear from the vendor if there is a reason it has to
be at that, as I need it at 8.1.x to have locally managed tablespaces.
Then I need to research in to setting uniform allocation or letting
Oracle do it.

Glen A Stromquist

unread,

Jan 30, 2003, 8:54:02 PM1/30/03

to

I hoping that this rebuilding will be a rare occasion Howard, in this
case I mostly want to set the storage params to something resembling
normal, and at the same time move them to a more suitable tablspace,
perhaps as Frank suggested...

cheers!

DA Morgan

unread,

Jan 30, 2003, 10:34:20 PM1/30/03

to

"Howard J. Rogers" wrote:

> <whole lot of snipping>

You have just brought up an interesting question I've never heard addressed.
Tables and indexes are all just segments and a segment is a segment is a
segment. With a table, if you empty a table with a delete there is a high water
mark that when a full table scan takes place forces a read of every block. What
happens with an index scan? Does it also read all of the empty index blocks? My
guess is that it must because it has no way to know when to stop.

So while an index may begin becoming inefficient again after a rebuild, what
doesn't, there is presumably a point at which there is a net benefit to a
rebuild. And can we identify some guidlines around where that is?

Comments appreciated.

Daniel Morgan

Richard Foote

unread,

Jan 31, 2003, 3:21:08 AM1/31/03

to

DA Morgan <damo...@exesolutions.com> wrote in message news:<3E39A86C...@exesolutions.com>...
<Just snipping Howard going on again ;)>

>
> You have just brought up an interesting question I've never heard addressed.
> Tables and indexes are all just segments and a segment is a segment is a
> segment. With a table, if you empty a table with a delete there is a high water
> mark that when a full table scan takes place forces a read of every block. What
> happens with an index scan? Does it also read all of the empty index blocks? My
> guess is that it must because it has no way to know when to stop.
>
> So while an index may begin becoming inefficient again after a rebuild, what
> doesn't, there is presumably a point at which there is a net benefit to a
> rebuild. And can we identify some guidlines around where that is?
>
> Comments appreciated.
>
> Daniel Morgan

Hi Daniel,

It depends on what you mean by an "Index Scan".

If you mean a "Full Index Scan" or an "Index Range Scan", then no,
these empty blocks have no effect as they're not part of the current
index structure and are therefore not traversed by Oracle.

If however you mean a "Fast Full Index Scan", then yes we have a
potential issue as Oracle performs a multiblock read and has no choice
but to read in these empty blocks until it reaches the HWM of the
index segment. And just like a table, if we have heaps of empty blocks
below the HWM, such scans are going to be correspondingly inefficient.

Note however that should such a situation arise, the corresponding
table is likewise going to be inefficient and benefit from a reorg
anyway (unless a large insert is imminent).

Cheers

Richard

DA Morgan

unread,

Jan 31, 2003, 7:30:17 AM1/31/03

to

Richard Foote wrote:

That is what I was assuming. Thanks for the confirmation.

Daniel Morgan

Message has been deleted

Jonathan Lewis

unread,

Jan 31, 2003, 4:07:14 PM1/31/03

to

Don,

It's perfectly okay to say that the article that
comes up is one of yours - after all, a user who
is searching google in their local language may
have something completely different as the
second hit.

A couple of points, by the way:

Oracle B-tree indexes are balanced: every leaf
is the same height from the root, it is not possible
for some leaf blocks to have height three and others
to have height four. (This is an error I used to make
as well, before I sat down and though about block
splits).

The 'gets per key' is affected by the typical number
of table rows per key - so rebuilding the index because
it exceeds any particular level is unlikely to be of any
benefit. The formula for this column is roughly:
height + half of (rows in index / distinct keys in index)

I can't think why the factor of a half has been included
though.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon a new one-day tutorial:
Cost Based Optimisation
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____USA_(FL)_May 2nd

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Don Burleson wrote in message
<998d28f7.0301...@posting.google.com>...
>Hi,
>
>Do a Google search on "oracle index rebuild".
>
>The second hit has a good article on the topic.

DA Morgan

unread,

Jan 31, 2003, 5:04:20 PM1/31/03

to

Jonathan Lewis wrote:

I never thought I'd ever try to correct you but I believe (sticking it
way out there on ths one) it is possible for there to be four on one and
three on the other from time-to-time as Oracle doesn't rebalance every
time someone performs a single insert or delete. Isn't it more a case of
never having 5 and 3?

Daniel Morgan

Howard J. Rogers

unread,

Jan 31, 2003, 5:48:29 PM1/31/03

to

"Don Burleson" <d...@burleson.cc> wrote in message
news:998d28f7.0301...@posting.google.com...

> Hi,
>
> Do a Google search on "oracle index rebuild".
>
> The second hit has a good article on the topic.

What sort of advice is that? Do you really think Google will always return
the same results in the same order, globally and forever? Why not just
provide the link directly (which, for anyone interested, I imagine was
intended to be Don's own article at http://www.dba-oracle.com/art_index1.htm

As it is, it's one of the weakest, error-ridden pieces of writing I've seen
in a long time, and certainly is NOT a "good article on the topic". Let's
start with the first line, shall we:

"Indexes require rebuilding when deleted leaf nodes appear". I don't even
know what a deleted leaf node is. Do you mean 'deleted leaf rows'? In which
case, you're wrong, because deleted leaf rows can disappear just as easily
as appear. Oracle re-uses deleted leaf row space, all on its own.

Next. " Oracle index nodes are not physically deleted when table rows are
deleted". I would have more confidence in the advice you offer if you could
get the distinction between a node (ie, a block)and a row right. You go on
"deleted leaf nodes can be easily identified " and then show a report one of
whose headings is, in plain black and white, 'Deleted Leaf Rows'.

As for this: "Note that Oracle indexes will "spawn" to a fourth level only
in areas of the index where a massive insert has occurred, such that 99% of
the index has three levels, but the index is reported as having four levels.
" ... well, all I can say is: this is complete baloney. Oracle's index
structures are B*Trees, meaning that they are 'balanced' trees. Meaning that
if it takes you three steps to get to the leaf nodes on one side of the
index, it is utterly impossible for it to take 4 on the other side. Oracle
guarantees that the height of the index is always balanced across the entire
index. the situation you describe does not, cannot and will not happen in
Oracle.

"Another rebuild condition would be cases where deleted leaf nodes comprise
more than 20% of the index nodes". Not true. You may well have more than 20%
of your leaf ROWS shown as being deleted. But they'll be re-used if you wait
long enough, because Oracle re-uses the space:

SQL> create index emp_ename_idx on emp(ename);
Index created.

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;

LF_ROWS DEL_LF_ROWS
---------- -----------
19 0

[ie, no deleted leaf ROWS (and please note Oracle's own terminology)]

SQL> update emp set ename='ZEBEDEE' where ename='ALLEN';
1 row updated.
SQL> commit;
Commit complete.

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;
LF_ROWS DEL_LF_ROWS
---------- -----------
20 1

[Oh Lord, a deleted leaf row has appeared!!! Better get ready to rebuild, I
suppose. But hang on...]

SQL> insert into emp (empno, ename) values (9993, 'ABBOT');
1 row created.
SQL> commit;
Commit complete.

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;

LF_ROWS DEL_LF_ROWS
---------- -----------
20 0

[Phew! A fresh insert has gotten rid of the deleted leaf row, meaning that
the space has been re-used. And note the original entry was 'ALLEN' and the
new one was 'ABBOT', so it's not a question of needing a new entry identical
to the old before the space is reused. In point of fact, an insert that
makes use of a deleted leaf row space clears out ALL deleted leaf rows in
that block, so if I'd updated 'ALSOP', 'ALBERTT', 'ATHGUARD' and 'ATHLON',
the insertion of 'ABBOT' would have removed all four deleted leaf rows. As
proof:

insert various new rows into EMP;
SQL> select ename from emp
2 where ename like 'A%';
ENAME
----------
ABBOT
ADAMS
ALBERT
ALSOPP
ATHERGUARD
ATHLIN

SQL> drop index emp_ename_idx;
Index dropped.

SQL> create index emp_ename_idx on emp(ename);
Index created.

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;
LF_ROWS DEL_LF_ROWS
---------- -----------
24 0

Now time for the updates.

SQL> update emp set ename='ALSOP' where ename='ALSOPP';
1 row updated.
SQL> update emp set ename='ALBERTT' where ename='ALBERT';
1 row updated.
SQL> update emp set ename='ATHGUARD' where ename='ATHERGUARD';
1 row updated.
SQL> update emp set ename='ATHLON' where ename='ATHLIN';
1 row updated.
SQL> commit;

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;

LF_ROWS DEL_LF_ROWS
---------- -----------
28 4

SQL> insert into EMP (empno, ename)
2 values (4453, 'ABBOT');
1 row created.
SQL> commit;
Commit complete.

SQL> analyze index emp_ename_idx validate structure;
Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;
LF_ROWS DEL_LF_ROWS
---------- -----------
25 0

So one insert cleans out 4 deleted entries, making their space available for
re-use. And not a rebuild in sight.

If this were an index on a sequence number, then yes -the nature of
sequences is such that you'll never be assigned a new number which could
make use of the previously-deleted rows, so a rebuild MIGHT be warranted.
(But I'd try a coalesce first, if I had 8i -and if your article was only
ever written for Oracle 7, then don't you think it's a bit 'off' offering it
as 'good advice' on index management 3 major releases later?)].

Next... "As you may know, you can easily rebuild an Oracle index with the
command: ALTER INDEX index_name REBUILD". You're kidding, right?? "easily
rebuild"???? Never mind the vast amounts of I/O, the exclusive table
locking. Oh, I see... your description of what a rebuild does talks about it
'walking the existing index' and 'populating temporary segments', but the
minor matter of locking out all users from performing DML on the indexed key
doesn't warrant a mention. Uh huh.

I won't even get into whether the statement that 'bitmapped indexes are
faster when the index key has less than 25 distinct values' has any validity
at all. Faster at what, and than what? And why 25? You mean a bitmap index
with 26 distinct values is going to be slower? Unacceptably slower?
Imperceptibly slower? Than knitting? Than using a b*tree index? Than doing a
full table scan? The number of distinct values for which a bitmap index
might be beneficial is *relative* to the number of records in the table, and
there is no specific cut-off point at which they begin to be useful or cease
to be useful. The number 25 is a figment of your imagination.

And I notice you cheerfully recommend converting a b*tree index into a
bitmap one if the magic number 25 is true, without once mentioning the
horrendous side effects of doing so in a DML-intensive environment.

I dunno. Maybe I'm missing something. But this article is so trite,
incorrect, full of wrong terminology, and downright misleading that it
wouldn't be my first recommendation (or indeed any recommendation at all)
for anyone wanting to know something meaningful about index rebuilds. Maybe
that's why you didn't include the actual article link?

HJR

Howard J. Rogers

unread,

Jan 31, 2003, 5:52:37 PM1/31/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message

> I never thought I'd ever try to correct you but I believe (sticking it
> way out there on ths one) it is possible for there to be four on one and
> three on the other from time-to-time as Oracle doesn't rebalance every
> time someone performs a single insert or delete. Isn't it more a case of
> never having 5 and 3?

Jonathan can answer for himself, of course. But I'll take a stab at it
too... no, it's not possible for a single insert to result in an unbalanced
index. Oracle's indexes are *always* balanced. That's why you really don't
want to create indexes unnecessarily, because they can slow down simple DML
on the table dramatically. Your simple insert that threatens to unbalance an
index will cause a whole raft of cascading adjustments to the index
structure precisely to avoid imbalance.

I don't know whether it's just a myth or not, but Oracle refers to its
indexes as "B*Tree" structures, not 'b-tree' ones. And that's because the
'b' doesn't stand for 'binary', but 'balanced'.

Whether that bit of apocrypha is true or not, Oracle index structures are
always re-balanced.

Regards
HJR

Jonathan Lewis

unread,

Jan 31, 2003, 9:54:02 PM1/31/03

to

There are two reasons why I used to think this.
First - the term 'rebalancing' misdirects the intuition,
and second the manuals (once upon a time, perhaps)
implied that the target of the B*tree mechanism was
to ensure that no leaf was more than one layer deeper
than any other leaf.

The mechanism on inserts is simple and recursive:
If you need to insert an index entry and the leaf
is full, then it splits. But if it splits, the branch block
above it has to have an extra entry (pointing to
the value that starts the new leaf block) - so Oracle
has to insert an entry into the branch block.

But if the branch block is full, it has to split - which
means the branch block above it has to have a
new entry - but if that block is full ..... and so on
until the block above is the root block.

So what do you do if the root block is full ?
Split it in two, and create a new root block
above it with just two entries - at which point
EVERY leaf block has changed height.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon a new one-day tutorial:
Cost Based Optimisation
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____USA_(FL)_May 2nd

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

DA Morgan wrote in message <3E3AAC94...@exesolutions.com>...

DA Morgan

unread,

Jan 31, 2003, 11:56:47 PM1/31/03

to

"Howard J. Rogers" wrote:

Just went through my files and found a posting by Jonathan from 23 May, 2002
subject title "unbalance indexes --common widsom?". And I quote:

"No matter how perfectly an index is maintained, it will always be possible to
ensure that you can get it to a state where either just one 3rd level node has
split to produce an "imbalance" or (the only possible alternative) every 3rd
level node has to split to leave the index half-populated if you want "balance"
- i.e. every leaf has to be at the fourth level.

The important point about balance b-trees is that no leaf is MORE THAN ONE level
deeper that every other leaf."

(even the above misspelling accurately copied).

So unless Jonathan has changed his mind ... my recollection is correct.

Jonathan?

Daniel Morgan

Howard J. Rogers

unread,

Feb 1, 2003, 12:59:39 AM2/1/03

to

He's changed his mind.

Regards
HJR

"DA Morgan" <damo...@exesolutions.com> wrote in message

news:3E3B0D3F...@exesolutions.com...

DA Morgan

unread,

Feb 1, 2003, 1:12:27 AM2/1/03

to

"Howard J. Rogers" wrote:

So now you ARE speaking for him. <g>

Thanks.

Daniel Morgan

Richard Foote

unread,

Feb 1, 2003, 2:34:47 AM2/1/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message

news:3E3B0D3F...@exesolutions.com...

Hi Daniel,

I remember these posts as it was around the time I appeared on the scene !!

I questioned him at the time and he subsequently corrected himself later in
the thread.

Indexes are *always* balanced.

Cheers

Richard

Howard J. Rogers

unread,

Feb 1, 2003, 1:36:48 AM2/1/03

to

No, I read his reply.

Regards
HJR

Howard J. Rogers

unread,

Feb 1, 2003, 1:37:21 AM2/1/03

to

"Richard Foote" <richar...@bigpond.com> wrote in message
news:XcF_9.37805> Hi Daniel,

>
> I remember these posts as it was around the time I appeared on the scene
!!
>
> I questioned him at the time and he subsequently corrected himself later
in
> the thread.
>
> Indexes are *always* balanced.
>

Thanks, Richard.
Regards
HJR

Message has been deleted

Howard J. Rogers

unread,

Feb 1, 2003, 5:07:49 AM2/1/03

to

"Don Burleson" <d...@burleson.cc> wrote in message

news:998d28f7.03013...@posting.google.com...

> > "Indexes require rebuilding when deleted leaf nodes appear". I don't
even
> > know what a deleted leaf node is.
>

> Yes, that is very apparent.

The point, which appears to have sailed straight over your head, is that
they are called leaf ROWS, not leaf NODES. As the report in your article
actually goes on to show.

>Despite your delusions of knowledge about
> a "pure" B*Tree, an Oracle index tress cannot be a pure b*tree because
> it is impossible to index 100m rows in only four levels in a b*tree.
> Did you flunk your basic data structures class?

Er, where exactly did I say that such an index had to have a height of four?
Nowhere, that's where. What I said, and which is true, is that if its 4 on
one side, it is 4 on the other, not 3 and 4 as you wrote in the article.

> Oracle maintains many sub-keys within each inode.
> When the number of
> entries exceeds the internal "index_block_contains" threshold
> (normally 50, depending on the key length), the Oracle index splits
> until it reaches the maximum supported level of its superior node. At
> that point, the index "spawns" into another level.

Big deal. We know this. What you've missed out that if it spawns a new
level, there's a deal of re-organisation in the branch blocks (and the root
block, if required) to maintain a consistent height across the entire index.

> I cannot
> comprehend that you think that Oracle will re-level a whole index tree
> in real-time. Think about it!

I have. It does. Don't believe me? Ask Jonathan. Or Tom Kyte. Or Steve
Adams. Or, come to that, me.

> > Oracle re-uses deleted leaf row space, all on its own.
>

> Yes, Einstein, in your simplistic example Oracle will re-use an
> adjacent deleted node that rarely happens in the real-world. In most
> cases, indexes with significant DML will have large amounts of dead
> leaf nodes.

Leaf rows you mean.

And how you can say that beats me. Unless your index is on a monotonically
incrementing sequence number, then you can't rule out the possibility of an
insert or an update wanting to make use of deleted space. So where you have
'significant DML' is precisely where the deleted leaf rows will be re-used.
Assuming that your DML is broadly-speaking 'randomized'.

And why you persist in inventing your own terminology for what Oracle has
already perfectly clearly named also beats me.

But it doesn't surprise me.

>
> > Next. " Oracle index nodes are not physically deleted when table rows
are
> > deleted". I would have more confidence in the advice you offer if you
could
> > get the distinction between a node (ie, a block) and a row right.
>

> Yes, I agree, one of us does not know what they are talking about.

Please do a describe on INDEX_STATS. Please report back what the column
headings are. If there's one called 'LEAF_NODES', I'll eat copious
quantities of Humble Pie. Oh, alright. Tell you what: I'll do it for you.

HEIGHT
BLOCKS
NAME
PARTITION_NAME
LF_ROWS
LF_BLKS
LF_ROWS_LEN
LF_BLK_LEN
BR_ROWS
BR_BLKS
BR_ROWS_LEN
BR_BLK_LEN
DEL_LF_ROWS
DEL_LF_ROWS_LEN
DISTINCT_KEYS
MOST_REPEATED_KEY
BTREE_SPACE
USED_SPACE
PCT_USED
ROWS_PER_KEY
BLKS_GETS_PER_ACCESS
PRE_ROWS
PRE_ROWS_LEN
OPT_CMPR_COUNT
OPT_CMPR_PCTSAVE

Gosh. Not a "LEAF_NODE" column in sight, though I see one called "LF_ROWS"
and another called "DEL_LF_ROWS". Guess I have to go hungry then.

If you're going to post to an *Oracle* newsgroup, citing an article
addressing *Oracle* index issues, then it behoves you to stick to *Oracle*
terminology. It also behoves you to get it right, but that's a tad harder to
pull off, obviously.

> > As for this: "Note that Oracle indexes will "spawn" to a fourth level
only
> > in areas of the index where a massive insert has occurred, such that 99%
of
> > the index has three levels, but the index is reported as having four
levels.
> > " ... well, all I can say is: this is complete baloney.
>

> You really should take the time to check your facts before you
> embarrass yourself.

Hi Kettle. You must be Pot.

>
> > Oracle's index structures are B*Trees, meaning that they are 'balanced'
trees.
>

> No genius, look at the DESCTs. If you take the trouble to decompile
> the code and look at a few block dumps, you can see exactly how it
> works.

Been there, done that. You're wrong.

Your index article is about as good as some of the other performance tuning
stuff you've published in the past. IE, not very.

HJR

DA Morgan

unread,

Feb 1, 2003, 7:09:17 AM2/1/03

to

"Howard J. Rogers" wrote:

Thank you all for this spirited, if somewhat testosterone-laden, celebration of
index balancing.

My students will no doubt continue to benefit from your knowledge and wisdom.

Now about that Macallan I had last evening.

Daniel Morgan

Howard J. Rogers

unread,

Feb 1, 2003, 7:29:03 AM2/1/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message

> Thank you all for this spirited, if somewhat testosterone-laden,
celebration of
> index balancing.

Now, clearly, I appear to be giving out an impression which it wasn't my
intention to give. So I just want to clarify: whilst Don might throw in the
odd 'Einstein' comment, and other bits of sarcasm directed against me
personally, I have criticised the content of his *writing*. I don't know Don
Burleson from Adam, and I'm sure he makes a pretty mean Pavlova whilst
juggling icicles on a hot summer's day.

But his article is trash. And whilst my prolix prose might give the
impression of testosterone oozing with every key-press, that wasn't my
intention, and I apologise to the extent that anyone's got that impression.

I just can't stand people trotting out technical garbage and then getting
personal about it when called on it. That's all.

In the spirit of backing off, I just quote two small passages:

Don Burleson: "I cannot comprehend that you think that Oracle will re-level

a whole index tree in real-time."

Jonathan Lewis: "The mechanism on inserts is simple and recursive: If you

need to insert an index entry and the leaf is full, then it splits. But if
it splits, the branch block above it has to have an extra entry (pointing
to the value that starts the new leaf block) - so Oracle has to insert an
entry into the branch block. But if the branch block is full, it has to
split - which means the branch block above it has to have a new entry - but
if that block is full ..... and so on until the block above is the root
block. So what do you do if the root block is full ? Split it in two, and
create a new root block above it with just two entries - at which point
EVERY leaf block has changed height."

Which is precisely what I said, but I realise no-one believes it when I say
it.

>
> My students will no doubt continue to benefit from your knowledge and
wisdom.
>
> Now about that Macallan I had last evening.

Ughhh. Jamieson's or nothing, I'm afraid. Or, if all else fails, this rather
fine 1993 Shiraz wot I'm quaffing as we speak.

And who was posting about Bushmill's Paintstripper??? I'd rather rebuild an
index, thanks all the same!!

Regards
HJR

Jonathan Lewis

unread,

Feb 1, 2003, 8:21:53 AM2/1/03

to

Daniel,

I think that's the thread where someone suggested
I was talking rubbish - and where I subsequently
sat down and thought it through and decided that
I was talking rubbish. I think you'll even find my
retraction and explanation further on in that thread.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon a new one-day tutorial:
Cost Based Optimisation
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____USA_(FL)_May 2nd

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

DA Morgan wrote in message <3E3B0D3F...@exesolutions.com>...

Howard J. Rogers

unread,

Feb 1, 2003, 1:47:03 PM2/1/03

to

Incidentally, Don: I noticed you cut everything off before we got to your
pearls of wisdom about bitmap indexes. So that I can be enlightened:

Pray tell me: where did the magic number 25 come from?

And why was there not the slightest mention of the inadvisability of bitmap
indexes in a DML-heavy environment??

And why did you say that index rebuilds were "easy", when in fact they
require exclusive table locks (even with the 'online' option in 8i and
higher) and a great deal of performance-degrading I/O?

Did you just run out of space in your article, or what?

And since you deem my examples demonstrating deleted index entry re-use to
be too simplistic and not-real-world, can we see your real-world tests
demonstrating that deleted leaf rows *won't* be re-used????

Oh: and one more thing. The re-use of deleted leaf rows I demonstrated did
not require the new entries to be 'adjacent' to the old ones, as you
claimed. The first entry made into a leaf node that contains deleted entries
causes *all* deleted entries for that node to be cleaned out, regardless of
the specific value. An entry of 'BOLTON' would have cleared out all my 'A%'
deleted entries, because all of them were in the same block.

HJR

"Don Burleson" <d...@burleson.cc> wrote in message

news:998d28f7.03013...@posting.google.com...

> > "Indexes require rebuilding when deleted leaf nodes appear". I don't
even
> > know what a deleted leaf node is.
>

> Yes, that is very apparent. Despite your delusions of knowledge about

> a "pure" B*Tree, an Oracle index tress cannot be a pure b*tree because
> it is impossible to index 100m rows in only four levels in a b*tree.
> Did you flunk your basic data structures class?
>

> Oracle maintains many sub-keys within each inode. When the number of
> entries exceeds the internal "index_block_contains" threshold
> (normally 50, depending on the key length), the Oracle index splits
> until it reaches the maximum supported level of its superior node. At

> that point, the index "spawns" into another level. I cannot

> comprehend that you think that Oracle will re-level a whole index tree

> in real-time. Think about it!

>
> > Oracle re-uses deleted leaf row space, all on its own.
>

> Yes, Einstein, in your simplistic example Oracle will re-use an
> adjacent deleted node that rarely happens in the real-world. In most
> cases, indexes with significant DML will have large amounts of dead
> leaf nodes.
>

> > Next. " Oracle index nodes are not physically deleted when table rows
are
> > deleted". I would have more confidence in the advice you offer if you
could

> > get the distinction between a node (ie, a block) and a row right.
>
> Yes, I agree, one of us does not know what they are talking about.
>

> > As for this: "Note that Oracle indexes will "spawn" to a fourth level
only
> > in areas of the index where a massive insert has occurred, such that 99%
of
> > the index has three levels, but the index is reported as having four
levels.
> > " ... well, all I can say is: this is complete baloney.
>

> You really should take the time to check your facts before you
> embarrass yourself.
>

> > Oracle's index structures are B*Trees, meaning that they are 'balanced'
trees.
>

Richard Foote

unread,

Feb 1, 2003, 2:59:41 PM2/1/03

to

Hi Don

Comments embedded

"Don Burleson" <d...@burleson.cc> wrote in message

news:998d28f7.03013...@posting.google.com...

> > "Indexes require rebuilding when deleted leaf nodes appear". I don't
even
> > know what a deleted leaf node is.
>

> Yes, that is very apparent. Despite your delusions of knowledge about
> a "pure" B*Tree, an Oracle index tress cannot be a pure b*tree because
> it is impossible to index 100m rows in only four levels in a b*tree.
> Did you flunk your basic data structures class?

Not sure what you mean here. Are you saying that 100m row index can't fit in
a 4 level index ? Well it can, although the block size and index key length
would have a say I would think. Or are you suggesting that Oracle indexes
can only have 4 levels ?

>
> Oracle maintains many sub-keys within each inode. When the number of
> entries exceeds the internal "index_block_contains" threshold
> (normally 50, depending on the key length), the Oracle index splits
> until it reaches the maximum supported level of its superior node. At
> that point, the index "spawns" into another level. I cannot
> comprehend that you think that Oracle will re-level a whole index tree
> in real-time. Think about it!

Sorry Don, this is incorrect. Although there is some overhead to an index
splitting, I've always thought Oracle was rather efficient in the way it
maintained the index balance. Worst possible case (assuming a 3 level index
about to jump to 4 to keep balanced):

- leaf block splits, extra IO (probably) as Oracle grabs the next free block
on the freelist
- if the last 'block' in the index structure then new index entry simply
goes to the newly added block, else 50% of the existing entries go into each
split block and the new value goes where appropriate
- corresponding branch 'block' needs to be updated to add an entry that
points to the newly added block but if full, it too is split, extra IO
(probably) as Oracle grabs the next free block on the freelist
- same as the leaf block, existing entires get redistributed and new branch
pointer is added
- root block needs to be updated to add an entry that points to the newly
add branch block but if full, it too is split, extra IO (probably) as Oracle
grabs the next free block on the freelist
- same as the branch block, existing entires get redistributed and the new
branch pointer is added
- a new block is required for a new root node, extra IO (probably) as Oracle
grabs the next free block on the freelist
- pointers are added to point to the two (now) second level branch nodes

Index rebalance is complete. So in total, we had to split 3 blocks and
redistribute (possibly) their contents and perform 4 additional IOs. Now our
poor little insert (or possibly update) has had to wait for all this to
complete and yes there is some overhead, but such a "total" rebalance would
occur once in a blue moon and isn't "that" huge a one (IMHO).

>
> > Oracle re-uses deleted leaf row space, all on its own.
>

> Yes, Einstein, in your simplistic example Oracle will re-use an
> adjacent deleted node that rarely happens in the real-world. In most
> cases, indexes with significant DML will have large amounts of dead
> leaf nodes.

Now I'm confused. If you mean leaf "rows", then note these are completely
reusable by subsequent inserts. If you mean leaf "blocks" then these are no
longer part of the index structure and except for fast full index scans,
have no effect on the performance of the index. These dead "things" are only
a concern if:

1. The table has "shrunk" and the same volume of rows is not expect to be
reinserted. In which case, a table rebuild as well as an index rebuild is in
order

2. The index has incremental entries and the deleted rows are not going to
be reused by subsequent inserts. This is the scenario when an index rebuild
is most likely to be of use (if some but not many entires remain in a whole
bunch of existing leaf blocks).

>
> > Next. " Oracle index nodes are not physically deleted when table rows
are
> > deleted". I would have more confidence in the advice you offer if you
could

> > get the distinction between a node (ie, a block) and a row right.
>
> Yes, I agree, one of us does not know what they are talking about.
>

> > As for this: "Note that Oracle indexes will "spawn" to a fourth level
only
> > in areas of the index where a massive insert has occurred, such that 99%
of
> > the index has three levels, but the index is reported as having four
levels.
> > " ... well, all I can say is: this is complete baloney.
>

> You really should take the time to check your facts before you
> embarrass yourself.
>

> > Oracle's index structures are B*Trees, meaning that they are 'balanced'
trees.
>

> No genius, look at the DESCTs. If you take the trouble to decompile
> the code and look at a few block dumps, you can see exactly how it
> works.

Hummmm. I would love to see evidence to the contrary because the block dumps
I taken over the years support my understanding on how indexes work.

Create a table, create an index, populate it slowing and dump to see how the
index "grows" and keeps balanced. Then delete rows slowing and reinsert
again and you'll see exactly how Oracle uses it's indexes and reuses the
deleted space. You see "pointers" or slots that get reused and map to the
newly insert index entries.

Report back on your findings ;)

Cheers

Richard

PS. I'm no Einstein but a David Bowie fan who got ripped off today on one
exceedingly bad haircut !!

Message has been deleted

Howard J. Rogers

unread,

Feb 1, 2003, 10:47:11 PM2/1/03

to

"Don Burleson" <d...@burleson.cc> wrote in message

news:998d28f7.03020...@posting.google.com...
> Hi Richard,
>
> At last, a polite response!
>
> Is it possible that the index algorithms have changed since I did that
> article in 1995? Maybe I am obsolete!

You can say that again.

>
> My understanding of Oracle indexing was based on "borrowed" 7.3.3
> source code (donated by a disgruntled ex-Oracle employee), and the
> article that the arrogant, insulting, and self-aggrandizing Howard
> calls "rubbish" was published in back in 1995, when 7.3.4 was just
> released.

You posted in February 2003 that I knew nothing about it, that I was
suffering from "delusions of knowledge", that "one of us doesn't know what
they are talking about", and (here's the classic) "You really should take

the time to check your facts before you
embarrass yourself".

Only now do we discover that your information is 8 years out of date (though
a modicum of testing in that time would have put you straight).

You think it's OK to recommend 8-year-old information that is technically
inaccurate, do you?

It always amuses me to see people whingeing about how "arrogant" and
"insulting" I am when I point out they're wrong.

> Each data block within the index contains "nodes" in the index tree,
> with the bottom nodes (leaf blocks), containing pairs of symbolic keys
> and ROWID values. As an Oracle tree grows (via inserting rows into
> the table), Oracle fills the block, and when the block is full, it
> splits, creating new index nodes (data blocks) to manage the symbolic
> keys within the index. Hence, an Oracle index block may contain
> pointers to other index nodes or ROWID/Symbolic-key pairs.
>
> The number of entries within each index data block is a function of
> two values:
>
> 1 - The length of the symbolic key
> 2 - The blocksize for the index tablespace
>
> Because the blocksize affects the number of keys within each index
> block, it follows that the blocksize will have an effect on the
> structure of the index tree. All else being equal, large 32k
> blocksizes will have more keys, resulting in a flatter index than the
> same index created in a 2k tablespace.

In plain English, big blocks store more stuff than small blocks. Absolutely
true. That nugget of information, at any rate, will still be true in 8
years' time.

>
> According to an article by Christopher Foot
> (www.dbazine.com/foot3.html):
>
> "A bigger block size means more space for key storage in the branch
> nodes of B-tree indexes, which reduces index height and improves the
> performance of indexed queries."
>
> (Oh dear, Mr. Foot used the "node" word. I'll bet Howard will be
> really pissed . . . .)

Not at all. He used the term correctly. An index is comprised of a root
node, branch nodes and leaf nodes, within which are stored leaf entries. Mr
Foot is referring to branch nodes perfectly properly: Unlike you, he hasn't
referred to the entries themselves as nodes, but to the blocks within which
they are stored.

> In any case, there appears to be evidence that block size affects the
> tree structure,

That much we've known for years.

>which supports the argument that data blocks (yes
> "nodes"),

So, you finally acknowledge the true terminology. Progress, I suppose. Now,
if you could just also acknowledge that your 8-year-old article is not a
technically-accurate description of index behaviour in 8.0, 8i, 9iR1 and
9iR2, we might be on to something.

>affect the structure of the tree.

HJR

Sybrand Bakker

unread,

Feb 1, 2003, 11:14:51 PM2/1/03

to

Comments embedded

On 1 Feb 2003 14:14:32 -0800, d...@burleson.cc (Don Burleson) wrote:

>Hi Richard,
>
>At last, a polite response!
>
>Is it possible that the index algorithms have changed since I did that
>article in 1995? Maybe I am obsolete!

I don't think they have changed.

>
>My understanding of Oracle indexing was based on "borrowed" 7.3.3
>source code (donated by a disgruntled ex-Oracle employee), and the
>article that the arrogant, insulting, and self-aggrandizing Howard
>calls "rubbish" was published in back in 1995, when 7.3.4 was just
>released.

So your article may well not turn up as the second reference in Google
and too many posters here don't search Google before posting.

>
>At that time, (and I have the source code filed away somewhere) Oracle
>index nodes were organized by data blocks, where each data block was
>an index node.

I don't think this has ever been true. An index leaf block will
always contain multiple leaf rows. You seem to imply each data block
has only one index node, whatever that may be.

That is why you cannot specify PCTUSED for an index,
>because Oracle must govern the freelist re-link during splitting and
>spawning operations.
>
>Please correct me if any of the following is incorrect:

>
>Each data block within the index contains "nodes" in the index tree,
>with the bottom nodes (leaf blocks), containing pairs of symbolic keys
>and ROWID values. As an Oracle tree grows (via inserting rows into
>the table), Oracle fills the block, and when the block is full, it
>splits, creating new index nodes (data blocks) to manage the symbolic
>keys within the index. Hence, an Oracle index block may contain
>pointers to other index nodes or ROWID/Symbolic-key pairs.

If this is a truly B*tree algorithm, it will split the index bucket,
turning it into a non-leaf bucket, creating two new buckets, and
balance the tree. An index block is either contains leaf rows or
branch rows, by virtue of the (generally available) algorithms, it can
*NEVER* contain a MIXTURE of leaf and non-leaf rows.
If you look at the index_stats results after an ANALYZE INDEX
that is EXACTLY what you are looking at.

>
>The number of entries within each index data block is a function of
>two values:
>
>1 – The length of the symbolic key
>2 – The blocksize for the index tablespace
>

AND the room for locking info
AND the freelist to mention only a few things.

>Because the blocksize affects the number of keys within each index
>block, it follows that the blocksize will have an effect on the
>structure of the index tree. All else being equal, large 32k
>blocksizes will have more keys, resulting in a flatter index than the
>same index created in a 2k tablespace.
>

>According to an article by Christopher Foot
>(www.dbazine.com/foot3.html):
>
>"A bigger block size means more space for key storage in the branch
>nodes of B-tree indexes, which reduces index height and improves the
>performance of indexed queries."

I would consider this a myth. The only thing that would improve
performance is a decreased depth of the index. Having observed depth
behavior at various block sizes, I can safely say the depth of an
index will almost always be 3, or 4, and a depth of 2 implies a very
small table.

>
>(Oh dear, Mr. Foot used the "node" word. I'll bet Howard will be
>really pissed . . . .)
>

>In any case, there appears to be evidence that block size affects the

>tree structure, which supports the argument that data blocks (yes
>"nodes"), affect the structure of the tree.

Which evidence? There isn't any, at least not in your response.
The only thing which does hold true is that size of the block header
is more or less fixed, and the net available storage per block is
bigger. My experience however is that this doesn't result in indexes
with a smaller depth.

Regards

Sybrand Bakker, Senior Oracle DBA

To reply remove -verwijderdit from my e-mail address

Richard Foote

unread,

Feb 2, 2003, 7:25:04 AM2/2/03

to

Hi again Don

Comments embedded (warning, a bit long in places ...)

"Don Burleson" <d...@burleson.cc> wrote in message

news:998d28f7.03020...@posting.google.com...

> Hi Richard,
>
> At last, a polite response!
>
> Is it possible that the index algorithms have changed since I did that
> article in 1995? Maybe I am obsolete!

I'm not sure as I was only a baby in nappies back them ;) However, I'm
pretty sure that at least version 7 of Oracle had a similar algorithm.

>
> My understanding of Oracle indexing was based on "borrowed" 7.3.3
> source code (donated by a disgruntled ex-Oracle employee), and the
> article that the arrogant, insulting, and self-aggrandizing Howard
> calls "rubbish" was published in back in 1995, when 7.3.4 was just
> released.
>

> At that time, (and I have the source code filed away somewhere) Oracle
> index nodes were organized by data blocks, where each data block was

> an index node. That is why you cannot specify PCTUSED for an index,

> because Oracle must govern the freelist re-link during splitting and
> spawning operations.

The reason why you can't specify a PCTUSED is that indexes behave quite
differently from (conventional) tables in that if there's space in a node,
Oracle will simply use it, if there isn't then Oracle simply splits the
block. The concept of having a block *within the index structure* not being
available for insert and having to await reaching a PCTUSED mark is
therefore not applicable. In summary, if an index block is currently
allocated to the index structure, it must by definition be off the freelist.

>
> Please correct me if any of the following is incorrect:
>
> Each data block within the index contains "nodes" in the index tree,
> with the bottom nodes (leaf blocks), containing pairs of symbolic keys
> and ROWID values. As an Oracle tree grows (via inserting rows into
> the table), Oracle fills the block, and when the block is full, it
> splits, creating new index nodes (data blocks) to manage the symbolic
> keys within the index. Hence, an Oracle index block may contain
> pointers to other index nodes or ROWID/Symbolic-key pairs.

This is true expect that a particular "node" is either a "leaf node" which
means it contains an range of index entires along with their associated
rowids or is a "branch node" which means it contains a set of range pointers
that point to other branch nodes or leaf nodes within the index structure.
These "pointers" are represented by a "less than" value from which the
required path through the index structure can be derived. Note a particular
block within the index structure can only be a leaf or a branch node.

Like I mentioned earlier, the best way to see all this is via a few block
dumps. Following is an extract from a dump of a "branch" block. I don't want
to necessarily describe the whole structure but the odd comments with a ***
are mine:

Start dump data blocks tsn: 9 file#: 9 minblk 1420 maxblk 1420
buffer tsn: 9 rdba: 0x0240058c (9/1420)
scn: 0x0000.019233ab seq: 0x01 flg: 0x00 tail: 0x33ab0601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump: 0x0240058c
Object id on Block? Y
seg/obj: 0x79a7 csc: 0x00.192339f itc: 1 flg: E typ: 2 - INDEX ***
index object
brn: 0 bdba: 0x2400589 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn
0x0000.0192339f

Branch block dump *** Block dump highlights this block as a branch node
=================
header address 49221708=0x2ef104c
kdxcolev 1
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 69
kdxcofbo 166=0xa6
kdxcofeo 6907=0x1afb
kdxcoavs 6741
kdxbrlmc 37750157=0x240058d
kdxbrsno 0
kdxbrbksz 8060
row#0[8043] dba: 37750158=0x240058e *** first branch pointer
col 0; len 5; (5): 4d 44 53 59 53 *** go this way for
values less than this value (MDSYS)
col 1; len 6; (6): 02 40 04 ba 00 38 *** location of "pointed
to" node (in this case a leaf node)
row#1[8028] dba: 37750159=0x240058f *** second branch pointer, etc ...
col 0; len 3; (3): 4f 44 4d
col 1; len 6; (6): 02 40 04 bf 00 2a
row#2[8007] dba: 37750160=0x2400590
col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59
col 1; len 6; (6): 02 40 04 ac 00 42
row#3[7986] dba: 37750161=0x2400591
col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59
col 1; len 6; (6): 02 40 04 c0 00 2d

And this is an extract of the first leaf node that is being pointed to by
the above branch node...

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.019233a1 seq: 0x01 flg: 0x04 tail: 0x33a10601
frmt: 0x02 chkval: 0x1090 type: 0x06=trans data
Block header dump: 0x0240058d
Object id on Block? Y
seg/obj: 0x79a7 csc: 0x00.192339f itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x2400589 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn
0x0000.0192339f

*** above are the two (minimum) transaction slots

Leaf block dump *** OK, this is a leaf node ...
===============
header address 49221732=0x2ef1064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: -----, lock: 0 *** first index entry header
(currently unlocked)
col 0; len 5; (5): 42 4f 57 49 45 *** first index entry value
(BOWIE)
col 1; len 6; (6): 02 40 20 60 00 29 *** rowid of first index entry
row#1[8006] flag: -----, lock: 0 *** second index entry, etc ....
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a
row#2[7991] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2b
row#3[7976] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2c

When you delete an index entry, it looks like this :

SQL> delete insert_bowie where owner = 'BOWIE' and rownum = 1;

1 row deleted.

Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE
1 0 DELETE OF 'INSERT_BOWIE'
2 1 COUNT (STOPKEY)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'INSERT_BOWIE'
4 3 INDEX (RANGE SCAN) OF 'INSERT_BOWIE_IDX' (NON-UNIQUE
)

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192ad54 seq: 0x01 flg: 0x00 tail: 0xad540601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump: 0x0240058d
Object id on Block? Y
seg/obj: 0x79a7 csc: 0x00.192339f itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x2400589 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
0x0000.00000000
0x02 0x000a.01f.0000703f 0x008003ce.022c.2a ---- 1 fsc
0x0011.00000000

*** Note here that the second slot has/had a transaction (which happens to
be the transaction that deleted an index entry)

Leaf block dump
===============
header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 1
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ---D-, lock: 2 *** Note here that the first entry is/has
been locked by the transaction in transaction slot number 2. Also note that
the entry has been marked as deleted "D".

col 0; len 5; (5): 42 4f 57 49 45 *** This is hence now a "deleted
row" entry
col 1; len 6; (6): 02 40 20 60 00 29
row#1[8006] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a

Now to see how this deleted entry gets reused, one final dump (I promise )

SQL> insert into insert_bowie (owner) values ('AADVARK');

1 row created.

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192b7b8 seq: 0x01 flg: 0x02 tail: 0xb7b80601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump: 0x0240058d
Object id on Block? Y
seg/obj: 0x79a7 csc: 0x00.192b7b3 itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x2400589 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
0x0000.00000000
0x02 0x0005.022.0000668b 0x00800a3f.01ff.29 --U- 1 fsc
0x0000.0192b7b8

*** Again note, we have/had a transaction going on in slot 2

Leaf block dump
===============
header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1663=0x67f
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1663] flag: -----, lock: 2 *** this entry is/has been
locked by a transaction in slot 2
col 0; len 7; (7): 41 41 44 56 41 52 4b *** we now have an index entry for
the value (AADVARK)
col 1; len 6; (6): 02 40 20 61 00 24 *** a search through the whole
dump will find that all trace of the previously deleted index has been
erased.

row#1[8006] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a

>
> The number of entries within each index data block is a function of
> two values:
>
> 1 - The length of the symbolic key
> 2 - The blocksize for the index tablespace
>

So the number of possible entries within an index block is a function of the
total length of the index column(s) values (although note that this can now
be compressed to be made more effiicient) and the size of the index block.

> Because the blocksize affects the number of keys within each index
> block, it follows that the blocksize will have an effect on the
> structure of the index tree. All else being equal, large 32k
> blocksizes will have more keys, resulting in a flatter index than the
> same index created in a 2k tablespace.

By having a larger block size, you can indeed fit more values into less
blocks resulting in less leaf nodes. If you have less leaf nodes, you
therefore don't need as many "pointers" to these nodes and so require less
branch nodes. If you can reduce the total number of branch nodes required,
you thereby reduce the size of the index structure sitting above the leaf
nodes which *might* result in a lower index height. However, because you can
fit so many entries into a particular branch node, and because so many
branch nodes can be pointed to by the root or intermediate branch nodes, you
generally have far fewer branch nodes allocated than the maximum capacity of
the index structure and so an actual reduction in index height is actually
quite rare.

>
> According to an article by Christopher Foot
> (www.dbazine.com/foot3.html):
>
> "A bigger block size means more space for key storage in the branch
> nodes of B-tree indexes, which reduces index height and improves the
> performance of indexed queries."

I can't comment on my almost name sake (don't you just hate it when people
can't spell; fancy dropping off the last "e", it's so crucial to well
balanced name ;) but I would have written the comment above as "a bigger

block size means more space for key storage in the branch nodes of B-tree

indexes, which *might in some particular situations* reduce index height
*but nonetheless will improve the performance of indexed queries which
require a "range scan operation" as fewer leaf nodes now need to be
accessed, potentially reducing IO and associated overheads*". But then
again, I'm not Christopher ...

>
> (Oh dear, Mr. Foot used the "node" word. I'll bet Howard will be
> really pissed . . . .)
>
> In any case, there appears to be evidence that block size affects the
> tree structure, which supports the argument that data blocks (yes
> "nodes"), affect the structure of the tree.

No argument there. Indexes love larger blocks sizes because they can fit
more data into fewer nodes meaning a more efficient structure. Regardless of
whether or not such savings represents a lower index, it will result in
fewer leaf nodes making index range scans more efficient (which is a pretty
popular way to use an index).

I hope this helps to clear a few things up.

Cheers

Richard

Peter J. Holzer

unread,

Feb 2, 2003, 8:32:25 AM2/2/03

to

On 2003-01-31 17:52, Howard J. Rogers <howard...@yahoo.com.au> wrote:
> I don't know whether it's just a myth or not, but Oracle refers to its
> indexes as "B*Tree" structures, not 'b-tree' ones. And that's because the
> 'b' doesn't stand for 'binary', but 'balanced'.

Not quite. B-trees aren't binary, either. They have basically the same
node layout with n-1 keys and n pointers/data items[1], and they are
also always balanced. What they do not have is the distinction between
branch nodes and leaf nodes. Each node can contain both pointers to
other nodes and data.

B+trees push all the data down to the leaf nodes and keep only pointers
to other nodes in the higher level nodes.

B*trees are like b+trees, but use a modified split algorithm to keep the
nodes at least 2/3 filled.

hp

[1] In the case of an index separate from the table, the "data" are
pointers into the table.

Jonathan Lewis

unread,

Feb 2, 2003, 9:55:02 AM2/2/03

to

Of course, this does mean that Oracle doesn't
use B*Trees, as it's manuals have claimed since
at least 5.1 - it uses a version of B+trees.

Correct me if I'm wrong, but aren't B*trees supposed
to redistribute leaf entries to adjacent blocks without
doing a split if certain conditions are met - and this is
something that Oracle never does.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon a new one-day tutorial:
Cost Based Optimisation
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____USA_(FL)_May 2nd

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Peter J. Holzer wrote in message ...

Noons

unread,

Feb 2, 2003, 10:08:31 AM2/2/03

to

"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in
news:b1ipul$gtj$1$8300...@news.demon.co.uk and I quote:

>
> Of course, this does mean that Oracle doesn't
> use B*Trees, as it's manuals have claimed since
> at least 5.1 - it uses a version of B+trees.

Akshally, the whole thing is terminally confusing.
According to Knuth, B+trees and B*trees are quite
different. And Oracle doesn't help with their
"modified algorithm" B+trees thing. It all adds to
the confusion.

I'd bet if someone spends some time going through the
actual index handling source code, it will come out
that the algorithm is neither one nor the other and
is in fact quite unique.

It's one of the few areas where makers can still have some
significant implementation differences, and of course
distinguishing factors. The only reason we don't see the
marketing dingbats pouring all over it is that the whole thing
is arcane, to put it very mildly.

--
Cheers
Nuno Souto
nso...@optusnet.com.au.nospam

sk

unread,

Feb 2, 2003, 8:18:08 PM2/2/03

to

Wow, isn't it great when to big dogs come out to play. Very educational
also.

Thanks to all.

DA Morgan

unread,

Feb 2, 2003, 10:11:10 PM2/2/03

to

Noons wrote:

Sounds to me like someone is going to be doing some block dumps pretty
soon.

Or at least I hope so. It would be great to have a definitive answer.

Daniel Morgan

Noons

unread,

Feb 3, 2003, 11:55:30 AM2/3/03

to

DA Morgan <damo...@exesolutions.com> wrote in

news:3E3D977E...@exesolutions.com and I quote:

>
> Sounds to me like someone is going to be doing some block dumps pretty
> soon.
>
> Or at least I hope so. It would be great to have a definitive answer.
>

I'm afraid just block dumps won't be enough for this one.
Source code would be needed, and fairly recent too.
Maybe one of the "Danish round table" blokes will do it, they
should have access to this.

Thing is: if you look at the definitions of B*tree and
B+tree on the Net as well as on Knuth's books, they
don't exactly match what is described in the Oracle literature,
or what happens inside them old blocks. And then we have
index key compression....

All that makes the Oracle implementation, IMHO, quite
difficult to pigeon-hole.

Richard Foote

unread,

Feb 3, 2003, 1:54:55 PM2/3/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message

news:3E3D977E...@exesolutions.com...

>
> Sounds to me like someone is going to be doing some block dumps pretty
> soon.
>
> Or at least I hope so. It would be great to have a definitive answer.
>

Hi Daniel,

*** WARNING WARNING WARNING ***

*** BLOCK DUMPS BLOCK DUMPS BLOCK DUMPS ***

The following is a very simple but hopefully illustrative demonstration of
how Oracle maintains it's indexes. It's not however meant to be a
"definitive answer".

So long as one follows the basic principles that once a leaf node is full,
it needs to "split" and by doing so the branch node that points to it
requires a new entry and if it's full, it needs to split etc., this will
show how Oracle puts it all together.

OK, firstly, lets build a simple table and associated index:

SQL> create table bowie_stuff (id number, album_title varchar(30));

Table created.

SQL> create index bowie_stuff_idx on bowie_stuff(album_title);

Index created.

Now let's populate with some data and get the index structure building
along. Note the values I'm using, they'll come in handy later.

SQL> insert into bowie_stuff values (1, 'David Bowie');

1 row created.

SQL> insert into bowie_stuff values (2, 'The Man Who Sold The World');

1 row created.

SQL> insert into bowie_stuff values (3, 'Hunky Dory');

1 row created.

SQL> insert into bowie_stuff values (4, 'Ziggy Stardust');

1 row created.

SQL> insert into bowie_stuff values (5, 'Aladdin Sane');

1 row created.

SQL> insert into bowie_stuff values (6, 'Pin Ups');

1 row created.

SQL> insert into bowie_stuff values (7, 'Diamond Dogs');

1 row created.

SQL> insert into bowie_stuff values (8, 'Young Americans');

1 row created.

SQL> insert into bowie_stuff values (9, 'Station To Station');

1 row created.

SQL> insert into bowie_stuff values (10, 'Low');

1 row created.

SQL> commit;

Commit complete.

SQL> insert into bowie_stuff select * from bowie_stuff;

10 rows created.

SQL> /

20 rows created.

SQL> /

40 rows created.

SQL> /

80 rows created.

SQL> /

160 rows created.

SQL> /

320 rows created.

SQL> commit;

Commit complete.

Having played with this table structure for many years now, I know I'm close
to having a "full" index structure. Let's see the current state of affairs:

SQL> analyze index bowie_stuff_idx validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks from index_stats;

HEIGHT LF_BLKS BR_BLKS
---------- ---------- ----------
2 2 1

We currently have a nice simple index structure with one branch (root) node
pointing to two leaf nodes.

I now check to see which blocks I need to dump.

SQL> select file_id, block_id, blocks from dba_extents where segment_name =
'BOW
IE_STUFF_IDX';

FILE_ID BLOCK_ID BLOCKS
---------- ---------- ----------
9 1513 8

What I'll do now is take a before and after dump of the branch and leaf node
block while inserting a new row. For the record, three further inserts did
the trick and caused the first leaf node to "split".

SQL> insert into bowie_stuff values (7, 'Diamond Dogs');

1 row created.

SQL> commit;

Commit complete.

So now I've "captured" what Oracle has done in order to maintain it's index
structure after a node split.

First of all, lets look at an extract of the branch node, just *before* the
leaf node split. My comments are preceded with a *** :

Start dump data blocks tsn: 9 file#: 9 minblk 1516 maxblk 1516
buffer tsn: 9 rdba: 0x024005ec (9/1516)
scn: 0x0000.01974320 seq: 0x02 flg: 0x02 tail: 0x43200602

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x024005ec

Object id on Block? Y

seg/obj: 0x79ab csc: 0x00.197431c itc: 1 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x24005e9 ver: 0x01
inc: 0 exflg: 0

*** above portion is the block header which includes the last SCN, rdba,
object type (2 index), etc. ***

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0001.00b.00006e99 0x00800888.02ca.01 -BU- 1 fsc
0x0000.01974320

*** above is the transaction slot (used by Oracle to maintain branch node
structure) ***

Branch block dump
=================
header address 99815500=0x5f3104c

kdxcolev 1
KDXCOLEV Flags = - - -

kdxcolok 1

kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2

kdxcosdc 1
kdxconro 1
kdxcofbo 30=0x1e
kdxcofeo 8041=0x1f69
kdxcoavs 8011
kdxbrlmc 37750255=0x24005ef *** An error with my previous post to Don
(which Jonathan Lewis kindly pointed out) is that this actually marks the
first "pointer" to the leaf nodes. The logic being that values > null but
less than the next pointer go to this leaf node. So 0x24005ef marks the
first rdba of the first leaf node ***
kdxbrsno 0
kdxbrbksz 8060

*** above section is the index (branch) header details, more on this later

row#0[8041] dba: 37750256=0x24005f0 *** This therefore marks the pointer
to the second leaf node in the index, all values greater than or equal to
col 0 belong in this leaf node. As there are only currently 2 leaf nodes in
this index, that's about it for now ***
col 0; len 7; (7): 50 69 6e 20 55 70 73 *** values greater than or equal
to PIN UPS go here ***
col 1; len 6; (6): 02 40 05 e8 00 eb
----- end of branch block dump -----
End dump data blocks tsn: 9 file#: 9 minblk 1516 maxblk 1516

Next, let's look at the poor index block in "desperate need of a bucket",
ie. she's about to blow (split) ;)

Start dump data blocks tsn: 9 file#: 9 minblk 1519 maxblk 1519
buffer tsn: 9 rdba: 0x024005ef (9/1519)
scn: 0x0000.01975306 seq: 0x01 flg: 0x02 tail: 0x53060601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x024005ef

Object id on Block? Y

seg/obj: 0x79ab csc: 0x00.1975305 itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x24005e9 ver: 0x01
inc: 0 exflg: 0

*** above, block header details. Note that I can confirm this is the "first"
index node by the rdba list above which we noted was the first "pointed" to
block in the branch node ***

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0001.02c.00006e97 0x00800489.02cb.04 C--- 0 scn
0x0000.01974321
0x02 0x0008.029.00006d59 0x0080025d.0264.33 --U- 1 fsc
0x0000.01975306

*** Above, the index transaction slots ***

Leaf block dump
===============
header address 99815524=0x5f31064

kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0

kdxconro 376
kdxcofbo 788=0x314
kdxcofeo 802=0x322
kdxcoavs 14
kdxlespl 0
kdxlende 0
kdxlenxt 37750256=0x24005f0 *** This points to the "next" leaf node in the
index structure. Oracle uses this when performing index range scans ***
kdxleprv 0=0x0 *** As this is the first leaf node, there is no pointer
the other way, that is, there is no previous leaf node (again used by Oracle
when performing range scans, especially when avoiding a DESC sort) ***
kdxledsz 0
kdxlebksz 8036
row#0[8014] flag: -----, lock: 0 *** Header for the first index entry ***
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65 *** This represent
the first index entry (ALADDIN SANE) ***
col 1; len 6; (6): 02 40 05 e4 00 02 *** and it's associated row id ***
row#1[7992] flag: -----, lock: 0 *** and this is the second index entry,
and so on ... ***
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65
col 1; len 6; (6): 02 40 05 e4 00 0c
row#2[7970] flag: -----, lock: 0
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65
col 1; len 6; (6): 02 40 05 e4 00 16
.
. <snip>
.
row#374[841] flag: -----, lock: 0
col 0; len 7; (7): 50 69 6e 20 55 70 73
col 1; len 6; (6): 02 40 05 e8 00 d7
row#375[824] flag: -----, lock: 0 *** last index entry in this leaf node
***
col 0; len 7; (7): 50 69 6e 20 55 70 73
col 1; len 6; (6): 02 40 05 e8 00 e1
----- end of leaf block dump -----
End dump data blocks tsn: 9 file#: 9 minblk 1519 maxblk 1519

Now let's see how Oracle has "documented" the additional leaf node in the
branch node:

Start dump data blocks tsn: 9 file#: 9 minblk 1516 maxblk 1516
buffer tsn: 9 rdba: 0x024005ec (9/1516)
scn: 0x0000.01975359 seq: 0x01 flg: 0x02 tail: 0x53590601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x024005ec

Object id on Block? Y

seg/obj: 0x79ab csc: 0x00.1975358 itc: 1 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x24005e9 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0004.01e.00006ed7 0x00800a31.024b.03 --U- 1 fsc
0x0000.01975359

Branch block dump
=================
header address 99618892=0x5f0104c

kdxcolev 1
KDXCOLEV Flags = - - -

kdxcolok 1
kdxcoopc 0x81: opcode=1: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 2
kdxcofbo 32=0x20
kdxcofeo 8017=0x1f51
kdxcoavs 7985
kdxbrlmc 37750255=0x24005ef *** The first leaf pointer remains unchanged,
values greater than null go here ***
kdxbrsno 0
kdxbrbksz 8060
row#0[8017] dba: 37750253=0x24005ed *** But we now have a new leaf pointer
inserted between the previous two. This points to the new leaf node that has
been added as a result of the leaf node split ***
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73 *** values greater
or equal to DIAMOND DOGS go here ***
col 1; len 6; (6): 02 40 05 e8 00 88
row#1[8041] dba: 37750256=0x24005f0 *** and this entry now points to the
third, or last leaf node ***
col 0; len 7; (7): 50 69 6e 20 55 70 73 *** values greater or equal to PIN
UPS go here ***
col 1; len 6; (6): 02 40 05 e8 00 eb
----- end of branch block dump -----
End dump data blocks tsn: 9 file#: 9 minblk 1516 maxblk 1516

So Oracle has simply added in an appropriate pointer for the new leaf node
and values from which are to go into it.

Let's see how our poor "splitted" leaf node looks like:

Start dump data blocks tsn: 9 file#: 9 minblk 1519 maxblk 1519
buffer tsn: 9 rdba: 0x024005ef (9/1519)
scn: 0x0000.0197535a seq: 0x01 flg: 0x02 tail: 0x535a0601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x024005ef

Object id on Block? Y

seg/obj: 0x79ab csc: 0x00.1975359 itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x24005e9 ver: 0x01
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0004.01e.00006ed7 0x00800a30.024b.01 CB-- 0 scn
0x0000.01975359
0x02 0x0004.02c.00006ed7 0x00800a2e.024b.3c --U- 1 fsc
0x0000.0197535a

Leaf block dump
===============
header address 99815524=0x5f31064

kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2

kdxcosdc 1
kdxconro 175
kdxcofbo 386=0x182
kdxcofeo 4250=0x109a
kdxcoavs 3864
kdxlespl 0
kdxlende 0
kdxlenxt 37750253=0x24005ed *** note that this has now changed and points to
the newly added leaf node as it logically is the next node in the index
structure ***
kdxleprv 0=0x0 *** still no previous pointer as this node is still the
first leaf node in the index structure ***
kdxledsz 0
kdxlebksz 8036
row#0[4272] flag: -----, lock: 0
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65
col 1; len 6; (6): 02 40 05 e4 00 02
row#1[4294] flag: -----, lock: 0
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65
col 1; len 6; (6): 02 40 05 e4 00 0c
row#2[4316] flag: -----, lock: 0
col 0; len 12; (12): 41 6c 61 64 64 69 6e 20 53 61 6e 65
col 1; len 6; (6): 02 40 05 e4 00 16
.
. <snip>
.
row#173[7992] flag: -----, lock: 0
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73
col 1; len 6; (6): 02 40 05 e8 00 74
row#174[8014] flag: -----, lock: 0 *** we now only have 175 leaf entries
in this leaf node (when previously we had 376. The other index entries have
been "moved" to the new leaf node. Now wait a minute, 376 divided by 2 is
not 175, what about the 50/50 split. Well the 50/50 split is based on
"volume", not number of index entries. Both split nodes are approximately
50% full, this one happens to have larger index values on average***
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73
col 1; len 6; (6): 02 40 05 e8 00 7e
----- end of leaf block dump -----
End dump data blocks tsn: 9 file#: 9 minblk 1519 maxblk 1519

And finally, an extract of the new leaf node dump and how it "slots" into
the index structure:

Start dump data blocks tsn: 9 file#: 9 minblk 1517 maxblk 1517
buffer tsn: 9 rdba: 0x024005ed (9/1517)
scn: 0x0000.0197535b seq: 0x01 flg: 0x00 tail: 0x535b0601

frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x024005ed

Object id on Block? Y

seg/obj: 0x79ab csc: 0x00.197535b itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x24005e9 ver: 0x01
inc: 0 exflg: 0

*** note above in the block header that this indeed is the new index leaf
node as pointed to in the branch node ***

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0004.01e.00006ed7 0x00800a31.024b.01 CB-- 0 scn
0x0000.01975359
0x02 0x0008.029.00006d59 0x0080025d.0264.33 C--- 0 scn
0x0000.01975306

Leaf block dump
===============
header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0

kdxconro 202
kdxcofbo 440=0x1b8
kdxcofeo 4566=0x11d6
kdxcoavs 4126
kdxlespl 0
kdxlende 0
kdxlenxt 37750256=0x24005f0 *** note that the "next" leaf node is the third
node as pointed to in the branch node ***
kdxleprv 37750255=0x24005ef *** and the previous node is the first node we
previously looked at ***
kdxledsz 0
kdxlebksz 8036
row#0[4566] flag: -----, lock: 0
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73 *** and as
recorded in the branch node, entires in this now second leaf node start with
DIAMOND DOGS values ***
col 1; len 6; (6): 02 40 05 e8 00 88
row#1[4588] flag: -----, lock: 0
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73
col 1; len 6; (6): 02 40 05 e8 00 92
row#2[4610] flag: -----, lock: 0
col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73
col 1; len 6; (6): 02 40 05 e8 00 9c
.
. <snip>
.
row#200[8002] flag: -----, lock: 0
col 0; len 7; (7): 50 69 6e 20 55 70 73
col 1; len 6; (6): 02 40 05 e8 00 d7
row#201[8019] flag: -----, lock: 0 *** and the other 202 index entires
(including the new one inserted that caused the leaf node split) are all
found in this new leaf node ***
col 0; len 7; (7): 50 69 6e 20 55 70 73
col 1; len 6; (6): 02 40 05 e8 00 e1
----- end of leaf block dump -----
End dump data blocks tsn: 9 file#: 9 minblk 1517 maxblk 1517

So what does all this show ?

Well hopefully anyone that may have been a bit unsure how indexes hang
together, especially when a block split occurs, will be a bit more cluey on
what's going on.

Cheers

Richard

Peter J. Holzer

unread,

Feb 3, 2003, 2:19:48 PM2/3/03

to

On 2003-02-03 13:54, Richard Foote <richar...@bigpond.com> wrote:
> *** WARNING WARNING WARNING ***
>
> *** BLOCK DUMPS BLOCK DUMPS BLOCK DUMPS ***
>
> The following is a very simple but hopefully illustrative demonstration of
> how Oracle maintains it's indexes. It's not however meant to be a
> "definitive answer".

Illustrative it was, thanks. But if I read a few more of your articles
I'll start to think of Bowie as "the guy with the hexadecimal album
titles" :-)

hp

DA Morgan

unread,

Feb 3, 2003, 4:37:01 PM2/3/03

to

Richard Foote wrote:

Thanks. Thanks. Thanks.

So when are you publishing "The Definitive Book On Oracle Indexes"?

Daniel Morgan

Brian Peasland

unread,

Feb 3, 2003, 4:28:52 PM2/3/03

to

One other point is that Oracle handles removing entries from leaf nodes
differently than "theory" holds. On an entry removal from a leaf node,
the theory holds that the tree would be modified in some way. But we
know this isn't true in Oracle index trees.

Cheers,
Brian

Jonathan Lewis wrote:
>
> There are two reasons why I used to think this.
> First - the term 'rebalancing' misdirects the intuition,
> and second the manuals (once upon a time, perhaps)
> implied that the target of the B*tree mechanism was
> to ensure that no leaf was more than one layer deeper
> than any other leaf.

>
> The mechanism on inserts is simple and recursive:
> If you need to insert an index entry and the leaf
> is full, then it splits. But if it splits, the branch block
> above it has to have an extra entry (pointing to
> the value that starts the new leaf block) - so Oracle
> has to insert an entry into the branch block.
>
> But if the branch block is full, it has to split - which
> means the branch block above it has to have a
> new entry - but if that block is full ..... and so on
> until the block above is the root block.
>
> So what do you do if the root block is full ?
> Split it in two, and create a new root block
> above it with just two entries - at which point
> EVERY leaf block has changed height.
>

> --
> Regards
>
> Jonathan Lewis
> http://www.jlcomp.demon.co.uk
>
> Coming soon a new one-day tutorial:
> Cost Based Optimisation
> (see http://www.jlcomp.demon.co.uk/tutorial.html )
>
> ____UK_______March 19th
> ____USA_(FL)_May 2nd
>
> Next Seminar dates:
> (see http://www.jlcomp.demon.co.uk/seminar.html )
>
> ____USA_(CA, TX)_August
>
> The Co-operative Oracle Users' FAQ
> http://www.jlcomp.demon.co.uk/faq/ind_faq.html
>

> DA Morgan wrote in message <3E3AAC94...@exesolutions.com>...

> >
> >I never thought I'd ever try to correct you but I believe (sticking
> it

> >way out there on ths one) it is possible for there to be four on one
> and

Pete Sharman

unread,

Feb 3, 2003, 4:46:59 PM2/3/03

to

In article <Xns9317E6CDF5A...@210.49.20.254>, Noons says...

>
>DA Morgan <damo...@exesolutions.com> wrote in
>news:3E3D977E...@exesolutions.com and I quote:
>
>>
>> Sounds to me like someone is going to be doing some block dumps pretty
>> soon.
>>
>> Or at least I hope so. It would be great to have a definitive answer.
>>
>
>I'm afraid just block dumps won't be enough for this one.
>Source code would be needed, and fairly recent too.
>Maybe one of the "Danish round table" blokes will do it, they
>should have access to this.

OK, we have to put a stop to this immediately. The thought of Mogens Norgaard
running around like King Arthur (which is where this analogy would get very
quickly) just ain't pretty at all. :)

In any case, there are only a couple of people on the OakTable that I know of
with access to the source code, and it's more than their job is worth to post
details of it (or how it works) here. Sorry.

>
>Thing is: if you look at the definitions of B*tree and
>B+tree on the Net as well as on Knuth's books, they
>don't exactly match what is described in the Oracle literature,
>or what happens inside them old blocks. And then we have
>index key compression....
>
>All that makes the Oracle implementation, IMHO, quite
>difficult to pigeon-hole.

Totally agree. I'd even take it one step further and ask does it really matter
anyway (apart from being interesting theoretical stuff of course!)? Still, I
must say the "discussion" between Don and HJR has put a smile on my face each
time I read a new bit!

Pete

>
>
>--
>Cheers
>Nuno Souto
>nso...@optusnet.com.au.nospam

HTH. Additions and corrections welcome.

Pete

SELECT standard_disclaimer, witty_remark FROM company_requirements;

ctc...@hotmail.com

unread,

Feb 3, 2003, 9:35:22 PM2/3/03

to

"Richard Foote" <richar...@bigpond.com> wrote:

> Now let's see how Oracle has "documented" the additional leaf node in the
> branch node:

...

> row#0[8017] dba: 37750253=0x24005ed *** But we now have a new leaf
> pointer inserted between the previous two. This points to the new leaf
> node that has been added as a result of the leaf node split ***
> col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73 *** values
> greater or equal to DIAMOND DOGS go here ***
> col 1; len 6; (6): 02 40 05 e8 00 88

Richards wonderfull explanation prompted me to ponder a question. "Diamond
Dogs" was embedded in the branch block in it's entirety. I thought I had
read somewhere that only the minimum distinguishing length was stuck in
branch blocks, so I was wondering if that was true. Surely all twelve
characters aren't needed to distinguish "Diamond Dogs" from something else.
But then I realized that the index is very non-unique. And indeed, if you
look at the last row of the next block dump below, and first row of the one
after that, you see "Diamond Dogs" appears on both sides of the split.

So it appears that when Oracle finds an equal key in a branch node, it has
to follow both branches (or more likely, follow one branch and then
use the linked list to get where the other branch was pointing to.)

To attempt to distinguish Diamond Dogs from Diamond Dogs, of course it
takes all 12 letters. But if the split had happened to fall between David
Bowie and Diamond Dogs, would the branch block have recorded "Diamond
Dogs", or would it just use "Di" as being sufficiently distinctive?

...

> Let's see how our poor "splitted" leaf node looks like:

> row#173[7992] flag: -----, lock: 0
> col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73
> col 1; len 6; (6): 02 40 05 e8 00 74

Diamond Dogs here, too.

> And finally, an extract of the new leaf node dump and how it "slots" into
> the index structure:

...

> row#0[4566] flag: -----, lock: 0
> col 0; len 12; (12): 44 69 61 6d 6f 6e 64 20 44 6f 67 73 *** and as
> recorded in the branch node, entires in this now second leaf node start
> with DIAMOND DOGS values ***

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service New Rate! $9.95/Month 50GB

DA Morgan

unread,

Feb 3, 2003, 10:23:14 PM2/3/03

to

Pete Sharman wrote:

Us academics hang on every pointed jab. <g>

Daniel Morgan

Howard J. Rogers

unread,

Feb 4, 2003, 8:47:24 AM2/4/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message
news:3E3EEBD1...@exesolutions.com...

> Pete Sharman wrote:
> >
> > Totally agree. I'd even take it one step further and ask does it really
matter
> > anyway (apart from being interesting theoretical stuff of course!)?
Still, I
> > must say the "discussion" between Don and HJR has put a smile on my face
each
> > time I read a new bit!
> >
> > Pete

Not much of a "discussion". He references an 8-year old paper, I point out
its inaccuracies, he gets personal about it, Richard Foote chips in to point
out he's wrong, Don Burleson goes mysteriously quiet.

> Us academics hang on every pointed jab. <g>
>
> Daniel Morgan

I'd actually be sad if it came to that. Don wrote a lot of inaccurate stuff,
and I pointed it out, albeit 'pungently'. He immediately got down and dirty
with the personal stuff. There were no personalised pointed jabs coming from
this direction. It was his writing that was woefully naiive and inaccurate
and out-of-date (and this isn't the only article of his that suffers in that
way: you should read his stuff on hit ratios).

Still, I've a mind to get my hands on some 8-year-old dodgily-obtained
source code, and go into the consultancy business. It would appear to be a
sure-fire route to success.

There: I've got personal now!

Regards
HJR

Howard J. Rogers

unread,

Feb 4, 2003, 8:49:00 AM2/4/03

to

Oracle does branch-block trimming wherever possible, so your speculation is
accurate.

Regards
HJR

Howard J. Rogers

unread,

Feb 4, 2003, 8:49:40 AM2/4/03

to

Yet another example of why block dumps are an essential tool. Nice one,
Richard.
HJR

Noons

unread,

Feb 4, 2003, 12:20:34 PM2/4/03

to

Pete Sharman <peter....@oracle.com> wrote in
news:b1m6e...@drn.newsguy.com and I quote:

> OK, we have to put a stop to this immediately. The thought of Mogens
> Norgaard running around like King Arthur (which is where this analogy
> would get very quickly) just ain't pretty at all. :)

Gawd! The imagery!...... :)

> In any case, there are only a couple of people on the OakTable that I
> know of with access to the source code, and it's more than their job is
> worth to post details of it (or how it works) here. Sorry.

Thought so.

> Totally agree. I'd even take it one step further and ask does it really
> matter anyway (apart from being interesting theoretical stuff of
> course!)? Still, I must say the "discussion" between Don and HJR has
> put a smile on my face each time I read a new bit!
>

IMHO, it doesn't. We're getting into the sort of 10% of extra
information that consumes 90% of our spare time. Diminishihg
returns, yadda yadda. Not worth it unless there is a genuine
interest in extracting the last bit of de-assembly via block
dumps.

Besides, the Oracle dumps are not true dumps: they are
interpreted. Which means we see off a block what Oracle
wants us to see, period. Not exactly the same as a good
ole A-bend hex dump...

Mind you, for my purposes, I'd rather have the Oracle dumps
than the hex stuff, but there are tastes for everything!

Has anyone here tried to dump the first extent off a LMT,
table with just one block of data, 64K uniform alloc, 4096
block size? Ie, 16 blocks/extent? Now, that was a surprise
for me: Oracle takes an apparently weird choice as to which
block gets data first! Weird, I tell you. :)

Now do the same for an index on that same table, same tablespace.
One block of index only, before it splits. Weirder.

9r2, of course.

Noons

unread,

Feb 4, 2003, 12:35:00 PM2/4/03

to

"Howard J. Rogers" <howard...@yahoo.com.au> wrote in
news:VZK%9.40107$jM5.1...@newsfeeds.bigpond.com and I quote:

>
> Still, I've a mind to get my hands on some 8-year-old dodgily-obtained
> source code, and go into the consultancy business. It would appear to be
> a sure-fire route to success.

I think he does mostly books. Didn't know it was consultancy
as well?

Mind you, this points out the eternal duality:

On one hand, one may know the theory behind the product
and then 8 years later the knowledge is still very actual,
although nowhere as detailed down to the "dirty" bits.

On the other, one knows all the dirty details and bits
of one single version and 8 years later it's still partly
correct but wide-open to "bombing" by the next version-aware
person.

I know which one I'd go for. And it has a LOT to do with
why I hate the OCP in its current format...

> There: I've got personal now!

C'mon! You're not such a bad person! ;)

Niall Litchfield

unread,

Feb 4, 2003, 1:02:46 PM2/4/03

to

"Noons" <nso...@optusnet.com.au.nospam> wrote in message
news:Xns9318EB0C520...@210.49.20.254...

> Pete Sharman <peter....@oracle.com> wrote in
> news:b1m6e...@drn.newsguy.com and I quote:

> Has anyone here tried to dump the first extent off a LMT,
> table with just one block of data, 64K uniform alloc, 4096
> block size? Ie, 16 blocks/extent? Now, that was a surprise
> for me: Oracle takes an apparently weird choice as to which
> block gets data first! Weird, I tell you. :)

Take a look at http://www.oaktable.net/fullArticle.jsp?id=4 which I think
deals with what you are on about about halfway down.

--
Niall Litchfield
Oracle DBA
Audit Commission UK

Jonathan Lewis

unread,

Feb 4, 2003, 1:20:56 PM2/4/03

to

Not entirely weird.

It sounds like you have gone in with the
semi-default 'ASS management' feature,
which is not a feature of the LMT as such,
but a feature to replace freelist management.

Each segment has blocks reserved to hold
maps of used blocks, with an indication of
space available per block. Oracle then uses
the process ID to decide which space-management
block you should use to find some space, and which
block referenced in the space management block you
should first try. Consequently, a single row in a
newly created 16block table COULD end up in the
last block of the table.

Try the experiment again, but run up several sessions,
keep them open, and repeat the experiment in each
session in turn. I think in your case you will see
either the first 8 or second 8 blocks formatted (if you
don't get something so weird as an unformatted gap,
go for a 32 block table), and the block used then
varying from session to session.

In the case of the index - the root block is traditionally
the block after the segment header block - so I think a
special case is followed to format that block when
the index is created, whatever the process id of the
session that creates the index.

--
Regards

Jonathan Lewis
http://www.jlcomp.demon.co.uk

Coming soon a new one-day tutorial:
Cost Based Optimisation
(see http://www.jlcomp.demon.co.uk/tutorial.html )

____UK_______March 19th
____USA_(FL)_May 2nd

Next Seminar dates:
(see http://www.jlcomp.demon.co.uk/seminar.html )

____USA_(CA, TX)_August

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Noons wrote in message ...

Paul Brewer

unread,

Feb 4, 2003, 6:29:45 PM2/4/03

to

"Peter J. Holzer" <hjp-u...@hjp.at> wrote in message
news:slrnb3suk4.a...@teal.hjp.at...

> Illustrative it was, thanks. But if I read a few more of your articles
> I'll start to think of Bowie as "the guy with the hexadecimal album
> titles" :-)
>

Bihexual, I rather gathered?

Regards,
Paul

Paul Brewer

unread,

Feb 4, 2003, 7:33:44 PM2/4/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message
news:3E3EEBD1...@exesolutions.com...

>
> Us academics hang on every pointed jab. <g>
>

Can't resist, sorry:

Is that "U.S. academics", or should it be "We academics"?

:-)

Regards,
Paul

Pete Sharman

unread,

Feb 4, 2003, 9:59:40 PM2/4/03

to

In article <3e402...@mk-nntp-1.news.uk.worldonline.com>, "Paul says...

And nor could I. He must have meant "we academics" because "U.S. academics" is
an oxymoron. (ducks for cover!) :)

Noons

unread,

Feb 5, 2003, 11:37:54 AM2/5/03

to

"Niall Litchfield" <n-litc...@audit-commission.gov.uk> wrote in
news:3e3fb9f6$0$248$ed9e...@reading.news.pipex.net and I quote:

>
> Take a look at http://www.oaktable.net/fullArticle.jsp?id=4 which I think
> deals with what you are on about about halfway down.

No, not that. Jonathan has it in his reply.

Noons

unread,

Feb 5, 2003, 11:49:57 AM2/5/03

to

"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in

news:b1oene$qt0$1$8302...@news.demon.co.uk and I quote:

>
> Not entirely weird.

Ya reckon? :D

>
> It sounds like you have gone in with the
> semi-default 'ASS management' feature,
> which is not a feature of the LMT as such,
> but a feature to replace freelist management.

Straight forward "extent management uniform size 64K".

> space available per block. Oracle then uses
> the process ID to decide which space-management
> block you should use to find some space, and which

the "process ID"? Why on Earth?
I mean: fine, if it is a matter of finding a free extent,
it can use whatever algorithm. But once a free extent
of 16 blocks is allocated, why would it then have to
allocate anywhere else other than after the segment header?

> block referenced in the space management block you
> should first try. Consequently, a single row in a
> newly created 16block table COULD end up in the
> last block of the table.

Similar to my case. I just created a table with 4 rows,
in tablespace USERS. It ended up in an extent of 16 blocks
(as expected), with a single data block in block 9 of the
16 blocks (totally flabbergasted by this one!).
50:50 split?

> session in turn. I think in your case you will see
> either the first 8 or second 8 blocks formatted (if you
> don't get something so weird as an unformatted gap,
> go for a 32 block table), and the block used then
> varying from session to session.

Is this to avoid the problem solved before by the FREELIST
stuff? I mean, a "hash" using the process ID to decide
which block in the extent the session will write to?

>
> In the case of the index - the root block is traditionally
> the block after the segment header block - so I think a
> special case is followed to format that block when
> the index is created, whatever the process id of the
> session that creates the index.

Yup, it goes to the first block after segment header.

Noons

unread,

Feb 5, 2003, 11:51:55 AM2/5/03

to

Pete Sharman <peter....@oracle.com> wrote in

news:b1pd4...@drn.newsguy.com and I quote:

> And nor could I. He must have meant "we academics" because "U.S.
> academics" is an oxymoron. (ducks for cover!) :)
>

<sound of breaking glass>
LOL!

Howard J. Rogers

unread,

Feb 5, 2003, 11:55:02 AM2/5/03

to

"Niall Litchfield" <n-litc...@audit-commission.gov.uk> wrote in message
news:3e3fb9f6$0$248$ed9e...@reading.news.pipex.net...

It's a good article. But it contains one minor error. Locally-managed system
tablespaces are NOT the default in 9iR2. They are if you use the dbca to
create your database (the error I fell into). But issue manual 'create
database' commands, and DMT for system is still the default.

Regards
HJR

Noons

unread,

Feb 5, 2003, 12:05:10 PM2/5/03

to

Noons <nso...@optusnet.com.au.nospam> wrote in
news:Xns9319E5D9D8A...@210.49.20.254 and I quote:

> I mean: fine, if it is a matter of finding a free extent,
> it can use whatever algorithm. But once a free extent
> of 16 blocks is allocated, why would it then have to
> allocate anywhere else other than after the segment header?

Sorry, just re-read this and it doesn't make the point I wanted.
This is more to it:

It's ok to use an algorithm for selecting an extent for
the data segment out of the pool of free extents.

But why use yet another algorithm for allocating data
blocks to the table *within* an already allocated extent?

As I said later on: FREELISTS?

Richard Foote

unread,

Feb 5, 2003, 1:33:02 PM2/5/03

to

Hi Nuno,

The reason for this "not entirely weird" behaviour is to avoid contention
and the buffer busy waits that could result in a heavily inserted, poorly
freelisted segment. By effectively making the block accessed for inserts
"random" (providing there is sufficient space), you reduce this problem.
Note that multiple freelists kinda have this effect as well (that is
increased numbers of empty blocks below the HWM)

The price you pay is the "nth" block being selected rather than the "next"
block.

BTW, I have actually found ASSM tables to have as good (and in some cases )
better performance with regard to FTS than with non ASSM tablespaces. These
are well populated tables, each with something like 500-600 64K extents.

Cheers

Richard

"Noons" <nso...@optusnet.com.au.nospam> wrote in message

news:Xns9319E5D9D8A...@210.49.20.254...

Richard Foote

unread,

Feb 5, 2003, 1:42:22 PM2/5/03

to

"Howard J. Rogers" <howard...@yahoo.com.au> wrote in message
news:NP60a.41233$jM5.1...@newsfeeds.bigpond.com...

>
> "Niall Litchfield" <n-litc...@audit-commission.gov.uk> wrote in message
> news:3e3fb9f6$0$248$ed9e...@reading.news.pipex.net...
> > "Noons" <nso...@optusnet.com.au.nospam> wrote in message
> > news:Xns9318EB0C520...@210.49.20.254...
> > > Pete Sharman <peter....@oracle.com> wrote in
> > > news:b1m6e...@drn.newsguy.com and I quote:
> > > Has anyone here tried to dump the first extent off a LMT,
> > > table with just one block of data, 64K uniform alloc, 4096
> > > block size? Ie, 16 blocks/extent? Now, that was a surprise
> > > for me: Oracle takes an apparently weird choice as to which
> > > block gets data first! Weird, I tell you. :)
> >
> > Take a look at http://www.oaktable.net/fullArticle.jsp?id=4 which I
think
> > deals with what you are on about about halfway down.
>
> It's a good article. But it contains one minor error. Locally-managed
system
> tablespaces are NOT the default in 9iR2.

Hi Howard,

Ummmm, yes they are ? If you do not specify an extent management clause, the
default is LMT, regardless of the System TS.

>They are if you use the dbca to
> create your database (the error I fell into). But issue manual 'create
> database' commands, and DMT for system is still the default.

This is correct. The default TS for *System* is DMT, but regardless, the
default for all other tablespaces is LMT.

Cheers

Richard

Howard J. Rogers

unread,

Feb 5, 2003, 7:09:32 PM2/5/03

to

On Wed, 05 Feb 2003 23:42:22 +1000, Richard Foote wrote:

> "Howard J. Rogers" <howard...@yahoo.com.au> wrote in message
> news:NP60a.41233$jM5.1...@newsfeeds.bigpond.com...
>>
>> "Niall Litchfield" <n-litc...@audit-commission.gov.uk> wrote in message
>> news:3e3fb9f6$0$248$ed9e...@reading.news.pipex.net...
>> > "Noons" <nso...@optusnet.com.au.nospam> wrote in message
>> > news:Xns9318EB0C520...@210.49.20.254...
>> > > Pete Sharman <peter....@oracle.com> wrote in
>> > > news:b1m6e...@drn.newsguy.com and I quote:
>> > > Has anyone here tried to dump the first extent off a LMT,
>> > > table with just one block of data, 64K uniform alloc, 4096
>> > > block size? Ie, 16 blocks/extent? Now, that was a surprise
>> > > for me: Oracle takes an apparently weird choice as to which
>> > > block gets data first! Weird, I tell you. :)
>> >
>> > Take a look at http://www.oaktable.net/fullArticle.jsp?id=4 which I
> think
>> > deals with what you are on about about halfway down.
>>
>> It's a good article. But it contains one minor error. Locally-managed
> system
>> tablespaces are NOT the default in 9iR2.
>
> Hi Howard,
>
> Ummmm, yes they are ? If you do not specify an extent management clause, the
> default is LMT, regardless of the System TS.
>

Have another read Richard! 'LMT SYSTEM tablespaces are not the default'.

>>They are if you use the dbca to
>> create your database (the error I fell into). But issue manual 'create
>> database' commands, and DMT for system is still the default.
>
> This is correct.

Which is all I was saying!

> The default TS for *System* is DMT,

Ok. Compare these two sentences:

"The default TS for System is DMT"
"LMT system tablespaces are not the default"

Now. What's the logical difference between them?

Good.

>but regardless, the
> default for all other tablespaces is LMT.
>

I didn't say otherwise.

HJR

> Cheers
>
> Richard

Richard Foote

unread,

Feb 6, 2003, 12:17:19 AM2/6/03

to

"Howard J. Rogers" <howard...@yahoo.com.au> wrote in message news:<pan.2003.02.05....@yahoo.com.au>...

Hi Howard,

I totally missed the first word "SYSTEM" when you said "LMT SYSTEM
tablespaces are not the default" which kinda changes its meaning
somewhat.

There's really only one thing I can say in a situation like this and
that's to say "OOOpppppsssss".

And sorry ;)

Richard

Noons

unread,

Feb 6, 2003, 1:02:57 PM2/6/03

to

"Richard Foote" <richar...@bigpond.com> wrote in
news:Qd70a.41251$jM5.1...@newsfeeds.bigpond.com and I quote:

> freelisted segment. By effectively making the block accessed for inserts
> "random" (providing there is sufficient space), you reduce this problem.
> Note that multiple freelists kinda have this effect as well (that is
> increased numbers of empty blocks below the HWM)

Cool, I can live with this. Like it a lot, in fact.

> The price you pay is the "nth" block being selected rather than the "next"
> block.

Cheap.

>
> BTW, I have actually found ASSM tables to have as good (and in some cases )
> better performance with regard to FTS than with non ASSM tablespaces. These
> are well populated tables, each with something like 500-600 64K extents.
>

Is ASSM the default now with 9r2? I thought it wasn't,
but I may well be wrong.

Howard J. Rogers

unread,

Feb 6, 2003, 3:08:07 PM2/6/03

to

It isn't. And a good thing too.

Regards
HJR

"Noons" <nso...@optusnet.com.au.nospam> wrote in message

news:Xns931AF238EF7...@210.49.20.254...