
Oracle is hanging? Don’t forget hanganalyze and systemstate!


sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump ashdumpseconds 30
oradebug systemstate 266
oradebug tracefile_name

Your Oracle database – production DB, of course – is hanging. All users are blocked. You quickly check the obvious suspects (archivelog destination full, system swapping, etc.) but it's something else. Even you, the Oracle DBA, cannot do anything: any select is hanging. And maybe you're not even able to connect with a simple 'sqlplus / as sysdba'.

What do you do? There may be several ways to investigate deeper (strace or truss, for example) but it will take time. And your boss is clear: the only important thing is to get the production running again as soon as possible. No time to investigate. SHUTDOWN ABORT and restart.

Ok, but now that everything is back to normal, your boss's rules have changed: the system was down for 15 minutes. We have to provide an explanation. Root Cause Analysis.

But how will you investigate now? You have restarted everything, so all V$ information is gone. You have Diagnostic Pack? But the system was hung: no ASH information went to disk. You can open an SR, but what information will you give?

Hang Analyze

The next time it happens, you need to have a way to get some information that can be analyzed post mortem. But you need to be able to do that very quickly, just before your boss shouts 'shutdown abort now'. And this is why I've put it at the beginning of the post, so that you can find it quickly if you need it…

That takes only a few seconds and generates all the necessary post-mortem information. If you can take one more minute, you will even be able to read the first lines of the hanganalyze output, identify a true hanging situation, and maybe just kill the root of the blocking sessions instead of a merciless restart.

In order to show you the kind of output you get, I’ve run a few jobs locking the same resources (TM locks) – which is not a true hanging situation because the blocking session can resolve the situation.
 
Here are the first lines from the oradebug hanganalyze:
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'PL/SQL lock timer'

The systemstate has all the information about system objects (sessions, processes, ...), but you have to navigate into it in order to understand the wait chain. In my example:

SO: 0x914ada70, type: 4, owner: 0x91990478, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x91990478, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 23 ser: 7 trans: 0x8ea8e3e8, creator: 0x91990478
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7929
machine: vboxora12c program: oracle@vboxora12c (J002)
Current Wait Stack:
0: waiting for 'enq: TM - contention'
name|mode=0x544d0003, object #=0x1737c, table/partition=0x0
wait_id=10 seq_num=11 snap_id=1
wait times: snap=15.991474 sec, exc=15.991474 sec, total=15.991474 sec
wait times: max=40.000000 sec, heur=15.991474 sec
wait counts: calls=6 os=6
in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 254, ser: 5
Dumping final blocker: inst: 1, sid: 256, ser: 5
This is a session that is waiting, and we have the final blocker: inst: 1, sid: 256, ser: 5

Then we get to the final blocker by searching the sid: 256:
SO: 0x9168a408, type: 4, owner: 0x9198d058, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x9198d058, name=session, file=ksu.h LINE:13580, pg=0 conuid=0
(session) sid: 256 ser: 5 trans: 0x8ea6b618, creator: 0x9198d058
...
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 7925
machine: vboxora12c program: oracle@vboxora12c (J000)
Current Wait Stack:
0: waiting for 'PL/SQL lock timer'
duration=0x0, =0x0, =0x0
wait_id=0 seq_num=1 snap_id=1
wait times: snap=25.936165 sec, exc=25.936165 sec, total=25.936165 sec
wait times: max=50.000000 sec, heur=25.936165 sec
wait counts: calls=1 os=9
in_wait=1 iflags=0x5a0
There are 5 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 254, ser: 5
wait event: 'enq: TM - contention'
p1: 'name|mode'=0x544d0004
p2: 'object #'=0x1737c
p3: 'table/partition'=0x0
row_wait_obj#: 95100, block#: 0, row#: 0, file# 0
min_blocked_time: 19 secs, waiter_cache_ver: 44

Analysing the System State takes much longer than the hanganalyze, but it has more information.

V$WAIT_CHAINS

When the blocking situation is not so desperate and you just want to see what is blocking, the hanganalyze information is also available online in V$WAIT_CHAINS. The advantage over ASH is that you see all processes (not only foreground ones, not only active ones).
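A minimal query to look at it could be (a sketch only; the view has many more columns than the ones shown below):

select chain_id, chain_is_cycle, chain_signature,
       instance, osid, pid, sid, blocker_is_valid
from v$wait_chains;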

Here is an example:

CHAIN_ID CHAIN CHAIN_SIGNATURE                                                      INSTANCE OSID PID SID BLOCK
-------- ----- -------------------------------------------------------------------- -------- ---- --- --- -----
       1 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7929  42  23 TRUE
       1 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7927  41 254 TRUE
       1 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7925  39 256 FALSE
       2 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7933  46  25 TRUE
       3 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7931  45 260 TRUE
       4 FALSE 'PL/SQL lock timer'<='enq: TM - contention'<='enq: TM - contention'         1 7935  47 262 TRUE

ASH Dump

There is something else that you can get if you have Diagnostic Pack. The ASH information can be dumped to trace file even if it cannot be collected in the database.

oradebug dump ashdumpseconds 30

That will gather ASH for the latest 30 seconds, and the trace file will even contain the SQL*Loader control file needed to load it into an ASH-like table.
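One possible way to exploit such a dump afterwards is with SQL*Loader (a sketch only: the exact control file and target table definition are provided inside the trace file itself, and the file names below are placeholders):

# save the control file section of the ashdump trace as ashdump.ctl,
# create the staging table it references, then load the dump:
sqlldr userid=/ control=ashdump.ctl log=ashdump.log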

sqlplus -prelim

But what can you do if you can’t even connect / as sysdba ?
There is the ‘preliminary connection’ that does not create a session:
sqlplus -prelim / as sysdba

With that, you will be able to get a systemstate and an ashdump.
But unfortunately, since 11.2.0.2, you cannot get a hanganalyze:

ERROR: Can not perform hang analysis dump without a process state object and a session state object.

But there is a workaround for that (from Tanel Poder's blog): use a session that is already connected.

For example, I use the DIAG background process (it's better not to use vital processes for that):
SQL> oradebug setorapname diag
Oracle pid: 8, Unix process pid: 7805, image: oracle@vboxora12c (DIAG)
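Once attached to that process, the usual sequence from the beginning of this post works again (a sketch):

SQL> oradebug unlimit
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name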

Core message

Even in a hurry,

  • Always check a hanganalyze to understand the problem.
  • Always get a systemstate before a shutdown abort.

and you will have information to investigate later, or to provide to Oracle Support.

 



Troubleshooting: ORA-29283 when oracle is member of a group with read/write privileges


Recently, I worked on an Oracle database migration for a customer and I faced a strange behavior when accessing a file stored on the file system of the database server. The database had been migrated to a new server and consolidated with existing databases. Several jobs run in the database and need to access the file system to read or write files. A particular file is received daily from a remote job over FTP. This file is owned by the "dc_ftp" user in /home/dc_ftp.

First, this is the directory where the file is located:

SQL> select directory_name, directory_path from dba_directories where directory_name='MYDIR';
DIRECTORY_NAME            DIRECTORY_PATH
------------------------- -------------------------------------------------------
MYDIR                     /home/dc_ftp/out

The file is named myfile.txt and exists in /home/dc_ftp/out:

dc_ftp@srvora01$ ls /home/dc_ftp/out/
myfile.txt

During functional tests, I was able to successfully open a file in this directory from the database:

SQL> declare
  p_file utl_file.file_type;
begin
  p_file := utl_file.fopen ('MYDIR', 'myfile.txt', 'w');
end;
/
 
PL/SQL procedure successfully completed.

But members of the application team, connected from SQL Developer, were not able to read this file:

SQL> declare
  p_file utl_file.file_type;
begin
  p_file := utl_file.fopen ('MYDIR', 'myfile.txt', 'r');
end;
/  2    3    4    5    6
declare
*
ERROR at line 1:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at line 4

My first hypotheses, such as a misconfiguration in the SQL Developer tool, wrong entries in the TNS names, or wrong permissions on the listener binaries, were quickly eliminated.

Both instance and host are correct, so the TNS entry seems to be OK:

SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME    HOST_NAME
---------------- ----------------------------------------------------------------
DB11G            srvora01

The listener is owned by oracle, as expected:

oracle@srvora01$  ps -ef | grep tnslsnr
oracle    9684     1  0 16:33 ?        00:00:01 /u00/app/oracle/product/11.2.0/db_3_0/bin/tnslsnr LISTENER -inherit

Finally, the problem occurred with or without SQL Developer.

The next step was to check permissions. First at the DB level:

SQL> select grantee, privilege, table_name from dba_tab_privs where table_name='MYDIR';
GRANTEE                        PRIVILEGE                                TABLE_NAME
------------------------------ ---------------------------------------- ---------------
SYSTEM                         WRITE                                    MYDIR
SYSTEM                         READ                                     MYDIR

 

Then at the operating system level:

oracle@srvora01$ ls -altr /home/dc_ftp/out
total 12
-rw-r-----. 1 dc_ftp dc_ftp   16 Feb 24 16:48 myfile.txt
drwxr-x---. 2 dc_ftp dc_ftp 4096 Feb 24 16:48 .
drwxr-xr-x. 5 dc_ftp dc_ftp 4096 Feb 24 16:54 ..

We can see that the files and the directory are not directly owned by oracle, but oracle is a member of the dc_ftp group. It gets the read privilege on the file through the dc_ftp group:

oracle @ srvora01$ id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54325(oper),54330(dc_ftp)

Finally, the oracle user can clearly access the file from an operating system point of view, since it can run the "cat" command on the file:

oracle@srvora01$ cat /home/dc_ftp/out/myfile.txt
test
testéàè

Now let's see what happens when Oracle tries to access the directory from the session.

SQL> set define #
SQL> column spid new_value unix_pid
SQL> select spid from v$process p join v$session s on p.addr=s.paddr and s.sid=sys_context('userenv','sid');
SPID
------------------------
10914

 

SQL> host strace -e trace=open -p #unix_pid & echo $! > .tmp.pid
Process 10914 attached - interrupt to quit
 
SQL> declare
  p_file utl_file.file_type;
begin
  p_file := utl_file.fopen ('MYDIR', 'myfile2.txt', 'w');
end;
/
  2    3    4    5    6  
  
declare
*
ERROR at line 1:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
ORA-06512: at line 4
 
open("/home/dc_ftp/out/myfile2.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 EACCES (Permission denied)

The previous message clearly indicates that it is a permission issue. Oracle has no privilege to access the directory to read or write…

We can verify whether it works when setting the READ permission on the file myfile.txt for others (i.e. 644):

root@srvora01# chmod 644 /home/dc_ftp/out/myfile.txt
root@srvora01# ls -altr /home/dc_ftp/out
total 12
-rw-r--r--. 1 dc_ftp dc_ftp   16 Feb 24 16:48 myfile.txt
drwxr-xr-x. 2 dc_ftp dc_ftp 4096 Feb 24 16:48 .
drwxr-xr-x. 5 dc_ftp dc_ftp 4096 Feb 24 16:54 ..

 

oracle@srvora01$ sqlplus system/****@DB11G
 
SQL> declare
  p_file utl_file.file_type;
begin
  p_file := utl_file.fopen ('MYDIR', 'myfile.txt', 'r');
end;
/  2    3    4    5    6 
 
PL/SQL procedure successfully completed.

This is the workaround.

But what if the file must not be readable by the rest of the world? The file is generated with 640 permissions for a good reason here. So I had to fix the issue for good.

Fortunately, I came across a My Oracle Support note which helped me find the solution to my problem. According to note ID 832323.1, if the file or directory is accessed through a group of which oracle is a member (here the dc_ftp group has read privileges on the file), adding oracle to that group while the database is running may cause the error ORA-29283 to occur.

Fine! But the database was restarted several times during the migration process, because of some initialization parameters to apply. I was totally sure that the database had been restarted after the oracle user had been added to the dc_ftp group! Moreover, I got no error when connected locally on the server. The error only occurred when connected remotely through SQL*Net, using sqlplus system/****@DB11G for instance, through the listener…

That's why I had an idea. If the database processes start with the oracle user's permissions, and if the database needs to be restarted for such changes to take effect, then the listener should be affected too! And the listener was never restarted during the migration: it was only reloaded to reflect the changes for the new instance. This could explain why only sessions opened through the listener were affected…
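A quick way to verify this is to compare the supplementary groups of the running listener process with those of a freshly started process (a sketch, reusing the listener PID 9684 from the ps output above):

oracle@srvora01$ grep Groups /proc/9684/status
oracle@srvora01$ id -G oracle

If the dc_ftp group ID (54330) does not appear in the first output, the listener was started before oracle was added to the group, and every session it spawns inherits that outdated group list.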

I restarted the listener and… all problems went away!!

oracle@srvora01$ lsnrctl stop listener
oracle@srvora01$ lsnrctl start listener
oracle@srvora01$ sqlplus system/*****@DB11G
 
SQL> declare
  p_file utl_file.file_type;
begin
  p_file := utl_file.fopen ('MYDIR', 'myfile.txt', 'r');
end;
/  2    3    4    5    6
 
PL/SQL procedure successfully completed.

To summarize, do not forget that the listener can also be impacted when changing permissions while it is running. It would be better to restart it instead of a simple reload when you make these kinds of modifications.

 


Investigating Oracle lock issues with event 10704


Did you ever encounter unexplained Oracle lock issues? They may be coming from unindexed foreign keys (which is worse in 11g). It's not easy to monitor. Of course you can check Oracle locks in V$LOCKED_OBJECT, but that is a solution only for locks that remain. Some Oracle locks are held only for a short duration. How do you check which lock is acquired by a statement?

Event 10704 is the solution. I'll show some examples in order to explain which lines from the trace are interesting, and how to interpret them.

In this post, I'll illustrate how to use event 10704 to understand locking on referential integrity (the well-known index on foreign key issue) in 12c.

I set the trace (10704 level 3 for the locks, 10046 to see the statements):

SQL> alter session set events='10046 trace name context forever, level 1 : 10704 trace name context forever, level 3' tracefile_identifier='ForeignKey';
Session altered.

I delete from the DEPT table:

SQL> DELETE FROM SCOTT.DEPT WHERE DEPTNO=0;
0 rows deleted.

and then terminate the transaction:

SQL> ROLLBACK;
Rollback complete.

and stop the trace:

SQL> alter session set events='10046 trace name context off : 10704 trace name context off ';
Session altered.

Now let’s get the trace file name:

SQL> column tracefile new_value tracefile
SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/demo/DEMO/trace/DEMO_ora_28042_ForeignKey.trc

and grep the interesting lines from the trace file:

SQL> host grep -A 1 -E 'ksqgtl [*]{3}|ksqrcl: [A-Z]|ksqcnv: [A-Z]|XCTEND|PARSE ERROR|PARSING' &tracefile

The output is:

--
PARSING IN CURSOR #140064855052048 len=37 dep=0 uid=0 oct=7 lid=0 tim=294484898588 hv=3450586461 ad='822782c8' sqlid='c1fnpd76urjax'
DELETE FROM SCOTT.DEPT WHERE DEPTNO=0
--
ksqgtl *** TM-00017EA8-00000000-00000000-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x8eb5bcf8, ktcdix=2147483647, topxcb=0x8eb5bcf8
--
ksqgtl *** TM-00017EAA-00000000-00000000-00000000 mode=4 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x8eb5bcf8, ktcdix=2147483647, topxcb=0x8eb5bcf8
--
ksqrcl: TM-00017EAA-00000000-00000000-00000000
ksqcmi: TM-00017EAA-00000000-00000000-00000000 mode=0 timeout=0
--
PARSING IN CURSOR #140064855052048 len=8 dep=0 uid=0 oct=45 lid=0 tim=294484900906 hv=2761672982 ad='0' sqlid='8sst43uk9rk8q'
ROLLBACK
--
XCTEND rlbk=1, rd_only=1, tim=294484900964
ksqrcl: TM-00017EA8-00000000-00000000-00000000
ksqcmi: TM-00017EA8-00000000-00000000-00000000 mode=0 timeout=0
--

In order to interpret it, we need one more piece of information – the OBJECT_ID in hexadecimal:

SQL> column object_name format a20
SQL> column object_type format a20
SQL> select object_id , to_char(object_id,'0XXXXXXX') , object_name,object_type from all_objects where owner='SCOTT' order by 2;

 OBJECT_ID  TO_CHAR(O OBJECT_NAME          OBJECT_TYPE
---------- --------- -------------------- --------------------
 97960      00017EA8  DEPT                 TABLE
 97961      00017EA9  PK_DEPT              INDEX
 97962      00017EAA  EMP                  TABLE
 97963      00017EAB  PK_EMP               INDEX
 97964      00017EAC  BONUS                TABLE
 97965      00017EAD  SALGRADE             TABLE

7 rows selected.

Great. Now let’s interpret that.
You see the DELETE statement in the trace file (written by event 10046 which is the sql_trace). Then we have:

  ksqgtl *** TM-00017EA8-00000000-00000000-00000000 mode=3

…which means we get a lock (ksqgtl) on table (lock_type=TM) SCOTT.DEPT (object_id=17EA8) in Row-X mode (mode=3).
This is expected as we have the intention to delete rows, thus we request Row-X lock on the table.
And next to it you see:

  TM-00017EAA-00000000-00000000-00000000 mode=4

which is a table lock on SCOTT.EMP (object_id=17EAA) in Share mode (mode=4) – the child table lock that we have seen since 8.1.7 when the foreign key is not indexed.
That lock is released immediately after the delete (which has been the behaviour since 9.2):

ksqrcl: TM-00017EAA-00000000-00000000-00000000

ksqrcl is the function that releases a lock. That is done before the end of the transaction (XCTEND from sql_trace).
And the Row-X on DEPT is released once the transaction is ended:

ksqrcl: TM-00017EA8-00000000-00000000-00000000
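Incidentally, a quick way to list foreign keys that have no supporting index is a dictionary query like the following (a simplified sketch: it only checks the leading column of each index):

select c.owner, c.table_name, cc.column_name, c.constraint_name
from   dba_constraints c
       join dba_cons_columns cc
         on cc.owner = c.owner and cc.constraint_name = c.constraint_name
where  c.constraint_type = 'R'
  and  not exists (select 1
                   from dba_ind_columns ic
                   where ic.table_owner = c.owner
                     and ic.table_name = c.table_name
                     and ic.column_name = cc.column_name
                     and ic.column_position = 1);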

If we create an index in order to avoid the Share mode lock:

SQL> CREATE INDEX SCOTT.FK_DEPTNO on SCOTT.EMP(DEPTNO);
Index created.
SQL> DELETE FROM SCOTT.DEPT WHERE DEPTNO=0;
0 rows deleted.
SQL> ROLLBACK;
Rollback complete.

…then here is what we get from the 10704 trace:

ksqgtl *** TM-00017EAA-00000000-00000000-00000000 mode=4 flags=0x401 timeout=0 ***

This is the Share mode lock on the table when creating the index.
This is what a delete on the parent generates:

ksqgtl *** TM-00017EA8-00000000-00000000-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl *** TM-00017EAA-00000000-00000000-00000000 mode=3 flags=0x401 timeout=21474836 ***
XCTEND rlbk=1, rd_only=1, tim=294485532138
ksqrcl: TM-00017EAA-00000000-00000000-00000000
ksqrcl: TM-00017EA8-00000000-00000000-00000000

…which are only Row-X (mode=3) locks, but are released only at the end of the transaction.

And besides requesting a lock (ksqgtl) and releasing a lock (ksqrcl), the third interesting function is the one used when a lock is converted (ksqcnv) from one mode to another.

Let's see an example where we first delete rows from the child (without the index on the foreign key):

SQL> DELETE FROM SCOTT.EMP WHERE DEPTNO=0
ksqgtl *** TM-00017EA8-00000000-00000000-00000000 mode=2 flags=0x401 timeout=21474836 ***
ksqgtl *** TM-00017EAA-00000000-00000000-00000000 mode=3 flags=0x401 timeout=21474836 ***

DML on EMP (object_id=17EAA) requests a Row-X and, because of the referential integrity, it also requests a Row-S on the other side, DEPT (object_id=17EA8).

Note that I did this example on 12c. That behaviour has changed in each Oracle version (and you can use event 10704 on your version in order to check yours). Basically it was Row-S before 11g. Then 11g changed it to Row-X, which introduced a lot of issues (see here and here for example). Fortunately, 12c brought it back to Row-S for two situations: insert into parent and delete from child.

SQL> DELETE FROM SCOTT.DEPT WHERE DEPTNO=0
ksqcnv: TM-00017EA8-00000000-00000000-00000000 mode=3 timeout=21474836
ksqcnv: TM-00017EAA-00000000-00000000-00000000 mode=5 timeout=21474836
ksqcnv: TM-00017EAA-00000000-00000000-00000000 mode=3 timeout=21474836

Here we see the lock conversions. On DEPT (object_id=17EA8) we had a Row-S (mode=2) and are now requesting a Row-X (mode=3) because of the DML on it.
And on EMP, because of the unindexed foreign key, we temporarily need a Share lock in addition to the Row-X we had: this is Share Row-X (mode=5). It is converted back to Row-X as soon as the delete is done.

So remember the following in order to interpret the 10704 trace dump:

  • ksqgtl: lock request
  • ksqcnv: lock conversion
  • ksqrcl: lock release

For more reference, you can get the lock type description, such as TM , from V$LOCK_TYPE:

SQL> select * from v$lock_type where type='TM';
TYPE NAME ID1_TAG  ID2_TAG         IS_ IS_ DESCRIPTION                        CON_ID
---- ---- -------- --------------- --- --- ---------------------------------- ------
TM   DML  object # table/partition YES NO  Synchronizes accesses to an object      0

And when ID1 is an object# then the first hexadecimal is the OBJECT_ID from DBA_OBJECTS.
About the lock modes, you have the row intention locks, whose goal is to block concurrent DDL:
mode=2 is Row-S, mode=3 is Row-X
and the table locks, whose goal is to prevent concurrent row locks:
mode=4 is Share, mode=5 is Row-X + Share (SRX), mode=6 is eXclusive.
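To watch these TM locks and their modes live while a transaction is open, a simple query against V$LOCK can be used (a sketch):

select l.sid, l.type, o.object_name, l.lmode, l.request, l.ctime
from   v$lock l join dba_objects o on o.object_id = l.id1
where  l.type = 'TM';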

You can also check the blocking matrix:

[Image CaptureLockMatrix.PNG: blocking matrix of the Oracle table lock modes]

http://prezi.com/cdckwsgqxeyi/oracle-table-lock-modes/

 


Kerberos SSO with Documentum D2 3.1 SP1


Last week, when I was working on how to set up Kerberos SSO for D2 3.1 SP1, I faced some issues due to documentation that doesn't seem to be up to date… In fact, our version is D2 3.1 SP1 and there is no specific documentation for SP1. Consequently, I read the D2 3.1 Installation Guide and the D2 4.0 Installation Guide. The first time I read the D2 3.1 documentation, I found it very light and I knew I would have some problems. Fortunately, I already had experience with Kerberos, essentially with the MIT distribution on Linux (how to set up a KDC, kerberize an application, manage users, etc.).

The first thing that is important to know is that, as D2 isn't a WDK client, setting up SSO with Kerberos only involves the D2 host server and the Active Directory server. There is no need to set up the Content Store to use Kerberos. So here is the configuration that was used:

  • Microsoft Active Directory on Windows Server 2008 R2. Let’s name this server ad001 and the related domain domain.com
  • D2 3.1 SP1 on Microsoft Windows Server 2008 R2. Let's name this server wd231

The second thing that is important to know is that not all directory servers are supported. Indeed, Documentum doesn't support a Linux Key Distribution Center (KDC).

1. D2 3.1 documentation steps

The first part of this blog describes the steps the official D2 3.1 Installation Guide provides to help administrators set up SSO using Kerberos in D2. You will see that those steps aren't very descriptive, but with a little bit of imagination you could do something with them.

a. Edit the shiro.ini file

Open the file shiro.ini and add the following lines:

[main] 
D2-Kerberos=eu.c6.d2.web.filters.authc.D2KerberosHttpAuthenticationFilter 
D2-Kerberos.servicePrincipal=HTTP/computerName.domainName 
D2-Kerberos.krbConfLocation=C:/Windows/krb5.ini 
D2-Kerberos.keyTabLocation=C:/computerName.keytab 
D2-Kerberos.docbases=docbase1,login1,password1,domain1|docbase2,... 
D2-Kerberos.debug=true 
[urls] 
/** = D2-Kerberos

 

b. On the AD

Create a user on the AD with the computer name of your application server and add the following options:

  • Use Kerberos DES encryption types for this account
  • This account supports Kerberos AES 128 bit encryption

Generate a keytab using the command below. Well, in fact the official documentation only displays the command and doesn't explain what it is or where to execute it.

[Screenshot ktpass1.png: ktpass command from the official documentation]

Transfer this keytab on the D2 host server.

c. krb5.ini

[libdefaults]
    default_realm = DOMAINNAME
[realms]
    DOMAINNAME = {
        kdc = adserver.domainName
    }

 

d. On Windows 7

  1. Run gpedit.msc
  2. Expand "Local Computer Policy" / "Computer Configuration" / "Windows Settings" / "Security Settings" / "Local Policies" / "Security Options"
  3. Double click "Network security: Configure encryption types allowed for Kerberos"
  4. Select all.
  5. Press "OK"

2. D2 3.1 SP1 steps

The second part of this blog presents the steps that must be done to get Kerberos SSO working with D2 3.1 SP1. If you only follow the steps described in the official documentation, as some explanations are missing, you will probably run into issues. Now here are the steps that were required to get the SSO working on our D2 3.1 SP1:

Let's begin with the beginning, contrary to the official documentation which begins with the end. How logical!

a. On the Active Directory

Create a user on the Active Directory with the following properties:

  • Username doesn't matter (it doesn't need to be the D2 server hostname). Let's name this user: dmskrbsso
  • Password: dmskrbssoPassword
  • Password never expire
  • This account supports Kerberos AES 128 bit encryption
  • Trust for Delegation to any service (Kerberos Only)
  • This account supports Kerberos DES encryption

This last option isn't mandatory, as Kerberos will always use the most secure encryption available (AES 256 in general). So there is no problem if your Active Directory admin doesn't want to enable DES encryption, as it isn't secure enough.

When the user is successfully created, open an administrator command prompt and create the keytab for D2 using the following command:

[Screenshot ktpass2.png: ktpass command used to generate the keytab]
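The screenshot above shows the exact command used; a typical invocation looks like the following (a sketch only: the SPN, account, password and crypto options must be adapted to your environment):

ktpass -princ HTTP/wd231.domain.com@DOMAIN.COM -mapuser dmskrbsso@domain.com ^
       -pass dmskrbssoPassword -crypto AES128-SHA1 -ptype KRB5_NT_PRINCIPAL -out wd231.keytab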

This command will create a file named wd231.keytab, which should be transferred to the D2 host server. Let's place this file at the following location: C:/Kerberos/wd231.keytab

b. On Documentum Administrator

Create a repository user with the same Name, Login Name and password as the one created on the AD:

  • State: Active
  • Name: dmskrbsso
  • Login Name: dmskrbsso
  • Login Domain: domain.com
  • User Source: Inline Password
  • Password: dmskrbssoPassword
  • Privileges: None
  • Extended Privileges: None
  • Client Capability: Consumer

This first user just reflects the new user created on the AD, but I think this user isn't mandatory.

Create another repository user which will be used by the shiro.ini file to connect all other users through SSO:

  • State: Active
  • Name: d2krbsso
  • Login Name: d2krbsso
  • Login Domain: domain.com (This is mandatory! The SSO will not work without the AD domain here)
  • User Source: Inline Password
  • Password: d2krbssoPassword
  • Privileges: Superuser
  • Extended Privileges: None
  • Client Capability: Consumer

From a command prompt on the D2 server, execute the following command to get the encrypted password of the user d2krbsso:

[Screenshot java.png: D2 command used to encrypt the d2krbsso password]

This command assumes that your D2-Client web application is at "C:/Tomcat6D2-Client31SP1/webapps/D2-Client/". Note the result of this command, as the encrypted password will be needed later in the shiro.ini file. Let's name this encrypted password userEncryptedPw.

c. krb5.ini

Create the file C:/Kerberos/krb5.ini with the following content:

[libdefaults]
  default_realm = DOMAIN.COM
[realms]
  DOMAIN.COM = {
    kdc = ad001.domain.com
  }
[domain_realm]
  .domain.com = DOMAIN.COM
  domain.com = DOMAIN.COM
[logging]
  default = FILE:C:/Kerberos/logs/kdc_default.log
  kdc = FILE:C:/Kerberos/logs/kdc.log

 

Don’t forget to create the folder C:/Kerberos/logs/.

d. Edit the shiro.ini file

The shiro.ini file is the file used by D2 to authenticate users through Kerberos. This file can be found at "C:/Tomcat6D2-Client31SP1/webapps/D2-Client/WEB-INF/classes/shiro.ini". Replace the properties in this file with the following:

[main]
D2-Kerberos=com.emc.d2.web.filters.authc.D2KerberosHttpAuthenticationFilter
D2-Kerberos.servicePrincipal=HTTP/
D2-Kerberos.krbConfLocation=C:/Kerberos/krb5.ini
D2-Kerberos.keyTabLocation=C:/Kerberos/wd231.keytab
D2-Kerberos.docbases=docbase1,d2krbsso,userEncryptedPw,DOMAIN.COM
D2-Kerberos.debug=true
[urls]
/**=D2-Kerberos

 

docbase1 corresponds to a repository using Kerberos. You can set more than one docbase using the following property:
D2-Kerberos.docbases=docbase1,login1,password1,domain1|docbase2,…

Maybe you didn't see the difference with the original documentation, but if you look at the property named "D2-Kerberos", you will see that the Java class D2KerberosHttpAuthenticationFilter isn't in the same package. In our D2 3.1 SP1, this Java class is located in com.emc.d2.web.filters, whereas the D2 3.1 official documentation indicates eu.c6.d2.web.filters. Something funny is that the D2 4.0 official documentation indicates yet another location: eu.c6.d2.portal.server.filters.

e. Verify the Java Runtime Environment

It seems that Kerberos SSO for D2 requires a Java 1.6 JRE. The simplest way to verify this is to check the service configuration of your application server. Another way could be to take a look at the registry:

  • Run regedit
  • Navigate HKEY_LOCAL_MACHINE / SOFTWARE / Wow6432Node / Apache Software Foundation / Procrun 2.0 / tomcat6D231SP1 / Parameters / Java
  • Verify that the JVM parameter points to Java 1.6: C:/Program Files/Java/jre6/bin/server/jvm.dll

This assumes that the JAVA_HOME of your application server is C:/Program Files/Java/jre6/

f. Install the Java Cryptography Extension (JCE)

Download the Java Cryptography Extension (JCE) for the Java version used by the JVM and copy both JARs into:
C:/Program Files/Java/jre6/lib/security/

g. Restart

Restart your D2 application server and look for errors in the Tomcat log files:
C:/Tomcat6D2-Client31SP1/logs/tomcat6d231sp1-stdout.YYYY-MM-DD.log

h. User configuration

For D2 Kerberos SSO, there is no need to change anything in the user properties. That means that, for example, a Login Domain of LDAP and a User Source of LDAP are fine.

i. On client computer

  1. Run gpedit.msc
  2. Expand "Local Computer Policy" / "Computer Configuration" / "Windows Settings" / "Security Settings" / "Local Policies" / "Security Options"
  3. Double click "Network security: Configure encryption types allowed for Kerberos"
  4. Select all.
  5. Press "OK"

Please be aware that the D2-Client URL must be detected by Internet Explorer as an "Intranet Site". This can be done through the Internet Explorer options.

This finally concludes the configuration of Kerberos SSO in D2 3.1 SP1. To get a Kerberos ticket, just log in on the client machine with a user defined in AD and if you have followed the steps above, SSO should work. If this is not the case, please let me know and I might be able to help.
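When troubleshooting on the client side, a quick check is to list the cached Kerberos tickets right after opening the D2-Client URL (a sketch; klist is included in Windows 7):

klist

A service ticket for the HTTP SPN registered above should appear in the list; if it doesn't, "klist purge" followed by a new access to the URL forces a fresh ticket request.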

Good luck!

 


Oracle TNS_ADMIN issues due to bad environment settings


Recently, I faced a TNS resolution problem at a customer site. The customer called the service desk because a DBLINK was pointing to the wrong database, and the root cause turned out to be a bad environment setting.

The users were supposed to be redirected to a development database, but the DBLINK was redirecting them to a validation database instead. The particularity of the environment is that the development and validation databases are running on the same server, but in different Oracle homes, each home having its own tnsnames.ora. Both tnsnames.ora files contain common alias names, but pointing to different databases. Not exactly best practice, but this is not the topic here.

The problem started with some difficulties to reproduce the case: our service desk was not able to reproduce the situation until we understood that the customer was accessing the database remotely via a development tool (through the listener), while we were connected locally on the server.

Let me present the case with my environment.
First, this is the database link concerned by the issue:

SQL> select * from dba_db_links;
OWNER      DB_LINK              USERNAME                       HOST       CREATED
---------- -------------------- ------------------------------ ---------- ---------
PUBLIC     DBLINK               DBLINK                         MYDB       21-MAR-14

And this is the output when we try to display the instance name through the DBLINK, when connected locally:

SQL> select instance_name from v$instance@DBLINK;
INSTANCE_NAME
----------------
DB2

The user is redirected on the remote database, as expected. Now, let’s see what happens when connected using the SQL*Net layer:

[oracle@srvora01 ~]$ sqlplus system@DB1
SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 24 10:07:45 2014
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
 
Enter password:
 
Connected to:
 
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> select instance_name from v$instance@DBLINK;
INSTANCE_NAME
----------------
DB1

Here we can see that the user is not redirected to the same database (here, for demonstration purposes, it points to the database itself).
The first thing to check is the TNS_ADMIN variable, if it exists:

[oracle@srvora01 ~]$ echo $TNS_ADMIN
/u00/app/oracle/product/11.2.0/db_3_0/network/admin

There is the content of the tnsnames.ora file on that location:

[oracle@srvora01 ~]$ cat /u00/app/oracle/product/11.2.0/db_3_0/network/admin/tnsnames.ora
DB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = tcp)(HOST = srvora01)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = DB1)
    )
  )
MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = tcp)(HOST = srvora01)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DB2)
    )
  )

 

Clearly, we have a problem with the TNS resolution. The local connection resolves the MYDB alias correctly, while the remote connection resolves a different database with the same alias. In this case, we have two hypotheses:

  • The tnsnames.ora is not well configured: this is not the case, as you can see above
  • Another tnsnames.ora file exists somewhere on the server and is used by remote connections

To confirm that the second hypothesis is the right one, we can use the strace tool:

SQL> set define #
SQL> select spid from v$process p join v$session s on p.addr=s.paddr and s.sid=sys_context('userenv','sid');
SPID
------------------------
5578
 
SQL>  host strace -e trace=open -p #unix_pid & echo $! > .tmp.pid
Enter value for unix_pid: 5578
SQL> Process 5578 attached - interrupt to quit
 
SQL> select instance_name from v$instance @ DBLINK;
open("/u00/app/oracle/product/11.2.0/db_3_0/network/admin/tnsnames.ora", O_RDONLY) = 8
open("/etc/host.conf", O_RDONLY)        = 8
open("/etc/resolv.conf", O_RDONLY)      = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 10
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 10
open("/etc/hostid", O_RDONLY)           = -1 ENOENT (No such file or directory)
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 10INSTANCE_NAME
----------------
DB2

The DBLINK is resolved using the file /u00/app/oracle/product/11.2.0/db_3_0/network/admin/tnsnames.ora.
Now, when connected remotely:

SQL> set define #
SQL> select spid from v$process p join v$session s on p.addr=s.paddr and s.sid=sys_context('userenv','sid');
SPID
------------------------
6838
 
SQL> host strace -e trace=open -p #unix_pid & echo $! > .tmp.pid
Enter value for unix_pid: 6838
SQL> Process 6838 attached - interrupt to quit
 
SQL> select instance_name from v$instance@DBLINK;
open("/u00/app/oracle/network/admin/tnsnames.ora", O_RDONLY) = 8
open("/etc/host.conf", O_RDONLY)        = 8
open("/etc/resolv.conf", O_RDONLY)      = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 8
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 9
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 9
open("/etc/hostid", O_RDONLY)           = -1 ENOENT (No such file or directory)
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 9INSTANCE_NAME
----------------
DB1

Here the DBLINK is resolved with the file /u00/app/oracle/network/admin/tnsnames.ora.

Two different tnsnames.ora files are used depending on the connection method! If we look at the content of the second tnsnames.ora, we have an explanation for our problem:

[oracle@srvora01 ~]$ cat /u00/app/oracle/network/admin/tnsnames.ora
MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = tcp)(HOST = srvora01)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DB1)
    )
  )

It is not clearly documented by Oracle, but the database session can inherit the environment variables in three different ways:

  • When you connect locally to the server (no SQL*Net, no listener), the Oracle session inherits the client environment
  • When you connect remotely to a service statically registered on the listener, the Oracle session inherits the environment which started the listener
  • When you connect remotely to a service dynamically registered on the listener, the Oracle session inherits the environment which started the database

In our case, the database was restarted with the wrong TNS_ADMIN value set. The database then registered this value for remote connections. We can check this with the following method:

[oracle @ srvora01 ~]$ ps -ef | grep pmon
oracle    3660     1  0 09:02 ?        00:00:00 ora_pmon_DB1
oracle    4006     1  0 09:05 ?        00:00:00 ora_pmon_DB2
oracle    6965  3431  0 10:44 pts/1    00:00:00 grep pmon
 
[oracle @ srvora01 ~]$ strings /proc/3660/environ | grep TNS_ADMIN
TNS_ADMIN=/u00/app/oracle/network/admin

Note that we can also get the value of TNS_ADMIN seen by the instance using dbms_system.get_env.
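For example (a sketch; dbms_system is an undocumented SYS-owned package, so execute privileges may be needed):

SQL> set serveroutput on
SQL> declare
  l_value varchar2(400);
begin
  sys.dbms_system.get_env('TNS_ADMIN', l_value);
  dbms_output.put_line('TNS_ADMIN=' || l_value);
end;
/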

The solution was to restart the database with the correct TNS_ADMIN value:

[oracle @ srvora01 ~]$ echo $TNS_ADMIN
/u00/app/oracle/product/11.2.0/db_3_0/network/admin
 
[oracle@srvora01 ~]$ sqlplus / as sysdba
 
SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 24 10:46:03 2014
 
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
 
 
Connected to:
 
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
 
SQL> startup
ORACLE instance started.
Total System Global Area 1570009088 bytes
Fixed Size                  2228704 bytes
Variable Size            1023413792 bytes
Database Buffers          536870912 bytes
Redo Buffers                7495680 bytes
Database mounted.
Database opened.
 
[oracle@srvora01 ~]$ ps -ef | grep pmon
oracle    4006     1  0 09:05 ?        00:00:00 ora_pmon_DB2
oracle    7036     1  0 10:46 ?        00:00:00 ora_pmon_DB1
oracle    7116  3431  0 10:46 pts/1    00:00:00 grep pmon
 
[oracle@srvora01 ~]$ strings /proc/7036/environ | grep TNS_ADMIN
TNS_ADMIN=/u00/app/oracle/product/11.2.0/db_3_0/network/admin

The value for TNS_ADMIN is now correct.

[oracle@srvora01 ~]$ sqlplus system @ DB1
 
SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 24 10:47:21 2014
 
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
 
Enter password:
 
Connected to:
 
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> select instance_name from v$instance @ DBLINK;
INSTANCE_NAME
----------------
DB2

Remote connections are now using the right tnsnames.ora.
I hope this will help you with your TNS resolution problems.

 


SQL Server 2014: sys.dm_exec_query_profiles, a new promising feature


Among the hidden features provided by SQL Server 2014, there is a very interesting dynamic management view named sys.dm_exec_query_profiles. This feature will help database administrators and experienced developers troubleshoot long running queries in real time. I'm pretty sure you have had to deal with the following questions: When will this query finish? What percentage of the total workload has this request performed? Which steps are the longest? Before SQL Server 2014 it was impossible to answer the questions above!

But first, I have to admit that this new DMV raised a lot of questions during my tests. I will try to share my thoughts and findings with you in this post.

My test consisted of running this long query:

select
       YEAR(TransactionDate) AS year_tran,
       MONTH(TransactionDate) AS month_tran,
       FIRST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS first_product_nb,
       LAST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS last_product_nb
from AdventureWorks2012.dbo.bigTransactionHistory as a
       join AdventureWorks2012.dbo.bigProduct as p
             on a.ProductID = p.ProductID

On my computer, this query takes 5 minutes 47 seconds to run. Parallelism is enabled, and my SQL Server instance can use up to 8 processors. The first time, I was disappointed when viewing the output of the sys.dm_exec_query_profiles DMV during my query execution:

select *
from sys.dm_exec_query_profiles

 

[Screenshot blog_12_-_1_-_output_sys_dm_exec_query_profiles: no rows returned]

… No output! OK, so what are we supposed to do to see something with sys.dm_exec_query_profiles? :-D  Keep good habits and go back to the SQL Server BOL, which says:

To serialize the requests of sys.dm_exec_query_profiles and return the final results to SHOWPLAN XML, use SET STATISTICS PROFILE ON; or SET STATISTICS XML ON;

In other words, to have a chance to see something with this DMV, we must use session options like STATISTICS PROFILE or STATISTICS XML, or force SQL Server to display the execution plan after the query execution with SQL Server Management Studio. My first thought was: why do we have to use additional options to produce output for this new DMV? These constraints severely limit the scenarios where we can use it… After some reflection and discussions with other French MVPs, we can consider this normal, because tracking the execution plan information is expensive and we could potentially bring a production server to its knees by enabling this feature for the whole workload.

However, using additional session options can be impractical in production environments, because it requires executing the concerned query yourself, which is not always possible. Fortunately, Microsoft provides the query_post_execution_showplan event, which can be used in a SQL Server trace or an extended event session. But the implementation design of this event has a significant impact on the performance of a SQL Server instance: even with a short-circuit predicate, the event is triggered each time a SQL statement is executed, because the query duration is not known ahead of time (please refer to this Microsoft Connect item). To summarize, in an OLTP production environment this event should only be used for a short period, for troubleshooting purposes. In an OLAP environment the story is not the same, because we don't have to deal with a lot of short queries but only with long running queries issued by cube processing or ETL processes, for example.

After discussing the pros and cons of this new feature, let's start again with my previous T-SQL statement and the session option SET STATISTICS PROFILE ON:

SET STATISTICS PROFILE ON;
 
select
       YEAR(TransactionDate) AS year_tran,
       MONTH(TransactionDate) AS month_tran,
       FIRST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS first_product_nb,
       LAST_VALUE(p.ProductNumber) OVER (PARTITION BY YEAR(TransactionDate), MONTH(TransactionDate) ORDER BY TransactionDate) AS last_product_nb
from AdventureWorks2012.dbo.bigTransactionHistory as a
       join AdventureWorks2012.dbo.bigProduct as p
             on a.ProductID = p.ProductID

In other session:

select *
from sys.dm_exec_query_profiles

 

[Screenshot blog_12_-_2_-_output_sys_dm_exec_query_profiles_2: DMV output during the query execution]

This DMV provides a lot of useful information. The information is more granular than SET STATISTICS IO because the counters returned are per operator per thread (node_id / physical_operator_name and thread_id columns). In my case the SQL Server instance can use up to 8 processors, and we can notice that some operators (node_id) have several threads running in parallel (thread_id). Furthermore, two other columns are interesting: estimate_row_count and row_count. The former is the number of estimated rows an operator should address and the latter is the current number of rows addressed by the same operator (remember these counters are per operator per thread…). We can compare these two columns to estimate a completion percentage per operator (or per operator per thread). Finally, we can categorize the information provided by this DMV in two parts: information related either to the query execution engine (row_count, estimate_row_count) or to the storage engine (logical_read_count, write_page_count, lob_logical_read_count, etc.).

As you can see, this DMV provides a lot of information, and you will certainly customize the final output yourself depending on what you want to focus on.

Here is an example of a custom script that uses sys.dm_exec_query_profiles, sys.objects, sys.indexes, sys.dm_exec_requests, sys.dm_exec_sessions, and sys.dm_os_waiting_tasks to cover both the real-time execution information and the related waits occurring at the same time.

USE AdventureWorks2012;
GO
 
SELECT
       qp.node_id,
       qp.session_id,
       o.name AS [object_name],
       i.name AS index_name,
       qp.physical_operator_name + QUOTENAME(CAST(COUNT(qp.thread_id) AS VARCHAR(4))) AS physical_operator_name,
       SUM(qp.estimate_row_count) AS estimate_row_count,
       SUM(qp.row_count) AS row_count,
       CASE
             WHEN SUM(qp.row_count) * 1. / SUM(qp.estimate_row_count + .00001) <= 1 THEN CAST(CAST(SUM(qp.row_count) * 100. / SUM(qp.estimate_row_count + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %'
             ELSE '100 % + (' + CAST(CAST((SUM(qp.row_count) * 100. / SUM(qp.estimate_row_count + .00001)) - 100 AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %)'
       END AS [completed_%],
       -- Query execution engine
       CAST(CAST(SUM(qp.elapsed_time_ms) * 100. / (SUM(SUM(qp.elapsed_time_ms)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_elapsed_time_%],
       CAST(CAST(SUM(qp.cpu_time_ms) * 100. / (SUM(SUM(qp.cpu_time_ms)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_cpu_%],
       -- Storage engine
       CAST(CAST(SUM(qp.logical_read_count) * 100. / (SUM(SUM(qp.logical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_logical_read_%],
       CAST(CAST(SUM(qp.physical_read_count) * 100. / (SUM(SUM(qp.physical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_physical_read_%],
       CAST(CAST(SUM(qp.lob_logical_read_count) * 100. / (SUM(SUM(qp.lob_logical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [lob_logical_read_%],
       CAST(CAST(SUM(qp.lob_physical_read_count) * 100. / (SUM(SUM(qp.lob_physical_read_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [lob_physical_read_%],
       CAST(CAST(SUM(qp.write_page_count) * 100. / (SUM(SUM(qp.write_page_count)) OVER() + .00001) AS DECIMAL(5,2)) AS VARCHAR(6)) + ' %' AS [total_write_%],
       SUM(qp.logical_read_count) AS total_logical_read_count,
       SUM(qp.lob_logical_read_count) AS total_lob_logical_read_count,
       SUM(qp.lob_physical_read_count) AS total_lob_physical_read_count,
       SUM(qp.lob_read_ahead_count) AS total_lob_read_ahead_count,
       SUM(qp.physical_read_count) AS total_physical_read_count,
       SUM(qp.read_ahead_count) AS total_read_ahead_count,
       SUM(qp.write_page_count) AS total_write_page_count,
       -- Both
       SUM(qp.cpu_time_ms) AS total_cpu_time_ms,
       SUM(qp.elapsed_time_ms) AS total_elapsed_time_ms,
       -- Wait info
       COALESCE(wait.wait_info, '') AS wait_info
FROM sys.dm_exec_query_profiles AS qp (NOLOCK)
       LEFT JOIN sys.objects AS o (NOLOCK)
             ON qp.object_id = o.object_id
       LEFT JOIN sys.indexes AS i (NOLOCK)
             ON qp.index_id = i.index_id
                    AND qp.object_id = i.object_id
OUTER APPLY
(
       SELECT
       STUFF(
                    (SELECT
                                  ',' + ws.wait_type + ' ' + QUOTENAME(CAST(SUM(COALESCE(ws.wait_duration_ms, 0)) AS VARCHAR(20)) + ' ms')
                           FROM sys.dm_exec_requests AS r (NOLOCK)
                                  JOIN sys.dm_exec_sessions AS s (NOLOCK)
                                        ON r.session_id = s.session_id
                                  JOIN sys.dm_os_waiting_tasks AS ws (NOLOCK)
                                        ON ws.session_id = s.session_id
                           WHERE s.session_id = qp.session_id
                           GROUP BY ws.wait_type
                           FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '')
) AS wait(wait_info)
GROUP BY qp.node_id, qp.session_id, qp.physical_operator_name, o.name, i.name, COALESCE(wait.wait_info, '')
ORDER BY node_id
OPTION(RECOMPILE);

The first section of the above script concerns the general information:

[Screenshot blog_12_-_2_-_output_sys_dm_exec_query_profiles_3: general information section of the script output]

  1. This above script groups the information per operator. The information in brackets is the number of thread used for the concerned operator.
  2. Completion is provided as a ratio between rows addressed by the corresponding operator so far and the total estimated rows. As we can expect, the estimated rows and the real rows addressed by an operator can differ. The above script highlights this gap by showing the difference in percentage between brackets if row_count is greater than estimated_row_count. This information can be useful to “estimate” the general completion of the query.

The total_elapsed_time_% and total_cpu_% columns, however, are the relative percentages of the elapsed time and CPU time consumed across all the query execution operators.

  3. For some physical operators like Clustered Index Scan, Table Scan, etc., we will show the related table and index objects.

The second section concerns the storage engine statistics for the query execution:

[Screenshot blog_12_-_3_-_output_sys_dm_exec_query_profiles_4: storage engine statistics section of the script output]

The above information can be useful to highlight the top operators depending on the resource or the operation type. For example, in my case I can notice that the clustered index scan of the bigTransactionHistory table is the main consumer of logical reads. Furthermore, we can observe that the sort operator is the root cause of spilling. Finally, as expected, we can notice that by design the hash match operator consumes a lot of CPU and is the top operator in this case.

The next section shows the absolute values per operator per type of resources and operation:

[Screenshot blog_12_-_4_-_output_sys_dm_exec_query_profiles_5: absolute values per operator per type of resource and operation]

Finally, the last section shows the wait types, with the duration aggregated over all related threads in brackets, during the query execution in real time:

[Screenshot blog_12_-_5_-_output_sys_dm_exec_query_profiles_6: wait types observed during the query execution]

In my case we retrieve the wait type related to parallel query execution (CXPACKET) and probably one of the root causes of the CXPACKET wait type (IO_COMPLETION), but here we don't have sufficient information to verify it. We can just notice that the query execution generates CXPACKET and IO_COMPLETION wait types in real time.

At this point we may ask why there is no remaining-time information. In fact, computing an accurate remaining time is very difficult because a lot of factors must be correlated together. We can take a naive approach by computing the remaining time from the row_count, estimate_row_count and elapsed time columns for each operator, but you will be disappointed when testing this… According to my own tests, I can claim that the result is never as accurate as you would expect. Please feel free to share your results or thoughts about computing an accurate remaining time with us!
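For reference, such a naive estimate could look like this (a sketch only; as said above, do not expect an accurate result):

-- naive remaining time per operator: elapsed time scaled by the rows still expected
SELECT node_id,
       physical_operator_name,
       SUM(row_count) AS row_count,
       SUM(estimate_row_count) AS estimate_row_count,
       MAX(elapsed_time_ms) AS elapsed_time_ms,
       CASE WHEN SUM(row_count) > 0 AND SUM(estimate_row_count) > SUM(row_count)
            THEN MAX(elapsed_time_ms) * (SUM(estimate_row_count) - SUM(row_count)) * 1. / SUM(row_count)
            ELSE 0
       END AS naive_remaining_time_ms
FROM sys.dm_exec_query_profiles
GROUP BY node_id, physical_operator_name
ORDER BY node_id;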

The sys.dm_exec_query_profiles DMV can be correlated with external tools like XPERF or Perfmon. I will prepare a next blog post (as soon as I have some time) to share my experience with you.

Happy query profiling!

 


Documentum upgrade project – ActiveX and D2-Client 3.1Sp1


This is another blog post about our Documentum upgrade project. This time, the following issue occurred: the ActiveX could not be installed using the D2-Client. We had to access the D2-Config URL to have it installed. For a normal user, this was not an option.

Analysis

The workstation had the ActiveX for D2 3.0 installed, the version before the upgrade. Under C:\Windows\Downloaded Program Files, we had:

ctx
ctx.ocx
D2UIHelper.dll

On my workstation where I could install (using D2-Config) the D2 3.1.1 ActiveX, I also had C:\Windows\Downloaded Program Files\CONFLICT.* folders containing D2UIHelper.dll and ctx.inf

By checking the content of the ctx.inf of this new cab, I saw that we had the wrong version (see FileVersion) of ctx.ocx:

[ctx.ocx]
file-win32-x86=thiscab
RegisterServer=yes
clsid={8C55AA13-D7D9-4539-8B20-78BC4A795681}
DestDir=
FileVersion=3,0,0,2

By checking the "ctx.cab" file in "D2-Client/install" and "D2-Config/install" on the application server, I found that we did not have the same version: both ctx.cab files had the same date and size, but the digital signature was different:

D2-Config ctx.cab: 17 September 2013 10:56:11
D2-Client ctx.cab: 19 April 2013 17:03:08
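A quick way to confirm that the two delivered files really differ, beyond date and size, is to hash them (a sketch; the web application paths are assumptions):

certutil -hashfile "C:\Tomcat6D2-Client31SP1\webapps\D2-Config\install\ctx.cab" SHA1
certutil -hashfile "C:\Tomcat6D2-Client31SP1\webapps\D2-Client\install\ctx.cab" SHA1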

Solution

To solve the issue, I copied the "ctx.cab" file from the "D2-Config/install" path to "D2-Client/install/". Once this was done, the ActiveX could be installed using the D2-Client URL.

The vendor confirmed that this is a bug in the delivered package.

 


Oracle OPatch and “Archive not applied” error


Recently, I installed an Oracle 11.2.0.4 RDBMS on a server and I wanted to install the latest Patch Set Update (PSU), 11.2.0.4.4 from October 2014. During the execution, I ran into the error "Archive not applied" and the job failed. This is how to quickly fix this issue.

While installing the PSU 11.2.0.4.4 from October 2014 on my Oracle 11.2.0.4 database server, I experienced the following error:

The following actions have failed:
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libserver11.a/kjb.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libserver11.a… ”
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libpls11.a/pevmexe.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libpls11.a… ”
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libclient11.a/knoggcap.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libclient11.a… ”
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libgeneric11.a/qcd.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libgeneric11.a… ”
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libpls11_pic.a/pevmexe_pic.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libpls11_pic.a… ”
Archive not applied /oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/17478514/files/lib/libxml11.a/lpxm.o to /u00/app/oracle/product/11.2.0/db_4_0/lib/libxml11.a… ”

 

Do you want to proceed? [y|n] n

 

Typically, this error appears when there are permission issues in the Oracle directories. Here, the /u00/app/oracle/product/11.2.0/db_4_0/lib directory seemed to be the cause of my error, so I checked my environment. Since OPatch is invoked as the oracle user, that user should have access to the directory:

 

[oracle@srvora01 ~]$ id
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba)

 

I am connected as “oracle” user, member of the “oinstall” group.

 

[oracle@srvora01 ~]$ ls -ld /u00/app/oracle/product/11.2.0/db_4_0/lib
drwxr-xr-x 4 oracle oinstall 12288 Jan 16 13:10 /u00/app/oracle/product/11.2.0/db_4_0/lib

 

As you can see, my “lib” directory is owned by “oracle” with the “oinstall” group, so there are definitely no permission issues here :-(
I was quite surprised for a few minutes! But then I discovered an issue in the path of the patch I wanted to deploy:

 

[oracle@srvora01 19121551]$ pwd
/oracle/soft/11.2.0.4/psu 11.2.0.4.4/19791364/19121551/

 

Do you see my mistake? I used a space character in the name of the directory containing the patch to apply. And OPatch definitely does not like space characters…

Once I had discovered my mistake, I just had to rename the directory and run OPatch again:

 

[oracle@srvora01 ~]$ mv "/oracle/soft/11.2.0.4/psu 11.2.0.4.4/" "/oracle/soft/11.2.0.4/psu_11.2.0.4.4/"

 

[oracle@vmreforadg01 ~]$ cd /oracle/soft/11.2.0.4/psu_11.2.0.4.4/19791364/19121551
[oracle@vmreforadg01 19121551]$ opatch apply
Oracle Interim Patch Installer version 11.2.0.3.6
Copyright (c) 2013, Oracle Corporation.  All rights reserved.

 

The patch was installed successfully:

 

[oracle@vmreforadg01 19121551]$ opatch lsinventory
Oracle Interim Patch Installer version 11.2.0.3.6
Copyright (c) 2013, Oracle Corporation.  All rights reserved.
Oracle Home       : /u00/app/oracle/product/11.2.0/db_4_0
Central Inventory : /u00/app/oraInventory
from           : /u00/app/oracle/product/11.2.0/db_4_0/oraInst.loc
OPatch version    : 11.2.0.3.6
OUI version       : 11.2.0.4.0
Log file location : /u00/app/oracle/product/11.2.0/db_4_0/cfgtoollogs/opatch/opatch2015-01-16_13-33-02PM_1.log
Lsinventory Output file location : /u00/app/oracle/product/11.2.0/db_4_0/cfgtoollogs/opatch/lsinv/lsinventory2015-01-16_13-33-02PM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (1):
Oracle Database 11g                                                  11.2.0.4.0
There are 1 product(s) installed in this Oracle Home.
Interim patches (1) :
Patch  19121551     : applied on Fri Jan 16 13:10:20 CET 2015
Unique Patch ID:  17949166
Patch description:  “Database Patch Set Update : 11.2.0.4.4 (19121551)”
Created on 6 Oct 2014, 10:07:57 hrs PST8PDT
Sub-patch  18522509; “Database Patch Set Update : 11.2.0.4.3 (18522509)”
Sub-patch  18031668; “Database Patch Set Update : 11.2.0.4.2 (18031668)”
Sub-patch  17478514; “Database Patch Set Update : 11.2.0.4.1 (17478514)”
Bugs fixed:
17288409, 17205719, 17811429, 17754782, 17726838, 13364795, 17311728
17284817, 17441661, 13645875, 18199537, 16992075, 16542886, 17446237
14565184, 17071721, 17610798, 17375354, 17449815, 17397545, 19463897
18230522, 17235750, 16360112, 13866822, 17982555, 17478514, 12905058
14338435, 13944971, 16929165, 12747740, 17546973, 14054676, 17088068
18264060, 17343514, 17016369, 17042658, 14602788, 14657740, 17332800
19211724, 13951456, 16315398, 17186905, 18744139, 16850630, 17437634
19049453, 18673304, 17883081, 18641419, 17296856, 18262334, 17006183
18277454, 17232014, 16855292, 10136473, 17705023, 17865671, 18554871
19121551, 17588480, 17551709, 17344412, 17842825, 18681862, 17390160
13955826, 13609098, 18139690, 17501491, 17239687, 17752121, 17299889
17602269, 18673325, 17313525, 17242746, 19544839, 17600719, 18191164
17571306, 19466309, 17951233, 18094246, 17165204, 17011832, 17040527
16785708, 16180763, 17477958, 17174582, 17465741, 18522509, 17323222
19463893, 16875449, 16524926, 17237521, 17596908, 17811438, 17811447
18031668, 16912439, 16494615, 18061914, 17545847, 17082359, 19554106
17614134, 17341326, 17891946, 19458377, 17716305, 17752995, 16392068
19271443, 17767676, 17614227, 17040764, 17381384, 18973907, 18673342
14084247, 17389192, 17006570, 17612828, 17721717, 13853126, 18203837
17390431, 17570240, 14245531, 16043574, 16863422, 19727057, 17468141
17786518, 17037130, 17267114, 18203838, 16198143, 16956380, 17478145
14829250, 17394950, 17027426, 16268425, 18247991, 19584068, 14458214
18436307, 17265217, 13498382, 16692232, 17786278, 17227277, 16042673
16314254, 17443671, 16228604, 16837842, 17393683, 17787259, 18009564
15861775, 16399083, 18018515, 16472716, 17050888, 14010183, 17325413
16613964, 17080436, 17036973, 17761775, 16721594, 18280813, 15979965
18203835, 17297939, 16731148, 17811456, 14133975, 17385178, 17586955
16450169, 17655634, 9756271, 17892268, 17648596, 16220077, 16069901
11733603, 16285691, 17587063, 18180390, 17393915, 18096714, 17238511
17824637, 14285317, 19289642, 14764829, 18328509, 17622427, 16943711
17346671, 18996843, 14852021, 17783588, 16618694, 17672719, 17546761
--------------------------------------------------------------------------------
OPatch succeeded.
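
As a complementary check from inside the database (a sketch: for an 11.2 PSU, the entry only shows up in the registry history after running the catbundle.sql post-install step), the PSU should also be visible with:

-- Minimal sketch: bundle patches recorded in the database registry history
SELECT action_time, action, version, id, bundle_series, comments
FROM   dba_registry_history
ORDER  BY action_time;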

 

Cet article Oracle OPatch and “Archive not applied” error est apparu en premier sur Blog dbi services.


quickly exchange code or text between workstations or teams

$
0
0

In a recent project, I faced the following situation: on the one hand, I had to execute scripts on a customer’s workstation, while on the other hand, I had to integrate the results of these scripts into a report on my own workstation. The question was how to do this efficiently without sending dozens of mails to myself. One solution would be to create one script which produces all the output and send the results by mail (if there is a customer email account or access to my company’s webmail). But having all the results for different databases in one file, or one file per database, makes it harder to write the report, as the report is structured by topic and database. A lot of scrolling forward and backwards is necessary.
Then I remembered a tool I used in the past which is called collabedit:

Simply create a new document and share the generated link with whoever you want. As soon as someone makes a modification, the screen is refreshed for all participants.

This allowed me to pass scripts and results back and forth between my two workstations and integrate the results immediately into my report.
Needless to say, this should not be used for confidential data. But for working on the same piece of text or code, this is a great tool. It even provides a chat where, e.g., explain plans can be discussed while highlighting the problematic parts in the main window. Far better than email discussions :)
Give it a try …

 

Cet article quickly exchange code or text between workstations or teams est apparu en premier sur Blog dbi services.

SSISDB is too big and it is not every time the retention period’s fault!

$
0
0

In all the blogs that I read, the solution is just to change the retention period to a reasonable value.
Yes, it is true, but in some cases you need to go deeper in the analysis…
Like you, I changed the retention period and ran the cleanup, but in the end I still had a big database… Why, why, why?
My first step in this analysis is to look at the Disk Usage by Table report for SSISDB from SSMS.
SSISDB_Size_01
I can see that the table internal.execution_component_phases is the biggest.
I select from this table which packages are in it and how many entries I have per package…
Query used:

select distinct package_name, count(*) as number_of_entry from internal.execution_component_phases group by package_name

SSISDB_Size_02

In my case, “Package1.dtsx” has 3756828 entries.
I dig deeper into this package and check how many entries I have per execution.
Query used:

select distinct execution_id, count(*) from internal.execution_component_phases where package_name='package1.dtsx' group by execution_id

SSISDB_Size_03

It is almost the same for each execution.
I guess it comes from an execution parameter.
I select all parameters from one execution (id=719).
Query used:

SELECT * FROM catalog.execution_parameter_values where execution_id = 719

SSISDB_Size_04

I can see in my result that the LOGGING_LEVEL is set to 2, and not to 1, the default value…

To confirm my theory, I check this parameter for all executions.
Query used:

SELECT distinct ecp.execution_id, epv.parameter_name,epv.parameter_value
FROM catalog.execution_parameter_values epv inner join internal.execution_component_phases ecp
on ecp.execution_id=epv.execution_id where epv.parameter_name='LOGGING_LEVEL'

SSISDB_Size_05

And as you can see, the LOGGING_LEVEL is always set to 2.

What is the meaning of LOGGING_LEVEL=2?
I go directly to MSDN and read about the logging levels in “Enable Logging for Package Execution on the SSIS Server”.
In the documentation, you can read that the “Performance” logging level is linked to catalog.execution_component_phases: the start and end times of the data flow components are logged for each phase of an execution. But this view is also populated with the “Verbose” logging level…
On this page, I do not have the mapping between the logging level and its number.
I find the mapping on the “catalog.set_execution_parameter_value” page and see that 2 is the “Performance” logging level.

Conclusion: I contacted the package owner and explained the case to him. I asked him why he used this logging level, and he told me that he thought it was there to get better performance… He did not read the documentation carefully!
After I explained what the logging level set to Performance really does, he changed the logging level to Basic and SSISDB started growing normally again. ;-)
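
For reference, here is a minimal sketch (the folder and project names are hypothetical) of how the logging level can be set explicitly when starting an execution, using catalog.set_execution_parameter_value as referenced above:

-- Minimal sketch: run Package1.dtsx with LOGGING_LEVEL = 1 (Basic) instead of 2 (Performance)
DECLARE @execution_id BIGINT;
DECLARE @logging_level SMALLINT = 1;   -- 1 = Basic

EXEC catalog.create_execution
     @folder_name   = N'MyFolder',      -- hypothetical folder
     @project_name  = N'MyProject',     -- hypothetical project
     @package_name  = N'Package1.dtsx',
     @execution_id  = @execution_id OUTPUT;

EXEC catalog.set_execution_parameter_value
     @execution_id    = @execution_id,
     @object_type     = 50,             -- 50 = system parameter
     @parameter_name  = N'LOGGING_LEVEL',
     @parameter_value = @logging_level;

EXEC catalog.start_execution @execution_id = @execution_id;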

 

Cet article SSISDB is too big and it is not every time the retention period’s fault! est apparu en premier sur Blog dbi services.

Delete an orphan user-database under SSISDB

$
0
0

Here is a little tip about orphan database users that we cannot drop so easily in SSISDB…
I discovered this at a customer site, and it was very tricky to find the explanation and the solution.
This is the reason why I am sharing it with you…

CONTEXT

Every morning, a policy checks for orphan database users on all databases of all instances through the Enterprise Policy Management (EPM) Framework.
If the number of orphan database users is more than 0, I get a failed policy per database…
For information, an orphan database user is a database user with no link to a login.
A few weeks ago, I installed a dedicated server for SSIS with the SSISDB database, and this week the policy failed.

To check if a user is orphan, I run the script from the policy:

SELECT * FROM  sys.database_principals a
LEFT OUTER JOIN sys.server_principals b ON a.sid = b.sid
WHERE b.sid IS NULL AND   a.type In ('U', 'G') AND   a.principal_id > 4

The result is one database user:
Name: dbi\orphan_user
Type_desc: WINDOWS_USER
Default_schema_name: dbo

And now, I will delete the orphan database user with the classical DROP command:

USE [SSISDB]
DROP USER [dbi\orphan_user]
GO

But I do not get the expected result for my query :-(
I receive an error:

Msg 27226, Level 16, State 1, Procedure ddl_cleanup_object_permissions, Line 16
The database principal has granted or denied permissions to catalog objects in the database and cannot be dropped
Msg 3609, Level 16, State 2, Line 1
The transaction ended in the trigger. The batch has been aborted.

 STEP 1

The first step to analyze this error is to look at the trigger “ddl_cleanup_object_permissions” mentioned in the error message:

SSISDB_deleteUser01

As you can see on the picture, you have in the trigger, 5 views/tables:

  • [internal].[object_permissions]  –> view
  • [internal].[folder_permissions] –> table
  • [internal].[project_permissions] –> table
  • [internal].[environment_permissions] –> table
  • [internal].[operation_permissions] –> table
[internal].[object_permissions] is a view created over the 4 tables [internal].[folder_permissions], [internal].[project_permissions], [internal].[environment_permissions] and [internal].[operation_permissions], combined with ‘UNION ALL’.
SSISDB_deleteUser02

I will now check which permissions this user has.
I created this script to get an overview of the object type (folder, project, environment or operation), the associated permission and the user name:

SELECT  CASE (ObjPerm.object_type) 
                     WHEN 1 THEN 'folder' 
                     WHEN 2 THEN 'project' 
                     WHEN 3 THEN 'environment' 
                     WHEN 4 THEN 'operation' 
              END AS [object_type],
CASE (ObjPerm.permission_type)
                     WHEN 1 THEN 'READ' 
                     WHEN 2 THEN 'MODIFY' 
                     WHEN 3 THEN 'EXECUTE' 
                     WHEN 4 THEN 'MANAGE_PERMISSIONS' 
                     WHEN 100 THEN 'CREATE_OBJECTS' 
                     WHEN 101 THEN 'READ_OBJECTS' 
                     WHEN 102 THEN 'MODIFY_OBJECTS' 
                     WHEN 103 THEN 'EXECUTE_OBJECTS' 
                     WHEN 104 THEN 'MANAGE_OBJECT_PERMISSIONS' 
END AS [permission_description],
Princ.Name AS [database_user_name] 
FROM [internal].[object_permissions] ObjPerm 
JOIN sys.server_principals Princ 
ON ObjPerm.sid = Princ.sid
WHERE Princ.Name='dbi\orphan_user' 
ORDER BY [object_type] DESC,[database_user_name],[permission_description]

In this case, I have no result…

MSDN Reference for the object type matrix and the permission matrix in the “catalog.grant_permission” webpage

STEP 2

The second step is to verify the specific object permission tables with this script:

/*Folder Permissions*/
SELECT fo.*,p.name
FROM internal.folder_permissions fo
INNER JOIN sys.database_principals p on fo.[sid] = p.[sid]
WHERE p.name = 'dbi\orphan_user'
/*Project Permissions*/
SELECT pr.*,p.name
FROM internal.project_permissions pr
INNER JOIN sys.database_principals p on pr.[sid] = p.[sid]
WHERE p.name = 'dbi\orphan_user'
/*Environment Permissions*/
SELECT en.*,p.name
FROM internal.environment_permissions en
INNER JOIN sys.database_principals p on en.[sid] = p.[sid]
WHERE p.name = 'dbi\orphan_user'
/*Operation Permissions*/
SELECT op.*,p.name
FROM internal.operation_permissions op
INNER JOIN sys.database_principals p on op.[sid] = p.[sid]
WHERE p.name = 'dbi\orphan_user'

Only one returns a result: in this case, it is the query against internal.operation_permissions.
I can deduce that I have some “ghost rows” in this table.

STEP 3

The third step is to delete these “ghost rows”, but before doing so, do not forget to back up the SSISDB database.
The ugly part is now to delete all rows in internal.operation_permissions belonging to the orphan user ‘dbi\orphan_user’. I use the sid of the user to delete them:

DELETE internal.operation_permissions WHERE sid = 0x01050000000000051…
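
Rather than copying the binary sid by hand (the value above is truncated), one way is to fetch it into a variable first — a minimal sketch:

-- Minimal sketch: fetch the orphan user's sid and delete its ghost rows with it
DECLARE @orphan_sid VARBINARY(85);

SELECT @orphan_sid = sid
FROM   sys.database_principals
WHERE  name = N'dbi\orphan_user';

DELETE internal.operation_permissions
WHERE  sid = @orphan_sid;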

Logically, if you have rows in the other object type tables, you can delete those rows as well.
The last step is to actually drop the database user, and now it works!

 

Cet article Delete an orphan user-database under SSISDB est apparu en premier sur Blog dbi services.

SQL Server Tips: Default trace enabled but no file is active…

$
0
0

I discovered a strange issue about the default trace at a customer site.
As you know, SQL Server has a default trace capturing a lot of events, like data and log file growths.
This trace is enabled by default, and you have five .trc files located in the installation directory of SQL Server.

In my case, I search the trace file for the growth history of the data files with this query:

DECLARE @filename NVARCHAR(1000);
DECLARE @bc INT;
DECLARE @ec INT;
DECLARE @bfn VARCHAR(1000);
DECLARE @efn VARCHAR(10);
 
-- Get the name of the current default trace
SELECT @filename = CAST(value AS NVARCHAR(1000))
FROM ::fn_trace_getinfo(DEFAULT)
WHERE traceid = 1 AND property = 2;

-- rip apart file name into pieces
SET @filename = REVERSE(@filename);
SET @bc = CHARINDEX('.',@filename);
SET @ec = CHARINDEX('_',@filename)+1;
SET @efn = REVERSE(SUBSTRING(@filename,1,@bc));
SET @bfn = REVERSE(SUBSTRING(@filename,@ec,LEN(@filename)));

-- set filename without rollover number
SET @filename = @bfn + @efn
IF EXISTS (SELECT * FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
               WHERE (EventClass = 92  -- Data File Auto-grow
                   OR EventClass = 93) -- Log File Auto-grow
                  AND StartTime > DATEADD(dy,-7,GETDATE())) 

  BEGIN -- If there are autogrows in the last 7 days
	SELECT * INTO #temp_trc FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
	WHERE (EventClass = 92  -- Data File Auto-grow
        OR EventClass = 93) -- Log File Auto-grow
       AND StartTime > DATEADD(dy,-7,GETDATE())

	Select  CONVERT(varchar,StartTime,102) as grow_date,DatabaseName as database_name,FileName as file_name,te.name as event_class,Duration/1000  as Duration,(tmp.IntegerData*8)/1024.0 as size_MB into #temp_trc2 from #temp_trc AS tmp 
               INNER JOIN sys.trace_events AS te ON tmp.EventClass = te.trace_event_id  

	SELECT grow_date,event_class,database_name,file_name,Count(Duration),Convert(int,Sum(size_MB)) , Sum(Duration) 
	from #temp_trc2 GROUP BY grow_date,database_name,file_name,event_class		
    ORDER BY grow_date, database_name
	
	DROP TABLE #temp_trc2
	DROP TABLE #temp_trc
  END

But I got no result…
The first step was to verify if the default trace is enabled with the command:

SELECT * FROM sys.configurations WHERE name='default trace enabled'

It is enabled, so I then check the currently running traces with the view sys.traces:

SELECT * FROM sys.traces

TraceEnableButNotActivate01

As you can see, no trace file is returned by the view.
If I check the installation directory directly, I can see the five trace files.
This situation is very strange…
The real question is: “What can I do?”
The solution was very simple.
I disabled and re-enabled the default trace with sp_configure:

EXEC sp_configure 'show advanced options',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',0;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'default trace enabled',1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sp_configure 'show advanced options',0;
GO
RECONFIGURE WITH OVERRIDE;
GO

TraceEnableButNotActivate02

After this, I re-ran the query on sys.traces and saw that I now have a currently running trace file with a start_time of now.
TraceEnableButNotActivate03

To be sure, I used the function fn_trace_getinfo to find the currently running trace file.
TraceEnableButNotActivate04

To prevent this problem, I add a condition:

IF (select count(*) FROM sys.traces ) >0

The entire query:

DECLARE @filename NVARCHAR(1000);
DECLARE @bc INT;
DECLARE @ec INT;
DECLARE @bfn VARCHAR(1000);
DECLARE @efn VARCHAR(10);

IF (select count(*) FROM sys.traces ) >0
BEGIN
 
-- Get the name of the current default trace
SELECT @filename = CAST(value AS NVARCHAR(1000))
FROM ::fn_trace_getinfo(DEFAULT)
WHERE traceid = 1 AND property = 2;

-- rip apart file name into pieces
SET @filename = REVERSE(@filename);
SET @bc = CHARINDEX('.',@filename);
SET @ec = CHARINDEX('_',@filename)+1;
SET @efn = REVERSE(SUBSTRING(@filename,1,@bc));
SET @bfn = REVERSE(SUBSTRING(@filename,@ec,LEN(@filename)));

-- set filename without rollover number
SET @filename = @bfn + @efn
IF EXISTS (SELECT * FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
               WHERE (EventClass = 92  -- Data File Auto-grow
                   OR EventClass = 93) -- Log File Auto-grow
                  AND StartTime > DATEADD(dy,-7,GETDATE())) 

  BEGIN -- If there are autogrows in the last 7 days
	SELECT * INTO #temp_trc FROM ::fn_trace_gettable(@filename, DEFAULT) AS ftg 
	WHERE (EventClass = 92  -- Data File Auto-grow
        OR EventClass = 93) -- Log File Auto-grow
       AND StartTime > DATEADD(dy,-7,GETDATE())

	Select  CONVERT(varchar,StartTime,102) as grow_date,DatabaseName as database_name,FileName as file_name,te.name as event_class,Duration/1000  as Duration,(tmp.IntegerData*8)/1024.0 as size_MB into #temp_trc2 from #temp_trc AS tmp 
               INNER JOIN sys.trace_events AS te ON tmp.EventClass = te.trace_event_id  

	SELECT grow_date,event_class,database_name,file_name,Count(Duration),Convert(int,Sum(size_MB)) , Sum(Duration) 
	from #temp_trc2 GROUP BY grow_date,database_name,file_name,event_class		
    ORDER BY grow_date, database_name
	
	DROP TABLE #temp_trc2
	DROP TABLE #temp_trc
  END
  END

After that, I can read the trace file to find all growth events (EventClass 92 and 93) without an error due to a missing currently running trace file.
To be sure, I added a policy that checks every day that a currently running trace file is in place.

 

Cet article SQL Server Tips: Default trace enabled but no file is active… est apparu en premier sur Blog dbi services.

SQL Plan Directives strike again

$
0
0

12c SQL Plan Directives, a side effect, a bad application design, a bug… and the occasion to show how to troubleshoot quickly.
An application has had long response times since the 12c migration. No blind guesses, no reason to compare with the previous version: let’s just troubleshoot performance with a methodical approach.

Time Model

As you may already know, I often start from a Statspack or AWR report. I first check what the DB time is used for.

Time Model System Stats DB/Inst: EPNP/EPNP Snaps: 1-8
-> Ordered by % of DB time desc, Statistic name
 
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
DB CPU 3,254.3 101.2
parse time elapsed 2,155.6 67.1
hard parse elapsed time 2,029.8 63.1
sql execute elapsed time 1,007.6 31.3
connection management call elapsed 24.8 .8
PL/SQL compilation elapsed time 0.3 .0
PL/SQL execution elapsed time 0.1 .0
hard parse (sharing criteria) elaps 0.1 .0
sequence load elapsed time 0.0 .0
failed parse elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0

This is hard parsing. Let’s go to SQL sections.

SQL ordered by Parse Calls


SQL ordered by Parse Calls DB/Inst: EPNP/EPNP Snaps: 1-8
-> End Parse Calls Threshold: 1000 Total Parse Calls: 5,229,021
-> Captured SQL accounts for 86.2% of Total Parse Calls
-> SQL reported below exceeded 1.0% of Total Parse Calls
 
% Total Old
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
4,094,739 4,094,712 78.31 3360804353
select default$ from col$ where rowid=:1

This is an internal statement. First idea: check My Oracle Support.
That’s very similar to bug 20907061 HIGH # OF EXECUTIONS FOR RECURSIVE CALL ON COL$ except that this bug is supposed to happen when we have default values larger than 32 bytes and I don’t have default values on this application. Let’s continue investigation.

tkprof

I sql_trace the session at level 4 in order to get the bind values for that rowid, so that I can check which column is concerned.
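
If you want to reproduce this step, here is a hedged sketch of two common ways to get such a level-4 (bind values) trace; the sid/serial# values are placeholders:

-- Minimal sketch, option 1: event 10046 level 4 (binds) in the current session
alter session set events '10046 trace name context forever, level 4';

-- Minimal sketch, option 2: trace another session with DBMS_MONITOR (binds, no waits)
exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 456, waits => false, binds => true);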
Let’s tkprof first to see if my trace shows the same symptoms:

SQL ID: 47r1y8yn34jmj Plan Hash: 2191121161
 
select default$
from
col$ where rowid=:1
 
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 362494 2.43 3.11 0 0 0 0
Execute 362494 14.61 17.13 0 0 0 0
Fetch 362494 2.33 2.82 0 724988 0 362494
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1087482 19.38 23.07 0 724988 0 362494
 
Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 1)
Number of plan statistics captured: 1

This looks like the bug: dictionary information is not kept in the row cache, which means a lot of buffer gets.

Raw trace

From the raw trace I’m interested about the bind value for that statement:

PARSING IN CURSOR #139947924936336 len=40 dep=1 uid=0 oct=3 lid=0 tim=30754233695312 hv=2821867121 ad='13fc1ec80' sqlid='47r1y8yn34jmj'
select default$ from col$ where rowid=:1
END OF STMT
PARSE #139947924936336:c=0,e=99,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2191121161,tim=30754233695310
BINDS #139947924936336:
Bind#0
oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
kxsbbbfp=7f482a5a5250 bln=16 avl=16 flg=05
value=000093EB.000F.0001
EXEC #139947924936336:c=0,e=136,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2191121161,tim=30754233695589
FETCH #139947924936336:c=0,e=62,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=2191121161,tim=30754233695686
STAT #139947924936336 id=1 cnt=1 pid=0 pos=1 obj=21 op='TABLE ACCESS BY USER ROWID COL$ (cr=1 pr=0 pw=0 time=49 us cost=1 size=15 card=1)'
CLOSE #139947924936336:c=0,e=74,dep=1,type=1,tim=30754233695857
PARSE #139947924936336:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2191121161,tim=30754233695986
BINDS #139947924936336:
Bind#0
oacdty=11 mxl=16(16) mxlc=00 mal=00 scl=00 pre=00
oacflg=18 fl2=0001 frm=00 csi=00 siz=16 off=0
kxsbbbfp=7f482a5a5250 bln=16 avl=16 flg=05
value=000093EA.0001.0001
EXEC #139947924936336:c=0,e=104,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2191121161,tim=30754233696151
FETCH #139947924936336:c=1000,e=30,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=2191121161,tim=30754233696212
CLOSE #139947924936336:c=0,e=14,dep=1,type=3,tim=30754233696293

Remember Oracle 7 rowid format? 000093EB.000F.0001 is block number . row number . file number

My Oracle Support

Default value

then let’s get the COL$ (data_object_id of COL$ is 2) row for it:

SQL> select owner,object_name,name,default$ from col$ join dba_objects on obj#=object_id where col$.rowid=DBMS_ROWID.ROWID_CREATE (1,2,1,37866,1);
 
OWNER OBJECT_NAME NAME DEFAULT$
---------- -------------------- ---------------------------------------- ----------------------------------------
XXXX TABLES SYS_STSFZ3ZG3BYVJUHOAOR7U6TO36 SYS_OP_COMBINED_HASH("TYPE","IDENT_PAP")

Okay. I have no default values for the columns I defined, but the system has created its own ones: virtual columns for extended statistics. Column groups are defined as a hash of the columns.

Extensions

And actually I’ve a lot of extensions on that table:

select * from dba_stat_extensions where owner='XXXX' and table_name='TABLES';
 
OWNER TABLE_NAME EXTENSION_NAME EXTENSION CREATO DRO
---------- -------------------- ---------------------------------------- ------------------------------------------------------------ ------ ---
XXXX TABLES SYS_STSAZ#Y734_B7NAVBQRCV_07KJ ("NUMERO","IDENT_PAP") SYSTEM YES
XXXX TABLES SYS_STSFZ3ZG3BYVJUHOAOR7U6TO36 ("TYPE","IDENT_PAP") SYSTEM YES
XXXX TABLES SYS_STSXVAD3SBEYJ$KNF69JWC$QSV ("LOCAL","TYPE","IDENT_PAP") SYSTEM YES
XXXX TABLES SYS_STSW2#AXTKKSSH0MM4NR$E_YS9 ("TYPE","IDENT_PAP","IDENT_PAP_BIS","X","Y") SYSTEM YES
XXXX TABLES SYS_STSNBBWV$TKD1323KXM5YZ7KPL ("LOCAL","IDENT_PAP") SYSTEM YES
XXXX TABLES SYS_STSFCX_6I8KOFMDY_IY3#64I2H ("NUMERO","TYPE","IDENT_PAP","LONGUEUR","NUMERO_BIS") SYSTEM YES
XXXX TABLES SYS_STSR#78FTBPXMUCCTLJSODS846 ("NUMERO","TYPE","IDENT_PAP","IDENT_PAP_BIS") SYSTEM YES
XXXX TABLES SYS_STSV5C$ERVCXVPQ_WWFC$B4FD0 ("TYPE","IDENT_PAP","IDENT_PAP_BIS") SYSTEM YES
XXXX TABLES SYS_STSYPBYX4YV907UXQ5QG4R2N4G ("IDENT_PAP","NIVEAU") SYSTEM YES
 
9 rows selected.

SQL Plan Directives

If you know how SQL Plan directives work, then you know where those column groups are coming from.
If you don’t know, then it’s in Baden (Switzerland) on 2nd of March at 15:00 http://www.soug.ch/events/020316-sougday-agenda.html

Actually, 12c can create a lot of column groups for you:


SQL> select directive_id,type,state,reason from dba_sql_plan_directives where directive_id in(select directive_id from dba_sql_plan_dir_objects where owner='XXXX' and object_name='TABLES' ); 
DIRECTIVE_ID TYPE STATE REASON
-------------------- ---------------- ------------- ------------------------------------
8794114115142958506 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
8117485436508017308 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
17449438648188877549 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
1651449127980030174 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
7306874980619572993 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
15367585934080234683 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
6984979082531240597 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
11591426134547187869 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
7350444383897437289 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
14964006389450232792 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
6994764429265082323 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
7446442670215314532 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
1756676544008628542 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
10941525217270092916 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
219069820824454127 DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE
7377491717956776334 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
233608853702643127 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
2552098580334338460 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
11789724805619074802 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
16211053123545351781 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
15215871106139158771 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
15710833686769870070 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
5189294152039066378 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
798588162556407282 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
13022929319971523184 DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE
722059721778001206 DYNAMIC_SAMPLING NEW JOIN CARDINALITY MISESTIMATE
14359263752097676624 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
8856200028259655090 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
10989662787548036325 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
13019616966644728092 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
9949611300156421363 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
16724606582023948887 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
9529470199340570651 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
14968714134736676769 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
17207666278887909291 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
2085721135336820101 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
3399278268320241269 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
4717659897959055394 DYNAMIC_SAMPLING PERMANENT JOIN CARDINALITY MISESTIMATE
1370532447375937784 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
7480183286602769305 DYNAMIC_SAMPLING PERMANENT GROUP BY CARDINALITY MISESTIMATE
13318940391279153288 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
10981064134896750754 DYNAMIC_SAMPLING NEW SINGLE TABLE CARDINALITY MISESTIMATE
13826770130463026145 DYNAMIC_SAMPLING HAS_STATS SINGLE TABLE CARDINALITY MISESTIMATE
10629713882167842929 DYNAMIC_SAMPLING HAS_STATS GROUP BY CARDINALITY MISESTIMATE
334431847807025603 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
16381543551256728378 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
12085920252231302395 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
14143819104063380925 DYNAMIC_SAMPLING HAS_STATS JOIN CARDINALITY MISESTIMATE
15133868083663596886 DYNAMIC_SAMPLING PERMANENT SINGLE TABLE CARDINALITY MISESTIMATE
 
49 rows selected.

Yes… 49 column groups for only 15 columns…
Maybe you think that this is very good because, without them, the estimations were bad and the execution plans not optimal?
Then let me tell you that this is a migration from 9i, and users were happy with performance before we migrated to 12c ;)

SYS_OP_COMBINED_HASH(column names) produces a default value larger than 32 bytes. Because of the bug, those values are not cached in the row cache, and this increases the parsing time a lot. The application is not using bind variables and parses too much. It was ok in 9i but not in 12c.
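
As a quick way to check whether a system may be exposed (an assumption on my side: the check relies on the stored length of the default/expression text), you can look at the dictionary:

-- Minimal sketch: columns, including hidden extension columns, whose stored
-- default/expression text is longer than 32 bytes
select owner, table_name, column_name, default_length
from   dba_tab_cols
where  default_length > 32
order  by owner, table_name, column_name;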

Conclusion

In this case, the solution is both to change the application to use bind variables (it’s Perl, very easy) and to apply the patch.
I’m sharing this because it’s a good illustration of what side effects can do, as well as a good example of methodical troubleshooting. When you know the reason, without blind guesses, you can address the root cause. If you don’t, the risk is that you add even more side effects.

By the way, please use bind variables…

 

Cet article SQL Plan Directives strike again est apparu en premier sur Blog dbi services.

Documentum Administrator UCF Troubleshooting

$
0
0

Maybe you have had some issues with UCF in DA, like me. I had this happening for no apparent reason for a few days at a customer. The problem was that we use SSL with DA, and the Unified Content Facilities (UCF) wasn’t happy about it.
Thus, in this short blog, I’ll talk about troubleshooting UCF.

The error I got happened when trying to edit, view or create documents; I had a popup saying an error occurred with UCF.

First, we must know our enemy in order to fight it!

UCF stands for Unified Content Facilities. It’s a Java applet made by EMC and used by WDK applications in order to manage and optimize content transfer between the application and your workstation. Thanks to UCF, you can transfer large files with compression and reconnect if the network dropped some packets. The applet is downloaded to your workstation at runtime when you connect to a WDK application.
You can find the UCF configuration in your user folder like follow:
C:\Users\<USER>\Documentum\ucf

Refresh UCF Cache

Before going deeper into the debugging, try to clear the UCF cache first and re-download the latest version from the server. In order to do so, you’ll have to perform the following steps:
Clear your browser cache. If you use IE, go to Tools -> Delete Browsing History (or press CTRL+SHIFT+DEL).
Then check each checkbox and click Delete.

Capture1

Close the browser afterwards.

Now be sure that you don’t have any browser pointing to a wdk application and go to C:\Users\<USER>\Documentum and try deleting the ucf folder.
If you have an error telling you it is already used, open the task manager and search for javaw.exe processes, and then close them down.
You should be able to delete the ucf folder now.

Also clear the cached ucf jar files by opening the java control panel. Go to Control Panel -> search for Java -> General Tab -> Temporary Internet Files -> Settings -> Delete Files

Now test again by opening Documentum Administrator and creating/editing a document. You shouldn’t have a popup error about UCF.

If you reached this point in the blog, that means you didn’t get rid of the problem; neither did I. Well, at this point we made some corrections, but we still don’t know what the real UCF error is; we only get this stack trace saying UCF failed. We can now enable the UCF tracing in order to see if something more interesting is written in the logs.
You can enable the tracing on both sides, the server and your workstation. The easiest is to begin with your workstation so go back to the ucf folder C:\Users\<USER>\Documentum\ucf
Then navigate to <PCNAME>\shared\config and edit ucf.client.config.xml
Add the following options between <configuration/>:

<option name="tracing.enabled">
    <value>true</value>
</option>
<option name="debug.mode">
    <value>true</value>
</option>

Also edit the file: ucf.client.logging.properties by changing .level=WARNING to .level=ALL

Now reproduce the error and check what has been written in C:\Users\<USER>\Documentum\Logs

If you can’t see what the problem is you can also activate the tracing on the webserver by editing the same way: ../WEB-INF/classes/ucf.server.config.xml but note that you need to restart the webserver for it to take effect.

The errors in the generated log should allow you to find the real cause of the ucf error. In my case it was the SSL handshake that was failing.

 

Cet article Documentum Administrator UCF Troubleshooting est apparu en premier sur Blog dbi services.

Documentum story – dm_LogPurge and dfc.date_format

$
0
0

What is the relation between dfc.date_format and dm_LogPurge? This is the question we had to answer as we hit an issue. An issue with the dm_LogPurge job.
As usual, once a repository has been created, we configure several Documentum jobs for the housekeeping.
One of them is dm_LogPurge. It is configured to run once a day with a cutoff_days of 90 days.
So all ran fine until we did another change.
On request of an application team, we had to change the dfc.date_format to dfc.date_format=dd/MMM/yyyy HH:mm:ss to allow the D2 clients to use months in letters and not digits.
This change fulfilled the application requirement, but since that day, the dm_LogPurge job started to remove too many log files (not to write ALL). :(

So let’s explain how we proceeded to find out the reason for the issue and, more importantly, the solution to avoid it.
We were informed not by seeing that too many files had been removed, but by checking the repository log file. BTW, this file is checked automatically using Nagios with our own dbi scripts. So in the repository log file we had errors like:

2016-04-11T20:30:41.453453      16395[16395]    01xxxxxx80028223        [DM_OBJ_MGR_E_FETCH_FAIL]error:   "attempt to fetch object with handle 06xxxxxx800213d2 failed "
2016-04-11T20:30:41.453504      16395[16395]    01xxxxxx80028223        [DM_SYSOBJECT_E_CANT_GET_CONTENT]error:   "Cannot get  format for 0 content of StateOfDocbase sysobject. "
2016-04-11T20:26:10.157989      14679[14679]    01xxxxxx80028220        [DM_OBJ_MGR_E_FETCH_FAIL]error:   "attempt to fetch object with handle 06xxxxxx800213c7 failed "
2016-04-11T20:26:10.158059      14679[14679]    01xxxxxx80028220        [DM_SYSOBJECT_E_CANT_GET_CONTENT]error:   "Cannot get  format for 0 content

 

Based on the timestamp, I saw that the issue could be related to dm_LogPurge. So I checked the job log file as well as the folders which are cleaned out. In the folder, all old log files had been removed:

[dmadmin@content_server_01 log]$ date
Wed Apr 13 06:28:35 UTC 2016
[dmadmin@content_server_01 log]$ pwd
$DOCUMENTUM/dba/log
[dmadmin@content_server_01 log]$ ls -ltr REPO1*
lrwxrwxrwx. 1 dmadmin dmadmin      34 Oct 22 09:14 REPO1 -> $DOCUMENTUM/dba/log/<hex docbaseID>/
-rw-rw-rw-. 1 dmadmin dmadmin 8540926 Apr 13 06:28 REPO1.log

 

To have more information, I set the trace level of the dm_LogPurge job to 10 and analyzed the trace file.
In the trace file we had:

[main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9276326.get( "get,c,sessionconfig,r_date_format ") ==> "31/1212/1995 24:00:00 "
[main] com.documentum.dmcl.impl.DmclApiNativeAdapter@9276326.get( "get,c,08xxxxxx80000362,method_arguments[ 1] ") ==> "-cutoff_days 90 "

 

So why did we have 31/1212/1995?

Using the API, I confirmed an issue related to the date format:

API> get,c,sessionconfig,r_date_format
...
31/1212/1995 24:00:00

API> ?,c,select date(now) as dateNow from dm_server_config
datenow
-------------------------
14/Apr/2016 08:36:52

(1 row affected)

 

Date format? As all our changes are documented, I easily found that we had changed the dfc.date_format for the D2 application.
By cross-checking with another installation, used by another application where we did not change the dfc.date_format, I could confirm that the issue was related to this dfc parameter change.

Without dfc.date_format in dfc.properties:

API> get,c,sessionconfig,r_date_format
...
12/31/1995 24:00:00

API> ?,c,select date(now) as dateNow from dm_server_config
datenow
-------------------------
4/14/2016 08:56:13

(1 row affected)

 

Just to be sure that I did not miss something, I also checked that not all log files were removed after starting the job manually. They were still there.
Now the solution would be to roll back the dfc.date_format change, but this would only help the platform and not the application team. As the initial dfc.date_format change was validated by EMC, we had to find a solution for both teams.

After investigating, we found the final solution:
Add dfc.date_format=dd/MMM/yyyy HH:mm:ss in the dfc.properties file of the ServerApps (so directly in the JMS!)

With this solution, the dm_LogPurge job does not remove too many files anymore, and the application team can still use the months written in letters in its D2 applications.

 

 

Cet article Documentum story – dm_LogPurge and dfc.date_format est apparu en premier sur Blog dbi services.


Documentum story – User gets kicked out of D2 when navigating in D2Config

$
0
0

On a customer site, one of the users complained about being kicked out each time he wanted to create new documents in D2. This issue happens in a default deployment of D2 and D2Config in a WebLogic domain. We found out that the user sessions for D2Config and D2 conflict with each other.
This issue occurs if the D2Config and D2 applications are opened in the same browser using different tabs and the user navigates from D2 to D2Config and vice versa.
The error message is misleading, as it shows a session timeout although the user has just signed in.

D2 session timeout

Investigations:
Using an HTTP header tracing tool, we saw that the JSESSIONID cookie, which is the cookie storing the HTTP session for the Java applications, changes when switching from one application to the other. This showed us that both Java applications were using the same session cookie, which leads to the session being lost.

Workaround or Fix:
An easy fix for this is to update the D2 weblogic.xml file included in the D2.war file with a section defining a new session cookie name as shown below:

<session-descriptor>
      <cookie-name>JSESSIONID_D2</cookie-name>
      <cookie-http-only>false</cookie-http-only>
</session-descriptor>

To proceed, follow the steps below:

  1. Extract the weblogic.xml file from the war file
    mkdir mytemp
    cd mytemp
    -- Put the D2.war file here
    jar xvf D2.war WEB-INF/weblogic.xml
  2. Edit the file and add the session-descriptor block above just after the description closing tag.
  3. Update the D2.war file with the new weblogic.xml
    jar uvf D2.war WEB-INF/weblogic.xml
  4. And finally redeploy the D2.war file to the WebLogic Server.

This fix has been submitted to and validated by EMC support.

 

Cet article Documentum story – User gets kicked out of D2 when navigating in D2Config est apparu en premier sur Blog dbi services.

Observing DML restarts caused by invalidations

$
0
0

Usually, cursor invalidations do not interrupt the current execution of the cursor. You can even drop a table while a query on it is running, and the query may end successfully. However, some invalidations will throw internal errors that are caught, and the cursor execution restarts transparently. I’ve reproduced one here to show the symptoms: each restart gets a different SQL_EXEC_ID, so the restarts are not accounted within the same SQL Monitoring report.

I had that question because I encountered a SQL Monitoring report where the ‘executions’ count of the INSERT operation was more than one. I wondered whether it could be a case of DML restart, or whether a DML restart should be accounted under a different SQL_EXEC_ID. I got a quick answer from a very reliable source:

However for my better understanding, I’m now reproducing a case of DML restart to show the symptoms on V$SQL and SQL Monitor.

I need a table with some rows:

SQL> create table DEMO as select rownum n from xmltable('1 to 3');
Table created.

Initially my problem was with an insert into a GTT but now I’m reproducing the case with a partitioned table:

SQL> create table DEMOGTT (n number) partition by hash(n);
Table created.

For internal reasons, when a cursor is invalidated after it has acquired a lock on a partition, a DML restart occurs. If you have a doubt, “oerr ora 14403” explains that.

The easiest way to invalidate the cursor while it is running is to call a function that does it for each row. And as you will see, it can be restarted indefinitely, so I set a timeout timestamp after 30 seconds.


SQL> column limit new_value limit
SQL> select to_char(sysdate+30/24/60/60,'yyyymmddhh24miss') limit from dual;
 
LIMIT
--------------
20161124212154
 
SQL> create function DEMOSLOW(n number) return number as
2 pragma autonomous_transaction;
3 begin
4 if sysdate < to_date('&limit','yyyymmddhh24miss')
5 then
6 dbms_stats.gather_table_stats(user,'DEMOGTT',no_invalidate=>false);
7 end if;
8 return n;
9 end;
10 /
old 4: if sysdate < to_date('&limit','yyyymmddhh24miss')
new 4: if sysdate < to_date('20161124212154','yyyymmddhh24miss')
 
Function created.

Time to run the insert, calling the function for each row:

SQL> alter session set statistics_level=all;
Session altered.
 
SQL> set timing on time on
21:21:24 SQL> insert /*+ monitor */ into DEMOGTT select n from DEMO where n=DEMOSLOW(n);
3 rows created.
 
Elapsed: 00:00:29.30
21:21:54 SQL>

This has taken 30 seconds. Without the timeout, it never stops.

The execution plan for the last run shows only one ‘Starts’:

21:21:54 SQL> select * from table(dbms_xplan.display_cursor('','','allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID 3fk1hyq9vnuzx, child number 0
-------------------------------------
insert /*+ monitor */ into DEMOGTT select n from DEMO where
n=DEMOSLOW(n)
 
Plan hash value: 4000794843
 
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | | 0 |00:00:00.01 | 138 |
| 1 | LOAD TABLE CONVENTIONAL | DEMOGTT | 1 | | 0 |00:00:00.01 | 138 |
|* 2 | TABLE ACCESS FULL | DEMO | 1 | 1 | 3 |00:00:00.01 | 9 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
2 - filter("N"="DEMOSLOW"("N"))

This is proof that DML restarts are not accounted within the same execution: the ‘last’ statistics are only from the last restart.

V$SQL accounts for only one execution but 1571 invalidations:

21:21:54 SQL> select executions,invalidations from v$sql where sql_id='3fk1hyq9vnuzx';
 
EXECUTIONS INVALIDATIONS
---------- -------------
1 1571

ASH shows the different SQL_EXEC_ID:

21:21:54 SQL> select sample_time,sql_id,sql_exec_id from v$active_session_history where sql_id='3fk1hyq9vnuzx' and sample_time>sysdate-10/24/60/60 order by sample_time desc;
 
SAMPLE_TIME SQL_ID SQL_EXEC_ID
--------------------------------------------------------------------------- ------------- -----------
24-NOV-16 09.21.53.773 PM 3fk1hyq9vnuzx 16778774
24-NOV-16 09.21.51.773 PM 3fk1hyq9vnuzx
24-NOV-16 09.21.48.773 PM 3fk1hyq9vnuzx 16778501
24-NOV-16 09.21.46.773 PM 3fk1hyq9vnuzx 16778396
24-NOV-16 09.21.45.773 PM 3fk1hyq9vnuzx 16778341

and SQL Monitoring sees different executions:

21:21:54 SQL> select sql_id,sql_exec_id,status,last_refresh_time,plan_operation,starts from v$sql_plan_monitor where sql_id='3fk1hyq9vnuzx' and plan_line_id=0 order by last_refresh_time desc fetch first 10 rows only;
 
SQL_ID SQL_EXEC_ID STATUS LAST_REFRESH_TI PLAN_OPERATION STARTS
------------- ----------- ------------------- --------------- ------------------------------ ----------
3fk1hyq9vnuzx 16778762 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778765 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778767 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778766 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778771 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778769 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778764 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778770 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778763 DONE 24-nov 21:21:53 INSERT STATEMENT 1
3fk1hyq9vnuzx 16778768 DONE 24-nov 21:21:53 INSERT STATEMENT 1

So those are the symptoms of DML restart. And my initial problem is still there:

  • This test case does not reproduce the issue on a GTT
  • My issue on the GTT had only one SQL_EXEC_ID with multiple ‘starts’ of the plan operations.

Any comments greatly appreciated :)

 

Cet article Observing DML restarts caused by invalidations est apparu en premier sur Blog dbi services.

Documentum story – How to avoid “DFC_FILE_LOCK_ACQUIRE_WARN” messages in Java Method Server (jms) LOG

$
0
0

Ref: EMC article number 000335987.
The last publication date is Sat Feb 20 21:39:14 GMT 2016. Here is the link: https://support.emc.com/kb/335987

After upgrading from 6.7.x to 7.2, the following warning message is logged in JMS log files: com.documentum.fc.common.DfNewInterprocessLockImpl – [DFC_FILE_LOCK_ACQUIRE_WARN] Failed to acquire lock proceeding ahead with no lock java.nio.channels.OverlappingFileLockException at sun.nio.ch.SharedFileLockTable.checkList FileLockTable.java:255)

In order to avoid this warning, EMC has provided a solution (SR #69856498) that will be described below:

By default, the ACS and ServerApps dfc.properties point to $DOCUMENTUM_SHARED/config/dfc.properties.

The solution is to add a separate ‘dfc.data.dir’ cache folder location in the ACS and ServerApps dfc.properties.
After a Java Method Server restart, two separate cache folders are created inside $DOCUMENTUM_SHARED/jboss7.1.1/server, and the WARNING messages are gone from acs.log.

In fact, this is just a warning that someone else has acquired a lock on the physical file (in this case dfc.keystore). Since ServerApps (Method Server) and ACS invoke DFC simultaneously, both try to acquire a lock on the dfc.keystore file, and Java throws an OverlappingFileLockException. DFC then warns that it could not lock the file and proceeds without the lock. Ideally, this should be just an info message in this case, where the file lock is acquired for read-only access. But the same logic is used by other functionality, like registry updates and BOF cache updates, where this failure should be treated as a genuine warning or error. Going forward, engineering will have to correct this code by taking appropriate actions for each functionality. There is no functional impact in using a different data directory folder.

Please proceed as below to solve It:

  • Login to the Content Server
  • Change the current user to dmadmin :(administrator account)
  • Create some folders using:
 mkdir $DOCUMENTUM_SHARED/acs
 mkdir $DOCUMENTUM_SHARED/ServerApps
 mkdir $DOCUMENTUM_SHARED/bpm

 

  • Update all necessary dfc.properties files (with vi editor):

===============================================================================================================================

$DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments/acs.ear/lib/configs.jar/dfc.properties

⇒ Add at the end of this file the following line:

dfc.data.dir=$DOCUMENTUM_SHARED/acs

===============================================================================================================================

$DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments/ServerApps.ear/APP-INF/classes/dfc.properties

⇒ Add at the end of this file the following line:

dfc.data.dir=$DOCUMENTUM_SHARED/ServerApps

===============================================================================================================================

$DOCUMENTUM_SHARED/jboss7.1.1/server/DctmServer_MethodServer/deployments/bpm.ear/APP-INF/classes/dfc.properties

⇒ Add at the end of this file the following line:

dfc.data.dir=$DOCUMENTUM_SHARED/bpm

===============================================================================================================================

  • Verify that the recently created folders are empty using:
cd $DOCUMENTUM_SHARED
ls -l acs/ ServerApps/ bpm/

 

  • Restart the JMS using:
sh -c "cd $DOCUMENTUM_SHARED/jboss7.1.1/server;./stopMethodServer.sh"
sh -c "$DOCUMENTUM_SHARED/jboss7.1.1/server/startMethodServer.sh"

 

Verification

  • Verify that the recently created folders are now populated with default files and folders using:
cd $DOCUMENTUM_SHARED
ls -l acs/ ServerApps/ bpm/

The folders must not be empty anymore.

  • Disconnect from the Content Server.

 

Using this procedure, you won’t see this WARNING message anymore.
Regards,

Source : EMC article number : 000335987

 

Cet article Documentum story – How to avoid “DFC_FILE_LOCK_ACQUIRE_WARN” messages in Java Method Server (jms) LOG est apparu en premier sur Blog dbi services.

SQL Server 2016: New Dynamic Management Views (DMVs)

$
0
0

In SQL Server 2016, you will discover a lot of new Dynamic Management Views (DMVs).
In this article, I will just give you a little overview of these views, which are useful for us as DBAs.

SQL Server 2012 has 145 DMVs and SQL Server 2014 has 166 DMVs.
Now, SQL Server 2016 has 185 DMVs.

How to see it?

It is very easy to have a look using the sys.all_objects view:

SELECT * FROM sys.all_objects WHERE type='V' AND name LIKE 'dm_%' ORDER BY name ASC

DMV_SQL2016

From SQL Server 2012 to SQL Server 2014, we can notice that a lot of new DMVs come with the In-Memory technology, with names like “dm_xtp_xxxxxxxx” or “dm_db_xtp_xxxxxxxx”.

In SQL Server 2016, a lot of new “dm_exec_xxxxxxxx” views are present.

All definitions for these views come from the Microsoft documentation or web site.

To begin, you will see 10 DMVs for the PolyBase technology, for example:

  • dm_exec_compute_node_status
  • dm_exec_dms_workers

A useful MSDN page summarizes all the DMVs for these new views here

Other dm_exec_xxx views are really useful, like:

  • dm_exec_query_optimizer_memory_gateways
    • Returns the current status of resource semaphores used to throttle concurrent query optimization.
    • Microsoft Reference here
  • dm_exec_session_wait_stats
    • Returns information about all the waits encountered by threads that executed for each session (see the example after this list)
    • Microsoft Reference here
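
As a small illustration of this last one, here is a hedged example (column names taken from the documentation) showing the waits of the current session only:

-- Minimal sketch: wait statistics scoped to the current session (SQL Server 2016),
-- as opposed to the instance-wide sys.dm_os_wait_stats
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID
ORDER BY wait_time_ms DESC;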

3 new DMVs for the Columnstore technology:

  • dm_column_store_object_pool
  • dm_db_column_store_row_group_operational_stats
    • Returns current row-level I/O, locking, and access method activity for compressed rowgroups in a columnstore index.
    • Microsoft Reference here
  • dm_db_column_store_row_group_physical_stats
    • Provides current rowgroup-level information about all of the columnstore indexes in the current database
    • Microsoft Reference here

2 new DMVs for Stretch Databases in the database context, with rda standing for remote data archive, for example:

  • dm_db_rda_migration_status
    • For the current database, list of state information of the remote data archive schema update task.
    • Microsoft Reference here

This list can change when a Service Pack is applied.
It is just there to give you a little reference about these useful views! 8-)

 

Cet article SQL Server 2016: New Dynamic Management Views (DMVs) est apparu en premier sur Blog dbi services.

Unstructured vs. structured

$
0
0

The title of this blog post was: “Tracing DBMS_RCVMAN for reclaimable archivelogs” until I started to write the conclusion…

In a previous post I mentioned that there’s a bug with archivelog deletion policy when you want to mention both the ‘BACKED UP … TIMES TO …’ and ‘APPLIED’ or ‘SHIPPED’ as conditions for archived logs to be reclaimable. I opened a SR, they didn’t even try to reproduce it (and I can guarantee you can reproduce it in 2 minutes on any currently supported version) so I traced it myself to understand the bug and suggest the fix.

I traced the DBMS_RCVMAN with Kernel Recovery Area function SQL Tracing:

SQL> alter session set events 'trace[kra_sql] disk high, memory disable';
SQL> exec dbms_backup_restore.refreshAgedFiles;
SQL> alter session set events 'trace[kra_sql] off';

I know that refreshAgedFiles checks for reclaimable files in the FRA, since there was an old bug where we had to run it manually on databases in mount.

I compared the traces when changing the order of ‘APPLIED’ and ‘BACKED UP’ and found the following:

< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5340c5340
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5343c5343
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY with alldest = 1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK with alldest = 1
5350,5351c5350,5351
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK , backups=1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK, backups=1
5363c5363
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with TRUE
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with key = 128 stamp = 958068130
5367c5367
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles with: no_data_found
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles

You see at the top the difference in the way I mentioned the deletion policy. You see at the bottom that the first one (starting with ‘BACKED UP’) didn’t find archivelogs being backed up (no_data_found). But the second one (starting with ‘APPLIED’) mentioned the sequence# 128.

But if you look carefully, you see another difference in the middle: the “devtype=DISK” has an additional space before the comma in the first case.

So I traced a bit further, including SQL_TRACE, and I found that the deletion policy is parsed with some INSTR and SUBSTR calls on the deletion policy text to find the policy, the number of backups, and the device type. For sure, looking for backups with DEVICE_TYPE=’DISK ‘ instead of ‘DISK’ will not find anything, and this is the reason for the bug: no archived logs backed up means no archived logs reclaimable.

If you look closer at DBMS_RCVMAN, you will find that the device type is extracted with SUBSTR(:1, 1, INSTR(:1, ‘ ‘)) when the device type is followed by a space, which is the reason for this additional space. The correct extraction should be SUBSTR(:1, 1, INSTR(:1, ‘ ‘)-1), and this is what I suggested on the SR.
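
To visualize the off-by-one, here is a small sketch on an illustrative string (the exact text parsed internally by DBMS_RCVMAN is an assumption; only the arithmetic matters):

-- Minimal sketch: INSTR finds the space at position 5, so without the -1 the
-- extracted device type is 'DISK ' (5 characters, trailing space included)
select substr('DISK APPLIED ON ALL STANDBY', 1, instr('DISK APPLIED ON ALL STANDBY', ' '))     as buggy_extraction,
       substr('DISK APPLIED ON ALL STANDBY', 1, instr('DISK APPLIED ON ALL STANDBY', ' ') - 1) as correct_extraction
from dual;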

So what?

Writing the conclusion made me change the title. Currently, a lot of people are advocating unstructured data, because it is easy (which rhymes with ‘lazy’): store information as it comes and postpone the parsing into a more structured data type until you need to process it. This seems to be how the RMAN configuration is stored: as the text we entered. And it is parsed later with simple text functions such as INSTR(), SUBSTR(), and LIKE. But you can see how a little bug, such as reading one additional character, has big consequences. If you look at the archivelog deletion policy syntax, you have a 50% chance of running into this bug on a Data Guard configuration. The recovery area will fill up and your database will be blocked. The controlfile grows. Or you notice it before and run a ‘delete archivelog’ statement without knowing the reason, wasting space by removing some recovery files from local storage which could have been kept longer. If the deletion policy were parsed immediately when entered, like SQL DDL or PL/SQL APIs, the issue would have been detected a long time ago. Structure and strong typing are the way to build robust applications.

 

Cet article Unstructured vs. structured est apparu en premier sur Blog dbi services.
