Tuesday, July 9, 2013

Linux - Replacing LUNs managed by ASM


Current situation:

[root@pobi02 ~]# sanlun lun show
controller(7mode)/ device host lun
vserver(Cmode) lun-pathname filename adapter protocol size mode
---------------------------------------------------------------------------------------------------------
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdb host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdc host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdd host3 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sde host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdf host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdg host3 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdh host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdi host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdj host3 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdk host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdl host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdm host3 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdn host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdo host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdp host4 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdq host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdr host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sds host4 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdt host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdu host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdv host4 FCP 100g 7
fas3160b /vol/pobi02_data/pobi02_data_lun /dev/sdw host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdx host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdy host4 FCP 100g 7


I force a rescan of the HBAs on both nodes, but one HBA at a time, taking care to wait for its link to come back online before launching the rescan on the other path. I can check the status using SCLI:


[root@pobi01 ~]# scli
Scanning QLogic FC HBA(s) and device(s), please wait...
-
SANsurfer FC/CNA HBA CLI

v1.7.3 Build 32

Main Menu

1: General Information
2: HBA Information
3: HBA Parameters
4: Target/LUN List
5: iiDMA Settings
6: Boot Device
7: Utilities
8: Beacon
9: Diagnostics
10: Statistics
11: Virtual
12: FCoE
13: Help
14: Exit


Enter Selection: 4

SANsurfer FC/CNA HBA CLI

v1.7.3 Build 32

Target List Menu

HBA Model QMI8142
1: Port 1: WWPN: 21-00-00-C0-DD-1D-01-B1 Link Down
2: Port 2: WWPN: 21-00-00-C0-DD-1D-01-B3 Link Down
HBA Model QMI2572
3: Port 1: WWPN: 21-00-00-1B-32-9B-8E-4D Loop Down
4: Port 2: WWPN: 21-01-00-1B-32-BB-8E-4D Online
5: All HBAs
6: Return to Previous Menu



In this case the rescan is in progress on HBA 3, as shown by its Loop Down state.


echo "1" > /sys/class/fc_host/host4/issue_lip

...


echo "1" > /sys/class/fc_host/host3/issue_lip
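
The same link state is also exposed through sysfs, so the wait between the two rescans can be scripted instead of being polled by hand in SCLI. A minimal sketch, assuming the standard fc_host sysfs layout and the host3/host4 adapters seen above (to be run on each node in turn):

#!/bin/bash
# Rescan one HBA at a time: issue a LIP, give the link time to bounce,
# then wait until the port reports Online again before touching the
# other path.
for host in host4 host3; do
    echo "1" > /sys/class/fc_host/$host/issue_lip
    sleep 10
    until grep -q Online /sys/class/fc_host/$host/port_state; do
        sleep 5
    done
done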

Once the rescan has been performed on both paths, we get the following:


[root@pobi01 ~]# sanlun lun show
controller(7mode)/ device host lun
vserver(Cmode) lun-pathname filename adapter protocol size mode
--------------------------------------------------------------------------------------------------------------------
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdaw host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdav host3 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdau host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdat host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdas host3 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdar host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdaq host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdap host3 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdao host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdan host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdam host3 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdal host3 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdak host4 FCP 100g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdai host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdaj host4 FCP 10g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdah host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdag host4 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdaf host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdae host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdad host4 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdac host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdab host4 FCP 100g 7
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdaa host4 FCP 10g 7
netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdz host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdy host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdx host4 FCP 10g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdw host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdv host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdu host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdt host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sds host4 FCP 10g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdr host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdp host3 FCP 100g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdq host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdo host3 FCP 10g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdn host3 FCP 100g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdb host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdc host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdd host3 FCP 100g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sde host3 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdf host3 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdg host3 FCP 100g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdh host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdi host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdj host4 FCP 100g 7
fas3160b /vol/pobi01_data/pobi01_data_lun /dev/sdk host4 FCP 100g 7
fas3160b /vol/pobiX_share/pobiX_share_lun /dev/sdl host4 FCP 10g 7
fas3160b /vol/pobiX_share/pobiX_share_lun2 /dev/sdm host4 FCP 100g 7

We check the HBA status again, via scli:


HBA Model QMI2572
3: Port 1: WWPN: 21-00-00-1B-32-9B-8E-4D Online
4: Port 2: WWPN: 21-01-00-1B-32-BB-8E-4D Online

Path redundancy is now guaranteed again. Perfect.
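
Besides SCLI, the number of active paths per LUN can also be cross-checked from the host side via the device mapper (output omitted here):

[root@pobi01 ~]# multipath -ll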


- IDENTIFYING THE LUNS BY WWID

We obtain the WWID of the LUNs using scsi_id -gus:


[root@pobi01 ~]# scsi_id -gus /block/sdau
360a980004176427a302443427a71316f
[root@pobi01 ~]# scsi_id -gus /block/sdav
360a980004176427a302443427a713171
[root@pobi01 ~]# scsi_id -gus /block/sdaw
360a980004176427a302443427a713173

which correspond to:


netapp1 /vol/N1_pobi01_data/N1_pobi01_data_lun0 /dev/sdau host3 FCP 100g (PRIVATE DISK FOR THE /u01 FS)
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun0 /dev/sdav host3 FCP 10g (ASM DISK FOR THE VOTING DISK)
netapp1 /vol/N1_pobiX_share/N1_pobiX_share_data_lun1 /dev/sdaw host3 FCP 100g (ASM DISK FOR THE ACFS FS)

We also verify WWID consistency on the other node:


[root@pobi02 ~]# scsi_id -gus /block/sdau
360a9800041764752752b434151713942
[root@pobi02 ~]# scsi_id -gus /block/sdav
360a980004176427a302443427a713171
[root@pobi02 ~]# scsi_id -gus /block/sdaw
360a980004176427a302443427a713173

Naturally the first disk is different, so its WWID is too (!). The remaining ones match, since they are the shared LUNs.
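
To cross-check every path at once, the same call can be wrapped in a loop over all the SCSI disks; a quick sketch using the same scsi_id -gus syntax as above:

# Print the WWID of every sdX device, to spot which paths belong
# to the same LUN on each node.
for d in $(ls /sys/block | grep '^sd'); do
    echo "$d: $(scsi_id -gus /block/$d)"
done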

I modify the multipath.conf file on both nodes: I save a backup copy, then edit it, adding the sections for the new disks (here uid/gid 290 are the numeric IDs of the grid user and the asmadmin group that own the ASM devices):


cp multipath.conf multipath.conf.20130508.old

multipath {
    wwid                 360a980004176427a302443427a713171
    alias                asm_vote_0
    path_grouping_policy group_by_prio
    path_checker         readsector0
    path_selector        "round-robin 0"
    failback             immediate
    polling_interval     10
    no_path_retry        2
    mode                 0660
    uid                  290
    gid                  290
}
multipath {
    wwid                 360a980004176427a302443427a713173
    alias                asm_acfs_0
    path_grouping_policy group_by_prio
    path_checker         readsector0
    path_selector        "round-robin 0"
    failback             immediate
    polling_interval     10
    no_path_retry        2
    mode                 0660
    uid                  290
    gid                  290
}
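
Before restarting the daemon, the new stanzas can be sanity-checked with a multipath dry run, which parses the configuration and prints the maps it would create without actually changing anything:

[root@pobi01 etc]# multipath -v2 -d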

We save and restart the multipathd service on both nodes:


[root@pobi01 etc]# service multipathd restart
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]


And we verify the names assigned by the device mapper to the disks we are interested in:


[root@pobi01 etc]# multipath -l
mpath1 (360a980004176427a302443427a71316f) dm-16 NETAPP,LUN
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
...
...
asm_acfs_0 (360a980004176427a302443427a713173) dm-18 NETAPP,LUN
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]
...
...
asm_vote_0 (360a980004176427a302443427a713171) dm-17 NETAPP,LUN
[size=10G][features=1 queue_if_no_path][hwhandler=0][rw]
...
...

- REPLACING THE PRIVATE DISK HOSTING THE /u01 FS

We start by replacing the disk outside ASM's control, i.e. the disk hosting the /u01 filesystem; in our case the device involved is mpath1.


[root@pobi01 etc]# pvcreate /dev/mapper/mpath1
Physical volume "/dev/mapper/mpath1" successfully created


We check the characteristics of the VG containing the disk to be replaced, i.e. vg_obi:


[root@pobi01 etc]# vgdisplay --verbose vg_obi
Using volume group(s) on command line
Finding volume group "vg_obi"
--- Volume group ---
VG Name vg_obi
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 100.00 GB
PE Size 4.00 MB
Total PE 25599
Alloc PE / Size 25344 / 99.00 GB
Free PE / Size 255 / 1020.00 MB
VG UUID pZ01Vb-54j0-MAAc-7cef-n9xI-oa36-LCaLsh

--- Logical volume ---
LV Name /dev/vg_obi/lv_u01
VG Name vg_obi
LV UUID HI5rYX-2fwS-1uq9-7c2c-rPkZ-33yU-MyQoRX
LV Write Access read/write
LV Status available
# open 1
LV Size 99.00 GB
Current LE 25344
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:15

--- Physical volumes ---
PV Name /dev/mpath/360a98000486e64376334696c6f487550
PV UUID cmhntD-hM2t-MezU-4ELc-exV5-omhR-hRW3px
PV Status allocatable
Total PE / Free PE 25599 / 255

and we note that it contains a single PV, whose WWID corresponds to



mpath0 (360a98000486e64376334696c6f487550) dm-12 NETAPP,LUN
[size=100G][features=1 queue_if_no_path][hwhandler=0][rw]

so we add the new PV to the VG:


[root@pobi01 etc]# vgextend vg_obi /dev/mapper/mpath1
Volume group "vg_obi" successfully extended

and I run pvmove to migrate the contents of the old PV onto the new one:


[root@pobi01 etc]# pvmove /dev/mpath/360a98000486e64376334696c6f487550 /dev/mpath/mpath1
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 0.1%
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 1.8%
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 3.6%
...
...
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 98.3%
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 99.6%
/dev/mpath/360a98000486e64376334696c6f487550: Moved: 100.0%
[root@pobi01 etc]#
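
As a side note, pvmove works through a temporary mirror, so if the migration is interrupted (for instance by a reboot) it can be resumed from where it left off by simply running pvmove again with no arguments:

[root@pobi01 etc]# pvmove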

We verify that the old PV has no PEs in use:


--- Physical volumes ---
PV Name /dev/mpath/360a98000486e64376334696c6f487550
PV UUID cmhntD-hM2t-MezU-4ELc-exV5-omhR-hRW3px
PV Status allocatable
Total PE / Free PE 25599 / 25599

PV Name /dev/mpath/mpath1
PV UUID fgUfVg-8iAD-fmlO-9D6J-j5XP-APB6-S48H0R
PV Status allocatable
Total PE / Free PE 25599 / 255


and we proceed with vgreduce:


[root@pobi01 etc]# vgreduce vg_obi /dev/mpath/360a98000486e64376334696c6f487550
Removed "/dev/mpath/360a98000486e64376334696c6f487550" from volume group "vg_obi"


- REPLACING THE ACFS DISKS

We verify the presence of the block devices under /dev/mapper:

[root@pobi02 mapper]# ls -la
total 0
drwxr-xr-x 2 root root 440 May 9 08:45 .
drwxr-xr-x 15 root root 5720 May 9 08:45 ..
brw-rw---- 1 root disk 253, 18 May 8 15:49 asm_acfs_0
brw-rw---- 1 grid asmadmin 253, 14 Apr 24 14:54 asm_acfs_1
brw-rw---- 1 root disk 253, 17 May 8 15:49 asm_vote_0
brw-rw---- 1 grid asmadmin 253, 13 May 9 08:45 asm_vote_1 ...


We change the ownership of the new devices:

[root@pobi02 mapper]# chown grid.asmadmin /dev/mapper/asm_acfs_0
[root@pobi02 mapper]# chown grid.asmadmin /dev/mapper/asm_vote_0
[root@pobi02 mapper]# ls -la
total 0
drwxr-xr-x 2 root root 440 May 9 08:45 .
drwxr-xr-x 15 root root 5720 May 9 08:45 ..
brw-rw---- 1 grid asmadmin 253, 18 May 8 15:49 asm_acfs_0
brw-rw---- 1 grid asmadmin 253, 14 Apr 24 14:54 asm_acfs_1
brw-rw---- 1 grid asmadmin 253, 17 May 8 15:49 asm_vote_0
brw-rw---- 1 grid asmadmin 253, 13 May 9 08:48 asm_vote_1


Everything is now ready to configure the new disks and add them to the DGs.

I check the xauth list:

[root@pobi01 etc]# xauth list
pobi01/unix:11 MIT-MAGIC-COOKIE-1 4a0344ce5352618d2c37575c4710dcec

I become the grid user:

[root@pobi01 etc]# su - grid
[grid@pobi01 ~]$ export DISPLAY=localhost:11.0
[grid@pobi01 ~]$ xauth add pobi01/unix:11 MIT-MAGIC-COOKIE-1 4a0344ce5352618d2c37575c4710dcec

I launch ASMCA to configure the DGs:

[grid@pobi01 ~]$ asmca


AsmSostDsk1.png


I select VOTEDG; right-clicking opens a menu, where I choose Add Disks.


AsmSostDsk2.png


I select the disk to add.


AsmSostDsk3.png


Addition complete.


AsmSostDsk5.png


I select ACFSDG; right-clicking opens a menu, where I choose Add Disks.


AsmSostDsk6.png


AsmSostDsk7.png


Addition complete. The disk groups now have to rebalance onto the new disks; once we are sure no rebalance activity is still in progress, we proceed with dropping the old disks from the DGs. I can check the progress of the rebalance with the following SQL statements:


[grid@pobi02 ~]$ export ORACLE_SID=+ASM2
[grid@pobi02 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Thu May 9 09:10:54 2013

Copyright (c) 1982, 2011, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from gv$asm_operation;

INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
---------- ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
2 1 REBAL WAIT 1
1 1 REBAL RUN 1 1 48045 50726 9027 0



SQL> r
1 select * from gv$asm_operation
2*

no rows selected

SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
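
For the record, the same add/drop operations can also be performed from this SQL*Plus session instead of ASMCA. A sketch for the voting DG, where the old disk name VOTEDG_0000 is hypothetical and should be taken from V$ASM_DISK:

SQL> ALTER DISKGROUP VOTEDG ADD DISK '/dev/mapper/asm_vote_0';
SQL> ALTER DISKGROUP VOTEDG DROP DISK VOTEDG_0000;

Both statements start an automatic rebalance, whose speed can be tuned with the REBALANCE POWER clause.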



Now I proceed with dropping the old disks: I right-click VOTEDG and choose Drop Disks.


AsmSostDsk8.png


I select the disk to remove from the DG.


AsmSostDsk9.png


Removal complete; we repeat the operation with the other DG.


AsmSostDsk10.png

AsmSostDsk11.png


Removal complete.




To complete the task it is necessary to:

remove the old disks' entries from the multipath.conf file;

delete the old PV, to wipe the LVM metadata from the disk:

[root@pobi01 etc]# pvs

PV VG Fmt Attr PSize PFree
/dev/mpath/360a98000486e64376334696c6f487550 lvm2 a- 100.00G 100.00G
/dev/mpath/mpath1 vg_obi lvm2 a- 100.00G 1020.00M
/dev/sda2 vg_root lvm2 a- 135.78G 79.53G



[root@pobi01 etc]# pvremove /dev/mpath/360a98000486e64376334696c6f487550

Labels on physical volume "/dev/mpath/360a98000486e64376334696c6f487550" successfully wiped

remove the multipathed device:

[root@pobi01 etc]# multipath -f mpath0

remove the individual devices; for example, I take their list (the sdb..sdy letters from the first sanlun listing) and build a loop:


for x in b c d e f g h i j k l m n o p q r s t u v w x y
do
    echo 1 > /sys/block/sd$x/device/delete
done

and finally restart the multipathd service.
