[Solved] Recovering the 4th disk

fornas · 21 September 2014

Hello,

Here is the situation.

The fourth disk shows the following self-test history:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       249         2635179183
# 2  Short offline       Completed: read failure       90%       249         2901110171
# 3  Short offline       Completed: read failure       90%       226         2415837717
# 4  Short offline       Completed: read failure       90%       226         2943468502
# 5  Short offline       Completed: read failure       90%       225         2756480134
# 6  Extended offline    Completed: read failure       90%       207         3473185238
# 7  Short offline       Completed without error       00%       182         -
# 8  Short offline       Completed without error       00%       182         -
# 9  Short offline       Completed without error       00%       181         -
#10  Short offline       Interrupted (host reset)      90%        95         -
#11  Short offline       Aborted by host               90%        95         -

Here is a review of this disk:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822145475

The simplest option would have been to replace it with the same model, but it seems hard to find at a reasonable price.

Which option(s) would you recommend?

Thanks in advance.

Regards,

fornas

Edited 30 September 2014 by fornas

gaetan.cambier · 21 September 2014

I read a bit about what you did with your disk on the other forum.

Why not use the following command:

# badblocks -wsv /dev/<device>

followed by a short SMART test:

# smartctl --test=short /dev/<device>

and, if that passes, move on to a long SMART test.

Otherwise, if you want to switch to a different disk, compare the specs of your Hitachi drives so that you get a disk with at least the same throughput and access time and nothing slows down.

fornas · 21 September 2014

Good evening Gaetan,

Perhaps wrongly, I had understood that badblocks is used to locate failing sectors, and that the fix was to fill the disk with zeros in order to force the drive firmware to mark any sector it cannot write to as bad and to use a spare sector instead.

That is why I went straight to the "zeros" step.

* Did I misinterpret this?

Since it stays at 90% with "read failure", I imagine it no longer has enough spare sectors.

* For a replacement, is there a brand and/or model that seems to get particularly good feedback?

When you go to Hitachi support, you end up at Western Digital. I suppose there was a takeover.

gaetan.cambier · 21 September 2014

Yes, badblocks will locate the defective sectors, and normally the disk will remap them.

fornas wrote:
Since it stays at 90% with "read failure", I imagine it no longer has enough spare sectors.

No, SMART is only telling you that the test hit a problem while 90% of the test was still remaining. It does not mean that 10% of the disk is bad.
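To make the meaning of the Remaining column concrete, here is a minimal shell sketch. The function name `failed_selftests` is hypothetical, and the sketch assumes the column layout of the `smartctl -l selftest` log shown earlier in the thread (Remaining, LifeTime and LBA as the last three columns). For each test that ended in a read failure, it prints roughly how far the test had progressed when it stopped and the LBA of the first error.

```shell
#!/bin/sh
# Hypothetical helper: summarise failed self-tests from a
# "smartctl -l selftest" log read on standard input.
# Assumes the last three columns are Remaining, LifeTime(hours), LBA.
failed_selftests() {
    LC_ALL=C awk '/read failure/ {
        remaining = $(NF-2)
        sub(/%/, "", remaining)          # "90%" -> "90"
        printf "stopped after %d%% of the test, first bad LBA %s\n",
               100 - remaining, $NF
    }'
}

# Example with one line taken from the log at the top of the thread:
echo '# 1  Extended offline    Completed: read failure       90%       249         2635179183' \
    | failed_selftests
```

The drive reports Remaining in coarse steps, so the figure only locates the failure approximately within the scan; it says nothing about how much of the disk is damaged.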

fornas · 22 September 2014

Hello Gaetan,

OK, thanks. I launched badblocks last night; now all that is left is to wait.

# badblocks -wsv /dev/sdc
Checking for bad blocks in read-write mode
From block 0 to 1953514583
Testing with pattern 0xaa:   0.18% done, 10:00 elapsed. (0/0/0 errors)

This morning: 10.67% done, 09:35:36 elapsed.

That works out to roughly 89 h (3.7 days) for one full pass.
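The estimate is a straight-line extrapolation from the progress figures. A minimal sketch of that arithmetic follows; the function name `eta_hours` is hypothetical, and the calculation assumes the write speed stays constant over the whole disk, which is only an approximation.

```shell
#!/bin/sh
# Hypothetical helper: extrapolate the total duration of a pass.
# $1 = percent done (e.g. 10.67), $2 = elapsed time as HH:MM:SS
eta_hours() {
    LC_ALL=C awk -v pct="$1" -v t="$2" 'BEGIN {
        split(t, p, ":")
        secs = p[1] * 3600 + p[2] * 60 + p[3]     # elapsed seconds
        printf "%.1f\n", secs / (pct / 100) / 3600  # total hours at constant speed
    }'
}

# Figures from the post: 10.67% done after 09:35:36
eta_hours 10.67 09:35:36
```

With the figures from the post this gives a little under 90 hours. The percentage printed by badblocks refers to the current phase only, so the estimate covers a single write pass with one pattern, not the complete `-w` run.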

gaetan.cambier · 22 September 2014

Let's hope it works, because otherwise that is 3 days wasted.

fornas · 25 September 2014

Good evening,

So at around 20:00: 100% done, 91 h!

Then I thought I should keep you posted.

So I went back to the terminal, selected the text to copy and let my reflexes take over: Ctrl-C, which, instead of copying, stopped the longest process I have ever launched.

So it ended with:

# badblocks -wsv /dev/sdc
Checking for bad blocks in read-write mode
From block 0 to 1953514583
Testing with pattern 0xaa: done
Reading and comparing: ^C1.25% done, 93:50:06 elapsed. (0/0/0 errors)

Interrupted at block 219822272

* Can any conclusion be drawn from this "pitiful" result?

* Is there a better strategy?

fornas · 26 September 2014

Hello,

Results of the short and long SMART tests:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%       356         2573974660
# 2  Short offline       Completed without error       00%       352         -
# 3  Extended offline    Completed: read failure       90%       249         2635179183
# 4  Short offline       Completed: read failure       90%       249         2901110171
# 5  Short offline       Completed: read failure       90%       226         2415837717
# 6  Short offline       Completed: read failure       90%       226         2943468502
# 7  Short offline       Completed: read failure       90%       225         2756480134
# 8  Extended offline    Completed: read failure       90%       207         3473185238
# 9  Short offline       Completed without error       00%       182         -
#10  Short offline       Completed without error       00%       182         -
#11  Short offline       Completed without error       00%       181         -
#12  Short offline       Interrupted (host reset)      90%        95         -
#13  Short offline       Aborted by host               90%        95         -

So, is it dead?

fornas · 26 September 2014

Read here:

http://forums.freenas.org/index.php?threads/smart-results-problems.19447

DrKK, Mar 18, 2014

Pay attention to the raw counts on the right. In particular, #197, the "current pending sectors", is often the first one to go south. If the raw number is anything but zero, start worrying, and if it's more than a handful, start panicking. You can see yours is 2192 or something!!

You also have reallocated sectors, and 90 "uncorrectable" sectors. Again, these are numbers that should be 0 until a drive starts dying. 90 uncorrectable sectors is an emergency.

In short, your drive is in "code red--emergency" status. It will fail soon. Accordingly, it should be replaced.

The "smart -a" output in question shows, in particular:

Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       
Temperature_Celsius     0x0022   044   052   000    Old_age   Always       -       44 (0 23 0 0 0)
Current_Pending_Sector  0x0012   087   087   000    Old_age   Always       -       2192
Offline_Uncorrectable   0x0010   087   087   000    Old_age   Offline      -       2192

"My" smart -a:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       407 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   088   088   005    Pre-fail  Always       -       382
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       369
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 19/43)
196 Reallocated_Event_Count 0x0032   086   086   000    Old_age   Always       -       416
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

* With a single "current pending sector", apparently not reallocatable, can we say that this is a warning and that it is already time to start looking for a replacement?

* Is there any idea of how long a disk lasts after such a "warning"?

Edited 26 September 2014 by fornas

fornas · 26 September 2014

Here:

http://forums.freenas.org/index.php?threads/1-currently-unreadable-pending-sectors.10213/#post-45550

IMHO 1 unreadable sector is not a reason to replace a HDD, if you are running with double redundancy I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately cause I was scared of the unknown), and the HDD runs just fine now.

and a recommended procedure:

Run a long selftest on the disk ada#, it should fail somewhere

smartctl -t long /dev/ada#

check the smart information for the unreadable sector, let's call it 'X'

smartctl -A /dev/ada#

change the syscontrol and try writing to the sector. Change the 'X' below

sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada# bs=4096 count=1 seek=X conv=noerror,sync

check the smart information to see if 'Current_Pending_Sector' went to 0, you may need to repeat the steps above multiple times if there are multiple unreadable sectors.

smartctl -A /dev/ada#

Now run another smart test and hopefully it can complete without error.

smartctl -t long /dev/ada#
smartctl -A /dev/ada#

Now run a scrub (either from the gui or with 'zpool scrub poolname').
Check the scrub's status and hopefully it fixes some errors.

zpool status -v poolname

Apparently the final "scrub" step only applies to NAS systems using ZFS.

* Has anyone had success with this procedure?

fornas · 26 September 2014

I get the impression that this only applies to FreeBSD.

gaetan.cambier · 26 September 2014

You can always try the dd command on each defective sector, but what worries me is that the SMART test stopped at 90% at first and now stops at 40%. That does not look good.
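One detail matters when writing to a single sector with dd: the `seek=` value counts blocks of size `bs`, not LBAs. The sketch below shows the conversion under the assumption that the LBA_of_first_error reported by smartctl counts 512-byte logical sectors (worth checking against the sector size reported by `smartctl -i`). The function names and the `/dev/sdX` placeholder are hypothetical, and the script only prints the dd command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: convert an LBA (assumed to count 512-byte logical
# sectors) into the seek= value for dd with a given block size.
lba_to_seek() {
    lba=$1
    bs=${2:-4096}                 # dd block size in bytes, default 4096
    echo $(( lba * 512 / bs ))    # integer division: block that contains the LBA
}

# Print (do NOT run) the dd command for a given LBA.
# /dev/sdX is a placeholder; writing to the wrong device destroys data.
print_dd_command() {
    echo "dd if=/dev/zero of=/dev/sdX bs=4096 count=1 seek=$(lba_to_seek "$1" 4096)"
}

# LBA of the most recent failed extended test in the thread:
print_dd_command 2573974660
```

With `bs=512` the seek value would simply be the LBA itself; with `bs=4096` it is the LBA divided by 8, rounded down.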

fornas · 27 September 2014

Hello Gaetan,

smart -a gives:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       407 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   088   088   005    Pre-fail  Always       -       382
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       369
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 19/43)
196 Reallocated_Event_Count 0x0032   086   086   000    Old_age   Always       -       416
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

with, in particular:

197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0

Here:

http://forums.freenas.org/index.php?threads/1-currently-unreadable-pending-sectors.10213/#post-45550

IMHO 1 unreadable sector is not a reason to replace a HDD, if you are running with double redundancy I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately cause I was scared of the unknown), and the HDD runs just fine now.

Here:

http://www.smartmontools.org/browser/trunk/www/badblockhowto.xml

Unassigned sectors

This section was written by Kay Diederichs. Even though this section assumes Linux and the ext2/ext3 file system, the strategy should be more generally applicable.

I read your badblocks-howto at and greatly benefited from it. One thing that's (maybe) missing is that often the smartctl -t long scan finds a bad sector which is not assigned to any file. In that case it does not help to run debugfs, or rather debugfs reports the fact that no file owns that sector. Furthermore, it is somewhat laborious to come up with the correct numbers for debugfs, and debugfs is slow ...

So what I suggest in the case of presence of Current_Pending_Sector/Offline_Uncorrectable errors is to create a huge file on that file system.

dd if=/dev/zero of=/some/mount/point bs=4k

creates the file. Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not belong to a file. Check the smartctl -a output after that and make sure that the sectors are reallocated. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on.

Since the disk was not partitioned but in principle "filled with zeros", I formatted it as ext4 and would like to try creating the "huge file", but I do not understand the suggested command:

dd if=/dev/zero of=/some/mount/point bs=4k

My partition is /dev/sdc1.
Should the command above be understood as:

dd if=/dev/zero of=/dev/sdc1 bs=4k

Since the first thing I did was:

# dd if=/dev/zero of=/dev/sdb bs=1M

(sdb has turned into sdc in the meantime)

would that not just be repeating the same thing?
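The howto speaks of creating "a huge file on that file system", which suggests that `of=` is meant to be a file inside the mounted file system rather than the partition device. Writing to /dev/sdc1 directly would overwrite the ext4 file system just created and would indeed be close to repeating the earlier whole-disk zeroing. A minimal sketch under that reading follows; the function name `fill_with_zeros`, the file name `zero.fill` and the optional block limit are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create a zero-filled file inside a mounted file system.
# $1 = directory on the mounted file system (e.g. where /dev/sdc1 is mounted)
# $2 = optional limit in 4 KiB blocks; without it, dd keeps writing until
#      the file system is full and then stops with an error.
fill_with_zeros() {
    target="$1/zero.fill"         # file name chosen for illustration
    if [ -n "$2" ]; then
        dd if=/dev/zero of="$target" bs=4k count="$2" 2>/dev/null
    else
        dd if=/dev/zero of="$target" bs=4k 2>/dev/null
    fi
    sync                          # flush cached writes to the disk
    echo "$target"
}

# Small demonstration in a temporary directory (writes 16 KiB only):
demo_dir=$(mktemp -d)
fill_with_zeros "$demo_dir" 4
rm -rf "$demo_dir"
```

On the real disk the call would take the mount point as its only argument, and the file can be deleted once the SMART attributes have been checked.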

fornas · 29 September 2014

Hello,

Any "enlightening" advice?

fornas · 29 September 2014

Well, I pressed the "dd detonator".
See you in 3-4 days!

fornas · 30 September 2014

Hello,
Sooner than expected (the "zero" procedure is four times faster than badblocks: about 1 day versus 4 days), re-running dd paid off:

# smartctl -l selftest /dev/sdc

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       405         -
# 2  Short offline       Completed without error       00%       399         -
# 3  Extended offline    Completed: read failure       40%       356         2573974660
# 4  Short offline       Completed without error       00%       352         -
# 5  Extended offline    Completed: read failure       90%       249         2635179183
# 6  Short offline       Completed: read failure       90%       249         2901110171
# 7  Short offline       Completed: read failure       90%       226         2415837717
# 8  Short offline       Completed: read failure       90%       226         2943468502
# 9  Short offline       Completed: read failure       90%       225         2756480134
#10  Extended offline    Completed: read failure       90%       207         3473185238
#11  Short offline       Completed without error       00%       182         -
#12  Short offline       Completed without error       00%       182         -
#13  Short offline       Completed without error       00%       181         -
#14  Short offline       Interrupted (host reset)      90%        95         -
#15  Short offline       Aborted by host               90%        95         -
7 of 7 failed self-tests are outdated by newer successful extended offline self-test # 1

# smartctl -A /dev/sdc

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14-0.bpo.1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       407 (Average 333)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   085   085   005    Pre-fail  Always       -       436
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       406
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       13
194 Temperature_Celsius     0x0002   146   146   000    Old_age   Always       -       41 (Min/Max 19/43)
196 Reallocated_Event_Count 0x0032   084   084   000    Old_age   Always       -       470
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

So the idea of creating a partition before running the "zeros" procedure seems to have been a good one.
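The effect of the second dd run can be read off by comparing raw attribute values before and after. A minimal sketch follows; the function name `smart_raw` is hypothetical, and it assumes the ten-column attribute layout of the `smartctl -A` output shown in the thread, with RAW_VALUE in the tenth column.

```shell
#!/bin/sh
# Hypothetical helper: print the RAW_VALUE (10th column) of one SMART
# attribute, given its ID, from "smartctl -A" output on standard input.
smart_raw() {
    LC_ALL=C awk -v id="$1" '$1 == id { print $10 }'
}

# Attribute 5 before and after the second dd run (lines copied from the thread):
before='  5 Reallocated_Sector_Ct   0x0033   088   088   005    Pre-fail  Always       -       382'
after='  5 Reallocated_Sector_Ct   0x0033   085   085   005    Pre-fail  Always       -       436'

b=$(printf '%s\n' "$before" | smart_raw 5)
a=$(printf '%s\n' "$after"  | smart_raw 5)
echo "newly reallocated sectors: $(( a - b ))"
```

In the thread's data, Reallocated_Event_Count (196) rises by the same amount, from 416 to 470, while Current_Pending_Sector (197) drops from 1 to 0.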

Edited 1 October 2014 by fornas

fornas · 2 February 2015

Hello,

In the end, it did not last long...

So it has been replaced by a SEAGATE-NAS-HDD-ST4000VN000-3.5"-4To,

which has just been put through badblocks.
