From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sage Weil
Subject: Re: new OSD re-using old OSD id fails to boot
Date: Wed, 9 Dec 2015 06:00:06 -0800 (PST)
References: <5663158D.1010302@dachary.org> <56678036.5050909@redhat.com>
To: Wei-Chung Cheng
Cc: David Zafman, Loic Dachary, Ceph Development

On Wed, 9 Dec 2015, Wei-Chung Cheng wrote:
> Hi Loic,
>
> I tried to reproduce this problem on my CentOS 7 machine but could not
> hit the same issue.  This is my version:
> ceph version 10.0.0-928-g8eb0ed1 (8eb0ed1dcda9ee6180a06ee6a4415b112090c534)
> Could you describe it in more detail?
>
>
> Hi David, Sage,
>
> Most of the time, when we notice an OSD failure, the OSD is already in
> the `out` state.  We cannot avoid the redundant data movement unless we
> set the osd noout before the failure, right?  (Meaning: once an OSD
> goes into the `out` state, some redundant data movement follows.)
>
> Could we try the traditional spare behavior?  (Keep some disks as
> spares and automatically replace the broken device with one of them?)
>
> That way we could replace a failed osd before it goes into the `out`
> state.  Or could we always set the osd noout?

I don't think there is a problem with 'out' if the osd id is reused and
the crush position remains the same.  And I expect that usually the OSD
will be replaced by a disk of similar size.  If the replacement is
smaller (or size zero, i.e. removed entirely) then you get double
movement, but if it's the same size or larger I think it's fine.

The sequence would be something like

 up + in
 down + in
   5-10 minutes go by
 down + out   (marked out by monitor)
   new replicas uniformly distributed across cluster
   days go by
   disk removed
   new disk inserted
   ceph-disk recreate ...
     recreates osd dir w/ the same id, new uuid
 on startup, osd adjusts crush weight (maybe... usually by a smallish
   amount)
 up + in
   replicas migrate back to new device

A rough sketch of the corresponding CLI commands is below.

sage
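
For concreteness, a minimal sketch of that flow at the CLI.  The
`ceph-disk recreate` step is the proposed subcommand discussed in this
thread and does not exist yet; the other commands are standard, though
the device path (/dev/sdb) and osd id (1) here are made up for
illustration:

   # the disk behind osd.1 dies; the mon marks it down, and after
   # mon_osd_down_out_interval (the "5-10 minutes" above) marks it out
   ceph osd tree | grep osd.1      # shows it down, and eventually out

   # ...days go by; replicas have re-spread across the cluster...

   # swap in the new disk and rebuild the osd dir with the same id
   # (proposed subcommand, not yet implemented)
   ceph-disk recreate /dev/sdb

   # on startup the osd re-registers itself in the crush map with a
   # weight derived from the disk size, so a similar-sized disk means
   # only a small weight change
   ceph osd tree | grep osd.1      # back up, weight close to before

   # if it does not get marked back in on its own, do it by hand,
   # then watch the replicas migrate back
   ceph osd in 1
   ceph -w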
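
As for always setting noout: it is a cluster-wide flag intended for
planned maintenance, not a steady state.  While it is set, a dead OSD
is never marked out, so its PGs stay degraded and no re-replication
happens.  The usual pattern, with standard commands:

   # before planned maintenance, keep the mons from marking osds out
   ceph osd set noout

   # ...reboot the host, swap hardware, etc...

   # afterwards, let the normal down -> out handling resume
   ceph osd unset noout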