A generalized Fellegi-Holt paradigm for automatic error localization
4. A generalized error localization problemA generalized Fellegi-Holt paradigm for automatic error localization
4. A generalized error localization problem
Let
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
be a finite set of allowed
edit operations for a given application of automatic editing. Informally, I
propose to generalize the error localization problem of Fellegi and Holt (1976)
by replacing “the smallest subset of variables that can be imputed to make the
record consistent” with “the shortest sequence of allowed edit operations that
can be applied to make the record consistent”. To give a formal definition of
this generalized error localization problem, some new notation and concepts
need to be introduced.
Consider a sequence of points
x
=
x
0
,
x
1
,
…
,
x
t
=
y
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhacqGH9aqpcaWH4bWdamaaBaaaleaapeGaaGimaaWdaeqa
aOWdbiaacYcacaWH4bWdamaaBaaaleaapeGaaGymaaWdaeqaaOWdbi
aacYcacqGHMacVcaGGSaGaaCiEa8aadaWgaaWcbaWdbiaadshaa8aa
beaak8qacqGH9aqpcaWH5baaaa@4643@
in
ℝ
p
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGqbaabaaaaaaaaapeGae8xh
Hi1damaaCaaaleqabaWdbiaadchaaaGcpaGaaiOlaaaa@448D@
A path from
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to
y
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhaaaa@38CB@
is defined as a sequence of distinct edit operations
g
1
,
…
,
g
t
∈
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgapaWaaSbaaSqaa8qacaaIXaaapaqabaGcpeGaaiilaiab
gAci8kaacYcacaWGNbWdamaaBaaaleaapeGaamiDaaWdaeqaaOWdbi
abgIGioprr1ngBPrwtHrhAXaqeguuDJXwAKbstHrhAG8KBLbacfaGa
e8NbXFeaaa@4BFC@
such that
x
n
=
g
n
(
x
n
−
1
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhapaWaaSbaaSqaa8qacaWGUbaapaqabaGcpeGaeyypa0Ja
am4za8aadaWgaaWcbaWdbiaad6gaa8aabeaak8qadaqadaWdaeaape
GaaCiEa8aadaWgaaWcbaWdbiaad6gacqGHsislcaaIXaaapaqabaaa
k8qacaGLOaGaayzkaaaaaa@4342@
for all
n
∈
{
1
,
…
,
t
}
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
Wdbiaad6gacqGHiiIZdaGadaWdaeaapeGaaGymaiaacYcacqGHMacV
caGGSaGaamiDaaGaay5Eaiaaw2haaiaac6caaaa@41E4@
(Note: In the case that
g
n
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgapaWaaSbaaSqaa8qacaWGUbaapaqabaaaaa@3A02@
contains free parameters, one should interpret
this equality as “there exist feasible parameter values such that
g
n
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgapaWaaSbaaSqaa8qacaWGUbaapaqabaaaaa@3A02@
maps
x
n
−
1
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhapaWaaSbaaSqaa8qacaWGUbGaeyOeI0IaaGymaaWdaeqa
aaaa@3BBF@
to
x
n
”
.
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbmaabiaabaGaaCiEa8aadaWgaaWcbaWdbiaad6gaa8aabeaaieaa
kiaa=1bicaGGUaaapeGaayzkaaaaaa@3C72@
A path is denoted by
P
=
[
g
1
,
…
,
g
t
]
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadcfacqGH9aqpdaWadaWdaeaapeGaam4za8aadaWgaaWcbaWd
biaaigdaa8aabeaak8qacaGGSaGaeyOjGWRaaiilaiaadEgapaWaaS
baaSqaa8qacaWG0baapaqabaaak8qacaGLBbGaayzxaaGaaiOlaaaa
@43C9@
The set of all possible paths from
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to
y
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhaaaa@38CB@
is denoted by
P (
x , y
) .
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae83f
XJ1aaeWaa8aabaWdbiaahIhacaGGSaGaaCyEaaGaayjkaiaawMcaai
aac6caaaa@4845@
This set may be empty. Later, I will use
P (
x ; G
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae83f
XJ1aaeWaa8aabaWdbiaahIhacaGG7aGaam4raaGaayjkaiaawMcaaa
aa@476C@
to denote, for a given subset
G
⊆
G
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEeacqGHgksZtuuDJXwAK1uy0HwmaeHbfv3ySLgzG0uy0Hgi
p5wzaGqbaiab=zq8hjaacYcaaaa@4693@
the set of all paths starting in
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
that consist of the edit operations in
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEeaaaa@3895@
in some order (without specifying the free
parameters); if
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEeaaaa@3895@
contains
t
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
Wdbiaadshaaaa@38C2@
elements,
P (
x ; G
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae83f
XJ1aaeWaa8aabaWdbiaahIhacaGG7aGaam4raaGaayjkaiaawMcaaa
aa@476C@
contains
t
!
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadshacaGGHaaaaa@3967@
paths.
To each edit operation
g
∈
G
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgacqGHiiIZtuuDJXwAK1uy0HwmaeHbfv3ySLgzG0uy0Hgi
p5wzaGqbaiab=zq8hjaacYcaaaa@4636@
one can associate a weight
w
g
>
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEhapaWaaSbaaSqaa8qacaWGNbaapaqabaGcpeGaeyOpa4Ja
aGimaaaa@3BE7@
that expresses the costs of applying edit
operation
g
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgacaGGUaaaaa@3967@
In particular, the weight of an FH operation is
to be chosen equal to the confidence weight of the variable that it imputes.
Now the length of a path
P
=
[
g
1
,
…
,
g
t
]
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadcfacqGH9aqpdaWadaWdaeaapeGaam4za8aadaWgaaWcbaWd
biaaigdaa8aabeaak8qacaGGSaGaeyOjGWRaaiilaiaadEgapaWaaS
baaSqaa8qacaWG0baapaqabaaak8qacaGLBbGaayzxaaaaaa@4317@
can be defined as the sum of the weights of
its constituent edit operations:
ℓ
(
P
)
=
∑
n
=
1
t
w
g
n
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiabloriSnaabmaapaqaa8qacaWGqbaacaGLOaGaayzkaaGaeyyp
a0ZaaabmaeaacaWG3bWdamaaBaaaleaapeGaam4za8aadaWgaaadba
Wdbiaad6gaa8aabeaaaSqabaaapeqaaiaad6gacqGH9aqpcaaIXaaa
baGaamiDaaqdcqGHris5aOWdaiaacYcaaaa@4686@
where, by convention, the empty path has
length zero. The distance from
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to
y
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhaaaa@38CB@
is defined as the length of the shortest path
that connects
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to
y
:
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhacaGG6aaaaa@3989@
d (
x , y
) = {
min {
ℓ (
P
) | P ∈ P (
x , y
) }
if P (
x , y
) ≠ ∅ ,
∞
otherwise .
MathType@MTEF@5@5@+=
feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGa
ayzkaaGaeyypa0Zaaiqaa8aabaqbaeaabiGaaaqaa8qaciGGTbGaai
yAaiaac6gadaGadaWdaeaapeGaeS4eHW2aaeWaa8aabaWdbiaadcfa
aiaawIcacaGLPaaacaqG8bGaamiuaiabgIGioprr1ngBPrwtHrhAXa
qeguuDJXwAKbstHrhAG8KBLbacfaGae83fXJ1aaeWaa8aabaWdbiaa
hIhacaGGSaGaaCyEaaGaayjkaiaawMcaaaGaay5Eaiaaw2haaaWdae
aapeGaaeyAaiaabAgacaqGGcGae83fXJ1aaeWaa8aabaWdbiaahIha
caGGSaGaaCyEaaGaayjkaiaawMcaaiabgcMi5kabgwGiglaacYcaa8
aabaWdbiabg6HiLcWdaeaapeGaae4BaiaabshacaqGObGaaeyzaiaa
bkhacaqG3bGaaeyAaiaabohacaqGLbGaaeOlaaaaaiaawUhaaaaa@7368@
In
general,
d
(
x
,
y
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGa
ayzkaaaaaa@3D0D@
satisfies the standard axioms of a metric except that it need not be symmetric in
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
and
y
;
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhacaGG7aaaaa@398A@
it is a so-called quasimetric (Scholtus 2014). Accordingly,
d
(
x
,
y
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGa
ayzkaaaaaa@3D0D@
represents “the distance from
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to
y
”
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhaieaacaWFDacaaa@3992@
rather than “the distance between
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
and
y
”
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhaieaacaWFDaIaaiOlaaaa@3A44@
The distance from
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
to any closed, non-empty subset
D
⊆
ℝ
p
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadseacqGHgksZtuuDJXwAK1uy0HMmaeHbfv3ySLgzG0uy0Hgi
uD3BaGqbaiab=1ris9aadaahaaWcbeqaa8qacaWGWbaaaaaa@468C@
is defined as the distance to the nearest
y
∈
D
:
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhacqGHiiIZcaWGebGaaiOoaaaa@3BD6@
d
(
x
,
D
)
=
min
{
d
(
x
,
y
)
|
y
∈
D
}
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaahYcacaWGebaacaGLOaGa
ayzkaaGaeyypa0JaciyBaiaacMgacaGGUbWaaiWaa8aabaWdbiaads
gadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGaayzkaaGa
aeiFaiaahMhacqGHiiIZcaWGebaacaGL7bGaayzFaaGaaiOlaaaa@4D45@
For the purpose of error localization, the
closed, non-empty subset of
ℝ
p
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGqbaabaaaaaaaaapeGae8xh
Hi1damaaCaaaleqabaWdbiaadchaaaaaaa@43C2@
that is of particular interest is the set
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadseapaWaaSbaaSqaa8qacaaIWaaapaqabaaaaa@39A6@
of all points that satisfy (2.1).
I can now formulate the generalized error localization
problem.
Problem. Consider a given set of consistent records
D
0
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadseapaWaaSbaaSqaa8qacaaIWaaapaqabaGccaGGSaaaaa@3A60@
a given set of allowed edit operations
G
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFKaaiilaaaa@43C6@
and a given record
x
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhacaGGUaaaaa@397C@
If
d
(
x
,
D
0
)
=
∞
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWGebWdamaaBaaa
leaapeGaaGimaaWdaeqaaaGcpeGaayjkaiaawMcaaiabg2da9iabg6
HiLkaacYcaaaa@4129@
then the error localization problem for
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
is infeasible. Otherwise, any shortest path leading
to a record
y
∈
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhacqGHiiIZcaWGebWdamaaBaaaleaapeGaaGimaaWdaeqa
aaaa@3C2C@
such that
d
(
x
,
y
)
<
∞
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGa
ayzkaaGaeyipaWJaeyOhIukaaa@3F82@
is called a feasible
solution to the error localization problem for
x
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhacaGGUaaaaa@397C@
A feasible solution is called optimal if it leads to a record
x
*
∈
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhapaWaaWbaaSqabeaapeGaaiOkaaaakiabgIGiolaadsea
paWaaSbaaSqaa8qacaaIWaaapaqabaaaaa@3D2F@
such that
d
(
x
,
x
*
)
=
d
(
x
,
D
0
)
.
(
4.1
)
MathType@MTEF@5@5@+=
feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH4bWdamaaCaaa
leqabaWdbiaacQcaaaaakiaawIcacaGLPaaacqGH9aqpcaWGKbWaae
Waa8aabaWdbiaahIhacaGGSaGaamira8aadaWgaaWcbaWdbiaaicda
a8aabeaaaOWdbiaawIcacaGLPaaacaGGUaGaaGzbVlaaywW7caaMf8
UaaGzbVlaaywW7caGGOaGaaGinaiaac6cacaaIXaGaaiykaaaa@514A@
Formally, then, the generalized error localization problem consists of
finding an optimal path of edit operations.
Remark 1. In general, there may be infinitely many
records
x
*
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhapaWaaWbaaSqabeaapeGaaiOkaaaaaaa@39C4@
in
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadseapaWaaSbaaSqaa8qacaaIWaaapaqabaaaaa@39A6@
that satisfy (4.1) and can be reached by the
same path of edit operations. To solve the error localization problem, it is
sufficient to find an optimal path. Constructing an associated record
x
*
∈
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhapaWaaWbaaSqabeaapeGaaiOkaaaakiabgIGiolaadsea
paWaaSbaaSqaa8qacaaIWaaapaqabaaaaa@3D2F@
may then be regarded as a generalization of
the consistent imputation problem; cf. the discussion on imputation at the end
of Section 3.
Remark 2. The above error localization problem is
infeasible for records that cannot be mapped onto
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadseapaWaaSbaaSqaa8qacaaIWaaapaqabaaaaa@39A6@
by any combination of distinct edit operations
in
G
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFKaaiOlaaaa@43C8@
To avoid this situation,
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
should be chosen sufficiently large so that
d
(
x
,
D
0
)
<
∞
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWGebWdamaaBaaa
leaapeGaaGimaaWdaeqaaaGcpeGaayjkaiaawMcaaiabgYda8iabg6
HiLcaa@4077@
for all
x
∈
ℝ
p
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhacqGHiiIZtuuDJXwAK1uy0HMmaeHbfv3ySLgzG0uy0Hgi
uD3BaGqbaiab=1ris9aadaahaaWcbeqaa8qacaWGWbaaaOWdaiaac6
caaaa@4712@
In what follows, I tacitly assume that
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
has this property. An easy way
–
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9IqqrpepC0xbbL8F4rqqr=hbbG8pue9Fbe9q8
qqvqFr0dXdbrVc=b0P0xb9peuD0xXddrpe0=1qpeea0=yrVue9Fve9
Fve8meaabaqaciGacaGaaeWabaWaaeaaeaaakeaaieaajugybabaaa
aaaaaapeGaa83eGaaa@384B@
not necessarily the only way
–
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9IqqrpepC0xbbL8F4rqqr=hbbG8pue9Fbe9q8
qqvqFr0dXdbrVc=b0P0xb9peuD0xXddrpe0=1qpeea0=yrVue9Fve9
Fve8meaabaqaciGacaGaaeWabaWaaeaaeaaakeaaieaajugybabaaa
aaaaaapeGaa83eGaaa@384B@
to achieve this is by letting
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
contain at least all FH operations. That this
is sufficient follows from the fact that any two points in
ℝ
p
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGqbaabaaaaaaaaapeGae8xh
Hi1damaaCaaaleqabaWdbiaadchaaaaaaa@43C2@
are connected by a path that concatenates the
FH operations associated with the coordinates on which they differ.
Remark 3. It is not difficult to see that the above error
localization problem reduces to the original problem of Fellegi and Holt (1976)
in the special case that
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
contains only the FH operations.
Remark 4. As with the original Fellegi-Holt-based error
localization problem, it can be shown that, under certain assumptions,
minimizing
d
(
x
,
y
)
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadsgadaqadaWdaeaapeGaaCiEaiaacYcacaWH5baacaGLOaGa
ayzkaaaaaa@3D0D@
over all
y
∈
D
0
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahMhacqGHiiIZcaWGebWdamaaBaaaleaapeGaaGimaaWdaeqa
aaaa@3C2C@
for a given observed record
x
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaahIhaaaa@38CA@
is approximately equivalent to maximizing the
likelihood of the associated unobserved error-free record. The argument closely
follows that of Kruskal (1983, pages 38-39) for the so-called Levenshtein
distance in the context of approximate string matching. This requires first of
all that the edits (2.1) be hard edits, i.e. , failed only by erroneous values.
In addition, it must be assumed that the stochastic “error generating process”
ℰ
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8hm
Hueaaa@426E@
introduced in Section 3 has the following
properties:
There exists a one-to-one correspondence between the set of errors that can
occur under
ℰ
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8hm
Hueaaa@426E@
and the set of allowed edit operations
G
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8Nb
XFeaaa@4316@
that correct them.
The errors in
ℰ
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaatuuDJXwAK1
uy0HwmaeHbfv3ySLgzG0uy0Hgip5wzaGqbaabaaaaaaaaapeGae8hm
Hueaaa@426E@
occur independently of each other.
The error corresponding to operation
g
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEgaaaa@38B5@
occurs with known probability
p
g
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadchapaWaaSbaaSqaa8qacaWGNbaapaqabaGccaGGUaaaaa@3AC0@
Finally, analogous to (2.3), the weights
w
g
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEhapaWaaSbaaSqaa8qacaWGNbaapaqabaaaaa@3A0B@
should be chosen according to
w
g
=
−
log
(
p
g
1
−
p
g
)
.
(
4.2
)
MathType@MTEF@5@5@+=
feaagKart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEhapaWaaSbaaSqaa8qacaWGNbaapaqabaGcpeGaeyypa0Ja
eyOeI0IaciiBaiaac+gacaGGNbWaaeWaa8aabaWdbmaalaaapaqaa8
qacaWGWbWdamaaBaaaleaapeGaam4zaaWdaeqaaaGcbaWdbiaaigda
cqGHsislcaWGWbWdamaaBaaaleaapeGaam4zaaWdaeqaaaaaaOWdbi
aawIcacaGLPaaacaGGUaGaaGzbVlaaywW7caaMf8UaaGzbVlaaywW7
caGGOaGaaGinaiaac6cacaaIYaGaaiykaaaa@530D@
Under
these assumptions, Scholtus (2014) adapted the argument of Kruskal (1983) to
show that the optimal solution to error localization problem (4.1) can be
justified as an approximate maximum likelihood estimator. [Note: The derivation
in Scholtus (2014) assumed in addition that all
p
g
≪
1
,
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadchapaWaaSbaaSqaa8qacaWGNbaapaqabaGcpeGaeSOAI0Ja
aGymaiaacYcaaaa@3CE3@
in which case
w
g
≈
−
log
p
g
.
MathType@MTEF@5@5@+=
feaagKart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLn
hiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr
4rNCHbGeaGqiFu0Je9sqqrpepC0xbbL8F4rqqrpipeea0xe9Lqpe0x
e9q8qqvqFr0dXdHiVc=bYP0xH8peuj0lXxdrpe0=1qpeeaY=rrVue9
Fve9Fve8meaabaqaciGacaGaaeqabaWaaeaaeaaakeaaqaaaaaaaaa
WdbiaadEhapaWaaSbaaSqaa8qacaWGNbaapaqabaGcpeGaeyisISRa
eyOeI0IaciiBaiaac+gacaGGNbGaamiCa8aadaWgaaWcbaWdbiaadE
gaa8aabeaakiaac6caaaa@428A@
This assumption is unnecessary; cf. Liepins (1980).]
ISSN : 1492-0921
Editorial policy
Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.
Submission of Manuscripts
Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca , Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).
Note of appreciation
Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.
Standards of service to the public
Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.
Copyright
Published by authority of the Minister responsible for Statistics Canada.
© Minister of Industry, 2016
Use of this publication is governed by the Statistics Canada Open Licence Agreement .
Catalogue No. 12-001-X
Frequency: semi-annual
Ottawa
Date modified:
2016-06-22