The answer above using stochastic equicontinuity works very well, but here I am answering my own question by using a uniform law of large numbers to show that the observed information matrix is a strongly consistent estimator of the information matrix, i.e. $N^{-1}J_N(\hat{\theta}_N(Y)) \xrightarrow{\text{a.s.}} I(\theta_0)$, if we plug in a strongly consistent sequence of estimators. I hope it is correct in all details.
We will use $I_N = \{1, 2, \ldots, N\}$ as an index set, and let us temporarily adopt the notation $J(\tilde{Y}, \theta) := J(\theta)$ in order to be explicit about the dependence of $J(\theta)$ on the random vector $\tilde{Y}$. We shall also work elementwise with $(J(\tilde{Y}, \theta))_{rs}$ and $(J_N(\theta))_{rs} = \sum_{i=1}^N (J(Y_i, \theta))_{rs}$, $r, s = 1, \ldots, k$, for this discussion. The function $(J(\cdot, \theta))_{rs}$ is real-valued on the set $\mathbb{R}^n \times \Theta^\circ$, and we will suppose that it is Lebesgue measurable for every $\theta \in \Theta^\circ$. A uniform (strong) law of large numbers gives a set of conditions under which
$$\sup_{\theta \in \Theta^\circ} \left| N^{-1} (J_N(\theta))_{rs} - E_\theta\left[(J(Y_1, \theta))_{rs}\right] \right| = \sup_{\theta \in \Theta^\circ} \left| N^{-1} \sum_{i=1}^N (J(Y_i, \theta))_{rs} - (I(\theta))_{rs} \right| \xrightarrow{\text{a.s.}} 0. \tag{1}$$
The conditions that must be satisfied in order that (1) hold are: (a) $\Theta^\circ$ is a compact set; (b) $(J(\tilde{Y}, \theta))_{rs}$ is a continuous function of $\theta$ on $\Theta^\circ$ with probability 1; (c) $(J(\tilde{Y}, \theta))_{rs}$ is dominated by a function $h(\tilde{Y})$, i.e. $|(J(\tilde{Y}, \theta))_{rs}| < h(\tilde{Y})$ for each $\theta \in \Theta^\circ$; and (d) $E_\theta[h(\tilde{Y})] < \infty$ for each $\theta \in \Theta^\circ$. These conditions come from Jennrich (1969, Theorem 2).
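To make conditions (a)–(d) concrete, here is a small illustration of my own (it is not needed for the argument): take i.i.d. $Y_i \sim \text{Poisson}(\theta)$ with $k = n = 1$, so that
$$(J(\tilde{Y}, \theta))_{11} = -\frac{\partial^2}{\partial \theta^2}\left(\tilde{Y} \log \theta - \theta - \log \tilde{Y}!\right) = \frac{\tilde{Y}}{\theta^2}.$$
On a compact set $\Theta^\circ = [a, b]$ with $0 < a < b$, this is continuous in $\theta$ with probability 1 and is dominated by $h(\tilde{Y}) = \tilde{Y}/a^2 + 1$, which satisfies $E_\theta[h(\tilde{Y})] = \theta/a^2 + 1 < \infty$, so (a)–(d) all hold.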
Now for any $y_i \in \mathbb{R}^n$, $i \in I_N$, and $\theta' \in S \subseteq \Theta^\circ$, the following inequality holds:
$$\left| N^{-1} \sum_{i=1}^N (J(y_i, \theta'))_{rs} - (I(\theta'))_{rs} \right| \leq \sup_{\theta \in S} \left| N^{-1} \sum_{i=1}^N (J(y_i, \theta))_{rs} - (I(\theta))_{rs} \right|. \tag{2}$$
Suppose that $\{\hat{\theta}_N(Y)\}$ is a strongly consistent sequence of estimators for $\theta_0$, and let $\Theta_{N_1} = B_{\delta_{N_1}}(\theta_0) \subseteq K \subseteq \Theta^\circ$ be an open ball in $\mathbb{R}^k$ with radius $\delta_{N_1} \to 0$ as $N_1 \to \infty$, where $K$ is compact. Then, since strong consistency implies that $\hat{\theta}_N(Y) \in \Theta_{N_1}$ for all sufficiently large $N$ almost surely, we have $P\left[\lim_N \{\hat{\theta}_N(Y) \in \Theta_{N_1}\}\right] = 1$. Together with (2) this implies
$$P\left[\lim_{N \to \infty} \left\{ \left| N^{-1} \sum_{i=1}^N (J(Y_i, \hat{\theta}_N(Y)))_{rs} - (I(\hat{\theta}_N(Y)))_{rs} \right| \leq \sup_{\theta \in \Theta_{N_1}} \left| N^{-1} \sum_{i=1}^N (J(Y_i, \theta))_{rs} - (I(\theta))_{rs} \right| \right\}\right] = 1. \tag{3}$$
Now $\Theta_{N_1} \subseteq \Theta^\circ$ (indeed its closure $\overline{\Theta}_{N_1} \subseteq K \subseteq \Theta^\circ$ is compact) implies that conditions (a)–(d) of Jennrich (1969, Theorem 2) apply to $\Theta_{N_1}$, so the supremum on the right-hand side of (3) converges to zero almost surely. Thus (1) and (3) imply
$$P\left[\lim_{N \to \infty} \left\{ \left| N^{-1} \sum_{i=1}^N (J(Y_i, \hat{\theta}_N(Y)))_{rs} - (I(\hat{\theta}_N(Y)))_{rs} \right| = 0 \right\}\right] = 1. \tag{4}$$
Since $(I(\hat{\theta}_N(Y)))_{rs} \xrightarrow{\text{a.s.}} (I(\theta_0))_{rs}$ (by strong consistency of $\hat{\theta}_N(Y)$ and continuity of $(I(\theta))_{rs}$ at $\theta_0$), (4) implies that $N^{-1}(J_N(\hat{\theta}_N(Y)))_{rs} \xrightarrow{\text{a.s.}} (I(\theta_0))_{rs}$. Note that (3) holds however small $\Theta_{N_1}$ is, and so the result in (4) is independent of the choice of $N_1$, other than that $N_1$ must be chosen so that $\Theta_{N_1} \subseteq \Theta^\circ$. This result holds for all $r, s = 1, \ldots, k$, and so in terms of matrices we have $N^{-1} J_N(\hat{\theta}_N(Y)) \xrightarrow{\text{a.s.}} I(\theta_0)$.
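To see the conclusion numerically, here is a minimal simulation sketch continuing the Poisson illustration above; the model, seed, and sample sizes are assumptions of mine, not part of the argument. In that model the MLE is $\hat{\theta}_N = \bar{Y}$, $N^{-1} J_N(\theta) = \bar{Y}/\theta^2$, and $I(\theta_0) = 1/\theta_0$, so the plug-in observed information $N^{-1} J_N(\hat{\theta}_N) = 1/\bar{Y}$ should approach $1/\theta_0$ as $N$ grows.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not part of the proof):
# for Y_i ~ Poisson(theta_0), the per-observation observed information is
# Y_i / theta^2, so N^{-1} J_N(theta) = mean(Y) / theta^2, the MLE is
# theta_hat_N = mean(Y), and the Fisher information is I(theta_0) = 1/theta_0.

rng = np.random.default_rng(1)
theta0 = 3.0
fisher_info = 1.0 / theta0  # I(theta_0)

for N in [10**2, 10**4, 10**6]:
    y = rng.poisson(lam=theta0, size=N)
    theta_hat = y.mean()                 # strongly consistent MLE of theta_0
    obs_info = y.mean() / theta_hat**2   # N^{-1} J_N(theta_hat) = 1 / mean(Y)
    print(f"N={N:>7}: N^-1 J_N(theta_hat) = {obs_info:.5f} vs I(theta_0) = {fisher_info:.5f}")
```

In this one-dimensional example the convergence follows from the strong law for $\bar{Y}$ together with continuity of $x \mapsto 1/x$; the uniform-law argument above is what makes the same plug-in step valid in general.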