Model Diagnostics for Count Time Series

 

Project partners:

 

DFG project

 
Two-year project funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 437270842, and prepared as part of an IFF project.

Project goals:

Count time series arise in a wide variety of economic contexts and can take many forms with respect to their dependence structure or marginal distribution. Because classical models for real-valued time series cannot preserve the discrete nature of count data, a very large portfolio of models has been developed specifically for count time series. Adequate modelling of the count process is important for making forecasts, for monitoring the further course of the time series so that structural changes are detected as quickly as possible, or simply for gaining a better understanding of the counting phenomenon under consideration.

The planned research project on model diagnostics for count time series covers three central steps of the model-building process: model identification, model selection, and model validation. While methods for the model diagnostics of real-valued (continuous) time series have long been available in large numbers, the topic is still in its infancy for discrete-valued count time series. Of the methods known to date, some exist only in rudimentary form (for example as heuristic recommendations for use), and more concrete, theoretically founded methods are mostly tied to restrictive model assumptions or address isolated characteristics such as the dispersion behaviour. The same holds for goodness-of-fit tests: while there are numerous goodness-of-fit tests for continuous time series that can test not only for particular models but also for entire model classes, the goodness-of-fit tests available so far for count time series are applicable only to a limited extent, for example under additional parametric assumptions.

The planned research project pursues two complementary directions for model diagnostics in count time series. On the one hand, parametric methods for model diagnostics of count time series are to be developed that incorporate a variety of distributional characteristics and/or dependence patterns. In addition, diagnostic tools that are well established for real-valued time series are to be made applicable to count time series, for example through suitable parametric bootstrap implementations. On the other hand, goodness-of-fit tests based on joint distributions are to be developed that are able to distinguish consistently between different model classes. To implement these tests, and also to enable a broader application of the diagnostic tools mentioned above, semi-parametric bootstrap procedures suited to count time series are to be derived and used for model diagnostics. For all proposed methods, performance and applicability are examined in depth, both through comprehensive comparative simulation studies and through application to real data examples relevant to economics.

Project duration:

October 2020 to September 2022.

Project results:

  • The first parametric methods for model diagnostics to be developed concern count time series with a Poisson marginal distribution. The aim is to develop statistical tests that use the Stein–Chen identity. As preparatory work, the following article was written:
     
    Weiß, C.H., Aleksandrov, B. (2022):
    Computing (Bivariate) Poisson Moments using Stein–Chen Identities.
    The American Statistician 76(1), pp. 10-15.
     
    Abstract: The (bivariate) Poisson distribution is the most common distribution for (bivariate) count random variables. The univariate Poisson distribution is characterized by the famous Stein–Chen identity. We demonstrate that this identity allows to derive even sophisticated moment expressions in such a simple way that the corresponding computations can be presented in an introductory Statistics class. Then, we newly derive different types of Stein–Chen identity for the bivariate Poisson distribution. These are shown to be very useful for computing joint moments, again in a surprisingly simple way. We also explain how to extend our results to the general multivariate case.
     
  • Aleksandrov, B., Weiß, C.H., Jentsch, C. (2022):
    Goodness-of-Fit Tests for Poisson Count Time Series based on the Stein–Chen Identity.
    Statistica Neerlandica 76(1), pp. 35-64 (open access).
     
    Abstract: To test the null hypothesis of a Poisson marginal distribution, test statistics based on the Stein–Chen identity are proposed. For a wide class of Poisson count time series, the asymptotic distribution of different types of Stein–Chen statistics is derived, also if multiple statistics are jointly applied. The performance of the tests is analyzed with simulations, as well as the question which Stein–Chen functions should be used for which alternative. Illustrative data examples are presented, and possible extensions of the novel Stein–Chen approach are discussed as well.
     
  • Aleksandrov, B., Weiß, C.H., Jentsch, C., Faymonville, M. (2022):
    Novel Goodness-of-Fit Tests for Binomial Count Time Series.
    Statistics 56(5), pp. 957-990 (open access).
     
    Abstract: For testing the null hypothesis of a marginal binomial distribution of bounded count data, we derive novel and flexible goodness-of-fit (GoF) tests. We propose two general approaches to construct moment-based test statistics. The first one relies on properties of higher-order factorial moments, while the second one uses a so-called Stein identity being satisfied under the null. For a broad class of stationary time series processes of bounded counts with joint bivariate binomial distributions of lagged time series values, we derive the limiting distributions of the proposed GoF-test statistics. Among others, our setup covers the binomial autoregressive model, but includes also other binomial time series obtained, e. g., by superpositioning independent binary time series. The test performance under the null and under different alternatives is investigated in simulations. Two data examples are used to illustrate the application of the novel GoF-tests in practice.
     
  • Faymonville, M., Jentsch, C., Weiß, C.H., Aleksandrov, B. (2023):
    Semiparametric estimation of INAR models using roughness penalization.
    Statistical Methods and Applications 32(2), pp. 365-400 (open access).
     
    Abstract: Popular models for time series of count data are integer-valued autoregressive (INAR) models, for which the literature mainly deals with parametric estimation. In this regard, a semiparametric estimation approach is a remarkable exception which allows for estimation of the INAR models without any parametric assumption on the innovation distribution. However, for small sample sizes, the estimation performance of this semiparametric estimation approach may be inferior. Therefore, to improve the estimation accuracy, we propose a penalized version of the semiparametric estimation approach, which exploits the fact that the innovation distribution is often considered to be smooth, i.e. two consecutive entries of the PMF differ only slightly from each other. This is the case, for example, in the frequently used INAR models with Poisson, negative binomially or geometrically distributed innovations. For the data-driven selection of the penalization parameter, we propose two algorithms and evaluate their performance. In Monte Carlo simulations, we illustrate the superiority of the proposed penalized estimation approach and argue that a combination of penalized and unpenalized estimation approaches results in overall best INAR model fits.
     
  • Weiß, C.H., Puig, P., Aleksandrov, B. (2023):
    Optimal Stein-type Goodness-of-Fit Tests for Count Data.
    Biometrical Journal 65(2), 2200073 (open access).
     
    Abstract: Common count distributions, such as the Poisson (binomial) distribution for unbounded (bounded) counts considered here, can be characterized by appropriate Stein identities. These identities, in turn, might be utilized to define a corresponding goodness-of-fit (GoF) test, the test statistic of which involves the computation of weighted means for a user-selected weight function f. Here, the choice of f should be done with respect to the relevant alternative scenario, as it will have great impact on the GoF-test’s performance. We derive the asymptotics of both the Poisson and binomial Stein-type GoF-statistic for general count distributions (we also briefly consider the negative-binomial case), such that the asymptotic power is easily computed for arbitrary alternatives. This allows for an efficient implementation of optimal Stein tests, that is, which are most powerful within a given class F of weight functions. The performance and application of the optimal Stein-type GoF-tests is investigated by simulations and several medical data examples.
     
  • Weiß, C.H., Aleksandrov, B., Faymonville, M., Jentsch, C. (2023):
    Partial Autocorrelation Diagnostics for Count Time Series.
    Entropy 25(1), 105 (open access),
    Special Issue "Discrete-Valued Time Series".
     
    Abstract: In a time series context, the study of the partial autocorrelation function (PACF) is helpful for model identification. Especially in the case of autoregressive (AR) models, it is widely used for order selection. During the last decades, the use of AR-type count processes, i.e., which also fulfil the Yule-Walker equations and thus provide the same PACF characterization as AR models, increased a lot. This motivates the use of the PACF test also for such count processes. By computing the sample PACF based on the raw data or the Pearson residuals, respectively, findings are usually evaluated based on well-known asymptotic results. However, the conditions for these asymptotics are generally not fulfilled for AR-type count processes, which deteriorates the performance of the PACF test in such cases. Thus, we present different implementations of the PACF test for AR-type count processes, which rely on several bootstrap schemes for count time series. We compare them in simulations with the asymptotic results, and we illustrate them with an application to a real-world data example.
     
  • Wang, S., Weiß, C.H. (2023):
    New Characterizations of the (Discrete) Lindley Distribution and their Applications.
    Mathematics and Computers in Simulation 212, pp. 310-322.
     
    Abstract: A Stein-type characterization of the Lindley distribution is derived. It is shown that if using the generalized derivative in the sense of distributions, one can choose all indicator functions as the characterization functions class. This extends some known recent results about characterizations of the Lindley distribution. In addition, a new characterization based on another independent exponential random variable is also provided. As an application of the novel results, some moment formulas related to the Lindley distribution are obtained. Furthermore, generalized method-of-moments estimators for both the discrete and continuous Lindley distribution are proposed, which lead to a notably lower bias at the cost of an only modest increase in mean squared error compared to existing estimators. It is also demonstrated how the Stein characterization might be used to construct a goodness-of-fit test with respect to the null hypothesis of the Lindley distribution. The paper concludes with an illustrative real-data example.
     
  • Weiß, C.H. (2024):
    Control Charts for Poisson Counts based on the Stein-Chen Identity.
    Advanced Statistical Methods in Statistical Process Monitoring, Finance, and Environmental Science, Springer, pp. 195-209 (arXiv preprint).
     
    Abstract: If monitoring Poisson count data for a possible mean shift (while the Poisson distribution is preserved), then the ordinary Poisson exponentially weighted moving-average (EWMA) control chart proved to be a good solution. In practice, however, mean shifts might occur in combination with further changes in the distribution family. Or due to a misspecification during Phase-I analysis, the Poisson assumption might not be appropriate at all. In such cases, the ordinary EWMA chart might not perform satisfactorily. Therefore, two novel classes of generalized EWMA charts are proposed, which utilize the so-called Stein-Chen identity and are thus sensitive to further distributional changes than just sole mean shifts. Their average run length (ARL) performance is investigated with simulations, where it becomes clear that especially the class of so-called "ABC-EWMA charts" shows an appealing ARL performance. The practical application of the novel Stein-Chen EWMA charts is illustrated with an application to count data from semiconductor manufacturing.
     
  • Faymonville, M., Riffo, J., Rieger, J., Jentsch, C. (2023):
    spINAR: Semiparametric and Parametric Estimation and Bootstrapping of Integer-Valued Autoregressive (INAR) Models.
    R package version 0.1.0, CRAN.
    Full code available on GitHub.
     
    Abstract: Semiparametric and parametric estimation of INAR models including a finite sample refinement (Faymonville et al. (2022), doi:10.1007/s10260-022-00655-0) for the semiparametric setting introduced in Drost et al. (2009), doi:10.1111/j.1467-9868.2008.00687.x, different procedures to bootstrap INAR data (Jentsch, C. and Weiß, C.H. (2017), doi:10.3150/18-BEJ1057) and flexible simulation of INAR data.
     
  • Aleksandrov, B., Weiß, C.H., Nik, S., Faymonville, M., Jentsch, C. (2024):
    Modelling and Diagnostic Tests for Poisson and Negative-binomial Count Time Series.
    Metrika 87(7), 843–887 (open access).
     
    Abstract: When modelling unbounded counts, their marginals are often assumed to follow either Poisson (Poi) or negative binomial (NB) distributions. To test such null hypotheses, we propose goodness-of-fit (GoF) tests based on statistics relying on certain moment properties. By contrast to most approaches proposed in the count-data literature so far, we do not restrict ourselves to specific low-order moments, but consider a flexible class of functions of generalized moments to construct model-diagnostic tests. These cover GoF-tests based on higher-order factorial moments, which are particularly suitable for the Poi- or NB-distribution where simple closed-form expressions for factorial moments of any order exist, but also GoF-tests relying on the respective Stein’s identity for the Poi- or NB-distribution. In the time-dependent case, under mild mixing conditions, we derive the asymptotic theory for GoF tests based on higher-order factorial moments for a wide family of stationary processes having Poi- or NB-marginals, respectively. This family also includes a type of NB-autoregressive model, where we provide clarification of some confusion caused in the literature. Additionally, for the case of independent and identically distributed counts, we prove asymptotic normality results for GoF-tests relying on a Stein identity, and we briefly discuss how its statistic might be used to define an omnibus GoF-test. The performance of the tests is investigated with simulations for both asymptotic and bootstrap implementations, also considering various alternative scenarios for power analyses. A data example of daily counts of downloads of a TeX editor is used to illustrate the application of the proposed GoF-tests.
     
  • Weiß, C.H. (2024):
    Stein EWMA Control Charts for Count Processes.
    Statistical Methods and Applications in Systems Assurance & Quality, Book Series "Advanced Research in Reliability and System Assurance", CRC Press, pp. 3-17 (arXiv preprint).
     
    Abstract: The monitoring of serially independent or autocorrelated count processes is considered, having a Poisson or (negative) binomial marginal distribution under in-control conditions. Utilizing the corresponding Stein identities, exponentially weighted moving-average (EWMA) control charts are constructed, which can be flexibly adapted to uncover zero inflation, over- or underdispersion. The proposed Stein EWMA charts' performance is investigated by simulations, and their usefulness is demonstrated by a real-world data example from health surveillance.
     
  • Nik, S., Weiß, C.H. (2024):
    Generalized Moment Estimators based on Stein Identities.
    Journal of Statistical Theory and Applications 23(3), 240-274 (open access).
     
    Abstract: For parameter estimation of continuous and discrete distributions, we propose a generalization of the method of moments (MM), where Stein identities are utilized for improved estimation performance. The construction of these Stein-type MM-estimators makes use of a weight function as implied by an appropriate form of the Stein identity. Our general approach as well as potential benefits thereof are first illustrated by the simple example of the exponential distribution. Afterward, we investigate the more sophisticated two-parameter inverse Gaussian distribution and the two-parameter negative-binomial distribution in great detail, together with illustrative real-world data examples. Given an appropriate choice of the respective weight functions, their Stein-MM estimators, which are defined by simple closed-form formulas and allow for closed-form asymptotic computations, exhibit a better performance regarding bias and mean squared error than competing estimators.
     
  • Faymonville, M., Riffo, J., Rieger, J., Jentsch, C. (2024):
    spINAR: An R Package for Semiparametric and Parametric Estimation and Bootstrapping of Integer-Valued Autoregressive (INAR) Models.
    Journal of Open Source Software 9(97), 5386 (open access).
     
    Abstract: Although the statistical literature extensively covers continuous-valued time series processes and their parametric, non-parametric and semiparametric estimation, the literature on count data time series is considerably less advanced. Among the count data time series models, the integer-valued autoregressive (INAR) model is arguably the most popular one finding applications in a wide variety of fields such as medical sciences, environmentology and economics. While many contributions have been made during the last decades, the majority of the literature focuses on parametric INAR models and estimation techniques. Our emphasis is on the complex but efficient and non-restrictive semiparametric estimation of INAR models. The appeal of this approach lies in the absence of a commitment to a parametric family of innovation distributions. In this paper, we describe the need and the features of our R package spINAR which combines semiparametric simulation, estimation and bootstrapping of INAR models also covering its parametric versions.
     
  • Faymonville, M., Jentsch, C., Weiß, C.H. (2024):
    Semi-parametric goodness-of-fit testing for INAR models.
    arXiv preprint.
     
    Abstract: Among the various models designed for dependent count data, integer-valued autoregressive (INAR) processes enjoy great popularity. Typically, statistical inference for INAR models uses asymptotic theory that relies on rather stringent (parametric) assumptions on the innovations such as Poisson or negative binomial distributions. In this paper, we present a novel semi-parametric goodness-of-fit test tailored for the INAR model class. Relying on the INAR-specific shape of the joint probability generating function, our approach allows for model validation of INAR models without specifying the (family of the) innovation distribution. We derive the limiting null distribution of our proposed test statistic, prove consistency under fixed alternatives and discuss its asymptotic behavior under local alternatives. By manifold Monte Carlo simulations, we illustrate the overall good performance of our testing procedure in terms of power and size properties. In particular, it turns out that the power can be considerably improved by using higher-order test statistics. We conclude the article with the application on three real-world economic data sets.
     
  • to be continued!
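
Many of the results listed above build on the Stein–Chen identity, which states that a count variable X is Poisson distributed with mean λ exactly when E[X f(X)] = λ E[f(X+1)] holds for all bounded functions f. The following is a minimal Python sketch of that identity, not code from the cited articles: the function names, the truncation point `kmax`, and the ratio-type sample statistic are illustrative assumptions, and the statistics studied in the papers may be defined differently.

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability of the count k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def stein_chen_gap(f, lam, kmax=200):
    """E[X f(X)] - lam * E[f(X+1)] for X ~ Poisson(lam).

    Both expectations are truncated at kmax (an illustrative choice);
    the gap should be numerically zero for any reasonable f.
    """
    lhs = sum(k * f(k) * poisson_pmf(k, lam) for k in range(kmax))
    rhs = lam * sum(f(k + 1) * poisson_pmf(k, lam) for k in range(kmax))
    return lhs - rhs

def sample_stein_chen_ratio(x, f):
    """One natural sample analogue (hypothetical helper): the ratio
    mean(X f(X)) / (mean(X) * mean(f(X+1))), close to 1 for Poisson data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_xf = sum(v * f(v) for v in x) / n
    mean_f_shift = sum(f(v + 1) for v in x) / n
    return mean_xf / (mean_x * mean_f_shift)

if __name__ == "__main__":
    print(stein_chen_gap(lambda k: k, lam=2.5))
    print(stein_chen_gap(lambda k: math.exp(-k), lam=0.8))
    print(sample_stein_chen_ratio([0, 1, 2, 3], lambda k: k))
```

With f(k) = k the ratio compares the second moment with its Poisson counterpart, so values clearly above or below 1 point to over- or underdispersion relative to the Poisson distribution.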

 

IFF project

 
One-year project funded by the Internal Research Funding (Interne Forschungsförderung, IFF2018) of HSU Hamburg.

 

Project results:

The IFF funding of the project "Model Diagnostics for Count Time Series" made it possible to provide interim financing for a research associate position, which was filled by an early-career researcher. Within this funding period, closed-form expressions for the asymptotic distribution of the mean of squares and the variance of the Pearson residuals were derived for two special types of count processes, namely Poisson INAR(1) and Poisson INARCH(1) processes, which in turn enabled novel significance tests. The performance of the developed tests was examined in simulations, and all results were summarized, together with the funded early-career researcher, in the manuscript "Testing the Dispersion Structure of Count Time Series Using Pearson Residuals". The manuscript has been accepted for publication in AStA Advances in Statistical Analysis.
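
To make the quantities in this paragraph concrete, the following minimal Python sketch simulates a Poisson INAR(1) process, X_t = α∘X_{t-1} + ε_t with binomial thinning and Poisson(λ) innovations, and computes its Pearson residuals from the conditional mean αx + λ and conditional variance α(1-α)x + λ. It is an illustration only, not the test procedure of the manuscript: the parameter values, the function names, and the use of the true rather than estimated parameters are assumptions made here.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method, adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_poisson_inar1(n, alpha, lam, seed=0):
    """Simulate n observations of a Poisson INAR(1) process."""
    rng = random.Random(seed)
    # Start in the stationary marginal distribution Poisson(lam / (1 - alpha)).
    x = [poisson_draw(rng, lam / (1.0 - alpha))]
    for _ in range(n - 1):
        # Binomial thinning: each of the previous counts survives with prob. alpha.
        survivors = sum(1 for _unit in range(x[-1]) if rng.random() < alpha)
        x.append(survivors + poisson_draw(rng, lam))
    return x

def pearson_residuals(x, alpha, lam):
    """Standardize each observation by its conditional mean and variance."""
    residuals = []
    for prev, cur in zip(x, x[1:]):
        cond_mean = alpha * prev + lam
        cond_var = alpha * (1.0 - alpha) * prev + lam
        residuals.append((cur - cond_mean) / math.sqrt(cond_var))
    return residuals

# Illustrative parameter values (assumptions, not taken from the project).
x = simulate_poisson_inar1(n=5000, alpha=0.5, lam=1.0, seed=1)
r = pearson_residuals(x, alpha=0.5, lam=1.0)
mean_r = sum(r) / len(r)
mean_sq = sum(v * v for v in r) / len(r)
print(f"mean of residuals: {mean_r:.3f}, mean of squared residuals: {mean_sq:.3f}")
```

Under a correctly specified model the residuals should have mean close to 0 and mean of squares close to 1; a mean of squares well away from 1 suggests that the assumed dispersion structure does not fit.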

With the help of the position financed by the IFF, a substantially extended project proposal was prepared under the same title, "Model Diagnostics for Count Time Series", which builds on the content of the IFF proposal in essential points. The proposal (research grant, "Sachbeihilfe") was approved by the Deutsche Forschungsgemeinschaft (DFG) on 20 December 2019.

The following poster provides an overview of the IFF project.

Project duration:

July 2018 to June 2019.

Publications:

 


Last modified: 22 November 2024