2014/15 PhD Presentation Event
Tuesday 20 and Wednesday 21 May 2014
Rafal Baranowski
Title: Ranking-based subset selection for high-dimensional data
Abstract: In this presentation, we consider the high-dimensional variable selection problem, where the number of predictors is much larger than the number of observations. Our goal is to identify those predictors that truly affect the response variable. To achieve this, we propose Ranking Based Subset Selection (RBSS), which combines subsampling with any variable selection algorithm that can rank the “importance” of the explanatory variables. Unlike existing competitors such as Stability Selection (Meinshausen and Bühlmann, 2010), RBSS can identify subsets of relevant predictors that the original procedure selects with a relatively low but still significant probability. We provide a real data example which demonstrates that this issue arises in practice, and show that RBSS performs very well in that setting. Moreover, we report the results of an extensive simulation study and some of the theoretical results derived, which show that RBSS is a valid and powerful statistical procedure.
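The idea of combining subsampling with a ranking of the predictors can be illustrated with a small sketch. This is not the RBSS procedure itself, whose details the abstract does not give: the ranking by absolute marginal correlation, the half-sample size, the toy data and all function names are assumptions made for illustration only.

```python
import random
from collections import Counter

def abs_corr(x, y):
    """Absolute sample correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def rank_variables(X, y):
    """Rank column indices of X by decreasing |correlation| with y (assumed ranking rule)."""
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])

def top_subset_frequencies(X, y, k, n_subsamples=50, seed=0):
    """Relative frequency of each top-k subset over random half-samples."""
    rng = random.Random(seed)
    n = len(y)
    counts = Counter()
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        ranking = rank_variables([X[i] for i in idx], [y[i] for i in idx])
        counts[frozenset(ranking[:k])] += 1
    return {subset: c / n_subsamples for subset, c in counts.items()}

# Hypothetical toy data: 100 observations, 120 predictors, only 0 and 1 matter.
data_rng = random.Random(1)
n, p = 100, 120
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[1] + data_rng.gauss(0, 0.5) for row in X]

freqs = top_subset_frequencies(X, y, k=2)
best_subset, best_freq = max(freqs.items(), key=lambda kv: kv[1])
print(sorted(best_subset), best_freq)
```

Counting whole subsets rather than individual variables is the design choice that matters here: a subset that appears at the top of the ranking in a sizeable fraction of subsamples can be reported even when no single threshold on per-variable selection frequencies would pick it out.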
Wenqian Cheng
Title: Text mining and time series analysis on Chinese microblogs
Abstract: This presentation will discuss some text mining and time series analysis results on Chinese microblogs (Weibo). First, it will give a brief review of social media and microblogs, techniques for microblog data acquisition, and some exploratory data analysis. The aim of using text mining is to understand the general public’s perspectives on certain keywords (e.g. specific companies). Useful information is typically derived by identifying patterns and trends through statistical pattern learning. Text mining methods such as clustering and support vector machines are applied. In addition, to discover the abstract “topics” that occur in a collection of posts, topic modelling was applied in the simulation study. Next, time series analysis of sentiment and of the correlation between the volume of posts and stock price will be presented. Plans and problems for the next stage will be discussed at the end.
Marco Doretti
Title: Measuring the efficacy of the UK Counterweight Programme via the g-computation algorithm
Abstract: One of the purposes of longitudinal studies is to evaluate the impact of a sequence of treatments/exposures on an outcome measured at the final stage. When dealing with observational data, particular care is needed in stating the dependencies among the variables in play, in order to avoid a number of drawbacks that could affect the validity of the inference performed. Time-varying confounding is one of the most important of these. It arises naturally when the causality framework is adapted to a multi-temporal context, as there may be variables that at each time act as confounders for the treatment/outcome relation but are also influenced by previous treatments, and therefore lie on the causal paths under investigation. The g-computation algorithm (Robins 1986, Ryan et al. 2012) is probably the most popular method to overcome this issue. In order to handle informative dropout, we propose an extension of the Heckman correction to deal with several occasions. The motivating example is a follow-up study implemented within the Counterweight Programme, one of the most relevant protocols adopted to tackle the problem of obesity in the UK in recent decades (Taubman et al. 2009), from which the dataset used for the application has been gathered.
Essential references:
Robins, J. (1986) - A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect. Mathematical Modelling.
Daniel, R. M. et al. (2012) - Methods for dealing with time-dependent confounding. Statistics in Medicine.
Taubman, S. L. et al. (2009) - Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology.
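For orientation, the g-computation formula in its simplest time-varying form can be written down for two treatment occasions. The notation is an assumption introduced here, not taken from the abstract: treatments A_0 and A_1, a discrete time-varying confounder L_1 measured between them, and final outcome Y. Under the usual identifying assumptions (consistency, positivity and no unmeasured confounding), and ignoring baseline covariates and dropout, the mean outcome under the fixed regime (a_0, a_1) is:

```latex
\[
  \mathbb{E}\bigl[Y^{a_0,a_1}\bigr]
  = \sum_{l_1}
    \mathbb{E}\bigl[Y \mid A_0 = a_0,\; L_1 = l_1,\; A_1 = a_1\bigr]\,
    \Pr\bigl(L_1 = l_1 \mid A_0 = a_0\bigr).
\]
```

The confounder L_1 is averaged over its distribution given the earlier treatment rather than conditioned on in a single regression, which is what allows it to act both as a confounder for A_1 and as a mediator of A_0.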
Tomasz Dubiel-Teleszynski
Title: Data augmentation: simulating diffusion bridges using Bayesian filters
Abstract: We propose a new approach to simulating diffusion bridges. We focus on bridges for nonlinear processes; however, our method is applicable to linear diffusion processes as well. The novelty of our data augmentation technique lies in the proposal, which is based on a Bayesian filter, in particular the Kalman filter or the unscented Kalman filter, applied to the Euler approximation of a given diffusion process. We thus follow multivariate normal regression theory, applying the unscented transformation whenever the diffusion process is nonlinear. The bridges we study are for mean-reverting processes, such as the linear Ornstein-Uhlenbeck process, the square-root process with nonlinear diffusion coefficient, and the inverse square-root process with nonlinear drift and diffusion coefficient. We introduce a correction to the approximation of the drift in the Euler scheme and generalize it to a class of mean-reverting processes with polynomial drift. Comparing our method with other techniques found in the literature, in the cases we study we find that the acceptance rates we obtain are comparable for values of the mean-reversion parameter lying in the unit interval. However, unlike the other methods, our method leads to far higher acceptance rates for values of this parameter greater than one. We believe this result to be of interest especially when modelling term-structure dynamics or other phenomena with inverse square-root processes. Our next goal is to extend these results to a multidimensional setting and simulate diffusion processes conditional on their integrals, followed by applications in stochastic volatility models.
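The Euler approximation on which the proposal is built can be sketched for the linear Ornstein-Uhlenbeck case. The snippet below only simulates unconditioned Euler paths and compares their mean with the known exact mean. It does not implement the bridge proposal or the drift correction described in the abstract, and the parameter values and function names are illustrative assumptions.

```python
import math
import random

def euler_ou_path(x0, kappa, theta, sigma, horizon, n_steps, rng):
    """One Euler path of dX = kappa * (theta - X) dt + sigma dW on [0, horizon]."""
    dt = horizon / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Euler step: drift times dt plus a Gaussian increment with variance sigma^2 * dt.
        x = x + kappa * (theta - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def exact_ou_mean(x0, kappa, theta, horizon):
    """Exact mean of the Ornstein-Uhlenbeck process at time `horizon`."""
    return theta + (x0 - theta) * math.exp(-kappa * horizon)

# Hypothetical parameter values, with the mean-reversion parameter in the unit interval.
params = dict(x0=0.0, kappa=0.5, theta=1.0, sigma=0.2, horizon=1.0)
rng = random.Random(42)
endpoints = [euler_ou_path(n_steps=100, rng=rng, **params)[-1] for _ in range(2000)]
mc_mean = sum(endpoints) / len(endpoints)
exact = exact_ou_mean(params["x0"], params["kappa"], params["theta"], params["horizon"])
print(f"Euler Monte Carlo mean: {mc_mean:.4f}, exact mean: {exact:.4f}")
```

With a fine time grid the Euler mean stays close to the exact one. The discretisation bias in the drift grows with the step size and with the mean-reversion parameter, which is the kind of error a drift correction would target.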
Ali Habibnia
Title: Financial forecasting with many predictors with neural network factor models
Abstract: Modelling and forecasting financial returns has been an essential question in recent studies, both in academia and in financial markets, for understanding market dynamics. Financial returns present special features which make them hard to forecast. This study proposes a nonlinear forecasting technique based on an improved factor model with two neural network extensions. The first extension proposes an auto-associative neural network principal component analysis as an alternative for factor estimation, which allows the factors to have a nonlinear relationship to the input variables. After finding the common factors, the next step proposes a nonlinear factor-augmented forecasting equation based on a single-hidden-layer feed-forward neural network model. In this study, a statistical approach is used to show that the modelling procedure is not a black box. The proposed neural network factor model can capture both the nonlinearity and the non-Gaussianity of a high-dimensional dataset. Therefore, this model can forecast the complex behaviour in financial data more accurately.
Charlie Hu
Title: Nonparametric eigenvalue-regularized precision or covariance matrix estimator
Abstract: There have been numerous recent works on the estimation of large covariance or precision matrices. The high-dimensional nature of the data means that the sample covariance matrix can be ill-conditioned. Without assuming a particular structure, much effort has been devoted to regularizing the eigenvalues of the sample covariance matrix. Lam (2014) proposes to regularize these eigenvalues through subsampling of the data. The method enjoys asymptotically optimal nonlinear shrinkage of the eigenvalues with respect to the Frobenius error norm. Coincidentally, this nonlinear shrinkage is asymptotically the same as that introduced in Ledoit and Wolf (2012). One advantage of our estimator is its computational speed when the dimension p is not extremely large. Our estimator also allows p to be larger than the sample size n, and is always positive semi-definite.
Na Huang
Title: NOVELIST estimator for large covariance matrix
Abstract: We propose a NOVEL Integration of the Sample and Thresholded covariance estimators (NOVELIST) to estimate a large covariance matrix. It shrinks the sample covariance towards a general thresholding target, in particular soft or hard thresholding estimators. The benefits of NOVELIST include simplicity, ease of implementation, and the fact that its application avoids eigenanalysis, which is unfamiliar to many practitioners. We obtain an explicit convergence rate in the operator norm over a large class of covariance matrices when the dimension p and the sample size n satisfy log p/n → 0. Further, we show that the rate is a trade-off between sparsity, shrinkage intensity, thresholding level, dimension and sample size under different covariance structures. Simulation results will be presented, along with a comparison with competing methods.
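Shrinking the sample covariance towards a thresholded target can be sketched as a convex combination of the two matrices. The abstract does not state the exact form of the estimator, so the formula used here, (1 - delta) * S + delta * T(S) with soft thresholding applied to off-diagonal entries only, is an assumption. The names `lam` (thresholding level) and `delta` (shrinkage intensity) are also illustrative.

```python
def soft_threshold_offdiag(S, lam):
    """Soft-threshold the off-diagonal entries of S at level lam; keep the diagonal."""
    p = len(S)
    T = [row[:] for row in S]
    for i in range(p):
        for j in range(p):
            if i != j:
                magnitude = max(abs(S[i][j]) - lam, 0.0)
                T[i][j] = magnitude if S[i][j] >= 0 else -magnitude
    return T

def novelist_sketch(S, lam, delta):
    """Convex combination of S and its soft-thresholded version (assumed form)."""
    T = soft_threshold_offdiag(S, lam)
    p = len(S)
    return [[(1.0 - delta) * S[i][j] + delta * T[i][j] for j in range(p)]
            for i in range(p)]

# Hypothetical 3x3 sample covariance matrix.
S = [[2.0, 0.3, -1.0],
     [0.3, 1.0, 0.05],
     [-1.0, 0.05, 3.0]]

estimate = novelist_sketch(S, lam=0.2, delta=0.5)
for row in estimate:
    print([round(v, 3) for v in row])
```

Setting `delta=0` returns the sample covariance and `delta=1` returns the thresholded matrix. Only entrywise operations are involved, which matches the remark in the abstract that no eigenanalysis is needed.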
Cheng Li
Title: Limit convergence of BSDEs driven by a marked point process
Abstract: We study backward stochastic differential equations (BSDEs) driven by a random measure or, equivalently, by a marked point process. Under suitable assumptions, the BSDE has a unique supersolution with a unique decomposition. Following the method in Peng’s 1999 paper, with appropriate modifications, we prove a limit theorem for BSDEs driven by a marked point process: if a sequence of supersolutions of BSDEs converges increasingly to a supersolution Y, then the corresponding decompositions also converge to the unique decomposition of Y. Moreover, we can apply this limit theorem to show the existence of the smallest supersolution of a BSDE with a constraint. Finally, we apply our results to the insider trading problem.
Shiju Liu
Title: Excursions of Lévy processes
Abstract: We study the classical collective risk model, the Cramér-Lundberg risk model, driven by a compound Poisson process, which concerns the probability of ultimate ruin of an insurance company over both finite and infinite time horizons. Particular attention is given to Gerber-Shiu expected discounted penalty functions, which provide a method of calculating the probability of ruin. We derive the Laplace transforms for claim sizes following an inverse Gaussian distribution and a mixture of two exponential distributions, and we obtain asymptotic formulas for the probability of ruin in these two scenarios. The infinite divisibility of Lévy processes and the Lévy-Khintchine representation theorem are introduced as preliminaries to the study of the excursions of Lévy processes, as well as applications in financial mathematics.
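As a point of reference, the Cramér-Lundberg model and its ruin probability can be written explicitly in the simplest textbook case of exponentially distributed claims. This case is simpler than the inverse Gaussian and exponential-mixture cases treated in the abstract, and the notation (initial capital u, premium rate c, Poisson claim intensity \lambda, exponential claim rate \beta) is an assumption introduced here:

```latex
\[
  U(t) = u + ct - \sum_{i=1}^{N(t)} X_i,
  \qquad
  \psi(u) = \Pr\Bigl(\inf_{t \ge 0} U(t) < 0 \Bigm| U(0) = u\Bigr),
\]
\[
  \psi(u) = \frac{\lambda}{c\beta}\,
            \exp\Bigl(-\Bigl(\beta - \frac{\lambda}{c}\Bigr)u\Bigr),
  \qquad X_i \sim \mathrm{Exp}(\beta), \quad c > \frac{\lambda}{\beta}.
\]
```

The condition c > \lambda/\beta is the net profit condition: premium income per unit time must exceed the expected claim outgo, otherwise ruin is certain.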
Anna-Louise Schröder
Title: Adaptive trend estimation in financial return data: recent findings and new challenges
Abstract: Financial returns can be modelled as centred around piecewise-constant trend functions which change at certain points in time. We can capture this in a model using a hierarchically-ordered oscillatory basis of simple piecewise-constant functions which is uniquely defined through Binary Segmentation for change-point detection. The resulting interpretable decomposition of nonstationarity into short- and long-term components yields an adaptive moving-average estimator of the current trend, which beats comparable forecast estimators in applications on daily return data. In my presentation I discuss some challenges and interesting questions as well as potential paths to improve the existing framework. I also show some promising results for a multivariate extension of this model.
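Binary Segmentation for change points in the mean can be sketched with the standard CUSUM statistic. The snippet covers only the generic segmentation step, not the basis construction or the trend estimator described in the abstract. The threshold value and the noise-free toy signal are illustrative assumptions.

```python
import math

def cusum_argmax(x, s, e):
    """Best split of x[s..e] (inclusive): returns (b, |CUSUM|), split after index b."""
    n = e - s + 1
    total = sum(x[s:e + 1])
    left = 0.0
    best_b, best_val = s, 0.0
    for b in range(s, e):
        left += x[b]
        n_left = b - s + 1
        n_right = n - n_left
        right = total - left
        stat = abs(math.sqrt(n_right / (n * n_left)) * left
                   - math.sqrt(n_left / (n * n_right)) * right)
        if stat > best_val:
            best_b, best_val = b, stat
    return best_b, best_val

def binary_segmentation(x, threshold, s=0, e=None):
    """Recursively split while the maximal CUSUM statistic exceeds the threshold."""
    if e is None:
        e = len(x) - 1
    if e - s < 1:
        return []
    b, val = cusum_argmax(x, s, e)
    if val <= threshold:
        return []
    return (binary_segmentation(x, threshold, s, b)
            + [b]
            + binary_segmentation(x, threshold, b + 1, e))

# Hypothetical noise-free piecewise-constant signal with jumps after indices 19 and 39.
signal = [0.0] * 20 + [2.0] * 20 + [-1.0] * 20
change_points = binary_segmentation(signal, threshold=1.0)
print(change_points)
```

The recursion also records the order in which the splits are found: the largest jump is detected first and the later splits refine it, giving a hierarchy of change points from coarse to fine.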
Ewelina Sienkiewicz
Title: How long in the future can you trust the forecast?
Abstract: In this research I quantify the predictability of a chaotic system, estimate how far into the future it is predictable, and identify the two main limitations. Sensitivity to initial conditions complicates the forecasting of chaotic dynamical systems, even when the model is perfect. Structural model inadequacy is a distinct source of forecast failure, and such failures are sometimes mistakenly attributed to chaos. These methods are demonstrated using a toy mathematical system (the Hénon map). Model inadequacy is shown to be important in real-world forecasting practice using the example of climate models. The research findings, based on the North American Regional Climate Change Assessment Program (NARCCAP) database, show significant divergence between Regional and Global Climate Model estimates of surface radiation, and I consider the implications for the reliability of such models.
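Sensitivity to initial conditions under a perfect model can be sketched with the Hénon map. The abstract does not state the parameter values, so the classical choice a = 1.4, b = 0.3 is assumed. The perturbation size, the error tolerance and the function names are also illustrative.

```python
import math

def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Hénon map with the classical parameter values."""
    return 1.0 - a * x * x + y, b * x

def iterate(state, n_steps):
    """Apply the map n_steps times to an (x, y) state."""
    x, y = state
    for _ in range(n_steps):
        x, y = henon_step(x, y)
    return x, y

def predictability_horizon(start, eps=1e-8, tolerance=0.1, max_steps=200):
    """First step at which two trajectories, initially eps apart, differ by more than tolerance."""
    xa, ya = start
    xb, yb = xa + eps, ya
    for step in range(1, max_steps + 1):
        xa, ya = henon_step(xa, ya)
        xb, yb = henon_step(xb, yb)
        if math.hypot(xa - xb, ya - yb) > tolerance:
            return step
    return None  # the error stayed below the tolerance for max_steps iterations

# Discard a transient so that the starting point lies close to the attractor.
on_attractor = iterate((0.0, 0.0), 100)
horizon = predictability_horizon(on_attractor)
print("steps until the forecast error exceeds the tolerance:", horizon)
```

Both trajectories are generated by exactly the same map, so the loss of predictability here comes from initial-condition uncertainty alone. Structural model inadequacy would correspond to forecasting with a map that differs from the one generating the data.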
Tayfun Terzi
Title: Methods for the identification of semi-plausible response patterns (SpRPs)
Abstract: New challenges concerning bias from measurement error have arisen due to the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants only superficially process the information of (online) experiments or questionnaires and attempt only to respond in a plausible way. This happens because paid participants are generally motivated by fast cash: they try to pass objective plausibility checks efficiently and process other items only superficially, if at all. Thus, those participants produce data that are not only useless but detrimental, because they attempt to conceal their malpractice from the researcher. The potential consequences are biased estimation and misleading statistical inference. The inferential objective is to derive identification statistics within latent models that detect these behavioural patterns (detection of error), by drawing knowledge from related fields of research (e.g., outlier analysis, person-fit indices, fraud detection).
Youyou Zhang
Title: The joint distribution of excursion and hitting times of the Brownian motion with application to Parisian option pricing
Abstract: We study the joint law of the excursion time and hitting time of a drifted Brownian motion by using a three-state semi-Markov model obtained through perturbation. We obtain a martingale to which we can apply the optional sampling theorem and derive the double Laplace transform. This general result is applied to address problems in option pricing. We introduce a new option related to Parisian options, which is triggered when the age of an excursion exceeds a certain time and/or a barrier is hit. We obtain an explicit expression for the Laplace transform of its fair price.
2013/14 PhD Presentation Event
Tuesday 21 and Wednesday 22 May 2013
Rafal Baranowski
Title: Subset stability selection
Abstract: In this presentation, we provide a brief introduction to the concepts behind recently developed variable screening procedures in a linear regression model. These techniques aim to remove a great number of unimportant variables from the analysed data set while preserving all relevant ones. In practice, however, the obtained set may not include any important variables at all! That is why there is a need for a tool that can assess the reliability and stability of a set of variables and use these assessments in further analysis. We introduce a new method, termed “subset stability selection”, which combines any variable screening procedure with resampling techniques in order to find significant variables only. Our method is fully nonparametric, easily applicable in a much wider context than linear regression, and exhibits very promising finite sample performance in the simulation study provided.
Zhanyu Chen
Title: Hedging of barrier options via a general self-duality
Abstract: Classical put-call symmetry relates the prices of puts and calls under a suitable dual market transform. One well-known application is the semi-static hedging of path-dependent barrier options with European options. Nevertheless, one has to relax restrictions on modelling price processes so as to fit empirical data on stock prices. In this work, we establish a general self-duality theorem and use it to develop hedging schemes for barrier options in stochastic volatility models with correlation.
Wenqian Cheng
Title: Data analysis and text mining on microblogs
Abstract: This presentation will discuss some data analysis and text mining on microblogs, especially the Chinese microblog Weibo. A brief introduction to social media and microblogs and a comparison between Twitter and Weibo will be presented. It will cover several techniques for microblog data acquisition, including downloading via an Application Programming Interface (API), web crawling tools and web parsing applications. For initial data analysis, some work on posting pattern recognition and correlation with share price has been conducted. Further text mining of Weibo, including Chinese word segmentation, word frequency counting and sentiment analysis, will be introduced. Plans and problems for the next stage will be discussed at the end.
Baojun Dou
Title: Sparse factor model for multivariate time series
Abstract: In this work, we model multiple time series via common factors. Under stationarity, we concentrate on the case where the factor loading matrix is sparse. We propose a method to estimate the factor loading matrix and to correctly identify its zero entries. Two kinds of asymptotic results are investigated when the dimension p of the time series is fixed: (1) parameter consistency, i.e. the convergence rate of the new sparse estimator, and (2) sign consistency. We have obtained a necessary condition for sign consistency of the estimator. Future work will allow p to go to infinity.
Ali Habibnia
Title: Forecasting with many predictors with a neural-based dynamic factor model
Abstract: The contribution of this study is to propose a nonlinear forecasting technique based on an improved dynamic factor model with two neural network extensions. The first extension proposes a bottleneck-type neural network principal component analysis as an alternative for factor estimation, which allows the factors to have a nonlinear relationship to the input variables. After finding the common factors, the next step proposes a nonlinear factor-augmented forecasting equation based on a multilayer feed-forward neural network. As a function approximation method, neural networks can capture both the nonlinearity and the non-normality of the data. Therefore, this model can forecast nonlinear behaviour in high-dimensional macroeconomic and financial time series data more accurately.
Mai Hafez (poster presentation)
Title: Multivariate longitudinal data subject to dropout and item non-response: a latent variable approach
Abstract: Longitudinal data are collected for studying changes across time. Studying many variables simultaneously across time (e.g. items from a questionnaire) is common when the interest is in measuring unobserved constructs such as democracy, happiness, fear of crime, social status, etc. The observed variables are used as indicators for the unobserved constructs ("latent variables") of interest. Dropout is a common problem in longitudinal studies where subjects exit the study prematurely. Ignoring the dropout mechanism can lead to biased estimates, especially when the dropout is non-ignorable. Another possible type of missingness is item non-response, where an individual chooses not to respond to a specific question. Our proposed approach uses latent variable models to capture the evolution of the latent phenomenon over time while accounting for dropout (possibly non-random), together with item non-response.
Qilin 'Charlie' Hu
Title: Factor modelling for high dimensional time series
Abstract: Lam et al. (2011) propose an autocorrelation-based estimation method for high-dimensional time series using a factor model. When factors have different strengths, a two-step procedure which estimates strong and weak factors separately performs better than estimating them in one go. It is well known that the PCA method (Bai and Ng, 2002) is only valid for high-dimensional data (consistency comes from the dimension going to infinity). In contrast, we derive some convergence results which show that the autocorrelation-based method can take advantage of low-dimensional estimation and estimate weaker factors better, while itself being a high-dimensional data analysis procedure. This result can be applied to some macroeconomic data.
Alex Jarman (poster presentation)
Title: Forecasting the probability of tropical cyclone formation: the reliability of NHC forecasts from the 2012 hurricane season
Abstract: see poster
Cheng Li
Title: Asymptotic equilibrium in the Glosten-Milgrom model
Abstract: Kyle (1985) studied a market with asymmetric information and obtained the equilibrium in the market. Back (1992) generalized it to continuous time. In Back’s result, the fundamental value of the risky asset can have any continuous distribution. This general result is in contrast to studies of the Glosten-Milgrom equilibrium, where the fundamental value of the risky asset is assumed to have a Bernoulli distribution, as in Back and Baruch (2004). We have taken on this project to study the existence of the Glosten-Milgrom equilibrium when the fundamental value of the risky asset has a general discrete distribution. We also introduce a notion of asymptotic equilibrium for the Glosten-Milgrom model, which allows a sequence of Glosten-Milgrom equilibria to approximate the Kyle-Back equilibrium when the value of the risky asset has a general discrete distribution.
Anna Louise Schroeder
Title: How to quantify the predictability of a chaotic system
Abstract: I present a new time series model for nonstationary data that is able to cope with a very low signal-to-noise ratio and time-varying volatility, both of which are typical features of financial time series. The core of our model is a set of data-adaptive basis functions and coefficients which specify the location and size of jumps in the mean of a time series. The set of these change points can be determined with a uniquely identifiable hierarchical structure, allowing for unambiguous reconstruction. By thresholding the estimated wavelet coefficients adequately, our model provides practitioners with a flexible forecasting method: only those change points of higher importance (in terms of jump size) are taken into account in forecasting returns.
Ewelina Sienkiewicz (poster presentation)
Title: How to quantify the predictability of a chaotic system
Abstract: Models are tools that describe reality in the form of mathematical equations. For example, General Circulation Models (GCMs) represent the actual climate system and are used to investigate major climate processes and to help us better understand certain dependencies among climate variables. Global forecasts help foresee severe weather anywhere on the planet and save many lives, although meteorology is unreliable in the long run. A model is only an approximate representation of nature, which is reflected in model error. In addition, small uncertainties in the initial conditions usually lead to errors in the final forecasts. We can handle initial condition uncertainty but not model error. This study examines how to quantify the predictability of complex models with an eye towards experimental design.
Majeed Simaan
Title: Estimation risk in asset allocation theory
Abstract: Assuming that asset returns are normally distributed with a known covariance matrix, the paper derives a joint sampling distribution for the estimated efficient portfolio weights as well as for its mean return and risk. In addition, it shows that estimation error increases with the investor’s risk tolerance and the number of assets within the portfolio, while it decreases with the sample size. Large institutional investors allocate their funds over a number of classes; in practice, these allocation decisions are made in a hierarchical manner and involve adding constraints to the process. From a purely ex-ante perspective, such procedures are likely to result in suboptimal decision making. Nevertheless, from an ex-post view, as my results confirm, the estimation risk incurred increases with the number of assets. Therefore, the loss of ex-ante welfare in the hierarchical approach can be outweighed by the lower estimation risk achieved by optimizing over a smaller number of assets.
Edward Wheatcroft (poster presentation)
Title: Will it rain tomorrow? Improving probabilistic forecasts
Abstract: Chaos is the phenomenon of small differences in the initial conditions of a process causing large differences later in time, often colloquially referred to as the “butterfly effect”. Perhaps the most well-known example, though, is in meteorology, where small differences in the current conditions can have large effects later on. The effect is famously summed up by the notion that “when a butterfly flutters its wings in one part of the world, it can eventually cause a hurricane in another.” Of course this is only a fictional example, but let’s suppose that we know this is true but we don’t know whether the butterfly has flapped its wings or not. Do we accept that we can’t predict what’s going to happen? Or can we gain some insight? Now suppose that we know from experience that the probability of the butterfly flapping its wings is 0.05, i.e. 5 percent. With this information we might conclude that the probability of a hurricane occurring is also 0.05. This is of course an oversimplified and unrealistic example, but it illustrates the concept of ensemble forecasting, in that a degree of belief about the uncertainty of the initial conditions can give us a better idea of the probability of a future event.
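The ensemble idea can be sketched with a toy chaotic system. The logistic map x -> 4x(1 - x) stands in for the atmosphere, and the Gaussian uncertainty around the observed initial state, the event threshold and all names are assumptions made for illustration. None of them comes from the abstract.

```python
import random

def logistic_map(x):
    """One step of the chaotic logistic map on [0, 1]."""
    return 4.0 * x * (1.0 - x)

def ensemble_probability(x_obs, obs_sd, lead_time, threshold,
                         n_members=5000, seed=0):
    """Fraction of ensemble members whose state exceeds `threshold` at `lead_time`."""
    rng = random.Random(seed)
    exceed = 0
    for _member in range(n_members):
        # Draw an initial state consistent with the uncertain observation.
        x = min(max(rng.gauss(x_obs, obs_sd), 0.0), 1.0)
        for _step in range(lead_time):
            x = logistic_map(x)
        if x > threshold:
            exceed += 1
    return exceed / n_members

# At short lead times the ensemble stays tight, so the forecast is nearly certain.
p_short = ensemble_probability(x_obs=0.3, obs_sd=0.001, lead_time=1, threshold=0.9)
# At long lead times the members spread out and the forecast becomes a genuine probability.
p_long = ensemble_probability(x_obs=0.3, obs_sd=0.001, lead_time=50, threshold=0.9)
print(p_short, p_long)
```

Each member plays the role of one possible state of the butterfly: the forecast probability of the event is the share of plausible initial conditions that lead to it.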
Yang Yan (poster presentation)
Title: Efficient estimation of risk measures in a semiparametric GARCH model
Abstract: This paper proposes efficient estimators of risk measures in a semiparametric GARCH model defined through moment constraints. Moment constraints are often used to identify and estimate the mean and variance parameters, but are discarded when estimating error quantiles. In order to prevent this efficiency loss in quantile estimation, we propose a quantile estimator based on inverting an empirical-likelihood-weighted distribution estimator. It is found that the new quantile estimator is uniformly more efficient than the simple empirical quantile and a quantile estimator based on normalized residuals. At the same time, the efficiency gain in error quantile estimation hinges on the efficiency of the estimators of the variance parameters.
You You Zhang
Title: Last passage time processes
Abstract: The study of last passage times plays an important role in financial mathematics. Since they look into the future and are not stopping times, the standard theorems of martingale theory cannot be applied, and they are therefore much harder to handle. Using time inversion, we relate last passage times of drifted Brownian motion to first hitting times. Using this argument, we derive the distribution of the increments. We extend this to general transient diffusions. Related work has been done by Profeta et al., making use of Tanaka’s formula. We introduce the concept of conditioned martingales and connect it to Girsanov’s theorem. Our main focus lies in relating the Brownian meander to the BES(3) process. This transformation proves to be useful in deriving the last passage time density of the Brownian meander.
Thursday 10 May 2012
Sarah Higgins
Mai Hafez
Alex Jarman
Na Huang
Yang Yan
Karolos Korkas
Jia Wei Lim
Friday 11 May 2012
Baojun Dou
Joseph Dureau
Yehuda Dayan
Thursday 23 June 2011
Roy Rosemarin
Edward Wheatcroft
Yehuda Dayan
Karolos Korkas
Alex Jarman
Felix Ren
Joseph Dureau
Daniel Bruynooghe
Jia Wei Lim
Yang Yan
Friday 24 June 2011
Zhanyu Chen
Ilaria Vannini
Title: Multivariate regression chain graph models for clustered categorical data
Dan Chen
Mai Hafez
Hongbiao Zhao
Ilya Sheynzon
Monday 14 June 2010
Haeran Cho
Xiaonan Che
Sujin Park
Sarah Higgins
Alex Jarman
Felix Ren
Filippo Riccardi
Jia Wei Lim
Dan Chen
Tuesday 15 June 2010
Flavia Giammarino
Malvina Marchese
Deniz Akinc
Roy Rosemarin
Ilya Sheynzon
Wednesday 18 June 2009
Felix Ren
Takeshi Yamada
Flavia Giammarino
Deniz Akinc
Neil Bathia
Noha Youssef
Title: A 2-stage design procedure for computer experiments
Young Lee
Roy Rosemarin
Thursday 19 June 2009
Xiaonan Che
Malvina Marchese
Sujin Park
Daniel Hawellek
Hongbiao Zhao
James Abdey
Thursday 19 June 2008
Sarah Higgins
Yehuda Dayan
Daniel Hawellek
Xiaonan Che
Hai Liang Du
Edward Tredger
Takeshi Yamada
Friday 20 June 2008
Young Lee
Neil Bathia
Sandrine Tobelem
Flavia Giammarino
Monday 4 June 2007
Pauline Sculli
Hai Liang Du
Limin Wang
Noha Youssef
Tuesday 5 June 2007
Oksana Savina
Shanle Wu
Edward Tredger
Young Lee
Sandrine Tobelem
Tuesday 13 June 2006
James Abdey
Pauline Sculli
Sarah Higgins
Hai Liang Du
Adrian Gfeller
Young Lee
Wednesday 14 June 2006
Billy Wu
Edward Tredger
Sandrine Tobelem
Limin Wang
Friday 10 June 2005
Billy Wu
Hailiang Du
Miltiadis Mavrakakis
Dario Ciraki
