Title: | Variable Length Markov Chains with Covariates |
---|---|
Description: | Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates. |
Authors: | Fabrice Rossi [aut, cre, cph] |
Maintainer: | Fabrice Rossi |
License: | GPL (>= 3) |
Version: | 0.2.1.9000 |
Built: | 2025-02-01 04:29:52 UTC |
Source: | https://github.com/fabrice-rossi/mixvlmc |
Mixvlmc uses the following options():
mixvlmc.maxit: maximum number of iterations in model fitting for covlmc().
mixvlmc.predictive: specifies the computing engine used for model fitting for covlmc(). Two values are supported:
"glm" (default value): covlmc() uses stats::glm() with a binomial link (stats::binomial()) for a state space with two values, and VGAM::vglm() with a multinomial link (VGAM::multinomial()) for a state space with three or more values;
"multinom": covlmc() uses nnet::multinom() in all cases.
The first option, "glm", is recommended as both stats::glm() and VGAM::vglm() are able to detect and deal with degeneracy in the data set.
mixvlmc.backend: specifies the implementation used for the context tree construction in ctx_tree(), vlmc() and tune_vlmc(). Two values are supported:
"R" (default value): this corresponds to the original, almost pure R implementation.
"C++": this corresponds to the C++ implementation. This version is significantly faster than the R version, but is still considered experimental.
mixvlmc.charset: specifies the collection of characters used to display context trees in "ascii art" when using the "text" format for draw() and related functions. Two values are supported:
"ascii": the collection uses only standard ASCII characters and should be compatible with all environments;
"utf8": the collection uses UTF-8 symbols and needs a compatible display.
At loading, the option is set based on a call to cli::is_utf8_output(). It defaults to "utf8" if this encoding is supported.
Maintainer: Fabrice Rossi [copyright holder]
Other contributors:
Hugo Le Picard [contributor]
Guénolé Joubioux [contributor]
Useful links:
Report bugs at https://github.com/fabrice-rossi/mixvlmc/issues
This generic function converts an object into a covlmc.
as_covlmc(x, ...)

## S3 method for class 'tune_covlmc'
as_covlmc(x, ...)
x |
an object to convert into a covlmc. |
... |
additional arguments for conversion functions. |
a covlmc
## conversion from the results of tune_covlmc
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
rdts_best_model_tune <- tune_covlmc(rdts, rdts_cov)
rdts_best_model <- as_covlmc(rdts_best_model_tune)
draw(rdts_best_model)
This function returns the sequence represented by the node object.
as_sequence(node, reverse)
node |
a |
reverse |
specifies whether the sequence should be reported in reverse
temporal order ( |
the sequence represented by the node object, a vector
rdts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
rdts_tree <- ctx_tree(rdts, max_depth = 3)
res <- find_sequence(rdts_tree, "A")
as_sequence(res)
This generic function converts an object into a vlmc.
as_vlmc(x, ...)

## S3 method for class 'ctx_tree'
as_vlmc(x, alpha, cutoff, ...)

## S3 method for class 'tune_vlmc'
as_vlmc(x, ...)
x |
an object to convert into a vlmc. |
... |
additional arguments for conversion functions. |
alpha |
cut off parameter applied during the conversion, quantile scale (if specified) |
cutoff |
cut off parameter applied during the conversion, native scale (if specified) |
This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (prune()).
a vlmc
## conversion from a context tree
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
draw(rdts_ctree)
rdts_vlmc <- as_vlmc(rdts_ctree)
class(rdts_vlmc)
draw(rdts_vlmc)
## conversion from the result of tune_vlmc
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(rdts)
tune_result
rdts_best_vlmc <- as_vlmc(tune_result)
draw(rdts_best_vlmc)
This generic function converts an object into a vlmc.
## S3 method for class 'ctx_tree_cpp'
as_vlmc(x, alpha, cutoff, ...)
x |
an object to convert into a vlmc. |
alpha |
cut off parameter applied during the conversion, quantile scale (if specified) |
cutoff |
cut off parameter applied during the conversion, native scale (if specified) |
... |
additional arguments for conversion functions. |
This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (prune()).
a vlmc
## conversion from a context tree
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3, backend = "C++")
draw(rdts_ctree)
rdts_vlmc <- as_vlmc(rdts_ctree)
class(rdts_vlmc)
draw(rdts_vlmc)
This function prepares a plot of the results of tune_covlmc() using ggplot2. The result can be passed to print() to display the plot.
## S3 method for class 'tune_covlmc'
autoplot(object, ...)
object |
a |
... |
additional parameters (not used currently) |
The graphical representation proposed by this function is complete, while the one produced by plot.tune_covlmc() is minimalistic. This function uses the faceting capabilities of ggplot2 to combine, in a single graphical representation, the evolution of multiple characteristics of the VLMC during the pruning process, whereas plot.tune_covlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.
a ggplot object
pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
rdts_best_model_tune <- tune_covlmc(rdts, rdts_cov, criterion = "AIC")
covlmc_plot <- ggplot2::autoplot(rdts_best_model_tune)
print(covlmc_plot)
This function prepares a plot of the results of tune_vlmc() using ggplot2. The result can be passed to print() to display the plot.
## S3 method for class 'tune_vlmc'
autoplot(object, cutoff = c("quantile", "native"), ...)
object |
a |
cutoff |
the scale used for the cut off criterion (default "quantile") |
... |
additional parameters (not used currently) |
The graphical representation proposed by this function is complete, while the one produced by plot.tune_vlmc() is minimalistic. This function uses the faceting capabilities of ggplot2 to combine, in a single graphical representation, the evolution of multiple characteristics of the VLMC during the pruning process, whereas plot.tune_vlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.
a ggplot object
pc <- powerconsumption[powerconsumption$week %in% 10:11, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_best_model_tune <- tune_vlmc(rdts, criterion = "BIC")
vlmc_plot <- ggplot2::autoplot(rdts_best_model_tune)
print(vlmc_plot)
## simple post customisation
print(vlmc_plot + ggplot2::geom_point())
This function returns a list of ASCII characters used to fine tune the behaviour of the draw() function when it is used with format="text". It can be used as is or customised using its parameters.
charset_ascii(
  root = "*",
  first_node = "+",
  next_node = "'",
  final_node = "'",
  vbranch = "|",
  hbranch = "--",
  open_ct = "(",
  close_ct = ")",
  level_sep = " ~ ",
  time_sep = " | ",
  intercept = "(I)",
  intercept_sep = " & ",
  open_p_value = "<",
  close_p_value = ">",
  open_model = "[",
  close_model = "]"
)
root |
character used for the root node. |
first_node |
characters used for the first child of a node. |
next_node |
characters used for intermediate children of a node. |
final_node |
characters used for the last child of a node. |
vbranch |
characters used to represent a branch in a vertical way. |
hbranch |
characters used to represent a branch in a horizontal way. |
open_ct |
characters used to start each node specific text representation. |
close_ct |
characters used to end each node specific text representation. |
level_sep |
characters used to separate levels from models in
|
time_sep |
characters used to separate temporal blocks in
|
intercept |
characters used to represent the intercept in
|
intercept_sep |
characters used to separate the intercept from the other
parameters in |
open_p_value |
characters used as opening delimiters for the p value of
a node in |
close_p_value |
characters used as closing delimiters for the p value of
a node in |
open_model |
characters used as opening delimiters for the
representation of a model in |
close_model |
characters used as closing delimiters for the
representation of a model in |
a list
charset_ascii(root = "x")
This function returns a list of UTF-8 characters and symbols used to fine tune the behaviour of the draw() function when it is used with format="text". It can be used as is or customised using its parameters.
charset_utf8(
  root = "▪",
  first_node = "├",
  next_node = "├",
  final_node = "└",
  vbranch = "│",
  hbranch = "─",
  open_ct = "(",
  close_ct = ")",
  level_sep = " ~ ",
  time_sep = " ⁞ ",
  intercept = "(I)",
  intercept_sep = " • ",
  open_p_value = "‹",
  close_p_value = "›",
  open_model = "[",
  close_model = "]"
)
root |
character used for the root node. |
first_node |
characters used for the first child of a node. |
next_node |
characters used for intermediate children of a node. |
final_node |
characters used for the last child of a node. |
vbranch |
characters used to represent a branch in a vertical way. |
hbranch |
characters used to represent a branch in a horizontal way. |
open_ct |
characters used to start each node specific text representation. |
close_ct |
characters used to end each node specific text representation. |
level_sep |
characters used to separate levels from models in
|
time_sep |
characters used to separate temporal blocks in
|
intercept |
characters used to represent the intercept in
|
intercept_sep |
characters used to separate the intercept from the other
parameters in |
open_p_value |
characters used as opening delimiters for the p value of
a node in |
close_p_value |
characters used as closing delimiters for the p value of
a node in |
open_model |
characters used as opening delimiters for the
representation of a model in |
close_model |
characters used as closing delimiters for the
representation of a model in |
a list
charset_utf8(root = "\u27E1")
This function returns a list (possibly empty) of ctx_node objects. Each object represents one of the children of the node represented by the node parameter.
children(node)

## S3 method for class 'ctx_node'
children(node)

## S3 method for class 'ctx_node_cpp'
children(node)
node |
a |
Each node of a context tree represents a sequence. When find_sequence() is called with success, the returned object represents the corresponding node in the context tree. If this node has no child, the present function returns an empty list. When the node has at least one child, the function returns a list with one value for each element in the state space (see states()). The value is NULL if the corresponding child is empty, while it is a ctx_node object when the child is present. Each ctx_node object is associated to the sequence obtained by adding to the past of the sequence represented by node an observation of the associated state (this corresponds to an extension to the left of the sequence in temporal order).
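The idea of extending a sequence to the left can be sketched in base R, without the package. The helpers occurs() and left_extensions() below are hypothetical names introduced for illustration; they only check whether each extended sequence appears in the series and do not reproduce the min_size or max_depth logic of ctx_tree().

```r
# Does the sequence `s` (in temporal order) occur somewhere in `x`?
occurs <- function(x, s) {
  n <- length(x)
  k <- length(s)
  if (k > n) {
    return(FALSE)
  }
  starts <- seq_len(n - k + 1)
  any(vapply(starts, function(i) all(x[i:(i + k - 1)] == s), logical(1)))
}

# One candidate child per state: the state is added to the past (left) of `s`.
# Extensions that never occur are reported as NULL.
left_extensions <- function(x, s, states = sort(unique(x))) {
  candidates <- lapply(states, function(state) c(state, s))
  names(candidates) <- as.character(states)
  lapply(candidates, function(cand) if (occurs(x, cand)) cand else NULL)
}

rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
ext_00 <- left_extensions(rdts, c(0, 0))
ext_10 <- left_extensions(rdts, c(1, 0))
print(ext_00)
print(ext_10)
```

On this series, c(0, 0) is only ever preceded by 1, while c(1, 0) is preceded by both states.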
a list of ctx_node objects, see details.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
ctx_00 <- find_sequence(rdts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(rdts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)
## C++ backend
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3, backend = "C++")
ctx_00 <- find_sequence(rdts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(rdts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)
This function returns the number of distinct contexts in a context tree.
context_number(ct)
ct |
a context tree. |
the number of contexts of the tree.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
# should be 8
context_number(rdts_ctree)
This function returns the total number of contexts of a VLMC with covariates.
## S3 method for class 'covlmc'
context_number(ct)
ct |
a fitted covlmc model. |
the number of contexts present in the VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
# should be 3
context_number(m_cov)
This function extracts from a context tree a description of all of its contexts.
contexts(ct, sequence = FALSE, reverse = FALSE, ...)
ct |
a context tree. |
sequence |
if |
reverse |
logical (defaults to |
... |
additional arguments for the contexts function. |
The default behaviour consists in returning a list of all the contexts contained in the tree using ctx_node objects (as returned by e.g. find_sequence()) (with type="list"). The properties of the contexts can then be explored using adapted functions such as counts() and positions(). The result list is of class contexts. When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values which depend on the actual class of the tree and on additional parameters. In all implementations of contexts(), setting any of the additional parameters to a non default value leads to a data.frame result.
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).
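The two conventions can be compared on a toy series with base R only. This is a sketch under the stated convention (older values on the left in temporal order); the helper context_at() is a hypothetical name and not part of the package.

```r
# Extract the k values ending at position t, either in temporal order
# (oldest value first) or in reverse order (most recent value first).
context_at <- function(x, t, k, reverse = FALSE) {
  stopifnot(k >= 1, t >= k, t <= length(x))
  ctx <- x[(t - k + 1):t]
  if (reverse) rev(ctx) else ctx
}

x <- c("A", "B", "C", "A")
ctx_temporal <- context_at(x, t = 3, k = 2)                 # "B" then "C"
ctx_reverse <- context_at(x, t = 3, k = 2, reverse = TRUE)  # "C" then "B"
print(ctx_temporal)
print(ctx_reverse)
```

Both vectors describe the same two observations; only the reading direction changes.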
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
rdts_tree <- ctx_tree(rdts, max_depth = 3, min_size = 5)
contexts(rdts_tree)
contexts(rdts_tree, TRUE, TRUE)
This function returns the different contexts present in a VLMC with covariates, possibly with some associated data.
## S3 method for class 'covlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  metrics = FALSE,
  model = NULL,
  hsize = FALSE,
  merging = FALSE,
  ...
)
ct |
a fitted covlmc model. |
sequence |
if |
reverse |
logical (defaults to |
frequency |
specifies the counts to be included in the result
data.frame. The default value of |
positions |
logical (defaults to FALSE). Specify whether the positions
of each context in the time series used to build the context tree should be
reported in a |
local |
specifies how the counts reported by |
metrics |
if TRUE, adds predictive metrics for each context (see
|
model |
specifies whether to include the model associated to each
context. The default result with |
hsize |
if TRUE, adds a |
merging |
if TRUE, adds a |
... |
additional arguments for the contexts function. |
The default behaviour of the function is to return a list of all the contexts using ctx_node_covlmc objects (as returned by find_sequence.covlmc()). The properties of the contexts can then be explored using adapted functions such as counts(), covariate_memory(), cutoff.ctx_node(), metrics.ctx_node(), model(), merged_with() and positions().
When sequence=TRUE the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information will switch the result type of the function to data.frame.
See contexts.ctx_tree() for details about the frequency parameter. When model is non NULL, the resulting data.frame contains the models associated to each context (either the full R model or its coefficients). Other columns are added if the corresponding parameters are set to TRUE.
A list of class contexts containing the contexts represented in this tree (as ctx_node_covlmc) or a data.frame.
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
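The definition of a position can be checked with a few lines of base R. The function context_positions() is a hypothetical helper written for this illustration; it reports every match, including one at the very end of the series, which may differ from what the package's positions() returns.

```r
# Return every index t such that the context `ctx` (temporal order)
# ends with x[t].
context_positions <- function(x, ctx) {
  n <- length(x)
  k <- length(ctx)
  if (k == 0 || k > n) {
    return(integer(0))
  }
  ends <- k:n
  matches <- vapply(
    ends,
    function(t) all(x[(t - k + 1):t] == ctx),
    logical(1)
  )
  ends[matches]
}

x <- c(0, 0, 1, 1)
pos <- context_positions(x, c(0, 1))
print(pos)
```

For the series and context used in the text, the only match ends at index 3, in agreement with the definition.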
Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(0, median(pc$active_power), max(pc$active_power))
dts <- cut(pc$active_power, breaks = breaks)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
## direct representation with ctx_node_covlmc objects
m_cov_ctxs <- contexts(m_cov)
m_cov_ctxs
sapply(m_cov_ctxs, covariate_memory)
sapply(m_cov_ctxs, is_merged)
sapply(m_cov_ctxs, model)
## data.frame interface
contexts(m_cov, model = "coef")
contexts(m_cov, model = "full", hsize = TRUE)
This function extracts from a context tree a description of all of its contexts.
## S3 method for class 'ctx_tree'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)

## S3 method for class 'ctx_tree_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)
ct |
a context tree. |
sequence |
if |
reverse |
logical (defaults to |
frequency |
specifies the counts to be included in the result
data.frame. The default value of |
positions |
logical (defaults to FALSE). Specify whether the positions
of each context in the time series used to build the context tree should be
reported in a |
... |
additional arguments for the contexts function. |
The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts() and positions().
When sequence=TRUE the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information will switch the result type of the function to data.frame.
If frequency="total", an additional column named freq gives the number of occurrences of each context in the series used to build the tree. If frequency="detailed", one additional column is added per state in the context space. Each column records the number of times a given context is followed by the corresponding value in the original series.
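The two kinds of counts can be reproduced by hand in base R. The sketch below uses a hypothetical helper, context_counts(), and assumes that an occurrence of the context at the very end of the series (not followed by any value) is left out of both counts; it illustrates the definitions and is not the package's implementation.

```r
# Count how often `ctx` (temporal order) occurs in `x` and which state
# follows each occurrence.
context_counts <- function(x, ctx, states = sort(unique(x))) {
  n <- length(x)
  k <- length(ctx)
  stopifnot(k >= 1)
  # Candidate end positions t that still have a following value x[t + 1].
  ends <- if (k < n) k:(n - 1) else integer(0)
  hit <- vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))
  followers <- factor(x[ends[hit] + 1], levels = states)
  detailed <- table(followers)
  list(total = sum(detailed), detailed = detailed)
}

rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
res <- context_counts(rdts, c(1, 0))
print(res$total)    # analogue of the freq column
print(res$detailed) # analogue of the per state columns
```

In this series the context c(1, 0) appears three times, but the last occurrence closes the series, so only two occurrences have a following value: one followed by 0 and one followed by 1.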
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
rdts_tree <- ctx_tree(rdts, max_depth = 3, min_size = 5)
## direct representation with ctx_node objects
contexts(rdts_tree)
## data.frame format
contexts(rdts_tree, sequence = TRUE)
contexts(rdts_tree, frequency = "total")
contexts(rdts_tree, frequency = "detailed")
This function extracts all the contexts from a fitted VLMC, possibly with some associated data.
## S3 method for class 'vlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)

## S3 method for class 'vlmc_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)
ct |
a context tree. |
sequence |
if |
reverse |
logical (defaults to |
frequency |
specifies the counts to be included in the result
data.frame. The default value of |
positions |
logical (defaults to FALSE). Specify whether the positions
of each context in the time series used to build the context tree should be
reported in a |
local |
specifies how the counts reported by |
cutoff |
specifies whether to include the cut off value associated to
each context (see |
metrics |
if TRUE, adds predictive metrics for each context (see
|
... |
additional arguments for the contexts function. |
The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts(), cutoff.ctx_node(), metrics.ctx_node() and positions().
When sequence=TRUE the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information will switch the result type of the function to data.frame.
The frequency parameter is described in detail in the documentation of contexts.ctx_tree(). When cutoff is non NULL, the resulting data.frame contains a cutoff column with the cut off values, either in quantile or in native scale. See cutoff.vlmc() and prune.vlmc() for the definitions of cut off values and of the two scales.
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
The cut off values reported by contexts.vlmc() can be different from the ones reported by cutoff.vlmc() for three reasons:
cutoff.vlmc() reports only useful cut off values, i.e., cut off values that should induce a simplification of the VLMC when used in prune(). This excludes cut off values associated to simple contexts that are smaller than the ones of their descendants in the context tree. Those values are reported by contexts.vlmc().
contexts.vlmc() reports only cut off values of actual contexts, while cutoff.vlmc() reports cut off values for all nodes of the context tree.
The values reported by contexts.vlmc() are not modified to induce pruning, contrary to the default behaviour of cutoff.vlmc().
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
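The definition of a position can be checked by hand with a small base R sketch. The helper context_positions() below is hypothetical (it is not part of mixvlmc) and uses a naive scan of the series; it only illustrates the indexing convention, with contexts given in temporal order.

```r
# Hypothetical helper (not part of mixvlmc): positions of `ctx` in `x`, where a
# position is the index of the last value of an occurrence of `ctx`.
context_positions <- function(x, ctx) {
  k <- length(ctx)
  n <- length(x)
  if (k == 0 || k > n) {
    return(integer(0))
  }
  ends <- k:n
  # keep the end indices t such that x[(t - k + 1):t] matches ctx
  matches <- vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))
  ends[matches]
}

# the example of the text: the only occurrence of (0, 1) ends at index 3
context_positions(c(0, 0, 1, 1), c(0, 1))
```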
Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(rdts, alpha = 0.5)
## direct representation with ctx_node objects
model_ctxs <- contexts(model)
model_ctxs
sapply(model_ctxs, cutoff, scale = "quantile")
sapply(model_ctxs, cutoff, scale = "native")
sapply(model_ctxs, function(x) metrics(x)$accuracy)
## data.frame format
contexts(model, frequency = "total")
contexts(model, cutoff = "quantile")
contexts(model, cutoff = "native", metrics = TRUE)
This function reports the number of occurrences of the sequence represented by node in the original time series used to build the associated context tree (not including a possible final occurrence not followed by any value at the end of the original time series). In addition, if frequency=="detailed", the function reports the frequencies of each of the possible values of the time series when they appear just after the sequence.
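The counting rule can be illustrated with a naive base R sketch. The helper count_followers() is hypothetical (it is not part of mixvlmc) and assumes a non empty sequence given in temporal order; it only mimics the quantities described above on a raw vector.

```r
# Hypothetical helper (not part of mixvlmc): count the occurrences of
# `seq_values` in `x` that are followed by a value, and tabulate what follows.
count_followers <- function(x, seq_values) {
  k <- length(seq_values)
  n <- length(x)
  states <- sort(unique(x))
  followers <- c()
  # the last admissible start is n - k, so that x[start + k] exists
  for (start in seq_len(max(n - k, 0))) {
    if (all(x[start:(start + k - 1)] == seq_values)) {
      followers <- c(followers, x[start + k])
    }
  }
  list(
    total = length(followers),
    detailed = table(factor(followers, levels = states))
  )
}

x <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
res <- count_followers(x, c(1, 0))
res$total
res$detailed
```

In this series, (1, 0) appears three times, but the last occurrence closes the series and is therefore not counted.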
counts(node, frequency = c("detailed", "total"), local = FALSE)

## S3 method for class 'ctx_node'
counts(node, frequency = c("detailed", "total"), local = FALSE)

## S3 method for class 'ctx_node_cpp'
counts(node, frequency = c("detailed", "total"), local = FALSE)
node |
a |
frequency |
specifies the counts to be included in the result. |
local |
specifies how the counts are computed. When |
either an integer when frequency="total" which gives the total number of occurrences of the sequence represented by node or a data.frame with a total column with the same value and a column for each of the possible values of the original time series, reporting counts in each column (see the description above).
contexts() and contexts.ctx_tree()
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
rdts_tree <- ctx_tree(rdts, max_depth = 3, min_size = 5)
subseq <- find_sequence(rdts_tree, factor(c("A", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  counts(subseq)
}
This function returns the longest covariate memory used by a VLMC with covariates.
covariate_depth(model)
model |
a covlmc object |
the longest covariate memory of this model
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(rdts)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 10)
covariate_depth(m_cov)
This function returns the length of the memory of a COVLMC context represented by a ctx_node_covlmc object.
covariate_memory(node)
node |
A |
the memory length, an integer
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 10)
ctxs <- contexts(m_cov)
## get all the memory lengths
sapply(ctxs, covariate_memory)
This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.
covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)
x |
an object that can be interpreted as a discrete time series, such
as an integer vector or a |
covariate |
a data frame of covariates. |
alpha |
number in (0,1) (default: 0.05) cut off value in the pruning phase (in quantile scale). |
min_size |
number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details). |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
keep_data |
logical (defaults to |
control |
a list with control parameters, see |
... |
arguments passed to |
The model is built using the algorithm described in Zanin Zambom et al. As for the vlmc() approach, the algorithm builds first a context tree (see ctx_tree()). The min_size parameter is used to compute the actual number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1) where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
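The formula above can be evaluated directly to see how quickly the required number of observations grows with the depth. The helper required_observations() below is hypothetical (it is not part of mixvlmc) and n_covariates stands for ncol(covariate), assuming that each covariate column contributes a single coefficient per time lag.

```r
# Hypothetical helper (not part of mixvlmc): minimum number of observations
# required for a context of length d, following the formula given above.
required_observations <- function(min_size, n_covariates, d, s) {
  min_size * (1 + n_covariates * d) * (s - 1)
}

# one covariate, three states, default min_size = 5, depths 0 to 3
thresholds <- sapply(0:3, function(d) required_observations(5, 1, d, 3))
thresholds
```

With these values the thresholds are 10, 20, 30 and 40: each additional level of depth adds one delayed copy of the covariate to each of the s - 1 sets of coefficients.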
Then logistic models are adjusted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated to the covariates) and the tree itself.
Parameters specified by control are used to fine tune the behaviour of the algorithm.
a fitted covlmc model.
By default, covlmc uses two different computing engines for logistic models:
when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).
Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.
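A possible way to switch the engine for a whole session, rather than for a single call, is to rely on base R options(). The sketch below only uses options() and getOption(); the variable names old and current are illustrative.

```r
# Set the option globally, keeping the previous value so that it can be restored.
old <- options(mixvlmc.predictive = "multinom")
current <- getOption("mixvlmc.predictive")
current
# ... calls to covlmc() made at this point would use nnet::multinom() ...
options(old) # restore the previous setting
```

Restoring the previous value avoids silently changing the engine used by later fits in the same session.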
Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615
cutoff.covlmc() and prune.covlmc() for post-pruning.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power,
  probs = c(1 / 3, 2 / 3, 1)
)))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 15)
draw(m_cov)
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(rdts, rdts_cov, min_size = 15)
)
draw(m_cov_nnet)
This function creates a list with parameters used to fine tune the coVLMC fitting algorithm.
covlmc_control(pseudo_obs = 1)
pseudo_obs |
number of fake observations of each state to add to the observed ones. |
pseudo_obs is used to regularize the probability estimations when a context is always followed by the same state in the observed data. Transition probabilities are computed after adding pseudo_obs pseudo observations of each of the states (including the observed one). This corresponds to a Bayesian posterior mean estimation with a Dirichlet prior.
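The effect of the pseudo observations can be sketched on raw counts. The helper smoothed_probabilities() is hypothetical (it is not part of mixvlmc) and only shows the smoothing itself, not how covlmc() uses it internally.

```r
# Hypothetical helper (not part of mixvlmc): transition probabilities obtained
# after adding `pseudo_obs` fake observations of each state to the counts.
smoothed_probabilities <- function(counts, pseudo_obs = 1) {
  (counts + pseudo_obs) / sum(counts + pseudo_obs)
}

# a context observed 8 times and always followed by state "a"
counts <- c(a = 8, b = 0)
smoothed_probabilities(counts) # 9/10 and 1/10
smoothed_probabilities(counts, pseudo_obs = 10) # 18/28 and 10/28
```

Without smoothing the estimated probability of "b" would be exactly zero; larger values of pseudo_obs pull the estimates towards the uniform distribution.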
a list.
rdts <- rep(c(0, 1), 100)
rdts_cov <- data.frame(y = rep(0, length(rdts)))
default_model <- covlmc(rdts, rdts_cov)
contexts(default_model, type = "data.frame", model = "coef")$coef
control <- covlmc_control(pseudo_obs = 10)
model <- covlmc(rdts, rdts_cov, control = control)
contexts(model, type = "data.frame", model = "coef")$coef
This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.
## Default S3 method:
covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)
x |
a numeric, character, factor or logical vector |
covariate |
a data frame of covariates. |
alpha |
number in (0,1) (default: 0.05) cut off value in the pruning phase (in quantile scale). |
min_size |
number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details). |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
keep_data |
logical (defaults to |
control |
a list with control parameters, see |
... |
arguments passed to |
The model is built using the algorithm described in Zanin Zambom et al. As for the vlmc() approach, the algorithm builds first a context tree (see ctx_tree()). The min_size parameter is used to compute the actual number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1) where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
Then logistic models are adjusted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated to the covariates) and the tree itself.
Parameters specified by control are used to fine tune the behaviour of the algorithm.
a fitted covlmc model.
By default, covlmc uses two different computing engines for logistic models:
when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).
Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.
Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615
cutoff.covlmc() and prune.covlmc() for post-pruning.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power,
  probs = c(1 / 3, 2 / 3, 1)
)))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 15)
draw(m_cov)
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(rdts, rdts_cov, min_size = 15)
)
draw(m_cov_nnet)
This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.
## S3 method for class 'dts'
covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)
x |
a discrete time series represented by a |
covariate |
a data frame of covariates. |
alpha |
number in (0,1) (default: 0.05) cut off value in the pruning phase (in quantile scale). |
min_size |
number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details). |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
keep_data |
logical (defaults to |
control |
a list with control parameters, see |
... |
arguments passed to |
The model is built using the algorithm described in Zanin Zambom et al. As for the vlmc() approach, the algorithm builds first a context tree (see ctx_tree()). The min_size parameter is used to compute the actual number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1) where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
Then logistic models are adjusted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated to the covariates) and the tree itself.
Parameters specified by control are used to fine tune the behaviour of the algorithm.
a fitted covlmc model.
By default, covlmc uses two different computing engines for logistic models:
when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).
Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.
Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615
cutoff.covlmc() and prune.covlmc() for post-pruning.
pc <- powerconsumption[powerconsumption$week == 5, ]
power_dts <- dts(cut(pc$active_power, breaks = c(0, quantile(pc$active_power,
  probs = c(1 / 3, 2 / 3, 1)
))))
power_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(power_dts, power_cov, min_size = 15)
draw(m_cov)
This function builds a context tree for a time series.
ctx_tree(
  x,
  min_size = 2L,
  max_depth = 100L,
  keep_position = TRUE,
  backend = getOption("mixvlmc.backend", "R"),
  ...
)
x |
an object that can be interpreted as a discrete time series, such
as an integer vector or a |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context to be included in the tree. |
max_depth |
integer >= 1 (default: 100). Maximum length of a context to be included in the tree. |
keep_position |
logical (default: TRUE). Should the context tree keep the position of the contexts. |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
... |
additional parameters |
The tree represents all the sequences of symbols/states of length at most max_depth that appear at least min_size times in the time series and stores the frequencies of the states that follow each context. Optionally, the positions of the contexts in the time series can be stored in the tree.
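The content of such a tree can be approximated by a naive enumeration in base R. The helper frequent_sequences() is hypothetical (it is not part of mixvlmc): it assumes that only occurrences followed by a value are counted, as described for counts(), and it returns flat tables rather than a tree.

```r
# Hypothetical helper (not part of mixvlmc): naive enumeration of the sequences
# of length 1 to max_depth that appear at least min_size times in x.
frequent_sequences <- function(x, min_size = 2, max_depth = 2) {
  n <- length(x)
  result <- list()
  for (k in seq_len(min(max_depth, n - 1))) {
    # all windows of length k followed by a value, encoded as strings ("0 1")
    windows <- vapply(
      seq_len(n - k),
      function(start) paste(x[start:(start + k - 1)], collapse = " "),
      character(1)
    )
    window_counts <- table(windows)
    result[[k]] <- window_counts[window_counts >= min_size]
  }
  result
}

rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
freq <- frequent_sequences(rdts, min_size = 2, max_depth = 2)
freq
```

Sequences are written here in temporal order. The enumeration is quadratic in the worst case and is only meant to make the selection rule explicit.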
a context tree (of class that inherits from ctx_tree).
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing completely the context tree.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
## get all contexts of length 2
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 2)
draw(rdts_ctree)
This function builds a context tree for a time series.
## Default S3 method:
ctx_tree(
  x,
  min_size = 2L,
  max_depth = 100L,
  keep_position = TRUE,
  backend = getOption("mixvlmc.backend", "R"),
  ...
)
x |
a numeric, character, factor or logical vector |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context to be included in the tree. |
max_depth |
integer >= 1 (default: 100). Maximum length of a context to be included in the tree. |
keep_position |
logical (default: TRUE). Should the context tree keep the position of the contexts. |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
... |
additional parameters |
The tree represents all the sequences of symbols/states of length at most max_depth that appear at least min_size times in the time series and stores the frequencies of the states that follow each context. Optionally, the positions of the contexts in the time series can be stored in the tree.
a context tree (of class that inherits from ctx_tree).
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing completely the context tree.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
## get all contexts of length 2
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 2)
draw(rdts_ctree)
This function builds a context tree for a time series.
## S3 method for class 'dts'
ctx_tree(
  x,
  min_size = 2L,
  max_depth = 100L,
  keep_position = TRUE,
  backend = getOption("mixvlmc.backend", "R"),
  ...
)
x |
a discrete time series represented by a |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context to be included in the tree. |
max_depth |
integer >= 1 (default: 100). Maximum length of a context to be included in the tree. |
keep_position |
logical (default: TRUE). Should the context tree keep the position of the contexts. |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
... |
additional parameters |
The tree represents all the sequences of symbols/states of length at most max_depth that appear at least min_size times in the time series and stores the frequencies of the states that follow each context. Optionally, the positions of the contexts in the time series can be stored in the tree.
a context tree (of class that inherits from ctx_tree).
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing completely the context tree.
x_dts <- dts(c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0))
## get all contexts of length 2
ctree <- ctx_tree(x_dts, min_size = 1, max_depth = 2)
draw(ctree)
This generic function returns one or more cut off values that are guaranteed to have an effect on the model passed to the function when a simplification procedure is applied (in general a tree pruning operation as provided by prune()).
cutoff(model, ...)
model |
a model. |
... |
additional arguments for the cutoff function implementations |
The exact definition of a cut off value depends on the model type and is documented in the concrete implementations of the function.
a cut off value or a vector of cut off values.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power,
  probs = c(0.25, 0.5, 0.75, 1)
)))
model <- vlmc(rdts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)
This function returns all the cut off values that should induce a pruning of the context tree of a VLMC with covariates.
## S3 method for class 'covlmc'
cutoff(model, raw = FALSE, tolerance = .Machine$double.eps^0.5, ...)
model |
a fitted COVLMC model. |
raw |
specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details) |
tolerance |
specify the minimum separation between two consecutive values of the cut off in native mode (before any transformation). See details. |
... |
additional arguments for the |
Notice that the list of cut off values returned by the function is not as complete as the one computed for a VLMC without covariates. Indeed, pruning the COVLMC tree creates new pruning opportunities that are not evaluated during the construction of the initial model, while all pruning opportunities are computed during the construction of a VLMC context tree. Nevertheless, the largest value returned by the function is guaranteed to produce the least pruned tree consistent with the reference one.
For large COVLMC, some cut off values can be almost identical, with a difference of the order of the machine epsilon value. The tolerance parameter is used to keep only values that are different enough. This is done in the quantile scale, before transformations implemented when raw is FALSE.
Notice that the log likelihood scale is not directly useful in COVLMC as the differences in model sizes are not constant through the pruning process. As a consequence, this function does not provide a scale parameter, contrary to cutoff.vlmc().
Setting raw to TRUE removes the small perturbations that are subtracted from the log-likelihood ratio values computed from the COVLMC (in quantile scale).
As automated model selection is provided by tune_covlmc(), the direct use of cutoff should be reserved to advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_covlmc().
a vector of cut off values, NULL if none can be computed
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(rdts)
draw(m_nocovariate)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
draw(m_cov)
cutoff(m_cov)
This function returns the cut off value associated to a specific node in the context tree interpreted as a VLMC. The node is represented by a ctx_node object as returned by find_sequence() or contexts(). For details, see cutoff.vlmc().
## S3 method for class 'ctx_node'
cutoff(model, scale = c("quantile", "native"), raw = FALSE, ...)
model |
a |
scale |
specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile"). |
raw |
specify whether the returned values should be limit values
computed in the model or modified values that guarantee pruning (see
details in |
... |
additional arguments for the |
a cut off value
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power,
  probs = c(0.25, 0.5, 0.75, 1)
)))
model <- vlmc(rdts)
model_ctxs <- contexts(model)
cutoff(model_ctxs[[1]])
cutoff(model_ctxs[[2]], scale = "native", raw = TRUE)
This function returns a collection of cut off values that are guaranteed to induce all valid pruned trees of the context tree of a VLMC. Pruning is implemented by the prune() function.
## S3 method for class 'vlmc'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)

## S3 method for class 'vlmc_cpp'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)
model |
a fitted VLMC model. |
scale |
specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile"). |
raw |
specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details) |
tolerance |
specify the minimum separation between two consecutive values of the cut off in native mode (before any transformation). See details. |
... |
additional arguments for the cutoff function. |
By default, the function returns values that can be used directly to induce pruning in the context tree. This is done by computing the log likelihood ratios used by the context algorithm on the reference VLMC and by keeping the relevant ones. From them the function selects intermediate values that are guaranteed to generate via pruning all the VLMC models that could be generated by using larger values of the cutoff parameter that was used to build the reference model (or smaller values of the alpha parameter in "quantile" scale).
Setting the raw
parameter to TRUE
removes this operation on the values
and asks the function to return the relevant log likelihood ratios.
For large VLMC, some log likelihood ratios can be almost identical, with a
difference of the order of the machine epsilon value. The tolerance
parameter is used to keep only values that are different enough. This is done
in the native scale, before transformations implemented when raw
is
FALSE
.
As automated model selection is provided by tune_vlmc()
, the direct use of
cutoff
should be reserved to advanced exploration of the set of trees that
can be obtained from a complex one, e.g. to implement model selection
techniques that are not provided by tune_vlmc()
.
a vector of cut off values.
prune()
and tune_vlmc()
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)
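The cut off values of a VLMC can also be explored as a whole rather than one at a time. The following is a minimal sketch, assuming that the values returned in the default "quantile" scale can be passed as the second argument of prune(), as in the example above. It prunes the reference model once per cut off value and records the depth of each resulting tree. The names pruned_depths and the summary data frame are illustrative only.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)

# cut off values in the default "quantile" scale
model_cuts <- cutoff(model)

# prune the reference model once per cut off value and record the depth
# of the resulting tree (same positional call as in the example above)
pruned_depths <- sapply(model_cuts, function(a) depth(prune(model, a)))

# one row per candidate model
data.frame(cut_off = model_cuts, depth = pruned_depths)
```

Since every pruned tree is a sub-tree of the reference one, none of the recorded depths should exceed depth(model).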
This function returns the depth of a context tree, i.e. the length of the longest context represented in the tree.
depth(ct)
ct |
a context tree. |
the depth of the tree.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
## should be 3
depth(rdts_ctree)
This function 'draws' a context tree as a text.
draw(ct, format, control = draw_control(), ...)
ct |
a context tree. |
format |
a character string that specifies the output format of the
function. Possible values are "text" (default value) and "latex". See details. |
control |
a list of low level control parameters of the text
representation. See details and draw_control(). |
... |
additional arguments for draw. |
The function uses different text based formats (plain "ascii art" and LaTeX)
to represent the context tree. Fine tuning of the representation can be done
via the draw_control()
function.
In addition to the structure of the context tree, draw()
can represent
information attached to the nodes (contexts and partial contexts). This is
controlled by additional parameters depending on the type of the context
tree. In general, parameters given directly to draw()
specify what
information is represented while details on how this representation is made
can be controlled via the control
parameter and the associated
draw_control()
function.
the context tree (invisibly).
The format
parameter specifies the format used for the textual output.
With the default value "text", the output is produced in "ascii art" using
by default only ASCII characters (notice that draw_control() can be used
to specify non-ASCII characters, but this is discouraged).
With the "latex" value, the output is produced in LaTeX, leveraging the
forest LaTeX package (see
https://ctan.org/pkg/forest). Each call to draw()
produces a full
forest
LaTeX environment. This can be included as is in a LaTeX document,
provided the forest
package is loaded in the preamble of the document.
The LaTeX output is sanitized to avoid potential problems induced by
special characters in the names of the states of the context tree.
rdts <- sample(c(0, 1), 100, replace = TRUE)
ctree <- ctx_tree(rdts, min_size = 10, max_depth = 2)
draw(ctree)
rdts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(rdts_c, min_size = 10, max_depth = 2)
draw(ctree_c, control = draw_control(digits = 2))
## LaTeX output
draw(ctree_c, "latex")
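Because each call with the "latex" format produces a full forest environment, the output can be saved and wrapped in a document that loads the forest package. The sketch below assumes that draw() prints to the standard output, so that utils::capture.output() can collect it; this is an assumption about the implementation, not something stated above. The file name ctree.tex is hypothetical.

```r
library(mixvlmc)

set.seed(0)
rdts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(rdts_c, min_size = 10, max_depth = 2)

# assumption: draw() writes to standard output, so capture.output() sees it
forest_lines <- capture.output(draw(ctree_c, "latex"))

# wrap the forest environment in a minimal document that loads forest
tex <- c(
  "\\documentclass{article}",
  "\\usepackage{forest}",
  "\\begin{document}",
  forest_lines,
  "\\end{document}"
)

# hypothetical output file, written to the session temporary directory
tex_file <- file.path(tempdir(), "ctree.tex")
writeLines(tex, tex_file)
```

The resulting file can then be compiled with any LaTeX engine for which the forest package is installed.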
draw
This function returns a list used to fine tune the draw()
function
behaviour.
draw_control(
  digits = 4,
  charset = NULL,
  orientation = c("vertical", "horizontal"),
  tabular = TRUE,
  tab_orientation = c("vertical", "horizontal"),
  decoration = c("none", "rectangle", "circle", "ellipse"),
  fontsize = "normalsize",
  prob_fontsize = "small"
)
digits |
numerical parameters and p-values are represented using the
|
charset |
specifies the characters used for the "ascii art" representation when the format is "text", see details. |
orientation |
specifies the global orientation of the tree, either "vertical" (default) or "horizontal" ("latex"). |
tabular |
if TRUE (default value), the "latex" format will use tables
for each node, with one row for the state value and other rows for
additional information (such as the conditional probability associated to
the context). Notice that |
tab_orientation |
specifies the way the models are represented when used
by |
decoration |
specifies node decoration in the "latex" format, see details. |
fontsize |
font size for the state names in the "latex" format (using
LaTeX standard font size, defaults to "normalsize"). |
prob_fontsize |
font size for the context counts, probabilities or
models in the "latex" format (using LaTeX standard font size, defaults to
"small"). |
Parameters are generally specific to the format
used for draw()
. If this
is the case, the format is given at the end of the parameter description.
Some parameters are also specific to some functions inheriting from draw()
.
a list
The LaTeX format ("latex"
) can "decorate" the nodes of the context tree
by drawing borders. We support only basic decorations, but in theory all
TikZ possibilities could be used (see the documentation of the forest LaTeX package). Supported decorations:
"none"
: default, no decoration;
"rectangle"
: adds a rectangular border to all nodes;
"circle"
: adds a circular border to all nodes;
"ellipse"
: adds an ellipsoidal border to all nodes.
The "ascii art" format ("text"
) uses a collection of characters to
display a context tree. The default collection is specified by the
package option "mixvlmc.charset"
and is used when charset=NULL
(default
value). If charset
is set to a character value, this value is used to
select the collection in the same way that "mixvlmc.charset"
specifies
it:
"ascii"
: the collection uses only standard ASCII characters and
should be compatible with all environments;
"utf8"
: the collection uses UTF-8 symbols and needs a compatible display.
Finally, charset can be a user-supplied list of characters such as the ones
returned by charset_ascii() and charset_utf8().
draw()
, charset_ascii()
and charset_utf8()
.
draw_control(digits = 2, tabular = FALSE)
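The three ways of selecting the character collection described above can be sketched as follows. This is an illustration only: it assumes that charset_ascii() can be called without arguments and that the "utf8" collection renders correctly on the display in use.

```r
library(mixvlmc)

set.seed(0)
rdts <- sample(c(0, 1), 100, replace = TRUE)
ctree <- ctx_tree(rdts, min_size = 10, max_depth = 2)

# select a collection by name
draw(ctree, control = draw_control(charset = "ascii"))
draw(ctree, control = draw_control(charset = "utf8"))

# or pass a list of characters directly
# (assumption: charset_ascii() needs no argument)
custom <- charset_ascii()
draw(ctree, control = draw_control(charset = custom))
```

Leaving charset at its default NULL value falls back to the collection named by the "mixvlmc.charset" package option.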
This function 'draws' a covlmc as a text.
## S3 method for class 'covlmc'
draw(
  ct,
  format,
  control = draw_control(),
  model = c("coef", "full"),
  p_value = FALSE,
  with_state = FALSE,
  constant_as_prob = TRUE,
  ...
)
ct |
a fitted covlmc model. |
format |
a character string that specifies the output format of the
function. Possible values are "text" (default value) and "latex". See details. |
control |
a list of low level control parameters of the text
representation. See details and draw_control(). |
model |
this parameter controls the display of logistic models
associated to nodes (accepted values: |
p_value |
specifies whether the p-values of the likelihood ratio tests
conducted during the covlmc construction must be included in the
representation (defaults to FALSE). |
with_state |
specifies whether to display the state associated to each dimension of the logistic model (see details). |
constant_as_prob |
specifies how to represent constant logistic models
for |
... |
additional arguments for draw. |
The function uses different text based formats (plain "ascii art" and LaTeX)
to represent the context tree. Fine tuning of the representation can be done
via the draw_control()
function.
Contrary to the draw() functions adapted to context trees (draw.ctx_tree())
and to VLMC (draw.vlmc()), the present function does not try to produce
similar results for the "text" format and the "latex" format, as the "text"
format is intrinsically more limited in terms of model representations. This
is detailed below.
The format
parameter specifies the format used for the textual output.
With the default value "text"
the output is produced in "ascii art" using
the charset specified by the global option mixvlmc.charset
.
With the "latex" value, the output is produced in LaTeX, leveraging the
forest LaTeX package (see
https://ctan.org/pkg/forest). Each call to draw.covlmc()
produces a full
forest
LaTeX environment. This can be included as is in a LaTeX document,
provided the forest
package is loaded in the preamble of the document.
The LaTeX output is sanitized to avoid potential problems induced by
special characters in the names of the states of the context tree.
"text"
format
When format="text" the parameters are interpreted as follows:
model
: the default model="coef"
represents only the coefficients
of the logistic models associated to each context. model="full"
includes
the name of the variables in the representation. Setting model=NULL
removes the model representations. Additional parameters can be used to
tweak model representations (see below).
constant_as_prob
: specifies whether to represent logistic models that
do not use covariates (a.k.a. constant models) using the probability
distributions they induce on the state space (default behaviour with
constant_as_prob=TRUE
) or as normal models (when set to FALSE
). This is
not taken into account when model
is not set to "coef"
.
fields of the control
list (including the charset):
intercept
: character(s) used to represent the intercept when
model="full"
intercept_sep
: character(s) used to separate the intercept from
the other coefficients in model representation.
time_sep
: character(s) used to split the coefficients list by blocks
associated to time delays in the covariate inclusion into the logistic
model. The first block contains the intercept(s), the second block the
covariate values at time t-1, the third block at time t-2, etc.
level_sep
: character(s) used to separate levels from the model, see below.
open_p_value
and close_p_value
: delimiters used around the p-values
when p_value=TRUE
open_model
and close_model
: delimiters around the model when model
is not NULL
When model
is not NULL
, the coefficients of the logistic models are
presented, organized in rows associated to states. One state is used as the
reference state and the logistic model aims at predicting the ratio of
probability between another state and the reference one (in log scale).
When with_state
is TRUE
, the display includes for each row of
coefficients the target state. This is useful when using e.g. VGAM::vglm()
as unused levels of the target variable will be automatically dropped from
the model, leading to a reduced number of rows. The reference state is
either shown on the first row if model
is "full"
or after the state on
each row if model
is "coef"
. States are separated from the model
representation by the character(s) specified in level_sep
in the
control
list.
"latex"
format
When format="latex" the parameters are interpreted as follows:
model
: the models are always represented completely in the LaTeX export
unless model
is set to NULL
.
constant_as_prob
: in the LaTeX export, constant logistic models are
always represented by the corresponding probability distribution on the
state space, regardless of the value of constant_as_prob
.
fields of the control
list:
orientation
: specifies the orientation of the tree, either the default
"vertical"
(expanding from top to bottom) or "horizontal"
(expanding
from right to left);
tab_orientation
: specifies the orientation of the tables used to
represent model coefficients in the tree, either the default "vertical"
(covariates are listed on one column) or "horizontal"
(covariates are listed
on one row);
fontsize
and prob_fontsize
handle the size of the fonts used for the
states and for the models, see draw_control()
for details;
decoration
can be used to add borders around states, see
draw_control()
for details;
When model
is not NULL
, the coefficients of the logistic models are
presented, organized in rows or in columns (depending on tab_orientation)
associated to states. One state is used as the reference state and the
logistic model aims at predicting the ratio of probability between another
state and the reference one (in log scale). When with_state
is TRUE
,
the display includes for each row/column of coefficients the target state.
The reference state is shown on the first row/column.
When the representation includes the names of the variables used by the
logistic models, they are the ones generated by the underlying logistic
model, e.g. stats::glm(). Numerical variable names are used as is, while
factors have levels appended. The intercept is denoted by the intercept
member of the control list when format="text" (as part of the charset). It
is always represented by (I) when format="latex".
When format="text"
, the time delays are represented by an underscore
followed by the time delay. For instance if the model uses the numerical
covariate y
with two delays, it will appear with two variables y_1
and
y_2
.
When format="latex"
, the representation uses a temporal subscript of the
form t-1
, t-2
, etc.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1)))
)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
draw(m_cov, control = draw_control(digits = 3))
draw(m_cov, model = NULL)
draw(m_cov, p_value = TRUE)
draw(m_cov, p_value = FALSE, control = draw_control(digits = 2))
draw(m_cov, model = "full", control = draw_control(digits = 3))
draw(m_cov, format = "latex", control = draw_control(orientation = "h"))
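The with_state parameter is not exercised by the calls above. The following sketch, which only combines arguments listed in the usage section, shows how it could be paired with both values of model in the "text" format. The exact layout of the output depends on the fitted model and is not reproduced here.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1)))
)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)

# coefficients only, with the target state shown on each row
draw(m_cov, model = "coef", with_state = TRUE)

# variable names included, together with the target states and the p-values
draw(m_cov, model = "full", with_state = TRUE, p_value = TRUE)
```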
This function 'draws' a context tree as a text.
## S3 method for class 'ctx_tree_cpp'
draw(ct, format, control = draw_control(), frequency = NULL, ...)

## S3 method for class 'ctx_tree'
draw(ct, format, control = draw_control(), frequency = NULL, ...)
ct |
a context tree. |
format |
a character string that specifies the output format of the
function. Possible values are "text" (default value) and "latex". See details. |
control |
a list of low level control parameters of the text
representation. See details and draw_control(). |
frequency |
this parameter controls the display of node level
information in the tree. The default |
... |
additional arguments for draw. |
The function uses different text based formats (plain "ascii art" and LaTeX)
to represent the context tree. Fine tuning of the representation can be done
via the draw_control()
function.
In addition to the structure of the context tree, draw()
can represent
information attached to the nodes (contexts and partial contexts). This is
controlled by additional parameters depending on the type of the context
tree. In general, parameters given directly to draw()
specify what
information is represented while details on how this representation is made
can be controlled via the control
parameter and the associated
draw_control()
function.
the context tree (invisibly).
The format
parameter specifies the format used for the textual output.
With the default value "text", the output is produced in "ascii art" using
by default only ASCII characters (notice that draw_control() can be used
to specify non-ASCII characters, but this is discouraged).
With the "latex" value, the output is produced in LaTeX, leveraging the
forest LaTeX package (see
https://ctan.org/pkg/forest). Each call to draw()
produces a full
forest
LaTeX environment. This can be included as is in a LaTeX document,
provided the forest
package is loaded in the preamble of the document.
The LaTeX output is sanitized to avoid potential problems induced by
special characters in the names of the states of the context tree.
rdts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(rdts_c, min_size = 10, max_depth = 2)
draw(ctree_c, frequency = "total")
draw(ctree_c, frequency = "detailed")
## LaTeX output
draw(ctree_c, "latex", frequency = "detailed")
rdts_c <- sample(c("A$", "_{B", "{C}_{D}"), 100, replace = TRUE)
ctree_c <- ctx_tree(rdts_c, min_size = 10, max_depth = 2)
## the LaTeX output is sanitized
draw(ctree_c, "latex", frequency = "detailed")
This function 'draws' a context tree as a text.
## S3 method for class 'vlmc'
draw(ct, format, control = draw_control(), prob = TRUE, ...)

## S3 method for class 'vlmc_cpp'
draw(ct, format, control = draw_control(), prob = TRUE, ...)
ct |
a fitted vlmc. |
format |
a character string that specifies the output format of the
function. Possible values are "text" (default value) and "latex". See details. |
control |
a list of low level control parameters of the text
representation. See details and draw_control(). |
prob |
this parameter controls the display of node level information in
the tree. The default |
... |
additional arguments for draw. |
The function uses different text based formats (plain "ascii art" and LaTeX)
to represent the context tree. Fine tuning of the representation can be done
via the draw_control()
function.
In addition to the structure of the context tree, draw()
can represent
information attached to the nodes (contexts and partial contexts). This is
controlled by additional parameters depending on the type of the context
tree. In general, parameters given directly to draw()
specify what
information is represented while details on how this representation is made
can be controlled via the control
parameter and the associated
draw_control()
function.
the context tree (invisibly).
The format
parameter specifies the format used for the textual output.
With the default value "text", the output is produced in "ascii art" using
by default only ASCII characters (notice that draw_control() can be used
to specify non-ASCII characters, but this is discouraged).
With the "latex" value, the output is produced in LaTeX, leveraging the
forest LaTeX package (see
https://ctan.org/pkg/forest). Each call to draw()
produces a full
forest
LaTeX environment. This can be included as is in a LaTeX document,
provided the forest
package is loaded in the preamble of the document.
The LaTeX output is sanitized to avoid potential problems induced by
special characters in the names of the states of the context tree.
rdts <- sample(c("A", "B", "C"), 500, replace = TRUE)
model <- vlmc(rdts, alpha = 0.05)
draw(model)
draw(model, prob = FALSE)
draw(model, prob = NULL)
This function creates a representation of a discrete time series that can be further processed by model estimation functions.
dts(x, vals = NULL)
x |
a discrete time series; can be numeric, character, factor or logical. |
vals |
the set of values that can be taken by the time series, a.k.a. the
state space, see details (defaults to NULL). |
The discrete time series x
can be a vector of numeric, character, factor or
logical type. If the state space of the series is not specified, that is when
vals
is NULL
, it is computed in a way that depends on the type of x
:
for a factor, vals
is set to the levels()
of x
;
for a logical vector, vals
is set to c(FALSE, TRUE)
;
for other types, vals
is set to all the unique values taken by the time
series (as returned by sort(unique(x))
).
If vals
is specified, the function makes sure that x
contains only the
specified values.
a discrete time series (of class that inherits from dts
).
x_dts <- dts(sample(c("A", "B"), 20, replace = TRUE))
x_dts
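The rules used to compute the state space when vals is NULL can be mirrored in a few lines of base R. The helper infer_vals() below is hypothetical and is not part of mixvlmc; it only restates the three documented cases so that their effect on small inputs is easy to check.

```r
# hypothetical helper mirroring the documented rules; not part of mixvlmc
infer_vals <- function(x) {
  if (is.factor(x)) {
    # factor: all the levels, including the ones that never occur
    levels(x)
  } else if (is.logical(x)) {
    # logical vector: both values, whatever the content of x
    c(FALSE, TRUE)
  } else {
    # other types: sorted unique values
    sort(unique(x))
  }
}

# the unused level "c" is kept in the state space
infer_vals(factor(c("b", "a", "b"), levels = c("a", "b", "c")))
# FALSE is part of the state space even if it never appears
infer_vals(c(TRUE, TRUE))
# numeric values are sorted
infer_vals(c(3, 1, 3, 2))
```

A practical consequence is that a factor or a logical series can have states that never occur in the data, whereas a numeric or character series cannot unless vals is given explicitly.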
This function returns a copy of the discrete time series used to build the
dts object (see dts()
).
dts_data(x)
x |
a dts object |
a vector representing the time series
raw_dts <- sample(c("A", "B", "C"), 50, replace = TRUE)
odts <- dts(raw_dts)
back_to_raw <- dts_data(odts)
## should be TRUE
identical(raw_dts, back_to_raw)
This function checks whether the sequence ctx is represented in the context
tree ct. If this is the case, it returns a description of the matching node,
an object of class ctx_node. If the sequence is not represented in the tree,
the function returns NULL.
find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree'
find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree_cpp'
find_sequence(ct, ctx, reverse = FALSE, ...)
ct |
a context tree. |
ctx |
a sequence to search in the context tree |
reverse |
specifies whether the sequence |
... |
additional parameters for the find_sequence function |
The function looks for sequences in general. The is_context()
function can
be used on the resulting object to test if the sequence is in addition a
proper context.
an object of class ctx_node
if the sequence ctx
is represented
in the context tree, NULL
when this is not the case.
Sequences are given by default in the temporal order and not in the
"reverse" order used by many VLMC research papers: older values are on the
left. For instance, the context c(0, 1) is reported if the sequence 0, then 1
appeared in the time series used to build the context tree. In the present
function, reverse refers both to the order used for the ctx parameter and to
the default order used by the resulting ctx_node object.
rdts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
rdts_tree <- ctx_tree(rdts, max_depth = 3)
find_sequence(rdts_tree, "A")
## returns NULL as "A" "C" does not appear in rdts
find_sequence(rdts_tree, c("A", "C"))
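The effect of the reverse parameter can be illustrated on a small binary series. In this sketch, the same sequence "0, then 1" is looked up twice: once written in temporal order (oldest value first) and once written most recent value first with reverse = TRUE. The variable names node_t and node_r are illustrative.

```r
library(mixvlmc)

rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)

# "0, then 1" written in temporal order (oldest value on the left)
node_t <- find_sequence(rdts_ctree, c(0, 1))

# the same sequence written most recent value first
node_r <- find_sequence(rdts_ctree, c(1, 0), reverse = TRUE)

# each node keeps the ordering convention it was requested with
is_reversed(node_t)
is_reversed(node_r)
```

Both lookups are expected to succeed because 0 followed by 1 occurs in rdts, and only the second node should report a reverse ordering.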
This function checks whether the sequence ctx is represented in the context
tree of the COVLMC model ct. If this is the case, it returns a description
of the matching node, an object of class ctx_node_covlmc. If the sequence is
not represented in the tree, the function returns NULL.
## S3 method for class 'covlmc'
find_sequence(ct, ctx, reverse = FALSE, ...)
ct |
a context tree. |
ctx |
a sequence to search in the context tree |
reverse |
specifies whether the sequence |
... |
additional parameters for the find_sequence function |
The function looks for sequences in general. The is_context()
function can
be used on the resulting object to test if the sequence is in addition a
proper context.
an object of class ctx_node_covlmc
if the sequence ctx
is represented
in the context tree, NULL
when this is not the case
Sequences are given by default in the temporal order and not in the
"reverse" order used by many VLMC research papers: older values are on the
left. For instance, the context c(0, 1) is reported if the sequence 0, then 1
appeared in the time series used to build the context tree. In the present
function, reverse refers both to the order used for the ctx parameter and to
the default order used by the resulting ctx_node object.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 10)
## not in the tree
vals <- states(m_cov)
find_sequence(m_cov, c(vals[2], vals[2]))
## in the tree but not a context
node <- find_sequence(m_cov, c(vals[1]))
node
is_context(node)
## in the tree and a context
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
is_context(node)
model(node)
A data set containing earthquakes that occurred during the period 1900-2022, with GPS coordinates and magnitudes.
globalearthquake
A data frame with 98785 rows and 12 variables:
Date and time in POSIXct format
latitude of the earthquake, from -90° to 90°
longitude of the earthquake, from -180° to 180°
the magnitude of the earthquake, indicating its strength
date when the earthquake occurred
number of weeks since 1900/01/01
year
month of the year
day of the month
week number
day of the week from 1 = Sunday to 7 = Saturday
day of the year from 1 to 366
This is a compiled version of the full data set available from the U.S. Geological Survey (USGS) Earthquake Events, which is in the public domain.
The data set contains only the earthquakes between 1900 and 2022 with a magnitude higher than 5.
Earthquake Catalog, U.S. Geological Survey, Department of the Interior. https://www.usgs.gov/programs/earthquake-hazards
This function returns TRUE
if the node is a proper context, FALSE
in the other case.
is_context(node)
node |
a |
TRUE
if the node node
is a proper context,
FALSE
when this is not the case
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
draw(rdts_ctree)
## 0, 0 is a context but 1, 0 is not
is_context(find_sequence(rdts_ctree, c(0, 0)))
is_context(find_sequence(rdts_ctree, c(1, 0)))
This function returns TRUE
for VLMC models with covariates and FALSE
for other objects.
is_covlmc(x)
x |
an R object. |
TRUE
for VLMC models with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
# should be true
is_ctx_tree(m_cov)
# should be true
is_covlmc(m_cov)
# should be false
is_vlmc(m_cov)
This function returns TRUE
for context trees and FALSE
for other objects.
is_ctx_tree(x)
x |
an R object. |
TRUE
for context trees.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 2)
is_ctx_tree(rdts_ctree)
is_ctx_tree(rdts)
This function returns TRUE
for discrete time series and FALSE
for other objects.
is_dts(x)
x |
an R object. |
TRUE
for discrete time series.
pre_dts <- sample(c("A", "B"), 20, replace = TRUE)
x_dts <- dts(pre_dts)
is_dts(x_dts)
is_dts(pre_dts)
The function returns TRUE
if the context represented by this node is merged
with at least another one and FALSE
if this is not the case.
is_merged(node)
node |
A |
When a COVLMC is built on a time series with at least three distinct states,
some contexts can be merged: they use the same logistic model, leading to a
more parsimonious model. Those contexts are reported individually by
functions such as contexts.covlmc()
. The present function can be used
to detect such merging, while merged_with()
can be used to recover the
other contexts.
TRUE or FALSE, depending on the nature of the context
pc <- powerconsumption[powerconsumption$week == 15, ]
rdts <- cut(pc$active_power, breaks = c(0, 1, 2, 3, 8))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5, alpha = 0.1)
ctxs <- contexts(m_cov)
## no merging
sapply(ctxs, is_merged)
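When some contexts are merged, is_merged() and merged_with() can be combined to recover the groups. The sketch below assumes that merged_with() accepts a single node as its only argument; the structure of its return value is not described in this section, so the code only stores it. The names flags and partners are illustrative.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 15, ]
rdts <- cut(pc$active_power, breaks = c(0, 1, 2, 3, 8))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5, alpha = 0.1)
ctxs <- contexts(m_cov)

# one logical flag per context
flags <- sapply(ctxs, is_merged)

# for merged contexts only, ask for the contexts they are merged with
# (assumption: merged_with() takes the node as its single argument)
partners <- lapply(ctxs, function(node) {
  if (is_merged(node)) merged_with(node) else NULL
})
```

With the model fitted above no context is merged, so every entry of partners is expected to be NULL.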
This function returns TRUE
if the node is using a reverse temporal ordering
and FALSE
in the other case.
is_reversed(node)
node |
a |
TRUE
if the node node
uses a reverse temporal ordering, FALSE
when this is not the case
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)
is_reversed(find_sequence(rdts_ctree, c(0, 0)))
is_reversed(find_sequence(rdts_ctree, c(1, 0), reverse = TRUE))
This function returns TRUE
for VLMC models and FALSE
for other objects.
is_vlmc(x)
x |
an R object. |
TRUE for VLMC models.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)
# should be true
is_ctx_tree(model)
# should be true
is_vlmc(model)
# should be false
is_covlmc(model)
This function evaluates the log-likelihood of a VLMC with covariates fitted on a discrete time series.
## S3 method for class 'covlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)
object |
the covlmc representation. |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are
integrated in the likelihood. Defaults to |
... |
additional parameters for logLik. |
an object of class logLik. This is a number, the log-likelihood of
the (CO)VLMC with the following attributes:
df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood
## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
ll <- logLik(m_cov)
attributes(ll)
This function evaluates the log-likelihood of a VLMC fitted on a discrete time series.
## S3 method for class 'vlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)

## S3 method for class 'vlmc_cpp'
logLik(object, initial = c("truncated", "specific", "extended"), ...)
object |
the vlmc representation. |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are
integrated in the likelihood. Defaults to |
... |
additional parameters for logLik. |
an object of class logLik. This is a number, the log-likelihood of
the (CO)VLMC with the following attributes:
df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood
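Since the returned value is a standard logLik object with df and nobs attributes, the generic information criteria of the stats package should apply to it directly. The following is a minimal sketch, assuming the mixvlmc package is attached and reusing the powerconsumption data set from the examples; it compares stats::AIC() and stats::BIC() with their textbook definitions and does not rely on any package-specific method.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
rdts <- cut(pc$active_power, breaks = breaks, labels = c(0, 1))
model <- vlmc(rdts)
ll <- logLik(model)

# AIC as computed by stats::AIC() on the logLik object
aic_generic <- AIC(ll)
# AIC recomputed by hand from the documented df attribute
aic_manual <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(aic_generic, aic_manual)

# BIC additionally uses the documented nobs attribute
bic_generic <- BIC(ll)
bic_manual <- -2 * as.numeric(ll) + log(attr(ll, "nobs")) * attr(ll, "df")
all.equal(bic_generic, bic_manual)
```

Both criteria depend on the initial parameter through df and nobs, so values obtained with different initial settings are not directly comparable.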
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(rdts)
ll <- logLik(m_nocovariate)
ll
attributes(ll)
This function evaluates the log-likelihood of a VLMC fitted on a discrete
time series. When the optional argument newdata is provided, the function
evaluates instead the log-likelihood for this (new) discrete time series.
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc_cpp'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)
vlmc |
the vlmc representation. |
newdata |
an optional object that can be interpreted as a discrete time
series (for instance a |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are integrated
in the likelihood. Defaults to |
ignore |
specifies the number of initial values for which the loglikelihood will not be computed. The minimal number depends on the likelihood function as detailed below. |
... |
additional parameters for loglikelihood. |
The definition of the likelihood function depends on the value of the
initial parameter, see the section below as well as the dedicated
vignette: vignette("likelihood", package = "mixvlmc").
For VLMC objects, the method loglikelihood.vlmc will be used. For VLMC with
covariates, loglikelihood.covlmc will be called instead. For more
information on loglikelihood methods, use methods(loglikelihood) and
their associated documentation.
an object of class logLikMixVLMC and logLik. This is a number,
the log-likelihood of the (CO)VLMC with the following attributes:
df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood
In a (CO)VLMC of depth()=k, we need k past values in order to compute the
context of a given observation. As a consequence, in a time series x, the
contexts of x[1] to x[k] are unknown. Depending on the value of initial,
different likelihood functions are used to tackle this difficulty:
initial=="truncated": the likelihood is computed using only
x[(k+1):length(x)].
initial=="specific": the likelihood is computed on the full time series
using a specific context for each of the initial values, x[1] to x[k]. Each of
these specific contexts is unique, leading to a perfect likelihood of 1 (0 in
log scale). Thus the numerical value of the likelihood is identical to the
one obtained with initial=="truncated", but it is computed on length(x)
observations with a model that has more parameters than in the truncated case.
initial=="extended": the likelihood is computed on the full time series
using an extended context matching for the initial values, x[1] to x[k].
This can be seen as a compromise between the two other possibilities:
the relaxed context matching needs in general to turn internal nodes
of the context tree into actual contexts, increasing the number of parameters,
but not as much as with "specific". However, the likelihood of, say, x[1]
with an empty context is generally not 1 and thus the full likelihood is
smaller than the one computed with "specific".
In all cases, the first ignore values of the time series are not included
in the computed likelihood, but are still used to compute contexts. If ignore
is not specified, it is set to the minimal possible value, that is k for the
truncated likelihood and 0 for the other ones. If it is specified, it must
be greater than or equal to k for truncated.
See the dedicated vignette for a more mathematically oriented discussion:
vignette("likelihood", package = "mixvlmc").
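The relationship between the three likelihood definitions can be inspected on a fitted model. The sketch below is illustrative only: it assumes the mixvlmc package is attached, reuses the powerconsumption data set from the examples, and simply displays the quantities described above (identical values for "truncated" and "specific", a difference of k in nobs) without asserting specific numbers.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
rdts <- cut(pc$active_power, breaks = breaks, labels = c(0, 1))
model <- vlmc(rdts)
k <- depth(model)

ll_truncated <- loglikelihood(model, initial = "truncated")
ll_specific <- loglikelihood(model, initial = "specific")
ll_extended <- loglikelihood(model, initial = "extended")

# "truncated" and "specific" are documented to give the same numerical value
all.equal(as.numeric(ll_truncated), as.numeric(ll_specific))

# but "specific" counts the k initial observations as well
attr(ll_specific, "nobs") - attr(ll_truncated, "nobs")
k

# number of parameters under each definition
c(
  truncated = attr(ll_truncated, "df"),
  specific = attr(ll_specific, "df"),
  extended = attr(ll_extended, "df")
)

# explicitly skipping the first k values with the extended likelihood
ll_extended_skip <- loglikelihood(model, initial = "extended", ignore = k)
attr(ll_extended_skip, "nobs")
```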
## Likelihood for a fitted VLMC.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(rdts)
ll <- loglikelihood(m_nocovariate)
ll
attr(ll, "nobs")
attr(ll, "df")

## Likelihood for a new time series with previously fitted VLMC.
pc_new <- powerconsumption[powerconsumption$week == 11, ]
rdts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
ll_new <- loglikelihood(m_nocovariate, newdata = rdts_new)
ll_new
attributes(ll_new)
ll_new_specific <- loglikelihood(m_nocovariate, initial = "specific", newdata = rdts_new)
ll_new_specific
attributes(ll_new_specific)
ll_new_extended <- loglikelihood(m_nocovariate, initial = "extended", newdata = rdts_new)
ll_new_extended
attributes(ll_new_extended)
This function evaluates the log-likelihood of a VLMC with covariates fitted
on a discrete time series. When the optional argument newdata is
provided, the function evaluates instead the log-likelihood for this (new)
discrete time series on the new covariates, which must be provided through the
newcov parameter.
## S3 method for class 'covlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  newcov,
  ...
)
vlmc |
the covlmc representation. |
newdata |
an optional object that can be interpreted as a discrete time
series (for instance a |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are integrated
in the likelihood. Defaults to |
ignore |
specifies the number of initial values for which the loglikelihood will not be computed. The minimal number depends on the likelihood function as detailed below. |
newcov |
an optional data frame with the new values for the covariates. |
... |
additional parameters for loglikelihood. |
The definition of the likelihood function depends on the value of the
initial parameter, see the section below as well as the dedicated
vignette: vignette("likelihood", package = "mixvlmc").
an object of class logLikMixVLMC and logLik. This is a number,
the log-likelihood of the (CO)VLMC with the following attributes:
df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood
In a (CO)VLMC of depth()=k, we need k past values in order to compute the
context of a given observation. As a consequence, in a time series x, the
contexts of x[1] to x[k] are unknown. Depending on the value of initial,
different likelihood functions are used to tackle this difficulty:
initial=="truncated": the likelihood is computed using only
x[(k+1):length(x)].
initial=="specific": the likelihood is computed on the full time series
using a specific context for each of the initial values, x[1] to x[k]. Each of
these specific contexts is unique, leading to a perfect likelihood of 1 (0 in
log scale). Thus the numerical value of the likelihood is identical to the
one obtained with initial=="truncated", but it is computed on length(x)
observations with a model that has more parameters than in the truncated case.
initial=="extended": the likelihood is computed on the full time series
using an extended context matching for the initial values, x[1] to x[k].
This can be seen as a compromise between the two other possibilities:
the relaxed context matching needs in general to turn internal nodes
of the context tree into actual contexts, increasing the number of parameters,
but not as much as with "specific". However, the likelihood of, say, x[1]
with an empty context is generally not 1 and thus the full likelihood is
smaller than the one computed with "specific".
In all cases, the first ignore values of the time series are not included
in the computed likelihood, but are still used to compute contexts. If ignore
is not specified, it is set to the minimal possible value, that is k for the
truncated likelihood and 0 for the other ones. If it is specified, it must
be greater than or equal to k for truncated.
See the dedicated vignette for a more mathematically oriented discussion:
vignette("likelihood", package = "mixvlmc").
## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
ll <- loglikelihood(m_cov)
ll
attr(ll, "nobs")

## Likelihood for new time series and covariates with previously
## fitted VLMC with covariates
pc_new <- powerconsumption[powerconsumption$week == 11, ]
rdts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
rdts_cov_new <- data.frame(day_night = (pc_new$hour >= 7 & pc_new$hour <= 17))
ll_new <- loglikelihood(m_cov, newdata = rdts_new, newcov = rdts_cov_new)
ll_new
attributes(ll_new)
The function returns NULL when the context represented by the node
parameter is not merged with another context (see is_merged()). Otherwise,
it returns a list of contexts with which this one is merged.
merged_with(node)
node |
A |
If the context is merged, the function returns a list with one value for each
element in the state space (see states()). The value is NULL if the
corresponding context is not merged with the node context, while it is a
ctx_node_covlmc object in the other case. A context merged with node
differs from the context represented by node only in its last value (in
temporal order) which is used as its name in the list. For instance, if the
context ABC is merged only with CBC (when represented in temporal
ordering), then the resulting list is of the form
list("A" = NULL, "B" = NULL, "C" = ctx_node_covlmc(CBC)).
NULL or a list of contexts merged with node, represented by
ctx_node_covlmc objects
pc_week_15_16 <- powerconsumption[powerconsumption$week %in% c(15, 16), ]
elec <- pc_week_15_16$active_power
elec_rdts <- cut(elec, breaks = c(0, 0.4, 2, 8), labels = c("low", "typical", "high"))
elec_cov <- data.frame(day = (pc_week_15_16$hour >= 7 & pc_week_15_16$hour <= 18))
elec_tune <- tune_covlmc(elec_rdts, elec_cov, min_size = 5)
elec_model <- prune(as_covlmc(elec_tune), alpha = 3.961e-10)
ctxs <- contexts(elec_model)
for (ctx in ctxs) {
  if (is_merged(ctx)) {
    print(ctx)
    cat("\nis merged with\n\n")
    print(merged_with(ctx))
  }
}
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
metrics(model, ...)
model |
The context based model on which to compute predictive metrics. |
... |
Additional parameters for predictive metrics computation. |
A context based model computes transition probabilities for its contexts.
Using a maximum transition probability decision rule, this can be used to
predict the new state that is the most likely to follow the current one,
given the context (see predict.vlmc()). The quality of these predictions is
evaluated using standard metrics including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states
The returned value is guaranteed to have at least three components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values
in rows and true values in columns
auc: the AUC of the predictive model
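The three documented components can be read from the result by name. The sketch below assumes the mixvlmc package is attached and that the returned object can be indexed like a list with $ (an assumption, as only the component names are documented here). It also recomputes the accuracy from the confusion matrix as the share of observations on the diagonal, which is the usual definition and may differ from the reported value if the implementation handles some observations differently.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
rdts <- cut(pc$active_power, breaks = breaks, labels = c(0, 1))
model <- vlmc(rdts)

model_metrics <- metrics(model)

# documented components
model_metrics$accuracy
model_metrics$conf_mat # predicted values in rows, true values in columns
model_metrics$auc

# accuracy recomputed from the confusion matrix: correct predictions sit on
# the diagonal (predicted state equal to true state)
conf_mat <- as.matrix(model_metrics$conf_mat)
sum(diag(conf_mat)) / sum(conf_mat)
```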
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. DOI: doi:10.1023/A:1010920819831.
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(rdts)
metrics(model)
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
## S3 method for class 'covlmc'
metrics(model, ...)

## S3 method for class 'metrics.covlmc'
print(x, ...)
model |
The context based model on which to compute predictive metrics. |
... |
Additional parameters for predictive metrics computation. |
x |
A metrics.covlmc object, results of a call to |
A context based model computes transition probabilities for its contexts.
Using a maximum transition probability decision rule, this can be used to
predict the new state that is the most likely to follow the current one,
given the context (see predict.vlmc()). The quality of these predictions is
evaluated using standard metrics including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states
An object of class metrics.covlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values
in rows and true values in columns
auc: the AUC of the predictive model
The object has a print method that recalls basic information about the model together with the values of the components above.
print(metrics.covlmc): Prints the predictive metrics of the VLMC model with covariates.
As explained in detail in the loglikelihood.covlmc() documentation and in
the dedicated vignette("likelihood", package = "mixvlmc"), the first
values of a time series do not in general have a proper context for
a COVLMC with a nonzero order. In order to predict something meaningful
for those values, we rely on the notion of extended context defined in the
documents mentioned above. This follows the same logic as using
loglikelihood.covlmc() with the parameter initial="extended". All
covlmc functions that need to manipulate initial values with no proper
context use the same approach.
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. DOI: doi:10.1023/A:1010920819831.
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
metrics(m_cov)
This function computes and returns predictive quality metrics for a node
(ctx_node) extracted from a context tree.
## S3 method for class 'ctx_node'
metrics(model, ...)
model |
T |
... |
Additional parameters for predictive metrics computation. |
Compared to metrics.vlmc(), this function focuses on a single context and
assesses the quality of its predictions, disregarding observations that have
other contexts. Apart from this limited scope, the function operates as
metrics.vlmc().
The returned value is guaranteed to have at least three components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values
in rows and true values in columns
auc: the AUC of the predictive model
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. DOI: doi:10.1023/A:1010920819831.
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)
model_ctxs <- contexts(model)
metrics(model_ctxs[[4]])
This function computes and returns predictive quality metrics for a node
(ctx_node_covlmc) extracted from a covlmc.
## S3 method for class 'ctx_node_covlmc'
metrics(model, ...)
model |
A |
... |
Additional parameters for predictive metrics computation. |
Compared to metrics.covlmc(), this function focuses on a single context and
assesses the quality of its predictions, disregarding observations that have
other contexts. Apart from this limited scope, the function operates as
metrics.covlmc().
an object of class metrics.covlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values
in rows and true values in columns
auc: the AUC of the predictive model
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. DOI: doi:10.1023/A:1010920819831.
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
m_ctxs <- contexts(m_cov)
## get the predictive metrics for each context
lapply(m_ctxs, metrics)
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
## S3 method for class 'vlmc'
metrics(model, ...)

## S3 method for class 'metrics.vlmc'
print(x, ...)
model |
The context based model on which to compute predictive metrics. |
... |
Additional parameters for predictive metrics computation. |
x |
A metrics.vlmc object, results of a call to |
A context based model computes transition probabilities for its contexts.
Using a maximum transition probability decision rule, this can be used to
predict the new state that is the most likely to follow the current one,
given the context (see predict.vlmc()). The quality of these predictions is
evaluated using standard metrics including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states
An object of class metrics.vlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values
in rows and true values in columns
auc: the AUC of the predictive model
The object has a print method that recalls basic information about the model together with the values of the components above.
print(metrics.vlmc): Prints the predictive metrics of the VLMC model.
As explained in detail in the loglikelihood.vlmc() documentation and in the
dedicated vignette("likelihood", package = "mixvlmc"), the first
values of a time series do not in general have a proper context for a VLMC
with a nonzero order. In order to predict something meaningful for those
values, we rely on the notion of extended context defined in the documents
mentioned above. This follows the same logic as using
loglikelihood.vlmc() with the parameter initial="extended". All vlmc
functions that need to manipulate initial values with no proper context use
the same approach.
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. DOI: doi:10.1023/A:1010920819831.
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
rdts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(rdts)
metrics(model)
This function returns a representation of the logistic model associated with a COVLMC context from its node in the associated context tree.
model(node, type = c("coef", "full"))
node |
A |
type |
specifies the model information to return, either the
coefficients only ( |
Full model extraction is only possible if the COVLMC model was not fully
trimmed (see trim.covlmc()). Notice that find_sequence.covlmc() can
produce nodes that are not contexts: in this case this function returns NULL.
if node is a context, the coefficients of the logistic model (as a
vector or a matrix depending on the size of the state space) or a logistic
model as an R object. If node is not a context, NULL.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 10)
vals <- states(m_cov)
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
model(node)
model(node, type = "full")
This function returns the parent node of the node represented by the
node parameter. The result is NULL if node is the root node of
its context tree (representing the empty sequence).
parent(node)

## S3 method for class 'ctx_node'
parent(node)

## S3 method for class 'ctx_node_cpp'
parent(node)
node |
a |
Each node of a context tree represents a sequence. When find_sequence() is
called with success, the returned object represents the corresponding node in
the context tree. Unless the original sequence is empty, this node has a
parent node which is returned as a ctx_node object by the present function.
Another interpretation is that the function returns the node object
associated with the sequence obtained by removing the oldest value from the
original sequence.
a ctx_node object if node does not correspond to the empty
sequence, or NULL when it does
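Because parent() returns NULL at the root, it can be used to walk from any node up to the empty sequence. The following is a small sketch of that traversal, assuming the mixvlmc package is attached; the counter visited is only there for illustration and, under the behaviour documented above, should count the node for (0, 0), the node for (0) and the root.

```r
library(mixvlmc)

rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3)

# start from the node representing the sequence (0, 0)
node <- find_sequence(rdts_ctree, c(0, 0))

# climb towards the root: each step drops the oldest value of the sequence
visited <- 0
while (!is.null(node)) {
  print(node)
  visited <- visited + 1
  node <- parent(node) # NULL once the root (empty sequence) has been visited
}
visited
```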
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3) ctx_00 <- find_sequence(rdts_ctree, c(0, 0)) ## the parent sequence/node corresponds to the 0 context parent(ctx_00) identical(parent(ctx_00), find_sequence(rdts_ctree, c(0))) ## C++ backend rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0) rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 3, backend = "C++") ctx_00 <- find_sequence(rdts_ctree, c(0, 0)) ## the parent sequence/node corresponds to the 0 context parent(ctx_00) identical(parent(ctx_00), find_sequence(rdts_ctree, c(0)))
This function plots the results of tune_vlmc()
or tune_covlmc()
.
## S3 method for class 'tune_vlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)

## S3 method for class 'tune_covlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)
x |
a |
value |
the criterion to plot (default "criterion"). |
cutoff |
the scale used for the cut off criterion (default "quantile") |
... |
additional parameters passed to |
The standard plot consists in showing the evolution of the criterion
used to select the model (AIC()
or BIC()
) as a function of the
cut off criterion expressed in the quantile scale (the quantile is used
by default to offer a common default behaviour between vlmc()
and
covlmc()
). Parameters can be used to display instead the loglikelihood()
of the model (by setting value="likelihood"
) and to use the native
scale for the cut off when available (by setting cutoff="native"
).
the tune_vlmc
object invisibly
The function sets several defaults before calling base::plot()
, namely:
type
: "l" by default to use a line representation;
xlab
: "Cut off (quantile scale)" by default, adapted to the actual
scale;
ylab
: the name of the criterion or "Log likelihood".
These parameters can be overridden by specifying other values when calling
the function. All parameters specified in addition to x
, value
and
cutoff
are passed to base::plot()
.
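The overriding mechanism can be illustrated with a small base R sketch. The helper plot_args() is hypothetical and only shows one common way of merging default graphical parameters with user supplied ones; it is not taken from the package sources, and the "AIC" label is an assumed example value.

```r
# Hypothetical helper (not part of mixvlmc): merge default graphical
# parameters with the ones supplied by the caller.
plot_args <- function(...) {
  defaults <- list(
    type = "l",
    xlab = "Cut off (quantile scale)",
    ylab = "AIC" # assumed criterion name, for illustration only
  )
  # values given by the caller replace the defaults, new ones are appended
  utils::modifyList(defaults, list(...))
}

args <- plot_args(xlab = "Cut off", type = "b", col = "red")
str(args)
```

The merged list could then be forwarded to the plotting function with do.call().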
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(rdts)
## default plot
plot(tune_result)
## likelihood
plot(tune_result, value = "likelihood")
## parameters overriding
plot(tune_result,
  value = "likelihood",
  xlab = "Cut off", type = "b"
)
pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
rdts_best_model_tune <- tune_covlmc(rdts, rdts_cov, criterion = "AIC")
plot(rdts_best_model_tune)
plot(rdts_best_model_tune, value = "likelihood")
This function returns the positions of the sequence represented by node
in the time series used to build the context tree in which the sequence is
represented. This is only possible if those positions were saved during the
construction of the context tree. If positions were not saved, a call to this
function produces an error.
positions(node)

## S3 method for class 'ctx_node'
positions(node)

## S3 method for class 'ctx_node_cpp'
positions(node)
node |
a |
A position of a sequence ctx
in the time series x
is an index value t
such that the sequence ends with x[t]
. Thus x[t+1]
is after the context.
For instance if x=c(0, 0, 1, 1)
and ctx=c(0, 1)
(in standard state
order), then the position of ctx
in x
is 3.
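The definition can be checked with a short base R sketch. The helper match_positions() is hypothetical (it is not part of mixvlmc), takes both vectors in temporal order, and only handles non empty sequences.

```r
# Hypothetical helper (not part of mixvlmc): end positions of `ctx` in `x`,
# with both vectors given in temporal order.
match_positions <- function(x, ctx) {
  k <- length(ctx)
  stopifnot(k >= 1)
  if (k > length(x)) {
    return(integer(0))
  }
  ends <- k:length(x) # candidate positions t
  hits <- vapply(
    ends,
    function(t) all(x[(t - k + 1):t] == ctx),
    logical(1)
  )
  ends[hits]
}

x <- c(0, 0, 1, 1)
match_positions(x, c(0, 1)) # 3
match_positions(x, 1) # 3 4
```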
positions of the sequence represented by node in the original
time series, as an integer vector
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
rdts_tree <- ctx_tree(rdts, max_depth = 3, min_size = 5)
subseq <- find_sequence(rdts_tree, factor(c("B", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  positions(subseq)
}
A data set containing measurements of the electric power consumption of one household with a time resolution of 10 minutes for the full year of 2008.
powerconsumption
A data frame with 52704 rows and 15 variables:
month of 2008
day of the month
hour (0 to 23)
starting minute of the 10 minutes period of this row
global average active power on the 10 minute period (in kilowatt)
global average reactive power on the 10 minute period (in kilowatt)
Average voltage on the 10 minute period (in volt)
global average current intensity on the 10 minute period (in ampere)
energy sub-metering No. 1 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered)
energy sub-metering No. 2 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
energy sub-metering No. 3 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to an electric water-heater and an air-conditioner.
week number
day of the week from 1 = Sunday to 7 = Saturday
day of the year from 1 to 366 (2008 is a leap year)
Date and time in POSIXct format
This is a simplified version of the full data available on the UCI Machine Learning Repository under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, and provided by Georges Hebrail and Alice Berard.
The original data have been averaged over a 10 minute time period (discarding missing data in each period). The data set contains only the measurements from year 2008.
Notice that the different variables are expressed in the adapted units. In particular, the sub-meters are measuring active energy (in watt-hour) while the global active power is expressed in kilowatt.
Individual household electric power consumption, 2012, G. Hebrail and A. Berard, UC Irvine Machine Learning repository. doi:10.24432/C58K54
This function computes one step ahead predictions for a discrete time series based on a VLMC with covariates.
## S3 method for class 'covlmc'
predict(
  object,
  newdata,
  newcov,
  type = c("raw", "probs"),
  final_pred = TRUE,
  ...
)
object |
a fitted covlmc object. |
newdata |
a time series adapted to the covlmc object. |
newcov |
a data frame with the new values for the covariates. |
type |
character indicating the type of prediction required. The default
|
final_pred |
if |
... |
additional arguments. |
Given a time series X
, at time step t
, a context is computed using
observations from X[1]
to X[t-1]
(see the dedicated section). The
prediction is then the most probable state for X[t]
given this logistic
model of the context and the corresponding values of the covariates. The time
series of predictions is returned by the function when type="raw"
(default
case).
When type="probs"
, the function returns the probabilities of each state
for X[t]
as estimated by the logistic models. Those probabilities are
returned as a matrix of probabilities with column names given by the state
names.
A vector of predictions if type="raw"
or a matrix of state
probabilities if type="probs"
.
As explained in detail in loglikelihood.covlmc()
documentation and in
the dedicated vignette("likelihood", package = "mixvlmc")
, the first
initial values of a time series do not in general have a proper context for
a COVLMC with a non zero order. In order to predict something meaningful
for those values, we rely on the notion of extended context defined in the
documents mentioned above. This follows the same logic as using
loglikelihood.covlmc()
with the parameter initial="extended"
. All
covlmc functions that need to manipulate initial values with no proper
context use the same approach.
pc <- powerconsumption[powerconsumption$week == 10, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.2, 0.7, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5, alpha = 0.5)
rdts_probs <- predict(m_cov, rdts[1:144], rdts_cov[1:144, , drop = FALSE], type = "probs")
rdts_preds <- predict(m_cov, rdts[1:144], rdts_cov[1:144, , drop = FALSE],
  type = "raw", final_pred = FALSE
)
This function computes one step ahead predictions for a discrete time series based on a VLMC.
## S3 method for class 'vlmc'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)

## S3 method for class 'vlmc_cpp'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)
object |
a fitted vlmc object. |
newdata |
a time series adapted to the vlmc object. |
type |
character indicating the type of prediction required. The default
|
final_pred |
if |
... |
additional arguments. |
Given a time series X
, at time step t
, a context is computed using
observations from X[1]
to X[t-1]
(see the dedicated section). The
prediction is then the most probable state for X[t]
given this context.
Ties are broken according to the natural order in the state space, favouring
"small" values. The time series of predictions is returned by the function
when type="raw"
(default case).
When type="probs"
, each X[t]
is associated to the conditional
probabilities of the next state given the context. Those probabilities are
returned as a matrix of probabilities with column names given by the state
names.
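The link between the two output types, including the tie breaking rule, can be sketched in base R. The probability values below are made up for the illustration, and the code is only a sketch of the rule described above, not the implementation used by the package.

```r
# Conditional probabilities for three time steps (rows) and three states
# (columns); the values are made up for the illustration.
probs <- matrix(
  c(
    0.2, 0.5, 0.3,
    0.4, 0.4, 0.2, # tie between "A" and "B"
    0.1, 0.1, 0.8
  ),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)
# which.max() returns the first maximum, so ties favour the "smallest" state
raw <- colnames(probs)[apply(probs, 1, which.max)]
raw # "B" "A" "C"
```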
A vector of predictions if type="raw"
or a matrix of state
probabilities if type="probs"
.
As explained in detail in loglikelihood.vlmc()
documentation and in the
dedicated vignette("likelihood", package = "mixvlmc")
, the first initial
values of a time series do not in general have a proper context for a VLMC
with a non zero order. In order to predict something meaningful for those
values, we rely on the notion of extended context defined in the documents
mentioned above. This follows the same logic as using
loglikelihood.vlmc()
with the parameter initial="extended"
. All vlmc
functions that need to manipulate initial values with no proper context use
the same approach.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts, min_size = 5)
predict(model, rdts[1:5])
predict(model, rdts[1:5], "probs")

## C++ backend
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts, min_size = 5, backend = "C++")
predict(model, rdts[1:5])
predict(model, rdts[1:5], "probs")
This function prints a list of contexts i.e. a contexts
object listing
ctx_node
objects.
## S3 method for class 'contexts'
print(x, reverse = TRUE, ...)
x |
the |
reverse |
specifies whether the contexts should be reported in
temporal order ( |
... |
additional arguments for the print function. |
the x
object, invisibly
rdts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
rdts_tree <- ctx_tree(rdts, max_depth = 3)
print(contexts(rdts_tree))
This function prints a discrete time series.
## S3 method for class 'dts'
print(x, n = 5, ...)
x |
the |
n |
the number of time steps of the time series to print (defaults to 5) |
... |
additional arguments for the print function. |
the x
object, invisibly
x_dts <- dts(sample(c("A", "B"), 20, replace = TRUE))
print(x_dts, n = 10)
This function prunes a VLMC.
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

## S3 method for class 'vlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

## S3 method for class 'vlmc_cpp'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
vlmc |
a fitted VLMC model. |
alpha |
number in (0,1] (default: 0.05) cut off value in quantile scale for pruning. |
cutoff |
positive number: cut off value in native (log likelihood ratio)
scale for pruning. Defaults to the value obtained from |
... |
additional arguments for the prune function. |
In general, pruning a VLMC is more efficient than constructing two VLMCs (the
base one and the pruned one). Up to numerical instabilities, building a VLMC with
a cut off a and then pruning it with a cut off b (with a>b) should
produce the same VLMC as building the VLMC directly with the cut off b.
Interesting cut off values can be extracted from a VLMC using the cutoff()
function.
As automated model selection is provided by tune_vlmc(), the direct use of cutoff
should be reserved for advanced exploration of the set of trees that can be
obtained from a complex one, e.g. to implement model selection techniques that
are not provided by tune_vlmc().
a pruned VLMC
cutoff()
and tune_vlmc()
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
base_model <- vlmc(rdts, alpha = 0.1)
model_cuts <- cutoff(base_model)
pruned_model <- prune(base_model, model_cuts[3])
draw(pruned_model)
direct_simple <- vlmc(rdts, alpha = model_cuts[3])
draw(direct_simple)
# pruned_model and direct_simple should be identical
all.equal(pruned_model, direct_simple)
This function prunes a vlmc with covariates. This model must have been
estimated with keep_data=TRUE
to enable the pruning.
## S3 method for class 'covlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
vlmc |
a fitted VLMC model with covariates. |
alpha |
number in (0,1) (default: 0.05) cutoff value in quantile scale for pruning. |
cutoff |
not supported by the vlmc with covariates. |
... |
additional arguments for the prune function. |
Post pruning a VLMC with covariates is not as straightforward as the same
procedure applied to vlmc() (see cutoff.vlmc() and prune.vlmc()). For
efficiency reasons, covlmc() estimates only the logistic models that are
considered useful for a given set of construction parameters. With a more
aggressive pruning threshold, some contexts become leaves of the context tree
and new logistic models must be estimated. Thus the pruning opportunities
given by cutoff.covlmc() are only a subset of the interesting cut offs for a
given covlmc.
Nevertheless, covlmc shares with vlmc() the principle that post pruning a
covlmc should give the same model as building the covlmc directly, provided
that the post pruning alpha is smaller than the alpha used to build the
initial model.
a pruned covlmc.
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5, keep_data = TRUE)
draw(m_cov)
m_cov_cuts <- cutoff(m_cov)
p_cov <- prune(m_cov, m_cov_cuts[1])
draw(p_cov)
This function reverses the order in which the sequence represented by the
ctx_node
parameter will be reported in other functions, mainly
as_sequence()
.
## S3 method for class 'ctx_node'
rev(x)
x |
a |
a ctx_node
using the opposite ordering convention as the parameter
of the function
rdts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
rdts_tree <- ctx_tree(rdts, max_depth = 3)
res <- find_sequence(rdts_tree, c("A", "B"))
print(res)
r_res <- rev(res)
print(r_res)
as_sequence(r_res)
This function simulates a time series from the distribution estimated by the given covlmc object.
## S3 method for class 'covlmc'
simulate(object, nsim = 1, seed = NULL, covariate, init = NULL, ...)
object |
a fitted covlmc object. |
nsim |
length of the simulated time series (defaults to 1). |
seed |
an optional random seed (see the dedicated section). |
covariate |
values of the covariates. |
init |
an optional initial sequence for the time series given by an object that can be interpreted as a discrete time series. |
... |
additional arguments. |
A VLMC with covariates model needs covariates to compute its transition
probabilities. The covariates must be submitted as a data frame using the
covariate
argument. In addition, the time series can be initiated by a
fixed sequence specified via the init
parameter.
a simulated discrete time series of the same type as the one used to
build the covlmc with a seed
attribute (see the Random seed section). The
result also has the dts_simulated class, which hides the seed attribute when using
print or similar functions.
As explained in detail in loglikelihood.covlmc()
documentation and in
the dedicated vignette("likelihood", package = "mixvlmc")
, the first
initial values of a time series do not in general have a proper context for
a COVLMC with a non zero order. In order to simulate something meaningful
for those values, we rely on the notion of extended context defined in the
documents mentioned above. This follows the same logic as using
loglikelihood.covlmc()
with the parameter initial="extended"
. All
covlmc functions that need to manipulate initial values with no proper
context use the same approach.
This function reproduces the behaviour of stats::simulate(). If seed is
NULL, the function does not change the random generator state and returns
the value of .Random.seed as a seed attribute in the return value. This
can be used to reproduce the simulation results exactly by setting
.Random.seed to this value. Notice that if the random seed has not been
initialised by R so far, the function issues a call to runif(1) to
perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the
simulation takes place. The integer is saved as a seed attribute in the
return value. The integer seed is completed by an attribute kind which
contains the value as.list(RNGkind()), exactly as with
stats::simulate(). The random generator state is reset to its original
value at the end of the call.
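The integer seed case can be sketched in base R. The helper draw_with_seed() is hypothetical (it is not part of mixvlmc) and replaces the actual simulation with a plain call to sample(); it only illustrates the seed handling described above.

```r
# Hypothetical helper (not part of mixvlmc) that follows the seed handling
# described above for an integer seed.
draw_with_seed <- function(n, seed) {
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1) # initialise the random generator if needed
  }
  old_state <- get(".Random.seed", envir = globalenv())
  # reset the generator state to its original value when leaving
  on.exit(assign(".Random.seed", old_state, envir = globalenv()))
  set.seed(seed)
  res <- sample(c("A", "B"), n, replace = TRUE)
  # keep the seed and the generator kind with the result
  attr(res, "seed") <- structure(seed, kind = as.list(RNGkind()))
  res
}

invisible(runif(1)) # make sure the generator is initialised
state <- .Random.seed
x1 <- draw_with_seed(10, seed = 0L)
x2 <- draw_with_seed(10, seed = 0L)
identical(x1, x2) # TRUE: same seed, same values
identical(state, .Random.seed) # TRUE: the global state is untouched
```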
stats::simulate()
for details and examples on the random number generator setting
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 5)
# new week with day light from 6:00 to 18:00
new_cov <- data.frame(day_night = rep(c(rep(FALSE, 59), rep(TRUE, 121), rep(FALSE, 60)), times = 7))
new_rdts <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov)
new_rdts_2 <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov, init = rdts[1:10])
This function simulates a time series from the distribution estimated by the given vlmc object.
## S3 method for class 'vlmc'
simulate(object, nsim = 1L, seed = NULL, init = NULL, burnin = 0L, ...)
object |
a fitted vlmc object. |
nsim |
length of the simulated time series (defaults to 1). |
seed |
an optional random seed (see the dedicated section). |
init |
an optional initial sequence for the time series given by an object that can be interpreted as a discrete time series. |
burnin |
number of initial observations to discard or |
... |
additional arguments. |
The time series can be initiated by a fixed sequence specified via the init
parameter.
a simulated discrete time series of the same type as the one used to
build the vlmc with a seed
attribute (see the Random seed section). The
result also has the dts_simulated class, which hides the seed attribute when using
print or similar functions.
When using a VLMC for simulation purposes, we are generally interested in
the stationary distribution of the corresponding Markov chain. To reduce
the dependence of the samples on the initial values and get closer to
this stationary distribution (if it exists), it is recommended to discard
the first samples, which are produced in a so-called "burn in" (or "warm
up") period. The burnin parameter can be used to implement this approach.
The VLMC is used to produce a sample of size burnin + nsim but the first
burnin values are discarded. Notice that these burn in values can be
partially given by the init parameter if it is specified.
If burnin
is set to "auto"
, the burnin
period is set to 64 * context_number(object)
, following the heuristic proposed in Mächler and
Bühlmann (2004).
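The burn in mechanism can be sketched with a first order Markov chain in base R. The helper simulate_chain() is hypothetical (it is not part of mixvlmc), does not handle the init based part of the burn in, and uses a made up transition matrix. The burn in length applies the 64 * context_number(object) rule under the assumption that the chain has two contexts, one per state.

```r
# Hypothetical helper (not part of mixvlmc): simulate a first order Markov
# chain with transition matrix `P`, discarding a burn in period.
simulate_chain <- function(P, nsim, burnin = 0L, init = 1L) {
  total <- burnin + nsim
  states <- integer(total)
  current <- init
  for (t in seq_len(total)) {
    # draw the next state given the current one
    current <- sample.int(ncol(P), 1, prob = P[current, ])
    states[t] <- current
  }
  # keep only the last nsim values
  states[seq_len(nsim) + burnin]
}

P <- matrix(c(0.9, 0.1, 0.2, 0.8), ncol = 2, byrow = TRUE)
set.seed(0)
# assumed number of contexts: 2, hence a burn in period of 64 * 2
x <- simulate_chain(P, nsim = 500L, burnin = 64L * 2L)
length(x) # 500
```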
This function reproduces the behaviour of stats::simulate(). If seed is
NULL, the function does not change the random generator state and returns
the value of .Random.seed as a seed attribute in the return value. This
can be used to reproduce the simulation results exactly by setting
.Random.seed to this value. Notice that if the random seed has not been
initialised by R so far, the function issues a call to runif(1) to
perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the
simulation takes place. The integer is saved as a seed attribute in the
return value. The integer seed is completed by an attribute kind which
contains the value as.list(RNGkind()), exactly as with
stats::simulate(). The random generator state is reset to its original
value at the end of the call.
As explained in detail in loglikelihood.vlmc()
documentation and in the
dedicated vignette("likelihood", package = "mixvlmc")
, the first initial
values of a time series do not in general have a proper context for a VLMC
with a non zero order. In order to simulate something meaningful for those
values when init
is not provided, we rely on the notion of extended
context defined in the documents mentioned above. This follows the same
logic as using loglikelihood.vlmc()
with the parameter
initial="extended"
. All vlmc functions that need to manipulate initial
values with no proper context use the same approach.
Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524
stats::simulate()
for details and examples on the random number
generator setting
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts, min_size = 5)
new_rdts <- simulate(model, 500, seed = 0)
new_rdts_2 <- simulate(model, 500, seed = 0, init = rdts[1:5])
new_rdts_3 <- simulate(model, 500, seed = 0, burnin = 500)
This function simulates a time series from the distribution estimated by the given vlmc object.
## S3 method for class 'vlmc_cpp'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  init = NULL,
  burnin = 0L,
  sample = c("fast", "slow", "R"),
  ...
)
object |
a fitted vlmc object. |
nsim |
length of the simulated time series (defaults to 1). |
seed |
an optional random seed (see the dedicated section). |
init |
an optional initial sequence for the time series given by an object that can be interpreted as a discrete time series. |
burnin |
number of initial observations to discard or |
sample |
specifies which implementation of |
... |
additional arguments. |
The time series can be initiated by a fixed sequence specified via the init
parameter.
a simulated discrete time series of the same type as the one used to
build the vlmc with a seed
attribute (see the Random seed section). The
result also has the dts_simulated class, which hides the seed attribute when using
print or similar functions.
The R backend for vlmc() uses base::sample() to generate samples for each
context. Internally, this function sorts the probabilities of each state in
decreasing order (among other things), which is not needed in our
case. The C++ backend can be used with three different implementations:
sample="fast" uses a dedicated C++ implementation adapted to the data structures
used internally. In general, the simulated time series obtained with this
implementation will be different from the one generated with the R backend,
even when using the same seed.
sample="slow" uses another C++ implementation that mimics base::sample() in
order to maximize the chance of providing identical simulation results regardless
of the backend (when using the same random seed). This process is not perfect,
as we use the C++ standard library sort algorithm, which is not guaranteed to give
results identical to those of R's internal 'revsort'.
sample="R" uses direct calls to base::sample(). Results are guaranteed
to be identical between the two backends, but at the price of a higher running
time.
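To illustrate why sorting the probabilities is not required for sampling, the following base R sketch draws a state by inverting the cumulative distribution. The helper sample_one() is hypothetical and is not the C++ implementation used by the package, whose details are not given here.

```r
# Hypothetical helper (not part of mixvlmc): draw one state index from the
# probability vector `p` by inverting the cumulative distribution, with no
# sorting of the probabilities.
sample_one <- function(p) {
  u <- runif(1)
  # number of cumulative sums that are <= u, plus one
  idx <- findInterval(u, cumsum(p)) + 1L
  min(idx, length(p)) # guard against rounding in cumsum()
}

set.seed(1)
p <- c(0.5, 0.3, 0.2)
draws <- vapply(seq_len(1000), function(i) sample_one(p), integer(1))
table(draws) / 1000 # empirical frequencies, to be compared to p
```

Such a sampler consumes random numbers differently from base::sample(), which is consistent with the remark that results generally differ between implementations even with the same seed.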
When using a VLMC for simulation purposes, we are generally interested in
the stationary distribution of the corresponding Markov chain. To reduce
the dependence of the samples on the initial values and get closer to
this stationary distribution (if it exists), it is recommended to discard
the first samples, which are produced in a so-called "burn in" (or "warm
up") period. The burnin parameter can be used to implement this approach.
The VLMC is used to produce a sample of size burnin + nsim but the first
burnin values are discarded. Notice that these burn in values can be
partially given by the init parameter if it is specified.
If burnin
is set to "auto"
, the burnin
period is set to 64 * context_number(object)
, following the heuristic proposed in Mächler and
Bühlmann (2004).
This function reproduces the behaviour of stats::simulate(). If seed is
NULL, the function does not change the random generator state and returns
the value of .Random.seed as a seed attribute in the return value. This
can be used to reproduce the simulation results exactly by setting
.Random.seed to this value. Notice that if the random seed has not been
initialised by R so far, the function issues a call to runif(1) to
perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the
simulation takes place. The integer is saved as a seed attribute in the
return value. The integer seed is completed by an attribute kind which
contains the value as.list(RNGkind()), exactly as with
stats::simulate(). The random generator state is reset to its original
value at the end of the call.
As explained in detail in loglikelihood.vlmc()
documentation and in the
dedicated vignette("likelihood", package = "mixvlmc")
, the first initial
values of a time series do not in general have a proper context for a VLMC
with a non zero order. In order to simulate something meaningful for those
values when init
is not provided, we rely on the notion of extended
context defined in the documents mentioned above. This follows the same
logic as using loglikelihood.vlmc()
with the parameter
initial="extended"
. All vlmc functions that need to manipulate initial
values with no proper context use the same approach.
Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524
stats::simulate() for details and examples on the random number generator setting
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts, min_size = 5)
new_rdts <- simulate(model, 500, seed = 0)
new_rdts_2 <- simulate(model, 500, seed = 0, init = rdts[1:5])
new_rdts_3 <- simulate(model, 500, seed = 0, burnin = 500)
This function returns the state space of an object for which this is meaningful such as a discrete time series or a context tree.
states(x)

## S3 method for class 'ctx_tree'
states(x)

## S3 method for class 'dts'
states(x)
x |
an object with a state space. |
the state space of the context tree.
rdts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
rdts_ctree <- ctx_tree(rdts, min_size = 1, max_depth = 2)
## should be c(0, 1)
states(rdts_ctree)
x_dts <- dts(sample(c("A", "B", "C"), 20, replace = TRUE))
## should be c("A", "B", "C")
states(x_dts)
This function returns a trimmed context tree from which match positions have been removed.
trim(ct, ...)
ct |
a context tree. |
... |
additional arguments for the trim function. |
a trimmed context tree.
## context tree trimming
rdts <- sample(as.factor(c("A", "B", "C")), 1000, replace = TRUE)
rdts_tree <- ctx_tree(rdts, max_depth = 10, min_size = 5, keep_position = TRUE)
print(object.size(rdts_tree))
rdts_tree <- trim(rdts_tree)
print(object.size(rdts_tree))
This function returns a trimmed COVLMC from which cached data have been removed.
## S3 method for class 'covlmc'
trim(ct, keep_model = FALSE, ...)
ct |
a context tree. |
keep_model |
specifies whether to keep the internal models (or not) |
... |
additional arguments for the trim function. |
Called with keep_model set to FALSE (default case), the trimming is maximal
and reduces further usability of the model. In particular,
loglikelihood.covlmc() cannot be used for new data, contexts.covlmc() does
not support model extraction, and simulate.covlmc(), metrics.covlmc() and
prune.covlmc() cannot be used at all.
Called with keep_model set to TRUE, the trimming process is less complete. In
particular, internal models are simplified using butcher::butcher() and some
additional minor reductions. This saves less memory but enables the use of
loglikelihood.covlmc() for new data as well as the use of simulate.covlmc().
a trimmed context tree.
pc <- powerconsumption[powerconsumption$week %in% 5:7, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(rdts, rdts_cov, min_size = 10, keep_data = TRUE)
print(object.size(m_cov), units = "Mb")
t_m_cov_model <- trim(m_cov, keep_model = TRUE)
print(object.size(t_m_cov_model), units = "Mb")
t_m_cov <- trim(m_cov)
print(object.size(t_m_cov), units = "Mb")
This function returns a trimmed context tree from which match positions have been removed.
## S3 method for class 'vlmc'
trim(ct, ...)
ct |
a VLMC. |
... |
additional arguments for the trim function. |
a trimmed VLMC
## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
rdts <- cut(pc$active_power, breaks = 4)
model <- vlmc(rdts, keep_match = TRUE)
print(object.size(model))
model <- trim(model)
## memory use should be reduced
print(object.size(model))
nm_model <- vlmc(rdts)
print(object.size(nm_model))
nm_model <- trim(nm_model)
## no effect when match positions are not kept
print(object.size(nm_model))
This function returns a trimmed context tree from which match positions have been removed.
## S3 method for class 'vlmc_cpp'
trim(ct, ...)
ct |
a VLMC. |
... |
additional arguments for the trim function. |
Trimming in the C++ backend is done directly in the Rcpp managed memory and
cannot be detected at R level using e.g. utils::object.size().
a trimmed VLMC
## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
rdts <- cut(pc$active_power, breaks = 4)
model <- vlmc(rdts, backend = "C++", keep_match = TRUE)
model <- trim(model)
This function fits a Variable Length Markov Chain with Covariates (coVLMC) to a discrete time series coupled with a time series of covariates by optimizing an information criterion (BIC or AIC).
tune_covlmc(
  x,
  covariate,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  min_size = 5,
  max_depth = 100,
  verbose = 0,
  save = c("best", "initial", "all"),
  trimming = c("full", "partial", "none"),
  best_trimming = c("none", "partial", "full")
)
x |
an object that can be interpreted as a discrete time series, such
as an integer vector or a |
covariate |
a data frame of covariates. |
criterion |
criterion used to select the best model. Either |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are
integrated in the likelihood. See |
alpha_init |
if non |
min_size |
integer >= 1 (default: 5). Tune the minimum number of
observations for a context in the growing phase of the context tree (see
|
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the initial context tree (see details). |
verbose |
integer >= 0 (default: 0). Verbosity level of the pruning process. |
save |
specify which BIC models are saved during the pruning process.
The default value |
trimming |
specify the type of trimming used when saving the intermediate models, see details. |
best_trimming |
specify the type of trimming used when saving the best model and the initial one (see details). |
This function automates the process of fitting a large coVLMC to a discrete
time series with covlmc() and of pruning the tree (with cutoff() and prune())
to get an optimal model with respect to an information criterion. To avoid
missing long term dependencies, the function uses the max_depth parameter as
an initial guess but then relies on an automatic increase of the value to
make sure the initial context tree is only limited by the min_size parameter.
The initial value of the alpha parameter of covlmc() is also set to a
conservative value (0.5) to avoid prior simplification of the context tree.
This can be overridden by setting the alpha_init parameter to a more suitable
value.
Once the initial coVLMC is obtained, the cutoff() and prune() functions are
used to build all the coVLMC models that could be generated using smaller
values of the alpha parameter. The best model is selected from this
collection, including the initial complex tree, as the one that minimizes the
chosen information criterion.
a list with the following components:
best_model: the optimal coVLMC
criterion: the criterion used to select the optimal coVLMC
initial: the likelihood function used to select the optimal coVLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate coVLMCs if save="initial" or save="all". It contains an initial component with the large coVLMC obtained first and an all component with a list of all the other coVLMCs obtained by pruning the initial one.
covlmc
objects tend to be large and saving all the models during the
search for the optimal model can lead to an unreasonable use of memory. To
avoid this problem, models are kept in trimmed form only using
trim.covlmc()
with keep_model=FALSE
. Both the initial model and the
best one are saved untrimmed. This default behaviour corresponds to
trimming="full"
. Setting trimming="partial"
asks the function to use
keep_model=TRUE
in trim.covlmc()
for intermediate models. Finally,
trimming="none"
turns off trimming, which is discouraged expected for
small data sets.
In parallel processing contexts (e.g. using foreach::%dopar%), the memory
occupation of the results can become very large as models tend to keep
environments attached to the formulas. In this situation, it is highly
recommended to trim all saved models, including the best one and the initial
one. This can be done via the best_trimming parameter whose possible values
are identical to the ones of trimming.
covlmc(), cutoff() and prune()
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
rdts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
rdts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
rdts_best_model_tune <- tune_covlmc(rdts, rdts_cov)
draw(as_covlmc(rdts_best_model_tune))
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series by optimizing an information criterion (BIC or AIC).
tune_vlmc(
  x,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  cutoff_init = NULL,
  min_size = 2L,
  max_depth = 100L,
  backend = getOption("mixvlmc.backend", "R"),
  verbose = 0,
  save = c("best", "initial", "all")
)
x |
an object that can be interpreted as a discrete time series, such
as an integer vector or a |
criterion |
criterion used to select the best model. Either |
initial |
specifies the likelihood function, more precisely the way the
first few observations for which contexts cannot be calculated are
integrated in the likelihood. Default to |
alpha_init |
if non |
cutoff_init |
if non |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the initial context tree. |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the initial context tree (see details). |
backend |
backend "R" or "C++" (default: as specified by the
"mixvlmc.backend" option). Specifies the implementation used to represent
the context tree and to build it. See |
verbose |
integer >= 0 (default: 0). Verbosity level of the pruning process. |
save |
specify which BIC models are saved during the pruning process.
The default value |
This function automates the process of fitting a large VLMC to a discrete
time series with vlmc() and of pruning the tree (with cutoff() and prune())
to get an optimal model with respect to an information criterion. To avoid
missing long term dependencies, the function uses the max_depth parameter as
an initial guess but then relies on an automatic increase of the value to
make sure the initial context tree is only limited by the min_size parameter.
The initial value of the cutoff parameter of vlmc() is also set to
conservative values (depending on the criterion) to avoid prior
simplification of the context tree. This default value can be overridden
using the cutoff_init or alpha_init parameter.
Once the initial VLMC is obtained, the cutoff() and prune() functions are
used to build all the VLMC models that could be generated using larger values
of the initial cut off parameter. The best model is selected from this
collection, including the initial complex tree, as the one that minimizes the
chosen information criterion.
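The procedure described above can be approximated by hand. The following is a simplified sketch, not the actual implementation of tune_vlmc(): it assumes a starting value alpha = 0.5 for the large model (an arbitrary choice for illustration), it does not increase max_depth automatically, and it relies on the default likelihood used by BIC(). The data are a toy random sequence.

```r
library(mixvlmc)

# Toy sequence (hypothetical data, used only for illustration)
set.seed(0)
x <- sample(as.factor(c("A", "B", "C")), 500, replace = TRUE)

# Step 1: a deliberately complex initial model (large alpha = little pruning)
large_model <- vlmc(x, alpha = 0.5)

# Step 2: thresholds (quantile scale) at which the tree would actually change
all_alphas <- cutoff(large_model)

# Step 3: the initial model plus one pruned model per threshold
candidates <- c(
  list(large_model),
  lapply(all_alphas, function(a) prune(large_model, alpha = a))
)

# Step 4: keep the candidate with the smallest BIC
bic_values <- sapply(candidates, BIC)
best_model <- candidates[[which.min(bic_values)]]
```

In practice tune_vlmc() should be preferred, since it also takes care of the depth of the initial tree and of the likelihood function selected with initial.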
a list with the following components:
best_model: the optimal VLMC
criterion: the criterion used to select the optimal VLMC
initial: the likelihood function used to select the optimal VLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate VLMCs if save="initial" or save="all". It contains an initial component with the large VLMC obtained first and an all component with a list of all the other VLMCs obtained by pruning the initial one.
rdts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(rdts)
draw(tune_result$best_model)
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series.
vlmc(
  x,
  alpha = 0.05,
  cutoff = NULL,
  min_size = 2L,
  max_depth = 100L,
  prune = TRUE,
  keep_match = FALSE,
  backend = getOption("mixvlmc.backend", "R")
)
x |
an object that can be interpreted as a discrete time series, such
as an integer vector or a |
alpha |
number in (0,1] (default: 0.05) cut off value in quantile scale in the pruning phase. |
cutoff |
non negative number: cut off value in native (likelihood ratio)
scale in the pruning phase. Defaults to the value obtained from |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the context tree. |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
prune |
logical: specify whether the context tree should be pruned (default behaviour). |
keep_match |
logical: specify whether to keep the context matches (defaults to FALSE) |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
The VLMC is built using Bühlmann and Wyner's algorithm, which consists in
fitting a context tree (see ctx_tree()) to a time series and then pruning it
in such a way that the conditional distribution of the next state of the time
series given the context is significantly different from the distribution
given a truncated version of the context.
The construction of the context tree is controlled by min_size and max_depth,
exactly as in ctx_tree(). Significance is measured using a likelihood ratio
test: the threshold can be specified in terms of the ratio itself with cutoff
or in quantile scale with alpha.
Pruning can be postponed by setting prune=FALSE. Using a combination of
cutoff() and prune(), the complexity of the VLMC can then be adjusted. Any
VLMC model can be pruned after construction; prune=FALSE is a convenience
parameter to avoid setting alpha=1 (which essentially prevents any pruning).
Automated model selection is provided by tune_vlmc().
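The postponed pruning workflow can be sketched as follows. This is a minimal illustration on a toy random sequence (not package data); the variable names are arbitrary, and the explicit alpha = 0.05 simply repeats the default value.

```r
library(mixvlmc)

# Toy binary sequence (hypothetical data, used only for illustration)
set.seed(1)
x <- sample(c(0L, 1L), 1000, replace = TRUE)

# Grow the context tree but skip the pruning phase
unpruned <- vlmc(x, prune = FALSE)

# Prune afterwards with an explicit threshold in quantile scale
pruned_later <- prune(unpruned, alpha = 0.05)

# For comparison: pruning done directly at fitting time
pruned_direct <- vlmc(x, alpha = 0.05)

# Pruning can only shorten the contexts kept in the model
c(
  unpruned = depth(unpruned),
  pruned_later = depth(pruned_later),
  pruned_direct = depth(pruned_direct)
)
```

Keeping the unpruned model around is mostly useful when several thresholds have to be compared, since the context tree is then grown only once.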
a fitted vlmc model.
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is
considered experimental. The tree is built with an optimised suffix tree
algorithm which speeds up the construction by at least a factor of 10 in
standard settings. As the tree is kept outside of R's direct reach, context
trees built with the C++ back end must be restored after a
saveRDS()/readRDS() sequence. This is done automatically by completely
recomputing the context tree.
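The serialisation behaviour of the C++ back end can be illustrated with a round trip through a temporary file. This is a sketch under the assumption that the automatic restoration described above is triggered by the first function applied to the reloaded object; the data are a toy random sequence.

```r
library(mixvlmc)

# Toy sequence (hypothetical data, used only for illustration)
set.seed(2)
x <- sample(c("A", "B", "C"), 1000, replace = TRUE)
model <- vlmc(x, backend = "C++")

# Round trip through an RDS file
path <- tempfile(fileext = ".rds")
saveRDS(model, path)
restored <- readRDS(path)

# Using the restored object is expected to rebuild the C++ tree on demand
same_depth <- depth(restored) == depth(model)
same_size <- context_number(restored) == context_number(model)
unlink(path)
```

Since the tree is recomputed from scratch, reloading a model fitted on a long series may take about as long as the original fit.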
Bühlmann, P. and Wyner, A. J. (1999) "Variable length Markov chains" Annals of Statistics, 27 (2), 480-513, doi:10.1214/aos/1018031204
cutoff(), prune() and tune_vlmc()
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)
draw(model)
depth(model)
## reduce the depth of the model
shallow_model <- vlmc(rdts, max_depth = 3)
draw(shallow_model, prob = FALSE)
## improve probability estimates
robust_model <- vlmc(rdts, min_size = 25)
draw(robust_model, prob = FALSE)
## show the frequencies
draw(robust_model)
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series.
## Default S3 method:
vlmc(
  x,
  alpha = 0.05,
  cutoff = NULL,
  min_size = 2L,
  max_depth = 100L,
  prune = TRUE,
  keep_match = FALSE,
  backend = getOption("mixvlmc.backend", "R")
)
x |
a numeric, character, factor or logical vector |
alpha |
number in (0,1] (default: 0.05) cut off value in quantile scale in the pruning phase. |
cutoff |
non negative number: cut off value in native (likelihood ratio)
scale in the pruning phase. Defaults to the value obtained from |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the context tree. |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
prune |
logical: specify whether the context tree should be pruned (default behaviour). |
keep_match |
logical: specify whether to keep the context matches (defaults to FALSE) |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
The VLMC is built using Bühlmann and Wyner's algorithm, which consists in
fitting a context tree (see ctx_tree()) to a time series and then pruning it
in such a way that the conditional distribution of the next state of the time
series given the context is significantly different from the distribution
given a truncated version of the context.
The construction of the context tree is controlled by min_size and max_depth,
exactly as in ctx_tree(). Significance is measured using a likelihood ratio
test: the threshold can be specified in terms of the ratio itself with cutoff
or in quantile scale with alpha.
Pruning can be postponed by setting prune=FALSE. Using a combination of
cutoff() and prune(), the complexity of the VLMC can then be adjusted. Any
VLMC model can be pruned after construction; prune=FALSE is a convenience
parameter to avoid setting alpha=1 (which essentially prevents any pruning).
Automated model selection is provided by tune_vlmc().
a fitted vlmc model.
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is
considered experimental. The tree is built with an optimised suffix tree
algorithm which speeds up the construction by at least a factor of 10 in
standard settings. As the tree is kept outside of R's direct reach, context
trees built with the C++ back end must be restored after a
saveRDS()/readRDS() sequence. This is done automatically by completely
recomputing the context tree.
Bühlmann, P. and Wyner, A. J. (1999) "Variable length Markov chains" Annals of Statistics, 27 (2), 480-513, doi:10.1214/aos/1018031204
cutoff(), prune() and tune_vlmc()
pc <- powerconsumption[powerconsumption$week == 5, ]
rdts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(rdts)
draw(model)
depth(model)
## reduce the depth of the model
shallow_model <- vlmc(rdts, max_depth = 3)
draw(shallow_model, prob = FALSE)
## improve probability estimates
robust_model <- vlmc(rdts, min_size = 25)
draw(robust_model, prob = FALSE)
## show the frequencies
draw(robust_model)
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series.
## S3 method for class 'dts'
vlmc(
  x,
  alpha = 0.05,
  cutoff = NULL,
  min_size = 2L,
  max_depth = 100L,
  prune = TRUE,
  keep_match = FALSE,
  backend = getOption("mixvlmc.backend", "R")
)
x |
a discrete time series represented by a |
alpha |
number in (0,1] (default: 0.05) cut off value in quantile scale in the pruning phase. |
cutoff |
non negative number: cut off value in native (likelihood ratio)
scale in the pruning phase. Defaults to the value obtained from |
min_size |
integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the context tree. |
max_depth |
integer >= 1 (default: 100). Longest context considered in growing phase of the context tree. |
prune |
logical: specify whether the context tree should be pruned (default behaviour). |
keep_match |
logical: specify whether to keep the context matches (defaults to FALSE) |
backend |
"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
The VLMC is built using Bühlmann and Wyner's algorithm, which consists in
fitting a context tree (see ctx_tree()) to a time series and then pruning it
in such a way that the conditional distribution of the next state of the time
series given the context is significantly different from the distribution
given a truncated version of the context.
The construction of the context tree is controlled by min_size and max_depth,
exactly as in ctx_tree(). Significance is measured using a likelihood ratio
test: the threshold can be specified in terms of the ratio itself with cutoff
or in quantile scale with alpha.
Pruning can be postponed by setting prune=FALSE. Using a combination of
cutoff() and prune(), the complexity of the VLMC can then be adjusted. Any
VLMC model can be pruned after construction; prune=FALSE is a convenience
parameter to avoid setting alpha=1 (which essentially prevents any pruning).
Automated model selection is provided by tune_vlmc().
a fitted vlmc model.
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is
considered experimental. The tree is built with an optimised suffix tree
algorithm which speeds up the construction by at least a factor of 10 in
standard settings. As the tree is kept outside of R's direct reach, context
trees built with the C++ back end must be restored after a
saveRDS()/readRDS() sequence. This is done automatically by completely
recomputing the context tree.
Bühlmann, P. and Wyner, A. J. (1999) "Variable length Markov chains" Annals of Statistics, 27 (2), 480-513, doi:10.1214/aos/1018031204
cutoff(), prune() and tune_vlmc()
pc <- powerconsumption[powerconsumption$week == 5, ]
power_dts <- dts(cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
))
model <- vlmc(power_dts)
draw(model)
depth(model)
## reduce the depth of the model
shallow_model <- vlmc(power_dts, max_depth = 3)
draw(shallow_model, prob = FALSE)
## improve probability estimates
robust_model <- vlmc(power_dts, min_size = 25)
draw(robust_model, prob = FALSE)
## show the frequencies
draw(robust_model)