Abstract

This brief note explores the use of fuzzy classifiers, with membership functions chosen using a statistical heuristic (quantile statistics), to monitor time-series metrics. The time series can arise from environmental measurements, industrial process control data, or sensor system outputs. We demonstrate an implementation in the R language on an example dataset (ozone levels in New York City). Skip straight to the coded solution in Appendix 1, or read on for the discussion.
1. Case study: providing automatic classification of urban air quality
Time-series approach
Consider the problem of monitoring ozone levels in an urban environment and issuing a daily text message alert to residents indicating meaningful risk levels. What should these levels be, and how should the thresholds be set? If we imagine a scenario where clinical studies on human response do not yet exist on which to base the thresholds, an alternative approach is to provide a statistical classification of ozone levels relative to previous history, i.e. classed as VERY LOW, LOW, NORMAL, HIGH, and VERY HIGH. Let’s see how we might do this using an example dataset from a New York City study.
Looking at the NYC ozone study
The NYC ozone dataset from the New York State Department of Conservation is built into R and measures mean ozone levels (alongside solar radiation, wind, and temperature readings) over 153 days in the summer of 1973 (May 1st to Sep 30th, 1300-1500h, Roosevelt Island station). Let’s load it up in R and inspect it.
### Load airquality dataset
library(datasets)
airquality
help(airquality)        # show dataset details
airquality$Ozone        # parts per billion

### Data visualization
plot(airquality$Ozone)
lines(airquality$Ozone)
title("Ozone level at Roosevelt Island, NYC\n mean ppb, 1300h-1500h, May 1-Sep 30, 1973")
hist(airquality$Ozone)
To summarize the observations, use boxplot() to show graphically the non-parametric statistics, i.e. the 5-number summary at min (p0), p25, p50, p75, and max (p100) levels; quantile() shows the numerical values. Note that the box between p25 and p75 captures the middle 50% of the distribution.
boxplot(airquality$Ozone)
clean <- function(v) { v[!is.na(v)] }   # removes NAs from a vector, e.g. clean(d)
dd <- clean(airquality$Ozone)
quantile(dd)                            # 5-number summary at min, Q1, Q2, Q3, max levels
2. Choosing a statistical classifier
Classifiers are heuristics matching input data to categorical labels, with the "goodness of fit" depending on an (often subjective) separate estimate of the correct answer.
In what follows, we use the non-parametric statistics heuristic with p-levels associated with quantiles as a way to capture a fixed percentage of the distribution. For time-series data that are, or can be assumed to be, essentially stationary, the thresholds can be selected once from history and then adjusted incrementally; for highly varying, seasonal, or otherwise non-stationary time series, the analysis becomes more complicated, and the central ideas below require adaptation.
Let us take as a tenet the use of the 80-20 rule for flagging HIGH and LOW outliers, i.e. 1 in 5 observations should be flagged as abnormal, and the remaining 4 considered within usual levels of variation. This implies setting the quantiles at the p10 and p90 levels. In the code below, quantile() is called with a custom probs vector containing the p10 and p90 levels, and the resulting series is plotted against these threshold lines.
qq <- quantile(dd, probs=c(0, 0.1, 0.5, 0.9, 1))   # 5-number summary at custom p-levels
qq
plot(airquality$Ozone)
curve(0*x+qq[[2]], add=TRUE)   # add horizontal lines at the p10 and p90 levels
curve(0*x+qq[[4]], add=TRUE)
3. The difference between crisp (conventional) and fuzzy classifiers
If this is all we do, then it remains only to formalize the classification around the pre-selected p10 and p90 thresholds. The code is given below, and it should be no surprise that the results show 80% of the values classed as NORMAL, with 10% in each of the two classes LOW and HIGH. The size of the buckets was pre-determined by the tenet of an 80-20 classification outcome. The only choice was what this meant in terms of thresholds, and this was provided by the quantile() function at the respective p-levels.
cc3 <- c("LOW", "NORMAL", "HIGH") # Crisp classifier classify <- function(x) { if (x < qq[[2]]) cc3[1] else if (x > qq[[4]]) cc3[3] else cc3[2] } b <- sapply(dd, classify) b table(b) table(b)/length(dd)
But you should be protesting at the use of such a classifier. Within the NORMAL bucket, containing 80% of the distribution, there is a wide range of values, which leads to a problem that bedevils all attempts at crisp classification into a manageable number of levels: values at the top and bottom ends of the bucket (for example 85 and 18 ppb) are both classed as NORMAL despite being closer to points on the other side of the class boundaries (for example 89 and 12 ppb) that are classed as HIGH and LOW respectively.
Crisp classifiers get around the above limitations by using more complex mathematical models
The discussion so far has taken the simplest possible approach to classification, and the critique above is deserved. Of course, no one doing statistical classification would use the threshold logic described above without accounting for the distance between points. One approach is to find the separating hyperplane that minimizes the average distance between pairs of points on either side of that plane. Another, similar, approach is based on choosing cluster centroids, again minimizing distances between pairs of points within each cluster. Both of these optimization algorithms get around the critique above and give good results, but they are mathematically much more complex (see the classic reference [Tibshirani/2009], Elements of Statistical Learning, free PDF download courtesy of the authors and publishers).
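To make the comparison concrete, here is a minimal sketch of the centroid-based alternative using R's built-in kmeans() on the same cleaned ozone vector dd; the choice of three clusters, the seed, and the nstart value are illustrative assumptions, not part of the method developed below.

# Sketch: centroid-based clustering of the ozone values into 3 groups
set.seed(42)                       # make the cluster assignment reproducible
km <- kmeans(dd, centers=3, nstart=25)
sort(km$centers)                   # cluster centres, loosely LOW / NORMAL / HIGH
table(km$cluster)                  # cluster sizes, for comparison with table(b) above

Note that the cluster sizes are driven by the geometry of the data rather than by a pre-set 80-20 tenet, which is one reason the quantile heuristic remains attractive here.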
4. The fuzzy classifier: making a naive classification approach work with low computational impact
The appeal of fuzzy classification is that it preserves the simple approach and resolves the discontinuity problem in a computationally efficient way, using a simple min-max calculation at each point rather than solving a combinatorial optimization problem ranging over all pairs of points.
The key to fuzzy classification is a definition of set membership that includes degree of membership as a mechanism to allow "fuzzy boundaries". A single element is now a member of all classes, but to different degrees. The final classification has a strength, which matches more closely our intuition that classifications are inherently fuzzy, resisting crisp boundaries.
Let's look at how this is implemented.
# Defining linear membership functions L(x), M(x), H(x) (low, medium, high)
# based on the p10 and p90 quantile levels
x0 <- qq[[2]]        # p10
x1 <- qq[[4]]        # p90
Ly0 <- 1.0; Ly1 <- 0.0
Hy0 <- 0.0; Hy1 <- 1.0
xmin <- qq[[1]]
xmed <- qq[[3]]
xmax <- qq[[5]]

L <- function(x) {   # Low: 1 up to p10, falling linearly to 0 at p90
  if (x < x0) Ly0 else if (x > x1) Ly1 else Ly0 + (x-x0)*(Ly1-Ly0)/(x1-x0)
}

H <- function(x) {   # High: 0 up to p10, rising linearly to 1 at p90
  if (x < x0) Hy0 else if (x > x1) Hy1 else Hy0 + (x-x0)*(Hy1-Hy0)/(x1-x0)
}

M <- function(x) {   # Medium: triangular, 0 at min, peaking at 1 at the median, 0 at max
  if (x < xmed) Hy0 + (x-xmin)*(Hy1-Hy0)/(xmed-xmin) else Hy0 + (x-xmax)*(Ly1-Ly0)/(xmax-xmed)
}

# show fuzzy membership functions
x <- 1:max(dd)
plot(c(1,max(dd)), c(0,1))
lines(x, sapply(x,L))
lines(x, sapply(x,H))
lines(x, sapply(x,M))
Let's talk through what is going on above. The fuzzy classifier above uses membership functions L(x), M(x), H(x) (low, medium, high) to generate overlapping degrees of membership for each class. The membership functions are defined over the same domain, allowing classifications that show graded membership.
What happens next is the key bit: the classification of a given point x is made by applying a fuzzification operation across the candidate fuzzy sets. In this case the fuzzification operator is the simple mathematical maximum, max(). Working an example: with the thresholds computed above, the value 70 is classified into the "HIGH" bucket, because H(70) returns the highest (max) value of the three candidates L(70), M(70), H(70).
# example:
cc3 <- c("LOW", "NORMAL", "HIGH")
fv <- c(L(70), M(70), H(70))   # fuzzy vector
fv
max(fv)
which.max(fv)
cc3[which.max(fv)]
We use this notion to define the fuzzy classifier itself, and apply it to the dataset:
# Fuzzy classifier into 3 classes
cc3 <- c("LOW", "NORMAL", "HIGH")
fuzzy_classify3 <- function(x) {
  fv <- c(L(x), M(x), H(x))
  # print(fv)        # debug
  # print(max(fv))   # debug
  m3 <- which.max(fv)
  cc3[m3]
}
bb <- sapply(dd, fuzzy_classify3)
bb
table(bb)
table(bb)/length(dd)
Notice that the middle (NORMAL) cluster now occurs only about 35% of the time (i.e. 44% of the 80% previously classed as NORMAL), which is a problem: the three-class fuzzy classifier no longer preserves the 80-20 rule that led us to choose the p10 and p90 thresholds in the first place.
The fix is to use 5 clusters, where the outermost two (VERY HIGH and VERY LOW) preserve the 80-20 rule based on the p90 and p10 thresholds, in line with the crisp classifier. Fuzziness is limited to the middle of the distribution, now split into three classes: medium-high and medium-low (labelled HIGH and LOW), and the reduced MEDIUM class holding the roughly 35% of instances.
cc5 <- c("VERY LOW", "LOW", "MEDIUM", "HIGH", "VERY HIGH") fuzzy_classify5 <- function(x) { fv <- c(L(x), M(x), H(x)) # print(fv) #debug # print(max(fv)) #debug m3 <- which.max(fv) m5 <- if (m3==1 && max(fv)==1) 1 else if (m3==3 && max(fv)==1) 5 else (1+m3) cc5[m5] } bb <- sapply(dd, fuzzy_classify5) bb table(bb) table(bb)/length(dd)
5. Evaluating the outcome
The fuzzy classifier described above provides a continuous transition in classification strength between classes that share a common boundary. This can be seen by observing that each membership function is piecewise linear and continuous, hence their pointwise maximum is also continuous.
The fact that the classification changes themselves are abrupt was not the concern (this is true of any categorical labeling, including the separating hyperplane solution). Rather, the concern is the difference in degree of membership separating two objects on opposite sides of a class boundary, e.g. 25 (classed LOW) and 26 (classed MEDIUM). With the fuzzy classifier, this inter-cluster membership gap shrinks with the distance between the points; in the above example it is less than 2%.
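A quick way to check this claim in R, reusing the membership functions defined above (the exact numbers depend on the quantiles computed from the data, so treat this as a sketch rather than a fixed result):

# Membership degrees on either side of the LOW/MEDIUM boundary
c(L(25), M(25))      # memberships for the value classed LOW
c(L(26), M(26))      # memberships for the value classed MEDIUM
abs(L(25) - M(26))   # gap in winning membership strength across the boundary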
This continuous degree of membership allows both categorical (rules-based) and proportional (level-based) responses to be used as part of an overall fuzzy control system, as sketched below.
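As a sketch of how both response styles might hang off the classifier (the alert wording and the use of the winning membership degree as a 0-1 "response level" are illustrative assumptions, not part of the code above):

# Categorical (rules-based) and proportional (level-based) responses
respond <- function(x) {
  label <- fuzzy_classify5(x)              # categorical decision drives the rule
  strength <- max(c(L(x), M(x), H(x)))     # winning membership degree, between 0 and 1
  if (label %in% c("HIGH", "VERY HIGH"))
    message("ALERT: ozone ", label, " (", round(x), " ppb)")
  strength                                 # proportional signal, e.g. to scale a mitigation action
}
respond(70)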
6. Using fuzzy classifiers - adaptive thresholds, unsupervised vs. supervised learning, and a general framework
In the application context of the daily text alert, fuzzy classification would provide the autonomous categorization of each alert/alarm. The 80-20 tenet would be reflected in the fact that approximately 20% of alerts would fall in the VERY HIGH or VERY LOW risk category.
Stepping back from the example context, the general framework for using a fuzzy classifier is as follows:
A. TRAIN the classifier on available history.
1. Where possible, CHOOSE simple thresholds. In the above, we used the 80-20 rule and therefore the p10/p90 levels.
2. DEFINE a fuzzy classifier on 5 classes - the outermost two classes preserve the 20% outliers, the inner three classes hold the 80% and provide finer grained classifications of these cases.
B. MONITOR for new inputs
3. CLASSIFY - classify incoming observations as done above using the max() fuzzifier.
4. RESPOND - decide the response with rules, which could include alerts, mitigation actions, or both.
C. UPDATE thresholds dynamically
5. RECOMPUTE thresholds - add new observations to past history and re-form the membership functions as each observation arrives (see the sketch below).
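A minimal sketch of step 5, assuming the thresholds live in the global variables defined earlier; the helper name update_thresholds and the example readings are illustrative, not part of the library in Appendix 1.

# Sketch: grow the history and recompute the quantile thresholds and membership breakpoints
update_thresholds <- function(history, new_obs) {
  history <- c(history, new_obs)                              # append the new readings
  qq   <<- quantile(clean(history), probs=c(0, 0.1, 0.5, 0.9, 1))
  x0   <<- qq[[2]]; x1   <<- qq[[4]]                          # refresh breakpoints used by L(), H()
  xmin <<- qq[[1]]; xmed <<- qq[[3]]; xmax <<- qq[[5]]        # ... and by M()
  history
}
dd_new <- update_thresholds(dd, c(42, 95))   # e.g. two new (hypothetical) daily readings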
This approach offers:
1. a methodology for continuous monitoring (watchdog), including the use of sensors to instrument a process
2. a way to dynamically adapt the sensitivities, i.e. for the watchdog "to learn"
3. a lower barrier to entry -- unsupervised monitoring can start from descriptive statistics adjusted for any application-specific tenets; if specific training data are available, the thresholds can be refined using the known subsets (see the sketch after this list)
4. a computationally efficient classifier
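As a sketch of point 3, suppose a handful of historical readings had already been labelled as known high-risk episodes (the labels below are hypothetical); the upper p-level can then be refined so that those labelled readings fall in the VERY HIGH class. This is one simple way to fold supervision into an otherwise unsupervised threshold choice.

# Sketch: refine the upper threshold from a hypothetical labelled subset of known-bad readings
known_bad <- c(115, 118, 122, 135, 168)            # hypothetical pre-classified high-risk values
p_hi <- mean(dd < min(known_bad))                  # fraction of history below the mildest labelled episode
qq <- quantile(dd, probs=c(0, 0.1, 0.5, p_hi, 1))  # replace the fixed p90 with the refined p-level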
One should, of course, compare a heuristic such as the above with an optimized classifier based on a more computationally intensive combinatorial optimization approach.
7. Possibilities
For the application programmer, the appeal of fuzzy classifiers lies in the opportunity to define a fuzzy control system while remaining in the categorical abstraction layer, e.g. if HIGH then CLOSE WINDOWS, etc. This keeps the quantitative basis of the underlying analog sensor (in this case the ozone levels) encapsulated. As a second example, a fuzzy temperature checker could use a thermometer and knowledge of normal body temperatures to classify readings into VERY LOW, LOW, NORMAL, ELEVATED, and HIGH FEVER levels. A user does not need to understand how the heat from our bodies warms mercury which rises (analog thermometer) or how a temperature sensor is digitally sampled. The layer driving the appropriate response is the fuzzy classification; the number itself is a second, supporting layer of information.
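A sketch of that second example, reusing the same max-membership idea; the breakpoints (35 and 38 degrees Celsius, over a 34-41 degree range) are illustrative placeholders chosen for the sketch, not clinical thresholds:

# Sketch: crude fuzzy body-temperature classifier in degrees Celsius
temp_classes <- c("VERY LOW", "LOW", "NORMAL", "ELEVATED", "HIGH FEVER")
classify_temp <- function(t, lo=35.0, hi=38.0, tmin=34.0, tmid=36.8, tmax=41.0) {
  Lt <- if (t <= lo) 1 else if (t >= hi) 0 else (hi - t)/(hi - lo)              # "low" membership
  Ht <- 1 - Lt                                                                  # "high" membership
  Mt <- if (t < tmid) (t - tmin)/(tmid - tmin) else (tmax - t)/(tmax - tmid)    # "normal" membership
  m3 <- which.max(c(Lt, Mt, Ht))
  m5 <- if (m3 == 1 && Lt == 1) 1 else if (m3 == 3 && Ht == 1) 5 else m3 + 1
  temp_classes[m5]
}
classify_temp(37.2)   # lands in NORMAL with these illustrative breakpoints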
Sensors become interesting for training-set purposes. If the sensors can sample data from a-priori known phenomena or pre-classified events, then the statistics around these known samples can enhance the detection/classification. For analog sensors, this opens up myriad possibilities:
- car sensors (brakes, auto gear changes)
- light sensors (outside conditions based on changing inside light -- sunny, cloudy, rainy, as well as time of day based on the quality of light coming into a given south-facing room)
Conclusion
The above statistical approach, using fuzzy classification and adaptive thresholds, is simple, intuitive, and of low computational cost, and can be effective when quick, immediately workable solutions are needed and better solutions can be added iteratively. The method can be used with any single-variable data series that is essentially stationary. For non-stationary time series and multi-dimensional variables/classifications, more extensive techniques are required.
Appendix 1: Source codes in R for the fuzzy classifier
The exploratory code mirroring the article above is here.
The final solution code is here, providing a library file and an example usage file.
Appendix 2: Using the fuzzy classifier
To use the fuzzy classifier, start by loading the fuzzy_classify.r library file:
source("fuzzy_classify.r") # check it is visible in R's search path or current working directory getwd()
The main function is fuzzycluster(d), which runs the fuzzy classifier on dataset d and creates the fuzzy data frame (fdf) holding 9 results needed for analysis, visualization, etc.
Follow the example user code to see how to use the fuzzy classifier library on the ozone data set.
source("fuzzy_classify.r") #### load FUNCTIONS for fuzzy classifier library(datasets) # load built-in datasets d <- airquality$Ozone # set time series ### Exploratory data visualization plot(d) lines(d) title("Ozone level at Roosevelt Island, NY\n mean ppb, 1300h-1500h, May 1-Sep 30, 1973") hist(d) boxplot(d) ### Run the fuzzy classifier fdf <- fuzzycluster(d) # form fuzzy data frame fdf with 9 columns
The fuzzy data frame fdf returned by fuzzycluster(d), defined in the library file, holds the following columns:
- d - data vector, should be numeric
- dd - cleaned data, with NA removed
- q - quantiles using p10, p90 instead of p25, p75
- bb - fuzzy classification label for each value in dd
- vhigh, high, med, low, vlow - sublists showing which values fall into each bucket
### Review results
plotq(fdf$dd, fdf$q)            # show the cleaned-up dataset with thresholds
plot_fmf3(fdf$dd)               # show fuzzy membership functions
fdf$bb                          # show fuzzy bucketization
table(fdf$bb)                   # summarize classification
table(fdf$bb)/length(fdf$dd)    # relative class frequencies
table(tail(fdf$bb, 50))/50      # last 50 days performance
summary(fdf$vhigh)              # summarize buckets
summary(fdf$high)
summary(fdf$med)
summary(fdf$low)
summary(fdf$vlow)
Appendix 3: References
Papers
- [Zadeh/1965] Fuzzy Sets, Lotfi Zadeh, Information & Control, Vol 8, Issue 3, June 1965, Pages 338-353
- [Zadeh/1972] Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, Lotfi Zadeh, 1972, Technical Report ERL M342, University of California, Berkeley
- [Zadeh/1977] Fuzzy Logic
Books
- [Zadeh/2012] Computing with Words, £80
- [Zadeh/Aliev/2019] Fuzzy Logic: Theory & Applications, Vol.1 & 2, £111
- [Cox/1994] Fuzzy Systems Handbook, Earl Cox, £5.70; 2nd ed (1998), £28
- [Heske/1996] Fuzzy Logic for Real World Design, Ted and Jill Heske, 1996. Accessible to a secondary-school audience.
- [Ross/1994] Fuzzy Logic with Engineering Applications, Tim Ross, £4.90, 4th ed £53
- [Chen/Pham/2001] Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems, Guanrong Chen and Trung Tat Pham, 2001, CRC Press, 328pp. (Sale price: £6.12, regular price: £60.) Comprehensive treatment of interval methods, i.e. the Calculus of Fuzzy Rules (CFR).
- [Li/Yen/1995], Fuzzy Sets and Fuzzy Decision-Making, by Hong Xing Li and Vincent C. Yen, 1995, CRC press, 270pp.
- [Nguyen/Walker/2023] A First Course in Fuzzy Logic, Hung T. Nguyen, Carol L. Walker, and Elbert A. Walker; 4th ed (£40), 3rd ed (£20), 2nd ed, 1st ed
- [Nguyen/Prasad/Walker/2002] A First Course in Fuzzy and Neural Control, £32
Related Technologies
- [Tibshirani/2009] Elements of Statistical Learning, 2nd Edition (free PDF download courtesy of the authors and publishers), Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009, Springer-Verlag. Review of the First Edition in SIAM (2002) by Michael Chernick.
Historical
- [Trillas/2011] Lotfi A. Zadeh: On the Man and his Work, E. Trillas, Scientia Iranica, Volume 18, Issue 3, June 2011, Pages 574-579
- Lotfi A. Zadeh biography (Wikipedia) | Obituary: Zadeh, Sep 19th, 2017, New York Times | Zadeh Papers: Zadeh Papers & Technical Reports at Berkeley, Journal articles at ACM
- [Kreinovich/2011] In the Beginning Was the Word, and the Word Was Fuzzy (online). Short expository essay reflecting on Kreinovich's introduction to fuzzy logic in the former USSR and its connection to the dialectic philosophy of Hegel.