Abstract

This brief note explores the use of fuzzy classifiers, with membership functions chosen using a statistical heuristic (quantile statistics), to monitor time-series metrics. The time series can arise from environmental measurements, industrial process control data, or sensor system outputs. We demonstrate an implementation in the R language on an example dataset (ozone levels in New York City). Skip straight to the coded solution in Appendix 1, or read on for the discussion.
1. Case study: providing automatic classification of urban air quality
Time-series approach
Consider the problem of monitoring ozone levels in an urban environment and issuing a daily text message alert to residents indicating meaningful risk levels. What should these levels be, and how should the thresholds be set? If we imagine a scenario where clinical studies on human response do not yet exist on which to base the thresholds, an alternative approach is to provide a statistical classification of ozone levels relative to previous history, i.e. classed as VERY LOW, LOW, NORMAL, HIGH, and VERY HIGH. Let’s see how we might do this using an example dataset from a New York City study.
Looking at the NYC ozone study
The NYC ozone dataset from the New York State Department of Conservation is built into R and measures mean ozone levels (alongside solar radiation, wind, and temperature readings) over 153 days in the summer of 1973 (May 1st to Sep 30th, 1300-1500h, Roosevelt Island station). Let’s load it up in R and inspect it.
### Load airquality dataset
library(datasets)
airquality
help(airquality)        # show dataset details
airquality$Ozone        # parts per billion

### Data visualization
plot(airquality$Ozone)
lines(airquality$Ozone)
title("Ozone level at Roosevelt Island, NYC\n mean ppb, 1300h-1500h, May 1-Sep 30, 1973")
hist(airquality$Ozone)
To summarize the observations, use boxplot() to show graphically the non-parametric statistics, i.e. the 5-number summary at min (p0), p25, p50, p75, and max (p100) levels; quantile() shows the numerical values. Note that the box between p25 and p75 captures the middle 50% of the distribution.
boxplot(airquality$Ozone)
clean <- function(v) { v[!is.na(v)] }   # removes NAs from a vector, e.g. clean(d)
dd <- clean(airquality$Ozone)
quantile(dd)                            # 5-number summary at min, Q1, Q2, Q3, max levels
2. Choosing a statistical classifier
Classifiers are heuristics matching input data to categorical labels, with the "goodness of fit" depending on an (often subjective) separate estimate of the correct answer.
In what follows, we use the non-parametric statistics heuristic with p-levels associated with quantiles as a way to capture a fixed percentage of the distribution. For time-series data that are, or can be assumed to be, essentially stationary, the thresholds can be selected once from history and then adjusted incrementally; for highly varying, seasonal, or otherwise non-stationary time series, the analysis becomes more complicated, and the central ideas below require adaptation.
Let us take as a tenet the use of the 80-20 rule for flagging HIGH and LOW outliers, i.e. 1 in 5 observations should be flagged as abnormal, and the remaining 4 considered within usual levels of variation. This implies setting the quantiles at the p10 and p90 levels. In the code below, quantile() is called with a custom probs vector containing the p10 and p90 levels, and the resulting series is plotted against these threshold lines.
qq <- quantile(dd, probs=c(0, 0.1, 0.5, 0.9, 1))   # 5-number summary at custom p-levels
qq
plot(airquality$Ozone)
curve(0*x+qq[[2]], add=TRUE)   # add horizontal lines at the p10 and p90 levels
curve(0*x+qq[[4]], add=TRUE)
3. The difference between crisp (conventional) and fuzzy classifiers
If this is all we do, then it remains only to formalize the classification around the pre-selected p10 and p90 thresholds. The code is given below, and it should be no surprise that the results show 80% of the values classed as NORMAL, with 10% in each of the two classes LOW and HIGH. The size of the buckets was pre-determined by the tenet of an 80-20 classification outcome. The only choice was what this meant in terms of thresholds, and this was provided by the quantile() function at the respective p-levels.
cc3 <- c("LOW", "NORMAL", "HIGH") # Crisp classifier classify <- function(x) { if (x < qq[[2]]) cc3[1] else if (x > qq[[4]]) cc3[3] else cc3[2] } b <- sapply(dd, classify) b table(b) table(b)/length(dd)
But you should be protesting at the use of such a classifier. Within the NORMAL bucket, containing 80% of the distribution, there is a wide range of values, which leads to a problem that bedevils all attempts at crisp classification into a manageable number of levels: values at the top and bottom ends of the bucket (for example 85 and 18 ppb) are both classed as NORMAL despite being closer to points on the other side of the class boundaries (for example 89 and 12 ppb) that are classed as HIGH and LOW respectively.
Crisp classifiers get around the above limitations by using more complex mathematical models
The discussion so far has taken the simplest possible approach to classification, and the critique above is deserved. Of course, no one doing statistical classification would use the threshold logic described above without accounting for the distance between points. One approach is to find the separating hyperplane that minimizes the average distance between pairs of points on either side of that plane. Another, similar, approach is based on choosing cluster centroids, again minimizing distances between pairs of points within each cluster. Both of these optimization algorithms get around the critique above and give good results, but they are mathematically much more complex (see the classic reference [Tibshirani/2009], Elements of Statistical Learning, free PDF download courtesy of the authors and publishers).
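To make the comparison concrete, here is a minimal sketch of the centroid-based alternative using R's built-in kmeans() on the same cleaned ozone vector dd; the choice of three clusters, the seed, and the nstart value are illustrative assumptions, not part of the method developed below.

# Sketch: centroid-based clustering of the ozone values into 3 groups
set.seed(42)                       # make the cluster assignment reproducible
km <- kmeans(dd, centers=3, nstart=25)
sort(km$centers)                   # cluster centres, loosely LOW / NORMAL / HIGH
table(km$cluster)                  # cluster sizes, for comparison with table(b) above

Note that the cluster sizes are driven by the geometry of the data rather than by a pre-set 80-20 tenet, which is one reason the quantile heuristic remains attractive here.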
4. The fuzzy classifier: making a naive classification approach work with low computational impact
The appeal of fuzzy classification is that it preserves the simple approach and resolves the discontinuity problem in a computationally efficient way, using a simple min-max calculation at each point rather than solving a combinatorial optimization problem ranging over all pairs of points.
The key to fuzzy classification is a definition of set membership that includes degree of membership as a mechanism to allow "fuzzy boundaries". A single element is now a member of all classes, but to different degrees. The final classification has a strength, which matches more closely our intuition that classifications are inherently fuzzy, resisting crisp boundaries.
Let's look at how this is implemented.
# Defining linear membership functions L(x), M(x), H(x) (low, medium, high)
# based on the p10 and p90 quantile levels
x0 <- qq[[2]]        # p10
x1 <- qq[[4]]        # p90
Ly0 <- 1.0; Ly1 <- 0.0
Hy0 <- 0.0; Hy1 <- 1.0
xmin <- qq[[1]]
xmed <- qq[[3]]
xmax <- qq[[5]]

L <- function(x) {   # Low: 1 up to p10, falling linearly to 0 at p90
  if (x < x0) Ly0 else if (x > x1) Ly1 else Ly0 + (x-x0)*(Ly1-Ly0)/(x1-x0)
}

H <- function(x) {   # High: 0 up to p10, rising linearly to 1 at p90
  if (x < x0) Hy0 else if (x > x1) Hy1 else Hy0 + (x-x0)*(Hy1-Hy0)/(x1-x0)
}

M <- function(x) {   # Medium: triangular, 0 at min, peaking at 1 at the median, 0 at max
  if (x < xmed) Hy0 + (x-xmin)*(Hy1-Hy0)/(xmed-xmin) else Hy0 + (x-xmax)*(Ly1-Ly0)/(xmax-xmed)
}

# show fuzzy membership functions
x <- 1:max(dd)
plot(c(1,max(dd)), c(0,1))
lines(x, sapply(x,L))
lines(x, sapply(x,H))
lines(x, sapply(x,M))
Let's talk through what is going on above. The fuzzy classifier above uses membership functions L(x), M(x), H(x) (low, medium, high) to generate overlapping degrees of membership for each class. The membership functions are defined over the same domain, allowing classifications that show graded membership.
What happens next is the key bit: the classification of a given point x is made by applying a fuzzification operation across the candidate fuzzy sets. In this case the fuzzification operator is the simple mathematical maximum, max(). Working an example: with the thresholds computed above, the value 70 is classified into the "HIGH" bucket, because H(70) returns the highest (max) value of the three candidates L(70), M(70), H(70).
# example:
cc3 <- c("LOW", "NORMAL", "HIGH")
fv <- c(L(70), M(70), H(70))   # fuzzy vector
fv
max(fv)
which.max(fv)
cc3[which.max(fv)]
We use this notion to define the fuzzy classifier itself, and apply it to the dataset:
# Fuzzy classifier into 3 classes
cc3 <- c("LOW", "NORMAL", "HIGH")
fuzzy_classify3 <- function(x) {
  fv <- c(L(x), M(x), H(x))
  # print(fv)        # debug
  # print(max(fv))   # debug
  m3 <- which.max(fv)
  cc3[m3]
}
bb <- sapply(dd, fuzzy_classify3)
bb
table(bb)
table(bb)/length(dd)
Notice that the middle (NORMAL) cluster now occurs only about 35% of the time (i.e. 44% of the 80% previously classed as NORMAL), which is a problem: the three-class fuzzy classifier no longer preserves the 80-20 rule that led us to choose the p10 and p90 thresholds in the first place.
The fix is to use 5 clusters, where the outermost two (VERY HIGH and VERY LOW) preserve the 80-20 rule based on the p90 and p10 thresholds, in line with the crisp classifier. Fuzziness is limited to the middle of the distribution, now split into three classes: medium-high and medium-low (labelled HIGH and LOW), and the reduced MEDIUM class holding the roughly 35% of instances.
cc5 <- c("VERY LOW", "LOW", "MEDIUM", "HIGH", "VERY HIGH") fuzzy_classify5 <- function(x) { fv <- c(L(x), M(x), H(x)) # print(fv) #debug # print(max(fv)) #debug m3 <- which.max(fv) m5 <- if (m3==1 && max(fv)==1) 1 else if (m3==3 && max(fv)==1) 5 else (1+m3) cc5[m5] } bb <- sapply(dd, fuzzy_classify5) bb table(bb) table(bb)/length(dd)
5. Evaluating the outcome
The fuzzy classifier described above provides a continuous transition in classification strength between classes that share a common boundary. This can be seen by observing that each membership function is piecewise linear and continuous, hence their pointwise maximum is also continuous.
The fact that the classification changes themselves are abrupt was not the concern (this is true of any categorical labeling, including the separating hyperplane solution). Rather, the concern is the difference in degree of membership separating two objects on opposite sides of a class boundary, e.g. 25 (classed LOW) and 26 (classed MEDIUM). With the fuzzy classifier, this inter-cluster membership gap shrinks with the distance between the points; in the above example it is less than 2%.
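A quick way to check this claim in R, reusing the membership functions defined above (the exact numbers depend on the quantiles computed from the data, so treat this as a sketch rather than a fixed result):

# Membership degrees on either side of the LOW/MEDIUM boundary
c(L(25), M(25))      # memberships for the value classed LOW
c(L(26), M(26))      # memberships for the value classed MEDIUM
abs(L(25) - M(26))   # gap in winning membership strength across the boundary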
This continuous degree of membership allows both categorical (rules-based) and proportional (level-based) responses to be used as part of an overall fuzzy control system, as sketched below.
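As a sketch of how both response styles might hang off the classifier (the alert wording and the use of the winning membership degree as a 0-1 "response level" are illustrative assumptions, not part of the code above):

# Categorical (rules-based) and proportional (level-based) responses
respond <- function(x) {
  label <- fuzzy_classify5(x)              # categorical decision drives the rule
  strength <- max(c(L(x), M(x), H(x)))     # winning membership degree, between 0 and 1
  if (label %in% c("HIGH", "VERY HIGH"))
    message("ALERT: ozone ", label, " (", round(x), " ppb)")
  strength                                 # proportional signal, e.g. to scale a mitigation action
}
respond(70)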
6. Using fuzzy classifiers - adaptive thresholds, unsupervised vs. supervised learning, and a general framework
In the application context of the daily text alert, fuzzy classification would provide the autonomous categorization of each alert/alarm. The 80-20 tenet would be reflected in the fact that approximately 20% of alerts would fall in the VERY HIGH or VERY LOW risk category.
Stepping back from the example context, the general framework for using a fuzzy classifier is as follows:
A. TRAIN the classifier on available history.
1. Where possible, CHOOSE simple thresholds. In the above, we used the 80-20 rule and therefore the p10/p90 levels.
2. DEFINE a fuzzy classifier on 5 classes - the outermost two classes preserve the 20% outliers, the inner three classes hold the 80% and provide finer grained classifications of these cases.
B. MONITOR for new inputs
3. CLASSIFY - classify incoming observations as done above using the max() fuzzifier.
4. RESPOND - decide the response with rules, which could include alerts, mitigation actions, or both.
C. UPDATE thresholds dynamically
5. RECOMPUTE thresholds - add new observations to past history and re-form the membership functions as each observation arrives (see the sketch below).
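A minimal sketch of step 5, assuming the thresholds live in the global variables defined earlier; the helper name update_thresholds and the example readings are illustrative, not part of the library in Appendix 1.

# Sketch: grow the history and recompute the quantile thresholds and membership breakpoints
update_thresholds <- function(history, new_obs) {
  history <- c(history, new_obs)                              # append the new readings
  qq   <<- quantile(clean(history), probs=c(0, 0.1, 0.5, 0.9, 1))
  x0   <<- qq[[2]]; x1   <<- qq[[4]]                          # refresh breakpoints used by L(), H()
  xmin <<- qq[[1]]; xmed <<- qq[[3]]; xmax <<- qq[[5]]        # ... and by M()
  history
}
dd_new <- update_thresholds(dd, c(42, 95))   # e.g. two new (hypothetical) daily readings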
This approach offers:
1. a methodology for continuous monitoring (watchdog), including the use of sensors to instrument a process
2. a way to dynamically adapt the sensitivities, i.e. for the watchdog "to learn"
3. a lower barrier to entry -- unsupervised monitoring can start from descriptive statistics adjusted for any application-specific tenets; if specific training data are available, the thresholds can be refined using the known subsets (see the sketch after this list)
4. a computationally efficient classifier
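As a sketch of point 3, suppose a handful of historical readings had already been labelled as known high-risk episodes (the labels below are hypothetical); the upper p-level can then be refined so that those labelled readings fall in the VERY HIGH class. This is one simple way to fold supervision into an otherwise unsupervised threshold choice.

# Sketch: refine the upper threshold from a hypothetical labelled subset of known-bad readings
known_bad <- c(115, 118, 122, 135, 168)            # hypothetical pre-classified high-risk values
p_hi <- mean(dd < min(known_bad))                  # fraction of history below the mildest labelled episode
qq <- quantile(dd, probs=c(0, 0.1, 0.5, p_hi, 1))  # replace the fixed p90 with the refined p-level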
One should, of course, compare a heuristic such as the above with an optimized classifier based on a more computationally intensive combinatorial optimization approach.
7. Possibilities
For the application programmer, the appeal of fuzzy classifiers lies in the opportunity to define a fuzzy control system while remaining in the categorical abstraction layer, e.g. if HIGH then CLOSE WINDOWS, etc. This keeps the quantitative basis of the underlying analog sensor (in this case the ozone levels) encapsulated. As a second example, a fuzzy temperature checker could use a thermometer and knowledge of normal body temperatures to classify readings into VERY LOW, LOW, NORMAL, ELEVATED, and HIGH FEVER levels. A user does not need to understand how the heat from our bodies warms mercury which rises (analog thermometer) or how a temperature sensor is digitally sampled. The layer driving the appropriate response is the fuzzy classification; the number itself is a second, supporting layer of information.
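A sketch of that second example, reusing the same max-membership idea; the breakpoints (35 and 38 degrees Celsius, over a 34-41 degree range) are illustrative placeholders chosen for the sketch, not clinical thresholds:

# Sketch: crude fuzzy body-temperature classifier in degrees Celsius
temp_classes <- c("VERY LOW", "LOW", "NORMAL", "ELEVATED", "HIGH FEVER")
classify_temp <- function(t, lo=35.0, hi=38.0, tmin=34.0, tmid=36.8, tmax=41.0) {
  Lt <- if (t <= lo) 1 else if (t >= hi) 0 else (hi - t)/(hi - lo)              # "low" membership
  Ht <- 1 - Lt                                                                  # "high" membership
  Mt <- if (t < tmid) (t - tmin)/(tmid - tmin) else (tmax - t)/(tmax - tmid)    # "normal" membership
  m3 <- which.max(c(Lt, Mt, Ht))
  m5 <- if (m3 == 1 && Lt == 1) 1 else if (m3 == 3 && Ht == 1) 5 else m3 + 1
  temp_classes[m5]
}
classify_temp(37.2)   # lands in NORMAL with these illustrative breakpoints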
Sensors become interesting for training-set purposes. If the sensors can sample data from a-priori known phenomena or pre-classified events, then the statistics around these known samples can enhance the detection/classification. For analog sensors, this opens up myriad possibilities:
- car sensors (brakes, auto gear changes)
- light sensors (outside conditions based on changing inside light -- sunny, cloudy, rainy, as well as time of day based on the quality of light coming into a given south-facing room)
Conclusion
The above statistical approach, using fuzzy classification and adaptive thresholds, is simple, intuitive, and of low computational cost, and can be effective when quick, immediately workable solutions are needed and better solutions can be added iteratively. The method can be used with any single-variable data series that is essentially stationary. For non-stationary time series and multi-dimensional variables/classifications, more extensive techniques are required.
Appendix 1: Source codes in R for the fuzzy classifier
The exploratory code mirroring the article above is here.
The final solution code is here, providing a library file and an example usage file.
Appendix 2: Using the fuzzy classifier
To use the fuzzy classifier, start by loading the fuzzy_classify.r library file:
source("fuzzy_classify.r") # check it is visible in R's search path or current working directory getwd()
The main function is fuzzycluster(d), which runs the fuzzy classifier on dataset d and creates the fuzzy data frame (fdf) holding 9 results needed for analysis, visualization, etc.
Follow the example user code to see how to use the fuzzy classifier library on the ozone data set.
source("fuzzy_classify.r") #### load FUNCTIONS for fuzzy classifier library(datasets) # load built-in datasets d <- airquality$Ozone # set time series ### Exploratory data visualization plot(d) lines(d) title("Ozone level at Roosevelt Island, NY\n mean ppb, 1300h-1500h, May 1-Sep 30, 1973") hist(d) boxplot(d) ### Run the fuzzy classifier fdf <- fuzzycluster(d) # form fuzzy data frame fdf with 9 columns
The fuzzy data frame fdf returned by fuzzycluster(d), defined in the library file, holds the following columns:
- d - data vector, should be numeric
- dd - cleaned data, with NA removed
- q - quantiles using p10, p90 instead of p25, p75
- bb - fuzzy classification label for each value in dd
- vhigh, high, med, low, vlow - sublists showing which values fall into each bucket
### Review results
plotq(fdf$dd, fdf$q)            # show the cleaned-up dataset with thresholds
plot_fmf3(fdf$dd)               # show fuzzy membership functions
fdf$bb                          # show fuzzy bucketization
table(fdf$bb)                   # summarize classification
table(fdf$bb)/length(fdf$dd)    # relative class frequencies
table(tail(fdf$bb, 50))/50      # last 50 days performance
summary(fdf$vhigh)              # summarize buckets
summary(fdf$high)
summary(fdf$med)
summary(fdf$low)
summary(fdf$vlow)
Appendix 3: References
Papers
- [Zadeh/1965] Fuzzy Sets, Lotfi Zadeh, Information & Control, Vol 8, Issue 3, June 1965, Pages 338-353
- [Zadeh/1972] Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, Lotfi Zadeh, 1972, Technical Report ERL M342, University of California, Berkeley
- [Zadeh/1977] Fuzzy Logic
Books
- [Zadeh/2012] Computing with Words, £80
- [Zadeh/Aliev/2019] Fuzzy Logic: Theory & Applications, Vol.1 & 2, £111
- [Cox/1994] Fuzzy Systems Handbook, Earl Cox, £5.70; 2nd ed (1998), £28
- [Heske/1996] Fuzzy Logic for Real World Design, Ted and Jill Heske, 1996. Accessible to a secondary-school audience.
- [Ross/1994] Fuzzy Logic with Engineering Applications, Tim Ross, £4.90, 4th ed £53
- [Chen/Pham/2001] Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems, Guanrong Chen and Trung Tat Pham, 2001, CRC Press, 328pp. (Sale price: £6.12, regular price: £60.) Comprehensive treatment of interval methods, i.e. the Calculus of Fuzzy Rules (CFR).
- [Li/Yen/1995], Fuzzy Sets and Fuzzy Decision-Making, by Hong Xing Li and Vincent C. Yen, 1995, CRC press, 270pp.
- [Nguyen/Walker/2023] A First Course in Fuzzy Logic, Hung T. Nguyen, Carol L. Walker, and Elbert A. Walker; 4th ed (£40), 3rd ed (£20), 2nd ed, 1st ed
- [Nguyen/Prasad/Walker/2002] A First Course in Fuzzy and Neural Control, £32
Related Technologies
- [Tibshirani/2009] Elements of Statistical Learning, 2nd Edition (free PDF download courtesy of the authors and publishers), Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009, Springer-Verlag. Review of the First Edition in SIAM (2002) by Michael Chernick.
Historical
- [Trillas/2011] Lotfi A. Zadeh: On the Man and his Work, E. Trillas, Scientia Iranica, Volume 18, Issue 3, June 2011, Pages 574-579
- Lotfi A. Zadeh biography (Wikipedia) | Obituary: Zadeh, Sep 19th, 2017, New York Times | Zadeh Papers: Zadeh Papers & Technical Reports at Berkeley, Journal articles at ACM
- [Kreinovich/2011] In the Beginning Was the Word, and the Word Was Fuzzy (online). Short expository essay reflecting on Kreinovich's introduction to fuzzy logic in the former USSR and its connection to the dialectic philosophy of Hegel.