Fuzzy Classifiers and Quantile Statistics for Continuous Data Monitoring with Adaptive Thresholds

Abstract: This brief note explores the use of fuzzy classifiers, with membership functions chosen by a statistical heuristic (quantile statistics), to monitor time-series metrics. The time series can arise from environmental measurements, industrial process control data, or sensor system outputs. We demonstrate an implementation in the R language on an example dataset (ozone levels in New York City).

Fuzzy classification into 5 classes using the p10 and p90 levels to achieve an 80-20 rule in the outermost classes and graded class membership in the inner three classes. A comparison with a crisp classifier using the same 80-20 rule is shown in the bottom panel of the figure.
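To make the idea in the caption concrete, here is a minimal sketch in Python (the article's own implementation is in R and is not reproduced here). It rests on several assumptions that the teaser does not confirm: the two outermost classes are crisp (everything below p10 or above p90), the inner three classes use piecewise-linear memberships peaking at p10, p50, and p90, and the crisp comparison uses three classes. The class labels and function names are hypothetical.

```python
from statistics import quantiles

# Hypothetical class labels; the source only says "5 classes".
CLASSES = ("very_low", "low", "mid", "high", "very_high")


def fit_thresholds(history):
    """Estimate (p10, p50, p90) from historical readings."""
    deciles = quantiles(history, n=10)  # 9 cut points: p10, p20, ..., p90
    return deciles[0], deciles[4], deciles[8]


def fuzzy_memberships(x, p10, p50, p90):
    """Return a dict of class memberships for reading x (values sum to 1).

    Assumption: outer classes are crisp; inner classes are graded with
    linear ramps between p10, p50 and p90.
    """
    m = dict.fromkeys(CLASSES, 0.0)
    if x < p10:
        m["very_low"] = 1.0
    elif x > p90:
        m["very_high"] = 1.0
    elif x <= p50:
        # Ramp from "low" (at p10) to "mid" (at p50); guard a degenerate span.
        t = (x - p10) / (p50 - p10) if p50 > p10 else 1.0
        m["low"], m["mid"] = 1.0 - t, t
    else:
        # Here p50 < x <= p90, so the denominator is strictly positive.
        t = (x - p50) / (p90 - p50)
        m["mid"], m["high"] = 1.0 - t, t
    return m


def crisp_class(x, p10, p90):
    """Crisp three-way comparison using the same p10/p90 cut-offs (assumed)."""
    if x < p10:
        return "low"
    if x > p90:
        return "high"
    return "normal"
```

With thresholds (10, 50, 90), a reading of 30 is half "low" and half "mid", whereas the crisp classifier simply reports "normal". Because the thresholds are re-estimated from recent history, they adapt as the data drifts.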

Data Science

Short Articles on Data Science
Everything (Desktop Search)

If you haven’t done so already, you may want to start by reading the Preface to the Computing Series: Software as a Force Multiplier, Sections 1-3.

“Everything” you need for ultra-fast desktop search

Everything™ is an ultra-fast desktop search utility that can search hundreds of thousands of files in milliseconds, using a pre-built index that is updated in real time.

“Everything” brings order to information growing at scale (documents, photographs, source code, spreadsheets, etc.), and tames the problem of proliferating folder trees.

Everything is a fast desktop search utility that can index 1 million files in less than 1 minute, and answer search queries in milliseconds.

We’ve all been in the scenario of searching for a document we know we prepared three, maybe four weeks prior… maybe it was longer… and now we can’t remember where we saved it… or in what format: was it a quickly written text file, a Word document, a few paragraphs within OneNote, a desktop Post-it note, or an email sent to ourselves from a phone?… After trying different Windows searches in various recently used folders, looking through Word, Excel, and PDF files, and trying to remember possible filenames, at some point we prepare mentally for the moment when we will give up the search and attempt to redo the missing work, salvaging as much of it as we can remember.

The general problem of wasted effort locating information we know we have occurs more often than we’d like to admit. With “Everything”, it can be better.

A Radar Tracking Approach to Data Mining

(Statistics and Data Mining II)

Automated decision problems are frequently encountered in statistical data processing and data mining. A heuristic filter or heuristic classifier typically has a limited set of input data from which to arrive at a set of conclusions and make a decision: REJECT, ACCEPT, or UNDETERMINED. In such cases, pre-processing the input data before applying the heuristic classifier can substantially enhance the performance of the decision system.

In this article, I’ll motivate the use of a radar-tracking algorithm to improve the performance of automated decision making and statistical estimation in data processing. I will illustrate using the website visitation statistics problem.
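As a rough illustration of "pre-process, then classify", the sketch below smooths a noisy series with an alpha-beta filter, one simple member of the radar-tracking family, before applying a three-way decision rule. The teaser does not say which tracking algorithm the article uses, so the choice of filter, the gains `alpha` and `beta`, and the two decision thresholds are all assumptions made for illustration only.

```python
def alpha_beta_track(samples, alpha=0.5, beta=0.1, dt=1.0):
    """Smooth a series with an alpha-beta filter (assumed tracker, see lead-in).

    Tracks a level and a rate of change; each new sample corrects both.
    """
    if not samples:
        return []
    level, rate = float(samples[0]), 0.0
    smoothed = [level]
    for z in samples[1:]:
        predicted = level + rate * dt          # predict from level and trend
        residual = z - predicted               # measurement minus prediction
        level = predicted + alpha * residual   # correct the level estimate
        rate = rate + (beta / dt) * residual   # correct the trend estimate
        smoothed.append(level)
    return smoothed


def decide(value, reject_below, accept_above):
    """Hypothetical heuristic classifier with a three-way outcome."""
    if value >= accept_above:
        return "ACCEPT"
    if value < reject_below:
        return "REJECT"
    return "UNDETERMINED"


def classify_series(samples, reject_below, accept_above):
    """Apply the decision rule to the smoothed series, not the raw samples."""
    return [decide(v, reject_below, accept_above)
            for v in alpha_beta_track(samples)]
```

The design point is that a single noisy spike is damped by the tracker before the classifier sees it, so an isolated outlier is less likely to flip the decision.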

Analysis of Visitor Statistics: Data Mining in-the-Small

(Statistics and Data Mining I)

For a variety of reasons, meaningful website visitation and visitor behavior statistics are an elusive data set to generate. This article introduces the visitor statistics problem, and describes seven challenges that must be overcome by statistical and data analysis techniques aiming for accurate estimates. Along the way, we’ll encounter the “Good News Cheap, Bad News Expensive” Paradox of Data Mining — or, why information is often used “as-is”.

This article is the first in a series on algorithms, statistics, and data analysis techniques (using free and open source tools), with the visitor statistics problem as a vehicle for illustration.

Stats: 1,066,417 article views since 2010 (March update)

Dear Readers:

Welcome to the conversation! We publish long-form pieces as well as a curated collection of spotlighted articles covering a broader range of topics. Notifications for new long-form articles are through the feeds (you can join below). We love hearing from you. Feel free to leave your thoughts in comments, or use the contact information to reach us!

Reading List…

Looking for the best long-form articles on this site? Below is a curated list by the main topics covered.

Mathematics-History & Philosophy

  1. What is Mathematics?
  2. Prehistoric Origins of Mathematics
  3. The Mathematics of Uruk & Susa (3500-3000 BCE)
  4. How Algebra Became Abstract: George Peacock & the Birth of Modern Algebra (England, 1830)
  5. The Rise of Mathematical Logic: from Laws of Thoughts to Foundations for Mathematics
  6. Mathematical Finance and The Rise of the Modern Financial Marketplace
  7. A Course in the Philosophy and Foundations of Mathematics
  8. The Development of Mathematics
  9. Catalysts in the Development of Mathematics
  10. Characteristics of Modern Mathematics

Electronic & Software Engineering

  1. Electronics in the Junior School - Gateway to Technology
  2. Coding for Pre-Schoolers - A Turtle Logo in Forth
  3. Experimenting with Microcontrollers - an Arduino development kit for under £12
  4. Making Sensors Talk for under £5, and Voice Controlled Hardware
  5. Computer Programming: A brief survey from the 1940s to the present
  6. Forth, Lisp, & Ruby: languages that make it easy to write your own domain specific language (DSL)
  7. Programming Microcontrollers: Low Power, Small Footprints & Fast Prototypes
  8. Building a 13-key pure analog electronic piano.
  9. TinyPhoto: Embedded Graphics and Low-Fat Computing
  10. Computing / Software Toolkits
  11. Assembly Language programming (Part 1 | Part 2 | Part 3)
  12. Bare Bones Programming: The C Language

Pure & Applied Mathematics

  1. Fuzzy Classifiers & Quantile Statistics Techniques in Continuous Data Monitoring
  2. LOGIC in a Nutshell: Theory & Applications (including a FORTH simulator and digital circuit design)
  3. Finite Summation of Integer Powers: (Part 1 | Part 2 | Part 3)
  4. The Mathematics of Duelling
  5. A Radar Tracking Approach to Data Mining
  6. Analysis of Visitor Statistics: Data Mining in-the-Small
  7. Why Zero Raised to the Zero Power IS One

Technology: Sensors & Intelligent Systems

  1. Knowledge Engineering & the Emerging Technologies of the Next Decade
  2. Sensors and Systems
  3. Unmanned Autonomous Systems & Networks of Sensors
  4. The Advance of Marine Micro-ROVs

Math Education

  1. Teaching Enriched Mathematics, Part 1
  2. Teaching Enriched Mathematics, Part 2: Levelling Student Success Factors
  3. A Course in the Philosophy and Foundations of Mathematics
  4. Logic, Proof, and Professional Communication: five reflections
  5. Good mathematical technique and the case for mathematical insight
