Abstract: This brief note explores the use of fuzzy classifiers, with membership functions chosen using a statistical heuristic (quantile statistics), to monitor time-series metrics. The time series can arise from environmental measurements, industrial process control data, or sensor system outputs. We demonstrate an implementation in the R language on an example dataset (ozone levels in New York City).
Figure: Fuzzy classification into 5 classes, using the p10 and p90 levels to achieve an 80-20 rule in the outermost classes and graded class membership in the inner three classes. A comparison with a crisp classifier using the same 80-20 rule is shown in the bottom panel of the figure.
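To make the idea concrete, here is a minimal sketch of a quantile-based fuzzy classifier. It is written in Python for illustration and is not the article's R code. Several things are assumptions made for this sketch rather than details taken from the article: the triangular membership shapes, the midpoint between p10 and p90 as the peak of the middle class, the class names, and the helper names `quantile_thresholds`, `fuzzy_memberships`, and `crisp_class`.

```python
from statistics import quantiles

# Hypothetical class labels (not taken from the article).
CLASSES = ("very_low", "low", "medium", "high", "very_high")


def quantile_thresholds(values):
    """Return the (p10, p90) levels of a sample."""
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    return deciles[0], deciles[-1]


def fuzzy_memberships(x, p10, p90):
    """Return a dict of membership degrees in the 5 classes for reading x.

    Assumed design: readings outside [p10, p90] belong fully to an
    outermost class (about 20% of the data in total). Inside the band,
    membership is graded linearly between 'low' (peak at p10),
    'medium' (peak at the midpoint) and 'high' (peak at p90).
    """
    if p90 <= p10:
        raise ValueError("p90 must be greater than p10")
    mid = (p10 + p90) / 2.0
    m = dict.fromkeys(CLASSES, 0.0)
    if x < p10:
        m["very_low"] = 1.0
    elif x > p90:
        m["very_high"] = 1.0
    elif x <= mid:
        t = (x - p10) / (mid - p10)  # 0 at p10, 1 at the midpoint
        m["low"], m["medium"] = 1.0 - t, t
    else:
        t = (x - mid) / (p90 - mid)  # 0 at the midpoint, 1 at p90
        m["medium"], m["high"] = 1.0 - t, t
    return m


def crisp_class(x, p10, p90):
    """Crisp counterpart (assumed 3 classes): hard cut-offs at p10 and p90."""
    if x < p10:
        return "low"
    if x > p90:
        return "high"
    return "normal"


if __name__ == "__main__":
    readings = list(range(101))  # synthetic stand-in for a metric series
    p10, p90 = quantile_thresholds(readings)
    for x in (5, 30, 50, 70, 95):
        print(x, crisp_class(x, p10, p90), fuzzy_memberships(x, p10, p90))
```

With this design the memberships of any reading sum to 1, so a reading a quarter of the way into the band is reported as partly "low" and partly "medium" rather than simply "normal", which is the extra information the fuzzy view adds over the crisp one.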
Short Articles on Data Science
If you haven’t done so already, you may want to start by reading the Preface to the Computing Series: Software as a Force Multiplier, Sections 1-3.
“Everything” you need for ultra-fast desktop search
Everything™ is an ultra-fast desktop search utility that can scan through hundreds of thousands of files in milliseconds using a pre-built index that is updated in real time.
“Everything” brings order to information growing at scale (documents, photographs, source code, spreadsheets, etc.), and tames the problem of proliferating folder trees.
Everything is a fast desktop search utility that can index 1 million files in less than 1 minute and return search results in milliseconds.
We’ve all been there: searching for a document you know you prepared three, maybe four weeks ago (maybe it was longer), and now you can’t remember where you saved it or in what format. Was it a quickly written text file, a Word document, a few paragraphs in OneNote, a desktop post-it note, or an email you sent yourself from your phone? After trying different Windows searches in various recently used folders, looking through Word, Excel, and PDF files, and trying to recall possible filenames, at some point you prepare mentally for the moment when you will give up the search and redo the missing work, salvaging as much of it as you can remember.
The general problem of wasted effort in locating information we know we have occurs more often than we’d like to admit. With “Everything”, it can be better.
By Assad Ebrahim, on November 15th, 2010 (11,800 views)
Topic: Maths--Data Science
(Statistics and Data Mining II)
Automated decision problems are frequently encountered in statistical data processing and data mining. An heuristic filter or heuristic classifier typically has a limited set of input data from which to arrive at a set of conclusions and make a decision: REJECT, ACCEPT, or UNDETERMINED. In such cases, pre-processing the input data before applying the heuristic classifier can substantially enhance the performance of the decision system.
In this article, I’ll motivate the use of a radar-tracking algorithm to improve the performance of automated decision making and statistical estimation in data processing. I will illustrate using the website visitation statistics problem.
By Assad Ebrahim, on September 3rd, 2010 (16,587 views)
Topic: Maths--Data Science
(Statistics and Data Mining I)
For a variety of reasons, meaningful website visitation and visitor behavior statistics are an elusive data set to generate. This article introduces the visitor statistics problem, and describes seven challenges that must be overcome by statistical and data analysis techniques aiming for accurate estimates. Along the way, we’ll encounter the “Good News Cheap, Bad News Expensive” Paradox of Data Mining — or, why information is often used “as-is”.
This article is the first in a series on algorithms, statistics and data analysis techniques, built on free and open source tools, with the visitor statistics problem as a vehicle for illustration.
Stats: 1,089,379 article views since 2010 (Aug '24 update)
Dear Readers: Welcome to the conversation! We publish long-form pieces as well as a curated collection of spotlighted articles covering a broader range of topics. Notifications for new long-form articles are through the feeds (you can join below). We love hearing from you. Feel free to leave your thoughts in comments, or use the contact information to reach us!