Data Science

Short Articles on Data Science



#211 – Antibiotics effective on drug-resistant bacteria have been found using computer-aided drug discovery running machine learning/AI search algorithms on databases of pharmaceutical compounds
Feb 21st, 2020
Researchers at MIT trained a deep learning algorithm on 2,500 compounds that were effective at killing bacteria. They first turned the algorithm loose on 6,000 compounds under investigation and found “halicin”. They then expanded the search to 107 million compounds (~7%) of a massive database of 1.5 billion known pharma compounds. In a few hours, the algorithm had identified 23 promising candidates, of which two, in addition to halicin, have been found to be highly effective in lab trials on almost all known drug-resistant bacteria.

This is good news for medical science, and great news for the potency of deep machine learning.

[1] Discovering novel super antibiotics using machine learning



#191 – Statistical Testing: Why Signal to Noise (and Type II errors) matter.
Mon 19th August, 2019

The sample size for a statistical test is a function of three factors:

(1) Significance level, or the likelihood to accept for Type I errors (false alarms). Typically this is set to 5% or lower (95% confidence or higher).

(2) Power of the test to detect the effect, which fixes the likelihood to accept for Type II errors (missed detections). Typically this is set to 20% or lower (80% power or higher), though for important studies it will be 10% (90% power).

(3) Signal to Noise Ratio (SNR), which is the ratio of the difference in the means (signal) to the standard deviation of the measurements (noise).

Together, these three factors determine the recommended sample size.

Rules of thumb:
SNR=0.2 means N=500,
SNR=1.0 means N=20,
SNR>=2.0 means N<=6.
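
As a rough check on these rules of thumb, here is a minimal sketch of the calculation. It is not necessarily the method behind the chart in the post. It assumes a two-sided, two-sample comparison of means with equal variances, uses the normal approximation, and reports N per group. The function name sample_size_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(snr, alpha=0.05, power=0.80):
    """Approximate N per group for a two-sided, two-sample test of means.

    Assumption (not stated in the post): normal approximation,
    n = 2 * ((z_{1-alpha/2} + z_{power}) / SNR)^2, equal variances.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the Type I error rate
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / snr) ** 2)


# Compare 80% and 90% power for the SNR values in the rules of thumb.
for snr in (0.2, 1.0, 2.0):
    n80 = sample_size_per_group(snr, power=0.80)
    n90 = sample_size_per_group(snr, power=0.90)
    print(f"SNR={snr}: N={n80} at 80% power, N={n90} at 90% power")
```

Under these assumptions the 90% power figures land close to the rules of thumb above. Because N scales with 1/SNR², halving the signal quadruples the required sample. For very small N the normal approximation is optimistic, so a t-based calculation would give slightly larger numbers.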

Sample Size from Signal to Noise Ratio and Desired Power

Reference:
Power & Sample Size



#186 – Data looks better naked. Less is more… effective, attractive, impactive.
Sat 20th July, 2019
In Aug 2013, Joey Cherdarchuk, cofounder of Darkhorse Analytics, published the 3-part series called Data looks better naked.

Everyone working with tables, numbers, data, or visualizations will benefit from looking at the slides. The changes are easy to make but powerful. They embody some fundamental tenets: (1) maximize the data-ink ratio by reducing non-data ink, (2) simplify, and (3) let the data speak for itself, i.e. get rid of the clutter.

Love the quote at the end: “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away” – Antoine de Saint-Exupery

Data Looks better Naked - Less Terrible Tables

The series:
Part I: Improve your Bar Charts

Part II: Improve your Data Tables

Part III: Improve your Heat Maps

References:

  1. Data Looks Better Naked series
    DarkHorse Analytics
  2. Joey Cherdarchuk


#113 20170806 – Tribute to Usain Bolt – 100m stats


#101 20160917 – Data & Infographics – The Fallen in the World War


#100 20160917 – Data Visualization – The Fallen in the World War


#092 20160313 – Financial Signals in the US Macro Economy


#081 20151007 – Analytics in the AWS cloud


#080 20150923 – Distinguishing occupants in a room


#55b 20140927 – Big Data isn’t about Big. The focus on data should be about analytical capability rather than size: “‘Big Data’ is the subjective state a company finds itself in when its human and technical infrastructure can’t keep pace with its data needs.”


#049 20140819 – Presenting Data: Less terrible tables


#036 20140627 – “The Map is Not the Territory”


#033 20140624 – Who, really, is a Data Scientist?


#024 20140508 – Innovation I


#018 20140426 – The technical world tends to splinter


#013 20140424 – Probability is the heart of simulation
