In mathematics, truth is universal. In data, truth lies in the where clause of the query. As large organizations have grown to rely more heavily on their data for decision making, a common problem is not being able to agree on what the data is. As the volume and velocity of data grow, challenges emerge in answering questions with precision. A simple question like “what was the revenue yesterday” could become mired in details. Did your query account for transactions that haven’t been finalized? If I query again later, should I exclude orders that have been returned since the last query? What time zone should I use? The list goes on and on. In any large enough organization, you are also likely to find multiple copies of the same data. Independent systems might record the same information with slight variance. Sometimes systems import data from other systems, a process that can fall out of sync for several reasons. For any sufficiently large system, answering analytical questions with precision can become a non-trivial challenge. The business intelligence community aspires to provide a “single source of truth”: one canonical place where data consumers can go to get precise, reliable, and […]
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.
Fast radio bursts are an astrophysical phenomenon first observed in 2007. While many observations have been made, science has yet to explain the mechanism for these events. This has led some to ask: could it be a form of extraterrestrial communication? Probably not. Kyle asks Gerry Zhang, who works at the Berkeley SETI Research Center, about this possibility and, more importantly, about his applications of deep learning to detect fast radio bursts. Radio astronomy captures observations from space which can be converted to a waterfall chart or spectrogram. These data structures can be rendered visually, which makes them great candidates for applying deep learning to the task of detecting fast radio bursts. About the “Data Skeptic” Podcast
This episode explores the root concept of what it is to be Bayesian: describing knowledge of a system probabilistically, having an appropriate prior probability, knowing how to weigh new evidence, and following Bayes’s rule to compute the revised distribution. We present this concept in a few different contexts but primarily focus on how our bird Yoshi sends signals about her food preferences. Like many animals, Yoshi is a complex creature whose preferences cannot easily be summarized by a straightforward utility function the way they might be in a textbook reinforcement learning problem. Her preferences are sequential, conditional, and evolving. We may not always know what our bird is thinking, but we have some good indicators that give us clues.
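As a rough illustration of the update step described in this episode summary (a prior, new evidence, and Bayes’s rule producing a revised distribution), here is a minimal Python sketch. The two hypotheses, the "chirp" signal, and every probability in it are invented for illustration; none of them come from the episode.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(h | evidence) for each hypothesis h.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(evidence | h)
    """
    # Numerator of Bayes's rule for each hypothesis: P(evidence | h) * P(h).
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # Denominator: total probability of the evidence across all hypotheses.
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical prior: no idea yet which food the bird prefers.
prior = {"prefers_seeds": Fraction(1, 2), "prefers_fruit": Fraction(1, 2)}

# Hypothetical likelihoods: P(bird chirps when offered fruit | hypothesis).
chirp_given = {"prefers_seeds": Fraction(1, 5), "prefers_fruit": Fraction(4, 5)}

# The bird chirps once: revise the distribution.
posterior = bayes_update(prior, chirp_given)

# The bird chirps again: yesterday's posterior becomes today's prior.
second = bayes_update(posterior, chirp_given)

print(posterior)
print(second)
```

Under these assumed numbers, each chirp shifts more belief toward the fruit hypothesis. The sketch treats the hypotheses as fixed, so it does not capture the sequential, conditional, and evolving preferences the episode attributes to a real bird.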
This is our interview with Dorje Brody about his recent paper with David Meier, How to model fake news. This paper uses the tools of communication theory and a sub-topic called filtering theory to describe the mathematical basis for an information channel which can contain fake news. Thanks to our sponsor Gartner.
Without getting into definitions, we have an intuitive sense of what a “community” is. The Louvain Method for Community Detection is one of the best-known mathematical techniques designed to detect communities. This method requires typical graph data in which people are nodes and edges are their connections. It’s easy to imagine this data in the context of Facebook or LinkedIn but the technique applies just as well to any other dataset like cellular phone calling records or pen-pals. The Louvain Method provides a means of measuring the strength of any proposed community based on a concept known as Modularity. Modularity is a value in the range [-1/2, 1] that measures the density of links internal to a community against links external to the community. The quite palatable assumption here is that a genuine community would have members that are strongly interconnected. A community is not necessarily the same thing as a clique; it is not required that all community members know each other. Rather, we simply define a community as a graph structure where the nodes are more connected to each other than to people outside the community. It’s only natural that any person in a community has many connections to people […]
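To make the modularity measure described above concrete, here is a minimal Python sketch that scores a proposed partition of a small graph. It assumes the standard Newman-Girvan definition for an undirected, unweighted graph, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of its nodes. It only evaluates a given partition; it is not the Louvain optimization itself. The function name and the toy graph are invented for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def modularity(edges, community_of):
    """Score a partition of an undirected, unweighted graph.

    edges:        list of (u, v) pairs, each edge listed once
    community_of: dict mapping node -> community label
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed share of internal edges minus the share expected by chance.
    return sum(
        Fraction(internal[c], m) - Fraction(degree_sum[c], 2 * m) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Proposal A: each triangle is its own community.
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

# Proposal B: everyone in one big community.
one_group = {node: "all" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

For this toy graph the two-triangle partition scores higher than lumping everyone together, which matches the intuition that community members should be more connected to each other than to outsiders.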