Rho: High Performance R
[MP4] [0:20:14] [2016/07/13]The Rho project (formerly known as CXXR) is working on transforming the current R interpreter into a high performance virtual machine for R. Using modern software engineering techniques and the…
New Paradigms In Shiny App Development: Designer + Data Scientist Pairing
[MP4] [0:14:04] [2016/07/13]With the help of Shiny, advanced analytics practitioners have been liberated from professional application development constraints: long-turn development cycles, difficult interactions with IT groups…
RosettaHUB-Sheets, a programmable, collaborative web-based spreadsheet for R, Python and Spark
[MP4] [0:21:12] [2016/07/13]RosettaHUB-Sheets combine the flexibility of the bi-dimensional data representation model of classic spreadsheets with the power of R, Python, Spark and SQL. RosettaHUB-Sheets are web based, they can…
Zero-overhead integration of R, JS, Ruby and C/C++
[MP4] [0:19:54] [2016/07/13]R is very powerful and flexible, but certain tasks are best solved by using R in combination with other programming languages. GNU R includes APIs to talk to some languages, e.g., Fortran and…
Group and sparse group partial least squares approaches applied in a genomics context
[MP4] [0:19:27] [2016/07/13]In this talk, I will concentrate on a class of multivariate statistical methods called Partial Least Squares (PLS). They are used for analysing the association between two blocks of 'omics' data,…
How can I get everyone else in my organisation to love R as much as I do?
[MP4] [0:15:40] [2016/07/13]Learning R is dangerous. It entices us in by presenting an incredibly powerful tool to solve our particular problem; for free! And as we learn how to do that, we uncover more things that make our…
Wrapping Your R tools to Analyze National-Scale Cancer Genomics in the Cloud
[MP4] [0:17:10] [2016/07/13]The Cancer Genomics Cloud (CGC), built by Seven Bridges and funded by the National Cancer Institute, hosts The Cancer Genome Atlas (TCGA), which is one of the world's largest cancer genomics data…
R at Google with Craig Citro
[MP4] [0:18:24] [2016/07/12]I'll discuss R at Google - the explosive growth of R, interfaces between R and other parts of the Google computational infrastructure, documentation and support, Google contributions to the external…
Scalable Machine Learning in R with H2O
[MP4] [0:22:10] [2016/07/12]The focus of this talk is scalable machine learning using the H2O R packages. H2O is an open source, distributed machine learning platform designed for big data, with the added benefit that it's easy…
R for Big Data and Applications: Using R at Oracle
[MP4] [0:15:11] [2016/07/12]R continues to take the world by storm. At Oracle, we see more and more enterprises using R for mission critical applications, where scalability, performance, and deployment concerns have been…
Reusable R for automation, small area estimation and legacy systems
[MP4] [0:18:06] [2016/07/12]Running a complex model once is easy, just pull up your statistical program of choice, plug in the data, the model and off you go. The problem comes when you then find yourself trying to scale to…
High performance climate downscaling in R
[MP4] [0:12:29] [2016/07/12]Global Climate Models (GCMs) can be used to assess the impacts of future climate change on particular regions of interest, municipalities or pieces of infrastructure. However, the coarse spatial scale…
Helping R Stay in the Lead by Deploying Models with PFA
[MP4] [0:18:58] [2016/07/12]We introduce a new language for deploying analytic models into products, services and operational systems called the Portable Format for Analytics (PFA). PFA is an example of what is sometimes called…
Introducing the permutations package
[MP4] [0:15:45] [2016/07/12]A 'permutation' is a bijection from a finite set to itself. Permutations are important and interesting objects in a range of mathematical contexts including group theory, recreational mathematics, and…
Profvis: Profiling tools for faster R code
[MP4] [0:17:57] [2016/07/12]As programming languages go, R has a bit of a reputation for being slow. This reputation is mostly undeserved, and it hinges on the fact that R's copy-on-modify semantics make its performance…
How to use the archivist package to boost reproducibility of your research
[MP4] [0:18:22] [2016/07/12]The R package archivist allows you to share and reproduce R objects - artifacts with other researchers, either through a knitr script, embedded hooks in figure/table captions, shared folder or…
viztrackr: Tracking and discovering plots via automatic semantic annotations
[MP4] [0:17:42] [2016/07/12]Data analyses often produce many different data visualizations. Keeping track of these plots is crucial for both correctness and reproducibility of analytic results. Analysts typically resort to…
RCloud - Collaborative Environment for Visualization and Big Data Analytics
[MP4] [0:51:38] [2016/07/12]Analyzing Big Data in real life poses challenges with respect to performance, methodology and reusability. R is well known for its succinct syntax for analytic tasks as well as a plethora of tools for…
Changing lives with Data Science at Microsoft
[MP4] [0:13:33] [2016/07/12]Whether it's called data science, machine learning, or analytics, the combination of new data sources and statistical modeling has produced some truly revolutionary applications. Many of these…
A Lap Around R Tools for Visual Studio
[MP4] [0:20:01] [2016/07/12]R Tools for Visual Studio is a new, Open Source and free tool for R Users built on top of the powerful Visual Studio IDE. In this talk, we will take you on a tour of its features and show you how they…
Towards a grammar of interactive graphics
[MP4] [1:00:24] [2016/07/09]I announced ggvis in 2014, but there has been little progress on it since. In this talk, I'll tell you a little bit about what I've been working on instead (data ingest, purrr, multiple models, ...)…
Day 3 Siepr 130 Lightning Talks 1:00 PM - 1:40 PM
[MP4] [0:50:34] [2016/07/08]Day 3 Siepr 130 Lightning Talks 1:00 PM - 1:40 PM Host: Julie Josse
Day 3 Econ 140 Lightning Talks 10:30 AM - 12:00 PM
[MP4] [1:11:09] [2016/07/08]Day 3 Econ 140 Lightning Talks 10:30 AM - 12:00 PM Host: Max Kuhn
Day 2 Siepr 120 Lightning Talks 10:30 AM - 12:00 PM
[MP4] [1:19:34] [2016/07/07]Day 2 Siepr 120 Lightning Talks 10:30 AM - 12:00 PM Host: Joe Rickert
Analysis of big biological sequence datasets using the DECIPHER package
[MP4] [0:19:42] [2016/07/06]Recent advances in DNA sequencing have led to the generation of massive amounts of biological sequence data. As a result, there is an urgent need for packages that assist in organizing and evaluating…
Optimizing Food Inspections with Analytics
[MP4] [0:05:20] [2016/06/18]In 2013 the City of Chicago was the recipient of a Bloomberg Philanthropies grant to develop a smart data platform. The aim of the platform is to develop tools to help city government increase…
Let's meet on satRday!
[MP4] [0:04:57] [2016/06/18]The idea of organizing cheap regional conferences, as a link between local R User Groups and international conferences on R, was brought up at the EARL 2015 conference in Boston, which was quickly…
The Best Time to Post on Reddit
[MP4] [0:04:04] [2016/06/18]I used R to visualize the best time to post on Reddit using data collected with Google's BigQuery. I queried a publicly accessible dataset containing almost 200 million Reddit posts (all of the posts…
Getting R into your bathroom
[MP4] [0:04:53] [2016/06/18]Have you ever considered how to use R when decorating your home? In this presentation I will show how R can be used to generate beautiful mathematical patterns based on complex numbers. In my case, we…
Estimating causal dose response functions using the causaldrf R package
[MP4] [0:06:11] [2016/06/18]Causal inference aims at the fundamental question of how changing the level of a cause or treatment can affect a subsequent outcome. Whether data analysts want to admit it or not, many analyses in…
rbokeh: A Simple, Flexible, Declarative Framework for Interactive Graphics
[MP4] [0:18:52] [2016/06/18]The rbokeh package is an R interface to the Bokeh visualization library. The interface is designed to be simple but flexible, allowing the expressiveness required for rapid generation of ad hoc…
Most Likely Transformations
[MP4] [0:18:43] [2016/06/18]The "mlt" package implements maximum likelihood estimation in the class of conditional transformation models. Based on a suitable explicit parameterisation of the unconditional or…
Using Jupyter notebooks with R in the classroom
[MP4] [0:14:23] [2016/06/18]When teaching statistics to non-programmers, the challenges of programming in R often exceed the challenge presented by new statistics concepts. This presentation will discuss a recent paper comparing…
ggduo: Pairs plot for two group data
[MP4] [0:14:11] [2016/06/18]The R package 'GGally' provides several amalgam plots that build on the basic 'ggplot2' plotting system. Functions produce multivariate plots like generalized scatterplot matrices, and parallel…
DataSHIELD: Taking the analysis to the data
[MP4] [0:22:17] [2016/06/18]Irrespective of discipline, data access and analysis barriers result from a range of scenarios: * ethical-legal restrictions surrounding confidentiality and the sharing of, or access to, disclosive…
Big data algorithms for rank-based estimation
[MP4] [0:19:08] [2016/06/18]Rank-based (R) estimation for statistical models is a robust nonparametric alternative to classical estimation procedures such as least squares. R methods have been developed for models ranging from…
Revolutionize how you teach and blog: add interactivity
[MP4] [0:15:40] [2016/06/18]R vignettes, blog posts and teaching materials are typically standard web pages generated with R Markdown. DataCamp has developed a framework to make this static content interactive: R code chunks are…
Estimation of causal effects in network-dependent data
[MP4] [0:17:01] [2016/06/18]We describe two R packages which facilitate causal inference research in network-dependent data: the simcausal package for conducting network-based simulation studies; and the tmlenet package for…
Superheat: Supervised heatmaps for visualizing complex data
[MP4] [0:18:03] [2016/06/18]Technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Accordingly, computationally intensive statistical and machine learning…
Statistical Thinking in a Data Science Course
[MP4] [1:01:16] [2016/06/18]The intuition and experience needed for sound statistics practice can be hard to learn, and a course that combines computing, statistics, and working with data offers an excellent learning environment…
Using Shiny for Formative Assessments
[MP4] [0:17:18] [2016/06/18]Shiny has become a popular approach for R developers to create interactive dashboards. Given the rich set of features available in Shiny, it has the capability for data entry and collection. This talk…
Practical tools for exploratory web graphics
[MP4] [0:20:23] [2016/06/18]Interactive statistical graphics toolkits play an important role in the exploratory phase of a data analysis cycle. Web graphics are rarely used during this phase, and are commonly reserved solely for…
Authoring Books with R Markdown
[MP4] [0:17:18] [2016/06/18]Markdown is a simple and popular language for writing. R Markdown (http://rmarkdown.rstudio.com) has made it really easy to author documents that contain R code, and convert these documents to a…
ranger: A fast implementation of random forests for high dimensional data
[MP4] [0:14:55] [2016/06/18]Random forests are widely used in applications, such as gene expression analysis, credit scoring, image processing or genome-wide association studies (GWAS). With currently available software, the…
Dynamic Data in the Statistics Classroom
[MP4] [0:16:16] [2016/06/18]The call for using real data in the classroom has long meant using datasets which are culled, cleaned, and wrangled prior to any student working with the observations. However, an important part…
Data Landscapes: a pragmatic and philosophical visualisation of the sustainable urban landscape
[MP4] [0:16:26] [2016/06/18]The Vernacular Ecology Index (VEI) is a newly proposed assessment method for sustainable urban development. It is composed of five elements (energy, culture, systems, placeness and vernacular) that…
Shiny Gadgets: Interactive tools for Programming and Data Analysis
[MP4] [0:18:26] [2016/06/18]A Shiny Gadget is an interactive tool that enhances your R programming experience. You make Shiny Gadgets with the same package that you use to make Shiny Apps, but you use Gadgets in a very different…
Gradient Boosted Trees Model: deploying R models into production environments*
[MP4] [0:19:26] [2016/06/18]R is the tool of choice for analyzing data and training sophisticated models, but large scale systems are usually implemented in more traditional languages like Java and C++. I will describe…
swirl-tbp: a package for interactively learning R programming and data science through the addition…
[MP4] [0:19:26] [2016/06/18]The R package 'swirl' allows users to learn R programming by completing interactive lessons within the R console. Lessons (written in the YAML mark-up language) can include educational content such as…
Flexible and Interpretable Regression Using Convex Penalties
[MP4] [1:00:23] [2016/06/17]We consider the problem of fitting a regression model that is both flexible and interpretable. We propose two procedures for this task: the Fused Lasso Additive Model (FLAM), which is an additive…
Spatial data in R: simple features and future perspectives
[MP4] [0:40:48] [2016/06/17]Simple feature access is an open standard for handling feature data (mostly points, lines and polygons) that has seen wide adoption in databases, javascript, and linked data. Currently, R does not…
Using R in a regulatory environment: FDA experiences.
[MP4] [0:20:25] [2016/06/17]The Food and Drug Administration (FDA) regulates products which account for approximately one fourth of consumer spending in the United States of America, and has global impact, particularly for…
brglm: Reduced-bias inference in generalized linear models
[MP4] [0:16:24] [2016/06/17]This presentation focuses on the brglm R package, which provides methods for reduced-bias inference in univariate generalised linear models and multinomial regression models with either ordinal or…
Providing Digital Provenance: from Modeling through Production
[MP4] [0:18:17] [2016/06/17]Reproducibility is important throughout the entire data science process. As recent studies have shown, subconscious biases in the exploratory analysis phase of a project can have vast repercussions…
Revisiting the Boston data set (Harrison and Rubinfeld, 1978)
[MP4] [0:17:42] [2016/06/17]In the extended topical sphere of Regional Science, more scholars are addressing empirical questions using spatial and spatio-temporal data. An emerging challenge is to alert "new arrivals"…
How to do one's taxes with R
[MP4] [0:15:24] [2016/06/17]In this talk it is shown how to generate a return of tax (German VAT) with R and send it over the internet to the tax administration. As this is certainly not a standard application for R (special…
Making the R community more open
[MP4] [0:16:51] [2016/06/17]The R community has historically done a pretty bad job at welcoming newcomers, and has been pretty "closed" ironically. Documentation written by experts for experts, email lists being a…
Bringing the Power of R to Citizen Data Scientists
[MP4] [0:11:08] [2016/06/17]Organizations have an increasing amount of data that can be converted into information that offers the ability to make better decisions, find new opportunities, and improve efficiency. R is a critical…
Approximate inference in R: A case study with GLMMs and glmmsr
[MP4] [0:18:29] [2016/06/17]The use of realistic statistical models for complex data is often hindered by the high cost of conducting inference about the model parameters. Because of this, it is sometimes necessary to use…
The simulator: An Engine for Streamlining Simulations
[MP4] [0:19:26] [2016/06/17]Methodological statisticians spend an appreciable amount of their time writing code for simulation studies. Every paper introducing a new method has a simulation section in which the new method is…
Extending CRAN packages with binaries: x13binary
[MP4] [0:15:13] [2016/06/17]The x13binary package provides pre-built binaries of X-13ARIMA-SEATS, the seasonal adjustment software by the U.S. Census Bureau. X-13 is a well-established tool for de-seasonalization of time series,…
A spatial policy tool for cycling potential in England
[MP4] [0:18:05] [2016/06/17]Utility cycling is an increasingly common objective worldwide. The Propensity to Cycle Tool (PCT) www.pct.bike is a planning support system created using open source software; including R (Shiny) for…
Adding R, Jupyter and Spark to the toolset for understanding the complex computing systems at CERN's…
[MP4] [0:16:26] [2016/06/17]High Energy Physics (HEP) has a decades long tradition of statistical data analysis and of using large computing infrastructures. CERN's current flagship project LHC has collected over 100 PB of data,…
Simulation and power analysis of generalized linear mixed models
[MP4] [0:16:59] [2016/06/17]As computers have improved, so has the prevalence of simulation studies to explore implications for assumption violations and explore statistical power. The simglm package allows for flexible…
GNU make for reproducible data analysis using R and other statistical software
[MP4] [0:16:38] [2016/06/17]As a statistical consultant, I often find myself repeating similar steps for data analysis projects. These steps follow a pattern of reading, cleaning, summarising, plotting and analysing data then…
Tools for Robust R Packages
[MP4] [0:22:47] [2016/06/17]Building an R package is a great way of encapsulating code, documentation and data, in a single testable and easily distributable unit. At Mango we are building R packages regularly, and have been…
SpatialProbit for fast and accurate spatial probit estimations
[MP4] [0:16:47] [2016/06/17]This package meets the emerging needs of powerful and reliable models for the analysis of spatial discrete choice data. Since the explosion of available and voluminous geospatial and location data,…
How Teradata Aster R Scales Data Science
[MP4] [0:10:00] [2016/06/17]One of the key advantages of using R for data mining and machine learning is that one may use the same environment for both data munging and algorithm execution. The problem with R, however, is that…
Visualizing multifactorial and multi-attribute effect sizes in linear mixed models with a view…
[MP4] [0:19:10] [2016/06/17]In Brockhoff et al (2016), the close link between Cohen's d, the effect size in an ANOVA framework, and the so-called Thurstonian (Signal detection) d-prime was used to suggest better visualizations…
Run-time Testing Using assertive
[MP4] [0:14:16] [2016/06/17]assertive is a group of R packages that lets you check that your code is running as you want it to. assert_* functions test a condition and throw an error if it fails, letting you write robust code…
R markdown: Lifesaver or death trap?
[MP4] [0:16:12] [2016/06/17]The popularity of R markdown is unquestionable, but will it prove as useful to the blind community as it is for our sighted peers? The short answer is "yes" but the more realistic answer is…
mumm: An R-package for fitting multiplicative mixed models using the Template Model Builder (TMB)
[MP4] [0:17:00] [2016/06/17]Non-linear mixed models of various kinds are fundamental extensions of the linear mixed models commonly used in a wide range of applications. An important example of a non-linear mixed model is the…
Exploring the R / SQL boundary
[MP4] [0:21:28] [2016/06/17]Databases have a long history of delivering highly scalable solutions for storing, manipulating, and analyzing data, transaction processing and data warehousing, while R is the most widely used…
Classifying Murderers in Imbalanced Data Using randomForest
[MP4] [0:20:35] [2016/06/17]In order to allocate resources more effectively with the goal of providing safer communities, R's randomForest algorithm was used to identify candidates who may commit or attempt murder. And while…
Multivoxel Pattern Analysis of fMRI Data
[MP4] [0:16:27] [2016/06/17]Analysis of functional magnetic resonance imaging (fMRI) data has traditionally been carried out by analyzing each voxel's time-series independently with a linear model. While this approach has been…
Phylogenetically informed analysis of microbiome data using adaptive gPCA in R
[MP4] [0:17:43] [2016/06/17]When analyzing microbiome data, biologists often use exploratory methods that take into account the relatedness of the bacterial species present in the data. This helps in the interpretability and…
Efficient in-memory non-equi joins using data.table
[MP4] [0:16:21] [2016/06/17]A join operation combines two (or more) tables on some shared columns based on a condition. An equi-join is a case where this combination condition is defined by the binary operator ==. It is a…
Network Diffusion of Innovations in R: Introducing netdiffuseR
[MP4] [0:15:00] [2016/06/17]The Diffusion of Innovations theory, while one of the oldest social science theories, has ebbed and flowed in its popularity over its 100 year or so history. In contrast to contagion models,…
Grid Computing in R with Easy Scalability
[MP4] [0:16:59] [2016/06/17]Parallel computing is useful for speeding up computing tasks and many R packages exist to aid in using parallel computing. Unfortunately it is not always trivial to parallelize jobs and can take a…
Heatmaps in R: Overview and best practices
[MP4] [0:19:49] [2016/06/17]A heatmap is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to…
Efficient tabular data ingestion and manipulation with MonetDBLite
[MP4] [0:16:57] [2016/06/17]We present "MonetDBLite", a new R package containing an embedded version of MonetDB. MonetDB is a free and open source relational database focused on analytical applications. MonetDBLite…
On the emergence of R as a platform for emergency outbreak response
[MP4] [0:18:16] [2016/06/17]The recent Ebola virus disease outbreak in West Africa has been a terrible reminder of the necessities of rapid evaluation and response to emerging infectious disease threats. For such response to be…
Resource-Aware Scheduling Strategies for Parallel Machine Learning R Programs through RAMBO
[MP4] [0:18:25] [2016/06/17]We present resource-aware scheduling strategies for parallel R programs leading to efficient utilization of parallel computer architectures by estimating resource demands. We concentrate on…
ETL for medium data
[MP4] [0:17:27] [2016/06/17]Packages provide users with software that extends the core functionality of R, as well as data that illustrates the use of that functionality. However, by design the type of data that can be contained…
Meta-Analysis of Epidemiological Dose-Response Studies with the dosresmeta R package
[MP4] [0:15:27] [2016/06/17]Quantitative exposures (e.g. smoking, alcohol consumption) in predicting binary health outcomes (e.g. mortality, incidence of a disease) are frequently categorized and modeled with indicator…
Size of Datasets for Analytics and Implications for R
[MP4] [0:17:18] [2016/06/17]With so much hype about "big data" and the industry pushing for distributed computing vs traditional single-machine tools, one wonders about the future of R. In this talk I will argue that…
permuter: An R package for randomization inference
[MP4] [0:13:36] [2016/06/17]Software packages for randomization inference are few and far between. This forces researchers either to rely on specialized stand-alone programs or to use classical statistical tests that may require…
OPERA: Online Prediction by ExpeRts Aggregation
[MP4] [0:16:28] [2016/06/17]We present an R package for prediction of time series based on online robust aggregation of a finite set of forecasts (machine learning method, statistical model, physical model, human expertise,…
Predicting individual treatment effects
[MP4] [0:16:50] [2016/06/17]Treatments for complicated diseases often help some patients but not all and predicting the treatment effect of new patients is important in order to make sure every patient gets the best possible…
Notebooks with R Markdown
[MP4] [0:18:08] [2016/06/17]Notebook interfaces for data analysis have compelling advantages including the close association of code and output and the ability to intersperse narrative with computation. Notebooks are also an…
Importing modern data into R
[MP4] [0:14:56] [2016/06/17]This talk explores modern trends in data storage formats and the tools, packages and best practices to import this data into R. We will start with a quick recap of the existing tools and packages for…
Forty years of S
[MP4] [1:04:06] [2016/06/16]Bell Labs in the 1970s was a hotbed of research in computing, statistics and many other fields. The conditions there encouraged the growth of the S language and influenced its content. The 40th…
edeaR: Extracting knowledge from process data
[MP4] [0:18:53] [2016/06/16]During the last decades, the logging of events in a business context has increased massively. Information concerning activities within a broad range of business processes is recorded in so-called…
R in machine learning competitions
[MP4] [0:19:42] [2016/06/16]Kaggle is a community of almost 450K data scientists who have built almost 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and…
Using Spark with Shiny and R Markdown
[MP4] [0:16:18] [2016/06/16]R is well-suited to handle data that can fit in memory but additional tools are needed when the amount of data you want to analyze in R grows beyond the limits of your machine's RAM. There have been a…
Compiling parts of R using the NIMBLE system for programming algorithms
[MP4] [0:15:24] [2016/06/16]The NIMBLE R package provides a flexible system for programming statistical algorithms for hierarchical models specified using the BUGS language. As part of the system, we compile R code for…
Implementing R in old economy companies: From proof-of-concept to production
[MP4] [0:18:53] [2016/06/16]In old economy companies, the introduction of R is typically a bottom-up process that follows a pattern of three major stages of maturity: At the first stage, guerrilla projects use R parallel to the…
Two-sample testing in high dimensions
[MP4] [0:15:19] [2016/06/16]Estimation for high-dimensional models has been widely studied. However, uncertainty quantification remains challenging. We put forward novel methodology for two-sample testing in high dimensions…
Connecting R to the OpenML project for Open Machine Learning
[MP4] [0:16:31] [2016/06/16]OpenML is an online machine learning platform where researchers can automatically log and share data, code, and experiments, and organize them online to work and collaborate more effectively. We…
Bayesian analysis of generalized linear mixed models with JAGS
[MP4] [0:17:01] [2016/06/16]BUGS is a language for describing hierarchical Bayesian models which syntactically resembles R. BUGS allows large complex models to be built from smaller components. JAGS is a BUGS interpreter written…
R/qtl: Just Barely Sustainable
[MP4] [0:21:16] [2016/06/16]R/qtl is an R package for mapping quantitative trait loci (genetic loci that contribute to variation in quantitative traits, such as blood pressure) in experimental crosses (such as in mice). I began…
trackeR: Infrastructure for running and cycling data from GPS-enabled tracking devices in R
[MP4] [0:18:13] [2016/06/16]The use of GPS-enabled tracking devices and heart rate monitors is becoming increasingly common in sports and fitness activities. The trackeR package aims to fill the gap between the routine…
Fitting complex Bayesian models with R-INLA and MCMC
[MP4] [0:18:24] [2016/06/16]The Integrated Nested Laplace Approximation (INLA) provides a computationally efficient approach to obtaining an approximation to the posterior marginals for a large number of Bayesian models. In…
Automating our work away: One consulting firm's experience with KnitR
[MP4] [0:11:51] [2016/06/16]As consultants, many of the projects that we work on are similar, with many steps repeated verbatim across projects. Previously, our workflow was based largely in Microsoft Office, with our analysis…
Fry: A Fast Interactive Biological Pathway Miner
[MP4] [0:15:53] [2016/06/16]Gene set tests are often used in differential expression analyses to explore the behavior of a group of related genes. This is useful for identifying large-scale co-regulation of genes belonging to…
United Nations World Population Projections with R
[MP4] [0:16:19] [2016/06/16]Recently, the United Nations adopted a probabilistic approach to projecting fertility, mortality and population for all countries. In this approach, the total fertility and female and male life…
Distributed Computing using parallel, Distributed R, and SparkR
[MP4] [0:17:25] [2016/06/16]Data volume is ever increasing, while single node performance is stagnant. To scale, analysts need to distribute computations. R has built-in support for parallel computing, and third-party…
bayesboot: An R package for easy Bayesian bootstrapping
[MP4] [0:17:25] [2016/06/16]Introduced by Rubin in 1981, the Bayesian bootstrap is the Bayesian analogue to the classical non-parametric bootstrap and it shares the classical bootstrap's advantages: It is a non-parametric method…
jailbreakr: Get out of Excel, free
[MP4] [0:19:25] [2016/06/16]One out of every ten people on the planet uses a spreadsheet and about half of those use formulas: "Let's not kid ourselves: the most widely used piece of software for statistics is Excel."…
FlashR: Enable Parallel, Scalable Data Analysis in R
[MP4] [0:15:41] [2016/06/16]In the era of big data, R is rapidly becoming one of the most popular tools for data analysis. But the R framework is relatively slow and unable to scale to large datasets. The general approach of…
Linking htmlwidgets with crosstalk and mobservable
[MP4] [0:17:46] [2016/06/16]The htmlwidgets package makes it easy to create interactive JavaScript widgets from R, and display them from the R console or insert them into R Markdown documents and Shiny apps. These widgets…
Continuous Integration and Teaching Statistical Computing with R
[MP4] [0:20:15] [2016/06/16]In this talk we will discuss two statistical computing courses taught as part of the undergraduate and masters curriculum in the Department of Statistical Science at Duke University. The primary goal…
Fast additive quantile regression in R
[MP4] [0:15:14] [2016/06/16]Quantile regression represents a flexible approach for modelling the impact of several covariates on the conditional distribution of the dependent variable, which does not require making any…
htmlwidgets: Power of JavaScript in R
[MP4] [0:17:27] [2016/06/16]htmlwidgets is an R package that provides a comprehensive framework to create interactive javascript based widgets, for use from R. Once created, these widgets can be used at the R console, embedded…
Transforming a museum to be data-driven using R
[MP4] [0:17:01] [2016/06/16]With the exponential growth of data, more and more businesses are demanding to become data-driven. Seeking value from their data, big data and data science initiatives; jobs and skill sets have risen…
Experiences on the Use of R in the Water Sector
[MP4] [0:16:43] [2016/06/16]In this study we present some real cases where R has been a key element on building decision support systems related to the water industry. We have used R in the context of automatic water demand…
Integrated R labs for high school students
[MP4] [0:17:07] [2016/06/16]The Mobilize project developed a year-long high school level Introduction to Data Science course, which has been piloted in 27 public schools in the Los Angeles Unified School District. The curriculum…
bigKRLS: Optimizing non-parametric regression in R
[MP4] [0:18:34] [2016/06/16]Data scientists are increasingly interested in modeling techniques involving relatively few parametric assumptions, particularly when analyzing large or complex datasets. Though many approaches have…
What can R learn from Julia
[MP4] [0:17:54] [2016/06/16]Julia, like R, is a dynamic language for scientific computing but, unlike R, it was explicitly designed to deliver performance competitive to traditional batch-compiled languages. To achieve this…
CVXR: An R Package for Modeling Convex Optimization Problems
[MP4] [0:15:33] [2016/06/16]CVXR is an R package that provides an object-oriented modeling language for convex optimization. It allows the user to formulate convex optimization problems in a natural mathematical syntax rather…
A Case Study in Reproducible Model Building: Simulating Groundwater Flow in the Wood River Valley…
[MP4] [0:22:05] [2016/06/16]The goal of reproducible model building is to tie processing instructions to data analysis so that the model can be recreated, better understood, and easily modified to incorporate new field…
Introducing Statistics with intRo
[MP4] [0:18:49] [2016/06/16]intRo is a modern web-based application for performing basic data analysis and statistical routines as well as an accompanying R package. Leveraging the power of R and Shiny, intRo implements common…
Multiple Hurdle Tobit models in R: The mhurdle package
[MP4] [0:16:25] [2016/06/16]mhurdle is a package for R enabling the estimation of a wide set of regression models where the dependent variable is left censored at zero, which is typically the case in household expenditure…
bamdit: An R Package for Bayesian meta-analysis of diagnostic test data
[MP4] [0:15:47] [2016/06/16]In this work we present the R package bamdit, whose name stands for "Bayesian meta-analysis of diagnostic test-data". bamdit was developed with the aim of simplifying the use of models in…
Statistics and R in Forensic Genetics
[MP4] [0:16:52] [2016/06/16]Genetic evidence is often used as evidence in disputes. Mostly, the genetic evidence is DNA profiles and the disputes are often familial or crime cases. In this talk, we go through the statistical…
Modeling Food Policy Decision Analysis with an Interactive Bayesian Network in Shiny
[MP4] [0:15:51] [2016/06/16]The efficacy of policy interventions for socioeconomic challenges, like food insecurity, is difficult to measure due to a limited understanding of the complex web of causes and consequences. As an…
A first-year undergraduate data science course
[MP4] [0:17:08] [2016/06/16]In this talk we will discuss an R based first-year undergraduate data science course taught at Duke University for an audience of students with little to no computing or statistical background. The…
Detection of Differential Item Functioning with difNLR function
[MP4] [0:21:16] [2016/06/16]In this work we present a new method for detection of Differential Item Functioning (DIF) based on Non-Linear Regression. Detection of DIF has been considered one of the most important topics in…
FiveThirtyEight's data journalism workflow with R
[MP4] [0:22:12] [2016/06/16]FiveThirtyEight is a data journalism site that uses R extensively for charts, stories, and interactives. We've used R for stories covering: p-hacking in nutrition science; how Uber is affecting New…
Wrap your model in an R package!
[MP4] [0:16:48] [2016/06/16]The groundwater drawdown model WTAQ-2, provided by the United States Geological Survey for free, has been "wrapped" into an R package, which contains functions for writing input files,…
Teaching R to 200 people in a week
[MP4] [0:19:38] [2016/06/16]Across disciplines, scholars are waking up to the potential benefits of computational competence. This has created a surge in demand for computational education which has gone widely underserved.…
Literate Programming
[MP4] [1:05:44] [2016/06/16]The speaker will discuss what he considers to be the most important outcome of his work developing TeX in the 1980s, namely the accidental discovery of a new approach to programming --- which caused a…
How to keep your R code simple while tackling big datasets
[MP4] [0:13:06] [2016/06/16]Like many statistical analytic tools, R can be incredibly memory intensive. A simple GAM (generalized additive model) or K-nearest neighbor routine can devour many multiples of memory size compared to…
Covr: Bringing Code Coverage to R
[MP4] [0:17:21] [2016/06/16]Code coverage records whether or not each line of code in a package is executed by the package's tests. While it does not check whether a given program or test executes properly it does reveal areas…
flexdashboard: Easy interactive dashboards for R
[MP4] [0:15:37] [2016/06/16]Recently, dashboards have become a common means of communicating the results of data analysis, especially of real-time data, and with good reason: dashboards present information attractively, use…
R at Microsoft
[MP4] [0:19:07] [2016/06/16]Since the acquisition of Revolution Analytics in April 2015, Microsoft has embarked upon a project to build R technology into many Microsoft products, so that developers and data scientists can use…
Simulation of Synthetic Complex Data: The R-Package simPop
[MP4] [0:20:09] [2016/06/16]The production of synthetic datasets has been proposed as a statistical disclosure control solution to generate public use files out of protected data, and as a tool to create "augmented…
mlrMBO: A Toolbox for Model-Based Optimization of Expensive Black-Box Functions
[MP4] [0:17:18] [2016/06/16]Many practical optimization tasks, such as finding best parameters for simulators in engineering or hyperparameter optimization in machine learning, are of a black-box nature, i.e., neither formulas…
Crowd sourced benchmarks
[MP4] [0:17:06] [2016/06/16]One of the simplest ways to speed up your code is to buy a faster computer. While this advice is certainly trite, it is something that should still be considered. However it is often unclear to…
'AF' a new package for estimating the attributable fraction
[MP4] [0:19:13] [2016/06/16]The attributable fraction (or attributable risk) is a widely used measure that quantifies the public health impact of an exposure on an outcome. Even though the theory for AF estimation is well…
When will this machine fail?
[MP4] [0:17:28] [2016/06/16]In this talk, we demonstrate how to develop and deploy end-to-end machine learning solutions for predictive maintenance in the manufacturing industry with R. For predictive maintenance, the following…
Deep Learning for R with MXNet
[MP4] [0:09:22] [2016/06/16]MXNet is a multi-language machine learning library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic…
broom: Converting statistical models to tidy data frames
[MP4] [0:19:35] [2016/06/16]The concept of "tidy data" offers a powerful and intuitive framework for structuring data to ease manipulation, modeling and visualization, and has guided the development of R tools such as…
Using Shiny modules to build more-complex and more-manageable apps
[MP4] [0:17:19] [2016/06/16]The release of Shiny 0.13 includes support for modules, allowing you to build Shiny apps more quickly and more reliably. Furthermore, using Shiny modules makes it easier for you to build more-complex…
The challenge of combining 176 x #otherpeoplesdata to create the Biomass And Allometry Database…
[MP4] [0:16:38] [2016/06/16]Despite the hype around "big data", a more immediate problem facing many scientific analyses is that large-scale databases must be assembled from a collection of small independent and…
Interactive Naive Bayes using Shiny: Text Retrieval, Classification, Quantification
[MP4] [0:16:43] [2016/06/16]Interactive Machine Learning (IML) is a relatively new area of ML where focused interactions between algorithms and humans allow for faster and more accurate model updates with respect to classical ML…
RServer: Operationalizing R at Electronic Arts
[MP4] [0:18:48] [2016/06/16]The motivation for the RServer project is the ability for data scientists at Electronic Arts to offload R computations from their personal machines to the cloud and to enable modeling at scale. The…
A Future for R
[MP4] [0:17:34] [2016/06/16]A future is an abstraction for a value that may be available at some point in the future and whose state is either unresolved or resolved. When a future is resolved the value is readily available. How…
R AnalyticFlow 3: Interactive Data Analysis GUI for R
[MP4] [0:13:33] [2016/06/16]R AnalyticFlow 3 is an open-source GUI for data analysis on top of R. It is designed to simplify the process of data analysis for both R experts and beginners. It is written in Java and runs on…
Differential equation-based models in R: An approach to simplicity and performance
[MP4] [0:21:35] [2016/06/16]The world is a complex dynamical system, a system evolving in time and space in which numerous interactions and feedback loops produce phenomena that defy simple explanations. Differential-equation…
xgboost: An R package for Fast and Accurate Gradient Boosting
[MP4] [0:21:01] [2016/06/16]XGBoost is a multi-language library designed and optimized for boosting trees algorithms. The underlying algorithm of xgboost is an extension of the classic gradient boosting machine algorithm. By…
Applying R in streaming and Business Intelligence applications
[MP4] [0:14:24] [2016/06/16]R provides tremendous value to statisticians and data scientists. However, they are often challenged to integrate their work and extend that value to the rest of their organization. This presentation…
Rethinking R Documentation: an extension of the lint package
[MP4] [0:18:47] [2016/06/16]In this presentation I will present an extension to the lint package to assist with documentation of R objects. R is the de facto standard for literate programming thanks to packages such as. However,…
Visual Pruner: A Shiny app for cohort selection in observational studies
[MP4] [0:18:43] [2016/06/16]Observational studies are a widely used and challenging class of studies. A key challenge is selecting a study cohort from the available data, or "pruning" the data, in a way that produces…
Colour schemes in data visualisation: Bias and Precision
[MP4] [0:21:13] [2016/06/16]The technique of mapping continuous values to a sequence of colours is often used to visualise quantitative data. The ability of different colour schemes to facilitate data interpretation has not…
Rectools: An Advanced Recommender System
[MP4] [0:16:10] [2016/06/16]Recommendation engines have a number of different applications. From books to movies, they enable the analysis and prediction of consumer preferences. The prevalence of recommender systems in both the…