Friday, 2 June 2017

Hacking the principles of #openscience #workshops

R-bloggers
Hacking the principles of #openscience #workshops

(This article was first published on R – christopher lortie, and kindly contributed to R-bloggers)

In a previous post, I discussed the key elements that really stood out for me in recent workshops associated with open science, data science, and ecology. Summer workshop season is upon us, and here are some principles to consider that can be used to hack a workshop. These hacks can be applied a priori as an instructor or in situ as a participant or instructor by engaging with the context from a pragmatic, problem-solving perspective.

Principles

1. Embrace open pedagogy.
2. Use and current best practices from traditional teaching contexts.
3. Be learner centered.
4. Speak less, do more.
5. Solve authentic challenges.

Hacks (for each principle)

1. Prepare learning outcomes for every lesson.

2. Identify solve-a-problem opportunities in advance and be open to ones that emerge organically during the workshop.

3. Use no slide decks. This challenges the instructor to more directly engage with the students and participants in the workshop and leaves space for students to shape content and narrative to some extent. Decks lock all of us in. This is appropriate for some contexts such as conference presentations, but workshops can be more fluid and open.

4. Plan pauses. Prepare your lessons with gaps for contributions.  Prepare a list of questions to offer up for every lesson and provide time for discussion of solutions.

5. Use real evidence/data to answer a compelling question (scale can be limited, approach beta as long as an answer is provided, and the challenge can emerge if teaching is open and space provided for the workshop participants to ideate).

Final hack that is a more general teaching principle, consider keeping all teaching materials within a single ecosystem that then references outwards only as needed. For me, this has become all content prepared in RStudio, knitted to html, then pushed to GitHub gh-pages for sharing as a webpage (or site). Then participants can engage in all ideas and content including code, data, ideas in one place.

 

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: R – christopher lortie.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

June Monthly Fantasy Baseball Writer’s Poll

Fantrax
June Monthly Fantasy Baseball Writer’s Poll

Editor’s Note: The following poll features MLB picks from the Fantrax writers for the month of June. Want to play fantasy baseball using stats for only the month? Now you can with our Fantrax monthly fantasy baseball leagues. Put together a team of 25 players under a salary cap with payouts up to 90%. Deadline is Monday, June 5!

Click here to join a league and build your team now!

 

 SheetsTrachtmanSauberCrossMilkinsBondBayLaBella
C SleeperTom MurphyTom MurphyAndrew KnappTom MurphyChristian VazquezMartin MaldonadoFrancisco CervelliManny Pina
1B SleeperJustin BourMatt AdamsJosh BellRhys HoskinsJesus AguilarMatt AdamsMatt AdamsYonder Alonso
2B SleeperWhit MerrifieldNeil WalkerChase UtleyChris TaylorWhit Merrifield Yolmer SanchezYolmer SanchezWhit Merrifield
3B SleeperAlex BregmanAdrian BeltreMatt DavidsonAlex BregmanJose ReyesYangervis SolarteAdrian BeltreWilmer Flores
SS SleeperTim AndersonTim AndersonJT RiddleJose PerazaChad PinderChris OwingsJose PerazaTim Anderson
OF SleeperBradley ZimmerBen RevereCameron MaybinDavid DahlFranchy CorderoKevin KiermaierAaron HicksMatt Holliday
SP SleeperSean ManaeaTrevor BauerTaijuan WalkerJameson TaillonTyson RossSean ManaeaJameson TaillonRobert Gsellman
RP SleeperRyan MadsonFelipe RiveroKoda GloverCorey KnebelCam BedrosianCam BedrosianMatt BushCorey Knebel
MiLB Call-UpAmed RosarioYoan MoncadaLewis BrinsonAustin MeadowsLucas GiolitoReynaldo LopezAmed RosarioLucas Gioloto
Most HRKhris DavisNelson CruzGiancarlo StantonKris BryantAaron JudgeJose BautistaAaron JudgeAaron Judge
Most SBBilly HamiltonBilly HamiltonBilly HamiltonBilly HamiltonBilly HamiltonJose PerazaBilly HamiltonBilly Hamilton
SP Lowest ERACarlos MartinezClayton KershawMax ScherzerClayton KershawClayton KershawJacob deGromTrevor BauerDallas Keuchel
SP Most K'sMax ScherzerChris SaleChris SaleChris SaleRobbie RayStephen StrasburgTrevor BauerChris Sale
RP Most SavesCraig KimbrelWade DavisCraig KimbrelKen GilesCody Allen Dellin BetancesKen GilesCraig Kimbrel
Team Most WinsDodgersCubsRed SoxAstrosAstrosIndiansYankeesAstros
Team Most LossesAngelsPadresRoyalsAngelsAngelsGiantsBrewersPhillies

The post June Monthly Fantasy Baseball Writer’s Poll appeared first on Fantrax.

The code (and other stuff…)

R-bloggers
The code (and other stuff…)

I’ve received a couple of emails or comments on one of the General Election posts to ask me to share the code I’ve used. 

In general, I think this is a bit dirty and lots could be done in a more efficient way $-$ effectively, I’m doing this out of my own curiosity and while I think the model is sensible, it’s probably not “publication-standard” (in terms of annotation etc).

Anyway, I’ve created a (rather plain) GitHub repository, which contains the basic files (including R script, R functions, basic data and JAGS model). Given time (which I’m not given…), I’d like to put a lot more description and perhaps also write a Stan version of the model code. I could also write a more precise model description $-$ I’ll try to update the material on the GitHub.

On another note, the previous posts have been syndicated in a couple of places (here and here), which was nice. And finally, here’s a little update with the latest data. As of today, the model predicts the following seats distribution.

                mean        sd 2.5% median 97.5%
Conservative 352.124 3.8760350  345    352   359
Labour       216.615 3.8041091  211    217   224
UKIP           0.000 0.0000000    0      0     0
Lib Dem       12.084 1.8752228    8     12    16
SNP           49.844 1.8240041   45     51    52
Green          0.000 0.0000000    0      0     0
PCY            1.333 0.9513233    0      2     3
Other          0.000 0.0000000    0      0     0

Labour are still slowly but surely gaining some ground $-$ I’m not sure the effect of the debate earlier this week (which was deserted by the PM) are visible yet as only a couple of the polls included were conducted after that.

Another interesting thing (following up on this post) is the analysis of the marginal seats that the model predicts to swing from the 2015 Winners. I’ve updated the plot, which now looks as below.


Now there are 30 constituencies that are predicted to change hand, many still towards the Tories. I am not a political scientists, so I don’t really know all the ins and outs of these, but I think a couple of examples are quite interesting and I would venture some comment…

So, the model doesn’t know about the recent by-elections of Copeland and Stoke-on-Trent South and so still label these seats as “Labour” (as they were in 2015), although the Tories have actually now control of Copeland.

In the prediction given the polls and the impact of the EU referendum (both were strong Leave areas with with 60% and 70% of the preference, respectively) and the Tories did well in 2015 (36% vs Labour’s 42% in Copeland and 33% to Labour’s 39% in 2015). So, the model is suggesting that both are likely to switch to the Tories this time around.

In fact, we know that at the time of the by-election, while Copeland (where the contest was mostly Labour v Tories) did go blue, Stoke didn’t. But there, the main battle was between the Labour’s and the UKIP’s candidate (UKIP had got 21% in 2015). And the by-election was fought last February, when the Tories lead was much more robust that it probably is now.

Another interesting area is Twickenham $-$ historically a constituency leaning to the Lib Dems, which was captured by the Conservatives in 2015. But since then, in another by-election the Tories have lost another similar area (Richmond Park,with a massive swing) and the model is suggesting that Twickenham could follow suit, come next Thursday. 


Finally, Clapton was the only seat won by UKIP in 2015, but since then, the elected MP (a former Tory-turned-UKIP) has defected the party and is not contesting the seat. This, combined with the poor standing of UKIP in the polls produces the not surprisingly outcome that Clapton is predicted to go blue with basically no uncertainty…

These results look reasonable to me $-$ not sure how life will turn out of course. As many commentators have noted much may depend on the turn out among the younger. Or other factors. And probably there’ll be another instance of the “Shy-Tory effect” (I’ll think about this if I get some time before the final prediction). But the model does seem to make some sense…

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script')); var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

Thursday, 1 June 2017

Python and R top 2017 KDnuggets rankings

R-bloggers
Python and R top 2017 KDnuggets rankings

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The results of KDnuggets' 18th annual poll of data science software usage are in, and for the first time in three years Python has edged out R as the most popular software. While R increased its share of usage from 45.7% in last year's poll to 52.1% this year, Python's usage among data scientists increased even more, from 36.6% of users in 2016 to 52.6% of users this year.

KDnuggets2017

There were some interesting moves in the long tail, as well. Several tools entered the KDNuggets chart for the first time, including Keras (9.5% of users), PyCharm (9.0%) and Microsoft R Server (4.3%). And several returning tools saw big jumps in usage, including Microsoft Cognitive Toolkit (3.4% of users), Tensorflow (20.2%) and Power BI (10.2%). Microsoft SQL Server increased its share to 11.6% (up from 10.8%), whereas SAS (7.1%) and Matlab (7.4%) saw declines. Julia, somewhat surprisingly, remained flat at 1.1%.

For the complete results and analysis of the 2017 KDnuggets data science software poll, follow the link below.

KDnuggets: New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

A Primer in functional Programming in R (part -2)

R-bloggers
A Primer in functional Programming in R (part -2)

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

In the last exercise, We have seen how powerful functional programming principles can be and how it can drammatically increase the readablity of the code and how easily you can work with them .In this set of exercises we will look at functional programming principles with purrr.Purrr comes with a number of interesting features and is really useful in writing clean and concise code . Please check the documentation and load the purrr library in your R session before starting these exercise set .
Answers to the exercises are available here

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
From the airquality dataset( available in base R ) , Find the mean ,median ,standard deviation of all columns using map functions .

Exercise 2

In the same dataset,find 95th percentile of each column excluding the NA values

Exercise 3
Load the iris dataset ,with help of pipe and map functions find out the mean of the relavant columns.Keep in mind mean is meant for numeric columns ,so you may need multiple map like functions.I expect the output as a dataframe .

Exercise 4
I have a vector x

Correcting bias in meta-analyses: What not to do (meta-showdown Part 1)

R-bloggers
Correcting bias in meta-analyses: What not to do (meta-showdown Part 1)

(This article was first published on R – Nicebread, and kindly contributed to R-bloggers)

tl;dr: Publication bias and p-hacking can dramatically inflate effect size estimates in meta-analyses. Many methods have been proposed to correct for such bias and to estimate the underlying true effect. In a large simulation study, we found out which methods do not work well under which conditions, and give recommendations what not to use.
Estimated reading time: 7 min.
It is well known that publication bias and p-hacking inflate effect size estimates from meta-analyses. In the last years, methodologists have developed an ever growing menu of statistical approaches to correct for such overestimation. However, to date it was unclear under which conditions they perform well, and what to do if they disagree. Born out of a Twitter discussion, Evan Carter, Joe Hilgard, Will Gervais and I did a large simulation project, where we compared the performance of naive random effects meta-analysis (RE), trim-and-fill (TF), p-curve, p-uniform, PET, PEESE, PET-PEESE, and the three-parameter selection model (3PSM).

Previous investigations typically looked only at publication bias or questionable research practices QRPs (but not both), used non-representative study-level sample sizes, or only compared few bias-correcting techniques, but not all of them. Our goal was to simulate a research literature that is as realistic as possible for psychology. In order to simulate several research environments, we fully crossed five experimental factors: (1) the true underlying effect, δ (0, 0.2, 0.5, 0.8); (2) between-study heterogeneity, τ (0, 0.2, 0.4); (3) the number of studies in the meta-analytic sample, k (10, 30, 60, 100); (4) the percentage of studies in the meta-analytic sample produced under publication bias (0%, 60%, 90%); and (5) the use of QRPs in the literature that produced the meta-analytic sample (none, medium, high).

This blog post summarizes some insights from our study, internally called “meta-showdown”. Check out the preprint; and the interactive app metaExplorer. The fully reproducible and reusable simulation code is on Github, and more information is on OSF.

In this blog post, I will highlight some lessons that we learned during the project, primarily focusing on what not do to when performing a meta-analysis.

Limitation of generalizability disclaimer: These recommendations apply to typical sample sizes, effect sizes, and heterogeneities in psychology; other research literatures might have different settings and therefore a different performance of the methods. Furthermore, the recommendations rely on the modeling assumptions of our simulation. We went a long way to make them as realistic as possible, but other assumptions could lead to other results.

Never trust a naive random effects meta-analysis or trim-and-fill (unless you meta-analyze a set of registered reports)

If studies have no publication bias, nothing can beat plain old random effects meta-analysis: it has the highest power, the least bias, and the highest efficiency compared to all other methods. Even in the presence of some (though not extreme) QRPs, naive RE performs better than all other methods. When can we expect no publication bias? If (and, in my opinion only if) we meta-analyze a set of registered reports.

But.

In any other setting except registered reports, a consequential amount of publication bias must be expected. In the field of psychology/psychiatry, more than 90% of all published hypothesis tests are significant (Fanelli, 2011) despite the average power being estimated as around 35% (Bakker, van Dijk, & Wicherts, 2012) – the gap points towards a huge publication bias. In the presence of publication bias, naive random effects meta-analysis and trim-and-fill have false positive rates approaching 100%:

More thoughts about trim-and-fill’s inability to recover δ=0 are in Joe Hilgard’s blog post. (Note: this insight is not really new and has been shown multiple times before, for example by Moreno et al., 2009, and Simonsohn, Nelson, and Simmons, 2014).

Our recommendation: Never trust meta-analyses based on naive random effects and trim-and-fill, unless you can rule out publication bias. Results from previously published meta-analyses based on these methods should be treated with a lot of skepticism.

 

Do not use p-curve and p-uniform for effect size estimation (under heterogeneity)

As a default, heterogeneity should always be expected – even under the most controlled conditions, where many labs perform the same computer-administered experiment, a large proportion showed significant and substantial amounts of between-study heterogeneity (cf. ManyLabs 1 and 3; see also our supplementary document for more details). p-curve and p-uniform assume homogeneous effect sizes, and their performance is impacted to a large extent by heterogeneity:

 

As you can see, all other methods retain the nominal false positive rate, but p-curve and p-uniform go through the roof as soon as heterogeneity comes into play (see also McShane, Böckenholt, & Hansen, 2016; van Aert et al., 2016).

Under H1, heterogeneity leads to overestimation of the true effect:

(additional settings for these plots: no QRPs, no publication bias, k = 100 studies, true effect size = 0.5)

Note that in their presentation of p-curve, Simonsohn et al. (2014) emphasize that, in the presence of heterogeneity, p-curve is intended as an estimate of the average true effect size among the studies submitted to p-curve (see here, Supplement 2). p-curve may indeed yield an accurate estimate of the true effect size among the significant studies, but in our view, the goal of bias-correction in meta-analysis is to estimate the average effect of all conducted studies. Of course this latter estimation hinges on modeling assumptions (e.g., that the effects are normally distributed), which can be disputed, and there might be applications where indeed the underlying true effect of all significant studies is more interesting.

Furthermore, as McShane et al (2016) demonstrate, p-curve and p-uniform are constrained versions of the more general three-parameter selection model (3PSM; Iyengar & Greenhouse, 1988). The 3PSM estimates (a) the mean of the true effect, δ, (b) the heterogeneity, τ, and (c) the probability that a non-significant result enters the literature, p. The constraints of p-curve and p-uniform are: 100% publication bias (i.e., p = 0) and homogeneity (i.e., τ = 0). Hence, for the estimation of effect sizes, 3PSM seems to be a good replacement for p-curve and p-uniform, as it makes these constraints testable.

Our recommendation: Do not use p-curve or p-uniform for effect size estimation when heterogeneity can be expected (which is nearly always the case).

Ignore overadjustments in the opposite direction

Many bias-correcting methods are driven by QRPs – the more QRPs, the stronger the downward correction. However, this effect can get so strong, that methods overadjust into the opposite direction, even if all studies in the meta-analysis are of the same sign:

 

Note: You need to set the option “Keep negative estimates” to get this plot.

Our recommendation: Ignore bias-corrected results that go into the opposite direction; set the estimate to zero, do not reject H₀.

Do not take it seriously if PET-PEESE does a reverse correction

Typical small-study effects (e.g., by p-hacking or publication bias) induce a negative correlation between sample size and effect size – the smaller the sample, the larger the observed effect size. PET-PEESE aims to correct for that relationship. In the absence of bias and QRPs, however, random fluctuations can lead to a positive correlation between sample size and effect size, which leads to a PET and PEESE slope of the unintended sign. Without publication bias, this reversal of the slope actually happens quite often.

See for example the next figure. The true effect size is zero (red dot), naive random effects meta-analysis slightly overestimates the true effect (see black dotted triangle), but PET and PEESE massively overadjust towards more positive effects:

 

PET-PEESE was never intended to correct in the reverse direction. An underlying biasing process would have to systematically remove small studies that show a significant result with larger effect sizes, and keep small studies with non-significant results. In the current incentive structure, I see no reason for such a process.

Our recommendation: Ignore the PET-PEESE correction if it has the wrong sign.

 

PET-PEESE sometimes overestimates, sometimes underestimates

A bias can be more easily accepted if it always is conservative – then one could conclude: “This method might miss some true effects, but if it indicates an effect, we can be quite confident that it really exists”. Depending on the conditions (i.e., how much publication bias, how much QRPs, etc.), however, PET/PEESE sometimes shows huge overestimation and sometimes huge underestimation.

For example, with no publication bias, some heterogeneity (τ=0.2), and severe QRPs, PET/PEESE underestimates the true effect of δ = 0.5:

In contrast, if no effect exists in reality, but strong publication bias, large heterogeneity and no QRPs, PET/PEESE overestimates at lot:

In fact, the distribution of PET/PEESE estimates looks virtually identical for these two examples, although the underlying true effect is δ = 0.5 in the upper plot and δ = 0 in the lower plot. Furthermore, note the huge spread of PET/PEESE estimates (the error bars visualize the 95% quantiles of all simulated replications): Any single PET/PEESE estimate can be very far off.

Our recommendation: As one cannot know the condition of reality, it is safest not to use PET/PEESE at all.

 

Recommendations in a nutshell: What you should not use in a meta-analysis

Again, please consider the “limitations of generalizability” disclaimer above.

When you can exclude publication bias (i.e., in the context of registered reports), do not use bias-correcting techniques. Even in the presence of some QRPs they perform worse than plain random effects meta-analysis. In any other setting except registered reports, expect publication bias, and do not use random effects meta-analysis or trim-and-fill. Both will give you a 100% false positive rate in typical settings, and a biased estimation. Under heterogeneity, p-curve and p-uniform overestimate the underlying effect and have false positive rates >= 50% Even if all studies entering a meta-analysis point into the same direction (e.g., all are positive), bias-correcting techniques sometimes overadjust and return a significant estimate of the opposite direction. Ignore these results, set the estimate to zero, do not reject H₀. Sometimes PET/PEESE adjust into the wrong direction (i.e., increasing the estimated true effect size)

As with any general recommendations, there might be good reasons to ignore them.

Additional technical recommendations The p-uniform package (v. 0.0.2) very rarely does not provide a lower CI. In this case, ignore the estimate. Do not run p-curve or p-uniform on

A Partial Remedy to the Reproducibility Problem

R-bloggers
A Partial Remedy to the Reproducibility Problem

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

Several years ago, John Ionnidis jolted the scientific establishment with an article titled, “Why Most Published Research Findings Are False.” He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues were brought up, such as hidden lab-to-lab variability. In addition, there is the occasional revelation of outright fraud.

Many consider the field to be at a crisis point.

In the 2014 JSM, Phil Stark organized a last-minute session on the issue, including Marcia McNutt, former editor of Science and Yoav Benjamini of multiple inference methodology fame. The session attracted a standing-room-only crowd.

In this post, Reed Davis and I are releasing the prototype of an R package that we are writing, revisit, with the goal of partially remedying the statistical and data wrangling aspects of this problem. It is assumed that the authors of a study have supplied (possibly via carrots or sticks) not only the data but also the complete code for their analyses, from data cleaning up through formal statistical analysis.

There are two main aspects:

The package allows the user to “replay” the authors’ analysis, and most importantly, explore other alternate analyses that the authors may have overlooked. The various alternate analyses may be saved for sharing. Warn of statistical   errors, such as: overreliance on p-values; need for multiple inference procedures; possible distortion due to outliers; etc.

The term user here could refer to several different situations:

The various authors of a study, collaborating and trying different analyses during the course of the study. Reviewers of a paper submitted for publication on the results of the study. Fellow scientists who wish to delve further into the study after it is published.

The package has text and GUI versions. The latter is currently implemented as an RStudio add-in.

The package is on my GitHub site, and has a fairly extensive README file introducing the goals and usage.

To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Wednesday, 31 May 2017

Shiny: data presentation with an extra

R-bloggers
Shiny: data presentation with an extra

(This article was first published on R-posts.com, and kindly contributed to R-bloggers)


A Shiny app with three tabs presenting different sections of the same data.

Shiny is an application based on R/RStudio which enables an interactive exploration of data through a dashboard with drop-down lists and checkboxes—programming-free. The apps can be useful for both the data analyst and the public.

Shiny apps are based on the Internet: This allows for private consultation of the data on one’s own browser as well as for online publication. Free apps can handle quite a lot of data, which can be increased with a subscription.

The target user of Shiny is extremely broad. Let’s take science—open science. At a time when openly archiving all data is becoming standard practice (e.g., OSF.io, Figshare.com, Github.com), Shiny can be used to walk the extra mile by letting people tour the data at once without programming. It’s the right tool for acknowledging all aspects of the data. Needless to say, these apps do not replace raw data archiving.

The apps simply add. For instance, the data in the lower app is a little noisy, right? Indeed it shows none of the succession of waves that characterizes word reading. The app helped me in identifying this issue. Instead of running along a host of participants and electrodes through a heavy code score, I found that the drop-down lists of the app let me seamlessly surf the data. By Participant 7, though, my wave dropped me…
 

Those data were very poor—systematically poorer than those of any other participants. I then checked the EEG preprocessing logs, and confirmed that those data had to be discarded. So much for the analysis utility of such an app. On the open science utility, what I did on discovering the fault was maintain the discarded data in the app, with a note, so that any colleagues and reviewers could consider it too. Now, although this example of use concerns a rather salient trait in the data, some other Shiny app might help to spot patterns such as individual differences, third variables.

Building a Shiny app is not difficult. Apps basically draw on some data presentation code (tables, plots) that you already have. Then just add a couple of scripts into the folder: one for the user interface (named iu.R), one for the process (named server.R), and perhaps another one compiling the commands for deploying the app and checking any errors.

The steps to start a Shiny app from scratch are:

1: Tutorials. Being open-source software, the best manuals are available through a Google search.

2: User token. Signing up and reading in your private key—just once.

3: GO. Plunge into the ui and server scripts, and deployApp().

4: Bugs and logs. They are not bugs in fact—rather fancies. For instance, some special characters have to get even more special (technically, UTF-8 encoding). For a character such as μ, Shiny prefers Âμ. Just cling to error logs by calling:

showLogs(appPath = getwd(), appFile = NULL, appName = NULL, account = NULL, entries = 50, streaming = FALSE)

The log output will mention any typos and unaccepted characters, pointing to specific lines in your code.

It may take a couple of intense days to get a first app running. As usual with programming, it’s easy to run into the traps which are there to spice up the way. The app’s been around for years, so tips and tricks abound on the Internet. For greater companionship, there are dedicated Google groups, and then there’s StackExchange etc., where you can post any needs/despair. Post your code, log, and explanation, and you’ll be rescued out in a couple of days. Long live those contributors.

It will often be enough to upload a bare app, but you might then think it can look better.

5 (optional): Pro up.
Use tabs to combine multiple apps in one, use different widgets, etc. Tutorials like this one on Youtube can take you there, especially those that provide the code, as in the description of that video. Use those scripts as templates. For example, see this script in which the function conditionalPanel() is used to modify the app’s sidebar based on which tab is selected. The utility of tabs is illustrated in the upper cover of this article and in the app shown in the text: When having multiple data sections, the tabs allow you to have all in one (cover screenshot), instead of linking to other apps in each (screenshot in text).

Time for logistics. You can include any text in your app’s webpage, such as explanations of any length, web links, and author information. Oh, also importantly: the Shiny application allows for the presentation of data in any of the forms available in R—notably, plots and tables. Costs: Shiny is free to use, in principle. You may only have to pay if you use your app(s) a lot—first month, commonly—, in which case you may pay 9 euros a month. There are different subscription plans. The free plan allows 25 hours of use per month, distributed among up to five apps.

There do exist alternatives to Shiny. One is fairly similar: It’s called Tableau. A nice comparison of the two apps is available here. Then, there’s a more advanced application called D3.js, which allows for lush graphics but proves more restrictive to newbies.

In sum, if you already program in R or even in Python, but have not tried online data presentation yet, consider it.

Feel free to share your ideas, experiences, or doubts in a comment on the original post.

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

New online datacamp course: Forecasting in R

R-bloggers
New online datacamp course: Forecasting in R

(This article was first published on R-posts.com, and kindly contributed to R-bloggers)

Forecasting in R is taught by Rob J. Hyndman, author of the forecast package

Forecasting involves making predictions about the future. It is required in many situations such as deciding whether to build another power generation plant in the next ten years requires forecasts of future demand; scheduling staff in a call center next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning. This course provides an introduction to time series forecasting using R.

What You’ll Learn
Chapter 1: Exploring and Visualizing Time Series in R
The first thing to do in any data analysis task is to plot the data.
Chapter 2: Benchmark Methods and Forecast Accuracy
In this chapter, you will learn general tools that are useful for many different forecasting situations.
Chapter 3: Exponential Smoothing
is framework generates reliable forecasts quickly and for a wide range of time series.
Chapter 4: Forecasting with ARIMA Models
ARIMA models provide another approach to time series forecasting.
Chapter 5: Advanced Methods
In this chapter, you will look at some methods that handle more complicated seasonality.

You can start the free chapter for free of Forecasting in R.

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

simmer 3.6.2

R-bloggers
simmer 3.6.2

(This article was first published on R – Enchufa2, and kindly contributed to R-bloggers)

The second update of the 3.6.x release of simmer, the Discrete-Event Simulator for R, is on CRAN, thus inaugurating a bi-monthly release cycle. I must thank Duncan Garmonsway (@nacnudus) for creating and now maintaining “The Bank Tutorial: Part I” vignette, Franz Fuchs for finding an important and weird memory bug (here) that prevented simmer from freeing the allocated memory (all 3.x.x versions are affected up to this release), and the Rcpp people for enduring me while I was helplessly searching for a solution to this. 🙂

My special thanks to Kevin Ushey (@kevinushey), who finally found the bug. As it happens, the bug was not in simmer or Rcpp but in magrittr, and the problem is that the pipe operator, in its inscrutable magic, creates a new environment for unnamed functions (instead of the current one, as it should be), and there it stores a reference to the first object in the pipe. More or less. Further details here.

Anyway, if somebody faces the same problem, know that there is a workaround: you just need to delete that hidden reference, as simmer does in this release to get rid of the memory issues. Happy simmering!

Minor changes and fixes: Update “The Bank Tutorial: Part I” vignette (@nacnudus in #90). Fix trap()’s handler cloning and associated test (#91). Apply select()’s policy also when resources is a function (#92). Accept dynamic timeouts in batches (#93). Change rollback()’s default behaviour to times=Inf, i.e., infinite loop (#95). Stop and throw an error when timeout() returns a missing value (#96 and #97). Fix memory management: resetting the environment was clearing but not deallocating memory (#98, fixed in #99). Fix object destruction: workaround for tidyverse/magrittr#146 (#98, fixed in effcb6b).

Article originally published in Enchufa2.es: simmer 3.6.2.

To leave a comment for the author, please follow the link and comment on their blog: R – Enchufa2.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...