- Aggregating functions now generate user friendly errors when the `data` argument is not an environment or data frame.
- We have fixed some bugs that arose in the "emergency" release of 0.11.
- `mm()` has been deprecated and replaced with `gwm()` which does groupwise models where the response may be either categorical or quantitative.
- Improvements have been made to `plotModel()`. This is likely still not the final version, but we are getting closer.
- Improvements have been made to naming in objects created with `do()`.
- Dots in `dotPlot()` are now the same size in all panels of multi-panel plots.
- `cdist()` has been rewritten.
- `mplot()` on a data frame now (a) prompts the user for the type of plot to create and (b) has an added option to make line plots for time series and the like.
- `resample()` can now do residual resampling from a linear model.
- Improvements have been made that make it easier to use `do()` to create common bootstrap confidence intervals. In particular, `confint()` can now calculate three kinds of intervals in many common situations.
- `fetchData()`, `fetchGoogle()`, and `fetchGapminder()` have been moved to a separate package, called `fetch`.
- `plotModel()` can be used to show data and model fits for a variety of models created with `lm()` or `glm()`.
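
The bootstrap workflow mentioned in the list above can be sketched as follows. This is a minimal illustration using the built-in `mtcars` data, not an excerpt from the package documentation. The `method = "percentile"` argument is an assumption about the `confint()` method for `do()` output and should be checked against `?confint` in the installed version.

```r
library(mosaic)

set.seed(123)  # make the resampling reproducible

# Bootstrap distribution: resample the rows of mtcars with replacement
# and recompute the mean of mpg, 500 times.
boot <- do(500) * mean(~ mpg, data = resample(mtcars))

# Percentile interval computed from the bootstrap distribution
# (method name assumed; see the lead-in above).
ci <- confint(boot, level = 0.95, method = "percentile")
ci
```

Because `do()` marks the data frame it returns, `confint()` can treat `boot` as a collection of resampled statistics rather than as ordinary data.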
- At the request of several users, and with CRAN's approval, we have made `mosaicData` a dependency of `mosaic`. This avoids the problem of users forgetting to separately load the `mosaicData` package.
- We are planning to remove `fetchGoogle()` (and perhaps `read.file()`) from future versions of the package. More and more packages are providing utilities for bringing data into R, and it doesn't make sense for us to duplicate those efforts in this package. For Google Sheets, you might take a look at the `googlesheets` package, which is available via GitHub now and will be on CRAN soon.
- Improved output of `binom.test()`, `prop.test()`, and `t.test()`, which have also undergone some internal restructuring. The objects returned now do a better job of reporting about the test conducted. In particular, `binom.test()` and `prop.test()` will report the value of `success` used. (#450, #455)
- `binom.test()` can now compute several different kinds of confidence intervals including the Wald, Plus-4 and Agresti-Coull intervals. (#449)
- `derivedFactor()` now handles NAs without throwing a warning. (#451)
- `pdist()` and related functions now do a better (i.e., useful) job with discrete distributions. (#417)
- Bug fixes in several functions that use non-standard evaluation improve their robustness and scope. This affects `t.test()` and all the "aggregating" functions like `mean()` and `favstats()`. In particular, it is now possible to reference variables both in the `data` argument and in the calling environment. (#435)
- `CIAdata()` now provides a message indicating the source URL for the data retrieved. (#444)
- Bug fixes to `CIAdata()` that seem to be related to a change in file format at the CIA World Factbook website. The "inflation" data set is still broken (on the CIA website). (#441)
- `read.file()` now uses functions from `readr` in some cases. A message is produced indicating which reader is being used. There are also some API changes. In particular, character data will be returned as character rather than factor. See `factorize()` for an easy way to convert things with few unique values into factors. (#442)
- A major vignette housecleaning has happened. Some vignettes have been removed from the package and vignettes inside the package are now compiled as part of package building to allow for more consistent checking of vignette contents. "Less Volume, More Creativity" has been reformatted from slides into a more typical vignette format. (#438)
- `mutate()` is used in place of `transform()` in the examples. (#452)
- Some minor tidying of the markdown templates. (#454)
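
As a rough illustration of the confidence interval options for `binom.test()` noted above, the sketch below requests two of the named interval types for the 0/1 variable `am` in the built-in `mtcars` data. The argument name `ci.method` and the strings `"Wald"` and `"Agresti-Coull"` are assumptions about the interface and should be checked against `?binom.test` in the installed version.

```r
library(mosaic)

# `am` is coded 0/1; 1 is treated as "success" here.
# ci.method values are assumed (see the lead-in above).
wald <- binom.test(~ am, data = mtcars, success = 1, ci.method = "Wald")
ac   <- binom.test(~ am, data = mtcars, success = 1, ci.method = "Agresti-Coull")

wald$conf.int  # Wald interval
ac$conf.int    # Agresti-Coull interval
```

The two intervals are built from the same data, so any difference between them comes from the interval method alone.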
- `tally()` now produces counts by default for all formula shapes. Proportions or percentages must be requested explicitly. This is to avoid common errors, especially when feeding the results into `chisq.test()`.
- Introduction of `msummary()`. Usually this is identical to `summary()`, but for a few kinds of objects it provides modified output that is less verbose.
- By default `do() * lm()` will now keep track of the F statistic, too.
- `confint()` applied to an object produced using `do()` now does more appropriate things.
- `binom.test()` and `prop.test()` now set `success = 1` by default on 0-1 data to treat 0 like failure and 1 like success. Similarly, `prop()` and `count()` set `level = 1` by default.
- `CIsim()` can now produce plots and does so by default when `samples <= 200`.
- Implementation of `add=TRUE` improved for `plotDist()`.
- Added `swap()`, which is useful for creating randomization distributions for paired designs. The current implementation is a bit slow. We will improve that by implementing part of the code in C++.
- Some additional functions are now formula-aware: `MAD()`, `SAD()`, and `quantile()`.
- `docFile()` introduced to simplify accessing files included with package documentation.
- `read.file()` enhanced to take a package as an argument and look among package documentation files.
- `factorize()` introduced as a way to convert vectors with few unique values into factors. Can be applied to an entire data frame.
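
The change to `tally()` described above can be seen in a small example on the built-in `mtcars` data. This is only a sketch; `format = "proportion"` is the way proportions are requested as far as the current documentation goes, so check `?tally` if the installed version differs.

```r
library(mosaic)

# Counts are returned by default.
counts <- tally(~ cyl, data = mtcars)

# Proportions must be asked for explicitly.
props <- tally(~ cyl, data = mtcars, format = "proportion")

counts
props
```

Returning counts by default means a table passed straight to `chisq.test()` contains the observed frequencies that test expects.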
- The data sets formerly in this package have been separated out into two additional packages: `NHANES` contains the `NHANES` data set and `mosaicData` contains the other data sets.
- `MAD()` and `SAD()` were added to compute mean and sum of all pairs of absolute differences.
- Facilities for making choropleth maps have been added. The API for these tools is still under development and may change in future releases.
- `rspin()` has been added to simulate spinning a spinner.
- Two additional vignettes are included. "Less Volume, More Creativity" outlines how to use the `mosaic` package to simplify R for beginners. The other vignette illustrates many of the plotting features added by the `mosaic` package.
- The mosaic package now contains two RMarkdown templates (one fancy and one plain).
- `plotFun()` has been improved so that it does a better job of selecting points where the function is evaluated and no longer warns about `NaN`s encountered while exploring the domain of the function.
- `oddsRatio()` has been redesigned and `relrisk()` has been added. Use their `summary()` methods or `verbose=TRUE` to see more information (including confidence intervals).
- Added `Birthdays` data set.
- A generic `mplot()` and several instances have been added to make a number of plots easy to generate. There are methods for objects of classes `"data.frame"`, `"lm"`, `"summary.lm"`, `"glm"`, `"summary.glm"`, `"TukeyHSD"`, and `"hclust"`. For several of these there are also `fortify` methods that return the data frame created to facilitate plotting.
- `read.file()` now handles (some?) https URLs and accepts an optional argument `filetype` that can be used to declare the type of data file when it is not identified by extension.
- The default for `useNA` in the `tally()` function has changed to `"ifany"`.
- `mosaic` now depends on `dplyr` both to use some of its functionality and to avoid naming collisions with functions like `tally()` and `do()`, allowing `mosaic` and `dplyr` to coexist more happily.
- Some improvements to dot plots with `dotPlot()`. In particular, the size of the dots is determined differently and works better more of the time. Dots were also shifted down by .5 units so that they do not hover above the x-axis so much. This means that (with default sizing) the tops of the dots are approximately located at a height equivalent to the number of dots rather than the center of the dots.
- Fixed a bug in `do()` that caused it to scope incorrectly in some edge cases when a variable had the same name as a function.
- `ntiles()` has been reimplemented and now has more formatting options.
- Introduction of `derivedFactor()` for creating factors from logical "cases".
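
To make the idea of building a factor from logical "cases" concrete, here is a small sketch using the built-in `mtcars` data. The level names `small`, `medium`, and `large` are invented for the example; each named argument is a logical condition, and the name becomes the factor level for the rows where it is true.

```r
library(mosaic)

cyl <- mtcars$cyl

# Each named argument is one "case"; the cases here are mutually exclusive.
engine <- derivedFactor(
  small  = cyl == 4,
  medium = cyl == 6,
  large  = cyl == 8
)

table(engine)
```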
- The `HELP` data set has been removed from the package. It was deprecated in version 0.5. Use `HELPrct` instead.
- `plotDist()` now accepts `add=TRUE` and `under=TRUE`, making it easy to add plots of distributions over (or under) plots of data (e.g., histograms, densityplots, etc.) or other distributions.
- Plotting functions with the option `add=TRUE` have been reimplemented using `layer()` from `latticeExtra`. See the documentation of these functions for details.
- `ladd()` has been completely reimplemented using `layer()` from `latticeExtra`. See documentation of `ladd()` for details, including some behavior changes.
- Aggregating functions (`mean()`, `sd()`, `var()`, et al.) now use `getOption("na.rm")` to determine the default value of `na.rm`. Use `options(na.rm=TRUE)` to change the default behavior to remove `NA`s and `options(na.rm=NULL)` to restore defaults.
- `do()` has been largely rewritten with an eye toward improved efficiency. In particular, `do()` will take advantage of multiple cores if the `parallel` package is available. At this point, sluggishness in applications of `do()` is most likely due to the sluggishness of what is being done, not to `do()` itself.
- Added an additional method to
`deltaMethod()` from the `car` package to make it easier to propagate uncertainty in some situations that commonly arise in the physical sciences and engineering.
- Added `cdist()` to compute critical values for the central portion of a distribution.
- Some changes to the API for `qdata()`. For interactive use, this should not cause any problems, but old programmatic uses of `qdata()` should be checked, as the object returned is now different.
- Fixed a bug that caused aggregating functions (`sum()`, `mean()`, `sd()`, etc.) to produce counter-intuitive results (but with a warning). The results are now what one would expect (and the warning is removed).
- Added `rsquared()` for extracting r-squared from models and model-like objects (`r.squared()` has been deprecated).
- `do()` now handles ANOVA-like objects better.
- `maggregate()` is now built on some improved behind-the-scenes functions. Among other features, the `groups` argument is now incorporated as an alternative method of specifying the groups to aggregate over, and the `method` argument can be set to `"ddply"` to use `ddply()` from the `plyr` package for aggregation. This results in a different output format that may be desired in some applications.
- The `cdata()`, `pdata()` and `qdata()` functions have been largely rewritten. In addition, `cdata_f()`, `pdata_f()` and `qdata_f()` are provided, which produce similar results but have a formula in the first argument slot.
- Fixed bug in vignette generation. Static PDFs are now installed in
`doc/` and so are available from within the package as well as via links to external files.
- Added `fetchGapminder()` for fetching data sets originally from Gapminder.
- Added `cdata()` for finding end points of a central portion of a variable.
- Name changes in functions like `prop()` to avoid internal `:`, which makes downstream processing messier.
- Improved detection of the availability of `manipulate()` (RStudio).
- Surface plots produced by `plotFun()` can be used without `manipulate()`. This makes it possible to put surface plots into RMarkdown or Rnw files or to generate them outside of RStudio.
- `do() * rflip()` now records proportion heads as well as counts of heads and tails.
- Added functions `mosaicLatticeOptions()` and `restoreLatticeOptions()` to switch back and forth between `lattice` defaults and `mosaic` defaults.
- `dotPlot()` uses a different algorithm to determine dot sizes. (Still not perfect, but `cex` can be used to further scale the dots.)
- Adjustments to `histogram()` so that `nint` matches the number of bins used more accurately.
- Fixed coding error in the HELP datasets so that `i2` (max number of drinks) is at least as large as `i1` (the average number of drinks).
- Removed the deprecated HELP dataset (now called HELPrct).
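
The `na.rm` option mechanism described in the list above can be sketched in base R. The function `my_mean()` below is a hypothetical stand-in written for this illustration, not the package's implementation. It only shows how a default argument can be read from `getOption("na.rm")`, and it assumes the `na.rm` option is not already set in the session.

```r
# Hypothetical stand-in for an aggregating function (not mosaic's code):
# the default for na.rm is looked up in the session options.
my_mean <- function(x, na.rm = getOption("na.rm", FALSE)) {
  base::mean(x, na.rm = na.rm)
}

x <- c(1, 2, NA, 4)

my_mean(x)                    # option unset, so NA values are kept
old <- options(na.rm = TRUE)  # change the session-wide default
my_mean(x)                    # NA values are now dropped
options(old)                  # restore the previous setting
```

An explicit `na.rm` argument in a call still overrides whatever the option says.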
- Various minor bug fixes and internal improvements.
- Various improvements and bug fixes to `D()` and `antiD()`.
- In RStudio, `mPlot()` provides an interactive environment for creating `lattice` and `ggplot2` plots.
- Some support for producing maps has been introduced, notably `sp2df()` for converting SpatialPolygonsDataFrame objects to regular data frames (which is useful for plotting with `ggplot2`, for example). Also, the `Countries` data frame facilitates mapping country names among different sources of map data.
- Data frames returned by `do()` are now marked as such so that `confint()` can behave differently for such data frames and for "regular" data frames.
- `t.test()` can now do a 1-sample t-test described using a formula.
- Aggregating functions (e.g. `mean()`, `var()`, etc. using a formula interface) have been completely reimplemented and additional aggregating functions are provided.
- An `ntiles()` function has been added to facilitate creating factors based on quantile ranges.
- Changes in format to `RailTrail` dataset.
- Minor changes in documentation.
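
The formula interface for aggregating functions and for the 1-sample t-test mentioned above might be used as in the following sketch on the built-in `mtcars` data. The null value `mu = 20` is arbitrary and chosen only for illustration.

```r
library(mosaic)

# One-sided formula: aggregate a single variable.
overall <- mean(~ mpg, data = mtcars)

# Two-sided formula: aggregate mpg within each level of cyl.
by_cyl <- mean(mpg ~ cyl, data = mtcars)

# 1-sample t-test described with a formula (mu chosen arbitrarily).
tt <- t.test(~ mpg, data = mtcars, mu = 20)

overall
by_cyl
tt$p.value
```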
- Added vignettes: "Starting with R" and "A Compendium of Commands to Teach Statistics".
- Plan to deprecate datasets from the Carnegie Mellon University Online Learning Initiative Statistics Modules in next release.
- `xhistogram()` is now deprecated. Use `histogram()` instead.
- Added vignette: Minimal R for Intro Stats.
- Implemented symbolic integration for simple functions.
- Aggregating functions (`mean()`, `max()`, `median()`, `var()`, etc.) now use `getOption('na.rm')` to determine default behavior.
- Various bug fixes in `var()` allow it to work in a wider range of situations.
- Augmented `TukeyHSD()` so that explicit use of `aov()` is no longer required.
- Added `panel.lmbands()` for plotting confidence and prediction bands in linear regression.
- Some data cleaning in the Carnegie Mellon University Online Learning Initiative Statistics Modules. In particular, the name collision with `Animals` from `MASS` has been removed by renaming the data set `GestationLongevity`.
- Added `freqpolygon()` for making frequency polygons.
- Added `r.squared()` for extracting r-squared from models and model-like objects.
- Modified names of data frame produced by `do()` so that hyphens ('-') are turned into dots ('.').
- Improvements to `fetchData()`.
We are still in beta, but we hope things are beginning to stabilize as we settle on syntax and coding idioms for the package. Here are some of the key updates since 0.4:
- removed dependency on RCurl since it caused installation problems for some PC users. (Code requiring RCurl now checks at run time whether the package is available.)
- further improvements to formula interfaces to common functions. The conditional | now works in more situations and & has been replaced by + so that formulas look more like the formulas used in `lm()` and its cousins.
- inclusion of the datasets from the Carnegie Mellon University Online Learning Initiative Statistics modules. These are in alpha form and some additional data cleaning and renaming may happen in the near future.
- `makeFun()` now has methods for glm and nls objects.
- `D()` improved to use symbolic differentiation in more cases and allow pass through to `stats::D()` when that makes sense. This allows functions like deltaMethod() from the car package to work properly even when the mosaic package is loaded.
- The API for `antiD()` has been modified somewhat. This may go through another revision if/when we add in symbolic differentiation, but we think we are now close to the end state.
- The HELP dataset has been replaced by the HELPrct dataset, and the former will be deprecated in the next release.
- The CPS data set has been renamed CPS85.
- `fitSpline()` and `fitModel()` have been added as wrappers around linear models using ns(), bs(), and nls(). Each of these returns the model fit as a function.
- improvements to the vignettes.
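
As a sketch of what the `makeFun()` method for glm objects provides, the example below fits a logistic regression to the built-in `mtcars` data and turns the fit into an ordinary R function. The choice of model is arbitrary. The comparison with `predict(..., type = "response")` rests on the assumption that the returned function reports predictions on the response scale by default.

```r
library(mosaic)

# Logistic regression of transmission type on weight.
mod <- glm(am ~ wt, data = mtcars, family = binomial)

# Turn the fitted model into a function of the explanatory variable.
f <- makeFun(mod)

# Predicted probability for a car with wt = 3
# (response scale assumed; see the lead-in above).
f(wt = 3)
predict(mod, newdata = data.frame(wt = 3), type = "response")
```

Having the fit available as a function makes it easy to evaluate or plot the model without building a `newdata` data frame by hand.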
- renamed mtable() to tally(), added new functionality
- reimplemented D() and antiD()
- improvements to statTally()
- new confint() functionality
- makeFun() and plotFun() interface to plotting using formulas
- added new vignette on Teaching Calculus using R
- added new vignette on Resampling-Based Inference using R
- changed the default behavior of the na.rm option in aggregating functions so that it follows the usual behavior unless a formula is given as an argument