I started this project because manipulating data and making plots in R can be frustrating. Sometimes you just can't get exactly what you want, even after many attempts. The Internet is a great place to look for help, through Google and Stack Overflow for example, but you often end up with general guidance rather than an exact solution. Furthermore, if you don't record the problem and its solution properly, you may face the same struggle again later, which is very inefficient. Therefore, I wrote this package and plan to keep updating it with functions for processing and plotting data in R. I will also keep writing and updating blog posts about using this package to solve real data analysis problems.

The package currently assembles functions from R packages commonly used in exploratory data analysis into a toolbox that is easy to use, even for beginners in this field. It deliberately emphasizes business analysis, so it is especially useful for cross-sectional or panel data analysis. The package tries to balance simplicity and flexibility: most functions work with minimal setup while offering further control parameters to customize the result.
The functions in this package fall into three categories:
- Data Processing Functions: convert between numerical and categorical data, impute missing values, and get special summary information.
- Data Visualization Functions: commonly used plots for showing patterns in data.
- Data Analysis Functions: functions for basic data analysis.
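
To make the data processing category concrete, here is a minimal sketch of the three kinds of conversion it covers, written in base R only. It does not call the package's functions and is not their implementation; the data frame `df` and its columns are made up for illustration.

```r
# Base R only: an illustration of the ideas, not the package's own API.
# The data frame and its column names are hypothetical.
df <- data.frame(
  age    = c(23, 37, 52, 68),
  size   = factor(c("small", "large", "medium", "small"),
                  levels = c("small", "medium", "large"), ordered = TRUE),
  region = factor(c("north", "south", "east", "north"))
)

# Numerical -> categorical: bin a numeric column into labelled intervals.
df$age_group <- cut(df$age, breaks = c(0, 30, 60, Inf),
                    labels = c("young", "middle", "senior"))

# Ordinal categorical -> numerical: use the position of each ordered level.
df$size_num <- as.integer(df$size)

# Nominal categorical -> numerical: one 0/1 dummy column per level.
region_dummies <- model.matrix(~ region - 1, data = df)
```

The ordinal conversion only makes sense because `size` was created with an explicit level order; for nominal variables such as `region`, dummy columns avoid implying an order that does not exist.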
The following table summarizes the functions available in the package and what each one does. It will be updated as the package evolves.
Function | Description |
---|---|
Data Processing Functions | |
column_class() | Separate features into categorical and numerical |
num2ctg() | Numerical variable to categorical variable |
ord_ctg2num() | Ordinal categorical variable to numerical variable |
nom_ctg2num() | Nominal categorical variable to numerical variables using dummy variables |
impute_missing() | Impute missing values with mice |
Data Visualization Functions | |
bar_plot() | Stacked bar plot for multiple categories |
bubble_plot() | Bubble plot with color and size showing more information |
corr_check() | Pairs plot to check correlation between variables |
distribution_plot() | Three plotting types to show distribution |
donut_plot() | Donut plot to show percentage |
double_axis() | Combine bar plot and line plot with double axis |
facet_bar() | Separate bar plots in small facets for multiple categories |
horizontal_bar() | Horizontal bar plot showing percentage |
label_bar_plot() | Bar plot with a small label |
lines_plot() | Basic line plot with options |
lines_split_plot() | Use subplots to show series with different ranges |
line_ann_plot() | Line plot with annotations at turning points |
polar_cahrts() | Draw a radar plot |
rank_plot() | Rank change plot of categories over index |
Data Analysis Functions | |
cal_pct() | Calculate percentage of each category |
get_rank() | Get rank of each category |
lin_predict() | Linear extrapolation with trend DLM |
plm_basic() | Stepwise model selection for plm |
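
As a rough idea of what "percentage of each category" and "rank of each category" mean in the table above, the following base R sketch computes both for a small made-up vector. It is an illustration under that assumption only; it does not show how `cal_pct()` or `get_rank()` are called or implemented.

```r
# Base R only: hypothetical data, not the package's own API.
category <- c("A", "B", "A", "C", "A", "B")

# Count the observations in each category.
counts <- table(category)

# Percentage of each category, rounded to one decimal place.
pct <- round(100 * prop.table(counts), 1)

# Rank of each category by count, with 1 for the most frequent.
ranks <- rank(-as.vector(counts), ties.method = "min")
names(ranks) <- names(counts)
```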