How To Convert Column Data To Row Data In R
Reshaping Your Data with tidyr
Although many fundamental data processing functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to easily flow together. This leads to difficult-to-read nested functions and/or choppy code. RStudio is driving a lot of new packages to collate data management tasks and better integrate them with other analysis activities. As a result, a lot of data processing tasks are becoming packaged in more cohesive and consistent ways, which leads to:
- More efficient code
- Easier to remember syntax
- Easier to read syntax
tidyr is one such package, built for the sole purpose of simplifying the process of creating tidy data. This tutorial provides you with a basic understanding of the four fundamental data tidying functions that tidyr provides:
- gather() makes "wide" data longer
- spread() makes "long" data wider
- separate() splits a single column into multiple columns
- unite() combines multiple columns into a single column
Additional resources are listed at the end of the tutorial.
Packages Utilized
install.packages("tidyr")
library(tidyr)
%>% Operator
Although not required, the tidyr and dplyr packages make use of the pipe operator %>% developed by Stefan Milton Bache in the R package magrittr. Although all the functions in tidyr and dplyr can be used without the pipe operator, one of the great conveniences these packages provide is the ability to string multiple functions together by incorporating %>%.
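This operator will forward a value, or the result of an expression, into the next function call/expression. For instance, a function to filter data can be written as filter(data, variable == numeric_value) or, equivalently, as data %>% filter(variable == numeric_value).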
Both functions complete the same task and the benefit of using %>% is not evident; however, when you want to perform multiple functions its advantage becomes obvious. For more info check out the %>% tutorial.
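As a minimal sketch of the point above, the two call styles can be compared on R's built-in mtcars data set. The choice of mtcars, the cyl == 4 condition, and the use of dplyr (assumed to be installed) are illustrative assumptions, not part of the original tutorial.

```r
library(dplyr)  # assumption: dplyr is installed; it provides filter() and %>%

# Nested/function-call style
nested <- filter(mtcars, cyl == 4)

# Piped style: mtcars is forwarded as the first argument of filter()
piped <- mtcars %>% filter(cyl == 4)

# Both styles return the same rows
identical(nested, piped)

# The pipe pays off once several steps are chained together
mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, hp) %>%
  head(3)
```

Reading the chained version top to bottom mirrors the order in which the operations run, which is the readability gain the pipe is meant to offer.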
gather( ) function:
Objective: Reshaping wide format to long format
Description: There are times when our data is considered unstacked and a common attribute of concern is spread out across columns. To reformat the data such that these common attributes are gathered together as a single variable, the gather() function will take multiple columns and collapse them into key-value pairs, duplicating all other columns as needed.
Function:   gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
Same as:    data %>% gather(key, value, ..., na.rm = FALSE, convert = FALSE)
Arguments:
  data:     data frame
  key:      column name representing new variable
  value:    column name representing variable values
  ...:      names of columns to gather (or not gather)
  na.rm:    option to remove observations with missing values (represented by NAs)
  convert:  if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate
☛ This function is a complement to spread()
Example
We'll start with the following data set:
## Source: local data frame [12 x 6]
##
##    Group Year Qtr.1 Qtr.2 Qtr.3 Qtr.4
## 1      1 2006    15    16    19    17
## 2      1 2007    12    13    27    23
## 3      1 2008    22    22    24    20
## 4      1 2009    10    14    20    16
## 5      2 2006    12    13    25    18
## 6      2 2007    16    14    21    19
## 7      2 2008    13    11    29    15
## 8      2 2009    23    20    26    20
## 9      3 2006    11    12    22    16
## 10     3 2007    13    11    27    21
## 11     3 2008    17    12    23    19
## 12     3 2009    14     9    31    24
This data is considered wide since the time variable (represented as quarters) is structured such that each quarter represents a variable. To re-structure the time component as an individual variable, we can gather each quarter within one column variable and also gather the values associated with each quarter in a second column variable.
long_DF <- DF %>% gather(Quarter, Revenue, Qtr.1:Qtr.4)
head(long_DF, 24)  # note: for brevity, only the first 10 rows are shown
## Source: local data frame [24 x 4]
##
##    Group Year Quarter Revenue
## 1      1 2006   Qtr.1      15
## 2      1 2007   Qtr.1      12
## 3      1 2008   Qtr.1      22
## 4      1 2009   Qtr.1      10
## 5      2 2006   Qtr.1      12
## 6      2 2007   Qtr.1      16
## 7      2 2008   Qtr.1      13
## 8      2 2009   Qtr.1      23
## 9      3 2006   Qtr.1      11
## 10     3 2007   Qtr.1      13
## ..   ...  ...     ...     ...
These all produce the same results:
DF %>% gather(Quarter, Revenue, Qtr.1:Qtr.4)
DF %>% gather(Quarter, Revenue, -Group, -Year)
DF %>% gather(Quarter, Revenue, 3:6)
DF %>% gather(Quarter, Revenue, Qtr.1, Qtr.2, Qtr.3, Qtr.4)
Also note that if you do not supply arguments for na.rm or convert, then the default values are used.
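The tutorial does not show how DF was created, so the following is a minimal sketch that rebuilds the sample table by hand with data.frame() (an assumption; the printed header suggests the original was a dplyr table) and then checks that two of the column specifications listed above agree.

```r
library(tidyr)

# Assumption: DF is rebuilt by hand from the printed table above
DF <- data.frame(
  Group = rep(1:3, each = 4),
  Year  = rep(2006:2009, times = 3),
  Qtr.1 = c(15, 12, 22, 10, 12, 16, 13, 23, 11, 13, 17, 14),
  Qtr.2 = c(16, 13, 22, 14, 13, 14, 11, 20, 12, 11, 12, 9),
  Qtr.3 = c(19, 27, 24, 20, 25, 21, 29, 26, 22, 27, 23, 31),
  Qtr.4 = c(17, 23, 20, 16, 18, 19, 15, 20, 16, 21, 19, 24)
)

# Collapse the four quarter columns into a Quarter/Revenue key-value pair
long_DF <- gather(DF, Quarter, Revenue, Qtr.1:Qtr.4)
dim(long_DF)  # 12 rows x 4 quarters = 48 rows, 4 columns

# Selecting by exclusion gathers the same columns
long_DF2 <- gather(DF, Quarter, Revenue, -Group, -Year)
isTRUE(all.equal(long_DF, long_DF2))
```

In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(); gather() still works, but new code may prefer the newer function.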
separate( ) function:
Objective: Splitting a single variable into two
Description: Many times a single column variable will capture multiple variables, or even parts of a variable you just don't care about. Some examples include:
##   Grp_Ind    Yr_Mo       City_State        First_Last Extra_variable
## 1     1.a 2006_Jan      Dayton (OH) George Washington   XX01person_1
## 2     1.b 2006_Feb Grand Forks (ND)        John Adams   XX02person_2
## 3     1.c 2006_Mar       Fargo (ND)  Thomas Jefferson   XX03person_3
## 4     2.a 2007_Jan   Rochester (MN)     James Madison   XX04person_4
## 5     2.b 2007_Feb     Dubuque (IA)      James Monroe   XX05person_5
## 6     2.c 2007_Mar Ft. Collins (CO)        John Adams   XX06person_6
## 7     3.a 2008_Jan   Lake City (MN)    Andrew Jackson   XX07person_7
## 8     3.b 2008_Feb    Rushford (MN)  Martin Van Buren   XX08person_8
## 9     3.c 2008_Mar          Unknown  William Harrison   XX09person_9
In each of these cases, our objective may be to separate characters within the variable string. This can be accomplished using the separate() function, which turns a single character column into multiple columns.
Function:   separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
Same as:    data %>% separate(col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
Arguments:
  data:     data frame
  col:      column name representing current variable
  into:     names of variables representing new variables
  sep:      how to separate current variable (char, num, or symbol); by default, splits on any run of non-alphanumeric characters
  remove:   if TRUE, remove input column from output data frame
  convert:  if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate
☛ This function is a complement to unite()
Example
We can go back to the long_DF dataframe we created above, in which we may want to clean up or separate the Quarter variable.
## Source: local data frame [6 x 4]
##
##   Group Year Quarter Revenue
## 1     1 2006   Qtr.1      15
## 2     1 2007   Qtr.1      12
## 3     1 2008   Qtr.1      22
## 4     1 2009   Qtr.1      10
## 5     2 2006   Qtr.1      12
## 6     2 2007   Qtr.1      16
By applying the separate() function we get the following:
separate_DF <- long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"))
head(separate_DF, 10)
## Source: local data frame [10 x 5]
##
##    Group Year Time_Interval Interval_ID Revenue
## 1      1 2006           Qtr           1      15
## 2      1 2007           Qtr           1      12
## 3      1 2008           Qtr           1      22
## 4      1 2009           Qtr           1      10
## 5      2 2006           Qtr           1      12
## 6      2 2007           Qtr           1      16
## 7      2 2008           Qtr           1      13
## 8      2 2009           Qtr           1      23
## 9      3 2006           Qtr           1      11
## 10     3 2007           Qtr           1      13
These produce the same results:
long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"))
long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"), sep = "\\.")
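The equivalence above, and the effect of the convert argument, can be checked with a small self-contained sketch. The four-row long_DF below is a hand-built subset of the tutorial's data, and the dates data frame is a hypothetical stand-in for the Yr_Mo column shown earlier; both are assumptions made for illustration.

```r
library(tidyr)

# Assumption: a hand-built four-row subset standing in for long_DF
long_DF <- data.frame(
  Group   = c(1, 1, 2, 2),
  Year    = c(2006, 2007, 2006, 2007),
  Quarter = c("Qtr.1", "Qtr.1", "Qtr.2", "Qtr.2"),
  Revenue = c(15, 12, 13, 14),
  stringsAsFactors = FALSE
)

# Default sep splits on any run of non-alphanumeric characters (here, the ".")
by_default <- separate(long_DF, Quarter, c("Time_Interval", "Interval_ID"))

# Explicit sep: the "." must be escaped because sep is a regular expression
by_regex <- separate(long_DF, Quarter, c("Time_Interval", "Interval_ID"),
                     sep = "\\.")
identical(by_default, by_regex)

# convert = TRUE turns the character "1"/"2" pieces into integers
converted <- separate(long_DF, Quarter, c("Time_Interval", "Interval_ID"),
                      convert = TRUE)
class(converted$Interval_ID)

# Hypothetical example in the style of the Yr_Mo column
dates <- data.frame(Yr_Mo = c("2006_Jan", "2006_Feb"), stringsAsFactors = FALSE)
split_dates <- separate(dates, Yr_Mo, c("Year", "Month"), sep = "_")
split_dates
```

Without convert = TRUE the new columns stay character, which matters if Interval_ID is later used in arithmetic or sorting.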
unite( ) function:
Objective: Merging two variables into one
Description: There may be a time in which we would like to combine the values of two variables. The unite() function is a convenience function to paste together multiple variable values into one. In essence, it combines two variables of a single observation into one variable.
Function:   unite(data, col, ..., sep = "_", remove = TRUE)
Same as:    data %>% unite(col, ..., sep = "_", remove = TRUE)
Arguments:
  data:     data frame
  col:      column name of new "merged" column
  ...:      names of columns to merge
  sep:      separator to use between merged values
  remove:   if TRUE, remove input column from output data frame
☛ This function is a complement to separate()
Example
Using the separate_DF dataframe we created above, we can re-unite the Time_Interval and Interval_ID variables we created and re-create the original Quarter variable we had in the long_DF dataframe.
unite_DF <- separate_DF %>% unite(Quarter, Time_Interval, Interval_ID, sep = ".")
head(unite_DF, 10)
## Source: local data frame [10 x 4]
##
##    Group Year Quarter Revenue
## 1      1 2006   Qtr.1      15
## 2      1 2007   Qtr.1      12
## 3      1 2008   Qtr.1      22
## 4      1 2009   Qtr.1      10
## 5      2 2006   Qtr.1      12
## 6      2 2007   Qtr.1      16
## 7      2 2008   Qtr.1      13
## 8      2 2009   Qtr.1      23
## 9      3 2006   Qtr.1      11
## 10     3 2007   Qtr.1      13
These produce the same results:
separate_DF %>% unite(Quarter, Time_Interval, Interval_ID, sep = "_")
separate_DF %>% unite(Quarter, Time_Interval, Interval_ID)
# If no separator is identified, "_" will automatically be used
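A short sketch of how the sep and remove arguments behave, using a hand-built two-row stand-in for separate_DF (an assumption, so the block runs on its own):

```r
library(tidyr)

# Assumption: a hand-built two-row stand-in for separate_DF
separate_DF <- data.frame(
  Group         = c(1, 1),
  Year          = c(2006, 2007),
  Time_Interval = c("Qtr", "Qtr"),
  Interval_ID   = c("1", "1"),
  Revenue       = c(15, 12),
  stringsAsFactors = FALSE
)

# An explicit "." separator re-creates the original Quarter labels
dotted <- unite(separate_DF, Quarter, Time_Interval, Interval_ID, sep = ".")
dotted$Quarter      # "Qtr.1" "Qtr.1"

# With no sep supplied, "_" is used
underscored <- unite(separate_DF, Quarter, Time_Interval, Interval_ID)
underscored$Quarter # "Qtr_1" "Qtr_1"

# remove = FALSE keeps the input columns alongside the new one
kept <- unite(separate_DF, Quarter, Time_Interval, Interval_ID,
              sep = ".", remove = FALSE)
names(kept)
```

Note that unite() treats sep as a literal string, so "." needs no escaping here, unlike the regular-expression sep used by separate().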
spread( ) function:
Objective: Reshaping long format to wide format
Description: There are times when we are required to turn long formatted data into wide formatted data. The spread() function spreads a key-value pair across multiple columns.
Function:   spread(data, key, value, fill = NA, convert = FALSE)
Same as:    data %>% spread(key, value, fill = NA, convert = FALSE)
Arguments:
  data:     data frame
  key:      column values to convert to multiple columns
  value:    single column values to convert to multiple columns' values
  fill:     if there isn't a value for every combination of the other variables and the key column, this value will be substituted
  convert:  if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate
☛ This function is a complement to gather()
Example
wide_DF <- unite_DF %>% spread(Quarter, Revenue)
head(wide_DF, 24)
## Source: local data frame [12 x 6]
##
##    Group Year Qtr.1 Qtr.2 Qtr.3 Qtr.4
## 1      1 2006    15    16    19    17
## 2      1 2007    12    13    27    23
## 3      1 2008    22    22    24    20
## 4      1 2009    10    14    20    16
## 5      2 2006    12    13    25    18
## 6      2 2007    16    14    21    19
## 7      2 2008    13    11    29    15
## 8      2 2009    23    20    26    20
## 9      3 2006    11    12    22    16
## 10     3 2007    13    11    27    21
## 11     3 2008    17    12    23    19
## 12     3 2009    14     9    31    24
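The fill argument is described above but not demonstrated, so here is a minimal sketch of it. The four-row long_DF is a hand-built subset of the tutorial's data, and the incomplete data frame (one observation deliberately dropped) is a hypothetical case added for illustration.

```r
library(tidyr)

# Assumption: a hand-built four-row subset in long format
long_DF <- data.frame(
  Group   = c(1, 1, 1, 1),
  Year    = c(2006, 2006, 2007, 2007),
  Quarter = c("Qtr.1", "Qtr.2", "Qtr.1", "Qtr.2"),
  Revenue = c(15, 16, 12, 13),
  stringsAsFactors = FALSE
)

# Each distinct Quarter value becomes its own column
wide_DF <- spread(long_DF, Quarter, Revenue)
wide_DF

# Hypothetical incomplete data: drop the 2007 / Qtr.2 observation
incomplete <- long_DF[-4, ]

# By default the missing combination is filled with NA ...
with_na <- spread(incomplete, Quarter, Revenue)

# ... or with a value of your choosing
with_zero <- spread(incomplete, Quarter, Revenue, fill = 0)
with_zero
```

In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(); spread() still works, but new code may prefer the newer function.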
Additional Resources
- Data wrangling presentation I gave at Miami University
- RStudio's Data wrangling with R and RStudio webinar
- RStudio's Data wrangling GitHub repository
- RStudio's Data wrangling cheat sheet
- Hadley Wickham's paper on Tidy Information
Source: https://uc-r.github.io/tidyr
Posted by: palmisanosciallsolle.blogspot.com