How To Convert Column Data To Row Data In R

Reshaping Your Data with tidyr

Although many fundamental data processing functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to flow together easily. This leads to difficult-to-read nested functions and/or choppy code. RStudio is driving a lot of new packages to collate data management tasks and better integrate them with other analysis activities. As a result, a lot of data processing tasks are becoming packaged in more cohesive and consistent ways, which leads to:

  • More efficient code
  • Easier to remember syntax
  • Easier to read syntax

tidyr is one such package, built for the sole purpose of simplifying the process of creating tidy data. This tutorial gives you a basic understanding of the four fundamental data tidying functions that tidyr provides:

  • gather() makes "wide" data longer
  • spread() makes "long" data wider
  • separate() splits a single column into multiple columns
  • unite() combines multiple columns into a single column
  • Additional Resources

Packages Utilized

    install.packages("tidyr")
    library(tidyr)

%>% Operator

Although not required, the tidyr and dplyr packages make use of the pipe operator %>% developed by Stefan Milton Bache in the R package magrittr. Although all the functions in tidyr and dplyr can be used without the pipe operator, one of the great conveniences these packages provide is the ability to string multiple functions together by incorporating %>%.

This operator forwards a value, or the result of an expression, into the next function call/expression. For instance, a function to filter data can be written as:

filter(data, variable == numeric_value)
or
data %>% filter(variable == numeric_value)

Both forms accomplish the same task, and the benefit of using %>% is not evident here; however, when you want to perform multiple functions in sequence, its advantage becomes obvious. For more info, check out the %>% tutorial.
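To make that advantage concrete, here is a minimal sketch comparing a nested call with the equivalent piped chain. The `scores` vector is made up for illustration. It assumes only that tidyr is installed; tidyr re-exports `%>%` from magrittr, so dplyr is not needed for this snippet.

```r
library(tidyr)  # tidyr re-exports the magrittr pipe %>%

scores <- c(4, 9, 16, 25)  # hypothetical example values

# Nested style: must be read from the inside out
nested <- round(mean(sqrt(scores)), 1)

# Piped style: each result is forwarded as the first argument of the next call
piped <- scores %>% sqrt() %>% mean() %>% round(1)

print(c(nested = nested, piped = piped))
```

Both expressions compute the same value; the piped version simply reads left to right in the order the steps happen.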

gather( ) function:

Objective: Reshaping wide format to long format

Description: There are times when our data is considered unstacked and a common attribute of concern is spread out across columns. To reformat the data so that these common attributes are gathered together as a single variable, the gather() function takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed.

gather() function

    Function:       gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
    Same as:        data %>% gather(key, value, ..., na.rm = FALSE, convert = FALSE)

    Arguments:
        data:       data frame
        key:        column name representing new variable
        value:      column name representing variable values
        ...:        names of columns to gather (or not gather)
        na.rm:      option to remove observations with missing values (represented by NAs)
        convert:    if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate

This function is a complement to spread()

Example

We'll start with the following data set:

    ## Source: local data frame [12 x 6]
    ##
    ##    Group Year Qtr.1 Qtr.2 Qtr.3 Qtr.4
    ## 1      1 2006    15    16    19    17
    ## 2      1 2007    12    13    27    23
    ## 3      1 2008    22    22    24    20
    ## 4      1 2009    10    14    20    16
    ## 5      2 2006    12    13    25    18
    ## 6      2 2007    16    14    21    19
    ## 7      2 2008    13    11    29    15
    ## 8      2 2009    23    20    26    20
    ## 9      3 2006    11    12    22    16
    ## 10     3 2007    13    11    27    21
    ## 11     3 2008    17    12    23    19
    ## 12     3 2009    14     9    31    24
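The tutorial does not show how DF was created. As a minimal sketch so the examples can be reproduced, the wide data set can be rebuilt from the printed values. This assumes a plain data.frame is acceptable; the "local data frame" header suggests the original was a dplyr tbl_df, which prints differently but works the same way with tidyr.

```r
# Rebuild the wide example data set from the values printed above.
# Assumption: a plain data.frame stands in for the original dplyr "local data frame".
DF <- data.frame(
  Group = rep(1:3, each = 4),
  Year  = rep(2006:2009, times = 3),
  Qtr.1 = c(15, 12, 22, 10, 12, 16, 13, 23, 11, 13, 17, 14),
  Qtr.2 = c(16, 13, 22, 14, 13, 14, 11, 20, 12, 11, 12,  9),
  Qtr.3 = c(19, 27, 24, 20, 25, 21, 29, 26, 22, 27, 23, 31),
  Qtr.4 = c(17, 23, 20, 16, 18, 19, 15, 20, 16, 21, 19, 24)
)

str(DF)  # 12 observations of 6 variables
```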

This data is considered wide since the time variable (represented as quarters) is structured such that each quarter represents a variable. To restructure the time component as an individual variable, we can gather each quarter into one column variable and also gather the values associated with each quarter into a second column variable.

    long_DF <- DF %>% gather(Quarter, Revenue, Qtr.1:Qtr.4)
    head(long_DF, 24)  # note: for brevity, only the first 10 rows are shown below

    ## Source: local data frame [24 x 4]
    ##
    ##    Group Year Quarter Revenue
    ## 1      1 2006   Qtr.1      15
    ## 2      1 2007   Qtr.1      12
    ## 3      1 2008   Qtr.1      22
    ## 4      1 2009   Qtr.1      10
    ## 5      2 2006   Qtr.1      12
    ## 6      2 2007   Qtr.1      16
    ## 7      2 2008   Qtr.1      13
    ## 8      2 2009   Qtr.1      23
    ## 9      3 2006   Qtr.1      11
    ## 10     3 2007   Qtr.1      13
    ## ..   ...  ...     ...     ...

These all produce the same results:

    DF %>% gather(Quarter, Revenue, Qtr.1:Qtr.4)
    DF %>% gather(Quarter, Revenue, -Group, -Year)
    DF %>% gather(Quarter, Revenue, 3:6)
    DF %>% gather(Quarter, Revenue, Qtr.1, Qtr.2, Qtr.3, Qtr.4)
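The claim that these four column selections are interchangeable can be checked directly. This sketch uses a hypothetical two-row subset of the DF data shown above (DF_small) so it runs on its own; it selects the quarter columns by range, by exclusion, by position and by name, then compares the results.

```r
library(tidyr)

# Hypothetical two-row subset of the DF data shown above
DF_small <- data.frame(
  Group = c(1, 1),
  Year  = c(2006, 2007),
  Qtr.1 = c(15, 12),
  Qtr.2 = c(16, 13),
  Qtr.3 = c(19, 27),
  Qtr.4 = c(17, 23)
)

by_range   <- DF_small %>% gather(Quarter, Revenue, Qtr.1:Qtr.4)
by_exclude <- DF_small %>% gather(Quarter, Revenue, -Group, -Year)
by_index   <- DF_small %>% gather(Quarter, Revenue, 3:6)
by_name    <- DF_small %>% gather(Quarter, Revenue, Qtr.1, Qtr.2, Qtr.3, Qtr.4)

# All four selections should give the same long data frame (2 rows x 4 quarters = 8 rows)
stopifnot(isTRUE(all.equal(by_range, by_exclude)),
          isTRUE(all.equal(by_range, by_index)),
          isTRUE(all.equal(by_range, by_name)))
nrow(by_range)
```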

Also note that if you do not supply arguments for na.rm or convert, the defaults are used.
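As a small illustration of the na.rm argument, the sketch below gathers a hypothetical wide data set (sales) that has one missing quarter, first with the default na.rm = FALSE and then with na.rm = TRUE.

```r
library(tidyr)

# Hypothetical wide data with one missing quarterly value
sales <- data.frame(
  Year  = c(2006, 2007),
  Qtr.1 = c(15, 12),
  Qtr.2 = c(16, NA)
)

# Default na.rm = FALSE keeps the row whose gathered value is NA
keep_na <- sales %>% gather(Quarter, Revenue, Qtr.1:Qtr.2)

# na.rm = TRUE drops rows whose gathered value is NA
drop_na <- sales %>% gather(Quarter, Revenue, Qtr.1:Qtr.2, na.rm = TRUE)

nrow(keep_na)  # 4
nrow(drop_na)  # 3
```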

separate( ) function:

Objective: Splitting a single variable into two

Description: Many times a single column variable will capture multiple variables, or even parts of a variable you just don't care about. Some examples include:

    ##   Grp_Ind    Yr_Mo       City_State        First_Last Extra_variable
    ## 1     1.a 2006_Jan      Dayton (OH) George Washington   XX01person_1
    ## 2     1.b 2006_Feb Grand Forks (ND)        John Adams   XX02person_2
    ## 3     1.c 2006_Mar       Fargo (ND)  Thomas Jefferson   XX03person_3
    ## 4     2.a 2007_Jan   Rochester (MN)     James Madison   XX04person_4
    ## 5     2.b 2007_Feb     Dubuque (IA)      James Monroe   XX05person_5
    ## 6     2.c 2007_Mar Ft. Collins (CO)        John Adams   XX06person_6
    ## 7     3.a 2008_Jan   Lake City (MN)    Andrew Jackson   XX07person_7
    ## 8     3.b 2008_Feb    Rushford (MN)  Martin Van Buren   XX08person_8
    ## 9     3.c 2008_Mar          Unknown  William Harrison   XX09person_9

In each of these cases, our objective may be to separate characters within the variable string. This can be accomplished using the separate() function, which turns a single character column into multiple columns.

    Function:       separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
    Same as:        data %>% separate(col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)

    Arguments:
        data:       data frame
        col:        column name representing current variable
        into:       names of variables representing new variables
        sep:        how to separate current variable: a regular expression (character) or
                    character positions to split at (numeric); the default splits on any
                    run of non-alphanumeric characters
        remove:     if TRUE, remove input column from output data frame
        convert:    if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate

This function is a complement to unite()
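The sep argument also accepts a number, in which case it is treated as a character position to split at rather than as a pattern. The sketch below applies this to two values from the Extra_variable column shown above (the data frame name extra and the new column names Code and Person are hypothetical), keeping the first four characters in one column and the rest in another.

```r
library(tidyr)

# Two values from the Extra_variable column shown above
extra <- data.frame(
  Extra_variable = c("XX01person_1", "XX02person_2"),
  stringsAsFactors = FALSE
)

# A numeric sep splits at a character position: characters 1-4, then the remainder
by_position <- extra %>%
  separate(Extra_variable, into = c("Code", "Person"), sep = 4)

by_position
```

Note that the underscore inside "person_1" is left alone here, because a numeric sep ignores the characters themselves.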

Example

We can go back to the long_DF dataframe we created above, in which we may want to clean up or separate the Quarter variable.

    ## Source: local data frame [6 x 4]
    ##
    ##   Group Year Quarter Revenue
    ## 1     1 2006   Qtr.1      15
    ## 2     1 2007   Qtr.1      12
    ## 3     1 2008   Qtr.1      22
    ## 4     1 2009   Qtr.1      10
    ## 5     2 2006   Qtr.1      12
    ## 6     2 2007   Qtr.1      16

By applying the separate() function we get the following:

    separate_DF <- long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"))
    head(separate_DF, 10)

    ## Source: local data frame [10 x 5]
    ##
    ##    Group Year Time_Interval Interval_ID Revenue
    ## 1      1 2006           Qtr           1      15
    ## 2      1 2007           Qtr           1      12
    ## 3      1 2008           Qtr           1      22
    ## 4      1 2009           Qtr           1      10
    ## 5      2 2006           Qtr           1      12
    ## 6      2 2007           Qtr           1      16
    ## 7      2 2008           Qtr           1      13
    ## 8      2 2009           Qtr           1      23
    ## 9      3 2006           Qtr           1      11
    ## 10     3 2007           Qtr           1      13

These produce the same results:

    long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"))
    long_DF %>% separate(Quarter, c("Time_Interval", "Interval_ID"), sep = "\\.")
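The same idea applies to the other example columns shown earlier. This sketch (the data frame name messy and the new column names are hypothetical) splits Grp_Ind on the period and Yr_Mo on the underscore, and sets convert = TRUE so numeric pieces such as the year come back as integers rather than character strings.

```r
library(tidyr)

# First three rows of the Grp_Ind and Yr_Mo example columns shown earlier
messy <- data.frame(
  Grp_Ind = c("1.a", "1.b", "1.c"),
  Yr_Mo   = c("2006_Jan", "2006_Feb", "2006_Mar"),
  stringsAsFactors = FALSE
)

tidy <- messy %>%
  # "\\." is a regular expression matching a literal period
  separate(Grp_Ind, into = c("Grp", "Ind"), sep = "\\.", convert = TRUE) %>%
  separate(Yr_Mo, into = c("Year", "Month"), sep = "_", convert = TRUE)

str(tidy)  # Grp and Year become integer; Ind and Month stay character
```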

unite( ) function:

Objective: Merging two variables into one

Description: There may be a time in which we would like to combine the values of two variables. The unite() function is a convenience function to paste together multiple variable values into one. In essence, it combines two variables of a single observation into one variable.

    Function:       unite(data, col, ..., sep = "_", remove = TRUE)
    Same as:        data %>% unite(col, ..., sep = "_", remove = TRUE)

    Arguments:
        data:       data frame
        col:        column name of new "merged" column
        ...:        names of columns to merge
        sep:        separator to use between merged values
        remove:     if TRUE, remove input columns from output data frame

This function is a complement to separate()

Example

Using the separate_DF dataframe we created above, we can re-unite the Time_Interval and Interval_ID variables and re-create the original Quarter variable we had in the long_DF dataframe.

    unite_DF <- separate_DF %>% unite(Quarter, Time_Interval, Interval_ID, sep = ".")
    head(unite_DF, 10)

    ## Source: local data frame [10 x 4]
    ##
    ##    Group Year Quarter Revenue
    ## 1      1 2006   Qtr.1      15
    ## 2      1 2007   Qtr.1      12
    ## 3      1 2008   Qtr.1      22
    ## 4      1 2009   Qtr.1      10
    ## 5      2 2006   Qtr.1      12
    ## 6      2 2007   Qtr.1      16
    ## 7      2 2008   Qtr.1      13
    ## 8      2 2009   Qtr.1      23
    ## 9      3 2006   Qtr.1      11
    ## 10     3 2007   Qtr.1      13

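These two calls produce the same result as each other (Quarter values such as Qtr_1 rather than Qtr.1), because "_" is the default separator: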

    separate_DF %>% unite(Quarter, Time_Interval, Interval_ID, sep = "_")
    separate_DF %>% unite(Quarter, Time_Interval, Interval_ID)  # If no separator is identified, "_" will automatically be used
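The sep and remove arguments can be seen side by side in the following sketch, which uses a small hypothetical data frame (parts) holding the two pieces of the quarter label. The first call relies on the default "_" separator; the second uses "." and sets remove = FALSE so the input columns are kept alongside the merged one.

```r
library(tidyr)

# Hypothetical data holding the two pieces of a quarter label
parts <- data.frame(
  Time_Interval = c("Qtr", "Qtr"),
  Interval_ID   = c(1, 2),
  stringsAsFactors = FALSE
)

# Default separator is "_"; input columns are dropped (remove = TRUE)
default_sep <- parts %>% unite(Quarter, Time_Interval, Interval_ID)

# Explicit "." separator, keeping the input columns
kept <- parts %>% unite(Quarter, Time_Interval, Interval_ID, sep = ".", remove = FALSE)

default_sep$Quarter  # "Qtr_1" "Qtr_2"
kept$Quarter         # "Qtr.1" "Qtr.2"
```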

spread( ) function:

Objective: Reshaping long format to broad format

Description: There are times when we are required to turn long formatted data into wide formatted data. The spread() function spreads a key-value pair across multiple columns.

    Function:       spread(data, key, value, fill = NA, convert = FALSE)
    Same as:        data %>% spread(key, value, fill = NA, convert = FALSE)

    Arguments:
        data:       data frame
        key:        column values to convert to multiple columns
        value:      single column values to convert to multiple columns' values
        fill:       if there isn't a value for every combination of the other variables and the key column, this value will be substituted
        convert:    if TRUE will automatically convert values to logical, integer, numeric, complex or factor as appropriate

This function is a complement to gather()

Example

    wide_DF <- unite_DF %>% spread(Quarter, Revenue)
    head(wide_DF, 24)

    ## Source: local data frame [12 x 6]
    ##
    ##    Group Year Qtr.1 Qtr.2 Qtr.3 Qtr.4
    ## 1      1 2006    15    16    19    17
    ## 2      1 2007    12    13    27    23
    ## 3      1 2008    22    22    24    20
    ## 4      1 2009    10    14    20    16
    ## 5      2 2006    12    13    25    18
    ## 6      2 2007    16    14    21    19
    ## 7      2 2008    13    11    29    15
    ## 8      2 2009    23    20    26    20
    ## 9      3 2006    11    12    22    16
    ## 10     3 2007    13    11    27    21
    ## 11     3 2008    17    12    23    19
    ## 12     3 2009    14     9    31    24
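The fill argument matters when the long data does not contain every key for every observation. In this sketch, a hypothetical long data frame (long) has no Qtr.2 record for Group 2; spreading it with the default fill leaves NA in that cell, while fill = 0 substitutes zero. (In tidyr 1.0.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(), but they continue to work as shown here.)

```r
library(tidyr)

# Hypothetical long data in which Group 2 has no Qtr.2 record
long <- data.frame(
  Group   = c(1, 1, 2),
  Quarter = c("Qtr.1", "Qtr.2", "Qtr.1"),
  Revenue = c(15, 16, 12),
  stringsAsFactors = FALSE
)

wide_na   <- long %>% spread(Quarter, Revenue)            # missing cell becomes NA
wide_zero <- long %>% spread(Quarter, Revenue, fill = 0)  # missing cell becomes 0

wide_na
wide_zero
```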

Additional Resources

  • Data wrangling presentation I gave at Miami University
  • RStudio's Data wrangling with R and RStudio webinar
  • RStudio's Data wrangling GitHub repository
  • RStudio's Data wrangling cheat sheet
  • Hadley Wickham's paper on Tidy Data

Source: https://uc-r.github.io/tidyr

Posted by: palmisanosciallsolle.blogspot.com
