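Introduction
dplyr is a widely used R package for data cleaning and manipulation. Its main functions are:
- select() picks columns from the data
- filter() keeps the subset of rows that match a condition
- group_by() groups the data by one or more variables
- summarise() summarises the data (computes summary statistics)
- arrange() sorts the rows
- mutate() creates new variables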
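As a rough illustration of how these verbs combine, the sketch below chains all six on a small made-up table. The `sales` data, its column names, and the threshold of 15 are hypothetical and chosen only for this example.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
sales <- tibble(
  region = c("north", "north", "south", "south"),
  units  = c(10, 20, 5, 15),
  price  = c(2, 3, 4, 1)
)

result <- sales %>%
  mutate(revenue = units * price) %>%   # create a new variable
  filter(revenue > 15) %>%              # keep only rows matching the condition
  select(region, revenue) %>%           # keep only the columns needed
  group_by(region) %>%                  # group rows by region
  summarise(total = sum(revenue)) %>%   # one summary row per group
  arrange(desc(total))                  # sort by total, largest first

result
```

Each verb takes a data frame and returns a data frame, which is why the steps can be piped together with `%>%`.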
Using mutate()
mutate(df, new_variable = <expression using existing_var>, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL)
Arguments:
- df: the data frame to modify.
- new_variable: the name of the new variable.
- .keep: an experimental argument that controls which columns of the input data are retained in the output:
  - "all", the default, retains all variables.
  - "used" keeps only the variables used to make new variables; it is useful for checking your work because it displays inputs and outputs side by side.
  - "unused" keeps only the existing variables that were not used to make new variables.
  - "none" keeps only the grouping keys (like transmute()).
  Grouping variables are always kept, regardless of .keep.
- .before, .after: optionally control where the new columns appear (by default they are added on the right-hand side).
Examples
library(dplyr)  # provides tibble(), mutate() and %>%

# By default, new columns are placed on the far right.
# Experimental: you can override with `.before` or `.after`
df <- tibble(x = 1, y = 2)
df %>% mutate(z = x + y)
# # A tibble: 1 x 3
#       x     y     z
#   <dbl> <dbl> <dbl>
# 1     1     2     3
df %>% mutate(z = x + y, .before = 1)
# # A tibble: 1 x 3
#       z     x     y
#   <dbl> <dbl> <dbl>
# 1     3     1     2
df %>% mutate(z = x + y, .after = x)
# # A tibble: 1 x 3
#       x     z     y
#   <dbl> <dbl> <dbl>
# 1     1     3     2
# By default, mutate() keeps all columns from the input data.
# Experimental: you can override with `.keep`
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df %>% mutate(z = x + y, .keep = "all") # the default
# # A tibble: 1 x 5
#       x     y a     b         z
#   <dbl> <dbl> <chr> <chr> <dbl>
# 1     1     2 a     b         3
df %>% mutate(z = x + y, .keep = "used")
# # A tibble: 1 x 3
#       x     y     z
#   <dbl> <dbl> <dbl>
# 1     1     2     3
df %>% mutate(z = x + y, .keep = "unused")
# # A tibble: 1 x 3
#   a     b         z
#   <chr> <chr> <dbl>
# 1 a     b         3
df %>% mutate(z = x + y, .keep = "none") # same as transmute()
# # A tibble: 1 x 1
#       z
#   <dbl>
# 1     3
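The rule that grouping variables are always kept is not exercised by the examples above, so here is a minimal sketch of it. The data frame and the column names `g`, `x`, and `y` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data for illustration
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

grouped <- df %>%
  group_by(g) %>%                     # g becomes a grouping variable
  mutate(z = x + y, .keep = "none")   # drop every column that is not new

# g is retained because it is a grouping key, even with .keep = "none";
# x and y are dropped
names(grouped)
```

Calling ungroup() before mutate() would remove that protection, so only z would remain.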