0
Follow
2
View

I am new to coding in R and not able to create a loop for the problem

dl04061227 注册会员
2023-01-24 18:51

Possible alternative to using a loop with tidyverse:

library(tidyverse)

df %>%
  mutate(across(everything(), ~na_if(.x, 999999))) %>%
  pivot_longer(-c(`Policy No`, `Amount`), names_to = NULL, values_drop_na = TRUE)

   `Policy No` Amount  value
             
 1        1001    100 500001
 2        1001    100 500002
 3        1001    100 500003
 4        1002    200 600001
 5        1002    200 600002
 6        1003    300 700001
 7        1003    300 700002
 8        1003    300 700003
 9        1003    300 700004
10        1003    300 700005

Or if you want to stick with the for loop, then you could do something like this:

output <- NULL

for (i in names(df[,-c(1:2)])) {
  x <- df[which(df[,i] != 999999), c("Policy No", "Amount",i)]
  names(x)[3] <- "Info"
  output <- rbind(output, x)
}

df <- structure(list(`Policy No` = 1001:1003, Amount = c(100L, 200L, 
300L), `Info (1)` = c(500001L, 600001L, 700001L), `Info (2)` = c(500002L, 
600002L, 700002L), `Info (3)` = c(500003L, 999999L, 700003L), 
    `Info (4)` = c(999999L, 999999L, 700004L), `Info (5)` = c(999999L, 
    999999L, 700005L)), class = "data.frame", row.names = c(NA, 
-3L))
dongsheng9 注册会员
2023-01-24 18:51

Pivot longer and then remove all 999999 values.

library(dplyr)

df %>%
    pivot_longer(cols = starts_with("Info"), names_to = "Info") %>%
    filter(!Info == 999999)
duck125 注册会员
2023-01-24 18:51

There are certain circumstances where we need to rotate the data from a wide format to a long format. I had a similar issue like this in some of my code but in reverse. Here is the question I posted. This also provides other options that can be used to solve this problem.

R- How to put multiple cases (rows) in one row

I would use reshape. Here is a link to the r documentation.
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/reshape

Here is the data.

policyno <- c (1001, 1002, 1003)
amount <- c (100, 200, 300)
info1 <- c (500001, 600001, 700001)
info2 <- c (500002, 600002, 700002)
info3 <- c (500003, 999999, 700003)
info4 <- c (999999, 999999, 700004)
info5 <- c (999999, 999999, 700005)
data <- data.frame (policyno, amount, info1, info2, info3, info4, info5)
data

There are a few things we need to input for reshape to work. We need the name of the data frame (data), the id variable (policyno). We also need the name of the new variable we want to create (info) and a list of the variables which make up this new variable (info1, info2, info3, info4, info5). For generating this list of variables, there are a few ways to do this. I have seen posts where we list the row number. If we have a very small dataset and the variables (and their row numbers stay the same), then this method is fine. If you are working with a very large dataset where the row numbers might change depending upon how you are subsetting the data, then we can have R identify the row numbers for us. Which and calnames can be used for this purpose.

data2 <- reshape (data, 
                  idvar = "policyno", 
                  v.names = c ("info"),
                  varying = list (c (which(colnames(data) == "info1"),
                                    which (colnames(data) == "info2"),
                                    which(colnames(data) == "info3"),
                                    which(colnames(data) == "info4"),
                                    which(colnames(data) == "info5")
                                    )
                                  ),
                  direction = "long")
data2

This rotates the data but doesn't remove the 999999 cases. To remove the 999999, we can use subset. As I minor note, this reshape created a variable labeled time which we can remove using select.

data3 <- subset (data2, 
                 select = -c (time),
                 info != 999999)
data3

Again, this is just one way that can be used to solve the problem. Stackoverflow has a lot of posts about reshape and transpose that are very useful.