Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

Question

I have the following 2 data.frames:

a1 <- data.frame(a = 1:5, b=letters[1:5])

a2 <- data.frame(a = 1:3, b=letters[1:3])

I need to find the rows that a1 has and a2 doesn't.

Is there a built-in function for this kind of operation?

(P.S. I wrote a solution for it; I am simply curious whether someone has already written more polished code.)

Here is my answer:

a1 <- data.frame(a = 1:5, b=letters[1:5])

a2 <- data.frame(a = 1:3, b=letters[1:3])



rows.in.a1.that.are.not.in.a2  <- function(a1,a2)

{

    a1.vec <- apply(a1, 1, paste, collapse = "")

    a2.vec <- apply(a2, 1, paste, collapse = "")

    a1.without.a2.rows <- a1[!a1.vec %in% a2.vec,]

    return(a1.without.a2.rows)

}

rows.in.a1.that.are.not.in.a2(a1,a2)

merge dataframe r-language

3 years ago by vishaljlf39

3 Answers

**espadacoder11** · Answer 1 · 2021-07-07T13:10:40

This doesn't answer your question directly, but it will give you the elements that are in common. This can be done with Paul Murrell's package compare:



library(compare)

a1 <- data.frame(a = 1:5, b = letters[1:5])

a2 <- data.frame(a = 1:3, b = letters[1:3])

comparison <- compare(a1,a2,allowAll=TRUE)

comparison$tM

#  a b

#1 1 a

#2 2 b

#3 3 c

The function compare gives you a lot of flexibility in terms of what kind of comparisons are allowed (e.g. changing order of elements of each vector, changing order and names of variables, shortening variables, changing case of strings). From this, you should be able to figure out what was missing from one or the other. For example (this is not very elegant):



difference <-

   data.frame(lapply(1:ncol(a1),function(i)setdiff(a1[,i],comparison$tM[,i])))

colnames(difference) <- colnames(a1)

difference

#  a b

#1 4 d

#2 5 e

**sandhya6gczb** · Answer 2 · 2021-07-08T14:19:35

sqldf provides a nice solution



a1 <- data.frame(a = 1:5, b=letters[1:5])

a2 <- data.frame(a = 1:3, b=letters[1:3])

require(sqldf)



a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')

And the rows which are in both data frames:



a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')

dplyr has a function, anti_join, for exactly these kinds of comparisons:



require(dplyr) 

anti_join(a1,a2)

And semi_join keeps the rows in a1 that are also in a2.
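As a minimal sketch of the two joins side by side, assuming dplyr is installed: the `by` argument is spelled out here so that the match is on both columns explicitly rather than on whatever columns happen to be shared. The result names `only_in_a1` and `in_both` are illustrative only.

```r
library(dplyr)

a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# Rows of a1 that have no match in a2 (matching on both columns)
only_in_a1 <- anti_join(a1, a2, by = c("a", "b"))

# Rows of a1 that do have a match in a2
in_both <- semi_join(a1, a2, by = c("a", "b"))

only_in_a1
in_both
```

Both functions return only columns from a1, so nothing from a2 is carried into the result.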

**pankajshivnani123** · Answer 3 · 2021-07-09T12:55:52

It is certainly not efficient for this particular purpose, but what I often do in these situations is to insert indicator variables in each data.frame and then merge:

a1$included_a1 <- TRUE

a2$included_a2 <- TRUE

res <- merge(a1, a2, all=TRUE)

Missing values in included_a1 mark the rows that are missing from a1, and similarly for included_a2 and a2.

One problem with your solution is that the column orders must match. Another problem is that it is easy to imagine situations where two rows are coded as the same when in fact they are different. The advantage of using merge is that you get for free all the error checking that is necessary for a good solution.
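To get from the merged result to the rows the question asks for, one possible follow-up is to filter on the indicator column. This is only a sketch; the name `only_in_a1` is introduced here for illustration.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

a1$included_a1 <- TRUE
a2$included_a2 <- TRUE

# Full outer join on the shared columns a and b
res <- merge(a1, a2, all = TRUE)

# Rows that came only from a1 have NA in included_a2
only_in_a1 <- res[is.na(res$included_a2), c("a", "b")]
only_in_a1
```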
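A small made-up example of the second problem: pasting the values of a row together with no separator can map two different rows to the same string. The data frames `x` and `y` below are hypothetical and chosen only to trigger the collision.

```r
# Two different rows, (1, 11) and (11, 1), both paste to "111"
x <- data.frame(a = c(1, 11), b = c(11, 1))
y <- data.frame(a = 11, b = 1)

x_key <- apply(x, 1, paste, collapse = "")
y_key <- apply(y, 1, paste, collapse = "")

# The paste-based check finds nothing, although row (1, 11) is not in y
missed <- x[!x_key %in% y_key, ]

# merge compares column by column, so the row is found
y$in_y <- TRUE
res <- merge(x, y, all.x = TRUE)
found <- res[is.na(res$in_y), c("a", "b")]

nrow(missed)
found
```

Because merge matches on column names, it also does not depend on the two data frames listing their columns in the same order.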
