
Aggregation in R programming

Updated: Apr 17, 2020

This post covers aggregate(), a function for summarising data frames that is handy for solving some fairly complex problems in R. It also covers merge(), which joins two separate data frames. We are going to play with both functions in this blog.

In this blog, we will learn the following:

1. What is Aggregation?

2. Why do we use Aggregation?

3. aggregate()

4. merge()

1. What is Aggregation?

In simple terms, aggregation means combining things, i.e. putting things together so that we can refer to them collectively. For example, think about your classroom. You can refer to the students individually, like zeeshan, danish, sufiyan etc., but it is easier to think of them collectively as your final year classroom, BEIT. It is also important to know that each member of an aggregation keeps its own properties. In other words, each student in the classroom is still a student. The process of combining them has not altered them in any way.

So, now we know what aggregation is. Let's understand why it is used.


2. Why do we use Aggregation?

As mentioned above, we use aggregation to refer to a group of items as a whole. That is its main use. But it also provides one other significant benefit: it simplifies access to the individual items, because we refer to them as part of the whole. For example, take an array (a collection of elements; in R the simplest one is a vector) in which every element is identified by an index. It is a powerful idea: the group can be referred to by its name, say X, and its members by their index, i.e. X[1], X[2] and so on. This is the main reason we use aggregation in R programming: it is easier to refer to a group of items and to apply functions like sum, mean and max to them.
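Here is a minimal sketch of that idea. The vector X and its values are made up purely for illustration:

```r
# A hypothetical vector of marks; the values are made up for illustration
X <- c(14, 15, 20)

# Refer to individual members by index (R counts from 1)
X[1]   # 14
X[2]   # 15

# Refer to the whole group by its name and summarise it
sum(X)   # 49
mean(X)  # 16.33333
max(X)   # 20
```

The same name, X, gives us both the group and, through its index, each member.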


Now that we know what aggregation is and why it is used in R programming, let's start with some examples of the aggregate and merge functions so that you can see more clearly how aggregation works in practice.


If you have read all my blogs up till now, you can jump directly to the example section.


Before getting started, I recommend you check out my previous blogs to understand the basics of R programming. They are not strictly necessary for understanding aggregation, but they will help if you are new to R programming and don't know its structure. Below are the links to my previous R programming articles.


3. aggregate()

The first thing we need to understand is the arguments of aggregate():

aggregate(y ~ x, data, FUN)

formula: a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where y is the numeric data to be split into groups according to the grouping variable x.

data: a data frame or list from which the variables in the formula should be taken.

FUN: a function that computes the summary statistic and can be applied to every subset of the data.

The by argument does not appear in the formula form. It belongs to the other form, aggregate(x, by, FUN), where by is a list of grouping variables.
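To make the arguments concrete, here is a small sketch that groups the same data in two ways, once with a formula and once with the by argument of the data frame form. The data frame scores and its columns are hypothetical and exist only for this illustration:

```r
# Hypothetical data: one grouping column and two numeric columns
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  y1    = c(1, 3, 5, 7),
  y2    = c(10, 20, 30, 40)
)

# Formula form: summarise y1 and y2 within each group
by_formula <- aggregate(cbind(y1, y2) ~ group, data = scores, FUN = mean)

# Data frame form: the same grouping expressed through `by`
by_list <- aggregate(scores[c("y1", "y2")],
                     by = list(group = scores$group),
                     FUN = mean)

by_formula
by_list
```

Both calls should give one row per group with the mean of y1 and y2, so which form you use is mostly a matter of taste.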

There are more arguments in the aggregate function, like simplify and drop, but for this article we are going to use only the few above so that it is easier to follow if you are new to R programming. In a later article I'll explain the aggregate function in more detail.


Let's take the example of my final year classroom, where we combine the students with their marks in the unit 1 and unit 2 exams, and label this group beit.



beit <- data.frame(student = c("zeeshan", "sufiyan", "danish", "sufiyan", "sameena", "sameena"),
                   marks = c(14, 15, 20, 16, 20, 18),
                   IAE = c(1, 2, 2, 1, 2, 1))

beit

aggregate(marks ~ student, beit, mean)

As you can see in the above code, we have created a variable called beit and stored in it a data frame with the columns student, marks and IAE (exam 1 or 2). We have then used the aggregate function to find the average marks of each student across IAE 1 and 2.

Output:

  student marks
1  danish  20.0
2 sameena  19.0
3 sufiyan  15.5
4 zeeshan  14.0

Here marks is y and student is x, which together form our formula argument. After that, beit is the data we made before, and mean is our FUN, which gives each student's average across IAE 1 and 2.
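If you want to convince yourself of what aggregate() is doing, you can check one group by hand. This is only a sketch: it rebuilds the beit data frame from this post, and the name avg is mine.

```r
# Rebuild the beit data frame from the example above
beit <- data.frame(
  student = c("zeeshan", "sufiyan", "danish", "sufiyan", "sameena", "sameena"),
  marks   = c(14, 15, 20, 16, 20, 18),
  IAE     = c(1, 2, 2, 1, 2, 1)
)

avg <- aggregate(marks ~ student, data = beit, FUN = mean)

# Check one group by hand: sufiyan scored 15 and 16
mean(beit$marks[beit$student == "sufiyan"])   # 15.5
avg$marks[avg$student == "sufiyan"]           # 15.5

# tapply() computes the same per-student averages as a named vector
tapply(beit$marks, beit$student, mean)
```

aggregate() returns a data frame, while tapply() returns a named vector, so aggregate() is usually the more convenient one when you want to keep working with tables.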









4. merge()

Now, let's make another table to demonstrate the merge function. We will make a data frame called subjects, which consists of students and their subjects. After that we will merge our beit and subjects tables together and see what the merge function gives us.



subjects <- data.frame(student = c("zeeshan", "sufiyan", "danish", "sameena"),
                       subject = c("DEVOPS", "ADBMS", "OS", "IOE"))
subjects

beitfinal <- merge(beit, subjects, by = "student")
beitfinal

Output:

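As you can see in the above output, we have created another data frame named subjects and merged it with beit, storing the result in a new variable, beitfinal. In the merge function we passed the data frame created earlier (beit) first, then the one created just now (subjects), and set the by argument to "student" so that rows are matched on that column.

This gives us a table in which every row of beit also carries that student's subject.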
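A quick way to inspect what merge() produced is to look at the column names and the number of rows. The sketch below rebuilds both data frames from this post so that it runs on its own:

```r
# Rebuild the two data frames from the examples above
beit <- data.frame(
  student = c("zeeshan", "sufiyan", "danish", "sufiyan", "sameena", "sameena"),
  marks   = c(14, 15, 20, 16, 20, 18),
  IAE     = c(1, 2, 2, 1, 2, 1)
)
subjects <- data.frame(
  student = c("zeeshan", "sufiyan", "danish", "sameena"),
  subject = c("DEVOPS", "ADBMS", "OS", "IOE")
)

beitfinal <- merge(beit, subjects, by = "student")

# The key column comes first, then the rest of beit, then the rest of subjects
names(beitfinal)   # "student" "marks" "IAE" "subject"

# One row per row of beit, because every student has exactly one subject
nrow(beitfinal)    # 6

# Both sufiyan rows picked up the same subject
as.character(unique(beitfinal$subject[beitfinal$student == "sufiyan"]))   # "ADBMS"
```

By default merge() keeps only the students found in both tables. If some student had no subject, that student's rows would be dropped unless you pass all.x = TRUE.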


Now, let's perform the aggregate operation on the beitfinal table, which after the merge holds the student, marks, IAE and subject data.


aggregate(marks ~ subject + student, beitfinal, mean)

Output:

Here you can see that I have used two x variables in the formula. You can use more than two as well; I don't recommend it, but you can try it anyway if you want. I took beitfinal as the data and applied the function mean to get the average marks across the IAEs for each subject and student. You can try more examples of aggregation here. We will end here, as it is not a big topic to explain. I hope you now understand the aggregate and merge functions in R programming.
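If you want something more to try, here is a short sketch with a different pair of grouping variables and a different summary function. It rebuilds the data from this post; the names per_exam and best are mine.

```r
# Rebuild the merged table from the examples above
beit <- data.frame(
  student = c("zeeshan", "sufiyan", "danish", "sufiyan", "sameena", "sameena"),
  marks   = c(14, 15, 20, 16, 20, 18),
  IAE     = c(1, 2, 2, 1, 2, 1)
)
subjects <- data.frame(
  student = c("zeeshan", "sufiyan", "danish", "sameena"),
  subject = c("DEVOPS", "ADBMS", "OS", "IOE")
)
beitfinal <- merge(beit, subjects, by = "student")

# Two grouping variables: one row per (student, IAE) combination
per_exam <- aggregate(marks ~ student + IAE, data = beitfinal, FUN = mean)
nrow(per_exam)   # 6, because every student/exam pair appears only once

# Swap the summary function: highest mark per subject
best <- aggregate(marks ~ subject, data = beitfinal, FUN = max)
best$marks[best$subject == "IOE"]   # 20
```

Any function that turns a set of numbers into a single value, such as sum, min or median, can be passed as FUN in the same way.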


You can run the code above yourself and check the results if you want.

Thank you! Do check out my other articles if possible; more articles are on their way. 🙂🙂

Check out my website: https://mzeeej.com



