Saturday, January 31, 2026

Chapter-5:- Scatter Plot in R-programming




Example-1:- 
Create a scatter plot with car weight on the x-axis and miles per gallon on the y-axis, using R's built-in dataset "mtcars".

Code:-
mtcars                                  # Print the built-in dataset
plot(mtcars$wt, mtcars$mpg,
     main = "Car Weight vs Fuel Efficiency",
     xlab = "Car Weight (1000 lbs)",
     ylab = "Miles Per Gallon",
     pch = 19,            # Solid circle shape
     col = "blue",        # Point color
     frame.plot = FALSE)  # Remove the surrounding box


Result:-



Explanation:-
  • In the first line, typing mtcars prints R's built-in "mtcars" dataset (highlighted with a yellow box in the result image). It lists 32 car models with columns such as mpg, disp, wt, gear and carb.
  • The second line calls plot(), which we discussed in the previous chapter. It accepts many parameters, such as col, lwd, lty, xlim and ylim.
  • In mtcars$wt, the $ sign extracts a single column from the dataset, here the wt (weight, in 1000 lbs) column for every car model. These values go on the x-axis.
  • Likewise, mtcars$mpg extracts the mpg (miles per gallon) column, whose values go on the y-axis.
  • In the last line, frame.plot controls the box around the plot: TRUE draws it and FALSE removes it. Writing frame = FALSE also works, because R partially matches it to frame.plot.
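The $ operator returns one column as an ordinary vector. A minimal sketch comparing it with [[ ]] and with plot()'s formula interface, which names the columns as y ~ x and the dataset once with data =, so mtcars$ is not repeated. The variable names wt and wt2 are just illustrative.

```r
# mtcars ships with base R (datasets package), so no library() call is needed
wt  <- mtcars$wt        # extract the wt column with $
wt2 <- mtcars[["wt"]]   # extract the same column by name with [[ ]]

str(wt)                 # a numeric vector with one value per car
identical(wt, wt2)      # TRUE: both ways return the same vector

# Formula interface: y ~ x, with data = naming the dataset once
plot(mpg ~ wt, data = mtcars,
     main = "Car Weight vs Fuel Efficiency",
     xlab = "Car Weight (1000 lbs)", ylab = "Miles Per Gallon",
     pch = 19, col = "blue", frame.plot = FALSE)
```

Both calls draw the same points; the formula form is handy when the dataset name is long or when the same columns are reused in lm().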


Example-2:-
Plot sepal length vs sepal width for all species from R's built-in dataset "iris", setting the x-axis range from 4 to 8 and the y-axis range from 2 to 4.5.

Code:-
iris                                    # Print the built-in dataset
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = c("blue", "yellow", "red")[iris$Species],  # One color per species
     pch = 19,                          # Solid circle, denoted by 19
     xlim = c(4, 8),                    # x-axis range
     ylim = c(2, 4.5),                  # y-axis range
     main = "Custom Species Colors")    # Main title


Result:-



Explanation:-
  • Like "mtcars", "iris" is a built-in dataset in R.
  • As the yellow box in the result image shows, "iris" has 5 columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.
  • To take the data of one column, such as Sepal.Length, we use the $ sign, as in the second line of code.
  • c() combines several values into a vector, here one color per species. Indexing that vector with [iris$Species] gives each point the color of its species. On its own, c("blue", "yellow") would simply be recycled, alternating the two colors from point to point regardless of species. If you read the previous chapter, the col parameter will be familiar.
  • xlim and ylim set the axis ranges asked for in the task: 4 to 8 on the x-axis and 2 to 4.5 on the y-axis.
  • pch sets the shape/symbol used for the points.
  • main sets the main title.
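Indexing a color vector with a factor works because R uses the factor's integer codes (1 for setosa, 2 for versicolor, 3 for virginica) as positions. A short sketch that builds the colors step by step and adds a legend() so readers can match colors to species; the names species_cols and point_cols and the legend position are illustrative choices.

```r
# Species is a factor with 3 levels; indexing by it uses the integer codes
species_cols <- c("blue", "yellow", "red")    # one color per level, in level order
point_cols   <- species_cols[iris$Species]    # one color per flower (150 values)

levels(iris$Species)   # "setosa" "versicolor" "virginica"
table(point_cols)      # 50 points for each color

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = point_cols, pch = 19,
     xlim = c(4, 8), ylim = c(2, 4.5),
     xlab = "Sepal Length (cm)", ylab = "Sepal Width (cm)",
     main = "Custom Species Colors")

# Legend that maps each color back to its species name
legend("topright", legend = levels(iris$Species),
       col = species_cols, pch = 19, title = "Species")
```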


Example-3:- 
Modify the example above so that the point size follows petal width: the larger the petal width, the larger the point in the graph.

Code:-
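iris
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = "red",                 # Point color
     pch = 19,                    # Solid circle, denoted by 19
     main = "Point Size by Petal Width",   # Main title
     cex = iris$Petal.Width,      # Point size grows with petal width
     las = 1,                     # Make the axis labels horizontal
     frame.plot = FALSE)          # Remove the box around the plot


Result:-


Explanation:-
  • Compared with the previous example, this code plots the same sepal length vs sepal width data and adds three parameters: cex, las and frame.plot.
  • cex is a size multiplier for the points. Passing the whole iris$Petal.Width column gives every point its own size, so flowers with wider petals get bigger circles. Petal widths in iris run from 0.1 to 2.5, so the smallest points are very small.
  • las = 1 makes the axis labels horizontal (this mainly affects the y-axis, whose labels are vertical by default).
  • frame.plot = FALSE removes the box; the x and y axes are still drawn.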
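Because petal widths start at 0.1, using them directly as cex makes some points almost invisible. One option is to rescale the values to a chosen size range first. The helper rescale_cex() below is hypothetical (it is not part of base R), and the target range 0.5 to 3 is an arbitrary assumption.

```r
# Hypothetical helper (not part of base R): linearly rescale values
# to the interval `to`, used here as cex values
rescale_cex <- function(x, to = c(0.5, 3)) {
  rng <- range(x, na.rm = TRUE)
  to[1] + (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1])
}

sizes <- rescale_cex(iris$Petal.Width)
range(iris$Petal.Width)   # 0.1 2.5
range(sizes)              # 0.5 3.0

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = "red", pch = 19,
     cex = sizes,               # narrowest petal -> 0.5, widest -> 3
     las = 1, frame.plot = FALSE,
     xlab = "Sepal Length (cm)", ylab = "Sepal Width (cm)",
     main = "Point Size by Petal Width (rescaled)")
```

The order of the sizes is unchanged, so a wider petal still means a bigger point; only the smallest and largest sizes are controlled.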


Example-4:-
Create a scatter plot of girth vs volume using R's built-in dataset "trees". Before the points are drawn, add a light grey background grid with panel.first = grid(), and add a regression line to show the trend.

Code:-
trees   # Print the built-in dataset
plot(trees$Girth, trees$Volume,
     panel.first = grid(col = "grey", lty = "dotted"),  # Draw a grey dotted grid behind the points
     col = "red", pch = 20, cex = 1.5)

# Add a blue regression line
abline(lm(Volume ~ Girth, data = trees), col = "blue", lwd = 2)

Result:-



Explanation:-
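  • "trees" is also a built-in dataset in R. It records the girth (actually the diameter, in inches), height (ft) and volume (cubic ft) of 31 black cherry trees.
  • panel.first = grid(...) draws the grid before the points, so the grid sits behind them instead of on top.
  • The regression line command breaks down as follows:

| Component | Function | Details |
|---|---|---|
| abline() | Draws a line | Draws a straight line on the existing plot |
| lm() | Fits a linear model | Calculates the best-fit straight line, the one that minimizes the sum of squared vertical distances between the line and the data points (least squares) |
| Volume ~ Girth | Model formula | Read as "Volume explained by Girth": the response (y-axis) goes on the left of ~ and the predictor (x-axis) on the right |
| data | Specifies the data | Here, the built-in dataset trees |
| col | Color | Sets the color of the line |
| lwd | Line width | Sets the line width (default 1) |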
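To see what abline() receives from lm(), the fitted model can be stored and its two coefficients read with coef(). This sketch redraws Example-4 with the line given explicitly as intercept a and slope b; the names fit and cf and the girth of 12 inches used for predict() are illustrative.

```r
# Fit the same linear model used in Example-4's abline() call
fit <- lm(Volume ~ Girth, data = trees)
cf  <- coef(fit)   # named vector: "(Intercept)" and "Girth" (the slope)
cf

plot(trees$Girth, trees$Volume,
     panel.first = grid(col = "grey", lty = "dotted"),
     col = "red", pch = 20, cex = 1.5,
     xlab = "Girth (in)", ylab = "Volume (cubic ft)")

# Same line as abline(fit): intercept a and slope b, i.e. y = a + b * x
abline(a = cf[1], b = cf[2], col = "blue", lwd = 2)

# Predicted volume for a tree with a girth of 12 inches
predict(fit, newdata = data.frame(Girth = 12))
```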





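Example-5:- Code using ggplot2
Create a scatter plot of height vs volume from "trees" with the ggplot2 package, and add an orange regression line without the confidence band. ggplot2 is not part of base R, so install it once with install.packages("ggplot2") before running the code.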

Code:-
library(ggplot2)

# Load data
data(trees)

# Initialize plot and add styled points; "+" adds the next layer
ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point(size = 4, shape = 21, fill = "green", color = "white") +

  # Add orange regression line without confidence band
  geom_smooth(method = "lm", color = "orange", se = FALSE)

Result:-






Explanation:- 
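  • The "+" sign at the end of the ggplot() and geom_point() lines adds the next layer to the plot. Keeping it at the end of a line also tells R that the command continues on the next line.
  • geom_point() draws the scatter plot. Its main settings are color, fill, size, shape and alpha; with shape = 21 (a filled circle), fill colors the inside and color colors the outline.
  • geom_smooth() adds the regression line: method = "lm" fits a straight (linear) line, and se = FALSE hides the confidence band.
  • The remaining parameters are explained in the reference tables below.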
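A ggplot is an ordinary R object, and each "+" returns a new object with one more layer. This sketch builds the Example-5 plot in steps and uses layer_data() to look at what each layer computed; the object names p, pts and line are illustrative, and formula = y ~ x is added only to silence geom_smooth()'s informational message.

```r
library(ggplot2)

# Each "+" returns a new plot object with one more layer
p <- ggplot(trees, aes(x = Height, y = Volume))
p <- p + geom_point(size = 4, shape = 21, fill = "green", color = "white")
p <- p + geom_smooth(method = "lm", formula = y ~ x, color = "orange", se = FALSE)

pts  <- layer_data(p, 1)   # computed data for layer 1 (the points)
line <- layer_data(p, 2)   # computed data for layer 2 (the fitted line)
nrow(pts)                  # 31: one row per tree
head(line[, c("x", "y")])  # points along the fitted straight line

print(p)                   # the plot is drawn when the object is printed
```

The y values in the line layer are the same predictions that lm(Volume ~ Height, data = trees) gives at those heights.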




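Example-6:-
Repeat Example-5 with a confidence band around the regression line, a minimal theme with dashed grid lines, and a title and axis labels.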

Code:-
library(ggplot2)

# Load data
data(trees)

# Initialize plot and add styled points
ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point(size = 4, shape = 21, fill = "darkgreen", color = "white") +

  # Add orange regression line with confidence band (se = TRUE)
  geom_smooth(method = "lm", color = "orange", se = TRUE) +

  # Apply a clean theme and customize grid lines
  theme_minimal() +   # Set the base theme first
  theme(
    panel.grid.major = element_line(color = "lightgrey", linetype = "dashed"),
    panel.grid.minor = element_blank()   # Remove minor grid lines
  ) +
  labs(title = "Cherry Tree Height vs. Volume",
       x = "Height (ft)",
       y = "Volume (cubic ft)")

Result:-


Explanation:-
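  • Compared with the previous example, geom_smooth() now uses se = TRUE instead of FALSE, so a grey band appears around the orange line. The band is the confidence interval of the fitted line (95% by default).
  • theme_minimal() sets the base theme first. It has to come before theme(), because a complete theme added afterwards would replace the customizations.
  • theme() then adjusts individual elements: dashed light grey major grid lines and no minor grid lines. All theme options are explained in the tables below.
  • labs() sets the title and the axis labels.
  • As in Example-5, each "+" at the end of a line adds the next layer and tells R that the command continues.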
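The grey band can be inspected the same way as the line: with se = TRUE, the smooth layer's computed data gains ymin and ymax columns, the lower and upper edges of the band. A sketch, assuming the Example-6 data; level = 0.95 is written out only to show where the confidence level is set (a smaller level such as 0.90 gives a narrower band).

```r
library(ggplot2)

p <- ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point(size = 4, shape = 21, fill = "darkgreen", color = "white") +
  geom_smooth(method = "lm", formula = y ~ x, color = "orange",
              se = TRUE, level = 0.95) +   # 0.95 is the default level
  theme_minimal()

# ymin and ymax are the lower and upper edges of the grey band
band <- layer_data(p, 2)
head(band[, c("x", "y", "ymin", "ymax")])
all(band$ymin <= band$y & band$y <= band$ymax)   # TRUE: the band surrounds the line
print(p)
```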



1. The Global Mapping (aes)
The aes() (aesthetics) function maps your data variables to visual properties.

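| Aesthetic | Meaning |
|---|---|
| x, y | Horizontal and vertical position |
| color | Color of points and lines; for filled shapes (21-25) and bars, the outline color |
| fill | Interior color of filled shapes, bars and areas |
| size | Size of points (for line thickness, ggplot2 3.4.0 and later use linewidth) |
| shape | Point symbol, a number from 0 to 25 |
| alpha | Transparency, from 0 (fully transparent) to 1 (opaque) |
| linetype | Line pattern (e.g. solid, dashed, dotted) |
| group | Identifies which observations belong together, e.g. the points joined into one line |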
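The same aesthetics can be mapped (inside aes(), so the value changes with a data column and a legend appears) or set (outside aes(), so one fixed value applies to every point). A small sketch contrasting the two with the iris data; the object names mapped and fixed are illustrative.

```r
library(ggplot2)

# Mapped: inside aes(), color and size follow data columns (legends are added)
mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                           color = Species, size = Petal.Width)) +
  geom_point(alpha = 0.6)   # alpha outside aes(): the same for every point

# Set: outside aes(), one fixed value applies to all points (no legend)
fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue", size = 2, shape = 17)

length(unique(layer_data(mapped)$colour))   # 3: one color per species
length(unique(layer_data(fixed)$colour))    # 1: every point is steelblue
print(mapped)
```

Note that ggplot2 stores the computed color in a column spelled colour, although both spellings are accepted as arguments.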



2. Geometric Layers (geom_*)
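Each geom_*() function adds one layer. Settings written inside a geom but outside aes(), such as color = "orange" in Example-5, are fixed ("static") values applied to every point or line in that layer. Every layer also accepts these arguments:

| Argument | Meaning |
|---|---|
| data | Overrides the global dataset for this layer |
| stat | Overrides the default statistical transformation (e.g. stat = "identity") |
| position | Adjusts how elements are placed (e.g. position = "dodge" for side-by-side bars) |
| na.rm | If TRUE, missing values are removed silently (the default, FALSE, removes them with a warning) |

Common geoms:

| Function | Creates |
|---|---|
| geom_point() | Scatter plot |
| geom_line() | Line graph |
| geom_bar() | Bar chart |
| geom_histogram() | Histogram |
| geom_boxplot() | Box-and-whisker plot |
| geom_smooth() | Regression (smoothing) line |

Main geom_smooth() options:

| Option | Meaning |
|---|---|
| method = "lm" | Fits a linear model (straight line) |
| method = "loess" | Fits a local regression (a curvy line that follows the data trends); this is the default for datasets with fewer than 1000 points |
| se = FALSE | Removes the shaded confidence band around the line |
| span | Controls how "wiggly" the loess line is (smaller values give a wigglier line) |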
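A sketch that draws three smoothers over the same trees data, so the difference between method = "lm" and method = "loess" with two span values is visible at once. The span values 0.75 (the loess default) and 0.5 are illustrative, and with only 31 points loess may print numerical warnings.

```r
library(ggplot2)

p <- ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm",    formula = y ~ x, se = FALSE, color = "orange") +  # straight line
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue",
              span = 0.75) +    # loess default span
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "red",
              span = 0.5)       # smaller span: follows the points more closely
print(p)

# The lm layer (layer 2) has a constant slope, so it is a straight line
lm_line <- layer_data(p, 2)
slopes  <- diff(lm_line$y) / diff(lm_line$x)
range(slopes)
```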



3. Theme Customization (theme)
The theme() function controls the non-data elements of a plot. It has nearly 100 arguments, and each one takes an "element function" that matches the kind of element:

| Element function | Example theme components | Main parameters |
|---|---|---|
| element_text() | plot.title, axis.title.x, legend.text | family, face, size, hjust, vjust |
| element_line() | panel.grid.major, axis.ticks | color, linewidth, linetype |
| element_rect() | panel.background, plot.background | fill, color |
| element_blank() | Any component, e.g. panel.grid.minor = element_blank() | None: removes the element entirely |
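A sketch that uses one element function of each kind on the trees data, starting from theme_minimal() as in Example-6. The specific sizes, colors and the centered title (hjust = 0.5) are illustrative choices.

```r
library(ggplot2)

p <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(color = "darkgreen") +
  labs(title = "Cherry Tree Girth vs. Volume") +
  theme_minimal() +                       # complete base theme first
  theme(                                  # then adjust individual elements
    plot.title       = element_text(face = "bold", size = 14, hjust = 0.5),  # centered bold title
    axis.title.x     = element_text(color = "grey30"),
    panel.grid.major = element_line(color = "lightgrey", linetype = "dashed"),
    panel.grid.minor = element_blank(),   # remove minor grid lines
    plot.background  = element_rect(fill = "white", color = NA)  # white, no border
  )
print(p)
```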



4. Scales and Labels (scale_* & labs)
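Scales translate data values into the visual aesthetics you see. The scale_*() functions (for example scale_x_continuous()) share these arguments:

| Argument | Meaning |
|---|---|
| name | Axis or legend title |
| limits | Minimum and maximum values of the scale (values outside the limits become NA and are dropped) |
| breaks | Positions where tick marks and labels appear |
| labels | Text shown at those breaks |

labs() is a shortcut function for setting the title, subtitle, caption, and x and y labels.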
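A sketch using the four scale arguments and labs() together on the trees data. The limits, break positions and label text are illustrative; the limits 8 to 22 were chosen so that every girth in the dataset (8.3 to 20.6) stays inside them.

```r
library(ggplot2)

p <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(color = "darkgreen") +
  scale_x_continuous(name   = "Girth (in)",           # axis title
                     limits = c(8, 22),               # min and max of the axis
                     breaks = seq(8, 22, by = 2)) +   # tick positions
  scale_y_continuous(breaks = c(20, 40, 60),
                     labels = c("20 cu ft", "40 cu ft", "60 cu ft")) +  # text at each break
  labs(title    = "Cherry Tree Girth vs. Volume",
       subtitle = "Girth is measured as diameter in inches",
       caption  = "Data: built-in trees dataset")

# All girths lie inside the limits, so no tree is dropped
nrow(layer_data(p))
print(p)
```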



5. Faceting and Coordinates (facet_* & coord_*)

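Faceting splits the data into several panels:
  • facet_wrap(~ variable): one panel per value of the variable, wrapped into rows
  • facet_grid(rows ~ cols): a grid of panels, with one variable for the rows and one for the columns
  • Common parameters include scales (e.g. scales = "free_x") and nrow/ncol (for facet_wrap())

Coordinates change the coordinate system:
  • coord_flip(): swaps the x and y axes
  • coord_polar(): draws the plot in polar (circular) coordinates
  • coord_cartesian(): zooms in (with xlim/ylim) without removing data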
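A sketch with the mtcars data from Example-1: facet_wrap() makes one panel per number of cylinders, and the two zoom methods are compared by counting rows in the computed layer data. The zoom window of weights 2 to 4 is an arbitrary choice.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# One panel per number of cylinders (4, 6 and 8), each with its own x range
faceted <- base + facet_wrap(~ cyl, nrow = 1, scales = "free_x")

# Zooming to weights 2 to 4 in two ways
zoom_coord <- base + coord_cartesian(xlim = c(2, 4))       # keeps every row
zoom_scale <- base + scale_x_continuous(limits = c(2, 4))  # out-of-range wt becomes NA

nrow(layer_data(zoom_coord))           # 32: all cars kept
sum(is.na(layer_data(zoom_scale)$x))   # cars outside 2-4 (dropped when drawn)
print(faceted)
```

The difference matters for layers that compute statistics, such as geom_smooth(): with scale limits the line is fitted only to the remaining points, while coord_cartesian() fits it to all of them and only zooms the view.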


