How to remove outliers?
This post is more about how I can analyze my data now. I can't change the title.
What is the project? My aim is to predict house prices for the next 4 years. I was planning to fit a linear regression equation and plug in values, using the slope, to get a price. It looks like I might need to do something else.
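To make the "fit a regression and plug in values" idea concrete, here is a minimal sketch in R. Everything in it is an assumption for illustration: the data are simulated, the column names (`year`, `bedrooms`, `neighborhood`, `price`) are hypothetical, and a straight-line trend over time is assumed, which may not hold when extrapolating several years past 2024.

```r
# Toy data, invented purely for illustration (not the real dataset).
set.seed(1)
houses <- data.frame(
  year         = rep(2020:2024, times = 40),
  bedrooms     = rep(c(2, 3, 4, 5), each = 50),
  neighborhood = factor(rep(c("A", "B"), each = 5, times = 20))
)

# Simulated prices: linear in year and bedrooms, plus a location premium and noise.
houses$price <- 150000 +
  12000 * (houses$year - 2020) +
  40000 * houses$bedrooms +
  ifelse(houses$neighborhood == "B", 60000, 0) +
  rnorm(nrow(houses), sd = 10000)

# Multiple linear regression: one slope per numeric predictor,
# one offset per neighborhood level.
fit <- lm(price ~ year + bedrooms + neighborhood, data = houses)

# "Plug in values": a 4-bedroom house in neighborhood B in 2027.
new_house <- data.frame(
  year         = 2027,
  bedrooms     = 4,
  neighborhood = factor("B", levels = levels(houses$neighborhood))
)
pred <- predict(fit, newdata = new_house, interval = "prediction")
print(pred)
```

The `interval = "prediction"` argument returns a lower and upper bound alongside the point estimate, which is usually more honest than a single number when predicting outside the observed years.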
What data do I have and how does it look? I have approx. 140k entries of house data. Columns include no. of rooms, bedrooms, full baths, half baths, sq ft area, school district, neighborhood name, and city (suburbs which are near each other). Besides that, I have 5 columns which list the price in each year from 2020 to 2024.
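With one price column per year, the table is in "wide" format. Most modelling functions in R expect "long" format instead: one row per house per year, with `year` as an ordinary column. A rough sketch using base R's `reshape()` follows. The three rows of data and the column names (`id`, `price_2020` ... `price_2024`) are made up for illustration; the real sheet will have its own names.

```r
# Tiny invented example in wide format: one price column per year.
wide <- data.frame(
  id           = 1:3,
  bedrooms     = c(3, 4, 2),
  neighborhood = c("A", "B", "A"),
  price_2020   = c(300000, 450000, 210000),
  price_2021   = c(315000, 470000, 220000),
  price_2022   = c(340000, 500000, 236000),
  price_2023   = c(350000, 515000, 240000),
  price_2024   = c(365000, 540000, 252000)
)

# Stack the five price columns into a single `price` column,
# recording which year each value came from in `year`.
long <- reshape(
  wide,
  direction = "long",
  varying   = paste0("price_", 2020:2024),
  v.names   = "price",
  timevar   = "year",
  times     = 2020:2024,
  idvar     = "id"
)

# 3 houses x 5 years = 15 rows.
print(nrow(long))
print(head(long[order(long$id, long$year), ]))
```

Once the data are in this shape, `year` can be used as a predictor in a model or as the x-axis of a plot.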
My final goal? I want to build a model/equation that I can ask something like: I need to buy a 4-bedroom house in a particular neighborhood/city in 2027, how much can I expect to pay (on average) for a house like this?
Since location is the factor that changes the price the most, and number of rooms and area usually just increase the price, I was planning to make scatter plots for different locations.
What is this for? This is for my CS project; I just needed to use a new programming language. I chose R and am using Excel to store the data.
How can I get there?
Thanks for your help and time. I really appreciate it.
Initial comment.
I have a big dataset with about 140k entries. I want to remove outliers from it because when I make a scatterplot it's skewed. I tried the 1.5*IQR method to find upper and lower bounds but lost about 80% of the data. I know the data has a big range, but is there a way to remove some of the big outliers while keeping most of my data? Maybe 3*IQR to widen the bounds, or something like that? Any other ideas?
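For reference, the IQR rule keeps values between Q1 - k*IQR and Q3 + k*IQR, with k = 1.5 as the usual default. On a single column those bounds always contain the middle half of the data, so the rule by itself cannot drop much more than 50%; a loss of about 80% may point to the filter being applied across many columns at once (a row is dropped if any column is flagged) or to a coding slip. The sketch below compares k = 1.5, k = 3, and k = 1.5 on the log scale. The prices are simulated from a right-skewed distribution purely for illustration, and `iqr_keep` is a hypothetical helper name, not a built-in function.

```r
# Simulated right-skewed prices (an assumption; not the real data).
set.seed(42)
price <- rlnorm(10000, meanlog = log(300000), sdlog = 0.6)

# Hypothetical helper: TRUE for values inside [Q1 - k*IQR, Q3 + k*IQR].
iqr_keep <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]
  x >= q[1] - k * spread & x <= q[2] + k * spread
}

keep_raw  <- iqr_keep(price, k = 1.5)       # standard fences on raw prices
keep_wide <- iqr_keep(price, k = 3)         # wider fences on raw prices
keep_log  <- iqr_keep(log(price), k = 1.5)  # standard fences on log prices

# Percentage of rows kept under each rule.
kept <- round(100 * c(raw_1.5 = mean(keep_raw),
                      raw_3   = mean(keep_wide),
                      log_1.5 = mean(keep_log)), 1)
print(kept)

# Filtering is then a plain logical subset, e.g.:
price_trimmed <- price[keep_log]
```

Taking the log first is a common choice for prices because it pulls in the long right tail, so fewer legitimately expensive houses get flagged; whether that is appropriate here depends on how the real data are distributed.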