Practice environment - Google Colab
Time series data (Datetime)
In [ ]:
# Create the sample data (run this before the exercises)
import pandas as pd
data = {
'Date1': ['2024-02-17', '2024-02-18', '2024-02-19'],
'Date2': ['2024:02:17', '2024:02:18', '2024:02:19'],
'Date3': ['24/02/17', '24/02/18', '24/02/19'],
'Date4': ['02/17/2024', '02/18/2024', '02/19/2024'],
'Date5': ['17-Feb-2024', '18-Feb-2024', '19-Feb-2024'],
'Date6': ['2024년02월17일', '2024년02월18일', '2024년02월19일'],
'DateTime1': ['24-02-17 11:45:30', '24-02-18 12:55:45', '24-02-19 13:30:15'],
'DateTime2': ['2024-02-17 11-45-30', '2024-02-18 12-55-45', '2024-02-19 13-30-15'],
'DateTime3': ['02/17/2024 11:45:30 AM', '02/18/2024 12:55:45 PM', '02/19/2024 01:30:15 PM'],
'DateTime4': ['17 Feb 2024 11:45:30', '18 Feb 2024 12:55:45', '19 Feb 2024 13:30:15']
}
df = pd.DataFrame(data)
df.to_csv("date.csv", index=False)
df
Out[ ]:
| Date1 | Date2 | Date3 | Date4 | Date5 | Date6 | DateTime1 | DateTime2 | DateTime3 | DateTime4 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2024-02-17 | 2024:02:17 | 24/02/17 | 02/17/2024 | 17-Feb-2024 | 2024년02월17일 | 24-02-17 11:45:30 | 2024-02-17 11-45-30 | 02/17/2024 11:45:30 AM | 17 Feb 2024 11:45:30 |
1 | 2024-02-18 | 2024:02:18 | 24/02/18 | 02/18/2024 | 18-Feb-2024 | 2024년02월18일 | 24-02-18 12:55:45 | 2024-02-18 12-55-45 | 02/18/2024 12:55:45 PM | 18 Feb 2024 12:55:45 |
2 | 2024-02-19 | 2024:02:19 | 24/02/19 | 02/19/2024 | 19-Feb-2024 | 2024년02월19일 | 24-02-19 13:30:15 | 2024-02-19 13-30-15 | 02/19/2024 01:30:15 PM | 19 Feb 2024 13:30:15 |
In [ ]:
# Load the data
df = pd.read_csv('date.csv')
df
Out[ ]:
| Date1 | Date2 | Date3 | Date4 | Date5 | Date6 | DateTime1 | DateTime2 | DateTime3 | DateTime4 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2024-02-17 | 2024:02:17 | 24/02/17 | 02/17/2024 | 17-Feb-2024 | 2024년02월17일 | 24-02-17 11:45:30 | 2024-02-17 11-45-30 | 02/17/2024 11:45:30 AM | 17 Feb 2024 11:45:30 |
1 | 2024-02-18 | 2024:02:18 | 24/02/18 | 02/18/2024 | 18-Feb-2024 | 2024년02월18일 | 24-02-18 12:55:45 | 2024-02-18 12-55-45 | 02/18/2024 12:55:45 PM | 18 Feb 2024 12:55:45 |
2 | 2024-02-19 | 2024:02:19 | 24/02/19 | 02/19/2024 | 19-Feb-2024 | 2024년02월19일 | 24-02-19 13:30:15 | 2024-02-19 13-30-15 | 02/19/2024 01:30:15 PM | 19 Feb 2024 13:30:15 |
In [ ]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date1      3 non-null      object
 1   Date2      3 non-null      object
 2   Date3      3 non-null      object
 3   Date4      3 non-null      object
 4   Date5      3 non-null      object
 5   Date6      3 non-null      object
 6   DateTime1  3 non-null      object
 7   DateTime2  3 non-null      object
 8   DateTime3  3 non-null      object
 9   DateTime4  3 non-null      object
dtypes: object(10)
memory usage: 368.0+ bytes
Changing the dtype
- pd.to_datetime(df['column_name'])
In [ ]:
# Date1
df = pd.read_csv("date.csv")
print(df['Date1'])
df['Date1'] = pd.to_datetime(df['Date1'])
print(df['Date1'])
0    2024-02-17
1    2024-02-18
2    2024-02-19
Name: Date1, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date1, dtype: datetime64[ns]
In [ ]:
# Date2 (4-digit year: %Y)
df = pd.read_csv("date.csv")
print(df['Date2'])
df['Date2'] = pd.to_datetime(df['Date2'], format='%Y:%m:%d')
print(df['Date2'])
0    2024:02:17
1    2024:02:18
2    2024:02:19
Name: Date2, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date2, dtype: datetime64[ns]
In [ ]:
# Date3 (2-digit year: %y)
df = pd.read_csv("date.csv")
print(df['Date3'])
df['Date3'] = pd.to_datetime(df['Date3'], format='%y/%m/%d')
print(df['Date3'])
0    24/02/17
1    24/02/18
2    24/02/19
Name: Date3, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date3, dtype: datetime64[ns]
In [ ]:
# Date4
df = pd.read_csv("date.csv")
print(df['Date4'])
df['Date4'] = pd.to_datetime(df['Date4'], format='%m/%d/%Y')
print(df['Date4'])
0    02/17/2024
1    02/18/2024
2    02/19/2024
Name: Date4, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date4, dtype: datetime64[ns]
In [ ]:
# Date5
df = pd.read_csv("date.csv")
print(df['Date5'])
df['Date5'] = pd.to_datetime(df['Date5'])
print(df['Date5'])
0    17-Feb-2024
1    18-Feb-2024
2    19-Feb-2024
Name: Date5, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date5, dtype: datetime64[ns]
In [ ]:
# Date6
df = pd.read_csv("date.csv")
print(df['Date6'])
df['Date6'] = pd.to_datetime(df['Date6'], format='%Y년%m월%d일')
print(df['Date6'])
0    2024년02월17일
1    2024년02월18일
2    2024년02월19일
Name: Date6, dtype: object
0   2024-02-17
1   2024-02-18
2   2024-02-19
Name: Date6, dtype: datetime64[ns]
In [ ]:
# DateTime1
df = pd.read_csv("date.csv")
print(df['DateTime1'])
df['DateTime1'] = pd.to_datetime(df['DateTime1'], format='%y-%m-%d %H:%M:%S')
print(df['DateTime1'])
0    24-02-17 11:45:30
1    24-02-18 12:55:45
2    24-02-19 13:30:15
Name: DateTime1, dtype: object
0   2024-02-17 11:45:30
1   2024-02-18 12:55:45
2   2024-02-19 13:30:15
Name: DateTime1, dtype: datetime64[ns]
In [ ]:
# DateTime2
df = pd.read_csv("date.csv")
print(df['DateTime2'])
df['DateTime2'] = pd.to_datetime(df['DateTime2'], format='%Y-%m-%d %H-%M-%S')
print(df['DateTime2'])
0    2024-02-17 11-45-30
1    2024-02-18 12-55-45
2    2024-02-19 13-30-15
Name: DateTime2, dtype: object
0   2024-02-17 11:45:30
1   2024-02-18 12:55:45
2   2024-02-19 13:30:15
Name: DateTime2, dtype: datetime64[ns]
In [ ]:
# DateTime3
df = pd.read_csv("date.csv")
print(df['DateTime3'])
df['DateTime3'] = pd.to_datetime(df['DateTime3'])
print(df['DateTime3'])
0    02/17/2024 11:45:30 AM
1    02/18/2024 12:55:45 PM
2    02/19/2024 01:30:15 PM
Name: DateTime3, dtype: object
0   2024-02-17 11:45:30
1   2024-02-18 12:55:45
2   2024-02-19 13:30:15
Name: DateTime3, dtype: datetime64[ns]
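The DateTime3 cell above relies on pandas inferring the 12-hour format. As a minimal sketch, the same values can also be parsed with an explicit format string, assuming the standard strptime directives %I (12-hour clock) and %p (AM/PM marker) and an English locale. The variable names below are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same values as the DateTime3 column (12-hour clock with AM/PM).
s = pd.Series(['02/17/2024 11:45:30 AM',
               '02/18/2024 12:55:45 PM',
               '02/19/2024 01:30:15 PM'])

# %I = hour on a 12-hour clock, %p = AM/PM marker.
# Using %H here instead would ignore the AM/PM meaning of the hour.
parsed = pd.to_datetime(s, format='%m/%d/%Y %I:%M:%S %p')
print(parsed)
```

Spelling out the format makes the intent visible and avoids depending on how a particular pandas version guesses the layout.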
In [ ]:
# DateTime4
df = pd.read_csv("date.csv")
print(df['DateTime4'])
df['DateTime4'] = pd.to_datetime(df['DateTime4'])
print(df['DateTime4'])
0    17 Feb 2024 11:45:30
1    18 Feb 2024 12:55:45
2    19 Feb 2024 13:30:15
Name: DateTime4, dtype: object
0   2024-02-17 11:45:30
1   2024-02-18 12:55:45
2   2024-02-19 13:30:15
Name: DateTime4, dtype: datetime64[ns]
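All of the sample values above parse cleanly. Real data often contains entries that do not match the format, and by default pd.to_datetime raises an error in that case. The sketch below uses a hypothetical series with one bad value (not from the original data) to show the errors='coerce' option, which converts unparseable entries to NaT instead.

```python
import pandas as pd

# Hypothetical sample: the last value is not a valid date.
raw = pd.Series(['2024:02:17', '2024:02:18', 'unknown'])

# errors='coerce' turns values that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format='%Y:%m:%d', errors='coerce')

print(parsed)
print(parsed.isna().sum())  # number of values that failed to parse
```

Counting the NaT values afterwards is a quick way to see how many rows were affected before deciding whether to drop or repair them.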
In [ ]:
# Check the dtypes
# (each cell above reloads date.csv, so only the last conversion, DateTime4, remains)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date1      3 non-null      object
 1   Date2      3 non-null      object
 2   Date3      3 non-null      object
 3   Date4      3 non-null      object
 4   Date5      3 non-null      object
 5   Date6      3 non-null      object
 6   DateTime1  3 non-null      object
 7   DateTime2  3 non-null      object
 8   DateTime3  3 non-null      object
 9   DateTime4  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(9)
memory usage: 368.0+ bytes
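To end up with every column as datetime64 rather than just DateTime4, one option is to convert all columns in a single pass. The following is a minimal sketch, not the original author's method: it rebuilds the sample data inline so it runs on its own, and the formats dictionary is an illustrative helper with format strings written out by hand (the %b and %p directives assume an English locale).

```python
import pandas as pd

# Same sample data as the first cell, built inline so the sketch is self-contained.
df = pd.DataFrame({
    'Date1': ['2024-02-17', '2024-02-18', '2024-02-19'],
    'Date2': ['2024:02:17', '2024:02:18', '2024:02:19'],
    'Date3': ['24/02/17', '24/02/18', '24/02/19'],
    'Date4': ['02/17/2024', '02/18/2024', '02/19/2024'],
    'Date5': ['17-Feb-2024', '18-Feb-2024', '19-Feb-2024'],
    'Date6': ['2024년02월17일', '2024년02월18일', '2024년02월19일'],
    'DateTime1': ['24-02-17 11:45:30', '24-02-18 12:55:45', '24-02-19 13:30:15'],
    'DateTime2': ['2024-02-17 11-45-30', '2024-02-18 12-55-45', '2024-02-19 13-30-15'],
    'DateTime3': ['02/17/2024 11:45:30 AM', '02/18/2024 12:55:45 PM', '02/19/2024 01:30:15 PM'],
    'DateTime4': ['17 Feb 2024 11:45:30', '18 Feb 2024 12:55:45', '19 Feb 2024 13:30:15'],
})

# Hypothetical helper: one explicit format string per column.
formats = {
    'Date1': '%Y-%m-%d',
    'Date2': '%Y:%m:%d',
    'Date3': '%y/%m/%d',
    'Date4': '%m/%d/%Y',
    'Date5': '%d-%b-%Y',
    'Date6': '%Y년%m월%d일',
    'DateTime1': '%y-%m-%d %H:%M:%S',
    'DateTime2': '%Y-%m-%d %H-%M-%S',
    'DateTime3': '%m/%d/%Y %I:%M:%S %p',
    'DateTime4': '%d %b %Y %H:%M:%S',
}

# Convert every column without reloading the CSV in between.
for col, fmt in formats.items():
    df[col] = pd.to_datetime(df[col], format=fmt)

df.info()
```

Because the DataFrame is not reloaded between conversions, df.info() should now report a datetime64 dtype for all ten columns.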
- Once a column has been converted to the datetime dtype, the dt accessor can be used on it.
In [ ]:
# Extract year, month, day, hour, minute, and second
df['year'] = df['DateTime4'].dt.year
df['month'] = df['DateTime4'].dt.month
df['day'] = df['DateTime4'].dt.day
df['hour'] = df['DateTime4'].dt.hour
df['minute'] = df['DateTime4'].dt.minute
df['second'] = df['DateTime4'].dt.second
In [ ]:
df.iloc[:,-6:]
Out[ ]:
| year | month | day | hour | minute | second |
---|---|---|---|---|---|---|
0 | 2024 | 2 | 17 | 11 | 45 | 30 |
1 | 2024 | 2 | 18 | 12 | 55 | 45 |
2 | 2024 | 2 | 19 | 13 | 30 | 15 |
In [ ]:
# Day of week (dayofweek): 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri, 5=Sat, 6=Sun
df['DateTime4'].dt.dayofweek
Out[ ]:
0    5
1    6
2    0
Name: DateTime4, dtype: int32
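The numeric codes from dayofweek are compact but easy to misread. As a small sketch alongside the cell above, dt.day_name() returns the weekday names directly (English by default). The series s here is rebuilt from the DateTime4 values so the snippet runs on its own.

```python
import pandas as pd

# Same timestamps as the DateTime4 column.
s = pd.to_datetime(
    pd.Series(['17 Feb 2024 11:45:30',
               '18 Feb 2024 12:55:45',
               '19 Feb 2024 13:30:15']),
    format='%d %b %Y %H:%M:%S',
)

codes = s.dt.dayofweek   # 0=Mon ... 6=Sun
names = s.dt.day_name()  # weekday names as strings

print(pd.DataFrame({'dayofweek': codes, 'day_name': names}))
```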
In [ ]:
# Weekend flag (is주말 means "is weekend"): 5=Sat, 6=Sun
df['is주말'] = df['DateTime4'].dt.dayofweek >= 5
df
Out[ ]:
| Date1 | Date2 | Date3 | Date4 | Date5 | Date6 | DateTime1 | DateTime2 | DateTime3 | DateTime4 | year | month | day | hour | minute | second | is주말 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2024-02-17 | 2024:02:17 | 24/02/17 | 02/17/2024 | 17-Feb-2024 | 2024년02월17일 | 24-02-17 11:45:30 | 2024-02-17 11-45-30 | 02/17/2024 11:45:30 AM | 2024-02-17 11:45:30 | 2024 | 2 | 17 | 11 | 45 | 30 | True |
1 | 2024-02-18 | 2024:02:18 | 24/02/18 | 02/18/2024 | 18-Feb-2024 | 2024년02월18일 | 24-02-18 12:55:45 | 2024-02-18 12-55-45 | 02/18/2024 12:55:45 PM | 2024-02-18 12:55:45 | 2024 | 2 | 18 | 12 | 55 | 45 | True |
2 | 2024-02-19 | 2024:02:19 | 24/02/19 | 02/19/2024 | 19-Feb-2024 | 2024년02월19일 | 24-02-19 13:30:15 | 2024-02-19 13-30-15 | 02/19/2024 01:30:15 PM | 2024-02-19 13:30:15 | 2024 | 2 | 19 | 13 | 30 | 15 | False |
In [ ]:
# [Note] Converting to periods with to_period()
print(df['DateTime4'].dt.to_period('Y'))
print(df['DateTime4'].dt.to_period('Q'))
print(df['DateTime4'].dt.to_period('M'))
print(df['DateTime4'].dt.to_period('D'))
print(df['DateTime4'].dt.to_period('H'))
0    2024
1    2024
2    2024
Name: DateTime4, dtype: period[A-DEC]
0    2024Q1
1    2024Q1
2    2024Q1
Name: DateTime4, dtype: period[Q-DEC]
0    2024-02
1    2024-02
2    2024-02
Name: DateTime4, dtype: period[M]
0    2024-02-17
1    2024-02-18
2    2024-02-19
Name: DateTime4, dtype: period[D]
0    2024-02-17 11:00
1    2024-02-18 12:00
2    2024-02-19 13:00
Name: DateTime4, dtype: period[H]
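A period represents a whole span of time rather than a single instant. The sketch below, which goes slightly beyond the original cell, converts the same timestamps to monthly periods and then reads back the first and last instant of each period with dt.start_time and dt.end_time. The variable names are illustrative.

```python
import pandas as pd

# Same timestamps as the DateTime4 column, in ISO form.
s = pd.to_datetime(pd.Series(['2024-02-17 11:45:30',
                              '2024-02-18 12:55:45',
                              '2024-02-19 13:30:15']))

monthly = s.dt.to_period('M')  # each value becomes the period 2024-02

starts = monthly.dt.start_time  # first instant of each period
ends = monthly.dt.end_time      # last instant of each period

print(starts)
print(ends)
print(monthly.value_counts())   # number of rows falling in each month
```

Since all three rows fall in February 2024, value_counts() reports a single period with a count of 3.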
Reference
[퇴근후딴짓] Big Data Analysis Engineer practical exam (task types 1, 2, 3) | 퇴근후딴짓 - Inflearn
www.inflearn.com