Downcast

Data types

The data is stored as int64.

See the size and range of each data type

	Data type	Description
0	bool	Boolean (True or False) stored as a byte
1	int	Platform integer (normally either int32 or int64)
2	int8	Byte (-128 to 127)
3	int16	Integer (-32768 to 32767)
4	int32	Integer (-2147483648 to 2147483647)
5	int64	Integer (-9223372036854775808 to 9223372036854775807)
6	uint8	Unsigned integer (0 to 255)
7	uint16	Unsigned integer (0 to 65535)
8	uint32	Unsigned integer (0 to 4294967295)
9	uint64	Unsigned integer (0 to 18446744073709551615)
10	float	Shorthand for float64.
11	float16	Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
12	float32	Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
13	float64	Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
14	complex	Shorthand for complex128.
15	complex64	Complex number, represented by two 32-bit floats
16	complex128	Complex number, represented by two 64-bit floats

Source: https://github.com/rougier/numpy-tutorial#quick-references
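The integer ranges in the table can be double-checked with NumPy itself. The following is a minimal sketch; the variable names (`int_types`, `ranges`) are only illustrative and do not come from the original notes.

```python
import numpy as np

# Fixed-width integer types listed in the table above.
int_types = ["int8", "int16", "int32", "int64",
             "uint8", "uint16", "uint32", "uint64"]

# Collect (min, max, size in bytes) for each type.
ranges = {}
for name in int_types:
    dt = np.dtype(name)
    info = np.iinfo(dt)
    ranges[name] = (int(info.min), int(info.max), dt.itemsize)
    print(f"{name:>6}: {info.min} to {info.max} ({dt.itemsize} bytes)")
```

Each step down in width halves the bytes used per value, which is where the memory savings from downcasting come from.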

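If an integer column never needs negative values, converting it to an unsigned type such as uint16 can reduce memory usage, as long as every value fits in that type's range.
Checking for negative values
- Use describe() to check the min value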
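The min check could look roughly like the sketch below. The DataFrame and its column names (`quantity`, `price`) are hypothetical sample data, since the original dataset is not shown in these notes.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "quantity": [0, 3, 12, 250],
    "price": [1.5, 20.0, 3.25, 0.0],
})

# describe() summarizes each numeric column; the "min" row reveals negatives.
min_values = df.describe().loc["min"]
print(min_values)

# True only when no numeric column contains a negative value.
no_negatives = bool((min_values >= 0).all())
print(no_negatives)
```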

<aside> 🐰 The current data has no negative values, so I decided to change the data types to reduce memory usage!

</aside>

Reducing memory usage

1. Change the data types

Check the data types
- df.dtypes
Downcast
- int64 columns
  - pd.to_numeric(df[col], downcast='unsigned')
- float64 columns
  - pd.to_numeric(df[col], downcast='float')
Check the difference in memory usage with info()
- Memory usage: 36.4MB (original) → 16.1MB (after downcasting)
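Put together, the steps above might look like the following sketch. The sample DataFrame and its column names are hypothetical, and `memory_usage(deep=True).sum()` is used here in place of reading the figure off `info()` so the before/after sizes can be compared in code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one non-negative int64 column, one float64 column.
df = pd.DataFrame({
    "quantity": np.arange(1000, dtype="int64"),
    "price": np.arange(1000, dtype="float64") * 0.5,
})

# Total memory in bytes before downcasting.
before = int(df.memory_usage(deep=True).sum())

# Downcast int64 columns to the smallest unsigned type that fits the values.
for col in df.select_dtypes(include="int64").columns:
    df[col] = pd.to_numeric(df[col], downcast="unsigned")

# Downcast float64 columns to float32 where the values allow it.
for col in df.select_dtypes(include="float64").columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

# Total memory in bytes after downcasting.
after = int(df.memory_usage(deep=True).sum())

print(df.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Note that float32 keeps only about 7 significant decimal digits, so it is worth confirming that the reduced precision is acceptable before downcasting float columns.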

2. Drop unnecessary columns

Checking with df.nunique() showed that every row of the “데이터 공개일자” (data release date) column holds the same value.
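A column with a single unique value carries no information, so it is a natural candidate to drop. The sketch below shows one way to do that; the sample rows and the `quantity` column are hypothetical, and the date value is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the "데이터 공개일자" column name comes from the notes.
df = pd.DataFrame({
    "quantity": [1, 2, 3],
    "데이터 공개일자": ["2021-01-01"] * 3,  # same (made-up) value in every row
})

# Number of distinct values per column.
unique_counts = df.nunique()
print(unique_counts)

# Columns with at most one distinct value are constant.
constant_cols = unique_counts[unique_counts <= 1].index.tolist()

# Drop the constant columns.
df = df.drop(columns=constant_cols)
print(df.columns.tolist())
```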