@@ -354,21 +354,22 @@ df.loc[complexCondition]

 The ability to make changes in dataframes is important to generate a clean dataset for future analysis.

-1. We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values
+
+**1.** We can use `df.where()` conveniently to "keep" the rows we have selected and replace the rest rows with any other values

 ```{code-cell} python3
 df.where(df.POP >= 20000, False)
 ```


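To see what `df.where()` does cell by cell, here is a minimal sketch on a small made-up frame. The `toy` data is hypothetical; only the column names follow the lecture's dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the lecture's dataset
toy = pd.DataFrame({"POP": [15000, 25000, 40000],
                    "tcgdp": [100.0, 300.0, 900.0]})

# Rows satisfying the condition are kept unchanged; every cell in the
# remaining rows is replaced by the second argument (False here)
kept = toy.where(toy.POP >= 20000, False)

# Omitting the second argument fills the remaining rows with NaN instead
masked = toy.where(toy.POP >= 20000)

print(kept)
print(masked)
```

Note that `where()` returns a new dataframe and leaves `toy` itself untouched.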
-2. We can simply use `.loc[]` to specify the column that we want to modify, and assign values
+**2.** We can simply use `.loc[]` to specify the column that we want to modify, and assign values

 ```{code-cell} python3
 df.loc[df.cg == max(df.cg), 'cg'] = np.nan
 df
 ```

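The same pattern can be sketched on made-up data to show the two parts of the `.loc[]` indexer separately. The frame `toy` and the name `is_max` are assumptions for illustration, and `Series.max()` is used here in place of the built-in `max()` as a variation, not as the lecture's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names mirror the lecture's dataset
toy = pd.DataFrame({"POP": [15000, 25000, 40000],
                    "cg": [10.0, 30.0, 20.0]})

# Boolean mask selecting the row(s) where cg attains its maximum;
# Series.max() skips NaN by default
is_max = toy.cg == toy.cg.max()

# The first argument picks the rows, the second picks the column to overwrite
toy.loc[is_max, "cg"] = np.nan
print(toy)
```

Unlike `where()`, this assignment modifies `toy` in place.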
-3. We can use the `.apply()` method to modify rows/columns as a whole
+**3.** We can use the `.apply()` method to modify *rows/columns as a whole*

 ```{code-cell} python3
 def update_row(row):
@@ -382,25 +383,29 @@ def update_row(row):
 df.apply(update_row, axis=1)
 ```

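As a rough illustration of row-wise `.apply()`, the sketch below passes each row to a function as a Series and collects the returned rows into a new dataframe. The helper `rescale_row` and the `toy` data are hypothetical and are not the lecture's `update_row`.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the lecture's dataset
toy = pd.DataFrame({"POP": [15000.0, 25000.0],
                    "tcgdp": [100.0, 300.0]})

def rescale_row(row):
    # Hypothetical helper: each row arrives as a Series indexed by column name
    row = row.copy()
    row["POP"] = row["POP"] / 1000  # express population in thousands
    return row

# axis=1 means the function is called once per row
result = toy.apply(rescale_row, axis=1)
print(result)
```

With `axis=0` the function would instead receive one column at a time.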
-4. We can use the `.applymap()` method to modify all individual entries in the dataframe altogether.
+**4.** We can use the `.applymap()` method to modify all *individual entries* in the dataframe altogether.

 ```{code-cell} python3
+# Round all decimal numbers to 2 decimal places
+df.applymap(lambda x : round(x,2) if type(x)!=str else x)
+```
+
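Newer pandas releases (2.1 onward) deprecate `DataFrame.applymap` in favour of `DataFrame.map`, which does the same element-wise job. A minimal sketch that works with either version, using a hypothetical `toy` frame and a hypothetical helper `round_entry`:

```python
import pandas as pd

# Hypothetical toy data with one string column and two numeric columns
toy = pd.DataFrame({"country": ["A", "B"],
                    "cg": [1.23456, 2.5],
                    "tcgdp": [3.14159, 10.0]})

def round_entry(x):
    # Leave strings untouched, round everything else to 2 decimal places
    return x if isinstance(x, str) else round(x, 2)

# Use DataFrame.map where it exists (pandas 2.1+), else fall back to applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
rounded = elementwise(round_entry)
print(rounded)
```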
+**Application: Missing Value Imputation**
+
+Replacing missing values is an important step in data munging.
+
+Let's randomly insert some NaN values

-# Let us randomly insert some NaN values
+```{code-cell} python3
 for idx in list(zip([0, 3, 5, 6], [3, 4, 6, 2])):
     df.iloc[idx] = np.nan

 df
 ```

-The `zip` function here creates pairs of values at the corresponding position of the two lists (i.e. [0,3], [3,4] ...)
-
-
-**Application: Missing Value Imputation**
-
-Replacing missing values is an important step in data munging.
+The `zip()` function here creates pairs of values from the two lists (i.e. [0,3], [3,4] ...)

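A short standalone sketch of the pairing, using the same two lists. The names `rows`, `cols` and `pairs` are introduced here for illustration. Each pair is a tuple, which `.iloc[]` reads as a (row position, column position) address for a single cell.

```python
rows = [0, 3, 5, 6]
cols = [3, 4, 6, 2]

# zip pairs items by position and yields tuples
pairs = list(zip(rows, cols))
print(pairs)  # [(0, 3), (3, 4), (5, 6), (6, 2)]

# If the inputs differ in length, zip stops at the shorter one
print(list(zip([0, 1, 2], ["a", "b"])))  # [(0, 'a'), (1, 'b')]
```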
-We can use the functions above to replace missing values
+We can use the `.applymap()` method again to replace all missing values with 0

 ```{code-cell} python3
 # replace all NaN values by 0
@@ -413,9 +418,9 @@ def replace_nan(x):
 df.applymap(replace_nan)
 ```

-Pandas also provides us with convenient methods to replace missing values
+Pandas also provides us with convenient methods to replace missing values.

-for example, single imputation using variable means can be easily done in pandas
+For example, single imputation using variable means can be easily done in pandas

 ```{code-cell} python3
 df = df.fillna(df.iloc[:,2:8].mean())
@@ -426,7 +431,7 @@ Missing value imputation is a big area in data science involving various machine

 There are also more [advanced tools](https://scikit-learn.org/stable/modules/impute.html) in python to impute missing values.

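The mean imputation with `fillna()` can be checked by hand on a small example. The sketch below uses a hypothetical `toy` frame: passing a Series of column means to `fillna()` fills each column's gaps with that column's own mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({"POP": [10.0, np.nan, 30.0],
                    "tcgdp": [1.0, 2.0, np.nan]})

# Column means, computed with NaN skipped (the default for mean())
col_means = toy.mean()

# fillna with a Series matches the Series index against column labels
imputed = toy.fillna(col_means)
print(imputed)
```

Here the missing `POP` entry becomes 20.0 and the missing `tcgdp` entry becomes 1.5.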
-### Standardization and Summarization
+### Standardization and Visualization

 Let's imagine that we're only interested in the population (`POP`) and total GDP (`tcgdp`).
