Review of Python Courses (Part 26)
Posted by Mark on February 12, 2021 at 07:29 | Last modified: February 15, 2021 11:54

In Part 25, I summarized my DataCamp courses 74-76. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #77 was Visualizing Time Series Data in Python. This course covers:
- Plot your first time series
- Customize your time series plot
- Clean your time series data (counting missing values in df)
- Plot aggregates of your data
- Summarizing the values in your time series data
- Autocorrelation and partial autocorrelation (from statsmodels.graphics import tsaplots)
- Seasonality, noise, and trend in time series data [from pylab import rcParams, sm.tsa.seasonal_decompose()]
- Working with more than one time series
- Plot multiple time series (adding statistical summaries to your plots)
- Find relationships between multiple time series [sns.heatmap(), sns.clustermap()]
- Apply your knowledge to a new dataset
- Beyond summary statistics
- Decompose time series data
- Compute correlations between time series
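The autocorrelation material is the heart of this course. As a pure-Python sketch (my own illustration, not the course's statsmodels code), lag-1 autocorrelation can be computed directly:

```python
def autocorr(series, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# A steadily rising series is strongly positively autocorrelated.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
print(round(autocorr(trend, lag=1), 3))  # 0.625
```

In the course itself, `plot_acf()` from `statsmodels.graphics.tsaplots` draws the full correlogram with confidence bands.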
My course #78 was Financial Forecasting in Python. Topics covered in this course include:
- Introduction to financial statements
- Calculating sales and the cost of goods sold
- Working with raw datasets
- Introduction to the balance sheet
- Balance sheet efficiency ratios
- Financial periods and how to work with them
- The datetime module and the split() function
- Tips and tricks when working with datasets
- Building sensitive forecast models and common forecast assumptions
- Dependencies and sensitivity in financial forecasting
- Working with variances in the forecast
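The sales and cost-of-goods-sold material boils down to simple arithmetic. A minimal sketch with hypothetical figures (not the course's dataset):

```python
def gross_profit(units_sold, price_per_unit, cost_per_unit):
    """Sales minus cost of goods sold, the first line of the income statement."""
    sales = units_sold * price_per_unit
    cogs = units_sold * cost_per_unit
    return sales - cogs, sales, cogs

profit, sales, cogs = gross_profit(1000, 5.0, 3.0)
print(profit)  # 2000.0
```

Forecasting then becomes a matter of flexing the assumptions (units sold, price, unit cost) and watching how sensitive the bottom line is to each.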
My course #79 was Foundations of Probability in Python. This course covers:
- Let’s flip a coin in Python (from scipy.stats import bernoulli, binom)
- Probability mass and distribution functions
- Expected value, mean, and variance (from scipy.stats import describe)
- Calculating probabilities of two events (from scipy.stats import find_repeats, relfreq)
- Conditional probabilities
- Total probability law
- Bayes’ rule
- Normal distributions (from scipy.stats import norm, import matplotlib.pyplot as plt, import seaborn as sns)
- Normal probabilities
- Poisson distributions (from scipy.stats import poisson)
- Geometric distributions (from scipy.stats import geom)
- From sample mean to population mean (from scipy.stats import binom, describe)
- Adding random variables
- Linear regression (from sklearn.linear_model import LinearRegression, from scipy.stats import linregress)
- Logistic regression (from sklearn.linear_model import LogisticRegression)
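Bayes' rule is worth a worked example. A minimal sketch with hypothetical numbers (the sensitivity, prevalence, and overall positive rate below are made up for illustration):

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical: a test is 90% sensitive, the condition has 1% prevalence,
# and the test comes back positive 10.8% of the time overall.
print(round(bayes(0.9, 0.01, 0.108), 4))  # 0.0833
```

Even with a positive result, the posterior probability is only about 8%, which is the classic base-rate lesson the course builds toward.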
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 25)
Posted by Mark on February 9, 2021 at 07:29 | Last modified: February 12, 2021 09:25

In Part 24, I summarized my DataCamp courses 71-73. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #74 was Writing Functions in Python. Overall, I found this content to be quite challenging. The course covers:
- Docstrings (require string)
- DRY and “do one thing” [standardize function, mean_and_median()]
- Pass by assignment
- Using context managers
- Writing context managers
- Advanced topics
- Functions as objects
- Scope
- Closures
- Decorators
- Real-world examples
- Decorators and metadata (from functools import wraps)
- Decorators that take arguments
- Timeout(): a real-world example
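Decorators plus `functools.wraps` were the trickiest part of this course for me, so here is a small self-contained sketch (my own example, not the course's):

```python
from functools import wraps

def log_calls(func):
    """Decorator that counts calls while preserving the function's metadata."""
    @wraps(func)  # keeps func's __name__ and docstring intact
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@log_calls
def add(a, b):
    """Return the sum of a and b."""
    return a + b

add(1, 2)
add(3, 4)
print(add.__name__, add.calls)  # add 2
```

Without `@wraps`, `add.__name__` would report `wrapper` and the docstring would vanish, which is exactly the metadata problem the course highlights.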
My course #75 was AI Fundamentals. Topics covered in this course include:
- What is all the AI fuss about?
- All models are wrong but some are useful
- Three flavors of machine learning
- Supervised learning fundamentals
- Training and evaluating classification models (confusion matrix, true/false positives/negatives)
- Training and evaluating regression models (from sklearn.preprocessing import PolynomialFeatures)
- Dimensionality reduction
- Clustering
- Anomaly detection
- Selecting the right model
- Deep learning and beyond
- Convolutional neural networks
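The confusion-matrix material translates directly into code. A pure-Python sketch (sklearn computes the same counts with `confusion_matrix`; the labels below are toy data):

```python
def confusion_counts(y_true, y_pred):
    """Tally true/false positives/negatives for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, tn, fp, fn)  # 2 2 1 1
```

Precision and recall fall straight out of the four counts, which is why the confusion matrix anchors the classification chapter.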
My course #76 was Introduction to Portfolio Analysis in Python. This course covers:
- Welcome to portfolio analysis
- Portfolio returns
- Measuring risk of a portfolio (formatting as percentage)
- Annualized returns
- Risk-adjusted returns (calculating the Sharpe ratio)
- Non-normal distribution of returns
- Alternative measures of risk
- Comparing against a benchmark
- Risk factors
- Factor models
- Portfolio analysis tools
- MPT (from pypfopt.efficient_frontier import EfficientFrontier; from pypfopt import risk_models, expected_returns)
- Maximum Sharpe vs. minimum volatility
- Alternative portfolio optimization
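Risk-adjusted returns reduce to a short formula. A stdlib-only sketch of an annualized Sharpe ratio (the course works in pandas; the daily figures below are hypothetical):

```python
import statistics

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualized Sharpe ratio from a list of daily returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    mean = statistics.mean(excess)
    sd = statistics.stdev(excess)
    return mean / sd * periods ** 0.5

print(round(sharpe_ratio([0.01, -0.005, 0.007, 0.002]), 2))
```

The `periods ** 0.5` factor is the usual square-root-of-time annualization for daily data.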
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 24)
Posted by Mark on February 4, 2021 at 07:41 | Last modified: February 10, 2021 16:23

In Part 23, I summarized my DataCamp courses 68-70. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #71 was Improving Your Data Visualizations in Python. This course covers:
- Highlighting data
- Comparing groups
- Annotations
- Color in visualizations
- Continuous color palettes
- Categorical palettes
- Point estimate intervals
- Confidence bands
- Beyond 95% (visualizing multiple confidence bands at once)
- Visualizing the bootstrap
- Looking at the farmers market data
- Exploring the patterns
- Making your visualizations efficient
- Tweaking your plots
My course #72 was Command Line Automation in Python. Because I don’t use the shell much, I don’t see a whole lot of application here for me and I’m not sure how much I absorbed. In any case, topics covered in this course include:
- Learn the Python interpreter
- Capture IPython shell output
- Automate with SList
- Execute shell commands in subprocess (import subprocess; import os)
- Capture output of shell commands (from subprocess import Popen, PIPE)
- Sending input to processes
- Passing arguments safely to shell commands
- Dealing with file systems
- Find files matching a pattern (from pathlib import Path; import fnmatch, re)
- High-level file and directory operations (from shutil import copytree, ignore_patterns, rmtree, make_archive)
- Using pathlib (from pathlib import Path)
- Using functions for automation (from functools import wraps)
- Understand script input
- Introduction to click (import click)
- Using click to write command line tools (from click.testing import CliRunner)
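The subprocess material is easy to demonstrate. A small sketch using `subprocess.run()` rather than the course's lower-level `Popen`/`PIPE` plumbing (invoking the current Python interpreter keeps it cross-platform):

```python
import subprocess
import sys

# Run a command and capture its output as text.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # hello from a subprocess
```

Passing the command as a list of arguments, rather than one shell string, is also the safe-argument-passing habit the course teaches.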
My course #73 was Unit Testing for Data Science in Python. This course covers:
- Why unit test?
- Write a simple unit test using pytest
- Understanding test result report
- More benefits and test types
- Mastering assert statements
- Testing for exceptions instead of return values
- The well-tested function
- Test driven development (TDD)
- How to organize a growing set of tests?
- Mastering test execution
- Expected failures and conditional skipping
- Continuous integration and code coverage
- Beyond assertion: setup and teardown
- Mocking (from unittest.mock import call)
- Testing models
- Testing plots
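Mocking was the most useful chapter for me. A minimal sketch with the stdlib's `unittest.mock` (the loader function and filenames are hypothetical stand-ins for a real data dependency):

```python
from unittest.mock import MagicMock, call

# Stand-in for a data-loading dependency we don't want to hit in a test.
loader = MagicMock(return_value=[1, 2, 3])

def total_rows(load):
    """Count rows across the training and test files."""
    return len(load("train.csv")) + len(load("test.csv"))

assert total_rows(loader) == 6
# Verify the mock saw exactly the calls we expected, in order.
assert loader.call_args_list == [call("train.csv"), call("test.csv")]
```

The `call` helper is what lets a test assert not just that the dependency was used, but how.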
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 23)
Posted by Mark on February 1, 2021 at 07:34 | Last modified: February 10, 2021 10:35

In Part 22, I summarized my DataCamp courses 65-67. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #68 was Linear Classifiers in Python. This course covers:
- Introduction (import sklearn.datasets)
- Applying logistic regression and SVM (general process, from sklearn.svm import LinearSVC)
- Linear decision boundaries
- Linear classifiers: prediction equations
- What is a loss function (from scipy.optimize import minimize)?
- Loss function diagrams
- Logistic regression and regularization
- Logistic regression and probabilities
- Multi-class logistic regression
- Support vectors
- Kernel SVMs
- Comparing logistic regression and SVM (from sklearn.linear_model import SGDClassifier)
My course #69 was Analyzing Social Media Data in Python. While I found this somewhat interesting, it seemed to incorporate as much JSON as it did Python. I have a hard enough time studying one new language—adding a second on top of that made things even more confusing for me:
- Analyzing Twitter data
- Collecting data through the Twitter API (from tweepy import Stream, OAuthHandler, API)
- Understanding Twitter JSON
- Processing Twitter text
- Counting words
- Time series
- Sentiment analysis
- Twitter networks
- Importing and visualizing Twitter networks (import networkx as nx)
- Node-level metrics
- Maps and Twitter data
- Geographical data in Twitter JSON
- Creating Twitter maps (from mpl_toolkits.basemap import Basemap)
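The word-counting chapter is straightforward to sketch with the stdlib (the payload below is a toy stand-in; real Twitter JSON carries many more fields than `text`):

```python
import json
from collections import Counter

# A tiny tweet-like payload, purely for illustration.
payload = '[{"text": "python is fun"}, {"text": "learning python daily"}]'
tweets = json.loads(payload)

counts = Counter(word for tw in tweets for word in tw["text"].split())
print(counts.most_common(1))  # [('python', 2)]
```

Once the JSON is parsed into plain dictionaries, the "second language" problem mostly goes away and it is ordinary Python from there.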
My course #70 was Fraud Detection in Python. This course covers:
- Introduction to fraud detection
- Increasing successful detections using data resampling (from imblearn.over_sampling import RandomOverSampler)
- Fraud detection algorithms in action (from imblearn.pipeline import Pipeline)
- Review of classification methods
- Performance evaluation (from sklearn.metrics import precision_recall_curve, average_precision_score)
- More performance evaluation (from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score)
- Adjusting your algorithm weights
- Performance evaluation (from sklearn.model_selection import GridSearchCV)
- Ensemble methods (from sklearn.ensemble import VotingClassifier)
- Normal versus abnormal behavior
- Clustering methods (from sklearn.preprocessing import MinMaxScaler; from sklearn.cluster import MiniBatchKMeans)
- Assigning fraud versus non-fraud
- Other clustering fraud detection methods (from sklearn.cluster import DBSCAN)
- Using text data (from nltk import word_tokenize; import string)
- Text mining to detect fraud (from nltk.corpus import stopwords; from nltk.stem.wordnet import WordNetLemmatizer)
- Topic modeling on fraud (from gensim import corpora)
- Flagged fraud based on topics (import pyLDAvis.gensim for use with Jupyter Notebooks only)
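Random oversampling is simple enough to sketch by hand. A pure-Python illustration of what `RandomOverSampler` automates (toy data, not the course's transaction set):

```python
import random

def oversample_minority(rows, labels, minority=1, seed=42):
    """Duplicate minority-class rows until the two classes are balanced."""
    rng = random.Random(seed)
    majority_n = max(labels.count(0), labels.count(1))
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(minority_rows)
             for _ in range(majority_n - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)

rows = [[1.0], [2.0], [3.0], [9.9]]   # one fraud case among normals
labels = [0, 0, 0, 1]
X, y = oversample_minority(rows, labels)
print(y.count(0), y.count(1))  # 3 3
```

Balancing the classes this way is what lets the classifier see enough fraud examples to learn from, which is the motivation behind the imblearn chapter.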
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 22)
Posted by Mark on January 29, 2021 at 07:31 | Last modified: February 9, 2021 13:29

In Part 21, I summarized my DataCamp courses 62-64. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #65 was Reshaping Data with pandas. This course covers:
- Wide and long formats
- Reshaping using pivot method
- Pivot tables
- Reshaping with melt
- Wide to long function
- Working with string columns
- Stacking dataframes
- Unstacking dataframes
- Working with multiple levels
- Handling missing data
- Reshaping and combining data
- Transforming a list-like column
- Reading nested data into a dataframe (from pandas import json_normalize)
- Dealing with nested data columns
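Melt and pivot are easiest to see side by side. A minimal round-trip sketch (my own toy frame, not the course's data):

```python
import pandas as pd

wide = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "q1": [100, 200],
    "q2": [110, 190],
})

# Wide -> long with melt ...
long = wide.melt(id_vars="ticker", var_name="quarter", value_name="price")
# ... and back to wide with pivot.
back = long.pivot(index="ticker", columns="quarter", values="price")
print(long.shape, back.shape)  # (4, 3) (2, 2)
```

Each wide row fans out into one long row per quarter, and `pivot` reverses the trip, which is the core mental model of the course.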
My course #66 was Building Data Engineering Pipelines in Python. For some reason, these data engineering courses did not sit well with me and much of this sailed over my head. This course covers:
- Components of a data platform
- Introduction to data ingestion with Singer
- Running an ingestion pipeline with Singer
- Basic introduction to PySpark (from pyspark.sql import SparkSession)
- Cleaning data
- Transforming data with Spark
- Packaging your application
- On the importance of tests
- Writing unit tests for PySpark
- Continuous testing
- Modern day workflow management
- Building a data pipeline with Airflow (from airflow.operators.bash_operator import BashOperator)
- Deploying Airflow (from airflow.models import DagBag)
My course #67 was Importing and Managing Financial Data in Python. This course covers:
- Reading, inspecting, and cleaning data from CSV (parse_dates explained)
- Read data from Excel worksheets
- Combine data from multiple worksheets (importing market data from multiple Excel files)
- The DataReader: access financial data online (from pandas_datareader.data import DataReader)
- Economic data from the Federal Reserve
- Select stocks and get data from Google Finance
- Get several stocks and manage a MultiIndex
- Summarize your data with descriptive stats
- Describe the distribution of your data with quantiles (passing np.arange() to .describe() for constant-step percentiles)
- Visualize the distribution of your data [ax = sns.distplot(df)]
- Summarize categorical variables
- Aggregate your data by category
- Summary statistics by category with seaborn [sns.countplot()]
- Distributions by category with seaborn [sns.boxplot(), sns.swarmplot()]
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 21)
Posted by Mark on January 26, 2021 at 07:10 | Last modified: February 8, 2021 14:22

In Part 20, I summarized my DataCamp courses 59-61. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #62 was Time Series Analysis in Python. This clearly has potential applications for investment returns, but in the end I wasn’t totally sure what those might be. The course covers:
- Introduction to the course
- Correlation of two time series
- Simple linear regressions (in statsmodels, numpy, pandas, scipy)
- Autocorrelation (convert index to datetime)
- Autocorrelation function (from statsmodels.graphics.tsaplots import plot_acf; from statsmodels.tsa.stattools import acf)
- White noise
- Random walk (from statsmodels.tsa.stattools import adfuller)
- Stationarity
- Introducing an AR model (from statsmodels.tsa.arima_process import ArmaProcess)
- Estimating and forecasting an AR model
- Choosing the right model (from statsmodels.graphics.tsaplots import plot_pacf)
- Estimation and forecasting an MA model
- ARMA models
- Cointegration models
- Case study: climate change
My course #63 was Intermediate Predictive Analytics in Python. This course covers:
- The basetable timeline
- The population
- The target
- Adding predictive variables
- Adding aggregated variables
- Adding evolutions
- Using evolution variables
- Creating dummies (avoiding multicollinearity)
- Missing values (list comprehension)
- Handling outliers (from scipy.stats.mstats import winsorize)
- Transformations
- Seasonality
- Using multiple snapshots
- The timegap
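Winsorizing is worth a quick sketch. A simplified pure-Python version of what `scipy.stats.mstats.winsorize` does (the percentile indexing here is cruder than scipy's; toy data for illustration):

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.05):
    """Clip extreme values to the given lower/upper percentile cutoffs."""
    s = sorted(values)
    n = len(s)
    lo = s[int(n * lower_pct)]
    hi = s[max(0, int(n * (1 - upper_pct)) - 1)]
    return [min(max(v, lo), hi) for v in values]

data = [1, 2, 3, 4, 100]
print(winsorize(data, 0.2, 0.2))  # [2, 2, 3, 4, 4]
```

Unlike simply dropping outliers, winsorizing keeps every row in the basetable; only the extreme values are pulled in toward the cutoffs.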
My course #64 was Building and Distributing Packages with Conda. This is another shell-related course I found hard to absorb since I do very little in the shell. I’m not the only newbie who feels this way, either. This was a recent post to the group:
> I have been doing Python courses for a while but now I actually wanna try some real
> live data on my laptop and I am not sure on how to install all of the needed stuff
> (pandas, numpy, etc.). I have downloaded the latest Python version and the PyCharm
> editor but… [the courses] do not really have anything to show you how to actually
> make the rest of the things work for inexperienced people such as myself.
I downloaded Spyder IDE, which has met most of my needs. It crashes sometimes and gives repetitive errors upon start-up, though, which are both quite annoying. I’ve also had mixed results downloading some libraries like Backtester.
Speaking of Anaconda (whose package manager is conda), my 64th course covers:
- Anaconda Project
- Anaconda Project specification file
- Anaconda Project commands
- Python module and packages
- Python package directory
- Conda packages
- Conda package dependencies
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 20)
Posted by Mark on January 21, 2021 at 07:00 | Last modified: February 8, 2021 10:04

In Part 19, I summarized my DataCamp courses 56-58. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #59 was Dealing with Missing Data in Python. This course covers:
- Why deal with missing data (built-in Python NoneType vs. np.nan)?
- Handling missing values
- Analyze the amount of missingness (import missingno as msno)
- Is the data missing at random?
- Finding patterns in missing data
- Visualizing missingness across a variable
- When and how to delete missing data
- Mean, median, and mode imputations (from sklearn.impute import SimpleImputer)
- Imputing time-series data
- Visualizing time-series imputations
- Imputing using fancyimpute (from fancyimpute import KNN; from fancyimpute import IterativeImputer)
- Imputing categorical values
- Evaluation of different imputation techniques
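Mean imputation is the simplest technique in the course. A stdlib sketch of what `SimpleImputer(strategy='mean')` does to each column:

```python
import math

def impute_mean(values):
    """Replace NaNs with the mean of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]

print(impute_mean([1.0, float("nan"), 3.0]))  # [1.0, 2.0, 3.0]
```

The course's larger point is that this only makes sense when data is missing at random; the fancier KNN and iterative imputers exist for when it is not.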
My course #60 was Intermediate Python for Finance. This course covers:
- Representing time with datetimes
- Working with datetimes
- Dictionaries
- Comparison operators
- Boolean operators
- If statements (with dictionary)
- For and while loops
- Creating a dataframe
- Accessing data
- Aggregating and summarizing
- Extending and manipulating data
- Peeking at data with head, tail, and describe
- Filtering data
- Plotting data
My course #61 was Object-Oriented Programming in Python. These OOP-related courses were really confusing to me the first time through. This course covers:
- What is OOP?
- Class anatomy: attributes and methods
- Class anatomy: the __init__ constructor
- Instance and class data
- Class inheritance
- Customizing functionality via inheritance
- Operator overloading: comparison
- Operator overloading: string representation
- Exceptions (try – except – finally)
- Designing for inheritance and polymorphism (Liskov substitution principle)
- Managing data access: private attributes
- Properties
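Inheritance and operator overloading come together nicely in one small class. A sketch of my own (not the course's example):

```python
class Account:
    def __init__(self, balance=0.0):
        self.balance = balance

    # Operator overloading: make == compare balances, not object identities.
    def __eq__(self, other):
        return type(self) is type(other) and self.balance == other.balance

    # String representation used by print() in the absence of __str__.
    def __repr__(self):
        return f"{type(self).__name__}(balance={self.balance})"

class SavingsAccount(Account):
    def __init__(self, balance=0.0, rate=0.02):
        super().__init__(balance)  # reuse the parent constructor
        self.rate = rate

a = SavingsAccount(100.0)
print(a)                            # SavingsAccount(balance=100.0)
print(a == SavingsAccount(100.0))   # True
print(a == Account(100.0))          # False: different types
```

The `type(self) is type(other)` check is one way to keep equality consistent across a class hierarchy, a point the Liskov-substitution chapter circles around.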
I will review more courses next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 19)
Posted by Mark on January 19, 2021 at 07:12 | Last modified: February 6, 2021 04:54

In Part 18, I summarized my DataCamp courses 53-55. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #56 was Writing Efficient Code with pandas. This course covers:
- The need for efficient coding (time.time(), list comprehensions faster than for loop)
- Locate rows: .iloc[] (generally faster for rows) and .loc[] (generally faster for columns)
- Select random rows (pandas .sample() method faster than NumPy's random integer generator)
- Replace scalar values using .replace() (much faster than using .loc[] to find values and reassigning them)
- Replace values using lists (.replace() faster than using .loc[] )
- Replace values using dictionaries (faster than using lists)
- Looping through the .iterrows() method [for loop using range() is faster than the smarter/cleaner/optimized .iterrows()]
- Looping through the .apply() function (faster iterating along rows while native pandas .sum() faster along columns)
- Vectorization over pandas series [vectorization method .apply() works faster than .iterrows()]
- Vectorization with NumPy arrays via the .values attribute (summing arrays is faster than summing series)
- Data transformation using .groupby().transform (.transform() cleaner and much faster than native Python code)
- Missing value imputation using .transform() (.transform() much faster than native Python code)
- Data filtration using the .filter() function (.groupby().filter() faster than list comprehension + for loop)
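The central claim of this course, that vectorized NumPy operations beat row-wise iteration while returning identical results, is easy to verify (toy frame, not the course's data):

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# Row-by-row with .iterrows(): flexible but slow.
slow = [row["a"] + row["b"] for _, row in df.iterrows()]

# Vectorized over the underlying NumPy arrays: same result, far faster.
fast = df["a"].values + df["b"].values

print(fast[:3].tolist())  # [0, 2, 4]
```

Wrapping each version in `time.time()` calls, as the course does, makes the speed gap obvious even on a thousand rows.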
My course #57 was Credit Risk Modeling in Python. This course covers:
- Understanding credit risk
- Outliers in credit data
- Risk with missing data in loan data (finding, counting, and replacing missing data)
- Logistic regression for probability of default
- Predicting the probability of default
- Credit model performance
- Model discrimination and impact
- Gradient boosted trees with XGBoost
- Column selection for credit risk
- Cross validation for credit models
- Class imbalance in loan data
- Model evaluation and implementation (from sklearn.calibration import calibration_curve)
- Credit acceptance rates
- Credit strategy and maximum expected loss
My course #58 was Analyzing IoT Data in Python. This course covers:
- Introduction to IoT data
- Understand the data
- Introduction to data streams (import paho.mqtt.subscribe as subscribe)
- Perform EDA
- Clean data
- Gather minimalistic incremental data
- Prepare and visualize incremental data
- Combining data sources for further analysis
- Correlation
- Outliers (from statsmodels.graphics import tsaplots)
- Seasonality and trends
- Prepare data for machine learning
- Scaling data for machine learning
- Develop machine learning pipeline (from sklearn.pipeline import Pipeline)
- Apply a machine learning model
I will review more classes next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 18)
Posted by Mark on January 15, 2021 at 07:15 | Last modified: February 5, 2021 10:08

In Part 17, I summarized my DataCamp courses 50-52. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #53 was Introduction to Python for Finance. This course covers:
- Why Python for finance?
- Comments and variables
- Variable data types
- Lists in Python
- Lists in lists
- Methods and functions
- Arrays (probably best for financial analysis)
- Two dimensional arrays
- Using arrays for analyses (indexing arrays—might work in place of .loc or .iloc?)
- Visualization in Python
- Histograms (normed arg)
- Introducing the dataset
- Closer look at the sectors
- Visualizing trends
My course #54 was Experimental Design in Python. This course covers:
- Intro to experimental design (import plotnine as p9)
- Our first hypothesis test—Student’s t-test (from scipy import stats)
- Testing proportion and correlation [stats.chisquare(), stats.fisher_exact(), stats.pearsonr()]
- Confounding variables
- Blocking and randomization (random sampling)
- ANOVA [import statsmodels as sm, stats.f_oneway()]
- Interactive effects (two- and three-way ANOVAs)
- Type I error (Bonferroni and Šidák correction for multiple comparisons)
- Sample size (from statsmodels.stats import power as pwr)
- Power
- Assumptions and normal distributions (Q-Q plot)
- Testing for normality [from scipy import stats, stats.shapiro()]
- Non-parametric tests: Wilcoxon rank-sum and signed-rank (paired) test
- More non-parametric tests: Spearman correlation
My course #55 was Introduction to Data Engineering. For some reason, these data engineering courses are not my cup of tea. This course covers:
- What is data engineering?
- Tools of the data engineer (data engineers are expert users of database systems)
- Cloud providers
- Databases
- Parallel computing (from multiprocessing import Pool) and computation frameworks
- Workflow scheduling frameworks
- Extract
- Transform
- Loading
- Putting it all together
- Case study: course ratings
- From ratings to recommendations
- Scheduling daily jobs
I will review more classes next time.
Categories: Python | Comments (0) | Permalink

Review of Python Courses (Part 17)
Posted by Mark on January 12, 2021 at 07:13 | Last modified: February 4, 2021 13:11

In Part 16, I summarized my DataCamp courses 47-49. Today I will continue with the next three.
As a reminder, I introduced you to my recent work learning Python here.
My course #50 was Introduction to Shell. This course covers:
- How does the shell compare to a desktop interface?
- Where am I and how can I identify files and directories?
- How can I move to another directory (~ is home)?
- How to copy, rename, move, and delete files
- How to create and delete directories
- How to view file contents
- Modifying commands with flags
- Getting help for a command
- Selecting columns from a file
- Repeating commands
- Selecting lines with certain values
- Storing command output to a file or using as input
- Combining commands with pipe symbol
- Counting records in a file
- Specifying multiple files at once
- Wildcards
- Sorting lines of text and removing duplicate lines
- How to stop a running program
- Printing a variable’s value
- How does the shell store information?
- Repeating commands many times or once for each file
- Recording names of a set of files
- Variable’s name versus its value
- Running many commands in a single loop
- Using semicolons to do multiple things in a single loop
- Editing a file
- Saving commands to rerun later
- Reusing pipes
- Passing filenames to scripts
- Processing a single argument
- Writing loops in a shell script
My course #51 was Generalized Linear Models (GLM) in Python. This material is thick and really demands a third look (for me). This course covers:
- Going beyond linear regression (import statsmodels.api as sm; from statsmodels.formula.api import glm)
- How to build a GLM?
- How to fit a GLM in Python?
- Binary data and logistic regression (odds, odds ratio, and probability)
- Interpreting coefficients
- Interpreting model inference
- Computing and describing predictions
- Count data and Poisson distribution
- Interpreting model fit
- The problem of overdispersion
- Multivariable logistic regression (from statsmodels.stats.outliers_influence import variance_inflation_factor)
- Comparing models
- Model formula (from patsy import dmatrix)
- Categorical and interaction terms
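The odds/odds-ratio/probability conversions are the key to interpreting logistic-regression coefficients. A quick worked sketch (the coefficient value below is hypothetical, not from a fitted model):

```python
import math

# A fitted logistic-regression coefficient lives on the log-odds scale;
# exponentiating it yields an odds ratio.
coef = 0.693  # roughly ln(2)
odds_ratio = math.exp(coef)
print(round(odds_ratio, 2))  # 2.0: a one-unit increase doubles the odds

# Converting odds to probability: p = odds / (1 + odds)
odds = 3.0
print(odds / (1 + odds))  # 0.75
```

Keeping these three scales straight (log-odds, odds, probability) is most of what "interpreting coefficients" amounts to in this course.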
My course #52 was Pandas Joins for Spreadsheet Users. This course covers:
- Joining data: a real-world necessity
- Concatenation
- Power and flexibility
- Types of joins
- A closer look at one-to-one joins
- Combining common data with inner joins
- “Out of many, one”
- Joining on key columns
- Index-based joins
- Joining data in real life
- Working with time data
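Inner joins are the workhorse here. A minimal `merge()` sketch with toy frames (not the course's data):

```python
import pandas as pd

prices = pd.DataFrame({"ticker": ["AAPL", "MSFT", "TSLA"],
                       "price": [150, 250, 200]})
sectors = pd.DataFrame({"ticker": ["AAPL", "MSFT"],
                        "sector": ["Tech", "Tech"]})

# An inner join keeps only tickers present in both frames,
# like a VLOOKUP that silently drops non-matches.
joined = prices.merge(sectors, on="ticker", how="inner")
print(joined.shape)  # (2, 3)
```

Switching `how` to `"left"`, `"right"`, or `"outer"` covers the other join types the course walks through.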
I will review more classes next time.
Categories: Python | Comments (0) | Permalink