{{< partial "learn_x_header" >}}
## What You'll Build
You'll build a CSV sales analyzer that reads store sales data, prints it in manageable chunks, and totals the sales. By the end, you'll have a working Python program and understand the fundamentals of the language.
This tutorial takes about 45 minutes. You'll need basic command line skills and a text editor.
## Prerequisites
Before starting, you need:
* A command line terminal (Terminal on Mac/Linux, PowerShell on Windows).
* A text editor (VS Code, Sublime Text, or even Notepad).
* Internet connection (for downloading Python and packages).
* 45 minutes of focused time.
Don't worry if you're new to programming. I'll explain each step.
## Step 1: Install and Verify Python
First, check if Python is already installed.
Open your terminal and run:
```sh
python3 --version
```
You should see output like:
```sh
Python 3.9.7
```
If you see an error like "command not found," [install Python] first. Download the installer and run it. On Windows, make sure to check "Add Python to PATH" during installation.
After installing, run `python3 --version` again to verify. On Windows, the command may be `python` or `py` instead of `python3`.
## Step 2: Create Your Project Folder
Create a folder for your project.
```sh
mkdir csv-analyzer
cd csv-analyzer
```
Verify you're in the right folder:
```sh
pwd
```
You should see a path ending in `csv-analyzer`, like:
```sh
/Users/yourname/csv-analyzer
```
This keeps your project organized and isolated.
## Step 3: Say Hello to Python
Let's verify Python works by creating a simple program.
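Create a file called `hello.py`. In PowerShell, where `touch` isn't available, run `New-Item hello.py` instead (the same applies to the later `touch` commands):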
```sh
touch hello.py
```
Open `hello.py` in your text editor and add this line:
```python
print("Hello, Python!")
```
Save the file and run it:
```sh
python3 hello.py
```
You should see:
```sh
Hello, Python!
```
If you see this, Python is working. If you see an error, check that you saved the file and you're in the `csv-analyzer` folder.
**What you learned:** The `print()` function displays text. You just wrote and ran your first Python program.
## Why Python?
Python is a programming language known for readable syntax and powerful libraries. It's popular for data analysis, web development, and automation. Created by [Guido van Rossum] in 1991, it's now one of the most-used languages in the world.
You're learning Python by building something real. Let's keep going.
## Step 4: Create Sample Sales Data
Create a CSV file with sample sales data.
Create a file called `example.csv`:
```sh
touch example.csv
```
Open `example.csv` and add this data:
```csv
store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7
```
Save the file. This is the data your analyzer will process.
**What you learned:** CSV (Comma-Separated Values) files store data in rows and columns. The first row is the header (store, sales), and each following row is a record.
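To make the header-and-records structure concrete, here is a minimal sketch that splits the same kind of data with Python's built-in `csv` module. The `sample` string and the variable names are illustrative only; the tutorial itself reads the file with Pandas in a later step.

```python
import csv
import io

# A shortened copy of the sample data, held in a string for illustration.
sample = "store,sales\nOffice A,7\nOffice B,3\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # The first row holds the column names
records = list(reader)  # Every remaining row is one record

print(header)   # ['store', 'sales']
print(records)  # [['Office A', '7'], ['Office B', '3']]
```

Every value comes back as text, so `'7'` would need `int('7')` before you could add it up.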
## Step 5: Read the CSV File
Now make Python read the file.
Create a file called `analyze.py`:
```sh
touch analyze.py
```
Open `analyze.py` and add this code:
```python
filename = "example.csv"
print(f"Reading {filename}")
try:
    with open(filename, "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"Error: {filename} not found. Check that the file exists.")
```
Run it:
```sh
python3 analyze.py
```
You should see:
```sh
Reading example.csv
store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7
```
**What you learned:**
* **Variables** store values. `filename = "example.csv"` creates a variable called `filename`.
* **f-strings** format text with variables. `f"Reading {filename}"` inserts the filename into the text.
* **try/except** handles errors. If the file doesn't exist, the program prints an error instead of crashing.
* **with open()** opens files safely. Python automatically closes the file when done.
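The same pieces can be wrapped in a small reusable function. This is one possible arrangement, not the only one; `read_text` and the file name are hypothetical names chosen for illustration.

```python
def read_text(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        # The with block closes the file even if read() raises an error
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        return None


content = read_text("no-such-file.csv")  # Hypothetical name that should not exist
if content is None:
    print("File not found")
else:
    print(content)
```

Returning `None` lets the caller decide what to do about a missing file instead of printing from inside the function.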
## Step 6: Set Up a Virtual Environment
Before installing packages, create a virtual environment. This keeps your project's packages separate from your system Python.
Run this command in your `csv-analyzer` folder:
```sh
python3 -m venv venv
```
This creates a folder called `venv` that holds your project's packages.
Now activate the environment:
On Mac/Linux:
```sh
source venv/bin/activate
```
On Windows:
```sh
venv\Scripts\activate
```
You should see `(venv)` at the start of your terminal prompt:
```sh
(venv) user@laptop:~/csv-analyzer$
```
If you see this, your virtual environment is active.
**What you learned:** Virtual environments prevent package conflicts. Each project gets its own isolated Python environment. This is critical for professional Python development.
**Note:** If `python3 -m venv venv` hangs and your project is on a network drive, move the folder to a local disk and run the command again.
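If you're ever unsure whether the environment is active, you can also ask Python itself. The sketch below assumes the environment was created with `python3 -m venv`, as above; `in_virtual_environment` is a hypothetical helper name.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print(f"Virtual environment active: {in_virtual_environment()}")
print(f"Interpreter location: {sys.executable}")
```

When the environment is active, the interpreter location should be a path inside your `venv` folder.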
## Step 7: Install the Pandas Library
Pandas is a powerful library for working with data. Install it:
```sh
pip install pandas
```
You should see output ending in something like this (your version numbers will differ):
```sh
Successfully installed pandas-2.0.0 numpy-1.24.0 ...
```
Verify it's installed:
```sh
pip freeze
```
You should see a list including:
```sh
pandas==2.0.0
numpy==1.24.0
...
```
**What you learned:** `pip` is Python's package installer. `pip install` downloads and installs packages. `pip freeze` shows installed packages and versions.
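You can also check for a package from inside Python. This is a minimal sketch using the standard library's `importlib.metadata`, assuming Python 3.8 or newer; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if Pandas is installed in the active environment,
# or None if it is not.
print(installed_version("pandas"))
```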
## Step 8: Parse CSV Data with Pandas
Now use Pandas to read and analyze the CSV file.
Open `analyze.py` and replace the contents with this:
```python
import pandas as pd
filename = "example.csv"
print(f"Reading {filename}")
print()
try:
    data = pd.read_csv(filename)
    print(data)
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```
Run it:
```sh
python3 analyze.py
```
You should see formatted output:
```sh
Reading example.csv

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9
3  Office D    100
4  Office E      4
5  Office F     96
6  Office G     56
7  Office H     34
8  Office I     37
9  Office J      7
```
**What you learned:**
* **import** loads libraries. `import pandas as pd` loads Pandas and gives it a short name (`pd`).
* **pd.read_csv()** parses CSV files into a structured format called a DataFrame.
* Pandas automatically formats the data into neat columns with row numbers.
## Step 9: Chunk the Data
For large CSV files, reading everything at once can be slow and can use more memory than your computer has. Let's process the data in chunks.
Open `analyze.py` and replace the contents with this:
```python
import pandas as pd
filename = "example.csv"
chunksize = 3
print(f"Reading {filename} in chunks of {chunksize} rows")
print()
try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        print(chunk)
        print()
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```
Run it:
```sh
python3 analyze.py
```
You should see:
```sh
Reading example.csv in chunks of 3 rows

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

      store  sales
9  Office J      7
```
The data is now processed in groups of 3 rows.
**What you learned:**
* **for loops** repeat actions. `for chunk in ...` processes each chunk one at a time.
* **chunksize** tells Pandas to read the file in pieces instead of all at once.
* This technique is essential for processing large files that don't fit in memory.
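Conceptually, chunking means pulling a fixed number of rows at a time from a stream instead of loading them all. The sketch below imitates that idea with the standard library. It is not how Pandas implements `chunksize`, and `chunked` is a hypothetical helper name.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))  # Take the next `size` items
        if not chunk:
            return  # Nothing left to read
        yield chunk


for chunk in chunked(range(10), 3):
    print(chunk)
```

Only one chunk is held in memory at a time, which is why the approach scales to files larger than available memory.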
## Step 10: Calculate Total Sales
Let's add analysis. Calculate the total sales for each chunk.
Open `analyze.py` and replace the contents with this:
```python
import pandas as pd
filename = "example.csv"
chunksize = 3
total_sales = 0
print(f"Analyzing {filename}")
print()
try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        chunk_total = chunk['sales'].sum()
        total_sales += chunk_total
        print(f"Chunk total: {chunk_total}")
        print(chunk)
        print()
    # Runs once, after the loop has processed every chunk
    print(f"Total sales across all chunks: {total_sales}")
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```
Run it:
```sh
python3 analyze.py
```
You should see:
```sh
Analyzing example.csv

Chunk total: 19
      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

Chunk total: 200
      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

Chunk total: 127
      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

Chunk total: 7
      store  sales
9  Office J      7

Total sales across all chunks: 353
```
**What you learned:**
* **Accessing columns:** `chunk['sales']` gets the sales column from the chunk.
* **sum()** adds all values in a column.
* **Accumulating values:** `total_sales += chunk_total` adds each chunk's total to a running sum.
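The accumulation pattern works the same way without Pandas. Here is a small sketch using a plain list that holds the sales column from `example.csv`; list slicing stands in for Pandas chunks, and the variable names are illustrative.

```python
sales = [7, 3, 9, 100, 4, 96, 56, 34, 37, 7]  # The sales column from example.csv
chunksize = 3

total_sales = 0
chunk_totals = []
for start in range(0, len(sales), chunksize):
    chunk = sales[start:start + chunksize]  # Up to three values at a time
    chunk_total = sum(chunk)
    chunk_totals.append(chunk_total)
    total_sales += chunk_total              # Running sum across chunks

print(chunk_totals)  # [19, 200, 127, 7]
print(total_sales)   # 353
```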
## Step 11: Save Your Dependencies
Other people (or future you) will need to know which packages your project uses.
Run this command:
```sh
pip freeze > requirements.txt
```
This creates a file called `requirements.txt` with all installed packages and versions.
View it:
```sh
cat requirements.txt
```
You should see something like this (the exact packages and versions depend on what pip installed):
```sh
numpy==1.24.0
pandas==2.0.0
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0
```
Anyone can now install the same packages with:
```sh
pip install -r requirements.txt
```
**What you learned:** `requirements.txt` is a standard file that lists project dependencies. This makes your project reproducible.
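To see how simple the file format is, here is a rough sketch that reads pinned entries into a dictionary. It assumes only plain `name==version` lines like the ones above and ignores the rest of the requirements syntax; `parse_requirements` is a hypothetical helper, not part of pip.

```python
def parse_requirements(text):
    """Map package names to pinned versions from `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name] = pinned
    return pins


sample = "numpy==1.24.0\npandas==2.0.0\n"
print(parse_requirements(sample))  # {'numpy': '1.24.0', 'pandas': '2.0.0'}
```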
## Step 12: Leave the Virtual Environment
When you're done working, deactivate the virtual environment:
```sh
deactivate
```
The `(venv)` prefix should disappear from your prompt. You're back to your system Python.
You can reactivate anytime with `source venv/bin/activate` (Mac/Linux) or `venv\Scripts\activate` (Windows).
## What You Built
You created a CSV analyzer that:
* Reads sales data from a file.
* Processes the data in chunks.
* Calculates totals for each chunk and overall.
* Handles errors gracefully.
You learned:
* **Variables:** Store values like `filename = "example.csv"`.
* **f-strings:** Format text with variables like `f"Reading {filename}"`.
* **Functions and methods:** Call built-ins like `print()` and `open()`, and methods like `.sum()`.
* **Loops:** Repeat actions with `for chunk in ...`.
* **Exception handling:** Catch errors with `try/except`.
* **Libraries:** Import and use external packages like Pandas.
* **Virtual environments:** Isolate project dependencies.
* **Package management:** Install packages with `pip` and track them with `requirements.txt`.
## Troubleshooting
**"command not found: python3"**
Python isn't installed or isn't in your PATH. [Download Python](https://www.python.org/downloads/) and, on Windows, check "Add Python to PATH" during installation. Verify with `python3 --version` (or `python --version` on Windows).
**"No module named pandas"**
Your virtual environment isn't activated, or Pandas isn't installed. Run `source venv/bin/activate` (Mac/Linux) or `venv\Scripts\activate` (Windows), then run `pip install pandas`.
**"FileNotFoundError"**
The program can't find `example.csv`. Check:
* You're in the `csv-analyzer` folder when running the program.
* The file is named exactly `example.csv` (case matters).
* The file is in the same folder as `analyze.py`.
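A quick way to run these checks is to ask Python where it is looking. This is a minimal sketch; `describe_file` is a hypothetical helper name.

```python
from pathlib import Path


def describe_file(name):
    """Report where Python is looking for `name` and whether it is there."""
    path = Path.cwd() / name  # Relative names resolve against the current folder
    status = "found" if path.is_file() else "missing"
    return f"{path} ({status})"


print(describe_file("example.csv"))
```

If the printed path isn't inside your `csv-analyzer` folder, `cd` into it and run the program again.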
**"venv/bin/activate: No such file or directory"**
You haven't created the virtual environment yet. Run `python3 -m venv venv` first.
**Virtual environment command hangs**
This can happen on network drives. Move your project folder to a local disk (like your home directory) and try again.
## Where to Go Next
You've learned Python basics by building a real tool. Here's what to explore next:
### More Python Concepts
You used variables, loops, functions, and exception handling. Here are more concepts to learn:
* **Data types:** Integers, floats, booleans, lists, dictionaries.
* **Classes:** Create custom objects like `class Office:`.
* **List comprehensions:** Shorthand for creating lists like `[x * 2 for x in numbers]`.
* **Lambda functions:** Short anonymous functions like `lambda x: x * 2`.
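As a small taste of a few of these, here is a sketch that reuses the tutorial's sales numbers. The variable names are illustrative only.

```python
sales = [7, 3, 9, 100]                     # A list of integers
office = {"name": "Office A", "sales": 7}  # A dictionary

doubled = [x * 2 for x in sales]           # List comprehension
big_sales = [x for x in sales if x > 8]    # Comprehension with a filter
double = lambda x: x * 2                   # Lambda function

print(doubled)         # [14, 6, 18, 200]
print(big_sales)       # [9, 100]
print(double(21))      # 42
print(office["name"])  # Office A
```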
### Python Use Cases
**Data analysis and machine learning:** Pandas, NumPy, TensorFlow, scikit-learn.
**Web development:** Django, Flask, FastAPI.
**Automation and scripting:** Automate repetitive tasks, process files, interact with APIs.
**Less common but possible:** Mobile apps (Kivy), desktop apps (PyQt), games (PyGame), embedded systems (MicroPython).
### Learning Resources
**Books:**
* [Python Crash Course] by Eric Matthes (beginner-friendly, project-based).
* [Python for Data Analysis] by Wes McKinney (creator of Pandas).
* [Learning Python, 5th Edition] by Mark Lutz (comprehensive reference).
**Videos:**
* [Learning Python] on LinkedIn Learning.
* [Python Essential Training] on LinkedIn Learning.
* [Complete Python Developer in 2020: Zero to Mastery] on Udemy.
**Online:**
* [The official Python tutorial](https://docs.python.org/3/tutorial/) for core language features.
* [The Zen of Python](https://www.python.org/dev/peps/pep-0020/) for Python philosophy.
* [Python Package Index] to discover libraries.
### Extend Your Project
Challenge yourself by adding features to your analyzer:
* Filter stores with sales above a threshold.
* Sort stores by sales amount.
* Calculate average sales per store.
* Read the filename from command line arguments.
* Export results to a new CSV file.
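If you'd like a starting point for the command line challenge, here is one possible sketch using the standard library's `argparse` module. The argument names, defaults, and the `parse_args` helper are assumptions for illustration, not part of the tutorial's code.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments; argv=None means read them from sys.argv."""
    parser = argparse.ArgumentParser(description="Analyze a CSV file of store sales.")
    parser.add_argument("filename", nargs="?", default="example.csv",
                        help="CSV file to read (default: example.csv)")
    parser.add_argument("--chunksize", type=int, default=3,
                        help="rows per chunk (default: 3)")
    return parser.parse_args(argv)


args = parse_args(["sales.csv", "--chunksize", "5"])  # Hypothetical arguments
print(args.filename, args.chunksize)  # sales.csv 5
```

In `analyze.py` you would call `parse_args()` with no arguments and use `args.filename` and `args.chunksize` in place of the hard-coded values.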
### Example Code Repository
View the complete code for this tutorial in [my repository on GitHub]:
```sh
git clone git@github.com:jeffabailey/learn.git
cd learn/programming/python
```
## Appendix: Python Quick Reference
Here are common Python constructs you'll encounter. Use this as a reference after completing the tutorial. You don't need to memorize these now. Come back to this section when you need to look something up.
### Variables
```python
office_name = "Office A"
office_sales = 7
office_score = 7.5
office_is_active = True
```
Python uses [snake_case] for variable names. See [the naming section of Google's style guide] for conventions.
### Comments
```python
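# Single-line comment

# Python has no dedicated multi-line comment syntax.
# Stack several single-line comments instead.

"""
A triple-quoted string is a string literal, not a comment.
As the first statement in a module, function, or class, it becomes a docstring.
"""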
```
### Control Structures
**For loop:**
```python
offices = ["Office A", "Office B", "Office C"]
for office in offices:
    print(office)
```
**While loop:**
```python
offices = ["Office A", "Office B", "Office C"]
while offices:
    # pop() removes and returns the last item, so the loop ends when the list is empty
    print(offices.pop())
```
**If-else statement:**
```python
office_a_sales = 7
office_b_sales = 3
if office_b_sales > office_a_sales:
    print("Office B has more sales")
elif office_a_sales > office_b_sales:
    print("Office A has more sales")
else:
    print("Sales are equal")
```
### Functions
```python
def calculate_total(sales_list):
    total = sum(sales_list)
    return total

result = calculate_total([7, 3, 9])
print(result)  # 19
```
### Classes
```python
class Office:
    def __init__(self, name, location, sales):
        self.name = name
        self.location = location
        self.sales = sales

office = Office("Office A", "Portland, Oregon", 7)
print(f"Name: {office.name}")
print(f"Sales: {office.sales}")
```
### Exception Handling
```python
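file = None  # Defined up front so the finally block can check it safely
try:
    file = open("data.csv", "r")
    content = file.read()
except FileNotFoundError:
    print("File not found")
finally:
    # Close the file only if open() succeeded
    if file:
        file.close()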
```
### Lists (Arrays)
```python
offices = ["Office A", "Office B", "Office C"]
# Access
print(offices[0]) # Office A
# Update
offices[0] = "Office Z"
# Length
print(len(offices)) # 3
# Add
offices.append("Office D")
# Remove
offices.remove("Office B")
# Loop
for office in offices:
    print(office)
```
### Operators
**Arithmetic:**
```python
addition = 1 + 1        # 2
subtraction = 2 - 1     # 1
multiplication = 3 * 3  # 9
division = 10 / 5       # 2.0 (/ always returns a float)
modulus = 6 % 3         # 0 (remainder after division)
exponent = 2 ** 3       # 8
```
**Assignment:**
```python
x = 1
x += 1 # x is now 2
x -= 1 # x is now 1
x *= 5 # x is now 5
x /= 5 # x is now 1.0
```
**Comparison:**
```python
a, b = 1, 2
a == b  # Equal: False
a != b  # Not equal: True
a > b   # Greater than: False
a < b   # Less than: True
a >= b  # Greater than or equal: False
a <= b  # Less than or equal: True
```
**Note:** To check an object's type, use [isinstance()][the isinstance built-in function] instead of comparing types with `==`.
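A few illustrative checks with `isinstance()`, using made-up values:

```python
value = 7

print(isinstance(value, int))           # True
print(isinstance(value, (int, float)))  # True: a tuple checks several types at once
print(isinstance("7", int))             # False
print(isinstance(True, int))            # True: bool is a subclass of int
```

The last line shows why `isinstance()` and `type(x) == int` can disagree: `isinstance()` also accepts subclasses.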
### Lambda Functions
```python
offices = [
    {'name': 'Office A', 'sales': 7},
    {'name': 'Office B', 'sales': 3},
    {'name': 'Office C', 'sales': 9}
]
# Find office with highest sales
top_office = max(offices, key=lambda x: x['sales'])
print(top_office) # {'name': 'Office C', 'sales': 9}
```
Use lambdas for simple operations. For complex logic, use regular functions.
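To compare the two styles, this sketch sorts the same offices once with a lambda and once with a named function. `sales_of` is a hypothetical name chosen for illustration.

```python
offices = [
    {'name': 'Office A', 'sales': 7},
    {'name': 'Office B', 'sales': 3},
    {'name': 'Office C', 'sales': 9},
]


def sales_of(office):
    """Named equivalent of `lambda x: x['sales']`."""
    return office['sales']


by_lambda = sorted(offices, key=lambda x: x['sales'], reverse=True)
by_function = sorted(offices, key=sales_of, reverse=True)

print([o['name'] for o in by_lambda])  # ['Office C', 'Office A', 'Office B']
print(by_lambda == by_function)        # True
```

Both give the same result. The named function has room for a docstring and more logic if the sort key ever gets complicated.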
## Related Content
* [Python Package Index] to search for Python packages.
* [The Zen of Python](https://www.python.org/dev/peps/pep-0020/) for Python philosophy.
* [W3Schools Python Tutorial](https://www.w3schools.com/python/) for more examples.
[Guido van Rossum]: https://en.wikipedia.org/wiki/Guido_van_Rossum
[install Python]: https://www.python.org/downloads/
[Python Crash Course]: https://amzn.to/3d2s9kw
[Python for Data Analysis]: https://amzn.to/2TxtmZc
[Learning Python, 5th Edition]: https://amzn.to/3edZhFX
[Learning Python]: https://www.linkedin.com/learning/learning-python-25309312
[Python Essential Training]: https://www.linkedin.com/learning/python-essential-training-2?u=2130809
[Complete Python Developer in 2020: Zero to Mastery]: https://www.udemy.com/course/complete-python-developer-zero-to-mastery/
[my repository on GitHub]: https://github.com/jeffabailey/learn
[Python Package Index]: https://pypi.org/
[snake_case]: https://peps.python.org/pep-0008/#naming-conventions
[the naming section of Google's style guide]: https://google.github.io/styleguide/pyguide.html#316-naming
[the isinstance built-in function]: https://docs.python.org/3.7/library/functions.html#isinstance