What You’ll Build

You’ll build a CSV sales analyzer that reads store sales data and prints it in manageable chunks. By the end, you’ll have a working Python program and understand the fundamentals.

This tutorial takes about 45 minutes. You’ll need basic command line skills and a text editor.

Prerequisites

Before starting, you need:

  • A command line terminal (Terminal on Mac/Linux, PowerShell on Windows).
  • A text editor (VS Code, Sublime Text, or even Notepad).
  • Internet connection (for downloading Python and packages).
  • 45 minutes of focused time.

Don’t worry if you’re new to programming. I’ll explain each step.

Step 1: Install and Verify Python

First, check if Python is already installed.

Open your terminal and run:

python3 --version

You should see output like:

Python 3.9.7

If you see an error like “command not found,” install Python first. Download the installer from python.org and run it. On Windows, make sure to check “Add Python to PATH” during installation; the command there may also be python rather than python3.

After installing, run python3 --version again to verify.

Step 2: Create Your Project Folder

Create a folder for your project.

mkdir csv-analyzer
cd csv-analyzer

Verify you’re in the right folder:

pwd

You should see a path ending in csv-analyzer, like:

/Users/yourname/csv-analyzer

This keeps your project organized and isolated.

Step 3: Say Hello to Python

Let’s verify Python works by creating a simple program.

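Create a file called hello.py. In PowerShell, where touch isn’t available, run New-Item hello.py instead (here and in later steps):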

touch hello.py

Open hello.py in your text editor and add this line:

print("Hello, Python!")

Save the file and run it:

python3 hello.py

You should see:

Hello, Python!

If you see this, Python is working. If you see an error, check that you saved the file and you’re in the csv-analyzer folder.

What you learned: The print() function displays text. You just wrote and ran your first Python program.

Why Python?

Python is a programming language known for readable syntax and powerful libraries. It’s popular for data analysis, web development, and automation. First released by Guido van Rossum in 1991, it’s now one of the most-used languages in the world.

You’re learning Python by building something real. Let’s keep going.

Step 4: Create Sample Sales Data

Create a CSV file with sample sales data.

Create a file called example.csv:

touch example.csv

Open example.csv and add this data:

store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7

Save the file. This is the data your analyzer will process.

What you learned: CSV (Comma-Separated Values) files store data in rows and columns. The first row is the header (store, sales), and each following row is a record.
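To make the header and record structure concrete, here is a small sketch that uses Python’s built-in csv module rather than Pandas, which comes later in the tutorial. It parses two rows of the sample data from an in-memory string, so it doesn’t touch your example.csv file. The variable names are only illustrative.

```python
import csv
import io

# A small in-memory copy of the first rows of example.csv (illustrative).
sample = "store,sales\nOffice A,7\nOffice B,3\n"

# DictReader uses the first row as the header and maps each
# following row to a dictionary keyed by column name.
reader = csv.DictReader(io.StringIO(sample))
records = list(reader)

print(reader.fieldnames)    # ['store', 'sales']
print(records[0]["store"])  # Office A
print(records[0]["sales"])  # 7 (read as the string "7", not a number)
```

Notice that the csv module returns every value as text. Converting columns to numbers is one of the things Pandas handles for you.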

Step 5: Read the CSV File

Now make Python read the file.

Create a file called analyze.py:

touch analyze.py

Open analyze.py and add this code:

filename = "example.csv"
print(f"Reading {filename}")

try:
    with open(filename, "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"Error: {filename} not found. Check that the file exists.")

Run it:

python3 analyze.py

You should see:

Reading example.csv
store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7

What you learned:

  • Variables store values. filename = "example.csv" creates a variable called filename.
  • f-strings format text with variables. f"Reading {filename}" inserts the filename into the text.
  • try/except handles errors. If the file doesn’t exist, the program prints an error instead of crashing.
  • with open() opens files safely. Python automatically closes the file when done.
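The same pattern can be wrapped in a small function. This is only a sketch: count_lines and the missing filename below are made-up names for illustration, not part of the analyzer.

```python
def count_lines(path):
    """Return the number of lines in a text file, or None if it is missing."""
    try:
        # The with block closes the file even if reading fails part-way.
        with open(path, "r") as file:
            return len(file.readlines())
    except FileNotFoundError:
        print(f"Error: {path} not found.")
        return None

# Hypothetical filename that should not exist in your folder.
result = count_lines("no-such-file.csv")
print(result)  # None
```

Returning None lets the calling code decide what to do next instead of crashing.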

Step 6: Set Up a Virtual Environment

Before installing packages, create a virtual environment. This keeps your project’s packages separate from your system Python.

Run this command in your csv-analyzer folder:

python3 -m venv venv

This creates a folder called venv that holds your project’s packages.

Now activate the environment:

On Mac/Linux:

source venv/bin/activate

On Windows:

venv\Scripts\activate

You should see (venv) at the start of your terminal prompt:

(venv) user@laptop:~/csv-analyzer$

If you see this, your virtual environment is active.

What you learned: Virtual environments prevent package conflicts. Each project gets its own isolated Python environment. This is critical for professional Python development.
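If you’re ever unsure whether the environment is active, you can also ask Python itself. The sketch below assumes the environment was created with python3 -m venv, as above; in_virtual_environment is a helper name I made up for this example.

```python
import sys

def in_virtual_environment():
    """Return True when Python is running from a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtual_environment()}")
```

With the environment active, the interpreter path should sit inside your venv folder.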

Note: If python3 -m venv venv hangs on a network drive, move your project folder to a local disk.

Step 7: Install the Pandas Library

Pandas is a powerful library for working with data. Install it:

pip install pandas

You should see output like:

Successfully installed pandas-2.0.0 numpy-1.24.0 ...

Verify it’s installed:

pip freeze

You should see a list including:

pandas==2.0.0
numpy==1.24.0
...

What you learned: pip is Python’s package installer. pip install downloads and installs packages. pip freeze shows installed packages and versions.
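You can check an installed version from inside Python as well. This is a minimal sketch using the standard library’s importlib.metadata module (Python 3.8 and later); installed_version is a hypothetical helper name, and the second package name is deliberately fake.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Prints the Pandas version if it is installed in this environment, else None.
print(installed_version("pandas"))

# A made-up package name, so this prints None.
print(installed_version("no-such-package-xyz-12345"))
```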

Step 8: Parse CSV Data with Pandas

Now use Pandas to read and analyze the CSV file.

Open analyze.py and replace the contents with this:

import pandas as pd

filename = "example.csv"
print(f"Reading {filename}")
print()

try:
    data = pd.read_csv(filename)
    print(data)
except FileNotFoundError:
    print(f"Error: {filename} not found.")

Run it:

python3 analyze.py

You should see formatted output:

Reading example.csv

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9
3  Office D    100
4  Office E      4
5  Office F     96
6  Office G     56
7  Office H     34
8  Office I     37
9  Office J      7

What you learned:

  • import loads libraries. import pandas as pd loads Pandas and gives it a short name (pd).
  • pd.read_csv() parses CSV files into a structured format called a DataFrame.
  • Pandas automatically formats the data into neat columns with row numbers.
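A DataFrame carries more information than what print() shows. The sketch below inspects a few of its attributes, assuming Pandas is installed from Step 7. It reads a shortened in-memory copy of the data so it runs without example.csv.

```python
import io
import pandas as pd

# In-memory copy of the first rows of example.csv (illustrative).
sample = "store,sales\nOffice A,7\nOffice B,3\nOffice C,9\n"
data = pd.read_csv(io.StringIO(sample))

print(data.shape)            # (3, 2): three rows, two columns
print(list(data.columns))    # ['store', 'sales']
print(data["sales"].dtype)   # int64: Pandas inferred whole numbers
print(data.head(2))          # first two rows only
```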

Step 9: Chunk the Data

For large CSV files, reading everything at once can be slow and can use more memory than your computer has. Let’s process the data in chunks.

Open analyze.py and replace the contents with this:

import pandas as pd

filename = "example.csv"
chunksize = 3

print(f"Reading {filename} in chunks of {chunksize} rows")
print()

try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        print(chunk)
        print()
except FileNotFoundError:
    print(f"Error: {filename} not found.")

Run it:

python3 analyze.py

You should see:

Reading example.csv in chunks of 3 rows

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

      store  sales
9  Office J      7

The data is now processed in groups of 3 rows.

What you learned:

  • for loops repeat actions. for chunk in ... processes each chunk one at a time.
  • chunksize tells Pandas to read the file in pieces instead of all at once.
  • This technique is essential for processing large files that don’t fit in memory.
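To see the idea behind chunking without Pandas, here is a sketch built on the standard library. It is not how Pandas implements chunksize; read_in_chunks is a made-up helper that only illustrates pulling a fixed number of rows at a time.

```python
import csv
import io
from itertools import islice

def read_in_chunks(file, chunksize):
    """Yield lists of at most chunksize rows from an open CSV file."""
    reader = csv.DictReader(file)
    while True:
        # islice pulls only the next chunksize rows from the reader.
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk

# Five data rows, so chunks of 2 give sizes 2, 2, 1.
sample = "store,sales\nA,7\nB,3\nC,9\nD,100\nE,4\n"
sizes = [len(chunk) for chunk in read_in_chunks(io.StringIO(sample), 2)]
print(sizes)  # [2, 2, 1]
```

Only one chunk is held in memory at a time, which is why this approach scales to files larger than available memory.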

Step 10: Calculate Total Sales

Let’s add analysis. Calculate the total sales for each chunk.

Open analyze.py and replace the contents with this:

import pandas as pd

filename = "example.csv"
chunksize = 3
total_sales = 0

print(f"Analyzing {filename}")
print()

try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        chunk_total = chunk['sales'].sum()
        total_sales += chunk_total
        print(f"Chunk total: {chunk_total}")
        print(chunk)
        print()

    print(f"Total sales across all chunks: {total_sales}")
except FileNotFoundError:
    print(f"Error: {filename} not found.")

Run it:

python3 analyze.py

You should see:

Analyzing example.csv

Chunk total: 19
      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

Chunk total: 200
      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

Chunk total: 127
      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

Chunk total: 7
      store  sales
9  Office J      7

Total sales across all chunks: 353

What you learned:

  • Accessing columns: chunk['sales'] gets the sales column from the chunk.
  • sum() adds all values in a column.
  • Accumulating values: total_sales += chunk_total adds each chunk’s total to a running sum.
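A quick sanity check: the running total should match the total you get from reading the whole file at once. This sketch compares the two on an in-memory copy of the sample data, assuming Pandas is installed.

```python
import io
import pandas as pd

# In-memory copy of the example.csv data.
sample = (
    "store,sales\n"
    "Office A,7\nOffice B,3\nOffice C,9\nOffice D,100\nOffice E,4\n"
    "Office F,96\nOffice G,56\nOffice H,34\nOffice I,37\nOffice J,7\n"
)

# Total computed chunk by chunk, as in analyze.py.
chunked_total = 0
for chunk in pd.read_csv(io.StringIO(sample), chunksize=3):
    chunked_total += chunk["sales"].sum()

# Total computed from the whole file at once.
whole_total = pd.read_csv(io.StringIO(sample))["sales"].sum()

print(chunked_total, whole_total)  # 353 353
```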

Step 11: Save Your Dependencies

Other people (or future you) will need to know which packages your project uses.

Run this command:

pip freeze > requirements.txt

This creates a file called requirements.txt with all installed packages and versions.

View it:

cat requirements.txt

You should see something like this (your package list and versions may differ):

numpy==1.24.0
pandas==2.0.0
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0

Anyone can now install the same packages with:

pip install -r requirements.txt

What you learned: requirements.txt is a standard file that lists project dependencies. This makes your project reproducible.
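Each line that pip freeze writes pins one package to one version. The sketch below reads that simple name==version form into a dictionary. It’s only an illustration: parse_pins is a made-up helper, real requirements files allow more syntax than this handles, and pip does the actual parsing for you.

```python
def parse_pins(text):
    """Map package names to versions for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that isn't a simple pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name] = version
    return pins

sample = "numpy==1.24.0\npandas==2.0.0\n"
print(parse_pins(sample))  # {'numpy': '1.24.0', 'pandas': '2.0.0'}
```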

Step 12: Leave the Virtual Environment

When you’re done working, deactivate the virtual environment:

deactivate

The (venv) prefix should disappear from your prompt. You’re back to your system Python.

You can reactivate anytime with source venv/bin/activate (Mac/Linux) or venv\Scripts\activate (Windows).

What You Built

You created a CSV analyzer that:

  • Reads sales data from a file.
  • Processes the data in chunks.
  • Calculates totals for each chunk and overall.
  • Handles errors gracefully.

You learned:

  • Variables: Store values like filename = "example.csv".
  • f-strings: Format text with variables like f"Reading {filename}".
  • Functions: Like print(), open(), and sum().
  • Loops: Repeat actions with for chunk in ....
  • Exception handling: Catch errors with try/except.
  • Libraries: Import and use external packages like Pandas.
  • Virtual environments: Isolate project dependencies.
  • Package management: Install packages with pip and track them with requirements.txt.

Troubleshooting

“command not found: python3”

Python isn’t installed or isn’t in your PATH. Download Python and check “Add Python to PATH” during installation. Verify with python3 --version.

“No module named pandas”

Your virtual environment isn’t activated, or Pandas isn’t installed. Run source venv/bin/activate (Mac/Linux) or venv\Scripts\activate (Windows), then run pip install pandas.

“FileNotFoundError”

The program can’t find example.csv. Check:

  • You’re in the csv-analyzer folder when running the program.
  • The file is named exactly example.csv (case matters).
  • The file is in the same folder as analyze.py.

“venv/bin/activate: No such file or directory”

You haven’t created the virtual environment yet. Run python3 -m venv venv first.

Virtual environment command hangs

This can happen on network drives. Move your project folder to a local disk (like your home directory) and try again.

Where to Go Next

You’ve learned Python basics by building a real tool. Here’s what to explore next:

More Python Concepts

You used variables, loops, functions, and exception handling. Here are more concepts to learn:

  • Data types: Integers, floats, booleans, lists, dictionaries.
  • Classes: Create custom objects like class Office:.
  • List comprehensions: Shorthand for creating lists like [x * 2 for x in numbers].
  • Lambda functions: Short anonymous functions like lambda x: x * 2.
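Here is a short taste of a few of these, using made-up sales numbers. It’s a sketch to show the syntax, not something the analyzer needs.

```python
sales = [7, 3, 9]  # a list of integers

# List comprehension: build a new list from an existing one.
doubled = [x * 2 for x in sales]
print(doubled)  # [14, 6, 18]

# The same transformation with a lambda and map().
double = lambda x: x * 2
print(list(map(double, sales)))  # [14, 6, 18]

# Dictionary: map store names to sales figures.
by_store = {"Office A": 7, "Office B": 3}
print(by_store["Office A"])  # 7
```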

Python Use Cases

Data analysis and machine learning: Pandas, NumPy, TensorFlow, scikit-learn.

Web development: Django, Flask, FastAPI.

Automation and scripting: Automate repetitive tasks, process files, interact with APIs.

Less common but possible: Mobile apps (Kivy), desktop apps (PyQt), games (PyGame), embedded systems (MicroPython).

Extend Your Project

Challenge yourself by adding features to your analyzer:

  • Filter stores with sales above a threshold.
  • Sort stores by sales amount.
  • Calculate average sales per store.
  • Read the filename from command line arguments.
  • Export results to a new CSV file.
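If you want a starting point, here is one possible sketch of the filtering, sorting, averaging, and export ideas. It assumes Pandas is installed and works on a shortened in-memory copy of the data; the threshold value is an arbitrary number I picked for the example.

```python
import io
import pandas as pd

sample = "store,sales\nOffice A,7\nOffice B,3\nOffice C,9\nOffice D,100\n"
data = pd.read_csv(io.StringIO(sample))

threshold = 5  # hypothetical cutoff

# Filter: keep rows whose sales exceed the threshold.
above = data[data["sales"] > threshold]

# Sort: highest sales first.
ranked = above.sort_values("sales", ascending=False)

# Average sales per store across the whole file.
average = data["sales"].mean()

print(list(ranked["store"]))  # ['Office D', 'Office C', 'Office A']
print(average)                # 29.75

# Export: to_csv returns a string when no path is given;
# pass a filename such as "results.csv" to write a file instead.
csv_text = ranked.to_csv(index=False)
```

Try adapting each piece to the chunked loop from Step 10 before looking at the repository.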

Example Code Repository

View the complete code for this tutorial in my repository on GitHub:

git clone git@github.com:jeffabailey/learn.git
cd learn/programming/python

Appendix: Python Quick Reference

Here are common Python constructs you’ll encounter. Use this as a reference after completing the tutorial. You don’t need to memorize these now. Come back to this section when you need to look something up.

Variables

office_name = "Office A"
office_sales = 7
office_score = 7.5
office_is_active = True

Python uses snake_case for variable names. See the naming section of Google’s style guide for conventions.

Comments

# Single-line comment

"""
Python has no multi-line comment syntax.
A triple-quoted string like this one is often used instead,
most commonly as a docstring.
"""

Control Structures

For loop:

offices = ["Office A", "Office B", "Office C"]
for office in offices:
    print(office)

While loop:

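offices = ["Office A", "Office B", "Office C"]
# pop() removes and returns the last item, so this prints
# Office C, Office B, Office A and leaves the list empty.
while offices:
    print(offices.pop())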

If-else statement:

office_a_sales = 7
office_b_sales = 3

if office_b_sales > office_a_sales:
    print("Office B has more sales")
elif office_a_sales > office_b_sales:
    print("Office A has more sales")
else:
    print("Sales are equal")

Functions

def calculate_total(sales_list):
    total = sum(sales_list)
    return total

result = calculate_total([7, 3, 9])
print(result)  # 19

Classes

class Office:
    def __init__(self, name, location, sales):
        self.name = name
        self.location = location
        self.sales = sales

office = Office("Office A", "Portland, Oregon", 7)
print(f"Name: {office.name}")
print(f"Sales: {office.sales}")

Exception Handling

file = None
try:
    file = open("data.csv", "r")
    content = file.read()
except FileNotFoundError:
    print("File not found")
finally:
    # file is still None if open() failed, so only close a file that was opened
    if file:
        file.close()

Lists (Arrays)

offices = ["Office A", "Office B", "Office C"]

# Access
print(offices[0])  # Office A

# Update
offices[0] = "Office Z"

# Length
print(len(offices))  # 3

# Add
offices.append("Office D")

# Remove
offices.remove("Office B")

# Loop
for office in offices:
    print(office)

Operators

Arithmetic:

addition = 1 + 1        # 2
subtraction = 2 - 1     # 1
multiplication = 3 * 3  # 9
division = 10 / 5       # 2.0 (/ always returns a float)
modulus = 6 % 3         # 0
exponent = 2 ** 3       # 8

Assignment:

x = 1
x += 1  # x is now 2
x -= 1  # x is now 1
x *= 5  # x is now 5
x /= 5  # x is now 1.0

Comparison:

a = 1
b = 2

a == b  # Equal
a != b  # Not equal
a > b   # Greater than
a < b   # Less than
a >= b  # Greater than or equal
a <= b  # Less than or equal

Note: To check what type a value is, use isinstance(value, int) rather than comparing type(value) with ==.
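A few isinstance() checks, as a sketch with made-up values:

```python
value = 7

# isinstance checks the type, including subclasses.
print(isinstance(value, int))           # True
print(isinstance(value, (int, float)))  # True: a tuple means "any of these"
print(isinstance("7", int))             # False: this is a string

# bool is a subclass of int, so True also counts as an int.
print(isinstance(True, int))            # True
```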

Lambda Functions

offices = [
    {'name': 'Office A', 'sales': 7},
    {'name': 'Office B', 'sales': 3},
    {'name': 'Office C', 'sales': 9}
]

# Find office with highest sales
top_office = max(offices, key=lambda x: x['sales'])
print(top_office)  # {'name': 'Office C', 'sales': 9}

Use lambdas for simple operations. For complex logic, use regular functions.
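For comparison, here is the same kind of key function written both ways. This is a sketch; sales_of is a name I made up for the example.

```python
offices = [
    {"name": "Office A", "sales": 7},
    {"name": "Office B", "sales": 3},
    {"name": "Office C", "sales": 9},
]

def sales_of(office):
    """Return the sales figure used as the sort key."""
    return office["sales"]

# Both calls sort by sales, lowest first.
by_lambda = sorted(offices, key=lambda office: office["sales"])
by_function = sorted(offices, key=sales_of)

print([o["name"] for o in by_function])  # ['Office B', 'Office A', 'Office C']
print(by_lambda == by_function)          # True
```

The named function is longer, but it gives you a place for a docstring and is easier to reuse and test.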