{{< partial "learn_x_header" >}}

## What You'll Build

You'll build a CSV sales analyzer that reads store sales data and prints it in manageable chunks. By the end, you'll have a working Python program and understand the fundamentals it relies on.

This tutorial takes about 45 minutes. You'll need basic command line skills and a text editor.

## Prerequisites

Before starting, you need:

* A command line terminal (Terminal on Mac/Linux, PowerShell on Windows).
* A text editor (VS Code, Sublime Text, or even Notepad).
* An internet connection (for downloading Python and packages).
* 45 minutes of focused time.

Don't worry if you're new to programming. I'll explain each step.

## Step 1: Install and Verify Python

First, check if Python is already installed. Open your terminal and run:

```sh
python3 --version
```

You should see output like:

```sh
Python 3.9.7
```

On Windows, the command is usually `python` or `py` instead of `python3`. If that applies to you, substitute it wherever this tutorial uses `python3`.

If you see an error like "command not found," [install Python] first. Download the installer, run it, and on Windows make sure to check "Add Python to PATH" during installation.

After installing, run `python3 --version` again to verify.

## Step 2: Create Your Project Folder

Create a folder for your project.

```sh
mkdir csv-analyzer
cd csv-analyzer
```

Verify you're in the right folder:

```sh
pwd
```

You should see a path ending in `csv-analyzer`, like:

```sh
/Users/yourname/csv-analyzer
```

Keeping everything in one folder makes the project easier to organize and keeps its files separate from the rest of your system.

## Step 3: Say Hello to Python

Let's verify Python works by creating a simple program.

Create a file called `hello.py`:

```sh
touch hello.py
```

The `touch` command isn't available in PowerShell. On Windows, create the empty file from your text editor instead.

Open `hello.py` in your text editor and add this line:

```python
print("Hello, Python!")
```

Save the file and run it:

```sh
python3 hello.py
```

You should see:

```sh
Hello, Python!
```

If you see this, Python is working. If you see an error, check that you saved the file and you're in the `csv-analyzer` folder.

**What you learned:** The `print()` function displays text. You just wrote and ran your first Python program.

## Why Python?
Python is a programming language known for readable syntax and powerful libraries. It's popular for data analysis, web development, and automation. Created by [Guido van Rossum] in 1991, it's now one of the most-used languages in the world.

You're learning Python by building something real. Let's keep going.

## Step 4: Create Sample Sales Data

Create a CSV file with sample sales data.

Create a file called `example.csv`:

```sh
touch example.csv
```

Open `example.csv` and add this data:

```csv
store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7
```

Save the file. This is the data your analyzer will process.

**What you learned:** CSV (Comma-Separated Values) files store data in rows and columns. The first row is the header (store, sales), and each following row is a record.

## Step 5: Read the CSV File

Now make Python read the file.

Create a file called `analyze.py`:

```sh
touch analyze.py
```

Open `analyze.py` and add this code:

```python
filename = "example.csv"

print(f"Reading {filename}")

try:
    with open(filename, "r") as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print(f"Error: {filename} not found. Check that the file exists.")
```

Run it:

```sh
python3 analyze.py
```

You should see:

```sh
Reading example.csv
store,sales
Office A,7
Office B,3
Office C,9
Office D,100
Office E,4
Office F,96
Office G,56
Office H,34
Office I,37
Office J,7
```

**What you learned:**

* **Variables** store values. `filename = "example.csv"` creates a variable called `filename`.
* **f-strings** format text with variables. `f"Reading {filename}"` inserts the filename into the text.
* **try/except** handles errors. If the file doesn't exist, the program prints an error instead of crashing.
* **with open()** opens files safely. Python automatically closes the file when done.

## Step 6: Set Up a Virtual Environment

Before installing packages, create a virtual environment.
This keeps your project's packages separate from your system Python.

Run this command in your `csv-analyzer` folder:

```sh
python3 -m venv venv
```

This creates a folder called `venv` that holds your project's packages.

Now activate the environment:

On Mac/Linux:

```sh
source venv/bin/activate
```

On Windows:

```sh
venv\Scripts\activate
```

You should see `(venv)` at the start of your terminal prompt:

```sh
(venv) user@laptop:~/csv-analyzer$
```

If you see this, your virtual environment is active.

**What you learned:** Virtual environments prevent package conflicts. Each project gets its own isolated Python environment. This is critical for professional Python development.

**Note:** If the command hangs on a network drive, move your folder to a physical disk.

## Step 7: Install the Pandas Library

Pandas is a powerful library for working with data. Install it:

```sh
pip install pandas
```

You should see output like:

```sh
Successfully installed pandas-2.0.0 numpy-1.24.0 ...
```

Verify it's installed:

```sh
pip freeze
```

You should see a list including:

```sh
pandas==2.0.0
numpy==1.24.0
...
```

**What you learned:** `pip` is Python's package installer. `pip install` downloads and installs packages. `pip freeze` shows installed packages and versions.

## Step 8: Parse CSV Data with Pandas

Now use Pandas to read and analyze the CSV file.

Open `analyze.py` and replace the contents with this:

```python
import pandas as pd

filename = "example.csv"

print(f"Reading {filename}")
print()

try:
    data = pd.read_csv(filename)
    print(data)
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```

Run it:

```sh
python3 analyze.py
```

You should see formatted output:

```sh
Reading example.csv

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9
3  Office D    100
4  Office E      4
5  Office F     96
6  Office G     56
7  Office H     34
8  Office I     37
9  Office J      7
```

**What you learned:**

* **import** loads libraries. `import pandas as pd` loads Pandas and gives it a short name (`pd`).
* **pd.read_csv()** parses CSV files into a structured format called a DataFrame.
* Pandas automatically formats the data into neat columns with row numbers.

## Step 9: Chunk the Data

For large CSV files, reading everything at once can be slow and can use more memory than your machine has. Let's process the data in chunks.

Open `analyze.py` and replace the contents with this:

```python
import pandas as pd

filename = "example.csv"
chunksize = 3

print(f"Reading {filename} in chunks of {chunksize} rows")
print()

try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        print(chunk)
        print()
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```

Run it:

```sh
python3 analyze.py
```

You should see:

```sh
Reading example.csv in chunks of 3 rows

      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

      store  sales
9  Office J      7
```

The data is now processed in groups of 3 rows. The last chunk holds the single leftover row.

**What you learned:**

* **for loops** repeat actions. `for chunk in ...` processes each chunk one at a time.
* **chunksize** tells Pandas to read the file in pieces instead of all at once.
* This technique is essential for processing large files that don't fit in memory.

## Step 10: Calculate Total Sales

Let's add analysis. Calculate the total sales for each chunk.
Open `analyze.py` and replace the contents with this:

```python
import pandas as pd

filename = "example.csv"
chunksize = 3
total_sales = 0

print(f"Analyzing {filename}")
print()

try:
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        chunk_total = chunk['sales'].sum()
        total_sales += chunk_total
        print(f"Chunk total: {chunk_total}")
        print(chunk)
        print()
    print(f"Total sales across all chunks: {total_sales}")
except FileNotFoundError:
    print(f"Error: {filename} not found.")
```

Run it:

```sh
python3 analyze.py
```

You should see:

```sh
Analyzing example.csv

Chunk total: 19
      store  sales
0  Office A      7
1  Office B      3
2  Office C      9

Chunk total: 200
      store  sales
3  Office D    100
4  Office E      4
5  Office F     96

Chunk total: 127
      store  sales
6  Office G     56
7  Office H     34
8  Office I     37

Chunk total: 7
      store  sales
9  Office J      7

Total sales across all chunks: 353
```

**What you learned:**

* **Accessing columns:** `chunk['sales']` gets the sales column from the chunk.
* **sum()** adds all values in a column.
* **Accumulating values:** `total_sales += chunk_total` adds each chunk's total to a running sum.

## Step 11: Save Your Dependencies

Other people (or future you) will need to know which packages your project uses.

Run this command:

```sh
pip freeze > requirements.txt
```

This creates a file called `requirements.txt` with all installed packages and versions.

View it:

```sh
cat requirements.txt
```

You should see something like this (the exact packages and versions depend on when you installed Pandas):

```sh
numpy==1.24.0
pandas==2.0.0
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0
```

Anyone can now install the same packages with:

```sh
pip install -r requirements.txt
```

**What you learned:** `requirements.txt` is a standard file that lists project dependencies. This makes your project reproducible.

## Step 12: Leave the Virtual Environment

When you're done working, deactivate the virtual environment:

```sh
deactivate
```

The `(venv)` prefix should disappear from your prompt. You're back to your system Python.
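
If you're ever unsure whether a virtual environment is active, you can also ask Python directly. The script below is a minimal sketch and not part of the tutorial's analyzer; `in_virtualenv` is a hypothetical helper name. It assumes an environment created with `python3 -m venv`, where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when Python is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Virtual environment active: {in_virtualenv()}")
```

Run it once with the environment activated and once after `deactivate`. The interpreter path and the reported status should change between the two runs.
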
You can reactivate anytime with `source venv/bin/activate` (Mac/Linux) or `venv\Scripts\activate` (Windows).

## What You Built

You created a CSV analyzer that:

* Reads sales data from a file.
* Processes the data in chunks.
* Calculates totals for each chunk and overall.
* Handles errors gracefully.

You learned:

* **Variables:** Store values like `filename = "example.csv"`.
* **f-strings:** Format text with variables like `f"Reading {filename}"`.
* **Functions:** Like `print()`, `open()`, and `sum()`.
* **Loops:** Repeat actions with `for chunk in ...`.
* **Exception handling:** Catch errors with `try/except`.
* **Libraries:** Import and use external packages like Pandas.
* **Virtual environments:** Isolate project dependencies.
* **Package management:** Install packages with `pip` and track them with `requirements.txt`.

## Troubleshooting

**"command not found: python3"**

Python isn't installed or isn't in your PATH. [Download Python](https://www.python.org/downloads/) and, on Windows, check "Add Python to PATH" during installation. Verify with `python3 --version`. On Windows, try `python --version` or `py --version` if `python3` isn't recognized.

**"No module named pandas"**

Your virtual environment isn't activated, or Pandas isn't installed. Run `source venv/bin/activate` (Mac/Linux) or `venv\Scripts\activate` (Windows), then run `pip install pandas`.

**"FileNotFoundError"**

The program can't find `example.csv`. Check:

* You're in the `csv-analyzer` folder when running the program.
* The file is named exactly `example.csv` (case matters).
* The file is in the same folder as `analyze.py`.

**"venv/bin/activate: No such file or directory"**

You haven't created the virtual environment yet. Run `python3 -m venv venv` first.

**Virtual environment command hangs**

This can happen on network drives. Move your project folder to a physical disk (like your home directory) and try again.

## Where to Go Next

You've learned Python basics by building a real tool.
Here's what to explore next:

### More Python Concepts

You used variables, loops, functions, and exception handling. Here are more concepts to learn:

* **Data types:** Integers, floats, booleans, lists, dictionaries.
* **Classes:** Create custom objects like `class Office:`.
* **List comprehensions:** Shorthand for creating lists like `[x * 2 for x in numbers]`.
* **Lambda functions:** Short anonymous functions like `lambda x: x * 2`.

### Python Use Cases

**Data analysis and machine learning:** Pandas, NumPy, TensorFlow, scikit-learn.

**Web development:** Django, Flask, FastAPI.

**Automation and scripting:** Automate repetitive tasks, process files, interact with APIs.

**Less common but possible:** Mobile apps (Kivy), desktop apps (PyQt), games (PyGame), embedded systems (MicroPython).

### Learning Resources

**Books:**

* [Python Crash Course] by Eric Matthes (beginner-friendly, project-based).
* [Python for Data Analysis] by Wes McKinney (creator of Pandas).
* [Learning Python, 5th Edition] by Mark Lutz (comprehensive reference).

**Videos:**

* [Learning Python] on LinkedIn Learning.
* [Python Essential Training] on LinkedIn Learning.
* [Complete Python Developer in 2020: Zero to Mastery] on Udemy.

**Online:**

* [The official Python tutorial](https://docs.python.org/3/tutorial/) for core language features.
* [The Zen of Python](https://www.python.org/dev/peps/pep-0020/) for Python philosophy.
* [Python Package Index] to discover libraries.

### Extend Your Project

Challenge yourself by adding features to your analyzer:

* Filter stores with sales above a threshold.
* Sort stores by sales amount.
* Calculate average sales per store.
* Read the filename from command line arguments.
* Export results to a new CSV file.
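
As a starting point for the command line item in the list above, here is a minimal sketch using the standard library's `argparse` module. It isn't part of the tutorial's code. The function name `parse_args` and the `--chunksize` option are assumptions chosen for illustration, and the defaults mirror the values used in `analyze.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse command line options for the analyzer (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Analyze a CSV file of store sales."
    )
    # Optional positional argument, so running with no arguments still works.
    parser.add_argument(
        "filename",
        nargs="?",
        default="example.csv",
        help="CSV file to read (default: example.csv)",
    )
    parser.add_argument(
        "--chunksize",
        type=int,
        default=3,
        help="number of rows per chunk (default: 3)",
    )
    # When argv is None, argparse reads the real command line.
    return parser.parse_args(argv)


# Passing an explicit list makes the example repeatable.
args = parse_args(["example.csv", "--chunksize", "5"])
print(f"Reading {args.filename} in chunks of {args.chunksize} rows")
```

In `analyze.py` you would call `parse_args()` with no arguments so it reads what the user typed, then pass `args.filename` and `args.chunksize` to `pd.read_csv()`.
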
### Example Code Repository

View the complete code for this tutorial in [my repository on GitHub]:

```sh
git clone git@github.com:jeffabailey/learn.git
cd learn/programming/python
```

## Appendix: Python Quick Reference

Here are common Python constructs you'll encounter. Use this as a reference after completing the tutorial.

You don't need to memorize these now. Come back to this section when you need to look something up.

### Variables

```python
office_name = "Office A"
office_sales = 7
office_score = 7.5
office_is_active = True
```

Python uses [snake_case] for variable names. See [the naming section of Google's style guide] for conventions.

### Comments

```python
# Single-line comment

"""
Multi-line comment
for longer explanations
"""
```

Python has no dedicated multi-line comment syntax. A triple-quoted string that isn't assigned to anything is commonly used instead.

### Control Structures

**For loop:**

```python
offices = ["Office A", "Office B", "Office C"]

for office in offices:
    print(office)
```

**While loop:**

```python
offices = ["Office A", "Office B", "Office C"]

# pop() removes and returns the last item, so the loop ends when the list is empty
while offices:
    print(offices.pop())
```

**If-else statement:**

```python
office_a_sales = 7
office_b_sales = 3

if office_b_sales > office_a_sales:
    print("Office B has more sales")
elif office_a_sales > office_b_sales:
    print("Office A has more sales")
else:
    print("Sales are equal")
```

### Functions

```python
def calculate_total(sales_list):
    total = sum(sales_list)
    return total

result = calculate_total([7, 3, 9])
print(result)  # 19
```

### Classes

```python
class Office:
    def __init__(self, name, location, sales):
        self.name = name
        self.location = location
        self.sales = sales

office = Office("Office A", "Portland, Oregon", 7)
print(f"Name: {office.name}")
print(f"Sales: {office.sales}")
```

### Exception Handling

```python
file = None  # Defined up front so the finally block can check it safely
try:
    file = open("data.csv", "r")
    content = file.read()
except FileNotFoundError:
    print("File not found")
finally:
    # Close the file only if open() succeeded
    if file:
        file.close()
```

### Lists (Arrays)

```python
offices = ["Office A", "Office B", "Office C"]

# Access
print(offices[0])  # Office A

# Update
offices[0] = "Office Z"

# Length
print(len(offices))  # 3

# Add
offices.append("Office D")

# Remove
offices.remove("Office B")

# Loop
for office in offices:
    print(office)
```

### Operators

**Arithmetic:**

```python
addition = 1 + 1
subtraction = 2 - 1
multiplication = 3 * 3
division = 10 / 5
modulus = 6 % 3
exponent = 2 ** 3
```

**Assignment:**

```python
x = 1
x += 1  # x is now 2
x -= 1  # x is now 1
x *= 5  # x is now 5
x /= 5  # x is now 1.0
```

**Comparison:**

```python
a, b = 1, 2  # Sample values so the expressions below can run

a == b  # Equal
a != b  # Not equal
a > b   # Greater than
a < b   # Less than
a >= b  # Greater than or equal
a <= b  # Less than or equal
```

**Note:** For type comparisons, use [isinstance()][the isinstance built-in function] instead of operators.

### Lambda Functions

```python
offices = [
    {'name': 'Office A', 'sales': 7},
    {'name': 'Office B', 'sales': 3},
    {'name': 'Office C', 'sales': 9}
]

# Find office with highest sales
top_office = max(offices, key=lambda x: x['sales'])
print(top_office)  # {'name': 'Office C', 'sales': 9}
```

Use lambdas for simple operations. For complex logic, use regular functions.

## Related Content

* [Python Package Index] to search for Python packages.
* [The Zen of Python](https://www.python.org/dev/peps/pep-0020/) for Python philosophy.
* [W3Schools Python Tutorial](https://www.w3schools.com/python/) for more examples.
[Guido van Rossum]: https://en.wikipedia.org/wiki/Guido_van_Rossum
[install Python]: https://www.python.org/downloads/
[Python Crash Course]: https://amzn.to/3d2s9kw
[Python for Data Analysis]: https://amzn.to/2TxtmZc
[Learning Python, 5th Edition]: https://amzn.to/3edZhFX
[Learning Python]: https://www.linkedin.com/learning/learning-python-25309312
[Python Essential Training]: https://www.linkedin.com/learning/python-essential-training-2?u=2130809
[Complete Python Developer in 2020: Zero to Mastery]: https://www.udemy.com/course/complete-python-developer-zero-to-mastery/
[my repository on GitHub]: https://github.com/jeffabailey/learn
[Python Package Index]: https://pypi.org/
[snake_case]: https://peps.python.org/pep-0008/#naming-conventions
[the naming section of Google's style guide]: https://google.github.io/styleguide/pyguide.html#316-naming
[the isinstance built-in function]: https://docs.python.org/3.7/library/functions.html#isinstance